Python: converting a complex dictionary of strings from Unicode to ASCII (tags: string, dictionary, complex, Python)

2023-09-11 22:50:12 Author: 驭风

Possible Duplicate: How to get string objects instead of Unicode ones from JSON in Python?

I have a lot of input as multi-level dictionaries parsed from JSON API calls. The strings are all in unicode which means there is a lot of u'stuff like this'. I am using jq to play around with the results and need to convert these results to ASCII.

I know I can write a function to just convert it like that:

def convert(input):
    if isinstance(input, dict):
        ret = {}
        for stuff in input:
            ret = convert(stuff)
    elif isinstance(input, list):
        ret = []
        for i in range(len(input))
            ret = convert(input[i])
    elif isinstance(input, str):
        ret = input.encode('ascii')
    elif :
        ret = input
    return ret

Is this even correct? Not sure. That's not what I want to ask you though.

What I'm asking is, this is a typical brute-force solution to the problem. There must be a better way. A more pythonic way. I'm no expert on algorithms, but this one doesn't look particularly fast either.

So is there a better way? Or if not, can this function be improved...?

Post-answer edit

Mark Amery's answer is correct but I would like to post a modified version of it. His function works on Python 2.7+ and I'm on 2.6 so had to convert it:

def convert(input):
    if isinstance(input, dict):
        return dict((convert(key), convert(value)) for key, value in input.iteritems())
    elif isinstance(input, list):
        return [convert(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

Solution

Recursion seems like the way to go here, but if you're on Python 2.x you want to be checking for unicode, not str (the str type represents a string of bytes, and the unicode type a string of Unicode characters; neither inherits from the other, and it is unicode-type strings that are displayed in the interpreter with a u in front of them).
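To make the type distinction concrete, here is a minimal sketch that checks what kind of string json.loads hands back. The names TEXT_TYPE and BYTES_TYPE are helpers introduced here for illustration (they are not part of the original answer); they resolve to unicode and str on Python 2, and to str and bytes on Python 3, so the same check should run on either.

```python
import json

# Hypothetical helper names, not from the original answer.
TEXT_TYPE = type(u"")   # unicode on Python 2, str on Python 3
BYTES_TYPE = type(b"")  # str on Python 2, bytes on Python 3

parsed = json.loads('{"key": "value"}')
key = list(parsed)[0]

# A key parsed from JSON is a text string, not a byte string, which is
# why an isinstance check against the byte-string type never matches it.
key_is_text = isinstance(key, TEXT_TYPE)
key_is_bytes = isinstance(key, BYTES_TYPE)
```

This is why the isinstance(input, str) branch in the question would never fire for JSON-derived strings on Python 2.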

There's also a little syntax error in your posted code (the trailing elif: should be an else), and you're not returning the same structure in the case where input is either a dictionary or a list. (In the case of a dictionary, you're returning the converted version of the final key; in the case of a list, you're returning the converted version of the final element. Neither is right!)
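For comparison, a loop-based version that does keep the structure might look like the sketch below. It is only an illustration of the two fixes just described: convert_with_loops and TEXT_TYPE are names made up here, the encoding is kept as 'ascii' to match the question, and TEXT_TYPE is an assumption that lets the sketch run on Python 2 and 3 alike.

```python
# Hypothetical helper: unicode on Python 2, str on Python 3.
TEXT_TYPE = type(u"")

def convert_with_loops(data):
    """Rebuild each container instead of overwriting the result variable."""
    if isinstance(data, dict):
        result = {}
        for key in data:
            # Store every converted pair rather than rebinding result.
            result[convert_with_loops(key)] = convert_with_loops(data[key])
        return result
    elif isinstance(data, list):
        result = []
        for item in data:
            # Append every converted element rather than keeping only the last.
            result.append(convert_with_loops(item))
        return result
    elif isinstance(data, TEXT_TYPE):
        return data.encode('ascii')
    else:
        # Numbers, booleans, None and byte strings pass through unchanged.
        return data
```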

You can also make your code pretty and Pythonic by using comprehensions.

Here, then, is what I'd recommend:

def convert(input):
    if isinstance(input, dict):
        return {convert(key): convert(value) for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [convert(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input
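As a usage sketch, the function can be applied directly to the result of json.loads. The variant below is an adaptation, not the answer's exact code: it swaps iteritems() for items() and unicode for a hypothetical text_type helper so that it also runs on Python 3, where the encoded values come out as bytes objects. The sample JSON document is invented for illustration.

```python
import json

# Hypothetical helper: unicode on Python 2, str on Python 3.
text_type = type(u"")

def convert(data):
    if isinstance(data, dict):
        # items() exists on both Python 2 and 3; iteritems() is Python 2 only.
        return dict((convert(key), convert(value)) for key, value in data.items())
    elif isinstance(data, list):
        return [convert(element) for element in data]
    elif isinstance(data, text_type):
        return data.encode('utf-8')
    else:
        return data

# Made-up input standing in for a parsed JSON API response.
parsed = json.loads('{"name": "stuff", "tags": ["a", "b"], "count": 3}')
converted = convert(parsed)
```

Nested lists and dictionaries are handled by the recursion, and non-string values such as the integer count are returned untouched.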

One final thing. I changed encode('ascii') to encode('utf-8'). My reasoning is as follows: any unicode string that contains only characters in the ASCII character set will be represented by the same byte string when encoded in ASCII as when encoded in utf-8, so using utf-8 instead of ASCII cannot break anything and the change will be invisible as long as the unicode strings you're dealing with use only ASCII characters. However, this change extends the scope of the function to be able to handle strings of characters from the entire unicode character set, rather than just ASCII ones, should such a thing ever be necessary.
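The reasoning above can be checked with a short sketch. The sample strings are made up for illustration; the first contains only ASCII characters, the second contains one character outside ASCII.

```python
ascii_only = u"stuff"
# For ASCII-only text, both encodings should yield identical bytes.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")

accented = u"caf\u00e9"  # ends with U+00E9, which is outside ASCII
utf8_bytes = accented.encode("utf-8")

# Encoding the same text as ASCII is expected to fail.
try:
    accented.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

With 'ascii', a single accented character anywhere in the nested data would abort the whole conversion, whereas 'utf-8' encodes it without error.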

 