Reliable way of handling non-ASCII characters in Python?
I have a spreadsheet whose column header contains non-ASCII characters, thus:

'campaign'

If I pop this string into the interpreter, I get:

'\xc3\xaf\xc2\xbb\xc2\xbfcampaign'

This string is one of the keys in the rows of a csv.DictReader(). When I try to populate a new dict with the value of this key:

spends['campaign'] = 2

I get:

KeyError: '\xc3\xaf\xc2\xbb\xc2\xbfcampaign'

If I print the value of the keys of the row, I can see '\xef\xbb\xbfcampaign'. Obviously I can update my program to access the key thus:

spends['\xef\xbb\xbfcampaign']

but is there a "better" way of doing this in Python? Indeed, if the value of the key ever changes to contain other non-ASCII characters, is there an all-encompassing way of handling non-ASCII characters?
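For reference, the two byte sequences appear to be related: '\xef\xbb\xbf' is the UTF-8 byte order mark, and '\xc3\xaf\xc2\xbb\xc2\xbf' looks like those same three bytes decoded as Latin-1 and re-encoded as UTF-8 (a quick, purely illustrative check in a Python 2 interpreter):

    >>> import codecs
    >>> codecs.BOM_UTF8
    '\xef\xbb\xbf'
    >>> codecs.BOM_UTF8.decode('latin-1').encode('utf-8')
    '\xc3\xaf\xc2\xbb\xc2\xbf'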
In general, you should decode a bytestring into Unicode text, using the corresponding character encoding, as early as possible on input. And, in reverse, encode Unicode text into a bytestring as late as possible on output. Some APIs, such as io.open(), can do this implicitly so that your code only ever sees Unicode.
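As a minimal sketch of the decode-early approach, assuming the file is UTF-8 encoded (the file name spends.csv is just a placeholder): the 'utf-8-sig' codec behaves like UTF-8 but also strips a leading byte order mark, which is exactly the '\xef\xbb\xbf' prefix seen above.

    import io

    # Decode as early as possible: io.open() hands your code unicode text,
    # already decoded with the declared encoding. 'utf-8-sig' additionally
    # drops a leading BOM ('\xef\xbb\xbf').
    with io.open('spends.csv', encoding='utf-8-sig') as f:
        header = f.readline().strip()
        print(repr(header))          # unicode, no BOM, e.g. u'campaign,...'

    # Encode as late as possible: write unicode and let io.open() encode it.
    with io.open('output.txt', 'w', encoding='utf-8') as f:
        f.write(u'campaign\n')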
Unfortunately, the csv module does not support Unicode directly on Python 2; see the UnicodeReader and UnicodeWriter examples in its documentation. You could create an analog of csv.DictReader along those lines or, as an alternative, pass UTF-8 encoded bytestrings to the csv module, as sketched below.
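Here is a minimal Python 2 sketch of that second route: pass UTF-8 bytestrings through csv.DictReader and decode the keys and values afterwards. The names unicode_dict_reader and spends.csv are illustrative, not anything from the csv docs.

    import codecs
    import csv

    def unicode_dict_reader(utf8_file, **kwargs):
        # Feed the csv module plain UTF-8 bytestrings (which it handles fine
        # on Python 2), then decode each key and value back to unicode.
        for row in csv.DictReader(utf8_file, **kwargs):
            yield dict((key.decode('utf-8'), value.decode('utf-8'))
                       for key, value in row.iteritems())

    with open('spends.csv', 'rb') as f:
        # Skip a leading UTF-8 BOM, if present, so it never ends up in a key.
        if f.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
            f.seek(0)
        for row in unicode_dict_reader(f):
            print(row[u'campaign'])

With the BOM stripped up front, the plain key u'campaign' works, and any other non-ASCII header text comes through as proper unicode as well.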