Reliable way of handling non-ASCII characters in Python?
I have a spreadsheet whose column header contains non-ASCII characters, thus:

'campaign'

If I pop this string into the interpreter, I get:

'\xc3\xaf\xc2\xbb\xc2\xbfcampaign'

This string is one of the keys in the rows of a csv.DictReader(). When I try to populate a new dict with the value of this key:

spends['campaign'] = 2

I get:

KeyError: '\xc3\xaf\xc2\xbb\xc2\xbfcampaign'

If I print the value of the keys of the row, I can see '\xef\xbb\xbfcampaign'. Obviously I can update my program to access the key thus:

spends['\xef\xbb\xbfcampaign']

but is there a "better" way of doing this in Python? Indeed, if the value of the key ever changes to contain other non-ASCII characters, is there an all-encompassing way of handling non-ASCII characters?
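For reference, the two byte sequences appear to be related: '\xef\xbb\xbf' is the UTF-8 byte order mark, and '\xc3\xaf\xc2\xbb\xc2\xbf' looks like those same three bytes decoded as Latin-1 and re-encoded as UTF-8 (a quick, purely illustrative check in a Python 2 interpreter):

    >>> import codecs
    >>> codecs.BOM_UTF8
    '\xef\xbb\xbf'
    >>> codecs.BOM_UTF8.decode('latin-1').encode('utf-8')
    '\xc3\xaf\xc2\xbb\xc2\xbf'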
In general, you should decode a bytestring into Unicode text, using the corresponding character encoding, as early as possible on input. And, in reverse, encode Unicode text into a bytestring as late as possible on output. Some APIs, such as io.open(), can do this implicitly so that your code only ever sees Unicode.
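As a minimal sketch of the decode-early approach, assuming the file is UTF-8 encoded (the file name spends.csv is just a placeholder): the 'utf-8-sig' codec behaves like UTF-8 but also strips a leading byte order mark, which is exactly the '\xef\xbb\xbf' prefix seen above.

    import io

    # Decode as early as possible: io.open() hands your code unicode text,
    # already decoded with the declared encoding. 'utf-8-sig' additionally
    # drops a leading BOM ('\xef\xbb\xbf').
    with io.open('spends.csv', encoding='utf-8-sig') as f:
        header = f.readline().strip()
        print(repr(header))          # unicode, no BOM, e.g. u'campaign,...'

    # Encode as late as possible: write unicode and let io.open() encode it.
    with io.open('output.txt', 'w', encoding='utf-8') as f:
        f.write(u'campaign\n')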
Unfortunately, the csv module does not support Unicode directly on Python 2; see the UnicodeReader and UnicodeWriter examples in its documentation. You could create an analog of csv.DictReader along those lines or, as an alternative, pass UTF-8 encoded bytestrings to the csv module, as sketched below.
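Here is a minimal Python 2 sketch of that second route: pass UTF-8 bytestrings through csv.DictReader and decode the keys and values afterwards. The names unicode_dict_reader and spends.csv are illustrative, not anything from the csv docs.

    import codecs
    import csv

    def unicode_dict_reader(utf8_file, **kwargs):
        # Feed the csv module plain UTF-8 bytestrings (which it handles fine
        # on Python 2), then decode each key and value back to unicode.
        for row in csv.DictReader(utf8_file, **kwargs):
            yield dict((key.decode('utf-8'), value.decode('utf-8'))
                       for key, value in row.iteritems())

    with open('spends.csv', 'rb') as f:
        # Skip a leading UTF-8 BOM, if present, so it never ends up in a key.
        if f.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
            f.seek(0)
        for row in unicode_dict_reader(f):
            print(row[u'campaign'])

With the BOM stripped up front, the plain key u'campaign' works, and any other non-ASCII header text comes through as proper unicode as well.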