encoding - Need to find the requests equivalent of urlopen() from urllib2


I am trying to modify a script to use the requests library instead of the urllib2 library. I haven't used it before, and I'm looking for the equivalent of urlopen("http://www.example.org").read(), so I tried requests.get("http://www.example.org").text.

This works fine for normal everyday HTML, but when I fetch this URL (https://gtfsrt.api.translink.com.au/feed/seq) it doesn't seem to work.

So I wrote the code below to print out the responses from the same URL using both the requests and urllib2 libraries.

    import urllib2
    import requests

    # urllib2 request
    request = urllib2.Request("https://gtfsrt.api.translink.com.au/feed/seq")
    result = urllib2.urlopen(request)

    # requests request
    result2 = requests.get("https://gtfsrt.api.translink.com.au/feed/seq")
    print result2.encoding

    # urllib2: write the response to a text file
    open("output.txt", 'w').close()
    text_file = open("output.txt", "w")
    text_file.write(result.read())
    text_file.close()

    # requests: write the response to a text file
    open("output2.txt", 'w').close()
    text_file = open("output2.txt", "w")
    text_file.write(result2.text)
    text_file.close()

urlopen().read() works fine, but requests.get().text doesn't work for the given URL. I suspect it has something to do with encoding, but I don't know what. Any thoughts?

Note: the supplied URL is a feed in Google Protocol Buffer format. Once I receive the message, I give the feed to a Google library that interprets it.

Your issue is that you're making the requests module interpret the binary content in the response as text.

A response from the requests library has two main ways to access the body of the response: response.content and response.text.

Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
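As a rough sketch of that change, the snippet below fetches the body as bytes and writes it in binary mode. The helper names fetch_body_bytes and save_bytes, and the 30-second timeout, are made up for illustration and are not part of requests; only requests.get, raise_for_status and .content come from the library.

```python
def fetch_body_bytes(url, timeout=30):
    """Return the response body as raw bytes, much like urlopen(url).read()."""
    # requests is third-party; importing it here keeps save_bytes usable without it.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    return response.content      # undecoded bytes, no charset guessing involved


def save_bytes(path, body):
    """Write bytes to disk in binary mode so nothing is re-encoded or translated."""
    with open(path, "wb") as handle:
        handle.write(body)


# Intended use (performs a network call, so it is left commented out):
# save_bytes("output2.txt",
#            fetch_body_bytes("https://gtfsrt.api.translink.com.au/feed/seq"))
```

Opening the file with "wb" rather than "w" matters as well: in text mode the bytes may be translated or rejected on the way to disk, depending on the platform and Python version.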


response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters, this means the content must have been encoded by the server into a bytestring using a particular encoding, which is indicated by either an HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes, they therefore need to be decoded after receiving, using that charset.
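To make the encode/decode step concrete, here is a small standard-library illustration. The sample string is just an example I picked; it shows that the same bytes only turn back into the original text when they are decoded with the charset the sender used.

```python
# Text as the server sees it before sending.
text = u"caf\u00e9"

# What actually travels over the wire: bytes in some encoding (UTF-8 here).
wire_bytes = text.encode("utf-8")

# Decoding with the matching charset recovers the original text.
decoded_ok = wire_bytes.decode("utf-8")

# Decoding with a different charset "succeeds" but yields the wrong characters.
decoded_wrong = wire_bytes.decode("latin-1")

print(repr(wire_bytes))
print(decoded_ok == text)     # True
print(decoded_wrong == text)  # False
```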

response.text is a convenience property that does exactly this for you. It assumes the response body is text, looks at the response headers to find the encoding, and decodes the body for you, returning unicode.
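The idea can be sketched with the standard library alone. This is a simplified illustration, not how requests is actually implemented (the real library has extra fallbacks, such as guessing an encoding when none is declared); charset_from_content_type and decode_body are hypothetical names.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()  # lower-cased charset, or None if absent


def decode_body(body, content_type):
    """Decode a bytes body using the declared charset; None if none is declared."""
    charset = charset_from_content_type(content_type)
    if charset is None:
        return None  # nothing tells us how to turn these bytes into characters
    return body.decode(charset)


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("application/octet-stream"))
```

For a text response the header names a charset and decoding is straightforward. For a binary payload no charset is declared, so there is nothing sensible to decode with.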

But if the response doesn't contain text, this is the wrong property to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make sense for binary content; it's only applicable to text composed of characters. (That's why you're seeing response.encoding == None: it's just bytes, and there is no character encoding involved.)

See "Response Content" and "Binary Response Content" in the requests documentation for more details.
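Here is a small demonstration of why forcing binary data through a text decode is harmful. The payload is an arbitrary made-up byte string, not real feed data, and the lossy decode stands in for what any text-oriented accessor has to do when it meets bytes that are not valid in the chosen charset.

```python
# Arbitrary binary payload (made up for illustration, not a real protobuf message).
payload = b"\x0a\x03abc\xff\xfe\x80"

# A strict decode refuses outright, because these bytes are not valid UTF-8.
try:
    payload.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lossy decode "works", but swaps undecodable bytes for U+FFFD placeholders...
lossy = payload.decode("utf-8", errors="replace")

# ...so encoding the result again does not give back the original bytes.
round_tripped = lossy.encode("utf-8")

print(strict_ok)                 # False
print(round_tripped == payload)  # False
```

Once the original bytes are gone, a protocol buffer parser has no way to recover the message, which is why the bytes need to be kept untouched from the start.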

