BeautifulSoup - Extract Yahoo Finance balance sheet with Python
I am learning to use BeautifulSoup and Python to extract an HTML table. I tried using the following code to extract the balance sheet of Google from Yahoo Finance. However, I can't seem to get the rows scraped correctly.
I can't manage to omit the spacer rows, and I don't manage to extract the rows of totals (e.g. Total Assets).
Any advice? Advice on simplifying the code would also be valuable.

from bs4 import BeautifulSoup
import requests

def bs_extract(stock_ticker):
    url = 'https://finance.yahoo.com/q/bs?s=' + str(stock_ticker) + '&annual'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    c1 = ""
    c2 = ""
    c3 = ""
    c4 = ""
    c5 = ""
    table = soup.find("table", {"class": "yfnc_tabledata1"})
    # print (table)
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells) == 5:
            c1 = cells[0].find(text=True)
            c2 = cells[1].find(text=True)
            c3 = cells[2].find(text=True)
            c4 = cells[3].find(text=True)
            c5 = cells[4].find(text=True)
        elif len(cells) == 6:
            c1 = cells[1].find(text=True)
            c2 = cells[2].find(text=True)
            c3 = cells[3].find(text=True)
            c4 = cells[4].find(text=True)
            c5 = cells[5].find(text=True)
        elif len(cells) == 1:
            c1 = cells[0].find(text=True)
            c2 = ""
            c3 = ""
            c4 = ""
            c5 = ""
        else:
            pass
        print(c1, c2, c3, c4, c5)

bs_extract('goog')

You might find it easier to get the data already structured, through YQL. See http://goo.gl/qkewxw
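
If you would rather stay with BeautifulSoup, here is a minimal sketch of one way to handle the two problems from the question. It rests on assumptions I could not check against the live page: that the totals rows wrap their text in a nested tag such as <strong> (in which case find(text=True) may return only the whitespace before that tag), and that spacer rows contain nothing but empty cells. SAMPLE_HTML is made-up markup standing in for the real table, and extract_rows is a hypothetical helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Yahoo Finance table (an assumption,
# not a copy of the real page): a header row, a data row with a leading empty
# cell, a spacer row, and a totals row whose text sits inside <strong>.
SAMPLE_HTML = """
<table class="yfnc_tabledata1">
  <tr><td colspan="2">Period Ending</td><td>Dec 31, 2014</td><td>Dec 31, 2013</td></tr>
  <tr><td></td><td>Cash And Cash Equivalents</td><td>100</td><td>90</td></tr>
  <tr><td colspan="4" style="height:0;padding:0"></td></tr>
  <tr><td colspan="2">
      <strong>Total Assets</strong></td><td><strong>300</strong></td><td><strong>250</strong></td></tr>
</table>
"""


def extract_rows(html, table_class="yfnc_tabledata1"):
    """Return each non-empty table row as a list of non-empty cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": table_class})
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        # A row that only wraps a nested table is skipped; find_all("tr") is
        # recursive, so the inner rows are visited on their own.
        if tr.find("table") is not None:
            continue
        # get_text(strip=True) gathers text from nested tags such as <strong>,
        # unlike find(text=True), which returns only the first text node.
        cells = [td.get_text(" ", strip=True)
                 for td in tr.find_all("td", recursive=False)]
        # Drop empty padding cells; a row left with no cells is a spacer.
        cells = [c for c in cells if c]
        if cells:
            rows.append(cells)
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```

Because empty cells are dropped instead of being indexed by position, the separate branches for 5-cell and 6-cell rows are no longer needed. To try it on a real page, pass requests.get(url).text in place of SAMPLE_HTML, and expect to adjust the class name if the markup differs.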