python - BeautifulSoup: get only the "general" text in a td tag, and nothing in nested tags


Say the HTML looks like this:

    <td>potato1 <span somestuff...>potato2</span></td>
    ...
    <td>potato9 <span somestuff...>potato10</span></td>

I have BeautifulSoup doing this:

    for tag in soup.find_all("td"):
        print(tag.text)

and I get:

    potato1 potato2
    ...
    potato9 potato10

Is it possible to get only the text that sits directly inside the td tag, and not the text nested inside the span tag?

You can use .contents:

    >>> for tag in soup.find_all("td"):
    ...     print(tag.contents[0])
    ...
    potato1
    potato9
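A minimal self-contained version of the snippet above, assuming Python 3 with the bs4 package installed. The sample markup is a stand-in for the question's HTML: `class="x"` replaces the elided `somestuff...` attribute, and the built-in "html.parser" backend is chosen only so no extra parser needs installing.

```python
from bs4 import BeautifulSoup

# Stand-in markup modelled on the question; class="x" is a placeholder
# for the elided span attributes.
html = """
<table><tr>
<td>potato1 <span class="x">potato2</span></td>
<td>potato9 <span class="x">potato10</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("td"):
    # contents[0] is the td's first child node: the text before the <span>.
    print(tag.contents[0])
```

Note that the printed strings keep their trailing space ("potato1 "); call .strip() on them if that matters.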

What does it do?

A tag's direct children are available as a list via .contents:

    >>> for tag in soup.find_all("td"):
    ...     print(tag.contents)
    ...
    [u'potato1 ', <span somestuff...="">potato2</span>]
    [u'potato9 ', <span somestuff...="">potato10</span>]
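To see what kind of object each entry in that list is, the sketch below walks one td's children and labels each as text or tag. It assumes Python 3 with bs4 installed; the markup and the `class="x"` attribute are illustrative stand-ins. Text children are NavigableString objects (a str subclass), while nested elements like the span are Tag objects.

```python
from bs4 import BeautifulSoup, NavigableString

# Illustrative single-cell markup; class="x" stands in for the elided attributes.
html = '<td>potato1 <span class="x">potato2</span></td>'
td = BeautifulSoup(html, "html.parser").td

# .contents is an ordinary Python list of the td's direct children only;
# the span's own text is not listed separately here.
for child in td.contents:
    kind = "text" if isinstance(child, NavigableString) else "tag"
    print(kind, repr(child))
```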

Since you are only interested in the first element, go for:

    print(tag.contents[0])
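Because contents[0] is simply the first child, it only works while the text comes before the span; if a cell starts with the span, contents[0] is the span itself. One possible alternative, not part of the original answer, is to join every direct text child and skip nested tags. This is a sketch assuming Python 3 with bs4 installed; `own_text` is a hypothetical helper name, and HTML comments are deliberately excluded because bs4's Comment is a subclass of NavigableString.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup: the second cell puts the span *before* its own text.
html = """
<td>potato1 <span class="x">potato2</span></td>
<td><span class="x">potato10</span> potato9</td>
"""

soup = BeautifulSoup(html, "html.parser")


def own_text(tag):
    """Hypothetical helper: join only the text nodes that are direct children of tag."""
    return "".join(
        child
        for child in tag.children
        # Keep plain text nodes; skip nested Tags and HTML comments.
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ).strip()


for td in soup.find_all("td"):
    print(own_text(td))
```

For the first cell this gives the same result as contents[0] (minus the trailing space); for the second it still returns the td's own text rather than the span.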
