ruby - How do I read the content of every HTML tag using Mechanize?
How do I write a Mechanize scraper that gets the content of every HTML tag on a web page? Or do I need to convert the page to a string and use a regex to get the content between <.*?> and <\/.*?>?
To find more information about writing a web scraper with Mechanize, take a look at the following tutorials:
- http://readysteadycode.com/howto-scrape-websites-with-ruby-and-mechanize
- http://www.icicletech.com/blog/web-scraping-with-ruby-using-mechanize-and-nokogiri-gems
Also keep in mind that Mechanize uses the Nokogiri gem underneath for scraping. If you are not attached to Mechanize, consider using Nokogiri directly to parse the HTML tags.
Do not convert the page to a string and use a regex to get the HTML content. See this answer for more information on why that is a bad idea.
--edit--
As @pguardiario mentioned in a comment, the code to get the content of each tag is page.search('*').map(&:text)