internet explorer - Importing/scraping an website into excel -
i trying scrape data database, , have pretty set. in ie tab has me logged in database, , paste query link there through vba. how extract data returns ie tab , put excel cell or array.
this code have opening query:
sub import() dim row integer dim strtargetfile string dim wb workbook dim test string dim ie object call fill_array_cultivar row = 3 4 sheets.add.name = cultivar_array(row, 1) strtargetfile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2cden_info%2cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nncxq6etevv184o9nnd5yg%3d%3d&q=cc%3ait%20and%20latin_name%3a(zea%20mays)%20and%20den_info%3a" & trim(cultivar_array(row, 1)) & "&facet=false" set ie = getie("https://www3.wipo.int" & "*") if not ie nothing ie.navigate (strtargetfile) else msgbox "ie not found!" end if next row end sub and appropriate function:
'find ie window matching (partial) url 'assumes no frames. function getie(saddress string) object dim objshell object, objshellwindows object, o object dim retval object, surl string set retval = nothing set objshell = createobject("shell.application") set objshellwindows = objshell.windows 'see if ie open each o in objshellwindows surl = "" on error resume next surl = o.document.location on error goto 0 if surl <> "" if surl saddress & "*" set retval = o exit end if end if next o set getie = retval end function what website returns me white page line of text. here example:
{"response":{"start":0,"docs":[{"den_final":"abacus","app_date":"1998-01-13t22:59:59z"}],"numfound":1},"qi":"3-nncxq6etevv184o9nnd5yg==","sv":"bswa2.wipo.int","lastupdated":1436333633993} ps. tried using importxml function, import website, error page, not recognize me logged in.
i found solution, simple hard find. can grab ie.document.body.innertext text need. see code updated below:
sub import() dim row integer dim strtargetfile string dim wb workbook dim test string dim ie object dim pagetext string call fill_array_cultivar row = 3 4 sheets.add.name = cultivar_array(row, 1) strtargetfile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2cden_info%2cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nncxq6etevv184o9nnd5yg%3d%3d&q=cc%3ait%20and%20latin_name%3a(zea%20mays)%20and%20den_info%3a" & trim(cultivar_array(row, 1)) & "&facet=false" set ie = getie("https://www3.wipo.int" & "*") if not ie nothing ie.navigate (strtargetfile) until ie.readystate = 4: doevents: loop pagetext = ie.document.body.innertext activesheet.cells(1, 1) = pagetext pagetext = empty else msgbox "ie not found!" end if next row end sub
Comments
Post a Comment