I want to scrape some HTML pages and am thinking of using AsyncHttpClient for it. Below is what I wrote:
import httpclient, streams, asyncdispatch, htmlparser, xmltree

const totalPage = 1
let theurl = "http://learn.shayhowe.com/html-css/organizing-data-with-tables/"
var
  asynclient = newAsyncHttpClient()
  client = newHttpClient()
  pages = newSeq[Future[AsyncResponse]](totalPage)

# synchronous get for comparison
var
  contentHtml = client.get(theurl).bodyStream.parseHtml
  tbody = contentHtml.findAll("tbody")

# To override ctrl+c for stopping the loop
proc toquit() {.noconv.} =
  echo "sync tbody length is ", tbody.len
  asynclient.close
  client.close
  quit QuitSuccess

setControlCHook toquit

# this part for asynchronous get
for page in 1 .. totalPage:
  pages[page-1] = asynclient.get theurl
  pages[page-1].callback = proc(fres: Future[AsyncResponse]) {.thread.} =
    var
      asyncres = fres.read
      content = waitFor asyncres.body
      html = content.newStringStream.parseHtml # there's a warning that
                                               # parseHtml is not GC-safe
      trbody = html.findAll("tbody") # same as above to get tbody tag
    echo "async tbody length is ", trbody.len

runForever()
What I got was:
trbody is @[] # this is result from async
# then pressing ctrl+c to stop
sync tbody length is 29
My question is: why can't I get the HTML/XML tags after parsing the response into an XmlNode in the asynchronous case, when the synchronous get gives me the result?
Did I do something wrong in the asynchronous call?
Sorry, I didn't recheck the code posted above; there were typos because I didn't copy it directly from the problem I wanted to solve. It's fixed now.
I think I misunderstood the cb argument of callback. I thought it didn't return any value, since the manual doesn't state one. But until I added a return type, the compiler complained of a "type mismatch" when I used the {.async.} pragma: it expected cb: proc(...): T while I had provided cb: proc(...).
I worked around it with the synchronous httpclient.get before, but now I think I can try the asynchronous version again.
Thanks for the help :)
EDIT: nimsuggest didn't report an error when I tried using {.async.}, but the compiler threw an error about expecting type Future. Hmm.
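For later readers, here is a minimal sketch of the {.async.} route, under a few assumptions. It is not a confirmed fix for the code above. It only illustrates that an {.async.} proc declares a Future[T] return type and uses await rather than waitFor inside async code. The names fakeFetch and countTbody and the example.invalid URL are hypothetical. fakeFetch stands in for the network call so the sketch runs offline.

```nim
import asyncdispatch, htmlparser, streams, xmltree

# Hypothetical stand-in for a network fetch so this runs without a connection.
# With a real client, the body would presumably come from something like
# `await asynclient.getContent(url)` instead.
proc fakeFetch(url: string): Future[string] {.async.} =
  await sleepAsync(10)  # pretend to wait on the network
  return "<html><body><table><tbody><tr><td>1</td></tr></tbody></table></body></html>"

# An {.async.} proc must declare a Future return type;
# Future[int] here carries the number of <tbody> elements found.
proc countTbody(url: string): Future[int] {.async.} =
  let content = await fakeFetch(url)          # await inside async code, not waitFor
  let html = content.newStringStream.parseHtml
  return html.findAll("tbody").len

# waitFor is used once, at the top level, to drive the event loop.
let n = waitFor countTbody("http://example.invalid/page")
echo "async tbody length is ", n
```

To fetch several pages, one option would be to collect the Future[int] values in a seq and await them in turn from another {.async.} proc, rather than assigning callbacks by hand.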