What Nim tools can do this? I would like something like Beautiful Soup for Python.
Thanks
+1 for HTML Tidy
Using the httpclient lib, you request the webpage and get the body content back. From there you run the Tidy executable to clean up the raw HTML from that response, making it "standards compliant". Convert that into an XML tree with the htmlparser lib and get the data you need with:
https://nim-lang.org/docs/xmltree.html (helps find your patterns, printing things)
https://nim-lang.org/docs/parseutils.html (helps find your patterns)
https://nim-lang.org/docs/strtabs.html (helps with accessing some XML tree attributes)
The htmlparser lib has a nice example of scraping links and uses some of the libraries above.
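Here is a rough sketch of the steps described above (fetch, tidy, parse, extract), in case it helps. The proc names are made up for illustration. It assumes a `tidy` executable is on your PATH and accepts the flags shown, and that you compile with -d:ssl if the URL is HTTPS. Treat it as a starting point rather than a finished scraper.

```nim
import std/[httpclient, osproc, htmlparser, xmltree, strtabs]

proc fetchBody(url: string): string =
  ## Download the page and return the response body.
  ## HTTPS URLs need the program compiled with -d:ssl.
  let client = newHttpClient()
  try:
    result = client.getContent(url)
  finally:
    client.close()

proc tidyHtml(raw: string): string =
  ## Pipe the raw HTML through an external `tidy` executable.
  ## Assumption: `tidy` is installed, on PATH, and understands these flags.
  let (output, exitCode) = execCmdEx(
    "tidy -quiet -asxhtml --show-warnings no",
    options = {poUsePath},  # leave stderr out of the captured HTML
    input = raw)
  # Tidy exits with 0 (clean) or 1 (warnings) when it wrote output,
  # and with 2 when it hit errors it could not repair.
  if exitCode > 1:
    raise newException(IOError, "tidy failed with exit code " & $exitCode)
  result = output

proc extractLinks(html: string): seq[string] =
  ## Build an XmlNode tree and collect the href of every <a> element.
  let root = parseHtml(html)
  for a in root.findAll("a"):
    # attrs is a string table (strtabs) and is nil when the tag has no attributes.
    if a.attrs != nil and a.attrs.hasKey("href"):
      result.add a.attrs["href"]

proc scrapeLinks(url: string): seq[string] =
  ## The whole pipeline: request, clean up, parse, extract.
  extractLinks(tidyHtml(fetchBody(url)))
```

Calling `scrapeLinks` with a page URL returns the link targets as a seq of strings. `extractLinks` can also be used on its own with any HTML string, which is handy for trying out patterns without hitting the network.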
I used to use Python because it had a "standards compliant" parser and Beautiful Soup, but once I learned about Tidy I didn't need Python anymore. With Tidy handling the cleanup, the rest of the scraping can be done with Nim's stdlib.
Like so:
import os, streams, parsexml, strutils

# Expect the HTML file name as the first command-line argument.
if paramCount() < 1:
  quit("Usage: htmlrefs filename[.html]")

# Append ".html" if the name was given without an extension.
var filename = addFileExt(paramStr(1), "html")
var s = newFileStream(filename, fmRead)
if s == nil: quit("cannot open the file " & filename)

var x: XmlParser
open(x, s, filename)
# Walk the parser events one at a time and report every href attribute.
while true:
  next(x)
  if x.kind == xmlEof: break
  if x.kind == xmlAttribute and cmpIgnoreCase(x.attrKey, "href") == 0:
    echo "found a link: ", x.attrValue
x.close()