Hey guys, the result of innerText after parsing an HTML document seems strange.
Source code:
from strformat import `&`
import htmlparser
import xmlparser
import xmltree
let doc = """
<html>
<body>
<h1>Test Title :
<strong>Hello, world!</strong>
</h1>
</body>
</html>"""
let html = parseHtml(doc)
echo &">>{html.innerText}<<"
let html2 = parseXml(doc)
echo &">>>{html2.innerText}<<<"
Output:
>>
Test Title :
Hello, world!
<<
>>>Test Title :
Hello, world!<<<
Desired output:
>>> Test Title : Hello, world! <<<
When HTML is rendered, runs of whitespace are collapsed into a single space, as you know. Is the current output valid, or is it a bug?
Is the HTML spec relevant? The nodes are being converted to text, not HTML. Whitespace is part of that text.
The inconsistency between htmlparser and xmlparser is odd, though.
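If the goal is only the single-line text from the desired output, one option is to normalize the whitespace yourself after calling innerText. The following is a minimal sketch, not a fix for the parsers: collapseWhitespace is a hypothetical helper name (it is not part of xmltree or htmlparser), and it assumes that collapsing every run of whitespace into one space is acceptable for your data.

```nim
import htmlparser, xmltree, strutils

# Hypothetical helper (not in the standard library): collapse each run of
# whitespace into a single space and drop leading/trailing whitespace.
proc collapseWhitespace(s: string): string =
  s.splitWhitespace().join(" ")

let doc = """
<html>
<body>
<h1>Test Title :
<strong>Hello, world!</strong>
</h1>
</body>
</html>"""

let html = parseHtml(doc)
# The padding spaces are added here explicitly, since the helper strips them.
echo ">>> ", collapseWhitespace(html.innerText), " <<<"
```

The same helper can be applied to the parseXml result, which sidesteps the difference in leading and trailing whitespace between the two parsers.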