In the following example:
import htmlparser, xmltree, strtabs, streams
const
  test = """
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- This file is generated by Nimrod. -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
<img src="test.txt">
</body></html>
"""
proc mangle() =
  var
    html = test.new_string_stream.parse_html
    DID_CHANGE: bool
  for img in html.find_all("img"):
    let src = img.attrs["src"]
    if not src.is_nil:
      img.attrs["src"] = "Something else"
      DID_CHANGE = true
  if DID_CHANGE:
    echo "Did change, output:", html
when isMainModule: mangle()
an input HTML string taken from Nimrod's own documentation generator is modified to change the URLs in its img tags. The output is:
Did change, output:<document>
<!-- This file is generated by Nimrod. -->
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<body>
<img src="Something else" />
</body></html>
</document>
The original XML declaration and doctype are removed and replaced with a document element. This breaks the original rendering. How can I preserve the source structure?

Use something like:
const xmlBoilerplate = """
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
"""
proc renderHtml(n: PXmlNode): string =
  result = xmlBoilerplate
  for child in n: result.add(child)
It's certainly a kludge, but PXmlNode doesn't store doctypes and parseHtml ignores them too. We need a doctype module in addition to what we have to deal with this properly.
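For completeness, here is a rough sketch of how the workaround might be wired into the code from the question. It is written against current Nim, where PXmlNode has been renamed XmlNode; on an old Nimrod compiler the type would be spelled PXmlNode. The mangle helper and the sample input are hypothetical, and renderHtml assumes that parseHtml handed back the synthetic document wrapper (which it does when the input has more than one top-level node, as in the question).

```nim
import htmlparser, xmltree, strtabs, streams

# Prologue that parseHtml drops; renderHtml re-emits it verbatim.
const xmlBoilerplate = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
"""

proc renderHtml(n: XmlNode): string =
  ## Assumes `n` is the <document> wrapper: only its children are
  ## rendered, so the wrapper tag itself never reaches the output.
  result = xmlBoilerplate
  for child in n: result.add(child)

proc mangle(input: string): string =
  ## Hypothetical helper: rewrites every img src, then renders the
  ## tree with the hard-coded prologue in front.
  let html = input.newStringStream.parseHtml
  for img in html.findAll("img"):
    # attrs is nil for elements that carry no attributes at all.
    if not img.attrs.isNil and img.attrs.hasKey("src"):
      img.attrs["src"] = "Something else"
  result = html.renderHtml

when isMainModule:
  # Illustrative input, shortened from the question's test string.
  const sample = xmlBoilerplate & """<!-- This file is generated by Nimrod. -->
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<img src="test.txt">
</body></html>
"""
  echo mangle(sample)
```

The prologue is hard-coded, so this only helps when every input shares the same XML declaration and doctype. Inputs with a different prologue would need it captured from the source text before parsing.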