I'm working on a simple Instagram image downloader in Nim.
Example is this image: https://www.instagram.com/p/B1oqkXKFlcD
My idea is to first fetch the HTML of that page, like this:
import httpclient
var client = newHttpClient()
var src = "https://www.instagram.com/p/B1oqkXKFlcD"
var htmlsrc = client.getContent(src)  # the page HTML as one string
echo htmlsrc
That works fine. You'll get lots of HTML lines. The actual image itself can be found with the og:image tag:
<meta property="og:image" content="https://instagram.fcgk23-1.fna.fbcdn.net/vp/e7663a7f70afcedf1d66424a310bba19/5E13D4FF/t51.2885-15/e35/p1080x1080/67664270_675154659649319_4991162461801475991_n.jpg?_nc_ht=instagram.fcgk23-1.fna.fbcdn.net&_nc_cat=111" />
Now, how can I extract the link: https://instagram.fcgk23-1.fna.fbcdn.net/vp/e7663a..... ? Once I have that, I can pass it to an image downloading function.
OK. Another attempt:
import httpclient
import re
import xmltree
import htmlparser
import streams
import nimquery
import strutils
var client = newHttpClient()
var url = "https://www.instagram.com/p/B1oqkXKFlcD"
var htmlsrc = client.getContent(url)
let xml = parseHtml(newStringStream(htmlsrc))
let elements = xml.querySelectorAll("meta")
for x in 0 .. elements.len-1:
  if contains(elements[x].text, "og:image"):
    echo elements[x]
My intention is to print only the line which contains og:image.
It crashes, unfortunately:
fatal.nim(39) sysFatal
Error: unhandled exception: xmltree.nim(176, 10) `n.k in {xnText, xnComment, xnCData, xnEntity}` [AssertionError]
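For context, the assertion lists the node kinds on which `text` may be called (text, comment, CDATA, entity). A `<meta>` tag is parsed as an element node, so its data lives in attributes rather than in text. A minimal sketch of that difference, using a hand-built node with a made-up URL in place of the real page:

```nim
import xmltree

# Hypothetical stand-in for one parsed <meta> element; the URL is invented.
let meta = newElement("meta")
meta.attrs = {"property": "og:image",
              "content": "https://example.com/a.jpg"}.toXmlAttributes

# Calling meta.text here would trip the assertion shown above,
# because the node kind is xnElement, not xnText.
doAssert meta.kind == xnElement

# attr() returns "" when the attribute is missing, so no key check is needed.
echo meta.attr("property")
echo meta.attr("content")
```

Reading attributes this way avoids `text` entirely, which is why the loop above fails on the first `<meta>` element it reaches.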
Try this:
import httpclient
import xmltree
import htmlparser
import strtabs
var client = newHttpClient()
var url = "https://www.instagram.com/p/B1oqkXKFlcD"
var htmlsrc = client.getContent(url)
let xml = parseHtml(htmlsrc)
for meta in xml.findAll("meta"):
  if meta.attrs.hasKey("property") and meta.attrs["property"] == "og:image":
    echo "URL: ", meta.attrs["content"]
This works:
import httpclient, xmltree, htmlparser, strtabs
import nimquery
var client = newHttpClient()
var url = "https://www.instagram.com/p/B1oqkXKFlcD"
var htmlsrc = client.getContent(url)
let xml = parseHtml(htmlsrc)
let elements = xml.querySelectorAll("[property='og:image']")
for e in elements:
  echo e.attrs["content"]
I tried @filip's and @SolitudeSF's solutions.
Both give the same result, which is slightly incorrect:
https://instagram.fcgk12-1.fna.fbcdn.net/vp/ff6ac1dc7b428f4177e1d34989c82765/5E13D4FF/t51.2885-15/e35/p1080x1080/67664270_675154659649319_4991162461801475991_n.jpg?_nc_ht=instagram.fcgk12-1.fna.fbcdn.net_nc_cat=1
When that URL is opened in a browser, it returns "URL signature mismatch".
The correct URL is:
https://instagram.fcgk12-1.fna.fbcdn.net/vp/ff6ac1dc7b428f4177e1d34989c82765/5E13D4FF/t51.2885-15/e35/p1080x1080/67664270_675154659649319_4991162461801475991_n.jpg?_nc_ht=instagram.fcgk12-1.fna.fbcdn.net&_nc_cat=1
Notice that there's an & after fna.fbcdn.net. Maybe this is an xmltree bug?
xmltree should be able to handle this, so please file a bug report.
It is a known issue: https://github.com/nim-lang/Nim/issues/1034 and https://github.com/nim-lang/Nim/issues/11713
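Until that is fixed, one possible workaround is to skip the parser for this one attribute and cut the value out of the raw HTML. The sketch below is only an illustration: `extractOgImage` is a hypothetical helper, it assumes the attribute order `property="og:image" content="..."` shown in the question, and it assumes the separator appears in the page source either as a plain `&` or as `&amp;`.

```nim
import strutils

proc extractOgImage(html: string): string =
  ## Returns the content of the first og:image meta tag, or "" if none is found.
  ## Assumes property="og:image" is immediately followed by content="...".
  const marker = "property=\"og:image\" content=\""
  let start = html.find(marker)
  if start < 0:
    return ""
  let valueStart = start + marker.len
  let valueEnd = html.find('"', valueStart)
  if valueEnd < 0:
    return ""
  # Decode the one entity that matters in a query string; a plain & is left as is.
  result = html[valueStart ..< valueEnd].replace("&amp;", "&")

when isMainModule:
  # Made-up sample tag; a real page would come from client.getContent(url).
  let sample = """<meta property="og:image" content="https://example.com/a.jpg?x=1&amp;y=2" />"""
  echo extractOgImage(sample)  # https://example.com/a.jpg?x=1&y=2
```

This is brittle compared with a real HTML parser (it breaks if Instagram reorders the attributes), so it is best treated as a stopgap until the parser keeps the `&`.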