I think it would help external search engines if the forum generated a sitemap of all posts. The robots.txt could then contain an entry that points search engines to the sitemap:
Sitemap: https://forum.nim-lang.org/sitemap.xml
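To check that a standard robots.txt parser picks up such an entry, Python's `urllib.robotparser` can read it. This is a small sketch: the minimal robots.txt below (allow everything, advertise the index) is hypothetical and is not the forum's actual file.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt: allow all crawlers and advertise the sitemap index.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://forum.nim-lang.org/sitemap.xml
"""

def sitemaps_from_robots(text: str) -> List[str]:
    """Return the Sitemap URLs advertised by a robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []
```

The `Sitemap:` directive applies regardless of any `User-agent` group, so it can go anywhere in the file.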
To generate the sitemaps I wrote a small library (https://github.com/enthus1ast/nimSimpleSitemap) that produces the following files:
sitemap.xml
sitemap_0.xml
sitemap_1.xml
sitemap_2.xml
sitemap_recent.xml
A single sitemap file may contain at most 50,000 URLs, so as the forum grows we will need multiple files (my library already handles this).
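The splitting step can be sketched in a few lines. This is an illustration in Python, not the nimSimpleSitemap API. It assumes the post URLs are already available as a list, and the `chunk_urls` helper is hypothetical. The `sitemap_<n>.xml` names follow the file layout above.

```python
from typing import Dict, List

# Sitemap protocol limit: at most 50,000 URLs per file.
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Dict[str, List[str]]:
    """Split urls into consecutive chunks of at most `limit` entries,
    keyed by file name: sitemap_0.xml, sitemap_1.xml, ..."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        f"sitemap_{i // limit}.xml": urls[i:i + limit]
        for i in range(0, len(urls), limit)
    }
```

For example, 120,000 URLs produce `sitemap_0.xml` and `sitemap_1.xml` with 50,000 entries each, plus `sitemap_2.xml` with the remaining 20,000.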
The sitemap.xml is a sitemap index that links to all the other sitemaps:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://forum.nim-lang.org/sitemap_recent.xml</loc>
<lastmod>2024-02-14T12:55:35+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://forum.nim-lang.org/sitemap_0.xml</loc>
<lastmod>2024-02-14T12:55:35+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://forum.nim-lang.org/sitemap_1.xml</loc>
<lastmod>2024-02-14T12:55:35+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://forum.nim-lang.org/sitemap_2.xml</loc>
<lastmod>2024-02-14T12:55:35+01:00</lastmod>
</sitemap>
</sitemapindex>
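Here is a Python sketch that produces an index like the one above using only the standard library. The function name and parameters are hypothetical. It assumes every child sitemap shares the same `lastmod`, as in the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'

def build_sitemap_index(base_url: str, files: List[str], lastmod: datetime) -> str:
    """Return a <sitemapindex> document linking every file in `files`."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url.rstrip('/')}/{name}"
        # W3C Datetime such as 2024-02-14T12:55:35+01:00; pass a timezone-aware datetime.
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat(timespec="seconds")
    return XML_DECL + ET.tostring(root, encoding="unicode")
```

Calling it with `"https://forum.nim-lang.org"`, `["sitemap_recent.xml", "sitemap_0.xml"]` and `datetime.fromisoformat("2024-02-14T12:55:35+01:00")` yields the same structure as above, serialized on one line.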
sitemap_0.xml to sitemap_N.xml
These files contain the actual post links, at most 50,000 per file.
<urlset xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://forum.nim-lang.org/t/1/foo</loc>
<lastmod>2023-02-14T13:10:42+01:00</lastmod>
</url>
<url>
<loc>https://forum.nim-lang.org/t/2/foo</loc>
<lastmod>2023-02-14T12:10:42+01:00</lastmod>
</url>
</urlset>
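A matching Python sketch for one of these files follows. It takes hypothetical `(url, lastmod)` pairs. As a simplification, it declares only the sitemap namespace. The `xsi` and `image` declarations in the example above should only matter if image entries are added, but that is an assumption on my part.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'

def build_urlset(entries: Iterable[Tuple[str, datetime]]) -> str:
    """Return a <urlset> document with one <url> per (loc, lastmod) pair."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(root, "url")
        # ElementTree escapes &, < and > in text, which the protocol requires for <loc>.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat(timespec="seconds")
    return XML_DECL + ET.tostring(root, encoding="unicode")
```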
sitemap_recent.xml
Contains the x most recently updated posts.
<urlset xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://forum.nim-lang.org/t/1/foo</loc>
<lastmod>2023-02-14T13:10:42+01:00</lastmod>
</url>
</urlset>
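You can pick the posts for sitemap_recent.xml without sorting the whole list. This Python sketch uses `heapq.nlargest` and assumes posts arrive as hypothetical `(url, lastmod)` pairs with timezone-aware datetimes. In practice, an `ORDER BY ... DESC LIMIT x` query against the database would do the same job.

```python
import heapq
from datetime import datetime
from typing import List, Tuple

Post = Tuple[str, datetime]  # (url, lastmod)

def most_recent(posts: List[Post], x: int) -> List[Post]:
    """Return the x posts with the newest lastmod, newest first."""
    return heapq.nlargest(x, posts, key=lambda post: post[1])
```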
I am currently changing the forum code so that the sitemaps are generated at startup. However, they also need to be regenerated periodically.
At the moment I'm considering an asynchronous task that sleeps for about a day and then regenerates the sitemaps. There may be better options. For example, the generator could run as a standalone application triggered by a cron job, but then we must make sure it does not lock the forum's SQLite database.
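Here is a Python sketch of both ideas. It is not forum code, and the `thread` table with its `id` and `modified` columns is a hypothetical schema. `read_threads` opens the database through an SQLite URI with `mode=ro`, so a standalone or cron-driven generator can never write to it. A reader still holds a SHARED lock while the query runs. In rollback-journal mode this can briefly delay a writer's commit, while in WAL mode readers and writers do not block each other. `regenerate_periodically` is the in-process variant: it runs the blocking work in a worker thread, then sleeps.

```python
import asyncio
import logging
import sqlite3
from typing import Callable, List, Tuple

ONE_DAY = 24 * 60 * 60  # seconds

def read_threads(db_path: str) -> List[Tuple[int, str]]:
    """Read (id, modified) rows over a read-only connection.

    The `thread` table and its columns are hypothetical; assumes db_path
    contains no '?' or '#' characters (they would break the URI).
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
    try:
        return conn.execute("SELECT id, modified FROM thread ORDER BY id").fetchall()
    finally:
        conn.close()

async def regenerate_periodically(regenerate: Callable[[], None],
                                  interval: float = ONE_DAY) -> None:
    """Run `regenerate` in a worker thread, sleep `interval` seconds, repeat."""
    while True:
        try:
            # Keep blocking DB and file I/O off the event loop (Python 3.9+).
            await asyncio.to_thread(regenerate)
        except Exception:
            # A failed run is logged; the loop keeps going until cancelled.
            logging.exception("sitemap regeneration failed")
        await asyncio.sleep(interval)
```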
Any ideas?
I've looked through the forum's routes. One of them also accepts the post's title in the URL:
/t/123/foo
I think the sitemap should also contain these URLs, because I suspect they are even more search-engine friendly.
Here is Google's documentation on building sitemaps:
https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
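Here is a Python sketch for building such `/t/<id>/<name>` URLs. The slug rules are an assumption for illustration (strip accents, lowercase, replace runs of other characters with `-`), not the forum's actual implementation. `thread_url` is a hypothetical helper.

```python
import re
import unicodedata

def thread_url(base_url: str, thread_id: int, title: str) -> str:
    """Build /t/<id>/<slug> from a thread title, falling back to /t/<id>."""
    # Decompose accented letters and drop the non-ASCII combining marks.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    path = f"/t/{thread_id}/{slug}" if slug else f"/t/{thread_id}"
    return base_url.rstrip("/") + path
```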