nimforum mirror - Nim forum spam handling

PMunch (original) [2025-06-22T20:38:16+02:00]

For people keeping an eye on the code running this forum (available on GitHub): over the past month or so I've tried to merge in some open PRs, fix some common annoyances, and generally improve the forum a bit.

One of the issues I've tried to solve applies almost exclusively to moderators. Unbeknownst to most of our regular users, this forum receives a surprising amount of spam. There is a captcha, an e-mail verification step, and each new user requires human action to get the status of User (by default users are Moderated, which means only moderators and administrators can see the user's messages). Even with these steps we can see as many as 5-10 spam posts from new accounts in a single day, although it typically averages about 1-2 a day. As you can imagine, cleaning this up by hand is pretty tedious.

So what have we done to prevent spam? A couple of things:

First, some low-hanging fruit: making sure that the RSS feed and the threads.json list don't contain spam posts. Serving them might have fooled spam bots into believing that their topics actually received traffic. In fact, many spam threads had a view count somewhere in the double digits, possibly from RSS feed readers auto-loading them. The forum previously relied on client-side filtering to remove spam topics, which meant that the spam data was still sent to the client, and this unfortunately applied to the RSS feed as well. The current forum does this filtering on the back-end, meaning spam topics will no longer get any traffic at all and won't be visible to anyone other than the spammers themselves.
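The difference between client-side and back-end filtering can be sketched roughly as follows. This is a minimal illustration, not the forum's actual code: the `ThreadSummary` type, its fields, and `visibleThreads` are hypothetical names made up for the example.

```nim
import std/[json, sequtils]

type
  ThreadSummary = object      # hypothetical shape, not the forum's real type
    id: int
    topic: string
    author: string
    authorStatus: string      # e.g. "User", "Moderated", "Spammer"

proc visibleThreads(threads: seq[ThreadSummary], viewer: string): seq[ThreadSummary] =
  ## Drop spam threads before anything is serialized, so the data never
  ## leaves the server. A spammer still sees their own threads.
  threads.filterIt(it.authorStatus != "Spammer" or it.author == viewer)

let all = @[
  ThreadSummary(id: 1, topic: "Macro question", author: "alice", authorStatus: "User"),
  ThreadSummary(id: 2, topic: "CHEAP LOANS", author: "spambot", authorStatus: "Spammer"),
]

# Only the filtered list is turned into JSON (the same idea applies to RSS).
let payload = %(visibleThreads(all, viewer = ""))
echo payload
```

Because the filtering happens before serialization, neither threads.json consumers nor feed readers ever receive the spam entries.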

Then the more active bits. I've integrated StopForumSpam to check the e-mail address of everyone trying to sign up. A lot of the spam accounts already in our database failed this check, but since it went live it seems that accounts are created, post spam, and get abandoned too quickly for them to be added to the StopForumSpam database.
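A sign-up check of this kind could look something like the sketch below. It is not the code running on the forum. It assumes StopForumSpam's public JSON endpoint and a response containing `success` and `email.appears` fields; the proc names `sfsUrl` and `isListed` are made up for the example, and the response shape should be verified against the service's documentation before relying on it.

```nim
import std/[json, uri]

proc sfsUrl(email: string): string =
  ## Build the lookup URL (assumed endpoint of the public StopForumSpam API).
  "https://api.stopforumspam.org/api?json&email=" & encodeUrl(email)

proc isListed(responseBody: string): bool =
  ## True when the (assumed) response reports the e-mail as known spam.
  let node = parseJson(responseBody)
  if node{"success"}.getInt(0) != 1:
    return false              # lookup failed: fail open, don't block the sign-up
  node{"email", "appears"}.getInt(0) == 1

# Sample bodies written by hand to match the assumed response shape.
let listed = """{"success":1,"email":{"value":"x@example.com","appears":1,"frequency":3}}"""
let clean = """{"success":1,"email":{"value":"y@example.com","appears":0,"frequency":0}}"""
echo sfsUrl("x@example.com")
echo isListed(listed), " ", isListed(clean)
```

In a real handler the body would come from an HTTP request, for example with `newHttpClient().getContent(sfsUrl(email))` from `std/httpclient` (compiled with `-d:ssl`). Failing open on lookup errors is a design choice here so that an outage of the service doesn't block legitimate sign-ups.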

I also tried a simple heuristic check, as a lot of our spam is very predictable. Topics in all caps or containing just a URL, mentions of dollar amounts, and mentions of certain keywords were all pretty common, and together they formed a score. If the score was high enough, the account got marked as a spammer.
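To make the scoring idea concrete, here is a minimal sketch of such a heuristic. The point values, the keyword list, and the threshold are placeholders invented for illustration; the post does not state the real ones.

```nim
import std/strutils

const
  spamKeywords = ["loan", "casino", "crypto"]   # placeholder keywords
  spamThreshold = 3                             # placeholder cut-off

proc isAllCaps(s: string): bool =
  ## True if the string has at least one letter and no lowercase letters.
  var letters = 0
  for c in s:
    if c.isLowerAscii: return false
    if c.isUpperAscii: inc letters
  letters > 0

proc heuristicScore(title, body: string): int =
  ## Add up points for each spam signal found in a new topic.
  let text = body.strip()
  if isAllCaps(title): result += 2
  # Body consisting of a single token that starts with a URL scheme.
  if text.splitWhitespace().len == 1 and
     (text.startsWith("http://") or text.startsWith("https://")):
    result += 2
  # A dollar sign directly followed by a digit, e.g. "$500".
  for i in 0 ..< text.len - 1:
    if text[i] == '$' and text[i + 1].isDigit:
      result += 1
      break
  let lowered = text.toLowerAscii
  for kw in spamKeywords:
    if kw in lowered: result += 1

proc looksLikeSpam(title, body: string): bool =
  heuristicScore(title, body) >= spamThreshold

echo looksLikeSpam("MAKE MONEY FAST", "https://example.com/x")
echo looksLikeSpam("Question about macros", "How do templates differ from macros?")
```

Combining several weak signals into one score means no single signal (such as an enthusiastic all-caps title) is enough on its own to flag an account.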

As I was adding the word-list part of the heuristic check, I ran a full check on the entire forum database. We have nearly 80k posts on the forum from 8k users (about half of which are User accounts and about 2k of which are spam accounts), so there is a pretty good amount of data to draw from. By weighting every word by how often it appeared in spam posts vs. normal posts, I generated a list and selected a threshold that, when run over historical data, would catch almost 90% of spam with a 0.2% false positive rate. Spammers typically only have one post, so I believe the 90% figure is pretty accurate. The false positive rate, however, was measured on all posts and not just first posts, so in practice it is probably even lower than 0.2%.
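The post does not say how the per-word weights are combined into a post score, so the following is only one plausible sketch: it assumes a post is scored by the mean weight of its known words, and then measures the catch rate and false positive rate of a threshold over labeled posts. The names `postScore`, `rates`, and `LabeledPost`, as well as the weights and sample posts, are made up for illustration.

```nim
import std/[tables, strutils]

type LabeledPost = tuple[content: string, isSpam: bool]

proc postScore(weights: Table[string, float], content: string): float =
  ## Mean spam weight of the words we have statistics for (assumed scoring rule).
  var total = 0.0
  var known = 0
  for word in content.toLowerAscii.splitWhitespace():
    if word in weights:
      total += weights[word]
      inc known
  if known > 0: total / known.float else: 0.0

proc rates(weights: Table[string, float], posts: seq[LabeledPost],
           threshold: float): tuple[caught, falsePositive: float] =
  ## Fraction of spam flagged, and fraction of normal posts wrongly flagged.
  var spamTotal, spamHit, normalTotal, normalHit: int
  for post in posts:
    let flagged = postScore(weights, post.content) >= threshold
    if post.isSpam:
      inc spamTotal
      if flagged: inc spamHit
    else:
      inc normalTotal
      if flagged: inc normalHit
  (caught: spamHit / spamTotal, falsePositive: normalHit / normalTotal)

# Toy data: weights are the share of a word's uses that occurred in spam.
let weights = {"loan": 0.9, "cheap": 0.8, "nim": 0.05, "macro": 0.02}.toTable
let posts: seq[LabeledPost] = @[
  (content: "cheap loan", isSpam: true),
  (content: "nim macro question", isSpam: false),
  (content: "cheap nim hosting", isSpam: false),
]
echo rates(weights, posts, threshold = 0.5)
```

Sweeping `threshold` over a range of values and reading off both rates is one way to pick a cut-off that trades catch rate against false positives.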

These checks together have almost completely fixed the spam issue, and at the very least made it even less visible to our users thanks to the RSS fixes. All in all, the experience for users and moderators alike should be improved. Currently the fixes live in a separate branch, but they will be merged into the master branch of the forum once they've had a bit more time in testing (so far they've been running for about a week). The spam word-list will, however, not be shared, as spammers could use it to manipulate their messages to pass the filter. If you are running a forum based on Nimforum (please say hi in the comments!) you can use this script to generate a wordlist compatible with the feature currently residing on the autospam branch:


import tables, strutils
import db_connector/db_sqlite

# Open the forum's SQLite database (only the connection parameter is used by SQLite).
var db = open(connection="./nimforum.db", user="", password="", database="nimforum")

# Per-word counts of appearances in normal posts and in spam posts.
var wordUsage: Table[string, tuple[normal, spam: int]]
for row in db.fastRows(sql"SELECT person.status, post.content FROM person, post WHERE post.author == person.id"):
  # Only count posts from accounts whose status has been settled.
  if row[0] notin ["Spammer", "User", "Moderator", "Admin"]: continue
  let
    words = row[1].toLowerAscii.splitWhitespace()
    spammer = row[0] == "Spammer"
  for word in words:
    if spammer:
      wordUsage.mgetOrPut(word, (0,0)).spam += 1
    else:
      wordUsage.mgetOrPut(word, (0,0)).normal += 1
db.close()

# Write one line per word: the fraction of its uses that were in spam, then the quoted word.
var file = open("wordlist.csv", fmWrite)
for word, usage in wordUsage:
  file.writeLine usage.spam / (usage.normal + usage.spam), ",\"", word, "\""
file.close()

Mirror of forum.nim-lang.org
