For people keeping an eye on the code running this forum (available at GitHub): over the past month or so I've tried to merge in some open PRs, fix some common annoyances, and generally improve the forum slightly.
One of the issues I've tried to solve is something that almost exclusively affects moderators. Unbeknownst to most of our regular users, this forum receives a surprising amount of spam. There is a captcha, an e-mail verification, and each new user requires human action to get the status of User (by default users are Moderated, which means only moderators and administrators can see the user's messages). Even with these steps we can see as many as 5-10 spam posts from new accounts in a single day, although it typically averages out to about 1-2 a day. As you can imagine, this is pretty tedious.
So what have we done to prevent spam? A couple of things:
Together these checks have almost completely fixed the spam issue! And at the very least they have made it even less visible to our users thanks to the RSS fixes. All in all the experience for users and moderators alike should be improved. Currently the fixes live in a separate branch, but they will be merged into the master branch of the forum once they've had a bit more time in testing (so far they've been running for about a week).
The spam word-list will however not be shared, as spammers could use it to manipulate their messages to pass the filter. If you are running a forum based on Nimforum (please say hi in the comments!) you can use this script to generate a wordlist compatible with the feature currently residing on the autospam branch:
import tables, strutils
import db_connector/db_sqlite

var db = open(connection="./nimforum.db", user="", password="", database="nimforum")
# Per-word counts: occurrences in normal posts and in posts by spammers.
var wordUsage: Table[string, tuple[normal, spam: int]]
for row in db.fastRows(sql"SELECT person.status, post.content FROM person, post WHERE post.author == person.id"):
  # Only count posts from users whose status has been decided by a human.
  if row[0] notin ["Spammer", "User", "Moderator", "Admin"]: continue
  let
    words = row[1].toLowerAscii.splitWhitespace()
    spammer = row[0] == "Spammer"
  for word in words:
    if spammer:
      wordUsage.mgetOrPut(word, (0,0)).spam += 1
    else:
      wordUsage.mgetOrPut(word, (0,0)).normal += 1
db.close()

# Write one line per word: the fraction of its uses that were spam, then the quoted word.
var file = open("wordlist.csv", fmWrite)
for word, usage in wordUsage:
  file.writeLine usage.spam / (usage.normal + usage.spam), ",\"", word, "\""
file.close()
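The post doesn't show how the autospam branch actually consumes this file, so the following is only a rough sketch of one way a wordlist in this format could be used to score a new message. The names parseWordlist and spamScore, and the rule of averaging the ratios of known words, are assumptions made for illustration, not Nimforum's real implementation.

```nim
import std/[strutils, tables]

# Hypothetical helper: parse lines of the form `0.75,"word"`, as written by
# the generator script, into a table mapping word -> spam ratio.
proc parseWordlist(lines: openArray[string]): Table[string, float] =
  result = initTable[string, float]()
  for line in lines:
    # The ratio never contains a comma, so the first comma is the separator.
    let comma = line.find(',')
    if comma < 0 or line.len < comma + 3: continue  # skip malformed lines
    # Drop exactly one quote on each side of the word.
    let word = line[comma + 2 .. ^2]
    result[word] = parseFloat(line[0 ..< comma])

# Hypothetical scoring rule (an assumption): the mean spam ratio of the words
# found in the wordlist. Unknown words are ignored, and a message with no
# known words scores 0.0.
proc spamScore(content: string, ratios: Table[string, float]): float =
  var total = 0.0
  var known = 0
  for word in content.toLowerAscii.splitWhitespace():
    if word in ratios:
      total += ratios[word]
      inc known
  if known > 0: total / float(known) else: 0.0

when isMainModule:
  # Made-up sample entries; a real run would read wordlist.csv instead.
  let ratios = parseWordlist(["1.0,\"casino\"", "0.0,\"nim\"", "0.5,\"free\""])
  echo spamScore("Free casino bonus", ratios)  # 0.75
  echo spamScore("Nim is free", ratios)        # 0.25
```

A forum could then flag a post for moderator attention when its score exceeds some threshold. Where to put that threshold is a tuning decision this sketch deliberately leaves open.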
making sure that the RSS feed and the threads.json list don't contain spam posts.
Ahhhh so, so much nicer in my RSS client! Thanks so much for this.
@SpotlightKid, quite possible. I just wanted to get something working quickly so I didn't actually look into any kind of research in the area.
@forkbomb, no problem! The fact that the RSS feed was full of spam never sat quite right with me.