Spam Indexer

Subversion Code

spam_mgr.py

See spam_www for info on the PHP web interface to the processing this script does.

Requirements

  • SQLAlchemy
  • Python MySQL bindings

Setup

Database

Create a database containing a table with the following schema:

CREATE TABLE `spam` (
  `filename` varchar(255) NOT NULL,
  `rec_date` datetime DEFAULT NULL,
  `sent_to` varchar(255) NOT NULL,
  `sender` varchar(255) NOT NULL,
  `subject` varchar(255) NOT NULL,
  `spam_score` text NOT NULL,
  `message` text NOT NULL,
  PRIMARY KEY  (`filename`),
  KEY `rec_date` (`rec_date`),
  FULLTEXT KEY `message` (`message`),
  FULLTEXT KEY `spam_score` (`spam_score`)
) ENGINE=MyISAM

Config File

  1. cp config.ini.sample config.ini
  2. Edit the SPAM section of config.ini with your database settings and paths
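As a rough illustration of what reading the SPAM section can look like, here is a minimal sketch using Python's standard configparser module. The key names `db_url` and `spam_dir`, and the helper `load_spam_settings`, are invented for this example; check config.ini.sample for the names the script actually expects.

```python
import configparser

# Hypothetical contents of config.ini. The key names are assumptions made
# for illustration and are not taken from config.ini.sample.
SAMPLE_CONFIG = """
[SPAM]
db_url = mysql://user:password@localhost/spam
spam_dir = /path/to/quarantine
"""


def load_spam_settings(text):
    """Return the options of the [SPAM] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["SPAM"])


settings = load_spam_settings(SAMPLE_CONFIG)
```

To read a file on disk instead of a string, `parser.read("config.ini")` can replace the `read_string` call.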

Adding to Cron

Run the script at the top of every hour. As a rough, anecdotal benchmark, my server processed 6000 messages in 4 minutes (about 1,500 messages per minute).

0 * * * * cd /path/to/server_manager && python spam_mgr.py

FAQ

Cannot find the config.ini

The script looks for config.ini in the current working directory, so it must be run from the directory it resides in. If you have to call the script from another directory, hard-code the path to config.ini in the script.
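An alternative to hard-coding an absolute path is to build the path to config.ini from the script's own location, so the working directory no longer matters. The sketch below is one way to do that; `config_path` is a hypothetical helper and not part of spam_mgr.py.

```python
import os


def config_path(script_file, name="config.ini"):
    """Return the path of `name` inside the directory that holds script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, name)


# Inside the script itself this would typically be called as:
#     path = config_path(__file__)
```

The returned path can then be passed to whatever call currently opens config.ini.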

Remove old spam

DELETE FROM `spam` WHERE rec_date < '2007-01-01'
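If the cleanup should run on a schedule rather than with a fixed date, the cutoff can be computed at run time. This is a minimal sketch, assuming a DB-API driver that uses `%s` placeholders (as MySQLdb does); `build_purge_statement` and the retention period are hypothetical and not part of spam_mgr.py.

```python
from datetime import datetime, timedelta


def build_purge_statement(keep_days, now=None):
    """Build a parameterized DELETE for rows older than keep_days days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    sql = "DELETE FROM `spam` WHERE rec_date < %s"
    # The cutoff is passed as a parameter instead of being pasted into the SQL.
    return sql, (cutoff.strftime("%Y-%m-%d %H:%M:%S"),)


# With an open DB-API cursor this would be used roughly as:
#     cursor.execute(*build_purge_statement(30))
```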

Bugs/ToDo

  • Add the total number of spam messages on the server to index.php
  • Look at limiting the number of results to keep memory usage down (pagination?)
  • Look at only grabbing required fields on the search page to limit memory usage
  • Parse the message date and store that in the db so that it can be sorted by the front end
  • Parse the X-Spam-Status field and store the spam tests that scored. Perhaps build a front end for viewing how often a given test hits?
    • Add text to the front page to basically say “Parsed XXX messages hitting XXX spam rules”
  • Look at implementing the db backend in SQLite with full text indexing of the messages
    • Test the difference in query speed versus MySQL
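The two parsing items above (message date and X-Spam-Status) could be approached along these lines. This is only a sketch, assuming the usual SpamAssassin header layout (`score=` or the older `hits=`, followed by `tests=` with a comma-separated list); `parse_spam_headers` is a hypothetical helper, not existing code in spam_mgr.py.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime


def parse_spam_headers(raw_message):
    """Extract the sent date, spam score and matched test names from a message."""
    msg = message_from_string(raw_message)

    # Long headers are folded across lines: collapse the whitespace, then
    # re-join test names that were split after a comma.
    status = re.sub(r"\s+", " ", msg.get("X-Spam-Status", ""))
    status = re.sub(r",\s+", ",", status)

    score_match = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", status)
    tests_match = re.search(r"tests=(\S*)", status)

    tests = []
    if tests_match and tests_match.group(1).lower() != "none":
        tests = [name for name in tests_match.group(1).split(",") if name]

    try:
        date = parsedate_to_datetime(msg.get("Date"))
    except (TypeError, ValueError):
        date = None  # missing or unparseable Date header

    return {
        "date": date,
        "score": float(score_match.group(1)) if score_match else None,
        "tests": tests,
    }
```

Storing `tests` one row per test name in a separate table would make the "how often does a test hit" question a simple GROUP BY, though that schema choice is outside what this page describes.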

Screenshots

Default Index

Search for rharding results

Search results ordered by date

Viewing a blocked message

"Released" spam message

 
software/spam_indexer/home.txt · Last modified: 14:41 14/07/2007 (external edit)