Spam Indexer
Subversion Code
spam_mgr.py
See spam_www for information on the PHP web interface to the data this script processes.
Requirements
- SQLAlchemy
- Python MySQL bindings
Setup
Database
Create a database containing a table with the following schema:
CREATE TABLE `spam` (
  `filename` varchar(255) NOT NULL,
  `rec_date` datetime DEFAULT NULL,
  `sent_to` varchar(255) NOT NULL,
  `sender` varchar(255) NOT NULL,
  `subject` varchar(255) NOT NULL,
  `spam_score` text NOT NULL,
  `message` text NOT NULL,
  PRIMARY KEY (`filename`),
  KEY `rec_date` (`rec_date`),
  FULLTEXT KEY `message` (`message`),
  FULLTEXT KEY `spam_score` (`spam_score`)
) ENGINE=MyISAM;
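To illustrate how a message could map onto these columns, here is a minimal Python 3 sketch. It is not the code from spam_mgr.py. The function name message_to_row is hypothetical, and two mappings are assumptions: rec_date is taken from the Date header, and spam_score holds the raw X-Spam-Status header. The actual script may fill these columns differently.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def message_to_row(filename, raw_message):
    """Map one raw message onto the columns of the `spam` table.

    Hypothetical helper: the header-to-column mapping is an assumption,
    not taken from spam_mgr.py.
    """
    msg = message_from_string(raw_message)
    try:
        # Assumption: rec_date comes from the Date header.
        rec_date = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        # Missing or unparsable date; the column allows NULL.
        rec_date = None
    return {
        "filename": filename,
        "rec_date": rec_date,
        # varchar(255) columns: truncate so the insert cannot overflow.
        "sent_to": (msg["To"] or "")[:255],
        "sender": (msg["From"] or "")[:255],
        "subject": (msg["Subject"] or "")[:255],
        # Assumption: spam_score stores the raw X-Spam-Status header.
        "spam_score": msg["X-Spam-Status"] or "",
        "message": raw_message,
    }
```

The returned dict can be passed as the parameters of an INSERT statement through whichever database layer is in use.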
Config File
- cp config.ini.sample config.ini
- Edit the SPAM section of the config.ini with your specific database settings and paths
Adding to Cron
Run the script at the top of every hour. As a rough, anecdotal guide to throughput, my server processed 6000 messages in 4 minutes.
0 * * * * cd /path/to/server_manager && python spam_mgr.py
FAQ
Cannot find the config.ini
The script must be run from the directory it resides in. If you have to call it from another directory, you can hard-code the path to config.ini in the script.
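As an alternative to hard-coding the path, the script could look for config.ini next to its own file, so that the working directory no longer matters. This is a minimal Python 3 sketch under that assumption. The helper name load_config is hypothetical and does not come from spam_mgr.py.

```python
import configparser
import os


def load_config(script_file, name="config.ini"):
    """Read `name` from the directory that contains `script_file`."""
    base_dir = os.path.dirname(os.path.abspath(script_file))
    path = os.path.join(base_dir, name)
    parser = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError("Cannot find " + path)
    return parser
```

Inside the script this would be called as load_config(__file__), after which the SPAM section can be read as usual.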
Remove old spam
DELETE FROM `spam` WHERE rec_date < '2007-01-01'
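The same cleanup can be scripted with a rolling cutoff instead of a fixed date. The sketch below is an illustration rather than part of spam_mgr.py. The function name purge_older_than is hypothetical, and conn is assumed to be any DB-API connection. The default placeholder %s matches the MySQL bindings, while sqlite3 would need ?.

```python
from datetime import datetime, timedelta


def purge_older_than(conn, days, now=None, placeholder="%s"):
    """Delete rows whose rec_date is more than `days` days before `now`.

    Returns the number of rows the driver reports as deleted.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.cursor()
    # Only the driver-specific placeholder is concatenated into the SQL;
    # the cutoff value itself is passed as a bound parameter.
    cur.execute("DELETE FROM spam WHERE rec_date < " + placeholder, (cutoff,))
    conn.commit()
    return cur.rowcount
```

For example, purge_older_than(conn, 180) would drop everything older than roughly six months.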
Bugs/ToDo
- Add a total of the number of spam messages in the server to the index.php
- Look at limiting the number of results to keep memory usage down (pagination?)
- Look at only grabbing required fields on the search page to limit memory usage
- Parse the message date and store that in the db so that it can be sorted by the front end
- Parse the X-SPAM-Status field and store the spam tests scored. Perhaps build a front end around viewing how often a certain test hits or something?
- Add text to the front page to basically say “Parsed XXX messages hitting XXX spam rules”
- Look at implementing the db backend in sqlite with full text indexing of the messages
- Test the difference in query speed versus MySQL
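As a starting point for the X-SPAM-Status item above, the header could be parsed along the following lines. This is a minimal sketch that assumes the usual SpamAssassin layout ("Yes, score=7.5 required=5.0 tests=A,B autolearn=no"), with hits= accepted in place of score= for older versions. The function name parse_spam_status is hypothetical, and other mail filters may format the header differently.

```python
import re


def parse_spam_status(value):
    """Extract verdict, score and test names from an X-Spam-Status value."""
    # Unfold the header: collapse line breaks and tabs into single spaces.
    flat = " ".join(value.split())
    score = re.search(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)", flat)
    # The test list runs until the next "key=" field or the end of the header.
    tests = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", flat)
    names = []
    if tests:
        names = [
            name
            for name in re.split(r"[,\s]+", tests.group(1))
            if name and name != "none"
        ]
    return {
        "is_spam": flat.lower().startswith("yes"),
        "score": float(score.group(1)) if score else None,
        "tests": names,
    }
```

Feeding the "tests" lists of all messages into a collections.Counter would give the per-test hit counts that the front end idea calls for.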
Screenshots
software/spam_indexer/home.txt · Last modified: 14:41 14/07/2007 (external edit)








