Any Neural Net code in Python? I want to filter out spam email

Alex Martelli aleaxit at yahoo.com
Thu Apr 19 03:23:44 EDT 2001


"Ken Seehof" <kens at sightreader.com> wrote in message
news:mailman.987656068.4191.python-list at python.org...
    [snip -- some quoting-level problems -- Ken's quoting Dan]
"""
> How about this - apply a whole set of tests to the message. Each test
> gives a "spammness" score - e.g. 10 points for being all caps, 50 points
> for having the word 'viagara', 100 points for having a suspicious From:
> address like *@yahoo.com. Add the scores from the different tests, and
> if the sum exceeds, say, 200 points, then call it "spam."
>
> So, how do you figure out a good value for each test score? This is where
> you could use a neural network or genetic algorithm. Pick a set of
> scores, feed the program lots of messages (both spam and non-spam), and
> see how accurate it is. Iterate until it rejects every spam email and
> accepts every non-spam...
"""

There may not exist a vector of feature weights that performs
perfectly, of course.  What one generally wants is a vector of
feature weights that _optimizes_ some performance score.

"""
Excellent idea, Dan.  That conveniently sidesteps the most difficult
issue: getting the neural network to actually come up with linguistic
rules.  Once an intelligent human specifies the set of rules, the neural
"""

Right.  Extracting the features for classification is an order
of magnitude harder than weighting them optimally.

My old-fashioned approach to such feature-weighting problems is
to apply a general-purpose optimization algorithm (simulated
annealing, for choice).
That's easy to code/test/tune and lets
me experiment with all sorts of "weird" nonlinearities in the
classification engine, as long as I can get a classifier that
takes a vector of N real parameters and can be run on the training
set to produce a classification whose 'cost' is then measurable.
False-positives and false-negatives can of course easily be
given different costs in this approach, and in some cases being
able to get a three-way classifier (yes/no/dunno, with some
cost for each dunno answer of course) can be important.

A faithful Python transcription of Goffe's Fortran tutorial
program for simulated annealing (the Fortran original is at
http://emlab.berkeley.edu/Software/abstracts/goffe895.html)
turns out to be fewer than 600 lines, over half of which are
docstrings, comments and printing-functions that only exist
to help gain understanding about the algorithm, the function
one is studying, etc.  Unfortunately, I'm not sure I can
redistribute that transcription, given Goffe's copyright --
it IS a derived work of his copyrighted one.  It could of
course be redone in a more Pythonical mold, and made to use some
underlying extension module if available (I am not aware of
other Simulated Annealing implementations in Python, or as
Python extension modules, at this time, although of course
it's likely that many exist -- but I can't find them on the
net!).  I have written Dr Goffe asking for permission, and
I think I can in the meantime email sa.py privately (though
not "publish" it) if requested.


Alex
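
For readers who want something concrete, here is a minimal sketch of the
approach discussed in this message: a classifier that sums weighted test
scores against a threshold, a cost function that charges false positives
and false negatives differently, and a simulated-annealing loop that
searches for a low-cost weight vector.  It is NOT a transcription of
Goffe's program (which uses a more elaborate adaptive step scheme); it is
a plain Metropolis-style loop with geometric cooling.  The three feature
tests, the toy messages, the cost values and every tuning constant are
assumptions made up for illustration.

```python
import math
import random

# Hypothetical feature tests (illustrative only, not from the thread).
def all_caps(msg):
    letters = [c for c in msg if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def mentions_pills(msg):
    return "viagra" in msg.lower()

def shouting_punctuation(msg):
    return "!!!" in msg

TESTS = [all_caps, mentions_pills, shouting_punctuation]

def features(msg):
    """Turn a message into a 0/1 vector, one entry per test."""
    return [1.0 if test(msg) else 0.0 for test in TESTS]

def classify(weights, feats, threshold=200.0):
    """Sum the weighted test scores; call it spam above the threshold."""
    return sum(w * f for w, f in zip(weights, feats)) > threshold

def cost(weights, training, fp_cost=5.0, fn_cost=1.0):
    """Total misclassification cost over (feats, is_spam) pairs.

    A false positive (good mail rejected) is assumed to cost more than
    a false negative (spam let through); both values are arbitrary.
    """
    total = 0.0
    for feats, is_spam in training:
        predicted = classify(weights, feats)
        if predicted and not is_spam:
            total += fp_cost
        elif is_spam and not predicted:
            total += fn_cost
    return total

def anneal(training, n_weights, steps=5000, t0=10.0, cooling=0.999, seed=0):
    """Plain simulated annealing over the weight vector."""
    rng = random.Random(seed)
    current = [rng.uniform(0, 100) for _ in range(n_weights)]
    current_cost = cost(current, training)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Perturb one randomly chosen weight.
        candidate = list(current)
        candidate[rng.randrange(n_weights)] += rng.gauss(0, 25)
        cand_cost = cost(candidate, training)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost

# Toy training set (made up); real use needs many labelled messages.
MESSAGES = [
    ("BUY NOW!!! CHEAP VIAGRA", True),
    ("viagra offer inside!!!", True),
    ("LIMITED TIME DEAL", True),
    ("Lunch tomorrow at noon?", False),
    ("Patch attached, please review", False),
    ("Great news!!! The build passed", False),
]
TRAINING = [(features(text), is_spam) for text, is_spam in MESSAGES]

best, best_cost = anneal(TRAINING, len(TESTS))
print("weights:", [round(w, 1) for w in best], "cost:", best_cost)
```

Because the cost is a step function of the weights, large regions of
weight space have identical cost; the loop accepts such zero-change moves
freely, which is what lets it wander across plateaus.  A "dunno" band
around the threshold, with its own cost, could be added to cost() in the
same style.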


More information about the Python-list mailing list
