Re: Too many false positives...goodbye, razor: msg#00021

mail.spam.razor.user

Nice, those numbers are quite informative, Theo. They make me think I was right to set my min_cf to 11, which effectively eliminates the noise in range_01_10. That range seems to have a pretty bad S/O.

It also makes me think that upping my min_cf past 11 would be well worth it.

Thanks for posting the data Theo.



At 06:38 PM 2/11/2003 -0500, Theo Van Dinter wrote:
On Tue, Feb 11, 2003 at 06:30:46PM -0500, Matt Kettler wrote:
> In general it seems to be rare that it FPs when the cf is very high, but
> there's a LOT of flaky matches in the lower end.

Well, based on ~90k messages (wow, that's not many...) that are from a
recent spamassassin mass-check run:

OVERALL%    SPAM%    HAM%    S/O   RANK  SCORE  NAME
  16.286  42.8416  0.1639  0.996   0.98   0.01  RAZOR2_CF_RANGE_91_100
   1.095   2.7929  0.0641  0.978   0.87   0.01  RAZOR2_CF_RANGE_21_30
   0.205   0.5222  0.0125  0.977   0.87   0.01  RAZOR2_CF_RANGE_81_90
   0.268   0.6777  0.0196  0.972   0.86   0.01  RAZOR2_CF_RANGE_41_50
   0.145   0.3667  0.0107  0.972   0.85   0.01  RAZOR2_CF_RANGE_51_60
   0.806   1.9686  0.0997  0.952   0.81   0.01  RAZOR2_CF_RANGE_11_20
   0.165   0.3990  0.0232  0.945   0.79   0.01  RAZOR2_CF_RANGE_61_70
   0.173   0.4137  0.0267  0.939   0.77   0.01  RAZOR2_CF_RANGE_31_40
   0.553   1.2175  0.1496  0.891   0.66   0.01  RAZOR2_CF_RANGE_01_10
   0.147   0.3227  0.0410  0.887   0.65   0.01  RAZOR2_CF_RANGE_71_80

So basically what this means is that all CF values have false positives.
91-100 is the best, matching the most messages while not FPing too much.
Everything else seems to be more questionable due to low hit rates.
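
For anyone who wants to play with these numbers, here is a rough Python sketch. It rests on two assumptions that are not stated in the thread: that S/O is computed as SPAM% / (SPAM% + HAM%) (which happens to reproduce the S/O column above to three decimals), and that each message falls into at most one CF range, so the per-range percentages can simply be added up. The function names are made up for illustration; this is not part of razor or spamassassin.

```python
# Rows copied from the mass-check table above:
# (range_low, range_high, spam_pct, ham_pct, reported_s_o)
ROWS = [
    (91, 100, 42.8416, 0.1639, 0.996),
    (21, 30, 2.7929, 0.0641, 0.978),
    (81, 90, 0.5222, 0.0125, 0.977),
    (41, 50, 0.6777, 0.0196, 0.972),
    (51, 60, 0.3667, 0.0107, 0.972),
    (11, 20, 1.9686, 0.0997, 0.952),
    (61, 70, 0.3990, 0.0232, 0.945),
    (31, 40, 0.4137, 0.0267, 0.939),
    (1, 10, 1.2175, 0.1496, 0.891),
    (71, 80, 0.3227, 0.0410, 0.887),
]


def s_o(spam_pct, ham_pct):
    """Spam/overall ratio, ASSUMING S/O = SPAM% / (SPAM% + HAM%)."""
    return spam_pct / (spam_pct + ham_pct)


def hits_at_or_above(min_cf):
    """Sum SPAM% and HAM% over the ranges whose lower bound is >= min_cf.

    ASSUMPTION: a message lands in at most one range, so the
    percentages are additive. Returns (spam_pct, ham_pct, s_o).
    """
    spam = sum(row[2] for row in ROWS if row[0] >= min_cf)
    ham = sum(row[3] for row in ROWS if row[0] >= min_cf)
    return spam, ham, s_o(spam, ham)


if __name__ == "__main__":
    # Compare a few hypothetical min_cf cut-offs.
    for min_cf in (1, 11, 21, 91):
        spam, ham, ratio = hits_at_or_above(min_cf)
        print(f"min_cf={min_cf:>3}  SPAM%={spam:8.4f}  HAM%={ham:.4f}  S/O={ratio:.3f}")
```

Under those assumptions, moving the cut-off from 1 to 11 drops the 01_10 row, which removes 0.1496 points of ham hits for 1.2175 points of spam hits. The result is only as good as the additivity assumption and the ~90k message corpus.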


