
RE: TRIEs in the core (was: Re: Module for simple processing of " log" file: msg#00052

Subject: RE: TRIEs in the core (was: Re: Module for simple processing of " log" files)

> Orton, Yves wrote:
> [...]
> > <shameless plug>
> > But David and the other Regexp authors need to update their code to take
> > advantage of 5.9.2 and later innate TRIE optimisation. They still have
> > room for optimising the patterns that they build but they will need to
> > build fairly different looking patterns to really harness the TRIE regop.
> >
> > </shameless plug>
>
> No, I've been following the threads on p5p. I've been looking hard at
> the stuff I do, and the patterns I generate come from little patterns
> that all tend to feature lots of metacharacters (otherwise I'd be doing
> hash lookups or index()), correct me if I'm wrong, such patterns don't
> benefit from your trie optimisations. E.g., what happens with
>
> FROM MRS\. [A-Z]+ [A-Z]+
> FROM MRS [A-Z]+ [A-Z]+
> FROM MR [A-Z]+ [A-Z]+
> FROM MR\. [A-Z]+ [A-Z]+
> FROM: MRS\. [A-Z]+ [A-Z]+
> FROM: MRS [A-Z]+ [A-Z]+
> FROM: MR [A-Z]+ [A-Z]+
> FROM: MR\. [A-Z]+ [A-Z]+
>
> (actual patterns lifted from Nigerian spam). R::A produces
>
> FROM:? MRS?\.? [A-Z]+ [A-Z]+
>
> Instead of the whole mess or'ed together. I'm seriously lacking time to
> benchmark the differences.

I'll see what I can do.

Also I think this is a perfectly reasonable output. But what about when you add TO: variants to the list? Or a different header field? You would then want to end up with

 /(FROM|TO):? MRS?\.? [A-Z]+ [A-Z]+/

That would allow the TRIE optimisation, although it's likely that the full expansion of the first part would be faster, as it would require fewer regops to be executed, which in itself speeds things up.
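
To make the two forms concrete, here is a rough sketch that builds both the factored pattern and a full expansion of its literal prefix, then checks that they accept the same strings. It is written in Python's `re` only because the patterns themselves carry over unchanged; it says nothing about how Perl's TRIE regop would perform on either form. The names `factored`, `expanded`, `prefixes` and `samples`, and the sample strings, are made up for illustration.

```python
import itertools
import re

# Factored form, as written in the message above.
factored = re.compile(r"(FROM|TO):? MRS?\.? [A-Z]+ [A-Z]+")

# Full expansion of the first part: every header / colon / title / dot
# combination spelled out as a literal alternative (2 * 2 * 2 * 2 = 16).
prefixes = [
    f"{header}{colon} {title}{dot}"
    for header, colon, title, dot in itertools.product(
        ("FROM", "TO"), ("", ":"), ("MR", "MRS"), ("", r"\.")
    )
]
expanded = re.compile("(?:" + "|".join(prefixes) + ") [A-Z]+ [A-Z]+")

# Hypothetical sample lines, not taken from real mail.
samples = [
    "FROM: MRS. JANE DOE",
    "FROM MR. JOHN SMITH",
    "TO MR JOHN SMITH",
    "TO: MRS JANE DOE",
    "CC: MR JOHN SMITH",   # header not in the list
    "FROM MISS JANE DOE",  # title not in the list
    "FROM: MR JOHN",       # only one name
]

# For each sample, record (factored matches?, expanded matches?).
results = {
    s: (bool(factored.fullmatch(s)), bool(expanded.fullmatch(s)))
    for s in samples
}

for s, (f, e) in results.items():
    print(f"{s!r}: factored={f} expanded={e}")
```

Both forms should agree on every line; which one runs faster in Perl is exactly the question that still needs benchmarking.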

Which is what I was trying to get at (although I expressed myself poorly). There is still room for Perl-side regex optimisation; it just needs to be made aware of the TRIE support now built into Perl, and possibly the Aho-Corasick (A-C) support if it gets applied.

Yves



