Re: Include/exclude lists: msg#00286

nutch-user.lucene.apache.org

Subject: Re: Include/exclude lists

I would suggest implementing a urlfilter plugin that does this, i.e. one that maps each host to its own set of regexp rules.
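
To make the idea concrete, here is a rough sketch of the host-to-rules lookup such a plugin could do. It is plain Java with no Nutch dependency, so the class name HostRegexFilter and the addRule method are made up for illustration and are not part of Nutch. It only mirrors the usual urlfilter convention of returning the URL when it is accepted and null when it is rejected. Rejecting URLs from unlisted hosts and letting the first matching rule win are assumptions of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical standalone class, not a Nutch API: maps hosts to regexp rules.
class HostRegexFilter {

    // One rule: include or exclude URLs that match the pattern.
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, String regex) {
            this.include = include;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Rules are kept per host, in the order they were added.
    private final Map<String, List<Rule>> rulesByHost = new HashMap<>();

    public void addRule(String host, boolean include, String regex) {
        rulesByHost
            .computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
            .add(new Rule(include, regex));
    }

    // Returns the URL if it is accepted, or null if it is rejected.
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URL: reject
        }
        if (host == null) {
            return null;
        }
        List<Rule> rules = rulesByHost.get(host.toLowerCase(Locale.ROOT));
        if (rules == null) {
            return null; // assumption: hosts without rules are rejected
        }
        for (Rule rule : rules) {
            // The first rule whose pattern is found in the URL decides.
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        return null; // assumption: no matching rule means reject
    }

    public static void main(String[] args) {
        HostRegexFilter f = new HostRegexFilter();
        f.addRule("example.com", false, "\\.(gif|jpg|png)$");
        f.addRule("example.com", true, "^http://example\\.com/docs/");

        System.out.println(f.filter("http://example.com/docs/index.html"));
        System.out.println(f.filter("http://example.com/docs/logo.png"));
        System.out.println(f.filter("http://other.org/"));
    }
}
```

In a real plugin the same lookup would sit behind Nutch's URLFilter extension point, and the per-host rules would be loaded from a file of your own rather than hard-coded, so adding a site would not mean touching urlfilter.regex.file.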

Paul Tomblin wrote:
> Is there any way other than the config files to specify the url filter
> parameters? I have a few dozen sites to crawl, and for each site I
> want to specify its own includes and excludes. I don't want to have
> to go into the config file and change the
> <property><name>urlfilter.regex.file</name> each time. Can I specify
> that on the command line to bin/nutch generate or something?
