Re: Include/exclude lists: msg#00286 (nutch-user.lucene.apache.org)
I would suggest that you implement a URL filter plugin that does this, i.e. one that maps hosts to regexp rules.

Paul Tomblin wrote:
> Is there any way other than the config files to specify the url filter
> parameters? I have a few dozen sites to crawl, and for each site I
> want to specify its own includes and excludes. I don't want to have
> to go into the config file and change the
> <property><name>urlfilter.regex.file</name> each time. Can I specify
> that on the command line to bin/nutch generate or something?
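A minimal sketch of the host-to-rules idea, assuming you only need the matching logic. It is not a real Nutch plugin: the class `PerHostUrlFilter` and its `addRule` method are hypothetical names, and an actual plugin would implement Nutch's `org.apache.nutch.net.URLFilter` extension point and load its rules from a file rather than from hard-coded calls. It borrows the conventions of the regex URL filter: each rule is a `+` (include) or `-` (exclude) regex, the first matching rule decides, a URL matching no rule is dropped, and `filter()` returns the URL to keep it or `null` to reject it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: maps each host to its own ordered list of +/- regex rules.
// Not the Nutch plugin API; a real plugin would implement org.apache.nutch.net.URLFilter.
public class PerHostUrlFilter {

    // One include ('+') or exclude ('-') rule, in the style of regex-urlfilter.txt.
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    // Lower-cased host name -> rules, checked in the order they were added.
    private final Map<String, List<Rule>> rulesByHost = new HashMap<>();

    // Adds a rule line such as "+^https?://www\\.example\\.com/docs/" or "-\\.pdf$".
    public void addRule(String host, String line) {
        if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
            throw new IllegalArgumentException("rule must start with '+' or '-': " + line);
        }
        boolean include = line.charAt(0) == '+';
        Pattern pattern = Pattern.compile(line.substring(1));
        rulesByHost
            .computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
            .add(new Rule(include, pattern));
    }

    // Returns the URL if it is accepted, or null if it is rejected.
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URL: reject
        }
        if (host == null) {
            return null; // no host component: reject
        }
        List<Rule> rules = rulesByHost.get(host.toLowerCase(Locale.ROOT));
        if (rules == null) {
            return null; // host has no rules of its own: reject
        }
        for (Rule rule : rules) {
            // The first rule whose regex matches anywhere in the URL decides.
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        return null; // no rule matched: reject
    }

    public static void main(String[] args) {
        PerHostUrlFilter filter = new PerHostUrlFilter();
        filter.addRule("www.example.com", "-\\.pdf$");
        filter.addRule("www.example.com", "+^https?://www\\.example\\.com/docs/");
        filter.addRule("blog.example.org", "+.");

        System.out.println(filter.filter("http://www.example.com/docs/a.html")); // prints the URL
        System.out.println(filter.filter("http://www.example.com/docs/a.pdf"));  // prints null
        System.out.println(filter.filter("http://www.example.com/shop/"));       // prints null
        System.out.println(filter.filter("https://blog.example.org/post/1"));    // prints the URL
        System.out.println(filter.filter("http://other.net/"));                  // prints null
    }
}
```

Keying the rules by host gives each of the sites its own include/exclude list without switching `urlfilter.regex.file` between runs. Rejecting hosts that have no rules is a choice made for this sketch; a real plugin might instead let such URLs through and leave the decision to other filters in the chain.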