crawl-tool.xml: msg#00257 nutch-user.lucene.apache.org
I tried Susam Pal's recrawl script (http://wiki.apache.org/nutch/Crawl) and wondered why URL filtering no longer worked.

The explanation is that only Crawl.java adds crawl-tool.xml to the NutchConfiguration:

    Configuration conf = NutchConfiguration.create();
    conf.addResource("crawl-tool.xml");

Fetcher.java and all the other tools that filter outlinks do not add this resource, so they never see the settings in crawl-tool.xml. I found this really confusing, and it took me some time to figure out.

Regards,
Reinhard
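
The effect can be illustrated with a minimal sketch that uses plain java.util.Properties as a stand-in for the Nutch configuration classes, so it runs without Nutch on the classpath. It rests on assumptions that should be checked against the actual installation: the property name urlfilter.regex.file and the file names regex-urlfilter.txt and crawl-urlfilter.txt are modelled on a stock setup. The class and method names are hypothetical and are not part of Nutch.

```java
import java.util.Properties;

public class LayeredConfigDemo {

    // Stand-in for NutchConfiguration.create(): default settings only.
    // Key and value are assumptions modelled on a stock Nutch setup.
    static Properties createBase() {
        Properties base = new Properties();
        base.setProperty("urlfilter.regex.file", "regex-urlfilter.txt");
        return base;
    }

    // Stand-in for conf.addResource("crawl-tool.xml"):
    // a resource added later overrides values loaded earlier.
    static Properties addCrawlToolResource(Properties conf) {
        Properties layered = new Properties();
        layered.putAll(conf);
        layered.setProperty("urlfilter.regex.file", "crawl-urlfilter.txt");
        return layered;
    }

    // What a tool sees when it adds the extra resource, as Crawl.java does.
    static String filterFileSeenByCrawl() {
        return addCrawlToolResource(createBase()).getProperty("urlfilter.regex.file");
    }

    // What a tool sees when it only uses the base configuration,
    // as Fetcher.java and the other individual tools do.
    static String filterFileSeenByFetcher() {
        return createBase().getProperty("urlfilter.regex.file");
    }

    public static void main(String[] args) {
        System.out.println("Crawl:   " + filterFileSeenByCrawl());
        System.out.println("Fetcher: " + filterFileSeenByFetcher());
    }
}
```

Under these assumptions, filter rules written for the one-step crawl command would have to be copied into the filter file that the individual tools read before a script calling those tools would apply them.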