
crawl-tool.xml: msg#00257

nutch-user.lucene.apache.org

Subject: crawl-tool.xml

I tried Susam Pal's recrawl script and wondered why URL filtering
no longer works:
http://wiki.apache.org/nutch/Crawl

The answer to the mystery: only Crawl.java adds crawl-tool.xml to the
NutchConfiguration.

Configuration conf = NutchConfiguration.create();
conf.addResource("crawl-tool.xml");

Fetcher.java and all the other tools that filter outlinks do not add
it, so when they are run individually (as the recrawl script does) they
never see the settings in crawl-tool.xml.
This confused me a lot, and it took me some time to figure out.
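
To illustrate the effect, here is a minimal sketch in plain Java that does not use Nutch at all. It only mimics a layered configuration in which a resource added later overrides earlier values. The class LayeredConfigSketch, the method filterFileSeenBy, and the property name and file names in the two maps are assumptions made for the illustration; check your own nutch-default.xml and crawl-tool.xml for the real values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LayeredConfigSketch {
    // Stand-in for a layered configuration: a resource added later
    // overrides any value set by a resource added earlier.
    private final Map<String, String> props = new HashMap<>();

    void addResource(Map<String, String> resource) {
        props.putAll(resource);
    }

    String get(String key) {
        return props.get(key);
    }

    // ASSUMPTION: illustrative property name and values, not read from Nutch.
    static final Map<String, String> DEFAULTS =
            Collections.singletonMap("urlfilter.regex.file", "regex-urlfilter.txt");
    static final Map<String, String> CRAWL_TOOL =
            Collections.singletonMap("urlfilter.regex.file", "crawl-urlfilter.txt");

    // Returns the filter file a tool would resolve, depending on whether
    // it layers the crawl-tool overrides on top of the defaults.
    static String filterFileSeenBy(boolean addsCrawlTool) {
        LayeredConfigSketch conf = new LayeredConfigSketch();
        conf.addResource(DEFAULTS);
        if (addsCrawlTool) {
            conf.addResource(CRAWL_TOOL);
        }
        return conf.get("urlfilter.regex.file");
    }

    public static void main(String[] args) {
        System.out.println("tool that adds crawl-tool.xml: " + filterFileSeenBy(true));
        System.out.println("tool that does not add it:     " + filterFileSeenBy(false));
    }
}
```

If the real files differ in a similar way, a tool started on its own would filter with a different rule file than the one-step crawl command, which would match what I saw.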

regards
reinhard






