
adding [-numFetchers numFetchers] to crawl: msg#00244

nutch-user.lucene.apache.org

Subject: adding [-numFetchers numFetchers] to crawl

How do I set the number of map tasks when I run a command like this?

hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl
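As far as I can tell, the partition count that Crawl hands to the Generator (the `-1` in the `generator.generate(...)` call in the diff below) is what decides how many fetch lists are generated, and so how many fetch map tasks run. The sketch below models how a `-1` might be resolved. It assumes that the Nutch 1.0 Generator falls back to the job's configured number of map tasks and forces a single partition when `mapred.job.tracker` is `local`. `PartitionCount` and `resolve` are hypothetical names, not Nutch API.

```java
// Hypothetical model of how a requested partition count could be resolved.
// Assumption: mirrors the Nutch 1.0 Generator as I understand it; this is
// not the actual Generator code.
public class PartitionCount {

    /**
     * @param requested      value passed as the partition count (-1 = not specified)
     * @param configuredMaps the job's configured number of map tasks
     * @param jobTracker     value of mapred.job.tracker ("local" in local mode)
     */
    static int resolve(int requested, int configuredMaps, String jobTracker) {
        int numLists = requested;
        if (numLists == -1) {
            // No explicit count: use one partition (fetch list) per map task.
            numLists = configuredMaps;
        }
        if ("local".equals(jobTracker) && numLists != 1) {
            // Assumed behaviour: a local job always gets exactly one partition.
            numLists = 1;
        }
        return numLists;
    }

    public static void main(String[] args) {
        System.out.println(resolve(-1, 8, "master:9001")); // 8
        System.out.println(resolve(4, 8, "master:9001"));  // 4
        System.out.println(resolve(-1, 8, "local"));       // 1
    }
}
```

If that reading is right, passing an explicit count (as the proposed `-numFetchers` option would) only takes effect on a real cluster, not in local mode.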



I think I'm going to try out the change below. Is there any reason not to do it? Or is Crawl meant to be more of a demo, so that I should write a script or my own crawler class instead?



> diff Crawl.java.orig Crawl.java
53c53
< ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
---
> ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N] [-numFetchers numFetchers]");
65a66
> int numFetchers = -1;
78a80,82
> } else if ("-numFetchers".equals(args[i])) {
> numFetchers = Integer.parseInt(args[i+1]);
> i++;
116c120
< Path segment = generator.generate(crawlDb, segments, -1, topN, System
---
> Path segment = generator.generate(crawlDb, segments, numFetchers, topN, System
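One small gap in the parsing hunk: it reads `args[i+1]` without checking that a value follows the flag, so `Crawl urls -numFetchers` would fail with an `ArrayIndexOutOfBoundsException`. Below is a minimal standalone sketch of the same option handling with that check added, plus a guard against values below 1. `CrawlOptions` and `requireValue` are hypothetical names, the other Crawl flags (`-dir`, `-threads`, `-depth`) are left out, and the `topN` default of `Long.MAX_VALUE` is an assumption for illustration.

```java
// Hypothetical, self-contained sketch of Crawl-style argument parsing with
// the proposed -numFetchers option. Not the real Crawl class.
public class CrawlOptions {
    String urlDir = null;
    long topN = Long.MAX_VALUE;  // assumed default, for illustration only
    int numFetchers = -1;        // -1 = unchanged behaviour: let the Generator decide

    // Returns the value following args[i], or fails with a clear message.
    static String requireValue(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException(args[i] + " requires a value");
        }
        return args[i + 1];
    }

    static CrawlOptions parse(String[] args) {
        CrawlOptions opts = new CrawlOptions();
        for (int i = 0; i < args.length; i++) {
            if ("-topN".equals(args[i])) {
                opts.topN = Long.parseLong(requireValue(args, i));
                i++; // skip the value we just consumed
            } else if ("-numFetchers".equals(args[i])) {
                opts.numFetchers = Integer.parseInt(requireValue(args, i));
                if (opts.numFetchers < 1) {
                    throw new IllegalArgumentException("-numFetchers must be at least 1");
                }
                i++;
            } else if (args[i].startsWith("-")) {
                // -dir, -threads and -depth are left out of this sketch.
                throw new IllegalArgumentException("unsupported option: " + args[i]);
            } else if (opts.urlDir == null) {
                opts.urlDir = args[i];
            } else {
                throw new IllegalArgumentException("unexpected argument: " + args[i]);
            }
        }
        if (opts.urlDir == null) {
            throw new IllegalArgumentException(
                "Usage: Crawl <urlDir> [-topN N] [-numFetchers numFetchers]");
        }
        return opts;
    }

    public static void main(String[] args) {
        CrawlOptions opts = parse(new String[] {"urls", "-topN", "1000", "-numFetchers", "4"});
        System.out.println(opts.urlDir + " topN=" + opts.topN + " numFetchers=" + opts.numFetchers);
    }
}
```

Keeping `-1` as the default means that leaving the flag off preserves the current Generator behaviour, so the patch only changes anything when `-numFetchers` is given explicitly.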
