adding [-numFetchers numFetchers] to crawl: msg#00244
nutch-user.lucene.apache.org
How do I set the number of map tasks when I run a command like hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl?

I think I'm going to try out the change below. Is there any reason not to do it, or is Crawl meant to be more of a demo, so that I should write a script or my own crawler class instead?

> diff Crawl.java.orig Crawl.java
53c53
<       ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
---
>       ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N] [-numFetchers n]");
65a66
>     int numFetchers = -1;
78a80,82
>       } else if ("-numFetchers".equals(args[i])) {
>         numFetchers = Integer.parseInt(args[i+1]);
>         i++;
116c120
<       Path segment = generator.generate(crawlDb, segments, -1, topN, System
---
>       Path segment = generator.generate(crawlDb, segments, numFetchers, topN, System
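For anyone who wants to try the parsing part of this patch in isolation, here is a minimal sketch. It does not touch Nutch or Hadoop at all: the class name CrawlArgsSketch and the helper requireValue are hypothetical, and the topN default is only a placeholder. The one thing taken from the diff is that numFetchers starts at -1, the value Crawl.java passed to generate() before the change. The sketch also adds a bounds check that the patch does not have, so a trailing -numFetchers with no value fails with a clear message rather than an array index error.

```java
// Hypothetical stand-in for the option loop in Crawl.java; no Nutch classes are used.
class CrawlArgsSketch {
    String urlDir;                 // first non-option argument
    long topN = Long.MAX_VALUE;    // placeholder default, an assumption for this sketch
    int numFetchers = -1;          // same initial value as in the diff

    static CrawlArgsSketch parse(String[] args) {
        CrawlArgsSketch parsed = new CrawlArgsSketch();
        for (int i = 0; i < args.length; i++) {
            if ("-numFetchers".equals(args[i])) {
                parsed.numFetchers = Integer.parseInt(requireValue(args, i));
                i++; // skip the value that was just consumed
            } else if ("-topN".equals(args[i])) {
                parsed.topN = Long.parseLong(requireValue(args, i));
                i++;
            } else {
                parsed.urlDir = args[i];
            }
        }
        return parsed;
    }

    // Returns the argument following position i, or fails if the option has no value.
    private static String requireValue(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value after " + args[i]);
        }
        return args[i + 1];
    }

    public static void main(String[] args) {
        CrawlArgsSketch parsed = parse(args);
        System.out.println("urlDir=" + parsed.urlDir
                + " topN=" + parsed.topN
                + " numFetchers=" + parsed.numFetchers);
    }
}
```

Running it with the arguments urls -topN 1000 -numFetchers 4 should report numFetchers=4, and leaving the option out keeps the -1 that the unpatched code always passed.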