Re: nutch -threads in hadoop: msg#00245
nutch-user.lucene.apache.org
Brian Tingle wrote:
> Thanks, I eventually found where the job trackers were in the :50030 web

You need to be careful when running large crawls on someone else's infrastructure. While the raw bandwidth may be enough, the DNS infrastructure may be insufficient, both on the side of the target domains and at the local resolver. I strongly recommend setting up a local caching DNS.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
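To illustrate why caching matters for the DNS load mentioned above, here is a minimal in-process sketch in Python. It is not a substitute for a real local caching DNS server, and it is not part of Nutch or Hadoop. The class name `CachingResolver`, the `ttl_seconds` parameter, and the `upstream_queries` counter are all hypothetical names chosen for this example. The lookup function is injectable so the demo can run without network access.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Tiny in-process DNS cache (illustrative only)."""

    def __init__(
        self,
        lookup: Callable[[str], str] = socket.gethostbyname,
        ttl_seconds: float = 300.0,  # assumed fixed TTL; real DNS uses per-record TTLs
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        # host -> (address, expiry time on the monotonic clock)
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, host: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(host)
        if hit is not None and hit[1] > now:
            return hit[0]  # served from cache, no upstream query
        self.upstream_queries += 1
        address = self._lookup(host)
        self._cache[host] = (address, now + self._ttl)
        return address


if __name__ == "__main__":
    # Stand-in lookup so the demo needs no network; 192.0.2.0/24 is a documentation range.
    def fake_lookup(host: str) -> str:
        return "192.0.2.1"

    resolver = CachingResolver(lookup=fake_lookup)
    for _ in range(1000):
        resolver.resolve("example.com")
    print("upstream queries:", resolver.upstream_queries)
```

A crawl that fetches many pages from the same host repeats the same lookup over and over; with a cache in front, only the first lookup per host (per TTL window) reaches the upstream resolver. A dedicated caching DNS server on the local network gives the same effect for every fetcher thread and node at once.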