how to exclude some external links: msg#00297, nutch-user.lucene.apache.org
Hi, I would like to know how I can modify the Nutch code to exclude external links with certain extensions. For example, if I have mydomain.com in urls, and mydomain.com has a lot of links like mydomain.com/mylink.shtml, then I want Nutch not to fetch (crawl) these kinds of URLs at all. Thanks, Alex.
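
For illustration only, here is a minimal sketch of the kind of extension check being asked about. It is not taken from the Nutch source: the class name ExtensionUrlFilter and its filter method are hypothetical, and the convention of returning null for a rejected URL is an assumption modeled on how Nutch URL filter plugins are commonly described. In many setups no code change may be needed at all, since an exclusion rule such as -\.shtml$ in conf/regex-urlfilter.txt is the usual way to skip such URLs, assuming the regex URL filter plugin is enabled.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Nutch code base.
class ExtensionUrlFilter {

    // Matches URLs ending in ".shtml", optionally followed by a query
    // string or fragment. The extension is the one from the question;
    // swap in whichever extensions should be skipped.
    private static final Pattern EXCLUDED =
            Pattern.compile("\\.shtml([?#].*)?$", Pattern.CASE_INSENSITIVE);

    // Returns the URL unchanged if it may be fetched, or null if it
    // should be skipped (assumed convention, see the note above).
    static String filter(String url) {
        if (url == null) {
            return null;
        }
        return EXCLUDED.matcher(url).find() ? null : url;
    }
}
```

With this sketch, ExtensionUrlFilter.filter("http://mydomain.com/mylink.shtml") yields null, while a URL such as http://mydomain.com/index.html is passed through unchanged.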