
how to exclude some external links: msg#00297

nutch-user.lucene.apache.org

Subject: how to exclude some external links




Hi,

I would like to know how I can modify the Nutch code to exclude links with
certain extensions. For example, if I have mydomain.com in my urls and
mydomain.com has a lot of links like mydomain.com/mylink.shtml, then I want
Nutch not to fetch (crawl) these kinds of URLs at all.
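
To make the desired behavior concrete, here is a minimal standalone sketch of the check in plain Java. It is not Nutch code and does not use any Nutch API: the class name ExtensionFilter and the method shouldFetch are hypothetical, and "shtml" is the only extension taken from the example above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: decides whether a URL should be
// fetched based on its file extension.
class ExtensionFilter {
    // Matches URLs whose path ends in .shtml, optionally followed by a
    // query string. More extensions could be added, e.g. (shtml|cgi).
    private static final Pattern EXCLUDED =
        Pattern.compile("\\.(shtml)(\\?.*)?$", Pattern.CASE_INSENSITIVE);

    // Returns false for URLs with an excluded extension, true otherwise.
    static boolean shouldFetch(String url) {
        return !EXCLUDED.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(shouldFetch("http://mydomain.com/mylink.shtml"));
        System.out.println(shouldFetch("http://mydomain.com/index.html"));
    }
}
```

With this rule, mydomain.com/mylink.shtml would be skipped while other pages on mydomain.com would still be fetched. What I am looking for is the place in Nutch where a rule like this can be applied.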




Thanks
Alex.






