
Re: Pages with Specific URLS.: msg#00239

nutch-user.lucene.apache.org


Why do you think so?
Do you mean URLs that contain a query part?

They can be crawled.
The default Nutch configuration excludes them with this filter rule in
conf/crawl-urlfilter.txt:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
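To show why your example URL is dropped, here is a small Python sketch of how a regex URL filter of this kind is evaluated. It assumes the semantics described in the comments of crawl-urlfilter.txt: rules are checked top to bottom, the first pattern that matches anywhere in the URL decides, a leading '+' accepts, a leading '-' rejects, and a URL that matches no rule is rejected. The function name url_allowed, the catch-all "+." rule and the relaxed rule list are hypothetical, for illustration only; this is not Nutch's own code.

```python
import re

def url_allowed(url, rules):
    """Hypothetical model of a Nutch-style regex URL filter.

    Each rule is a sign ('+' or '-') followed by a regular expression.
    The first rule whose pattern matches anywhere in the URL decides;
    if no rule matches, the URL is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False

# The default rule quoted above, followed by a hypothetical catch-all accept.
DEFAULT_RULES = [r"-[?*!@=]", r"+."]

# Hypothetical relaxed variant: '?' and '=' removed from the character class,
# so query URLs are no longer skipped by this rule.
RELAXED_RULES = [r"-[*!@]", r"+."]

url = "http://www.site.com/categories.asp?cid=25&page=9"
print(url_allowed(url, DEFAULT_RULES))   # False: '?' matches the skip rule
print(url_allowed(url, RELAXED_RULES))   # True: falls through to "+."
```

Because the first matching rule wins, the skip rule only has an effect if it comes before any broader '+' rule in the file.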


Zaihan wrote:
> Hi All,
>
> I'm sure I've read somewhere before that URLs like
> http://www.site.com/categories.asp?cid=25&page=9
>
> can't be crawled. Is that true?
>
> Warmest Regards,
> Zaihan
