
Re: Dumping what I have?: msg#00278

nutch-user.lucene.apache.org

Subject: Re: Dumping what I have?

Yes, there are tools you can use to dump the contents of the crawl db,
link db and segments:

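crawl=./crawl    # crawl directory; assumed, it was not set in the original message
dump=$crawl/dump
bin/nutch readdb $crawl/crawldb -dump $dump/crawldb
bin/nutch readlinkdb $crawl/linkdb -dump $dump/linkdb
# $1 is the segment to dump, passed as the first script argument
bin/nutch readseg -dump $1 $dump/segments/$1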

Each tool prints its usage and further options if you call it without arguments:

bin/nutch readdb
bin/nutch readlinkdb
bin/nutch readseg
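
If you have several segments, a loop saves calling readseg by hand for each one. This is only a rough sketch, not something shipped with Nutch: the variable names NUTCH, CRAWL and DUMP and the function name dump_segments are made up for illustration, and it assumes the segments sit under $CRAWL/segments as in the commands above.

```shell
#!/bin/sh
# Sketch: dump every segment found under a crawl directory.
# NUTCH, CRAWL and DUMP are hypothetical names; adjust them to your layout.
NUTCH=${NUTCH:-bin/nutch}
CRAWL=${CRAWL:-./crawl}
DUMP=${DUMP:-$CRAWL/dump}

dump_segments() {
    for seg in "$CRAWL"/segments/*; do
        # Skip anything that is not a directory (including an unmatched glob).
        [ -d "$seg" ] || continue
        # Use only the segment's own name for the output directory.
        name=$(basename "$seg")
        "$NUTCH" readseg -dump "$seg" "$DUMP/segments/$name"
    done
}
```

Run dump_segments from the Nutch install directory after the definitions above. Each segment then gets its own directory under $DUMP/segments.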

Paul Tomblin wrote:
> The nutch data files are pretty opaque, and even "strings" can't extract
> anything except the occasional URL. Is there any code to dump the contents
> of the various files in a human readable form?
