Re: IO exception while adding field in Parsedata parsemeta. (msg#00251, nutch-user.lucene.apache.org)
On Fri, Jul 24, 2009 at 17:21, Saurabh Suman <saurabhsuman289@xxxxxxxxxx> wrote:
>
> Hi,
> I am using Nutch 1.0. I want to add a field to the ParseData parseMeta.
> In org.apache.nutch.parse.html.HtmlParser, two fields are already added in
> the original code:
>
>     metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
>     metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);
>
> I added a third field:
>
>     metadata.set(Metadata.AGE, "23");
>
> In org.apache.nutch.indexer.IndexerMapReduce, in the method
>
>     public void reduce(Text key, Iterator<NutchWritable> values,
>         OutputCollector<Text, NutchDocument> output, Reporter reporter)
>         throws IOException
>
> two fields are added to the NutchDocument:
>
>     NutchDocument doc = new NutchDocument();
>     final Metadata metadata = parseData.getContentMeta();
>
>     // add segment, used to map from merged index back to segment files
>     doc.add("segment", metadata.get(Nutch.SEGMENT_NAME_KEY));
>
>     // add digest, used by dedup
>     doc.add("digest", metadata.get(Nutch.SIGNATURE_KEY));
>
> I added a third field, the one I set in HtmlParser, like this:
>
>     doc.add("age", parseData.getParseMeta().get("age"));
>
> By doing so, I get the following exception at the indexing stage:
>
>     LinkDb: adding segment: file:/home/ithurs/nutch-1.0/crawl/segments/20090724193527
>     LinkDb: done
>     Indexer: starting
>     Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:152)
>
> Please tell me:
> (i) How do I get rid of this exception?
> (ii) How can I add a new field to the ParseData parseMeta?

You are probably adding your field to parseMeta, so trying to get it from
contentMeta fails. Just use parseData.getParseMeta() in the indexer and it
may work.
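To make the parseMeta/contentMeta distinction concrete, here is a minimal, self-contained sketch. It does NOT use the real Nutch classes: `ParseMetaSketch`, its nested `ParseData` and `Document`, `parse`, `index` and `AGE_KEY` are hypothetical stand-ins that only model the two separate metadata maps mentioned above. It shows that a value written to parse metadata is not visible through content metadata, and it reads the value back with the same key constant used to write it. Skipping the field when the lookup returns null is a defensive choice of this sketch, not something the thread prescribes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for Nutch's ParseData and NutchDocument.
// They only model the two separate metadata maps the reply refers to.
public class ParseMetaSketch {

    // Hypothetical key constant: write and read with the same constant,
    // so a spelling or case mismatch between "put" and "get" cannot creep in.
    static final String AGE_KEY = "age";

    static final class ParseData {
        final Map<String, String> contentMeta = new HashMap<>(); // fetch/protocol metadata
        final Map<String, String> parseMeta = new HashMap<>();   // metadata set by a parser
    }

    static final class Document {
        final Map<String, String> fields = new HashMap<>();

        void add(String name, String value) {
            fields.put(name, value);
        }
    }

    // Parser side: the custom field goes into parseMeta, not contentMeta.
    static ParseData parse(String age) {
        ParseData data = new ParseData();
        data.contentMeta.put("segment", "20090724193527");
        data.parseMeta.put(AGE_KEY, age);
        return data;
    }

    // Indexer side: read the field back from parseMeta, and skip it when
    // absent instead of adding a null value to the document.
    static Document index(ParseData data) {
        Document doc = new Document();
        doc.add("segment", data.contentMeta.get("segment"));
        String age = data.parseMeta.get(AGE_KEY);
        if (age != null) {
            doc.add(AGE_KEY, age);
        }
        return doc;
    }

    public static void main(String[] args) {
        ParseData data = parse("23");
        // Looking in the wrong map finds nothing.
        System.out.println(data.contentMeta.get(AGE_KEY)); // prints null
        System.out.println(index(data).fields.get(AGE_KEY)); // prints 23
    }
}
```

The same idea carries over to the real code: whatever key is passed to `metadata.set(...)` in the parser is the key to pass to `parseData.getParseMeta().get(...)` in the indexer.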
> --
> View this message in context:
> http://www.nabble.com/IO-exception-while-adding-field-in-Parsedata-parsemeta.-tp24645429p24645429.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Doğacan Güney