IO exception while adding field in ParseData parseMeta (msg#00247, nutch-user.lucene.apache.org)
Hi,

I am using Nutch-1.0 and want to add a field to ParseData's parseMeta.

In org.apache.nutch.parse.html.HtmlParser, the original code already sets two fields:

    metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
    metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);

I added a third field:

    metadata.set(Metadata.AGE, "23");

In org.apache.nutch.indexer.IndexerMapReduce, the method

    public void reduce(Text key, Iterator<NutchWritable> values,
                       OutputCollector<Text, NutchDocument> output,
                       Reporter reporter) throws IOException

adds two fields to the NutchDocument:

    NutchDocument doc = new NutchDocument();
    final Metadata metadata = parseData.getContentMeta();
    // add segment, used to map from merged index back to segment files
    doc.add("segment", metadata.get(Nutch.SEGMENT_NAME_KEY));
    // add digest, used by dedup
    doc.add("digest", metadata.get(Nutch.SIGNATURE_KEY));

I added a third field, the one I set in HtmlParser, like this:

    doc.add("age", parseData.getParseMeta().get("age"));

After this change, I get the following exception during indexing:

    LinkDb: adding segment: file:/home/ithurs/nutch-1.0/crawl/segments/20090724193527
    LinkDb: done
    Indexer: starting
    Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:152)

Please tell me:
(i) How can I get rid of this exception?
(ii) How can I add a new field to ParseData's parseMeta?

--
View this message in context: http://www.nabble.com/IO-exception-while-adding-field-in-Parsedata-parsemeta.-tp24645429p24645429.html
Sent from the Nutch - User mailing list archive at Nabble.com.
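
One thing that may be worth checking in the code above (an assumption, not a confirmed diagnosis): the value is stored under the constant Metadata.AGE but read back with the string literal "age". If the two do not match, the lookup returns null, and that null is then passed to doc.add. The stand-alone sketch below illustrates the mismatch and a null guard. It does not use the Nutch classes: Doc, addIfPresent, and the value "Age" for the AGE constant are hypothetical stand-ins, since the post does not show how Metadata.AGE is defined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone sketch; these types only imitate the Nutch classes named in the post. */
public class ParseMetaFieldSketch {

    /** Hypothetical value: the post does not show what Metadata.AGE holds. */
    static final String AGE = "Age";

    /** Hypothetical stand-in for NutchDocument: a field name mapped to its values. */
    static class Doc {
        final Map<String, List<String>> fields = new HashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Copies one parse-metadata entry into the document, skipping it when absent. */
    static boolean addIfPresent(Doc doc, String field,
                                Map<String, String> parseMeta, String key) {
        String value = parseMeta.get(key);
        if (value == null) {
            return false; // nothing stored under this key, so no field is added
        }
        doc.add(field, value);
        return true;
    }

    public static void main(String[] args) {
        // Parser side, mirroring metadata.set(Metadata.AGE, "23").
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put(AGE, "23");

        Doc doc = new Doc();
        // The literal "age" misses the entry stored under AGE ("Age" in this sketch).
        System.out.println(addIfPresent(doc, "age", parseMeta, "age"));
        // Reading with the same constant that was used for the set finds it.
        System.out.println(addIfPresent(doc, "age", parseMeta, AGE));
        System.out.println(doc.fields);
    }
}
```

In the real code, the equivalent change would be to read with the same key that was used in HtmlParser, and to skip doc.add when the value is null. The "Job failed!" message is only the job-level wrapper, so the underlying cause should appear in the task or Hadoop logs rather than in this stack trace.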