

Convert HTML to XHTML with namespace prefix using Neko + Xerces: msg#00045

Subject: Convert HTML to XHTML with namespace prefix using Neko + Xerces
Alias: How to force a default namespace to use a prefix

Sorry if I missed something important; I'm quite new to namespace issues.
But I'm stuck on the last remaining problem of the whole transformation
process.

Everything works nicely, except that the XHTML namespace is set as the default
namespace, so no prefix (preferably 'html') is included in the element names
when the document is serialized back to a string.

I'm getting:
<html xmlns="http://www.w3.org/1999/xhtml">
<body> some <b> bold </b> text </body>
</html>

But I need:
<html:html xmlns:html="http://www.w3.org/1999/xhtml">
<html:body> some <html:b> bold </html:b> text </html:body>
</html:html>
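For what it's worth, a namespace-aware parser sees the same element names in both forms: the namespace URI http://www.w3.org/1999/xhtml plus a local name such as "body". Only the prefix differs. The minimal JAXP check below uses the JDK alone (Neko is not involved); the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SameNames {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parse namespace-aware and return the first XHTML <body> element,
    // looked up by namespace URI + local name, whatever prefix it uses.
    public static Element xhtmlBody(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // the JAXP default is false
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagNameNS(XHTML_NS, "body").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element a = xhtmlBody("<html xmlns=\"" + XHTML_NS + "\">"
                + "<body> some <b> bold </b> text </body></html>");
        Element b = xhtmlBody("<html:html xmlns:html=\"" + XHTML_NS + "\">"
                + "<html:body> some <html:b> bold </html:b> text </html:body></html:html>");
        // Same namespace URI and local name; only getPrefix() differs.
        for (Element e : new Element[] {a, b}) {
            System.out.println(e.getNamespaceURI() + " | " + e.getLocalName()
                    + " | prefix=" + e.getPrefix());
        }
    }
}
```

So the prefix is purely a serialization detail. It still matters here because the fragments end up inside a larger document that has its own default namespace.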

Why? Because in reality I pick pieces of HTML (often corrupt!) from a database,
transform them into valid XHTML, and finally assemble them into another, bigger
XML document with multiple namespaces. In fact, I'm building an RSS/Atom feed.

So my questions are:
How do I force a default namespace to use a prefix?
Is this a matter for the parser or for the serializer (transformer)?
How do I choose the prefix name for the namespace, preferably 'html'?
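One possible approach to the first and third questions is to fix the DOM after parsing, before serializing. Each XHTML element is renamed so that it carries the 'html' prefix, using DOM Level 3 Document.renameNode, and the xmlns="..." default declaration is dropped. As far as I can tell, the JDK's built-in identity Transformer then writes each element's prefix and declares xmlns:html by itself. This is a sketch under assumptions. The class and method names are hypothetical. A plain JAXP parse stands in for the Neko DOM, and the sketch has not been tried on a DOM that Neko built.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XhtmlPrefixer {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Walk the tree: put the given prefix on every XHTML element and drop
    // xmlns="http://www.w3.org/1999/xhtml" declarations, so the serializer
    // has to declare xmlns:<prefix> instead.
    public static void prefixXhtml(Node node, String prefix) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String ns = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
            if (XHTML_NS.equals(el.getAttributeNS(ns, XMLConstants.XMLNS_ATTRIBUTE))) {
                el.removeAttributeNS(ns, XMLConstants.XMLNS_ATTRIBUTE);
            }
            if (XHTML_NS.equals(el.getNamespaceURI())) {
                // DOM Level 3; the returned node may in principle be a replacement
                node = el.getOwnerDocument().renameNode(
                        el, XHTML_NS, prefix + ":" + el.getLocalName());
            }
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before child is renamed
            prefixXhtml(child, prefix);
            child = next;
        }
    }

    // Identity Transformer with the same output settings as in my code
    static String serialize(Document doc) throws TransformerException {
        StringWriter sw = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    // Demo only: a namespace-aware JAXP parse stands in for the Neko DOM.
    public static String prefixAndSerialize(String xhtml, String prefix) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            prefixXhtml(doc.getDocumentElement(), prefix);
            return serialize(doc);
        } catch (ParserConfigurationException | SAXException | IOException
                 | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prefixAndSerialize(
                "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body> some <b> bold </b> text </body></html>", "html"));
    }
}
```

With Neko, the call would be prefixXhtml(parser.getDocument().getDocumentElement(), "html") between parsing and serializing. Renaming is done in the DOM rather than in the serializer because, as far as I know, the identity Transformer has no output property for choosing prefixes.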

Here is my code:

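// Imports used below (NekoHTML and Xerces on the classpath)
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class HtmlToXhtml {

    // Set up the Neko parser: turn on tag fixing and namespaces.
    // Parses the HTML fragment, fixes it and returns a full XML document.
    public static Document parse(String htmlFragment)
            throws SAXException, IOException {
        DOMParser parser = new DOMParser();
        // close unclosed tags and repair misnested ones
        parser.setFeature(
            "http://cyberneko.org/html/features/balance-tags", true);
        // report element names in lower case
        parser.setProperty(
            "http://cyberneko.org/html/properties/names/elems", "lower");
        // namespace handling, using the URI given by namespaces-uri
        parser.setFeature(
            "http://cyberneko.org/html/features/override-namespaces", true);
        parser.setFeature(
            "http://cyberneko.org/html/features/insert-namespaces", true);
        parser.setProperty(
            "http://cyberneko.org/html/properties/namespaces-uri",
            "http://www.w3.org/1999/xhtml");

        parser.parse(new InputSource(new StringReader(htmlFragment)));
        return parser.getDocument();
    }

    // ..OK, let's serialize it back to string!
    public static String serialize(Document doc) throws TransformerException {
        StringWriter sw = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Serialize the DOM tree
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }
}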
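Alternatively, the prefixing could happen in the transformer step itself. Instead of the identity Transformer, a small XSLT 1.0 stylesheet passed to newTransformer(...) could re-create each XHTML element as html:<local-name> in the XHTML namespace. The stylesheet and class below are an illustrative sketch of my own and are not part of Neko or Xerces. The demo feeds a StreamSource; with the Neko DOM you would pass new DOMSource(doc) instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PrefixingStylesheet {
    // Copies the input unchanged, except that every XHTML element is
    // re-created as html:<local-name> in the XHTML namespace. xsl:element
    // does not copy namespace nodes, so no default xmlns="..." is emitted.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:html='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='html:*'>"
        + "<xsl:element name='html:{local-name()}'"
        + " namespace='http://www.w3.org/1999/xhtml'>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:element></xsl:template>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template></xsl:stylesheet>";

    // Works with any Source, e.g. new DOMSource(doc) for the Neko DOM.
    public static String transform(Source source) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter sw = new StringWriter();
            t.transform(source, new StreamResult(sw));
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String in = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                  + "<body> some <b> bold </b> text </body></html>";
        System.out.println(transform(new StreamSource(new StringReader(in))));
    }
}
```

The html:* template has a higher default priority than @*|node(), so XHTML elements are rebuilt with the prefix while attributes, text and any non-XHTML nodes are copied as they are.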

P.S.
The NekoHTML parser is a real treasure! It helps a lot with closing HTML
tags, unbalanced tags, etc. Thanks, Andy.


