FYI: Implementation of the HTML 4.01 parsing into org.w3c.dom.html2.HTMLDoc: msg#00153

java.classpath.patches

Subject: FYI: Implementation of the HTML 4.01 parsing into org.w3c.dom.html2.HTMLDocument.

Now that Chris has implemented the Java binding for the Level 2 Document Object Model HTML, it was very simple to direct our parser output into these classes. Chris did a great job: the document model classes work out the requested properties on their own. For instance, I did not need to set the FORM for each INPUT explicitly, because the existing getForm() method finds it by itself among the parent nodes.
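
To illustrate the idea of finding the FORM among the parent nodes, here is a minimal sketch using only the generic org.w3c.dom interfaces from the JDK. It is not the actual Classpath code: the FormLookup class and the findEnclosingForm helper are hypothetical names of mine, and the real getForm() may well work differently in detail.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FormLookup {
    // Hypothetical helper (not part of any library): walk up the ancestor
    // chain and return the nearest enclosing FORM element, or null if none.
    public static Element findEnclosingForm(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE
                    && "form".equalsIgnoreCase(p.getNodeName())) {
                return (Element) p;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // Build <form name="login"><div><input/></div></form> by hand.
        Element form = doc.createElement("form");
        form.setAttribute("name", "login");
        Element div = doc.createElement("div");
        Element input = doc.createElement("input");
        doc.appendChild(form);
        form.appendChild(div);
        div.appendChild(input);

        // The INPUT was never told about its FORM; it is found via the parents.
        Element found = findEnclosingForm(input);
        System.out.println(found.getAttribute("name"));
    }
}
```

The point of the design is that the parser only has to build the tree; relations such as INPUT to FORM follow from the tree structure and need no extra bookkeeping.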

The DOM may be very convenient for web robots, because the parsed document can then be analysed using transforms.
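
As a rough sketch of what such an analysis could look like, assuming "transforms" means XSLT applied through the standard javax.xml.transform API: the example below lists all link targets of a DOM tree. The LinkExtractor class, the extractLinks method and the stylesheet are my own illustrative assumptions, and the tree is built by hand here rather than produced by the HTML parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LinkExtractor {
    // Illustrative stylesheet: print every href of every <a>, one per line.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//a/@href'>"
        + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: run the stylesheet over a DOM tree, return the text.
    public static String extractLinks(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Small helper to append an <a href="..."> element to a parent.
    private static void addLink(Document doc, Element parent, String href) {
        Element a = doc.createElementNS(null, "a");
        a.setAttributeNS(null, "href", href);
        parent.appendChild(a);
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElementNS(null, "html");
        Element body = doc.createElementNS(null, "body");
        doc.appendChild(html);
        html.appendChild(body);
        addLink(doc, body, "a.html");
        addLink(doc, body, "b.html");

        System.out.print(extractLinks(doc));
    }
}
```

A robot could swap in a different stylesheet for each kind of data it wants to pull out, without touching the Java code that walks the document.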

As this is a new, undocumented feature, I suggest including an example as well.

Audrius

Attachment: DomHTMLParser.java
Description: Binary data

Attachment: parse_into_dom_html2.java
Description: Binary data

_______________________________________________
Classpath-patches mailing list
Classpath-patches@xxxxxxx
http://lists.gnu.org/mailman/listinfo/classpath-patches