Re: Thought on future of XMLC: msg#00119

java.enhydra.xmlc

Subject: Re: Thought on future of XMLC


Hi Arno!

Arno Schatz <list@xxxxxxxxxxxxx> writes:
> We are setting the bar a bit higher if we require html being correct to the
> spec. Most html developers can live without the spec until they get to know
> XMLC and JTidy. On my other project, I am working with an external html
> design studio. Even though they are very capable, I again have to explain
> what correct HTML is and what not.

Is it too much to ask people to know how to use their tools? You are doing
the internet a big favor by forcing them to know how to write correct
HTML. Sorry you have to deal with this attitude, it sounds frustrating.

It was a major mistake for browsers to ever support invalid HTML. It's why
there is so much invalid HTML and why it's so hard to make compatible pages,
develop new web browsers, and write tools that analyze and index HTML. If
browsers had simply rejected bad HTML, there wouldn't be any.

> Technically XMLC does not need to do that either, if it adopts the concept
> of not changing more of the original html than necessary.

Actually, it does. XMLC transforms the document into an object hierarchy; the
original document is gone. The DOM does not represent the formatting of the
text file that was input; it represents the data in it.

There is no way to reproduce the original formatting, even if valid, only the
original meaning. If the document is invalid, this leads to cases that are
impossible to handle. For instance, I have seen <BR></BR> used. This is not
valid HTML, and doesn't render the same as <BR>. Yet there is no way for XMLC
to represent this in the DOM. It would always output <BR>. Modifying the DOM
to be able to represent any type of invalid HTML would be a monumental
undertaking.
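
To make the point concrete, here is a minimal sketch using the JDK's built-in
XML parser and serializer, not XMLC or its HTML DOM (the class and method
names are made up for the example). In XML, <br></br> is legal, unlike in
HTML, but the effect is the same: once parsed, the DOM holds one empty
element and keeps no record of the spacing, quote style, or tag form that was
typed, so the serializer can only write its own normalized form.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; it is not part of XMLC.
class RoundTrip {
    /** Parse markup into a DOM, then serialize the DOM back to text. */
    static String roundTrip(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Odd spacing, single quotes, and an explicit <br></br> pair.
        String original = "<p   class = 'a' >one<br></br>two</p>";
        System.out.println("input:  " + original);
        // The text content survives; the original formatting does not.
        System.out.println("output: " + roundTrip(original));
    }
}
```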

> However, it insists on generating the HTML from nodes even if they were not
> changed (instead of taking the original HTML from the file), which makes it
> slower and IMHO more difficult to use.

This seems very hard to implement and would result in something complex and
unpredictable. XMLC has no way to know what is going to be modified. It
would have to know how to get back to the original parts of the document for
every subtree; however, defining the subtrees means parsing the document. The
rendering of parts of the document would differ depending on whether they
came from the DOM or from the original document. If this is really needed, I
think it indicates that XMLC is fundamentally flawed.

> One great thing about xmlc is, that it lets the html developer work the way
> they are used to (with static html). This being said, I don't think we
> should force them to validate their HTML, because that is not the way they
> work. I think the tools should work the way most people are used to work
> successfully on projects. (Not adapting the people to a way of working we
> think they should work.)

Java programmers are expected to develop valid Java, so why shouldn't HTML
developers write valid HTML? Sorry, I just don't buy this. It's why there is
an HTML standard. HTML is a file format, not free-form text. There are a lot
of unemployed web designers; they can be replaced! If the way people want to
work is to create something that isn't HTML and only looks like it, well, it
isn't HTML.

I don't think we (as technical people) should be encouraging improper use of
tools, even if it is the way they `like' to work. We should have the
patience to educate people who don't come from a computer background on how
computer software works and why doing something that is almost right makes it
an artificial intelligence problem to deal with.

Programmers have to make compromises to implement things within the
constraints of the languages they use. HTML developers might have to
compromise their design a little to be able to do it in HTML, but that is just
part of using software.

> taking the design goal of xmlc serious, that it only supports valid
> documents, then why not break the build if xmlc encounters any warning?

I actually think it should. At least in the case where it needs to modify the
contents to produce a valid document. This seems to be a big part of the
problem you are having; JTidy tries to correct the document to valid HTML, but
the results are not predictable.

Tidy was designed to correct the document, after which one examines the
result and edits it as needed. However, when using XMLC, one can't edit the
results; one has to go back to the original document. There is a big
delay between creating and compiling. Since they are warnings, people ignore
them (especially when some of the warnings are debatable as being actual
warnings).

Sorry, I made a big mistake by not figuring out how to turn these into
errors. IMHO, this would be a good thing to add to XMLC, with an option to
re-enable the document correction.
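
As a rough illustration of the idea, and not of how XMLC or JTidy actually
report problems, the sketch below uses the standard SAX ErrorHandler
interface as a stand-in. The class name StrictHandler and the allowCorrection
flag are assumptions made up for this example: by default a warning aborts
the parse just as an error would, and the flag restores the lenient behavior.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler: treats warnings as errors unless told otherwise.
class StrictHandler implements ErrorHandler {
    private final boolean allowCorrection;

    StrictHandler(boolean allowCorrection) {
        this.allowCorrection = allowCorrection;
    }

    @Override
    public void warning(SAXParseException e) throws SAXParseException {
        if (!allowCorrection) {
            throw e; // strict mode: a warning stops the build
        }
        System.err.println("warning (ignored): " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
    }

    public static void main(String[] args) {
        SAXParseException problem =
                new SAXParseException("document needed correction", null);
        try {
            new StrictHandler(false).warning(problem);
            System.out.println("build continues");
        } catch (SAXParseException e) {
            System.out.println("build fails: " + e.getMessage());
        }
    }
}
```

With a handler like this, the author of the page sees the problem at compile
time instead of discovering later that the corrected output differs from what
was written.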

> Why support the Swing parser? I think these two things are already
> compromises.

The Swing parser is there for historic reasons; it was the original parser,
since there was no other freely available parser at the time. It probably
should not be used.

If people really want to use invalid html, XMLC is the wrong tool. One of the
tools based on string substitution is the way to go. However, IMHO, invalid
html is the wrong thing to do no matter what tool they are using.
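
For contrast, here is a minimal sketch of what a string-substitution approach
looks like. The @@name@@ placeholder syntax and the class name are invented
for the example and do not refer to any particular tool. Because the page is
treated as plain text and never parsed, invalid markup passes through
untouched, for better or worse.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical template filler; the page is never parsed as HTML.
class Substitute {
    /** Replace every @@key@@ placeholder with its value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("@@" + e.getKey() + "@@", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // <BR></BR> is invalid HTML, but nothing here notices or changes it.
        String page = "<P>Hello @@name@@<BR></BR>";
        System.out.println(fill(page, Collections.singletonMap("name", "Arno")));
    }
}
```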


Take care
Mark

