[ tidy-Bugs-1642186 ] Parser too greedy over <script> blocks: msg#00043

Subject: [ tidy-Bugs-1642186 ] Parser too greedy over <script> blocks

Bugs item #1642186, was opened at 2007-01-23 07:21
Message generated for change (Settings changed) made by hoehrmann
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390963&aid=1642186&group_id=27659

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: HTML/XHTML Parser
Group: Current - all platforms
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Nobody/Anonymous (nobody)
Summary: Parser too greedy over <script> blocks

Initial Comment:
Input:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head><title></title>
<body>
<script type="text/javascript">
"<script"
</script>
</body>
</html>

Output:
D:\Misc\qc>tidy test.html
line 7 column 15 - Warning: '<' + '/' + letter not allowed here
line 8 column 5 - Warning: '<' + '/' + letter not allowed here
line 9 column 5 - Warning: '<' + '/' + letter not allowed here
line 5 column 9 - Warning: missing </script>
line 5 column 9 - Warning: missing </script>
Info: Doctype given is "-//W3C//DTD HTML 4.01//EN"
Info: Document content looks like HTML 4.01 Strict
Info: No system identifier in emitted doctype
5 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 14 February 2006), see www.w3.org">
<title></title>
</head>
<body>
<script type="text/javascript">
"<script"
<\/script>
<\/body>
<\/html>
</script>
</body>
</html>

To learn more about HTML Tidy see http://tidy.sourceforge.net
Please send bug reports to html-tidy@xxxxxx
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium

As you can see, the tidied output is worse than the original: the real </script>, </body> and </html> end tags were escaped and pulled into the script block.

If you need anything else from me, drop me an email at nate at redtetrahedron.org.




----------------------------------------------------------------------

Comment By: Geoff (geoffmc)
Date: 2007-01-25 19:03

Message:
Logged In: YES
user_id=1408861
Originator: NO

See patch - http://tidy.sf.net/issue/1644645

----------------------------------------------------------------------

Comment By: Björn Höhrmann (hoehrmann)
Date: 2007-01-23 11:06

Message:
Logged In: YES
user_id=188003
Originator: NO

Better <script> parsing algorithms would certainly be most welcome.

----------------------------------------------------------------------

Comment By: Arnaud Desitter (arnaud02)
Date: 2007-01-23 10:58

Message:
Logged In: YES
user_id=566665
Originator: NO

I think this is by design although it could be revisited.

----------------------------------------------------------------------
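For illustration, here is a minimal sketch of the non-greedy behaviour the report asks for. It is not Tidy's code and not the linked patch. It assumes a simplified rule: the content of a script element ends at the first "</script" followed by whitespace, "/" or ">", and any "<script" text inside the content is ignored. The name split_script is hypothetical, and HTML comment escapes inside scripts are deliberately not handled.

```python
import re

# Assumed rule: a script end tag is "</script" followed by whitespace, "/" or ">".
# Matching is case-insensitive because HTML tag names are.
_SCRIPT_END = re.compile(r"</script(?=[\s/>])", re.IGNORECASE)


def split_script(html, content_start):
    """Return (script_text, end_tag_index) for script content starting at
    content_start. No nesting is tracked, so a quoted "<script" inside the
    content has no effect. If no end tag is found, the content runs to the
    end of the input."""
    match = _SCRIPT_END.search(html, content_start)
    end = match.start() if match else len(html)
    return html[content_start:end], end


# The script element from the report's test input.
report_input = (
    '<script type="text/javascript">\n'
    '"<script"\n'
    '</script>\n'
    '</body>\n'
    '</html>\n'
)
start = report_input.index(">") + 1  # first character after the start tag
text, end = split_script(report_input, start)
print(repr(text))
print(report_input[end:end + 9])
```

Under this rule the script content is only the quoted string line, and the original </script>, </body> and </html> tags stay outside the script block.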

_______________________________________________
Tidy-tracker mailing list
Tidy-tracker@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/tidy-tracker