Subject: [ tidy-Bugs-1636028 ] Tidy API function tidyNodeGetText escapes output - msg#00010

List: web.html-tidy.tracker

Bugs item #1636028, was opened at 2007-01-15 08:27
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390963&aid=1636028&group_id=27659

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: TidyLib APIs
Group: Current - all platforms
>Status: Closed
Resolution: Fixed
Priority: 6
Private: No
Submitted By: ayermakov (ayermakov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Tidy API function tidyNodeGetText escapes output

Initial Comment:
We use Tidy mainly through its API: we walk the parse tree, extract information
about each node, and transform it into our own tree structure. The
issue occurs when we call tidyNodeGetText() on nodes of type TidyNode_Text,
TidyNode_Comment, and TidyNode_CDATA.

Consider the input:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE> New Document </TITLE>
<SCRIPT type="text/javascript">document.write("<STYLE>");</SCRIPT>
</HEAD>

<BODY>
Hello!
</BODY>
</HTML>

I walk the Tidy parse tree and print the text of each node (for
simplicity). Here is a small function that walks the tree:

static void processNode(TidyDoc tdoc, TidyNode tnod)
{
    TidyNode child;

    for ( child = tidyGetChild(tnod); child; child = tidyGetNext(child) )
    {
        ctmbstr name = tidyNodeGetName( child );
        TidyNodeType type = tidyNodeGetType(child);
        switch ( type )
        {
        case TidyNode_Comment:
            {
                TidyBuffer text = {0};
                tidyBufInit(&text);
                if (tidyNodeGetText(tdoc, child, &text))
                {
                    printf("TidyNode_Comment: %s\n", text.bp);
                    tidyBufFree(&text);
                }
            }
            break;
        case TidyNode_Text:
            {
                TidyBuffer text = {0};
                tidyBufInit(&text);
                if (tidyNodeGetText(tdoc, child, &text))
                {
                    printf("TidyNode_Text: %s\n", text.bp);
                    tidyBufFree(&text);
                }
            }
            break;
        case TidyNode_CDATA:
            {
                TidyBuffer text = {0};
                tidyBufInit(&text);
                if (tidyNodeGetText(tdoc, child, &text))
                {
                    printf("TidyNode_CDATA: %s\n", text.bp);
                    tidyBufFree(&text);
                }
            }
            break;
        case TidyNode_Start:
            {
                processNode( tdoc, child);
            }
            break;
        default:
            break;
        }
    }
}

I call the function as follows:

processNode(tdoc, tidyGetRoot(tdoc));


The output of the function is:

TidyNode_Text: New Document
TidyNode_Text: document.write("&lt;STYLE&gt;");
TidyNode_Text: Hello!

Note that the angle brackets were converted to SGML entities. It's not clear
why; this seems like a bug to me.

Even if there is a reason for this, we would like to get the original text
undistorted, without escaping. Is there an option to do that?
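
As a stopgap while no such option exists, a caller could undo the escaping on the buffer returned by tidyNodeGetText(). The following is a minimal sketch under stated assumptions: unescape_basic_entities is a hypothetical helper, not part of the Tidy API, and it decodes only &lt;, &gt;, &amp; and &quot;. Numeric character references and other named entities are left untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Tidy API. Decodes the four entities
 * listed in the table in a single left-to-right pass. dst must hold at
 * least strlen(src) + 1 bytes. Returns the decoded length. */
static size_t unescape_basic_entities(const char *src, char *dst)
{
    static const struct { const char *name; char ch; } table[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' }, { "&quot;", '"' }
    };
    size_t n = 0;

    while (*src) {
        size_t i;
        int matched = 0;

        for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
            size_t len = strlen(table[i].name);
            if (strncmp(src, table[i].name, len) == 0) {
                dst[n++] = table[i].ch;  /* replace entity by its character */
                src += len;
                matched = 1;
                break;
            }
        }
        if (!matched)
            dst[n++] = *src++;           /* copy any other byte unchanged */
    }
    dst[n] = '\0';
    return n;
}
```

Because the input is scanned once and decoded output is never rescanned, a sequence such as `&amp;lt;` decodes to the literal text `&lt;` rather than to `<`.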

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2008-02-26 19:20

Message:
Logged In: YES
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 30 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Arnaud Desitter (arnaud02)
Date: 2008-01-27 02:36

Message:
Logged In: YES
user_id=566665
Originator: NO

See http://tidy.sf.net/issue/1877642.

----------------------------------------------------------------------

Comment By: John Snelson (jpcs)
Date: 2008-01-22 12:58

Message:
Logged In: YES
user_id=1041934
Originator: NO

I have submitted a patch that addresses this bug here:

http://sourceforge.net/tracker/index.php?func=detail&aid=1877642&group_id=27659&atid=390965

My fix introduces a new function, tidyNodeGetValue(), that does not
serialize the node.
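
The difference between serializing a node and returning its value can be illustrated without libtidy. In the sketch below, ToyTextNode, toy_get_text and toy_get_value are hypothetical stand-ins invented for illustration. They are not the Tidy implementation and say nothing about how the patch itself is written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed text node; not a Tidy type. */
typedef struct {
    const char *raw;   /* text exactly as it appeared in the input */
} ToyTextNode;

/* "Serialize" the node: markup-significant characters are escaped, which
 * mirrors the behaviour reported for tidyNodeGetText().
 * Returns 1 on success, 0 if dst (capacity cap) is too small. */
static int toy_get_text(const ToyTextNode *node, char *dst, size_t cap)
{
    size_t n = 0;
    const char *p;

    if (cap == 0)
        return 0;
    for (p = node->raw; *p; ++p) {
        const char *rep = NULL;
        size_t len;

        if (*p == '<')      rep = "&lt;";
        else if (*p == '>') rep = "&gt;";
        else if (*p == '&') rep = "&amp;";

        len = rep ? strlen(rep) : 1;
        if (n + len + 1 > cap)      /* keep room for the terminator */
            return 0;
        if (rep)
            memcpy(dst + n, rep, len);
        else
            dst[n] = *p;
        n += len;
    }
    dst[n] = '\0';
    return 1;
}

/* Return the node's value: the raw text is copied without any escaping.
 * Returns 1 on success, 0 if dst (capacity cap) is too small. */
static int toy_get_value(const ToyTextNode *node, char *dst, size_t cap)
{
    size_t len = strlen(node->raw);

    if (len + 1 > cap)
        return 0;
    memcpy(dst, node->raw, len + 1);
    return 1;
}
```

Keeping the two accessors separate leaves the serializing function's behaviour unchanged for existing callers while giving tree-walking code access to the unescaped text.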

----------------------------------------------------------------------

Comment By: Björn Höhrmann (hoehrmann)
Date: 2007-01-16 15:02

Message:
Logged In: YES
user_id=188003
Originator: NO

tidyNodeGetText is supposed to partially serialize a node; it is not meant
to get an element's text content. I think it would be incorrect for it not
to escape special characters in text. We lack a function to get the
content of text nodes; see the Jan 2003 thread "How to access lexbuf?" on
tidy-develop. I think the addition of such a function would address the
requestor's problem.

If tidyNodeGetText is to be improved independently of the addition of such a
function, it should continue to escape normal text nodes; whether CDATA
element content, as for <script> and <style>, is escaped should depend on
whether XML/XHTML output is requested; and the content of comments and PIs
should probably never be escaped as if it were text.
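
The rule proposed above can be written down as a small decision function. This is only one reading of the suggestion, assuming that CDATA element content is escaped when XML/XHTML output is requested and left alone otherwise. ContentKind and should_escape are hypothetical names for this sketch and do not correspond to Tidy's TidyNodeType values or to any Tidy function.

```c
#include <stdbool.h>

/* Hypothetical node categories for this sketch; not Tidy's TidyNodeType. */
typedef enum {
    KIND_TEXT,           /* ordinary text node */
    KIND_CDATA_CONTENT,  /* content of <script> or <style> */
    KIND_COMMENT,        /* comment content */
    KIND_PROC_INS        /* processing instruction content */
} ContentKind;

/* Decide whether serialized content of the given kind should have its
 * special characters escaped. xml_output is true when XML/XHTML output
 * is requested. */
static bool should_escape(ContentKind kind, bool xml_output)
{
    switch (kind) {
    case KIND_TEXT:
        return true;         /* normal text is always escaped */
    case KIND_CDATA_CONTENT:
        return xml_output;   /* depends on the requested output format */
    case KIND_COMMENT:
    case KIND_PROC_INS:
        return false;        /* never escaped as if it were text */
    }
    return true;             /* unknown kinds: fall back to escaping */
}
```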

----------------------------------------------------------------------

Comment By: Arnaud Desitter (arnaud02)
Date: 2007-01-16 14:35

Message:
Logged In: YES
user_id=566665
Originator: NO

Bjoern,
Could you comment on this patch?
Thanks,

----------------------------------------------------------------------

Comment By: ayermakov (ayermakov)
Date: 2007-01-16 05:58

Message:
Logged In: YES
user_id=1688233
Originator: YES

Well, it seems bug entry 1166491 describes the same issue. However, I
have a slightly different opinion on how it should be resolved. I believe that
tidyNodeGetText should output the text of any node 'as-is', without any
processing (escaping). This applies not only to script and style
nodes but also to regular HTML text.
Or at least it should be controllable through an option.
File Added: 1636028.diff

----------------------------------------------------------------------

Comment By: Arnaud Desitter (arnaud02)
Date: 2007-01-16 00:44

Message:
Logged In: YES
user_id=566665
Originator: NO

See http://tidy.sf.net/issue/1166491 which contains a patch that may be
correct.
It would be nice if you could provide a patch with a rationale so this
issue could be nailed down.

----------------------------------------------------------------------
