[ tidy-Bugs-1161797 ] --word-2000 always outputs numeric entities: msg#00063 (web.html-tidy.tracker)
Bugs item #1161797, was opened at 2005-03-12 04:04
Message generated for change (Comment added) made by hoehrmann
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390963&aid=1161797&group_id=27659

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Harriet Bazley (harriet)
Assigned to: Nobody/Anonymous (nobody)
Summary: --word-2000 always outputs numeric entities

Initial Comment:
Version reports itself as "HTML Tidy for RISC OS released on 1st December 2004" - no discernible version number....

The --word-2000 option seems to override the --numeric-entities option; even with an explicit "--numeric-entities no" in the command line, characters with the top bit set (specifically, the Windows 'smart' quotes present in just about every Microsoft Word document, which look dreadful in a non-Windows character set) are translated as … etc, rather than the relevant named entities.

This means I can *either* strip out the Word-generated rubbish *or* use named entities, but not both :-(

----------------------------------------------------------------------

>Comment By: Björn Höhrmann (hoehrmann)
Date: 2005-08-18 22:34

Message:
Logged In: YES
user_id=188003

Tidy actually uses numeric references here for portability. In particular, if the output is XHTML there is no other option: named entities are not allowed unless the document also has a DTD. The situation for HTML is basically the same, but browsers care less. I'm not sure there is anything we can do about this.
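
The XHTML point above can be checked with any XML parser. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as a stand-in for "an XML processor" (it is not something Tidy uses, and the helper name `parses_as_xml` is made up for illustration). With no DTD in the input, a named reference such as `&ldquo;` is rejected as an undefined entity, while the equivalent numeric reference parses.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed for an XML parser given no DTD."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Raised, for example, when a named entity has no declaration.
        return False


# Same paragraph, once with named and once with numeric character references.
NAMED = "<p>&ldquo;quoted&rdquo;</p>"
NUMERIC = "<p>&#8220;quoted&#8221;</p>"

if __name__ == "__main__":
    print("named references parse:  ", parses_as_xml(NAMED))
    print("numeric references parse:", parses_as_xml(NUMERIC))
```

Only the five predefined XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) are exempt from this; everything else needs a declaration from a DTD.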
----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-04-09 23:17

Message:
Logged In: NO

simpify

----------------------------------------------------------------------

Comment By: Harriet Bazley (harriet)
Date: 2005-03-13 23:34

Message:
Logged In: YES
user_id=208570

I had a nasty suspicion the behaviour was by design; however, since I need to use named entities for portability, this makes the translation rather less than useful :-(

----------------------------------------------------------------------

Comment By: Björn Höhrmann (hoehrmann)
Date: 2005-03-13 04:17

Message:
Logged In: YES
user_id=188003

Tidy does not output a document type declaration by default if there are proprietary elements and/or attributes in the document (and thus no document type declaration could be applicable to the document), so no doctype, and therefore no named entity references, is by design. Whether --doctype loose should behave differently I do not really know...

----------------------------------------------------------------------

Comment By: Harriet Bazley (harriet)
Date: 2005-03-13 03:40

Message:
Logged In: YES
user_id=208570

If I specify an explicit "--doctype loose" Tidy does output a document type declaration. However, it still uses numbered entities (see attached output).

----------------------------------------------------------------------

Comment By: Harriet Bazley (harriet)
Date: 2005-03-12 21:11

Message:
Logged In: YES
user_id=208570

In the nature of things such documents tend to be enormous; however, I've snipped one down to a single representative paragraph (plus screeds of MS-specific header) and attached it.

The output (from '*Tidy --word-2000 yes --numeric-entities no test/html') *doesn't* include a document type declaration, despite the fact that the first warning generated is "missing <!DOCTYPE> declaration". I've tried specifying a '--doctype auto' parameter, but this doesn't have any effect.
I'm surprised, since in my previous experience Tidy *does* insert a doctype where this is missing....

----------------------------------------------------------------------

Comment By: Björn Höhrmann (hoehrmann)
Date: 2005-03-12 05:56

Message:
Logged In: YES
user_id=188003

It would help if you attach a simple test case. HTML Tidy will only output named entity references if it outputs a document type declaration, as you'd otherwise get references to undefined entities which would confuse both XML and SGML processors. So, unless the output includes a document type declaration this is not a bug.

----------------------------------------------------------------------
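
Since the thread leaves the reporter without a Tidy option that produces named entities here, one possible workaround is to post-process Tidy's output. The following is only a sketch under stated assumptions: it is not a Tidy feature, the function name `numeric_to_named` is hypothetical, and it relies on Python's standard `html.entities.codepoint2name` table of HTML 4.01 entity names. As the comments above explain, the result is only safe when the document carries an HTML doctype that actually defines these entities.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#8220;) and hexadecimal (&#x201C;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def numeric_to_named(html_text: str) -> str:
    """Replace numeric character references with HTML 4.01 named entities.

    References with no HTML 4.01 name are left exactly as they were.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        return f"&{name};" if name else match.group(0)

    return _NUMERIC_REF.sub(replace, html_text)


if __name__ == "__main__":
    sample = "<p>&#8220;Smart&#8221; quotes and an apostrophe: it&#x2019;s</p>"
    print(numeric_to_named(sample))
```

This would be run over the file Tidy writes, after invoking Tidy with the --word-2000 and --doctype options discussed in the thread.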