[ tidy-Support Requests-691011 ] Cannot tidy Word HTML (if you can call it that)
Support Requests item #691011, was opened at 2003-02-21 18:51
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390964&aid=691011&group_id=27659

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Usage Problem
Group: Current - all platforms
Status: Open
Resolution: None
Priority: 5
Submitted By: Ben Noblet (bnoblet)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot tidy Word HTML (if you can call it that)

Initial Comment:
Thanks guys for a great job ... TidyLib is great and it usually does exactly what we need it to.

Unfortunately it seems MS-Word is producing even more hokey HTML than ever before. I have a document pasted from Word which contains <?xml> directives and smarttags (<st1:xx> etc) and all manner of other rubbish. Very little or none of the text seems to end up in the tidied document with any combination of options I can configure.

I use TidyATL (COM) but have been testing this using the TidyUI as well.

Am I doing something wrong? Is this document just too bad to be tidied?

Thanks
Ben

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-08-10 03:59

Message:
Logged In: NO

From Slashdot:

"Yes, Office 2000 has the above tool- and Office 2002 or 2003 has it on the Save As menu. The option you want is "Web Page (filtered)|*.html". I saw an interview once with somebody on the Word development team, and he claimed that the original Save As HTML was built for passing Word Documents over the web- and never meant to be read by human beings as a web page at all. Web Page (filtered) cuts out all the extra shyte that Save As HTML used to put in for managing version controlled updates and changing the font every bloody character- and builds a real web page."

http://ask.slashdot.org/askslashdot/05/08/09/170209.shtml?tid=215&tid=95&tid=4

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-05-12 03:36

Message:
Logged In: NO

I believe this is related to Tidy's lack of namespace support:
http://sourceforge.net/tracker/index.php?func=detail&aid=743952&group_id=27659&atid=390966

The dodgywordhtml.htm included is just a table, and is missing the namespace declaration which you can see on this bug report:
http://lists.w3.org/Archives/Public/html-tidy/2003OctDec/0053.html

Hope that helps,
- p

----------------------------------------------------------------------

Comment By: Nick Morley (nmorley)
Date: 2003-03-11 08:44

Message:
Logged In: YES
user_id=731422

Ben,

I am having exactly the same issue as you. It appears that Word XP, as opposed to the older 2000 version, is the culprit. It adds loads more tags that HTML Tidy falls over on: the <st1:xx> ones you mention, as well as <o:p> and loads more.

I am trying to output my file as XHTML and I have tried every combination of options to achieve this, to no avail.

Anyone any wiser?

----------------------------------------------------------------------

Comment By: Terry Teague (terry_teague)
Date: 2003-02-21 19:58

Message:
Logged In: YES
user_id=225318

What are the options you are trying? Do you want HTML output, XHTML output or XML output? Do you have any control over how the MS Word document is converted to HTML?

I tried a few things on your sample input that would produce output (such as using "--force-output yes" or declaring new tags that matched the smarttags), but I couldn't get rid of the <?xml> directive(s).

I'm not the MS-Word HTML guru. Maybe someone else will answer.
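
For the leftover <?xml> directives and the namespaced tags (<st1:xx>, <o:p>) discussed in this thread, one possible workaround is to pre-process the Word HTML before handing it to Tidy. The following is only a minimal sketch of that idea in Python, not something anyone in the thread reported using and not part of Tidy itself. The function name `strip_word_markup` and both regular expressions are hypothetical, and a regex pass like this assumes attribute values never contain a literal ">".

```python
import re

# Matches "<? ... >" style directives such as Word's <?xml:namespace ... />.
XML_DIRECTIVE = re.compile(r"<\?.*?>", re.DOTALL)

# Matches opening, closing, or self-closing tags that carry a namespace
# prefix, e.g. <st1:place>, </o:p>, <o:p/>. Unprefixed tags like <p> are kept.
NAMESPACED_TAG = re.compile(r"</?[A-Za-z][\w.-]*:[\w.-]+(?:\s[^<>]*)?/?>")


def strip_word_markup(html: str) -> str:
    """Drop <?xml ...> directives and unwrap prefixed tags, keeping their text.

    Hypothetical helper for illustration only; it does not parse HTML and
    will mis-handle attribute values that contain '>'.
    """
    html = XML_DIRECTIVE.sub("", html)
    return NAMESPACED_TAG.sub("", html)


if __name__ == "__main__":
    sample = (
        '<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />'
        "<p>Visit <st1:place>Paris</st1:place><o:p></o:p></p>"
    )
    print(strip_word_markup(sample))  # prints: <p>Visit Paris</p>
```

The cleaned string could then be passed to Tidy with whatever output options are wanted (HTML, XHTML or XML). Unwrapping the prefixed tags rather than deleting their contents is a deliberate choice here, since the original complaint was that the document text went missing.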