Subject: [ tidy-Support Requests-691011 ] Cannot tidy Word HTML (if you can call it that)

Support Requests item #691011, was opened at 2003-02-21 18:51
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390964&aid=691011&group_id=27659

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Usage Problem
Group: Current - all platforms
Status: Open
Resolution: None
Priority: 5
Submitted By: Ben Noblet (bnoblet)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot tidy Word HTML (if you can call it that)

Initial Comment:
Thanks guys for a great job ... TidyLib is great and it
usually does exactly what we need it to.

Unfortunately it seems MS-Word is producing even
more hokey HTML than ever before.

I have a document pasted from Word which contains <?xml>
directives and smarttags (<st1:xx> etc) and all
manner of other rubbish.

Very little or none of the text seems to end up in the
tidied document with any combination of options I can
configure. I use TidyATL (COM) but have been testing
this using the TidyUI as well.

Am I doing something wrong? Is this document just too
bad to be tidied?

Thanks
Ben


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-08-10 03:59

Message:
Logged In: NO

From Slashdot:

"Yes, Office 2000 has the above tool- and Office 2002 or 2003
has it on the Save As menu. The option you want is "Web
Page (filtered)|*.html". I saw an interview once with somebody
on the Word development team, and he claimed that the
original Save As HTML was built for passing Word Documents
over the web- and never meant to be read by human beings as
a web page at all. Web Page (filtered) cuts out all the extra
shyte that Save As HTML used to put in for managing version
controlled updates and changing the font every bloody
character- and builds a real web page."

http://ask.slashdot.org/askslashdot/05/08/09/170209.shtml?tid=215&tid=95&tid=4

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-05-12 03:36

Message:
Logged In: NO

i believe this is related to tidy's lack of namespace support:

http://sourceforge.net/tracker/index.php?func=detail&aid=743952&group_id=27659&atid=390966

the dodgywordhtml.htm included is just a table, and is
missing the namespace declaration which you can see on this
bug report:

http://lists.w3.org/Archives/Public/html-tidy/2003OctDec/0053.html

hope that helps,

- p


----------------------------------------------------------------------

Comment By: Nick Morley (nmorley)
Date: 2003-03-11 08:44

Message:
Logged In: YES
user_id=731422

Ben,

I am having exactly the same issue as you. It appears that
Word XP, as opposed to the older 2000 version, is the culprit.
It adds loads more tags that HTML Tidy falls over on.

These include the <st1:xx> ones you mention, as well as <o:p> and
loads more.

I am trying to output my file as XHTML and I have tried every
combination of options to achieve this, to no avail.

Anyone any wiser?
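
One workaround for the tags described above is to strip them before the
document ever reaches Tidy. The following is only a rough sketch under
stated assumptions, not something Tidy or TidyATL provides: the function
name strip_word_markup and both regular expressions are hypothetical, and
regex-based cleanup can miss unusual markup. It removes <?xml ...>
directives and any tag with a namespace prefix (such as <st1:xx> or
<o:p>) while keeping the text between them.

```python
import re

# Word's XML directives, e.g. <?xml:namespace prefix = o ns = "..." />
XML_PI = re.compile(r"<\?xml[^>]*>", re.IGNORECASE)

# Opening, closing, or self-closing tags that carry a namespace prefix,
# e.g. <st1:place>, </st1:place>, <o:p/>. Plain HTML tags have no colon
# in the tag name and are left untouched.
NS_TAG = re.compile(r"</?[A-Za-z][\w.-]*:[\w.-]+(?:\s[^>]*)?/?>")


def strip_word_markup(html: str) -> str:
    """Drop <?xml> directives and namespaced tags, keeping their text content."""
    html = XML_PI.sub("", html)
    return NS_TAG.sub("", html)


if __name__ == "__main__":
    sample = '<p><?xml:namespace prefix = o /><st1:place>Paris</st1:place><o:p></o:p></p>'
    print(strip_word_markup(sample))
```

The cleaned string can then be handed to Tidy as usual. Note that this
discards whatever meaning the smarttags carried, which is usually
acceptable when only the visible text matters.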

----------------------------------------------------------------------

Comment By: Terry Teague (terry_teague)
Date: 2003-02-21 19:58

Message:
Logged In: YES
user_id=225318

What are the options you are trying? Do you want HTML output, XHTML
output or XML output? Do you have any control over how the MS Word
document is converted to HTML?

I tried a few things on your sample input, that would produce output
(such as using "--force-output yes" or declaring new tags that matched
the smarttags), but I couldn't get rid of the <?xml> directive(s).

I'm not the MS-Word HTML guru. Maybe someone else will answer.
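
For anyone wanting to repeat the experiment from a script, here is a
minimal sketch of the two approaches mentioned above (forcing output and
declaring the smarttags as new tags). It assumes a command-line "tidy"
executable on the PATH; the helper names build_tidy_args and run_tidy are
hypothetical, the tag names passed in are placeholders for whatever
appears in your document, and whether a given Tidy build accepts prefixed
names such as "st1:place" is an assumption worth checking.

```python
import subprocess


def build_tidy_args(smart_tags, xhtml=True):
    """Build a tidy command line that forces output and declares extra inline tags."""
    args = ["tidy", "-quiet", "--force-output", "yes"]
    if smart_tags:
        # Declare Word's custom tags so Tidy does not reject them as unknown.
        args += ["--new-inline-tags", ",".join(smart_tags)]
    if xhtml:
        args.append("-asxhtml")
    return args


def run_tidy(html, smart_tags):
    """Pipe html through tidy on stdin and return whatever it writes to stdout."""
    result = subprocess.run(
        build_tidy_args(smart_tags),
        input=html,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling run_tidy(html, ["st1:place", "o:p"]) would run the equivalent of
tidy -quiet --force-output yes --new-inline-tags st1:place,o:p -asxhtml.
As noted above, this does not remove the <?xml> directives.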

----------------------------------------------------------------------
