Re: xnote woes: msg#00067

Subject: Re: xnote woes
Diwaker Gupta wrote:

> Another issue with the line breaks is that <content> is not able to
> treat markup properly if they come after a new line. For instance, the
> above is displayed on getnote as:
> Note ID:1 (Sat Oct 23 2004)
> Key: dijkstra-the
> The structure of the THE multiprogramming system
>           \n [followed by the rest of the content if any]
>
> It just eats the markup and shows spaces instead.

The reason refdb is "eating" your markup is that your markup is invalid. Allow me to explain using a stripped-down version of one of your examples:

.................................................................

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xnoteset PUBLIC "-//Markus Hoenicka//DTD Xnote V1.1//EN" "http://refdb.sourceforge.net/dtd/xnote-1.1/xnote.dtd">
<xnoteset>
   <xnote>
       <title>The structure of the THE multiprogramming system</title>
       <content>
           <pre>
               Q. What problem does the paper address?
               A. The design and implementation of a multiprogramming system with an
               emphasis on provable correctness, and a hierarchical architecture
               which allows for systematic testing.
           </pre>
       </content>
   </xnote>
</xnoteset>
.................................................................

If you attempt to validate this document against the xnote DTD, validation will 
fail because the DTD does not allow a 'pre' child element inside the 'content' 
element.

The above example gives the following xhtml output:
.................................................................
$ refdbc -C getnote -t xhtml ":NID:>0"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>refdb reference list</title><meta name="generator" content="refdb 0.9.5-pre5" id="generator" />
<link rel="stylesheet" type="text/css" href="/usr/local/share/refdb/css/refdb.css" />
</head>
<body>
<h1 class='h1'>refdb reference list</h1>
<div class="record">
<h2 class='ID'>Note ID:1</h2>
<p class='date'>Date: Sun Oct 24 2004</p>
<p class='citekey'>Key: david2004</p>
<p class='title'>The structure of the THE multiprogramming system</p>
<p class='note'>\n                      \n        </p>
</div>
</body>
</html>
1 note(s) retrieved
$
.................................................................

[For the following examples I will show only the content element from the source 
and the '<p class="note">' fragment from the output.]

To demonstrate that it is the 'pre' element causing the problem, here is the 
same example without it:

Source:
.................................................................
<content>
        Q. What problem does the paper address?
        A. The design and implementation of a multiprogramming system with an
        emphasis on provable correctness, and a hierarchical architecture
        which allows for systematic testing.
</content>
.................................................................

xhtml output:
.................................................................
<p class='note'>\n                              Q. What problem does the paper address?\n            A. The design and implementation of a multiprogramming system with an\n                          emphasis on provable correctness, and a hierarchical architecture\n                           which allows for systematic testing.\n        </p>
.................................................................

There are two things of interest to note.  First, the spaces are still there.  This is 
because they are present in the content element.  This is a very important point to 
understand.  Anything between '<content>' and '</content>' is part of the 
content element's content.  This includes the spaces which are faithfully rendered just as 
they appeared in the input and _this_is_the_correct_behaviour_.  If you do not want the 
spaces, remove them from the original by starting your content lines flush left against the 
left margin.
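
As a rough illustration of "starting your content lines flush left", here is a minimal sketch in Python (my choice of language; refdb does not require it).  The names INDENTED and flush_left are hypothetical, made up for this example.  It strips the leading whitespace from each line of the text you intend to put between '<content>' and '</content>':

```python
# Hypothetical example text: the note body as it was indented in the source file.
INDENTED = """
        Q. What problem does the paper address?
        A. The design and implementation of a multiprogramming system with an
        emphasis on provable correctness, and a hierarchical architecture
        which allows for systematic testing.
"""


def flush_left(text):
    """Strip leading whitespace from every line and drop blank edge lines."""
    return "\n".join(line.lstrip() for line in text.strip().splitlines())


flat = flush_left(INDENTED)
print(flat)
```

The line breaks themselves are kept; only the indentation in front of each line is removed.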

Secondly, all lines of the content are present in the output.  From this we 
know that the 'pre' element tags caused the loss of data shown in the first 
example above.  Since you have supplied refdb with invalid xml, the results 
will be unpredictable.  In this case refdb's xml parser fails silently.  One 
can argue refdb should be providing you with an error message, or at least a 
warning that content has been truncated, but you have, nonetheless, supplied an 
invalid input file.
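
Since refdb gives no warning here, one could check a file before importing it.  The following is only a sketch under assumptions: it uses Python's ElementTree, which does not validate against the DTD, so it merely looks for child elements inside 'content' (which, per the failure described above, the xnote DTD does not allow).  NOTE and child_elements_in_content are hypothetical names, and the DOCTYPE is left out of the sample on purpose:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the note discussed above (DOCTYPE omitted,
# because ElementTree does not perform DTD validation anyway).
NOTE = """<xnoteset>
  <xnote>
    <title>The structure of the THE multiprogramming system</title>
    <content>
      <pre>Q. What problem does the paper address?</pre>
    </content>
  </xnote>
</xnoteset>"""


def child_elements_in_content(xml_text):
    """Return the tag names of any elements nested directly inside <content>."""
    root = ET.fromstring(xml_text)
    found = []
    for content in root.iter("content"):
        found.extend(child.tag for child in content)
    return found


offenders = child_elements_in_content(NOTE)
print(offenders)
```

An empty list means no markup was found inside any 'content' element.  This is not a substitute for validating against the xnote DTD with a validating parser.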

As I understand it, the content of an element is assumed to be either character 
data (i.e., the element's value or content) or markup (i.e., child elements).  
When the file is parsed all markup is interpreted at that time as markup.  What 
you appear to want is for certain markup to be ignored when it is parsed for 
input to the refdb database, so it can be output as 'raw' xhtml in the output.  
I do not think this is a reasonable expectation.
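
To make the character data versus markup distinction concrete, here is a small sketch using Python's ElementTree.  This is not refdb's parser, and I am not claiming refdb works this way internally; it only shows how a typical XML parser splits the content.  Once 'pre' is parsed as a child element, the only character data belonging directly to 'content' is the whitespace before and after the child, which is at least consistent with the whitespace-only output shown earlier:

```python
import xml.etree.ElementTree as ET

content = ET.fromstring(
    "<content>\n"
    "    <pre>\n"
    "        Q. What problem does the paper address?\n"
    "    </pre>\n"
    "</content>"
)

pre = content.find("pre")

direct_text = content.text   # character data before the first child element
child_text = pre.text        # character data that belongs to <pre>, not <content>
trailing_text = pre.tail     # character data after </pre>, before </content>

print(repr(direct_text))
print(repr(child_text))
print(repr(trailing_text))
```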

I also do not think it is possible.  You might think to use a CDATA section (see 
<http://www.w3.org/TR/2000/REC-xml-20001006#sec-cdata-sect>).  Unfortunately, markup 
within a CDATA section is treated as character data and preserved _for_display_.  As a 
result, left and right angle brackets are converted to entities in xhtml output.  For 
example, the following content:
.................................................................
                <content><![CDATA[<pre>
        Q. What problem does the paper address?
        A. The design and implementation of a multiprogramming system with an
        emphasis on provable correctness, and a hierarchical architecture
        which allows for systematic testing.
</pre>]]></content>
.................................................................

is rendered into xhtml output as:
.................................................................
<p class='note'>&lt;pre&gt;\n   Q. What problem does the paper address?\n       A. The design and implementation of a multiprogramming system with an\n       emphasis on provable correctness, and a hierarchical architecture\n   which allows for systematic testing.\n&lt;/pre&gt;</p>
.................................................................

The opening and closing 'pre' tags have been converted into entities, i.e., '<pre>' has 
been converted into '&lt;pre&gt;'.  Note that this is not peculiar to refdb.  If you 
create a standard docbook document with a cdata block containing markup and use xsltproc to 
generate xhtml output you will find it converts the brackets to entities so as to display the 
tags as character data instead of treating them as markup.
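
The same behaviour can be reproduced outside refdb and xsltproc.  The sketch below uses Python's ElementTree purely as an illustration (the variable names are mine): after parsing, the 'pre' tags inside the CDATA section are plain characters in the element's text, and writing that text back out as XML escapes the brackets.

```python
import xml.etree.ElementTree as ET

source = (
    "<content><![CDATA[<pre>\n"
    "Q. What problem does the paper address?\n"
    "</pre>]]></content>"
)
content = ET.fromstring(source)

# The tags survive only as characters; <content> has no child elements.
text = content.text

# Put that text into an output paragraph, roughly as an xhtml writer would.
p = ET.Element("p", {"class": "note"})
p.text = text
rendered = ET.tostring(p, encoding="unicode")
print(rendered)
```

The serializer has to escape the brackets; otherwise the character data would turn back into markup on the next parse.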

Others may know of a way to preserve markup _as_markup_ within character data, 
but I do not.

Regards,
David.


