Diwaker Gupta wrote:
> Another issue with the line breaks is that <content> is not able to
> treat markup properly if they come after a new line. For instance, the
> above is displayed on getnote as:
> Note ID:1 (Sat Oct 23 2004)
> Key: dijkstra-the
> The structure of the THE multiprogramming system
> \n [followed by the rest of the content if any]
> It just eats the markup and shows spaces instead.
The reason refdb is "eating" your markup is that your markup is
invalid. Allow me to explain using a stripped-down version of one of
your examples:
.................................................................
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xnoteset
PUBLIC "-//Markus Hoenicka//DTD Xnote V1.1//EN"
"http://refdb.sourceforge.net/dtd/xnote-1.1/xnote.dtd">
<xnoteset>
<xnote>
<title>The structure of the THE multiprogramming system</title>
<content>
<pre>
Q. What problem does the paper address?
A. The design and implementation of a multiprogramming system with an
emphasis on provable correctness, and a hierarchical architecture
which allows for systematic testing.
</pre>
</content>
</xnote>
</xnoteset>
.................................................................
If you attempt to validate this document against the xnote DTD it will fail, because
the DTD does not allow the 'content' element to contain a 'pre' child element.
The above example gives the following xhtml output:
.................................................................
$ refdbc -C getnote -t xhtml ":NID:>0"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type"
content="text/html; charset=UTF-8"/>
<title>refdb reference list</title><meta name="generator" content="refdb 0.9.5-pre5"
id="generator" />
<link rel="stylesheet" type="text/css" href="/usr/local/share/refdb/css/refdb.css" />
</head>
<body>
<h1 class='h1'>refdb reference list</h1>
<div class="record">
<h2 class='ID'>Note ID:1</h2>
<p class='date'>Date: Sun Oct 24 2004</p>
<p class='citekey'>Key: david2004</p>
<p class='title'>The structure of the THE multiprogramming system</p>
<p class='note'>\n \n </p>
</div>
</body>
</html>
1 note(s) retrieved
$
.................................................................
[For the following examples I will show only the content element from the source and the
'<p class="note">' fragment from the output.]
To demonstrate that it is the 'pre' element causing the problem, here is the
same example without it:
Source:
.................................................................
<content>
Q. What problem does the paper address?
A. The design and implementation of a multiprogramming system with an
emphasis on provable correctness, and a hierarchical architecture
which allows for systematic testing.
</content>
.................................................................
xhtml output:
.................................................................
<p class='note'>\n Q. What problem does the paper
address?\n A. The design and implementation of a multiprogramming system with
an\n emphasis on provable correctness, and a hierarchical
architecture\n which allows for systematic testing.\n
</p>
.................................................................
There are two things of interest to note. First, the spaces are still there. This is
because they are present in the content element. This is a very important point to
understand. Anything between '<content>' and '</content>' is part of the
content element's content. This includes the spaces, which are faithfully rendered just as
they appeared in the input, and _this_is_the_correct_behaviour_. If you do not want the
spaces, remove them from the original by starting each content line flush against the
left margin.
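To see that the whitespace really is part of the element's content, here is a minimal
sketch using Python's standard xml.etree.ElementTree module. It is not refdb's own parser,
so treat it only as an illustration of how an XML parser reports an element's text; the
two strings are cut-down, made-up variants of the note above:

```python
import xml.etree.ElementTree as ET

# The same one-line note, once indented and once flush against the left margin.
indented = "<content>\n    Q. What problem does the paper address?\n</content>"
flush = "<content>\nQ. What problem does the paper address?\n</content>"

# .text is the character data between <content> and its end tag (or first child).
indented_text = ET.fromstring(indented).text
flush_text = ET.fromstring(flush).text

# repr() makes the newlines and leading spaces visible.
print(repr(indented_text))  # '\n    Q. What problem does the paper address?\n'
print(repr(flush_text))     # '\nQ. What problem does the paper address?\n'
```

The parser keeps the indentation exactly as written; only removing it from the source gets
rid of it.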
Second, all lines of the content are present in the output. From this we
know that the 'pre' element tags caused the loss of data shown in the first
example above. Since you have supplied refdb with invalid XML, the results
are unpredictable; in this case refdb's XML parser fails silently. One
can argue that refdb should give you an error message, or at least a
warning that content has been truncated, but you have, nonetheless, supplied an
invalid input file.
As I understand it, the content of an element is interpreted as either character
data (i.e., the element's value) or markup (i.e., child elements).
When the file is parsed, all markup is interpreted as markup at that time. What
you appear to want is for certain markup to be ignored when the file is parsed for
input to the refdb database, so that it can be passed through as 'raw' xhtml in the output.
I do not think this is a reasonable expectation.
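The same ElementTree module shows what "interpreted as markup" means in practice. In this
minimal sketch (again Python, not refdb's parser) the 'pre' tags become a child element,
and the content element keeps only the whitespace in front of it. Whether refdb reads only
the content element's own text is an assumption on my part, but it would be consistent
with the whitespace-only output in the first example:

```python
import xml.etree.ElementTree as ET

# Cut-down version of the first example: a 'pre' element inside 'content'.
source = """<content>
<pre>
Q. What problem does the paper address?
</pre>
</content>"""

# ElementTree checks well-formedness only; it does not validate against a DTD,
# so it accepts this fragment without complaint.
content = ET.fromstring(source)

# The parser has turned <pre> into a child element, not into text.
child_tags = [child.tag for child in content]
# Only the newline before <pre> is left as content's own character data.
own_text = content.text
# The question itself now belongs to the 'pre' child.
pre_text = content.find("pre").text

print(child_tags)       # ['pre']
print(repr(own_text))   # '\n'
print(repr(pre_text))   # '\nQ. What problem does the paper address?\n'
```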
I also do not think it is possible. You might think of using a CDATA section (see
<http://www.w3.org/TR/2000/REC-xml-20001006#sec-cdata-sect>). Unfortunately, everything
inside a CDATA section is character data rather than markup, so any tags in it are preserved
_for_display_. As a result, left and right angle brackets are converted to entities in
xhtml output. For example, the following content:
.................................................................
<content><![CDATA[<pre>
Q. What problem does the paper address?
A. The design and implementation of a multiprogramming system with an
emphasis on provable correctness, and a hierarchical architecture
which allows for systematic testing.
</pre>]]></content>
.................................................................
is rendered into xhtml output as:
.................................................................
<p class='note'>&lt;pre&gt;\n Q. What problem does the paper address?\n
A. The design and implementation of a multiprogramming system with an\n emphasis on provable
correctness, and a hierarchical architecture\n which allows for systematic testing.\n&lt;/pre&gt;</p>
.................................................................
The opening and closing 'pre' tags have been converted into entities, i.e., '<pre>' has
been converted into '&lt;pre&gt;'. Note that this is not peculiar to refdb. If you
create a standard DocBook document with a CDATA section containing markup and use xsltproc to
generate xhtml output, you will find that it converts the brackets to entities so as to display
the tags as character data instead of treating them as markup.
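The same behaviour is easy to reproduce with a general-purpose XML library. This minimal
sketch uses Python's standard ElementTree module, which has nothing to do with refdb or
xsltproc: the CDATA section yields plain text, and serializing the element turns the angle
brackets into '&lt;' and '&gt;':

```python
import xml.etree.ElementTree as ET

# The CDATA variant from above, cut down to one line of text.
source = ("<content><![CDATA[<pre>\n"
          "Q. What problem does the paper address?\n"
          "</pre>]]></content>")

content = ET.fromstring(source)

# Inside a CDATA section '<pre>' is plain character data: no child element is created.
print(len(content))         # 0
print(repr(content.text))   # '<pre>\nQ. What problem does the paper address?\n</pre>'

# Writing the element back out escapes the brackets so they stay text.
serialized = ET.tostring(content, encoding="unicode")
print(serialized)
```

Any serializer that respects the XML rules has to do this, because an unescaped '<' in
character data would start a new tag.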
Others may know of a way to preserve markup _as_markup_ within character data,
but I do not.
Regards,
David.