Subject: high value unicode characters
Hello,

We're using the Xerces-C SAX2Print sample, version 2.5.0 (xerces-c_2_5_0-solaris_27-cc_62), and have run into a problem with a few "high value" Unicode characters (code points above U+FFFF, written in test1.xml as &#x1D5A2;, &#x1D5A7; and &#x1D4AB;). What we want to do is validate the file and convert it to UTF-8. SAX2Print completes without any error, but there appear to be some strange characters immediately after the high value Unicode characters (𝖢, 𝖧 and 𝒫) in the output.
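
For reference: U+1D5A2, U+1D5A7 and U+1D4AB lie outside the Basic Multilingual Plane. Xerces-C holds text as 16-bit XMLCh units (UTF-16), so internally each of these characters is a surrogate pair, while correct UTF-8 output is a single 4-byte sequence per character. The Python sketch below is not Xerces code, and its function names are made up for illustration. It prints both forms, plus the invalid 3-bytes-per-surrogate form that results if a surrogate value is written out to UTF-8 on its own.

```python
from typing import Tuple

# Hypothetical helper names; illustration only, not part of Xerces-C.

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%04X" % cp)
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def utf8_correct(cp: int) -> bytes:
    """Proper UTF-8: one 4-byte sequence for the whole code point."""
    return chr(cp).encode("utf-8")

def utf8_per_surrogate(cp: int) -> bytes:
    """Each surrogate encoded separately as a 3-byte sequence.
    This is NOT valid UTF-8; a conforming decoder must reject it."""
    hi, lo = to_surrogates(cp)
    # "surrogatepass" lets Python emit the (invalid) bytes for lone surrogates.
    return (chr(hi) + chr(lo)).encode("utf-8", "surrogatepass")

if __name__ == "__main__":
    # The three code points referenced in test1.xml.
    for cp in (0x1D5A2, 0x1D5A7, 0x1D4AB):
        hi, lo = to_surrogates(cp)
        print("U+%05X  UTF-16: %04X %04X  UTF-8: %s  per-surrogate: %s" % (
            cp, hi, lo,
            utf8_correct(cp).hex(" "),
            utf8_per_surrogate(cp).hex(" ")))
```

For U+1D5A2 the surrogate pair is D835 DDA2, the correct UTF-8 bytes are F0 9D 96 A2, and encoding each surrogate separately gives ED A0 B5 ED B6 A2.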

The command is:

    # SAX2Print -v=always -x=UTF-8 test1.xml

Running SAX2Print again on the output XML file gives this error:

    Fatal Error at file test1-out.xml, line 5, char 35
      Message: Got an unexpected trailing surrogate character
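
One way to see what is actually at line 5, char 35 of test1-out.xml is to scan the raw bytes for surrogate code points encoded as 3-byte sequences (ED followed by A0 to BF), which RFC 3629 does not allow in UTF-8. This is a diagnostic sketch in Python, not part of Xerces; it assumes the output file is meant to be UTF-8, the function name is hypothetical, and it reports byte columns, whereas the Xerces message counts characters.

```python
import re
import sys
from typing import List, Tuple

# U+D800..U+DFFF written UTF-8-style would be ED A0..BF 80..BF.
# Valid UTF-8 never contains these sequences (RFC 3629).
_ENCODED_SURROGATE = re.compile(rb"\xed[\xa0-\xbf][\x80-\xbf]")

def find_encoded_surrogates(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line, byte_column, code_point) for every encoded surrogate.
    Lines and columns are 1-based; the column counts bytes, not characters."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for m in _ENCODED_SURROGATE.finditer(line):
            b0, b1, b2 = m.group(0)
            # Decode the 3-byte pattern back to the 16-bit surrogate value.
            cp = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
            hits.append((lineno, m.start() + 1, cp))
    return hits

if __name__ == "__main__":
    # Hypothetical usage: python find_surrogates.py test1-out.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for line, col, cp in find_encoded_surrogates(f.read()):
                print("%s: line %d, byte %d: encoded surrogate U+%04X"
                      % (path, line, col, cp))
```

A correctly encoded 𝖢 (F0 9D 96 A2) produces no hits; any hit points at output that a UTF-8 parser will refuse.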


Any idea what is going wrong here?

Thanks in advance,
josh


=========================
<?xml version="1.0"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>
        <testPara>
<head>1. high value Unicode characters and some punctuation as entities</head> <p>Assuming &#x1D5A2;&#x1D5A7;, Hindman [ht1] showed that the existence of certain ultrafilters on the power set of the natural numbers is equivalent to Hindman&#x2019;s Theorem. Adapting this work to a countable setting formalized in RCA<sub>0</sub>, this article proves the equivalence of the existence of certain ultrafilters on countable Boolean algebras and an iterated form of Hindman&#x2019;s Theorem, which is closely related to Milliken&#x2019;s Theorem.</p>
        </testPara>
        <testPara>
<head>2. high value Unicode char and some Greek as entities</head> <p>This article is a continuation of our search for tautologies that are hard even for strong propositional proof systems like EF, cf. [Kra-wphp,Kra-tau]. The particular tautologies we study, the &#x03C4;-formulas, are obtained from any &#x1D4AB;/poly map g; they express that a string is outside of the range of g. Maps g considered here are particular pseudorandom generators. The ultimate goal is to deduce the hardness of the &#x03C4;-formulas for at least EF from some general, plausible computational hardness hypothesis.</p>
        </testPara>
</test>
=========================
<!ELEMENT test (testPara+) >
<!ELEMENT testPara (head, p) >
<!ELEMENT head (#PCDATA) >
<!ELEMENT p (#PCDATA | b | i | sub)* >
<!ELEMENT b (#PCDATA) >
<!ELEMENT i (#PCDATA) >
<!ELEMENT sub (#PCDATA) >
=========================