I’m new to XML parsing and am using ElementTree to parse
a very simple XML file similar to the one below:
<groceries>
  <category name="fruit">
    <item name="apple" number="8"/>
    <item name="banana" number="12"/>
  </category>
  <category name="frozen">
    <item name="icecream" flavor="chocolate"
          number="1"/>
    <item name="pizza" make="tombstone"
          type="cheese" number="3"/>
  </category>
</groceries>
Basically I am just trying to parse out the values
and do something with them in my program. I’d like to define what the “correct”
syntax for this file is and have it checked automatically. For instance, in
this simple example I’d like to check that only a single type of
subelement is allowed under <groceries>, namely <category>, and
that only a single type of subelement is allowed under <category>,
namely <item>. Further, I’d like to check that <item> can
have the attributes ‘name’ and ‘number’, but no
others. Etc, etc.
Right now I am using ElementTree to check this
within my program, i.e., I traverse the tree and make sure the XML meets the
specifications. Based on my reading, it seems that I can use a DTD or schema
to define the “grammar” of my XML, but I can’t figure out how
to actually set this up. It seems like I should just be able to specify my DTD
or schema and then call ElementTree’s parse method, and ElementTree should
just give me errors if the XML doesn’t conform to the rules. Maybe
ElementTree just doesn’t support this? I see some docs that say lxml
supports schemas; will lxml do this for me?
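For reference, here is a stripped-down sketch of the kind of hand-written check I mean. The names find_errors, ALLOWED_CHILDREN and ALLOWED_ATTRS are made up for this post, and I am assuming that <category> may only carry a 'name' attribute. It is not my real code, just the general shape of it:

```python
import xml.etree.ElementTree as ET

# Hypothetical rule tables for the grocery example.
# Which child tags each element may contain:
ALLOWED_CHILDREN = {
    "groceries": {"category"},
    "category": {"item"},
    "item": set(),
}
# Which attributes each element may carry
# (the rule for <category> is an assumption):
ALLOWED_ATTRS = {
    "groceries": set(),
    "category": {"name"},
    "item": {"name", "number"},
}


def find_errors(root):
    """Walk the tree and return one message per rule violation."""
    errors = []
    if root.tag != "groceries":
        return [f"unexpected root element <{root.tag}>"]
    # iter() visits the root and every descendant in document order.
    for elem in root.iter():
        extra = set(elem.attrib) - ALLOWED_ATTRS.get(elem.tag, set())
        for attr in sorted(extra):
            errors.append(f"<{elem.tag}> has unexpected attribute '{attr}'")
        for child in elem:
            if child.tag not in ALLOWED_CHILDREN.get(elem.tag, set()):
                errors.append(f"<{child.tag}> is not allowed under <{elem.tag}>")
    return errors


SAMPLE = """
<groceries>
  <category name="fruit">
    <item name="apple" number="8"/>
    <item name="banana" number="12"/>
  </category>
  <category name="frozen">
    <item name="icecream" flavor="chocolate" number="1"/>
    <item name="pizza" make="tombstone" type="cheese" number="3"/>
  </category>
</groceries>
"""

if __name__ == "__main__":
    for message in find_errors(ET.fromstring(SAMPLE)):
        print(message)
```

Run against the sample file above, this flags the 'flavor', 'make' and 'type' attributes on <item>, since only 'name' and 'number' are allowed. It works, but every new rule means more hand-written code, which is why I would rather describe the rules once in a DTD or schema.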
What are other folks doing to deal with this? It
seems like such an obvious need, yet when I google I don’t find anything
pointing me to a simple solution.
Thanks!
Margie