Re: OSM Schema Design: msg#00088

Subject: Re: OSM Schema Design
Immanuel Scholz wrote:
Hi,

I have seen that. It's a good start and it's good to be able to validate
the XML data generated by the server with the XML Schema.

Currently there does not exist an XML Schema in form of a schema file.
I referred to this schema file on the svn server:
        http://www.openstreetmap.org/svn/schema/osm.xsd

I guess the above osm.xsd schema was validated against the XML examples
that you refer to below as the 'verbal XML schema'.

Sadly this schema format is hardly readable for humans... :-)

Every time I say "XML Schema" I mean the roughly verbal definition on the
wiki.

You refer to these pages?
        http://www.openstreetmap.org/wiki/index.php/XML_Schema
        http://www.openstreetmap.org/wiki/index.php/Talk:XML_Schema

This explains a lot.

Since we already have performance problems within the XML code of the
server, I would strongly prefer not to add an automatic XML validation
until some profiling measurement is set up to show that this does not
hurt performance as well (I don't trust Ruby's XML code anymore ;-)

I guess the XML Schema was created to model the XML structure and
to validate some XML samples generated by the server in order to ensure
it delivers valid XML.

I am uncertain whether good validating parsers would be faster or slower than non-validating ones.

It is my understanding that in the medium term OSM street map data needs to be
structured differently than today, as there are many things that have
not been covered yet.

Maybe, maybe not. I believe the current data structure is powerful enough
to handle all needs an open STREET map could have.

I think the central point for everything should be a node. Data should
be structured in such a manner that the computer understands what
the objects are and can present them in an appropriate manner for the
current view ('usage scenario').

The way the data is currently structured, it is kept simple, which is good.
The current data structure seems to support drawing a map, while other usage scenarios such as navigation with text or voice output will require enhanced data structures. For example, here the computer must really be able to distinguish a roundabout from a plain circular street, as a message like 'leave the roundabout at the second exit' is expected.

rivers,
Property "class=river" on a street could be a way to do this.
Looks quite weird to me. A property on a track would be better, and street
should be a property of a track.

lakes,
Property "class=lake" on an area.
Looks better.
But still there may be islands inside ;-)
Maybe areas will need to have other areas inside.
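
As a rough sketch of "areas inside areas" (Python, purely illustrative): the class name Area and its fields outer, inner and properties are hypothetical and not part of any existing OSM schema, and coordinates are assumed to be plain planar (x, y) pairs. A point counts as belonging to the lake only if it lies inside the outer ring and not inside any nested area such as an island.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def inside_ring(p: Point, ring: List[Point]) -> bool:
    """Ray-casting test: is p inside the closed polygon described by ring?"""
    x, y = p
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Area:
    """Hypothetical area: an outer ring plus nested areas that are cut out."""
    outer: List[Point]
    properties: Dict[str, str] = field(default_factory=dict)
    inner: List["Area"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        if not inside_ring(p, self.outer):
            return False
        # A point inside a nested area (e.g. an island) is not part of this area.
        return not any(hole.contains(p) for hole in self.inner)


island = Area(outer=[(4, 4), (6, 4), (6, 6), (4, 6)], properties={"class": "island"})
lake = Area(outer=[(0, 0), (10, 0), (10, 10), (0, 10)],
            properties={"class": "lake"}, inner=[island])
```

The same structure would cover a forest with clearings; whether nesting is worth the added complexity is exactly the open question here.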

bridges,
"class=bridge" on a line segment
OK.
But here enhanced information seems to be required. If bridge X of street A has been built across street B, it would be useful to associate that information with the map data.

house numbers
"house_number=xx" on a node
But the node must somehow be part of a street or have a reference to a street. Otherwise (e.g. at a crossing) it is impossible to tell which street it belongs to. Further information must also be provided to tell on which side of the street the house is located.

forests
"class=forest" on an area
OK. But also forests may have holes with non forest parts...
See above.

street types (e.g. motorways, country roads, city roads, bicycle roads)
OK.

one way streets
Direction must be defined relative to the street.

All these are properties on a street.

railways,
Property on a street
Looks quite weird to me.

A tram driving in the middle of a street would be a street in the middle of a street?

restrictions (max vehicle speed,
Either property on a street or on a line segment
Property of one or more directed line segments. Restrictions depend on the driving direction.

max vehicle height,
Either property on a street or on a line segment
Property of one or more directed line segments at a particular node?

max vehicle weight
Either property on a street or on a line segment
Property of one or more directed line segments at a particular node?
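
To illustrate what "property of a directed line segment" could mean, here is a minimal Python sketch. All names (DirectedSegment, Restrictions, the keys max_speed_kmh, max_height_m and max_weight_t) are made up for this example and do not come from the OSM data model. The point is only that the same pair of nodes can carry different limits depending on the driving direction.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DirectedSegment:
    """A line segment together with a driving direction (hypothetical type)."""
    from_node: int
    to_node: int

    def opposite(self) -> "DirectedSegment":
        return DirectedSegment(self.to_node, self.from_node)


class Restrictions:
    """Stores limits per directed segment, so each direction is independent."""

    def __init__(self) -> None:
        self._limits: Dict[DirectedSegment, Dict[str, float]] = {}

    def set_limit(self, segment: DirectedSegment, key: str, value: float) -> None:
        self._limits.setdefault(segment, {})[key] = value

    def limit(self, segment: DirectedSegment, key: str) -> Optional[float]:
        return self._limits.get(segment, {}).get(key)

    def passable(self, segment: DirectedSegment, height_m: float, weight_t: float) -> bool:
        """True if a vehicle of the given height and weight may use the segment."""
        max_h = self.limit(segment, "max_height_m")
        max_w = self.limit(segment, "max_weight_t")
        if max_h is not None and height_m > max_h:
            return False
        if max_w is not None and weight_t > max_w:
            return False
        return True


forward = DirectedSegment(from_node=1, to_node=2)
limits = Restrictions()
limits.set_limit(forward, "max_speed_kmh", 50)
limits.set_limit(forward.opposite(), "max_speed_kmh", 30)
limits.set_limit(forward, "max_height_m", 3.8)
```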

roundabouts,
Property on a node for small roundabouts or on several line segments for
more complex ones. Maybe on a street which contains exactly all line
segments which participate in the roundabout.
Prefer a property of type 'roundabout' and of type street too. ;-)
Connected to several tracks of type street. It might be worth
considering storing the angles at which the tracks arrive at the roundabout
in order to have better maps for navigation.

motorway drive-up,
Property on a line segment, node or street - depending how complex the
drive up is.
There are usually a lot of different motorway drive-up types. For navigation purposes it might become necessary to assign the type in order to provide accurate directions.

country borders,
Although I disbelieve this should be in a streetmap database, if you want
to enter it, make it a property on a street surrounding the country.
Looks rather like a huge type of area to me.

information to support routing,
Property on the object you want to give hint for.
Something that might be used to calculate the cost (duration or length) of travel between two points, and that is used by some kind of least-cost
routing algorithm to calculate the fastest or shortest path.
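
A small Python sketch of such a least-cost search, using Dijkstra's algorithm as one possible choice. The edge fields (from, to, length_m, max_speed_kmh) and the tiny example network are assumptions made up for illustration; they are not taken from the OSM data. Swapping the cost function switches between "shortest" and "fastest".

```python
import heapq


def least_cost_path(edges, start, goal, cost):
    """Dijkstra's algorithm over directed edges; cost(edge) gives the edge weight."""
    graph = {}
    for edge in edges:
        graph.setdefault(edge["from"], []).append(edge)

    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if total > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for edge in graph.get(node, []):
            new_total = total + cost(edge)
            if new_total < best.get(edge["to"], float("inf")):
                best[edge["to"]] = new_total
                heapq.heappush(queue, (new_total, edge["to"], path + [edge["to"]]))
    return None  # goal not reachable


def by_length(edge):
    return float(edge["length_m"])


def by_time(edge):
    # Travel time in seconds when driving at the allowed maximum speed.
    return edge["length_m"] * 3.6 / edge["max_speed_kmh"]


# Hypothetical network: a short slow route via B and a longer fast route via D.
EDGES = [
    {"from": "A", "to": "B", "length_m": 1000, "max_speed_kmh": 30},
    {"from": "B", "to": "C", "length_m": 1000, "max_speed_kmh": 30},
    {"from": "A", "to": "D", "length_m": 2000, "max_speed_kmh": 120},
    {"from": "D", "to": "C", "length_m": 2000, "max_speed_kmh": 120},
]
```

With these edges, least_cost_path(EDGES, "A", "C", by_length) picks the route via B, while by_time picks the route via D. A vehicle-height limit could be handled by filtering the edges before the search.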

railway stations, etc.
Property on a node.
See house number.

I hope you see my point. I strongly disagree with making the data structure
more complex than necessary unless given a good argument.

Maybe there will be reasons for not expressing something as properties but
include it into the data structure. Please argue why you think a change of
the data structure is necessary for the examples above.

I fully agree with you that the data structure should be kept simple. It is also likely better to think twice before introducing data elements that are not well understood.

I see two main reasons why a more complex data structure might be worth considering:

Reason 1. To reduce data volume.
E.g. a roundabout needs just a center and a radius. If the roundabout is enhanced with exit IDs defined by degrees, e.g. from north, each exit ID would replace an additional node.
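
A minimal Python sketch of that idea, under stated assumptions: the class Roundabout and its fields are hypothetical, coordinates are local planar metres (x east, y north), bearings are degrees clockwise from north, and traffic circulates counter-clockwise unless told otherwise. Exit positions are derived from center, radius and bearing instead of being stored as nodes, and the exits can be counted in driving order for an instruction like "leave the roundabout at the second exit".

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Roundabout:
    """Hypothetical compact roundabout: center, radius, exits given by bearing."""
    center: Tuple[float, float]   # planar x (east), y (north) in metres
    radius: float                 # metres
    exits: Dict[str, float]       # exit id -> bearing in degrees clockwise from north

    def exit_position(self, exit_id: str) -> Tuple[float, float]:
        """Derive the exit coordinates instead of storing an extra node."""
        bearing = math.radians(self.exits[exit_id])
        cx, cy = self.center
        return (cx + self.radius * math.sin(bearing),
                cy + self.radius * math.cos(bearing))

    def exits_in_driving_order(self, entry_id: str, clockwise: bool = False) -> List[str]:
        """Exits as they are passed after entering at entry_id."""
        start = self.exits[entry_id]

        def travelled(exit_id: str) -> float:
            bearing = self.exits[exit_id]
            # Counter-clockwise circulation means the bearing decreases.
            return (bearing - start) % 360 if clockwise else (start - bearing) % 360

        others = [e for e in self.exits if e != entry_id]
        return sorted(others, key=travelled)


example = Roundabout(center=(0.0, 0.0), radius=10.0,
                     exits={"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0})
```

Entering at "south" with counter-clockwise circulation, the second element of exits_in_driving_order is the exit straight ahead, "north".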

Other examples could be modeling streets by using curves instead of line segments.


Reason 2. To support other use cases than drawing a map.
E.g.
- for navigation, additional direction information (text, voice) is needed
- to enhance searching / indexing capabilities (like show me all roundabouts, etc.)
- to calculate routes (e.g. fastest, shortest, fastest with vehicle height 4 m)


> Wiki pages ..
Maybe helping there is what you want ?

Yes. The above discussion should be moved to the Wiki pages.

I see room for simplifying the API. For example I don't see the need to
get single objects by id, when they already come fully described out of
the map - request.

Steve is currently simplifying the database and I am sure has some ideas
for changes to 0.3 API too.. ;-)

I think the "code after demand" approach is far better here.
No, not the waterfall model - it never works.
I propose rather to capture the causal relationships between data structure and possible usage scenarios as early as possible.

The better these relationships are commonly understood, the more intentional the decisions become, and the better they get.



but it is the current plan to test-implement a CSV output of the object
schema (all XML stuff replaced with a simple CSV), because the server
spent most of the time encoding the XML.
Forget CSV. It will make encoding and decoding very hard and error-prone. Moreover, you will fail badly when the data structure has to become more complex than it is today. Thus it will even become a risk in the future.

However, even the XML encoding is too slow.
I'd guess then that there is a problem in the XML implementation, either in the way the data is stored or in the way the stored data is serialized.

Compare the speed to using the 'print' command.
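
A sketch of how such a comparison could be set up. It is written in Python purely for illustration (the server itself uses Ruby, so the numbers will not transfer directly), and the element and attribute names osm, node, id, lat and lon are assumptions for the example. One function builds the output through an XML library, the other through plain string formatting, which is roughly what a 'print' based encoder would do.

```python
import timeit
import xml.etree.ElementTree as ET

# Synthetic test data: (id, lat, lon) tuples, made up for the benchmark.
NODES = [(i, 51.5 + i * 1e-5, -0.1 + i * 1e-5) for i in range(10_000)]


def with_xml_library(nodes):
    """Build the document as an element tree, then serialize it."""
    root = ET.Element("osm")
    for node_id, lat, lon in nodes:
        ET.SubElement(root, "node", id=str(node_id), lat=repr(lat), lon=repr(lon))
    return ET.tostring(root, encoding="unicode")


def with_plain_formatting(nodes):
    """Write the same elements with string formatting only.

    No escaping is done here; that is safe only because all values are numbers.
    """
    parts = ["<osm>"]
    for node_id, lat, lon in nodes:
        parts.append(f'<node id="{node_id}" lat="{lat!r}" lon="{lon!r}" />')
    parts.append("</osm>")
    return "".join(parts)


if __name__ == "__main__":
    for encoder in (with_xml_library, with_plain_formatting):
        seconds = timeit.timeit(lambda: encoder(NODES), number=10)
        print(f"{encoder.__name__}: {seconds:.3f} s for 10 runs")
```

If the plain variant is not much faster, the bottleneck is more likely in how the data is fetched than in the XML encoding itself.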

By the way: unlike XML Schemas, which define not only the data
structure but also how the data is encoded, ASN.1 schemas just define
how the data is structured and leave the data encoding to an
appropriate encoder.

I think XML was not chosen because XML Schemas had to be used. XML was
chosen because it is the first and simplest idea that worked. Evidence for
this is that no XML schema validation is present anywhere in the code
now.

If you know ASN.1 well and if you point some Ruby coders to ASN.1
libraries and define an ASN.1 schema for how data are transferred

Libraries with an open source license are currently only available in C. I expect the effort to build support for another language to be high.

Here is an excellent open source ASN.1 compiler:
http://lionet.info/asn1c/

Designing an OSM ASN.1 schema would be no problem for me.

If HTTP is capable of transferring binary data, the existing BER encoder could be used. Otherwise a new ASN.1 encoder type would have to be developed too.

and if you convince that coder that ASN.1 is better than, say, CSV or XML, then
maybe it gets implemented and used as a transport mechanism.

I guess that for encoding data the benefit of ASN.1 compared to XML is not so dramatic (factor 2) if you have an efficient XML parser, while for decoding the speed gain may be significant (factor 20). These numbers apply to using the ASN.1 BER encoder.

Here are some ASN.1-XML encoding speed comparisons:
        http://www.obj-sys.com/docs/ASN1forBinXML.pdf

These are from one ASN.1 tool vendor, but I have seen similar documents from other sources.

I never looked at ASN.1 more than I was forced to during study.
> To me it looks weird, bloated and complex.

It takes some time to get used to the syntax. Originally it was designed to specify protocols. Later on it was extended to the encoding of protocols.
It's good at:
- protocol specification
- design of backward compatible protocols
- efficient encoding (information per byte), as needed for low data rate bearers
- low memory consumption and processor requirements, as available on embedded systems
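
To give a feeling for "information per byte", here is a hand-written Python sketch of BER's tag-length-value layout for one node. It is not generated by an ASN.1 compiler and is not a complete BER implementation: only INTEGER and SEQUENCE are covered, and storing latitude and longitude as integers scaled by 1e7 is an assumption made for this example. It shows size only and says nothing about encoding or decoding speed.

```python
def ber_length(n: int) -> bytes:
    """BER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_integer(value: int) -> bytes:
    """INTEGER (tag 0x02) as minimal two's complement, big-endian."""
    bits = value.bit_length() if value >= 0 else (~value).bit_length()
    content = value.to_bytes(bits // 8 + 1, "big", signed=True)
    return b"\x02" + ber_length(len(content)) + content


def ber_sequence(*elements: bytes) -> bytes:
    """SEQUENCE (tag 0x30) wrapping already encoded elements."""
    content = b"".join(elements)
    return b"\x30" + ber_length(len(content)) + content


def encode_node(node_id: int, lat: float, lon: float) -> bytes:
    """Hypothetical node layout: SEQUENCE { id, lat * 1e7, lon * 1e7 }."""
    return ber_sequence(
        ber_integer(node_id),
        ber_integer(round(lat * 1e7)),
        ber_integer(round(lon * 1e7)),
    )


binary = encode_node(42, 51.5, -0.1)
as_xml = '<node id="42" lat="51.5" lon="-0.1" />'
print(len(binary), "bytes as BER versus", len(as_xml), "bytes as XML text")
```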

> I prefer simple solutions.
I'd recommend optimizing XML first. Going to ASN.1 for encoding would only be useful if there is a significant benefit from reduced message size or increased decoding speed.

Using an ASN.1 schema instead of an XML DTD to specify the data structure might still be an option worth checking.

Br,
Michael

