Marcel,
Marcel Ferrante wrote:
- As with DMOZ, at the heart of the project I don't suggest one big XTM
file for my system to run its queries against. I downloaded the DMOZ
file and it is more than 1 GB... Being practical, I suggest implementing
an ER model of the topic map concepts, which is different from XTM, as you said.
- So, the next question is: why am I using XTM after all?
- To interchange the data: if someone wants to do their
classification off-line, or to make the service available to other
applications. From this point of view, web services come in at the
next stage.
Yes, exactly! There is no point in maintaining an ontology (any kind of
data, actually) as an XTM document, probably not even as a topic map in
a topic map engine[1]. The issue is to make data available *as* a
topic map (looking at the data through topic map eyes). I realized this
after converting some thesauri into XTM...it felt so useless, given that
the thesaurus was already stored in a suitable format.
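For illustration, a minimal sketch of "store relationally, view as a topic map", assuming a hypothetical three-table schema (topic, occurrence, broader) that nobody in this thread has proposed. SQLite stands in for MySQL only so the example is self-contained; the element names follow XTM 1.0, and the data and the "broader"/"narrower" role topics are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical relational schema for a few topic map concepts.
SCHEMA = """
CREATE TABLE topic (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE occurrence (topic_id TEXT REFERENCES topic(id), href TEXT NOT NULL);
CREATE TABLE broader (narrower_id TEXT REFERENCES topic(id),
                      broader_id TEXT REFERENCES topic(id));
"""


def build_demo():
    """Create an in-memory database with illustrative (made-up) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO topic VALUES (?, ?)", [
        ("science", "Science"), ("astronomy", "Astronomy"),
        # Role-type topics referenced by the broader/narrower associations.
        ("broader", "Broader"), ("narrower", "Narrower"),
    ])
    conn.execute("INSERT INTO occurrence VALUES (?, ?)",
                 ("astronomy", "http://example.org/astronomy"))
    conn.execute("INSERT INTO broader VALUES (?, ?)", ("astronomy", "science"))
    return conn


def xtm_tag(tag):
    """Qualify a tag name with the XTM 1.0 namespace."""
    return f"{{{XTM_NS}}}{tag}"


def xlink_href(target):
    """Build an xlink:href attribute dictionary."""
    return {f"{{{XLINK_NS}}}href": target}


def export_xtm(conn):
    """Render the rows as an XTM 1.0 document; XTM is only the exchange view."""
    ET.register_namespace("", XTM_NS)
    ET.register_namespace("xlink", XLINK_NS)
    tm = ET.Element(xtm_tag("topicMap"))
    for topic_id, name in conn.execute("SELECT id, name FROM topic ORDER BY id").fetchall():
        topic = ET.SubElement(tm, xtm_tag("topic"), id=topic_id)
        base = ET.SubElement(topic, xtm_tag("baseName"))
        ET.SubElement(base, xtm_tag("baseNameString")).text = name
        rows = conn.execute("SELECT href FROM occurrence WHERE topic_id = ? ORDER BY href",
                            (topic_id,)).fetchall()
        for (url,) in rows:
            occ = ET.SubElement(topic, xtm_tag("occurrence"))
            ET.SubElement(occ, xtm_tag("resourceRef"), xlink_href(url))
    for narrower, broader in conn.execute(
            "SELECT narrower_id, broader_id FROM broader").fetchall():
        assoc = ET.SubElement(tm, xtm_tag("association"))
        for role, player in (("narrower", narrower), ("broader", broader)):
            member = ET.SubElement(assoc, xtm_tag("member"))
            spec = ET.SubElement(member, xtm_tag("roleSpec"))
            ET.SubElement(spec, xtm_tag("topicRef"), xlink_href("#" + role))
            ET.SubElement(member, xtm_tag("topicRef"), xlink_href("#" + player))
    return ET.tostring(tm, encoding="unicode")


if __name__ == "__main__":
    print(export_xtm(build_demo()))
```

The point of the design is that queries run against the tables, and the XTM document is produced only when someone asks for the data.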
The key (IMHO) is to use XTM as the message mime type (assuming we'll
have application/xtm+xml at some point in time) for HTTP based
interactions with data providers (services/stores) such as DMOZ.
Why don't you, for experimental purposes, write a CGI that mimics
XTM-based communication with www.dmoz.org, by scraping DMOZ's HTML and
turning it into XTM? I did that once for Google's link: feature; it's
fun and very educational.
Jan
[1] For highly dimensional data it does make sense, but usually the
domain that the data is about is in itself constraining enough to
justify storage in a relational database.
"since creating an ontology for life, the universe, and everything is
quite a challenge."
- Let's start simple. In the beginning, the focus is on organizing the URLs.
- The first objective is to fill the gaps left by DMOZ. For me that
project has stopped in time: it has been the same thing, the same
procedure for the user, for three years. Points to attack:
- The structure of DMOZ confuses concepts. In the same
taxonomy we can find aggregation, specialization, localization, etc.
- They use a poor faceted classification. A resource (URL)
can appear in many topics, but what about the topic itself? It should
be allowed to do the same.
> So their structure should be divided into faceted categories,
as in projects like Flamenco or FacetMap. To do the division we can use
a good web thesaurus (e.g. EuroVoc).
> And most importantly: the user must be able to classify the
URLs and topics using topic map concepts. There could be a wizard to
train the user to do this.
- The navigation shows only one hierarchical level, so to reach
the bottom of a branch the user has to wait for the page to refresh
five or six times. Very, very boring!!
> See www.knowledgeprocessors.com
- The search in the directory (by Google) shows the URLs within the topics.
> I want to produce a filter or reflection in the structure:
navigation combined with search, like Flamenco
(http://bailando.sims.berkeley.edu/flamenco-interface.html).
- Build a prototype to gauge the reactions.
- In the beginning I'm thinking of just using MySQL, which is free, but
we can use Oracle if the project grows.
- For the future of the project, we can think about:
- Building client software so users can do their classification
with more agility, or off-line.
- Retrieving the best URL classifications already done: the users'
favorites or bookmarks.
- Not limiting the topic map crawler to the DMOZ project:
Wikipedia is the next victim (and I see Google as the last battle,
if Bill doesn't get there first...).
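To make the "navigation combined with search" idea concrete, here is a simplified in-memory sketch in the spirit of Flamenco's facet previews; it is not Flamenco's actual implementation. The facet names ("subject", "region"), the `Resource` type, `refine`, and the sample URLs are all hypothetical. Each resource carries values in separate facets instead of sitting in one mixed hierarchy, and after every filter step the remaining facet values are counted, so the user sees the next choices without paging down five or six levels.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Resource:
    url: str
    title: str
    facets: dict = field(default_factory=dict)  # facet name -> set of values


def refine(resources, selections, query=""):
    """Filter by the selected facet values and a title keyword, then count the
    facet values still available among the hits (to show as the next choices)."""
    q = query.lower()
    hits = [r for r in resources
            if q in r.title.lower()
            and all(value in r.facets.get(facet, set())
                    for facet, value in selections.items())]
    counts = {}
    for r in hits:
        for facet, values in r.facets.items():
            counts.setdefault(facet, Counter()).update(values)
    return hits, counts


# Made-up sample data: one facet per dimension instead of one mixed hierarchy.
RESOURCES = [
    Resource("http://example.org/rio-museums", "Museums in Rio",
             {"subject": {"Arts"}, "region": {"Brazil"}}),
    Resource("http://example.org/sao-paulo-jazz", "Jazz clubs in Sao Paulo",
             {"subject": {"Arts", "Music"}, "region": {"Brazil"}}),
    Resource("http://example.org/paris-jazz", "Jazz in Paris",
             {"subject": {"Music"}, "region": {"France"}}),
]

if __name__ == "__main__":
    hits, counts = refine(RESOURCES, {"region": "Brazil"}, "jazz")
    print([r.url for r in hits])  # ['http://example.org/sao-paulo-jazz']
    print(counts["subject"])
```

In the planned system the same filtering would presumably be SQL queries against the MySQL tables rather than Python lists.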
Finally: "as well as man-hours and sheer know-how"
I'm writing from Brazil. Thank you for your attention; I was very
surprised when the answer arrived from a name I knew from the thesis
I had read. Pardon my English. And you can divide your costs by 5 if
the project is done here. I'm serious, this is simply a fact.