
Re: Merging of Distributed Topic Maps based on the Subject Identity Measure: msg#00022

Subject: Re: Merging of Distributed Topic Maps based on the Subject Identity Measure (SIM) Approach

Hi Lutz,

* maicher-jNDFPZUTrfTw9Zu3TmXbXSJk02hg1TJes0AfqQuZ5sE@xxxxxxxxxxxxxxxx
| 
| We are still interested in a vital discussion about the problem we
| want to solve. We don't share the optimism that PSI repositories
| will be widely adopted in distributed and heterogeneous
| communities. Therefore we introduced our SIM approach. The SIM is a
| measure, which determines how closely related the Subject of two
| Topics might be. This decision is made automatically and only based
| on the Content provided by the regarding Topic (Maps). For more
| details, we suggest a closer look to our papers.

I liked your approach quite a lot, and thought it very interesting.
I'd really like to try it out and see to what extent it actually works
on real data. (There are some topic maps in the Omnigator that have
overlapping subjects without having the same PSIs. We could also try
it on the XML conference papers topic map, instead of the existing
ad-hoc heuristics used for merging there.)

One thing I found strange was that you defined your own measure for
string distance instead of reusing existing measures such as
Levenshtein distance. Why was that?
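For reference, a minimal sketch of the Levenshtein distance mentioned above, written in Python. This is the textbook dynamic-programming formulation, not the string measure defined in the SIM papers; the function names and the normalisation to a 0..1 similarity are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1.0 for identical strings, 0.0 when
    every character has to change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A call such as `levenshtein("kitten", "sitting")` yields 3, the usual textbook example.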

Also, I'm uncertain about the URI similarity measure. If two URIs are
nearly the same, what does that tell you? It's unlikely to be because
the author mistyped the URI, because mistyped URIs don't work at all.
And if both URIs work and identify the same thing, URI equivalence
rules will reveal that in some cases (and your measure does not take
those rules into account). Finally, if you consider the subject
identifiers

  (1) http://psi.example.org/something/#european-union
  (2) http://psi.example.org/something/#african-union
  (3) http://psi.noe.no/other/#european-union

then (1) and (3) are much more likely to identify the same subject
than (1) and (2) are.
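The point can be illustrated with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a generic whole-string similarity (it is not the URI measure from the SIM papers), compares fragment identifiers separately, and applies a deliberately partial normalisation as one example of a URI equivalence rule. All function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit, urlunsplit

URI_1 = "http://psi.example.org/something/#european-union"
URI_2 = "http://psi.example.org/something/#african-union"
URI_3 = "http://psi.noe.no/other/#european-union"


def whole_string_similarity(a: str, b: str) -> float:
    """Generic character-level similarity over the complete URI."""
    return SequenceMatcher(None, a, b).ratio()


def same_fragment(a: str, b: str) -> bool:
    """True if both URIs carry the same non-empty fragment identifier."""
    frag_a = urlsplit(a).fragment
    frag_b = urlsplit(b).fragment
    return frag_a != "" and frag_a == frag_b


def normalize(uri: str) -> str:
    """Partial normalisation only: lower-case scheme and host, drop the
    default port, use "/" for an empty path. User info is discarded
    and percent-encoding is left untouched in this sketch."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    if parts.port in (None, default_port):
        netloc = host
    else:
        netloc = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path or "/", parts.query, parts.fragment)
    )


# Whole-string similarity ranks the wrong pair as closer:
print(whole_string_similarity(URI_1, URI_2) > whole_string_similarity(URI_1, URI_3))
# The fragments point the other way:
print(same_fragment(URI_1, URI_3), same_fragment(URI_1, URI_2))
```

Under these assumptions the whole-string score favours (1) and (2), while the fragment comparison favours (1) and (3). A shared fragment is of course only a hint, not proof of a shared subject.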

Another consideration is that I think types are extremely important.
If the names are the same but the types are disjoint (person and
place, say) then you can safely ignore the names. You might even want
to make the algorithm consider typing topics first, and only
afterwards go after the instances.
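A minimal Python sketch of that idea: check the types first, and only compare names when the types are not known to be disjoint. The `Topic` class, the `DISJOINT_TYPES` table and the exact-match name comparison are all hypothetical simplifications, not part of the SIM approach or of any topic map engine.

```python
from dataclasses import dataclass

# Hypothetical table of type pairs declared disjoint in the ontology.
DISJOINT_TYPES = {frozenset({"person", "place"})}


@dataclass(frozen=True)
class Topic:
    name: str
    types: frozenset  # identifiers of the topic's typing topics


def types_disjoint(a: Topic, b: Topic) -> bool:
    """True if some type of a is declared disjoint with some type of b."""
    return any(
        frozenset({type_a, type_b}) in DISJOINT_TYPES
        for type_a in a.types
        for type_b in b.types
    )


def may_be_same_subject(a: Topic, b: Topic) -> bool:
    """Type check first; the name comparison only runs if it passes."""
    if types_disjoint(a, b):
        return False  # same name, but person vs. place: ignore the name
    return a.name.casefold() == b.name.casefold()


paris_person = Topic("Paris", frozenset({"person"}))
paris_place = Topic("Paris", frozenset({"place"}))
paris_city = Topic("paris", frozenset({"place"}))

print(may_be_same_subject(paris_person, paris_place))  # False
print(may_be_same_subject(paris_place, paris_city))    # True
```

Handling the typing topics first would amount to merging the entries behind `types` before this check runs, so that two different identifiers for "place" are already recognised as the same type.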

Not sure if this is helpful, but it may be worth considering, if you
haven't already.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

