Hi,
Before I learned about topic maps, I often thought about how to
build a system that could manage my personal knowledge and resources
(ranging from formally structured documents to documents in natural
language). During that time, I often asked myself how to represent the
relationships between terms in a thesaurus/dictionary, the
meta-information *about* resources and the resources themselves (which
could be files on my hard disk, files on the web, or "offline"
physical entities such as books on my shelf).
What I soon recognized was that a document like a scientific article
could exist more than once on the web or on my hard disk and that it
could exist in different formats (PDF, PS, HTML, whatever). It could
even exist several times on my shelf.
Thinking in relational database terms, a nice way to represent all
that IMHO is to build a table of terms (that is, the thesaurus) with
an n:m relationship to a table of documents. The document table in
turn has a 1:n relationship to a table containing data about
resources, which have properties like "location", "format", etc.
Before I knew about topic maps, I discovered RDF. In RDF, everything
would be straightforward to represent (so it seems to me, at least).
Build an RDF schema for the thesaurus (classes = {term, synonym, etc.},
relationships = {broaderTerm/narrowerTerm, related_to, etc.}), for the
different types of documents and for the different types of resources.
Term instances would then point to instances of many different
document types (like "scientific article" or "book"), which again
would point to instances of a resource. Resources can have locations.
Let me give one example in RDF, which might not be entirely correct
with regard to syntax, but hopefully serves to demonstrate my idea
better (pn = personal namespace :-):
<pn:Term rdf:about="http://.../rdf#ai">
  <pn:label xml:lang="en">Artificial Intelligence</pn:label>
  <pn:label xml:lang="de">Künstliche Intelligenz</pn:label>
  <pn:broaderTerm rdf:resource="http://.../rdf#computer_science"/>
  <pn:occurs_in rdf:resource="http://.../rdf#book1"/>
  <pn:occurs_in rdf:resource="http://.../rdf#book2"/>
  ...
</pn:Term>
<pn:Book rdf:about="http://.../rdf#book1">
  <pn:title>Practical RDF</pn:title>
  <pn:author>Shelley Powers</pn:author>
  <pn:publisher>...</pn:publisher>
  <pn:language>English</pn:language>
  ...
  <pn:resource rdf:resource="http://.../rdf#resource1"/>
  <pn:resource rdf:resource="http://.../rdf#resource2"/>
  <pn:resource rdf:resource="http://.../rdf#resource3"/>
</pn:Book>
...
<pn:File rdf:about="http://.../rdf#resource1">
  <pn:format>application/pdf</pn:format>
  <pn:locatedAt rdf:resource="http://.../rdf#location1"/>
  <pn:locatedAt rdf:resource="http://..."/>
  ...
</pn:File>
<pn:PhysicalObject rdf:about="http://.../rdf#resource2">
  <pn:locatedAt rdf:resource="http://.../rdf#location2"/>
</pn:PhysicalObject>
<pn:PhysicalObjectRef rdf:about="http://.../rdf#resource3">
  <pn:locatedAt rdf:resource="http://.../rdf#location3"/>
</pn:PhysicalObjectRef>
<pn:HarddiskLocation rdf:about="http://.../rdf#location1">
  <pn:url>file:///.../practical_rdf.pdf</pn:url>
</pn:HarddiskLocation>
<pn:PhysicalLocation rdf:about="http://.../rdf#location2">
  <pn:name>My bookshelf, third book from the left</pn:name>
</pn:PhysicalLocation>
<pn:WebsiteLocation rdf:about="http://.../rdf#location3">
  <pn:url>http://www.amazon.com/.../</pn:url>
</pn:WebsiteLocation>
I hope that this example is more or less self-explanatory to those who
know RDF. Nevertheless, let me make a few remarks. As you can see, it is
possible to express that a certain term (=topic! ;-) appears/occurs in
a document of class/type "Book". Furthermore, we can say that several
resources are connected with this book. We can express what type
such resources are (a file, a physical object or a reference to a
physical object) and where such resources are located (be it at a URL
or on my own bookshelf). The same file may even be located at
different URLs.
When I discovered topic maps, I soon realized that they are very
similar to the thesaurus I wanted to create an RDF schema for, since
the terms can be mapped directly to topics.
What is no longer clear to me at all is how to express the rest of the
example with the help of topic maps, especially because I don't
understand the semantics of the occurrence-tag. What I think I have
understood so far is that "occurrence" can be interpreted as "the
resource pointed at is *about* the given topic, in the context given
by the scope-tag".
Since the book "Practical RDF" in part covers the topic "Artificial
Intelligence", my first idea to model it with a topic map was to
create a topic "Artificial Intelligence" together with several
occurrences, representing all the resources.
Of course, I could try to describe the resources further, using
reification. But that creates redundant information, at least when
several resources representing the same article or book each have to
be described. I'm not sure at the moment whether a topic can have
*more* than one resourceRef-tag under <subjectIdentity>. If that is
allowed, it might be a solution: list all the resources for the topic
under "occurrence" and describe such resources further using
reification. That way it could also become clear that some resources
describe the same thing.
Another way might be to create the book as a separate topic and to
make all occurrences of the topic "Artificial Intelligence" instances
of the topic "Practical RDF (the book)". Is that correct at all? Or do
I have to reify the resource and make that resource an instance of the
topic "Practical RDF (the book)"?
Anyhow, why should I list all resources in which that topic occurs? In
the RDF example, I have connected the topic only with the book it
appears in. From there, I can easily infer new knowledge, namely that
the topic must also occur in all the resources (instances) of the book.
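To make that inference step concrete, here is a minimal sketch in
plain Python. It is a single hand-written rule over
(subject, predicate, object) triples, not RDFS or OWL reasoning; the
short identifiers stand in for the elided URIs of the RDF example, and
the function name is my own invention:

```python
# Triples mirroring the RDF example; the short ids are placeholders
# for the full (elided) URIs.
TRIPLES = {
    ("ai", "occurs_in", "book1"),
    ("book1", "resource", "resource1"),
    ("book1", "resource", "resource2"),
    ("book1", "resource", "resource3"),
}


def infer_resource_occurrences(triples):
    """Apply one rule: if a term occurs in a document and the document
    has a resource, then the term also occurs in that resource.

    Returns a new set holding the original and the inferred triples.
    """
    # Index: document -> set of its resources
    resources_of = {}
    for subj, pred, obj in triples:
        if pred == "resource":
            resources_of.setdefault(subj, set()).add(obj)

    inferred = set(triples)
    for subj, pred, obj in triples:
        if pred == "occurs_in":
            for res in resources_of.get(obj, ()):
                inferred.add((subj, "occurs_in", res))
    return inferred
```

Only the link from the term to the book has to be stated explicitly;
the links to the individual copies follow from the rule.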
That leaves the question why it is not possible in TMs to say that a
topic is *about* another topic, in other words, that a topic *occurs*
in another topic. Why can topics only occur in addressable information
resources and not in topics which represent a class of such resources?
It is also not clear to me what exactly I'm describing when I reify
a resource. Do I describe the HTML page at the given location, or do
I describe what that HTML page is an instance of, that is, the book?
And how can I describe the location itself further?
The problem I see here is that there is no URL from which I can
retrieve the book in my bookshelf! Again, we have the problem that an
occurrence can only point to an addressable information resource. In
other words, it can't point to the book in my bookshelf, since the
bookshelf would have to be represented as a topic ... and so on.
What I'm really trying to model is that there exists an abstract class
"The book with the title x", which has several instances (inheriting
all the properties of the book, of course) in different formats
distributed across several locations in the virtual world and the
physical world. (It is clear to me that even the RDF example doesn't
represent all that very well, since the book-resource is not an
instance of a class "the book with the title x" - but with RDF + RDFS
+ OWL, this certainly *could* be represented correctly.)
All that leaves me with the impression that it is better to represent
all documents, resources and locations as topics and to define my own
association "occurs-in" with semantics similar to those of the
"occurrence"-tag. As we have seen in the RDF-example, thereby
connections between the info-layer (thesaurus = topic map), the
(abstract) documents and the resources which
represent/realize/instantiate such documents become possible, and
we're also able to describe the locations further. Especially, I can
also point to physical locations of a resource, like my bookshelf!
This is possible without using the separate syntactical construct of
an occurrence tag, though not in a standardized way, as I have to admit.
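A rough sketch of what I mean, as a plain Python data model rather
than any topic map syntax. Only "occurs-in" comes from the text above;
the association types "realized-by" and "located-at", the role names,
and the class and function names are assumptions made up for
illustration:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Topic:
    id: str
    name: str
    url: Optional[str] = None  # None for things without an address


@dataclass
class Association:
    type: str
    roles: Dict[str, str]  # role name -> topic id


# Terms, documents, resources and locations are all topics.
topics = {t.id: t for t in [
    Topic("ai", "Artificial Intelligence"),
    Topic("book1", "Practical RDF (the book)"),
    Topic("resource2", "Printed copy"),
    Topic("location2", "My bookshelf, third book from the left"),
]}

associations = [
    Association("occurs-in", {"term": "ai", "document": "book1"}),
    Association("realized-by", {"document": "book1", "resource": "resource2"}),
    Association("located-at", {"resource": "resource2", "location": "location2"}),
]


def follow(assoc_type: str, from_role: str, to_role: str, topic_id: str) -> List[str]:
    """Return the ids playing to_role in associations of the given type
    in which topic_id plays from_role."""
    return [
        a.roles[to_role]
        for a in associations
        if a.type == assoc_type and a.roles.get(from_role) == topic_id
    ]


def locations_of_term(term_id: str) -> List[str]:
    """Walk term -> document -> resource -> location and collect names."""
    names = []
    for doc in follow("occurs-in", "term", "document", term_id):
        for res in follow("realized-by", "document", "resource", doc):
            for loc in follow("located-at", "resource", "location", res):
                names.append(topics[loc].name)
    return names
```

Here the bookshelf is an ordinary topic without a URL, which is
exactly what I cannot express with an occurrence pointing to an
addressable resource.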
As you can see, there is much confusion on my side about the semantics
of the occurrence tag and the related semantics of reifying a resource
(making it perhaps an instance of a book?) and instantiating an
occurrence type. What is an occurrence type, by the way? Moreover, it
is not clear to me how to speak about resource locations in topic
maps, that is, how to reify them.
Is it possible at all to express the RDF example given above (or
better: what I intend to express with it) in a TM, and how would you
do it? How does one describe a resource further? How does one describe
the location of a resource? Is it possible at all to reify not only a
resource, but also its location? How does one point to resources that
are part of the physical world?
I would be very pleased if someone could answer some of my questions.
Best regards,
Marco
PS: Since I have given that RDF example, it is also not clear to me
why so many people are asking for a connection between RDF and TMs.
Isn't it always possible to express TM-constructs in RDF? What I mean
is, once you have translated the TM-DTD into RDFS, it is possible to
express the several parts of a TM as instances of that schema. I think
with the right inference rules and the right ontology in OWL, one
might even constrain such RDF instances in the same way as
"XML-instances" are constrained by their DTD (or XML-schema) and
define a semantics for it. Looks rather like a syntactical problem to
me. Short example (probably wrong, just to demonstrate the concept):
<tm:Association rdf:about="http://.../rdf#written-by">
  <tm:member rdf:resource="http://.../rdf#x1"/>
  <tm:member rdf:resource="http://.../rdf#x2"/>
</tm:Association>
<tm:AssociationMember rdf:about="http://.../rdf#x1">
  <tm:roleSpec rdf:resource="http://.../rdf#topic1"/>
  <tm:topicRef rdf:resource="http://.../rdf#topic2"/>
</tm:AssociationMember>
(... and so on ...)