PhiHo Hoang writes:
> > We regularly run programs with over 200MB of data (that's after
> > loading it into XSB). With some care it runs fine (of course have to
> > have enough RAM to store it all and everything else that is needed.)
>
> Would you please elaborate on this "data".
>
> Are they "facts", "rules" or just numbers, text...

It varies. We are using the CDF package in XSB and it stores
"ontologies". For example, we have a taxonomy which is an extension
of the UNSPSC taxonomy, a taxonomy of parts and services, which is
four levels deep and contains about 12,000 nodes. Our extension makes
it about 60,000 nodes. It is represented in CDF, which identifies
classes and objects with small binary terms, represents subclass as a
binary predicate over these IDs, and represents relationships as
3-ary predicates over these IDs. So "all" of these are facts, except
of course, for the routines that process these facts, doing
inheritance, etc. Those facts are compiled and not stored in the
dynamic database. They are both numbers and text and small
structures.
(And then we have part data, that describes the 5,000,000 parts that
are managed by our customer. We don't (can't) load all that data into
memory, but we sometimes process batches of it.)
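
To make the representation described above a bit more concrete, here is
a rough sketch in Python (not XSB, and not the actual CDF schema). The
id shapes, the names `isa` and `has_attr`, and the sample nodes are all
made up for illustration. The dictionaries stand in for the
first-argument indexing a Prolog system would provide on such facts.

```python
from collections import defaultdict

# Hypothetical ids: small terms such as ("cid", 3) for classes and
# ("oid", 42) for objects. Illustrative assumptions, not the CDF schema.
PARTS = ("cid", 1)
FASTENERS = ("cid", 2)
BOLTS = ("cid", 3)
BOLT_42 = ("oid", 42)

# Binary subclass facts: (child, parent).
isa = [
    (FASTENERS, PARTS),
    (BOLTS, FASTENERS),
    (BOLT_42, BOLTS),
]

# 3-ary relationship facts: (source, relation, target).
has_attr = [
    (FASTENERS, ("rid", "material"), ("cid", "metal")),
    (BOLT_42, ("rid", "length_mm"), 30),
]

# Index both fact sets on their first argument so lookups avoid a full scan.
parents_of = defaultdict(list)
for child, parent in isa:
    parents_of[child].append(parent)

attrs_of = defaultdict(list)
for source, relation, target in has_attr:
    attrs_of[source].append((relation, target))


def ancestors(node):
    """Return every class reachable from node by following isa upward."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents_of.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def inherited_attrs(node):
    """Collect attributes stated on node itself and on all of its ancestors."""
    result = []
    for holder in [node] + ancestors(node):
        result.extend(attrs_of.get(holder, []))
    return result
```

With these sample facts, `inherited_attrs(BOLT_42)` picks up the length
stated on the object itself and the material stated two levels up on
the fasteners class.
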
> > The big issue is good indexing.
> >
>
> If the data are just relational facts, would it be better to leverage "real"
> DBMS?

Yes, we could (and sometimes do) put them into a database, but we
generally don't use the database retrieval mechanisms (except to load
them) since they are too slow for things like inheritance. That
requires recursive evaluation, and retrieving rows tuple-at-a-time
from a relational database is just too slow. We have room to load
them into memory, and then we run at program speeds, not external
database speeds. That can be up to a couple of orders of magnitude
faster.
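
The difference between the two access patterns can be sketched roughly
as follows, again in Python rather than XSB. The table `isa`, its
columns, and the sample rows are hypothetical, and an in-memory SQLite
database stands in for an external RDBMS, so this shows only the shape
of the work (one query per recursion step versus one bulk load), not
real timings.

```python
import sqlite3
from collections import defaultdict

# Hypothetical table and column names, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE isa (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO isa VALUES (?, ?)",
    [("bolt", "fastener"), ("fastener", "part"), ("nut", "fastener")],
)


def ancestors_via_db(node):
    """Tuple-at-a-time style: one SQL query per step of the recursion."""
    found = set()
    frontier = [node]
    queries = 0
    while frontier:
        current = frontier.pop()
        rows = conn.execute(
            "SELECT parent FROM isa WHERE child = ?", (current,)
        ).fetchall()
        queries += 1
        for (parent,) in rows:
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found, queries


def load_isa():
    """Bulk load: one query, then everything else happens in memory."""
    parents_of = defaultdict(set)
    for child, parent in conn.execute("SELECT child, parent FROM isa"):
        parents_of[child].add(parent)
    return parents_of


def ancestors_in_memory(node, parents_of):
    """Same recursion as above, but against the in-memory index."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for parent in parents_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found
```

In the first version the number of database round trips grows with the
number of nodes visited; in the second, the database is touched once
and the recursion runs entirely over the loaded index.
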
> I have been wondering how well would Prolog inference engine scale in
> handling relational data.
>
> If DBMS is used to store the relational facts, can their indexing capability
> be used for Prolog?

Yes, of course. The problem of efficiently interfacing Prolog with
relational databases has a long and rich history. There's a lot of
work on it. My conclusion is that if you're doing things that
relational databases do well (storing large amounts of flat data in
relatively few tables and retrieving what you need with SQL
queries), then by all means use an RDBMS. If your data has a complex,
hierarchical structure and doesn't fit easily into a few
relational tables (without ridiculously many null values), then you
need something else. If it fits in memory (or you can batch load and
process it in pieces that will fit in memory), then Prolog works
better. If there's no way you could fit it in memory, then either you
suffer with relational technology, or try object-oriented database
technology, or build your own hybrid.

Regards,
-David