
Re: [bdbxml] Regular expressions in bdbxml: msg#00072

Subject: Re: [bdbxml] Regular expressions in bdbxml
[Greg Burd]
> >      Thanks for raising this issue.  How important is this feature?
> > I happen to think that this will be a common use case.  If you do too
> > (this is a general question for the list not just Adam), drop me some
> > email and let me know.  The question is:
> >
> > "Should Berkeley DB XML support an index type that optimizes access
> > to data using regular expressions?"
>

I think this should be very much a part of the road-mapped extension of
indexing to sub-document level (i.e. using indexes to search within
documents, rather than just to identify the documents containing matches as
at present).  Since regexes are part of the XPath 2.0/XQuery specs, I think
users will expect them not just to be supported (as they indeed are at
present) but supported in a way that gives optimum performance.

> > As a follow up, which is more important, regex or full-text?

I doubt if it's possible to rank these two as far as importance to users
(and developers as bdbxml library users) is concerned. They are different
needs corresponding (mainly) to different use-cases.  As for what the
relative importance should be to the Sleepycat developers, I guess that has
to be a matter of internal project priorities. All I would say as an
outsider is that developing true fulltext facilities is a big undertaking in
its own right, additional to the current bdbxml agendas (there was a thread
exploring this a while back), whereas I would regard the addition of
index assistance for regexes as a necessary further step along a path where
bdbxml is already far advanced. So while I personally would regard the
eventual introduction of fulltext features as a welcome bonus, I'd view
long-term absence of index-supported regexes as a serious defect, for my
main use-cases at any rate.

[Adam Rambousek]
> For our application, regular expressions are very important. It's
> application for managing large dictionaries in XML format (mostly WordNet
> kind). We need various searching options and one of them is searching
> literals using regex (so, regex is needed only for values of one element
> that contains a word or two).

Multi-lingual lexicography is my main field of activity, and I don't think
bdbxml is suitable for it yet, precisely because it doesn't (yet) provide the
granularity of indexing that complex queries against highly-structured
lexical data require. However, the road-mapped developments in forthcoming
versions will change that dramatically (and I had rather assumed that regex
support would be built into the extensions to current indexing methods that
those developments entail). For now, though, of the open-source native XML
databases, I find that only eXist has the sub-document level indexing power
and flexibility that my dictionary applications require.

Michael Beddow







