
Re: After creating index nothing is found anymore: msg#00178

Subject: Re: After creating index nothing is found anymore
Re: your final question, there is currently no way
to get just the doc ids.

Best you can do to ensure efficiency is
to select a small fragment of the resource
thus minimizing the excess bytes transmitted.
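To illustrate the point about selecting a small fragment, here is a minimal sketch that uses the JDK's built-in `javax.xml.xpath` rather than Xindice itself. The sample document, the class name `FragmentSelection`, and the helper `payloadChars` are assumptions made up for this illustration; text length is only a rough proxy for bytes transmitted.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: plain JDK XPath, not the Xindice client API.
class FragmentSelection {

    // Hypothetical sample; element and attribute names follow the thread.
    static final String SAMPLE =
        "<document src=\"170\">"
        + "<body>Placeholder text standing in for a large document body.</body>"
        + "<link viewid=\"2045\"/>"
        + "</document>";

    // Sums the text length of every node matched by the expression,
    // as a rough proxy for how much content each hit carries.
    static int payloadChars(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            int total = 0;
            for (int i = 0; i < nodes.getLength(); i++) {
                total += nodes.item(i).getTextContent().length();
            }
            return total;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Whole element: the hit carries the full document body.
        System.out.println(payloadChars(SAMPLE, "//document[@src='170']"));
        // Attribute only: the hit carries just the attribute value.
        System.out.println(payloadChars(SAMPLE, "//document[@src='170']/@src"));
    }
}
```

Selecting `/@src` instead of the whole `document` element still tells you which documents matched, while each hit carries only the short attribute value.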

WRT the index creation problem,
you could help the project if you can
write a unit test demonstrating that
index creation during collection creation
fails, and filing a bug report in
bugzilla including your unit test.

Thanks!

-Terry

Sascha Kulawik wrote:

Have you tried running the unit tests?

No, not yet.
The http://marc.theaimsgroup.com/?l=xindice-users&m=107426829426034&w=2
message solved my problem, but I don't know why. So I won't create the index
during the creation of the collection, but afterwards, with the function
given there.
I've recreated the indexes with the patterns link@* and document@*; this is
much better. A result for an XPath query takes 30ms with about 1000 documents,
which is more than good enough for my use case.
Actually there is only one problem left: how can I speed up the queries
where I don't need the XPath result? Is there any way to get a
ResultSet back without any documents in it? In this case I need only the
document IDs: res.getDocumentId();

Thank you very much for your help,

Sascha

The test IndexedSearchTest in
java/tests/src/org/apache/xindice/integration/client/services
includes a number of tests that check not only whether indexed searching works, but also whether indexing speeds up the query. One of the tests uses the following query:

//phone[starts-with(@call, 'n')]

That is very similar to the query used in:

> If I'm doing a search like "//document[@src='170']", everything works fine,
> except that it takes the same time as without an index.

The IndexedSearchTest indexer for this case uses the pattern "*@call" to speed up the //phone[starts-with(@call, 'n')] query. The pattern says index all "call" Attributes regardless of what Element they belong to.

Your indexer is defined with pattern "link@viewid". Since it does not index ALL possible viewid Attributes (only the viewid Attribute of the link Element) Xindice cannot use this index to search all occurrences of viewid Attributes. Thus, you see no speedup. Try pattern "*@viewid" instead.
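The difference between the two patterns can be sketched with a toy model. This is not Xindice's matching code; the class `IndexPatternSketch` and the method `covers` are hypothetical and only model the `element@attribute` and `*@attribute` forms mentioned in this thread.

```java
// Toy model of index pattern coverage; NOT Xindice's actual implementation.
class IndexPatternSketch {

    // Returns true if the pattern would cover the given element/attribute pair.
    // "*" on either side of '@' is treated as a wildcard.
    static boolean covers(String pattern, String element, String attribute) {
        int at = pattern.indexOf('@');
        if (at < 0) {
            // Element-only patterns such as "document" index no attributes in this sketch.
            return false;
        }
        String elementPart = pattern.substring(0, at);
        String attributePart = pattern.substring(at + 1);
        boolean elementOk = elementPart.equals("*") || elementPart.equals(element);
        boolean attributeOk = attributePart.equals("*") || attributePart.equals(attribute);
        return elementOk && attributeOk;
    }

    public static void main(String[] args) {
        // "*@viewid" covers viewid on any element, so it can answer for all of them.
        System.out.println(covers("*@viewid", "link", "viewid"));
        System.out.println(covers("*@viewid", "span", "viewid"));
        // "link@viewid" misses viewid attributes on other elements.
        System.out.println(covers("link@viewid", "span", "viewid"));
    }
}
```

In this model, an index is only usable for a search over all `viewid` attributes if its pattern covers every element that might carry one, which is the reasoning behind trying `*@viewid`.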

I would expect to see the IndexedSearchTest fail if there is a problem. Otherwise, perhaps you have a corrupted index. Try removing it and reindexing.

-Terry

Sascha Kulawik wrote:

Hello,

I'm finally getting a headache configuring Xindice.
I'm using Xindice 1.1b3 (I've currently tried a CVS checkout from this morning) in JBoss with Jetty, deployed as an exploded WAR archive.
I've created a collection with the following code snippet:

---------------------------------------------------------------
String collectionConfig =
    "<collection compressed=\"false\" name=\"" + collectionName + "\">"
    + "<filer class=\"org.apache.xindice.core.filer.BTreeFiler\" gzip=\"false\"/>"
    + "<indexes>"
    + "<index class=\"org.apache.xindice.core.indexer.ValueIndexer\""
    + " name=\"internalLink_attr_idx\" pattern=\"link@viewid\" type=\"String\"/>"
    + "<index class=\"org.apache.xindice.core.indexer.ValueIndexer\""
    + " name=\"document_attr_idx\" pattern=\"document\" type=\"String\"/>"
    + "</indexes>"
    + "</collection>";
col = DatabaseManager.getCollection(uri);
CollectionManager collman =
    (CollectionManager) col.getService("CollectionManager", "1.0");
try {
    collman.createCollection(collectionName, XercesHelper.string2Dom(collectionConfig));
} catch (Exception exe) {
    String errMsg = "Error during the converting of the Collection-String to XML-DOM";
    log.error(errMsg);
    throw new XMLDBException(ErrorCodes.VENDOR_ERROR, -1, errMsg, exe);
}
---------------------------------------------------------------

If I'm doing a search like "//document[@src='170']",
everything works fine, except that it takes the same time as without an index.
If I'm trying to search for "//link[@viewid='2045']",
nothing happens: no result, nothing. Without the index I do get some
results back. This XPath search is very fast (80ms), but without any result it is obviously useless :)
The idx file of the first one is about 30kB in size, the second one is 6kB, which I think is the default.
For the first XPath query, it only matters whether this
document exists in any XML document in the collection. I've seen on MARC that this can be done faster, so that the result of this XPath query is only the document itself or the ID of the document. How is this possible?
Thank you all very much,

Sascha






