Re: searching and sorting by date: msg#00124

search.xapian.general

Subject: Re: searching and sorting by date

Querying is more complex than it needs to be. We could easily add
something into the bindings so you could do:

----------------------------------------------------------------------
for match in database.query("my search terms"):
    # do something
    pass
----------------------------------------------------------------------

I think that would be very useful.


d = Document(TextField("foo bar bang"), Keyword("genre", "punk"))
idx.index(d)
for result in idx.search("foo genre:punk"):
    print result


Maybe it's just me, but I don't know what that is doing.

It's a sample of how one uses Xapwrap. Sorry I wasn't clear about that.

I'll take a
guess and say you're adding "foo bar bang" (term generated, stemmed),
and then a term you're intending to be used in a boolean
fashion.

Yep

Looks to me like you don't want to use the omega term style
here, because you'd have to write more code to set the
genre->something mapping, and then pass that to the index for when you
run searches.

Xapwrap manages those mappings for you. That's one of the really nice things it does out of the box. When a document is indexed, the keys of "Keywords" are remembered in a dictionary and the query parser is automatically configured with the appropriate prefixes. You can either save/restore the mapping from a dictionary (which I use), or Xapwrap has support for storing its metadata in document==1 in the xapian database.
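To make the bookkeeping concrete, here is a minimal sketch of a keyword-to-prefix mapping that is remembered at indexing time and can be saved and restored as a dictionary. It is not Xapwrap's code: `PrefixMap` and its methods are hypothetical, and the "X" plus upper-cased field name scheme is an assumption modelled on the convention for user-defined prefixes.

```python
import json


class PrefixMap(object):
    """Remembers which keyword names have been seen and their term prefixes."""

    def __init__(self, mapping=None):
        self.mapping = dict(mapping or {})

    def prefix_for(self, name):
        # Assumed convention: "X" followed by the upper-cased field name.
        if name not in self.mapping:
            self.mapping[name] = "X" + name.upper()
        return self.mapping[name]

    def term(self, name, value):
        # Build the boolean term that would be stored for a keyword.
        return self.prefix_for(name) + value.lower()

    def dumps(self):
        # Serialise the mapping so it can be stored alongside the index.
        return json.dumps(self.mapping, sort_keys=True)

    @classmethod
    def loads(cls, text):
        return cls(json.loads(text))


prefixes = PrefixMap()
genre_term = prefixes.term("genre", "punk")
restored = PrefixMap.loads(prefixes.dumps())
```

A search front end would then hand each `name -> prefix` pair in `restored.mapping` to the query parser, so that `genre:punk` in a query string resolves to the same term that was indexed.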

As to term generation, the procedure you explained in your previous message (very thoroughly, thank you!) to generate terms according to the same capitalization convention is done by Xapwrap already, at least as far as I can tell (there are probably some differences). It handles and prefixes text and keywords and has classes for terms and values as well.

In this case, your Index.search() method won't be able
to use the QueryParser.

What do you expect:

print result

to return? You haven't given the underlying Xapian document anything
to display...

By default in Xapwrap it just prints the score and document id. You can access values of the document from the result, so if title was an existing value:

print result['values']['title']

would print the title of the matching document.

Note that Xapian doesn't currently include term generators for
indexing in the library. There has been discussion of this, which
might take care of the fact that, in the first two lines, you're
asking for an indexer.

Right. While the term generation explanation makes sense once it's explained, it's a tough concept for a new user to jump over right away.

Xapwrap is, as I understand, intended to give a simple interface to
doing the most common types of indexing and searching. Searching is
something that could be easier, but if you're looking for indexing
facilities I'd use scriptindex unless I needed to do something fairly
sophisticated, in which case I'd want to be working at the raw term
level. If and when we have term generators (and possibly even
indexers) shipping as part of Xapian or a bundled extension, they
could easily appear in the bindings as well.

Well, that would be great. Xapwrap currently indexes terms along the lines of the capitalization scheme you described, so I don't think it would be difficult to move my concepts from one to the other.

Again, maybe that's me - I like to know what's going into my database :-)

Xapwrap does not hide the database from you. The index classes it provides are just wrappers, and the database objects themselves are easily accessed via the 'db' attribute. It gives you quite a bit of flexibility over which terms get generated, and when that fails, hey, it's Python. ;) But I know how you feel; libraries of a very raw nature have their own set of risks that are unacceptable for many applications.


(Btw, there are python libraries in core that make you do pretty much
as much work; they tend to have simpler access methods for common
tasks, which is probably almost all of what we're lacking. Just be
thankful we're not doing XML generation under Java :-)

every day!

-Michel

