Indexing multiple numbers in one field: msg#00042 lang.perl.modules.plucene
Before I learned all the details of how Plucene works, I devised an indexing scheme for a simple Web application that lets multiple users take, save, index and share notes. The notes live in a relational database, but I would use Plucene for text indexing.

My text indexing schema looked roughly like this:

    note     / UnStored (search default)
    title    / Text
    created  / Text
    updated  / Text
    user_ids / Text

...where user_ids consisted of multiple numbers separated by spaces. I need to track user_ids so that Sally doesn't see Jane's notes except in cases where Jane has set up a special sharing relationship.

I used Plucene::Plugin::Analyzer::PorterAnalyzer as the analyzer so I would get stemming for the notes.

Of course, this does not work. First, Plucene::Analysis::LetterTokenizer defines a token regex of /[[:alpha:]]+/, which does not match numbers. Second, Lingua::Stem::En::stem deletes anything that is not [A-Za-z].

I can define user_ids to be a Plucene::Document::Field of type Keyword, which works fine if there is only one user_id. But this fails to handle multiple user_ids.

I am curious for thoughts on the best course of action.

I can change user_ids to user_id (of type Keyword) and simply enter an instance of the document into the index for each associated user.

I can write my own analyzer, but this will entail replacing or editing not only LetterTokenizer but also Lingua::Stem::En. (Nevertheless, I'm leaning in this direction. I'd like people to be able to run a search like "Windows 3.0" or "perl5" and get a meaningful result.)

I can write a new field class, called Keywords, which would be just like Keyword except that its value is tokenized on whitespace or some other optional field separator.

Or I could use a separate index for each user, but I have ruled out this approach because eventually some notes will be shared among quite a few users.

Thoughts?

Cheers
Ryan
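The tokenization problem described above can be reproduced in plain Perl, without Plucene installed. The sketch below applies the /[[:alpha:]]+/ pattern quoted in the post to a space-separated id list and compares it with a whitespace-based pattern of the kind a "Keywords" field or custom tokenizer might use. The tokens() helper, the \S+ pattern and the sample ids are assumptions made for illustration; none of them is part of Plucene.

```perl
use strict;
use warnings;

# Token pattern quoted in the post for Plucene::Analysis::LetterTokenizer.
my $letter_token = qr/[[:alpha:]]+/;

# Hypothetical alternative: every run of non-whitespace is one token.
my $whitespace_token = qr/\S+/;

# Hypothetical helper: return every substring of $text matching $pattern.
sub tokens {
    my ($pattern, $text) = @_;
    my @found = $text =~ /($pattern)/g;
    return @found;
}

my $user_ids = '17 42 108';    # made-up ids for illustration

my @by_letters    = tokens($letter_token,     $user_ids);
my @by_whitespace = tokens($whitespace_token, $user_ids);

# The letter pattern finds nothing in a purely numeric field.
printf "letter tokens:     %d\n", scalar @by_letters;

# The whitespace pattern keeps each id as its own token.
printf "whitespace tokens: %s\n", join(',', @by_whitespace);

# Mixed input loses its digits under the letter pattern.
printf "perl5 becomes:     %s\n", join(',', tokens($letter_token, 'perl5'));
```

The same comparison suggests why searches such as "Windows 3.0" or "perl5" lose their numeric part under a letters-only tokenizer, before stemming is even applied.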