Re[2]: question about russian stemmer (msg#00002, search.snowball)
Hello

> The English stemmer gives a scheme for including exceptions, which you might
> try and adapt to the Russian stemmer if the "Kiev" case was sufficiently
> important.

I'm not a linguist, I'm just a programmer. The SBL definitions look very unfamiliar to me, but I will try to find out where to put exceptions. Maybe you can help me with how to add just one exception: stem Kiev => Kiev. Or, if that is hard, as a workaround I will make my own stemmer subclass which looks the word up in a list of exceptions and, if the word is not listed there, calls Snowball.

> You must of course realise that the stemmers are not 100% accurate, and a
> certain rate of error is inevitable. These errors do not necessarily degrade
> retrieval performance however (see the Introduction to Snowball).
>
> Are there many other words that mis-stem in a similar way?

No, this was the first and only problem (at least for now). I'm writing a search engine which indexes every word in the text. I noticed that when I search for "Kieva" (Kiev's, or 'of Kiev' in English), my search engine does not find text containing the word "Kiev". When I started looking for the error, I found that the stemmer stems 'Kiev' as 'Ki', so stem('Kiev') != stem('Kieva') ('Ki' != 'Kiev').

Thank you

PS. I'm sorry for my English.
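The workaround described in the message (check a list of exceptions first, and call the stemmer only if the word is not listed) could be sketched roughly as below. This is a minimal illustration in Python, not code from the thread: the `ExceptionStemmer` class and the `toy_stem` function are hypothetical names, and `toy_stem` is only a stand-in that reproduces the two results reported above ('Kiev' to 'Ki', 'Kieva' to 'Kiev'). It is not the Snowball Russian algorithm. In practice the real stemmer's stemming function would be passed in instead.

```python
from typing import Callable, Dict, Optional


class ExceptionStemmer:
    """Wrap a stemming function with an exception table that is checked first."""

    def __init__(self, base_stem: Callable[[str], str],
                 exceptions: Optional[Dict[str, str]] = None) -> None:
        self._base_stem = base_stem
        # Keys are lowercased so that the lookup is case-insensitive.
        self._exceptions = {k.lower(): v for k, v in (exceptions or {}).items()}

    def stem(self, word: str) -> str:
        key = word.lower()
        # Listed words bypass the underlying stemmer entirely.
        if key in self._exceptions:
            return self._exceptions[key]
        # Everything else is delegated to the wrapped stemmer.
        return self._base_stem(key)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strips at most one suffix.

    Assumption for illustration only; it merely mimics the behaviour
    reported in the message and is NOT the Snowball Russian stemmer.
    """
    for suffix in ("ev", "a"):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


# One exception, as asked for in the message: stem Kiev => Kiev.
stemmer = ExceptionStemmer(toy_stem, {"kiev": "kiev"})

print(toy_stem("kiev"), toy_stem("kieva"))            # unwrapped stems differ
print(stemmer.stem("Kiev"), stemmer.stem("Kieva"))    # wrapped stems agree
```

With the single exception in place, both forms map to the same stem, so a search for "Kieva" would match text indexed under "Kiev". The wrapper leaves the stemmer itself untouched, which may be easier than editing the SBL definitions, at the cost of maintaining the exception list by hand.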