Re[2]: question about russian stemmer: msg#00003 (search.snowball)
On Fri, 13 Feb 2004, "Yuri" wrote:

> Hello
>
> > The English stemmer gives a scheme for including exceptions, which you might
> > try and adapt to the Russian stemmer if the "Kiev" case was sufficiently
> > important.
>
> I'm not a linguist, I'm just a programmer. SBL definitions look very unfamiliar
> to me; I will try to find out where to put exceptions. Maybe you can
> help me with how to add just one exception: stem Kiev => Kiev.
>
> Or, if it is hard, as a workaround I will make my own stemmer subclass which
> looks for exceptions and uses them, or, if the word is not listed in the
> exceptions, calls Snowball.

I think subclassing our Perl interface would be the best way.
There are too many exceptions, so it would be impractical if everybody's
complaints resulted in modifying the Snowball rules.

> > You must of course realise that the stemmers are not 100% accurate, and a
> > certain rate of error is inevitable. These errors do not necessarily degrade
> > retrieval performance however (see the Introduction to Snowball).
> >
> > Are there many other words that mis-stem in a similar way?
>
> No, this was the first and only problem (at least for now).
> I'm writing a search engine which indexes all the words in a text.
> I noticed that when I search for "Kieva" (Kiev's or 'of Kiev' in English),
> my search engine does not find text containing the word "Kiev".
>
> When I started to look for the error, I found that the stemmer
> stems 'Kiev' as 'Ki', so stem('Kiev') != stem('Kieva')
> ('Ki' != 'Kiev').
>
> Thank you
>
> PS. I'm sorry for my English.
>
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss@xxxxxxxxxxxxxxxxxx
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
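
The workaround discussed in this thread (look the word up in an exception list first, and call Snowball only if it is not listed) could be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl interface mentioned above. The names `ExceptionStemmer` and `toy_stem` are hypothetical, and `toy_stem` is only a stand-in that reproduces the two results reported in the thread ('Kiev' => 'Ki', 'Kieva' => 'Kiev'); it is not the Snowball Russian stemmer. In real use you would pass in the stemming function of whatever Snowball binding you have.

```python
from typing import Callable, Dict


class ExceptionStemmer:
    """Wrap a stemming function and consult an exception table first."""

    def __init__(self, base_stem: Callable[[str], str],
                 exceptions: Dict[str, str]) -> None:
        self._base_stem = base_stem
        # Lowercase the keys so that the lookup is case-insensitive.
        self._exceptions = {w.lower(): s for w, s in exceptions.items()}

    def stem(self, word: str) -> str:
        key = word.lower()
        # Listed words bypass the underlying stemmer entirely.
        if key in self._exceptions:
            return self._exceptions[key]
        return self._base_stem(key)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for the real stemmer (NOT Snowball).

    It only mimics the two transliterated results quoted in the thread
    and returns every other word unchanged.
    """
    return {"kiev": "ki", "kieva": "kiev"}.get(word, word)


# One exception, as requested in the thread: stem Kiev => Kiev.
stemmer = ExceptionStemmer(toy_stem, {"kiev": "kiev"})

if __name__ == "__main__":
    for w in ("Kiev", "Kieva"):
        print(w, "=>", stemmer.stem(w))
```

With the single exception in place, both forms map to the same stem, so a search for "Kieva" would match text containing "Kiev". Keeping the exception table outside the stemmer means the Snowball rules themselves stay untouched, which is the point made in the reply above.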