Re[2]: question about russian stemmer: msg#00002

search.snowball

Hello

> The English stemmer gives a scheme for including exceptions, which you might
> try and adapt to the Russian stemmer if the "Kiev" case was sufficiently
> important.

I'm not a linguist, I'm just a programmer. SBL definitions look very unfamiliar
to me; I will try to find out where to put exceptions. Maybe you can
help me with how to add just one exception: stem Kiev => Kiev.

Or, if that is hard, as a workaround I will make my own stemmer subclass
which first looks the word up in a list of exceptions and uses the listed
stem, or calls Snowball if the word is not in the list.
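A minimal sketch of the workaround described above, in Python, assuming the base stemmer is exposed as a plain function from word to stem. `ExceptionStemmer` is a hypothetical name, not part of Snowball; it simply checks a hand-written table before falling back to the algorithmic stemmer:

```python
from typing import Callable, Dict


class ExceptionStemmer:
    """Hypothetical wrapper (not part of Snowball) that consults an
    exception table before delegating to an algorithmic stemmer.

    Words found in `exceptions` (compared in lower case) get their listed
    stem; every other word is lower-cased and passed to `base_stem`.
    """

    def __init__(self, base_stem: Callable[[str], str], exceptions: Dict[str, str]) -> None:
        self._base_stem = base_stem
        # Normalise keys once so look-ups are case-insensitive.
        self._exceptions = {word.lower(): stem for word, stem in exceptions.items()}

    def stem(self, word: str) -> str:
        key = word.lower()
        # The exception table wins; otherwise fall back to the base stemmer.
        if key in self._exceptions:
            return self._exceptions[key]
        return self._base_stem(key)
```

Going by the behaviour reported in this message (Snowball already reduces "Kieva" to "Kiev" and only "Kiev" itself is over-stemmed), the single entry Kiev => Kiev should be enough to make both forms meet. With the `snowballstemmer` Python package one could, for example, pass `snowballstemmer.stemmer("russian").stemWord` as `base_stem` and the Cyrillic form of the entry as the table; that wiring is an assumption and has not been checked against this particular stemmer version.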

> You must of course realise that the stemmers are not 100% accurate, and a
> certain rate of error is inevitable. These errors do not necessarily degrade
> retrieval performance however (see the Introduction to Snowball).
>
> Are there many other words that mis-stem in a similar way?

No, this was the first and only problem (at least for now).
I'm writing a search engine which indexes every word in the text.
I noticed that when I search for "Kieva" ("Kiev's" or "of Kiev" in English),
my search engine does not find text containing the word "Kiev".

When I started looking for the error, I found that the stemmer
stems 'Kiev' as 'Ki', so stem('Kiev') != stem('Kieva')
('Ki' != 'Kiev').

Thank you

PS. I'm sorry for my English.

