
Re: Snowball French stemming: msg#00011

search.snowball

Subject: Re: Snowball French stemming

Hi Martin,

Thanks for the reply. I just read the introductory page on the Snowball
site this morning and learned that the stemmers are not perfect ..... :o)

I guess the problem I encountered was about consistency. As I mentioned in
my original email, I am using this French stemmer with OpenFTS. After all
the texts have been indexed and stored, I would expect (perhaps wrongly)
that if I search for the word "français", all documents containing the
words "français" and "française" would come up. Or, conversely, a search
for the word "française" would bring up documents containing the word
"français" as well. But because of how the algorithm works, the stems of
these two words are different, so the search results did not come up as
expected.
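
To illustrate the effect, here is a minimal sketch in Python. It is not the
Snowball French algorithm and does not use OpenFTS: toy_stem, build_index and
search are hypothetical helpers, and the two suffix rules are assumptions
chosen only to reproduce the behaviour described in this thread (-ais removed
as if it were a verb ending, a final -e removed otherwise).

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical suffix stripper, NOT the real Snowball French stemmer."""
    if word.endswith("ais"):   # assumed rule: -ais treated as a verb ending
        return word[:-3]
    if word.endswith("e"):     # assumed rule: drop a residual final -e
        return word[:-1]
    return word


def build_index(docs):
    """Map each stem to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[toy_stem(word)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way and look it up in the index."""
    return sorted(index.get(toy_stem(query.lower()), set()))


docs = {
    1: "le vin français",
    2: "la cuisine française",
    3: "un plat provençal",
    4: "une salade provençale",
}
index = build_index(docs)

print(search(index, "français"))    # only document 1
print(search(index, "française"))   # only document 2
print(search(index, "provençal"))   # documents 3 and 4
```

Under these toy rules "français" and "française" end up under different index
keys, so each query misses the other form, while "provençal" and "provençale"
share one key and both documents are found.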

So probably both words should be stemmed to "franç" or "franc" just to be
consistent? On the other hand, maybe this word and its feminine form are
just a special case (e.g. I tried "provençale" and "provençal" and both were
stemmed to "provençal"). In any case, I have made a note that this may be
something I have to live with when my application is implemented.
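
As a rough sketch of what "consistent" could mean here, the Python fragment
below treats -ais and -aise as one adjectival ending so that both forms share
a stem. It is only an assumption about one possible rule, not a change to the
Snowball stemmer; stem_adjectival and EXCEPTIONS are hypothetical names, and
"mauvais" is listed as an exception because it is the one mentioned in the
quoted message below.

```python
# Hypothetical exception list: words ending in -ais that should be left alone.
EXCEPTIONS = {"mauvais"}


def stem_adjectival(word):
    """Toy rule: strip -ais / -aise as a single adjectival ending.

    This is an illustrative assumption, not the Snowball French algorithm.
    """
    # Reduce the feminine form to the masculine one first (-aise -> -ais).
    base = word[:-1] if word.endswith("aise") else word
    if base in EXCEPTIONS:
        return base
    if base.endswith("ais"):
        return base[:-3]
    return base


for w in ("français", "française", "japonais", "anglais", "mauvais"):
    print(w, "->", stem_adjectival(w))
```

With this rule both "français" and "française" reduce to "franç", and
"mauvais" is left untouched.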


Fred


----- Original Message -----
From: <martin.porter@xxxxxxxxxxxxxxx>
To: "Fred Fung" <fred.fung@xxxxxxxxxxxxx>;
<snowball-discuss@xxxxxxxxxxxxxxxxxx>
Sent: Friday, December 12, 2003 11:04 AM
Subject: Re: [Snowball-discuss] Snowball French stemming


> Fred,
>
> Of course, the stemmers are not perfect, so errors of this type will happen.
> Even so, there does seem to be room for improvement. -ais is a verb ending,
> which is why it is taken off français (the stemmer does not know this is not
> a verb form). But it is also a common adjectival form: japonais, anglais,
> français etc. and might be removed accordingly (mauvais is an exception here).
> If I made this change would you be interested?
>
> Martin

