Re: Snowball French stemming (msg#00011, search.snowball)
Hi Martin,

Thanks for the reply. I read the introductory page on the Snowball site this morning and learned that the stemmers are not perfect... :o)

I guess the problem I encountered was about consistency. As I mentioned in my original email, I am using this French stemmer with OpenFTS. After all the texts have been indexed and stored, I would expect (perhaps wrongly) that a search for the word "français" would bring up all documents containing either "français" or "française", and, conversely, that a search for "française" would also bring up documents containing "français". But because of how the algorithm works, the two words are stemmed differently, so the search did not return the expected results. So perhaps both words should be stemmed to "franç" or "franc", just to be consistent?

On the other hand, maybe this word and its feminine form are just a special case (e.g. I tried "provençale" and "provençal", and both were stemmed to "provençal"). In any case, I have already made a note that this may be something I have to live with once my application is implemented.

Fred

----- Original Message -----
From: <martin.porter@xxxxxxxxxxxxxxx>
To: "Fred Fung" <fred.fung@xxxxxxxxxxxxx>; <snowball-discuss@xxxxxxxxxxxxxxxxxx>
Sent: Friday, December 12, 2003 11:04 AM
Subject: Re: [Snowball-discuss] Snowball French stemming

> Fred,
>
> Of course, the stemmers are not perfect, so errors of this type will happen.
> Even so, there does seem to be room for improvement. -ais is a verb ending,
> which is why it is taken off français (the stemmer does not know this is not a
> verb form). But it is also a common adjectival form: japonais, anglais,
> français, etc., and might be removed accordingly (mauvais is an exception here).
> If I made this change, would you be interested?
>
> Martin
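To make the consistency problem concrete, here is a toy sketch in Python. It is NOT the Snowball French algorithm: `toy_stem`, its suffix rules, the `TOY_EXCEPTIONS` set, `build_index`, `search` and the two sample documents are all hypothetical. They only mimic the behaviour described in this thread (a verb-style rule strips -ais from "français" but leaves "française" with a different stem) and the kind of adjectival -ais/-aise rule Martin suggests, with "mauvais" kept as an exception.

```python
from collections import defaultdict

# Toy illustration only: this is NOT the Snowball French stemmer.
# Hypothetical words whose -ais ending is not a suffix (cf. "mauvais").
TOY_EXCEPTIONS = {"mauvais"}


def toy_stem(word: str, adjectival_ais: bool = False) -> str:
    """Strip a few hypothetical French suffixes from a lowercase word."""
    if word in TOY_EXCEPTIONS:
        return word
    if adjectival_ais:
        # Hypothetical adjectival rule: treat -ais/-aise/-aises as one family
        # so masculine and feminine forms end up with the same stem.
        for suffix in ("aises", "aise", "ais"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
    # Hypothetical verb-ending rule: -ais is removed, but -aise is not.
    if word.endswith("ais") and len(word) > 5:
        return word[:-3]
    # Hypothetical final -e removal, so "française" becomes "français".
    if word.endswith("e") and len(word) > 2:
        return word[:-1]
    return word


def build_index(docs, stem):
    """Map each stem to the set of document ids containing a word with it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem):
    """Return sorted ids of documents whose words share the query's stem."""
    return sorted(index.get(stem(query.lower()), set()))


if __name__ == "__main__":
    docs = {1: "le français", 2: "la cuisine française"}
    variants = [
        ("verb rule only", toy_stem),
        ("adjectival rule", lambda w: toy_stem(w, adjectival_ais=True)),
    ]
    for label, stem in variants:
        index = build_index(docs, stem)
        # verb rule only:  [1] [2]       (the two forms get different stems)
        # adjectival rule: [1, 2] [1, 2] (both forms share the stem "franç")
        print(label, search(index, "français", stem), search(index, "française", stem))
```

With only the verb-style rule, each query finds just the document containing its own form; with the hypothetical adjectival rule, both queries find both documents. The exception set illustrates why such a rule would need a list of words like "mauvais", where -ais is part of the stem rather than a suffix.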