Re: Spanish stemmer with accents stripped before stemming: msg#00005 (search.snowball)
Hi, Martin,

Thank you very much for your replies...

> Obviously to us, it is a bit easier to look at the problem
> from the snowball angle, rather than think about the generated java
> after it's been put inside lucene! As far as the snowball script is
> concerned, I believe you could strip out accents from the source,
> eliminate the duplicate strings in the amongs(..) that would result, and
> recompile, getting the effect you want.

OK... We just tried that, and it works very well so far! Thanks a bunch.

> (Incidentally, I have hit this problem with Spanish stemming before, but
> it was a long while ago -- before the development of snowball.)

Accents are the boogeyman of day-to-day written Spanish usage, and it's hard to imagine an effective search engine that obliges users to type them correctly.

> Thinking about it further, this will not work, since the strings are
> placed in tables which would need to be fully reorganised if any of the
> characters in the strings were readjusted. (It is the way a snowball
> 'among' is implemented.)
>
> The only way to do this is to modify the stem.sbl file for Spanish,
> regenerate the java code with the snowball compiler (which you can
> download) and replace the old java with the new in your application.

Ah... So now we know why our previous attempt failed.

It occurs to me that perhaps it would be a good idea to modify Snowball's Spanish stemmer to accept both accented and accent-stripped input.

Greetings,
Andrew Green

P.S. Our little server was down during the weekend (sorry) and is back online again, though now a commit error to the repository has made the relevant files difficult to access. They seem less relevant now, I suppose.
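
For anyone reading this thread later, the accent-stripping step discussed above might be sketched roughly as follows. This is only an illustration in Python, not the Snowball script or the generated Java: the function names `strip_spanish_accents` and `dedupe_among_strings` are hypothetical, and the sample suffix strings are chosen for illustration rather than copied from the Spanish stem.sbl. It assumes that only the acute-accented vowels and ü should be folded, and that ñ should be kept as a separate letter.

```python
import unicodedata

# Hypothetical mapping: fold the accented vowels used in Spanish to plain
# vowels. The letter ñ is deliberately left out, since it is a distinct
# letter rather than an accented n.
_ACCENT_MAP = str.maketrans("áéíóúüÁÉÍÓÚÜ", "aeiouuAEIOUU")


def strip_spanish_accents(text: str) -> str:
    """Return text with Spanish accented vowels replaced by plain vowels."""
    # Compose first so that 'e' + combining acute becomes the single
    # character 'é', which the translation table can then match.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_ACCENT_MAP)


def dedupe_among_strings(strings):
    """Strip accents from suffix strings and drop the duplicates that
    result, keeping the first occurrence of each string in order."""
    seen = set()
    result = []
    for s in strings:
        folded = strip_spanish_accents(s)
        if folded not in seen:
            seen.add(folded)
            result.append(folded)
    return result
```

The same `strip_spanish_accents` function would be applied to both indexed text and user queries, so that accented and unaccented spellings reach the recompiled stemmer in the same form.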