A problem with replacing letters: msg#00015 (search.snowball)
Hello,

I'm working on a Hungarian stemmer and I have a problem I haven't been able to solve. The code is added below.

I have a routine called v_ending which replaces "a acute" and "e acute" by "a" and "e". If I simply delete them it works, but when I actually try replacing, instead of an "a" I get an "a acute". For instance, if I test it on the word "hagyásában" I ought to get "hagyása" (with "ban" removed and the a acute replaced), but I get "hagyásá". Similar things happen with a word like "kimenetelében".

I suspect I am missing something simple, but I just can't figure out what goes wrong.

Thank you
Anna Tordai

**************************

// Hungarian stemmer.

routines (
    mark_regions
    v_ending
    R1 R2
    case
)

externals ( stem )

integers ( p1 p2 )

groupings ( v )

stringescapes {}

/* special characters (in ISO Latin I) */

stringdef a'   hex 'E1'  // a-acute
stringdef e'   hex 'E9'  // e-acute
stringdef i'   hex 'ED'  // i-acute
stringdef o'   hex 'F3'  // o-acute
stringdef o"   hex 'F6'  // o-umlaut
stringdef oq   hex 'F5'  // o-double acute
stringdef u'   hex 'FA'  // u-acute
stringdef u"   hex 'FC'  // u-umlaut
stringdef uq   hex 'FB'  // u-double acute

// vowels
define v 'aeiou{a'}{e'}{i'}{o'}{o"}{oq}{u'}{u"}{uq}'

define mark_regions as (

    $p1 = limit
    $p2 = limit

    (gopast v (test substring among('cs' 'gy' 'sz' 'ty') setmark p1))
    or
    (goto v gopast non-v setmark p1)

    goto v gopast non-v setmark p2
)

backwardmode (

    define R1 as $p1 <= cursor
    define R2 as $p2 <= cursor

    define v_ending as (
        [substring] among(
            '{a'}' (<- 'a')
            '{e'}' (<- 'e')
        )
    )

    define case as (
        [substring] among(
            'ban'  // inessive
            'ben'  // inessive
        )
        delete
        v_ending
    )
)

define stem as (
    do mark_regions
    backwards (
        do case
    )
)
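To make the expected results concrete, here is a small Python sketch of what the `case` and `v_ending` routines are meant to produce together. It is only a reference model for checking stemmer output, not a description of how the Snowball runtime works and not a diagnosis of the problem. The function name `expected_stem` and the two lookup tables are hypothetical names introduced for this illustration. Like the posted code, it does not consult R1 or R2.

```python
# Hypothetical reference model of the intended behaviour (not the Snowball runtime).
CASE_SUFFIXES = ("ban", "ben")       # inessive endings listed in the `case` routine
V_ENDING = {"á": "a", "é": "e"}      # replacements listed in the `v_ending` routine


def expected_stem(word: str) -> str:
    """Return the stem the posted routines are intended to produce for `word`."""
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix):
            word = word[: -len(suffix)]            # corresponds to `delete` in `case`
            if word and word[-1] in V_ENDING:      # `v_ending` only runs after a deletion
                word = word[:-1] + V_ENDING[word[-1]]
            break
    return word


if __name__ == "__main__":
    for w in ("hagyásában", "kimenetelében"):
        print(w, "->", expected_stem(w))
```

Comparing the stemmer's actual output against this model on a word list would show which inputs keep the accented vowel.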