Question about Porter2 Step 4 (search.snowball, msg#00026)
Hi folks,

This is what the Porter2 definition (http://snowball.tartarus.org/english/stemmer.html) has to say about a part of Step 4:

> Search for the longest among the following suffixes, and,
> if found and in R2, perform the action indicated.
>
> ... (removed the non-relevant part of step 4)
>
> ion
> delete if preceded by s or t

When I feed the word "unquestionably" to my stemmer, it returns "unquest", while the provided sample list of stemmed words shows the word being stemmed to "unquestion" (and so does http://snowball.tartarus.org/demo.php?words=unquestionably).

When step 4 kicks in, this is what the word looks like:

    u n q u e s t i o n
        |       |     |
                R2------
        R1--------------

According to the Porter2 definition described on the site, "ion" should be removed because it is preceded by a "t" and is located in R2.

Have the step 4 rules been changed, or has the provided dictionary/stemmed list (and demo) not been updated for the Porter2 method? What should I do?

Thanks

Best regards,
Håvard Lindset
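
The region diagram and the quoted rule can be checked with a small sketch. This is only an illustration under stated assumptions, not the reference Snowball implementation: the helper names (`region_start`, `r1_r2`, `step4`) are made up for this example, `y` is always treated as a vowel, the special R1 prefixes from the definition are ignored, and the suffix list is my reading of the Step 4 table on the definition page. It shows that the rule applied to "unquestion" does give "unquest", while a single pass of Step 4 over "unquestionable" removes "able" and stops. Whether the word entering Step 4 is "unquestion" or "unquestionable" depends on the earlier steps and is not settled by this message.

```python
VOWELS = set("aeiouy")  # assumption: y always counts as a vowel here

# Suffixes that Step 4 simply deletes (as read from the definition page).
STEP4_DELETE = ("al", "ance", "ence", "er", "ic", "able", "ible", "ant",
                "ement", "ment", "ent", "ism", "ate", "iti", "ous", "ive", "ize")


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Start indexes of R1 and R2 (R2 is found by the same rule inside R1)."""
    r1 = region_start(word)
    r2 = region_start(word, r1)
    return r1, r2


def step4(word):
    """One pass of Step 4: find the longest matching suffix, act only if it
    lies in R2, then stop (no second suffix is tried)."""
    _, r2 = r1_r2(word)
    for suffix in sorted(STEP4_DELETE + ("ion",), key=len, reverse=True):
        if word.endswith(suffix):
            cut = len(word) - len(suffix)
            in_r2 = cut >= r2
            # "ion" is deleted only when preceded by s or t
            ion_ok = suffix != "ion" or word[cut - 1:cut] in ("s", "t")
            return word[:cut] if in_r2 and ion_ok else word
    return word


r1, r2 = r1_r2("unquestion")
print("unquestion"[r1:], "unquestion"[r2:])  # the R1 and R2 regions
print(step4("unquestion"))
print(step4("unquestionable"))
```

With these assumptions, R1 is "question" and R2 is "tion", which agrees with the diagram above.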