Re: Testing a stemmer (msg#00007, search.snowball)
Yes, I'm sorry I wasn't exactly clear. What I mean is that I have finished the Hungarian stemmers and I need to create a word list to submit with them to the Snowball site. I've downloaded the libstemmer_c package, where I found a program called stemwords.c which can be used to stem an entire list of words. Unfortunately I don't really understand how I can use it with a stemmer of my own making. The modules.h file in the libstemmer directory says I can't edit the file manually, and that the module names come from a mkmodules.pl file, which isn't in the package. In other words, is there some way I can insert the C version of my stemmer somewhere so I can stem a word list using this package?

Thank you

Anna Tordai

> This seems to be a very general question, as it raises the whole issue of
> stemmer evaluation! But the simplest test is to arrange for two column
> output (http://snowball.tartarus.org/algorithms/german/diffs.txt etc) and
> inspect it by hand.
>
> It can be useful to work with a similar list, sorted from the end of the
> word (a reverse index). Lovins mentions using these two lists in her early
> paper.
>
> Martin
>
>>Dear Snowball people,
>>
>>What would be the simplest way of testing a self-made stemmer on a word
>>list? That is, is there some kind of testing program?
>>
>>Thank you,
>>
>>Anna Tordai
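The two checks Martin describes, two-column output and a list sorted from the end of the word, can be sketched in a few lines of Python. This is only an illustration and not part of libstemmer_c or the Snowball distribution: `toy_stem` is a hypothetical placeholder that strips a trailing "s", and the helper names `two_column`, `reverse_index` and `format_columns` are invented for this example. In practice `toy_stem` would be replaced by a call into the stemmer under test.

```python
from typing import Callable, Iterable, List, Tuple


def toy_stem(word: str) -> str:
    """Hypothetical placeholder stemmer (NOT a Snowball algorithm):
    strips one trailing 's' from words longer than three letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def two_column(words: Iterable[str], stem: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each input word with its stem, preserving input order."""
    return [(w, stem(w)) for w in words]


def reverse_index(words: Iterable[str]) -> List[str]:
    """Sort words by their reversed spelling, so words sharing an
    ending (and hence likely a suffix) end up next to each other."""
    return sorted(words, key=lambda w: w[::-1])


def format_columns(pairs: List[Tuple[str, str]]) -> str:
    """Render (word, stem) pairs as two left-aligned text columns."""
    width = max((len(w) for w, _ in pairs), default=0)
    return "\n".join(f"{w.ljust(width)}  {s}" for w, s in pairs)


if __name__ == "__main__":
    vocabulary = ["houses", "house", "mouse", "cats", "cat"]
    # Alphabetical list for a general look at the output ...
    print(format_columns(two_column(sorted(vocabulary), toy_stem)))
    print()
    # ... and the reverse index for inspecting suffix handling.
    print(format_columns(two_column(reverse_index(vocabulary), toy_stem)))
```

Reading the reverse-sorted listing by hand makes it easier to spot an ending that is being stripped inconsistently, since all words with that ending appear together.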