Re: Bug report?: msg#00011

search.snowball

Subject: Re: Bug report?



Alexander,

Yes, I was aware of this, and should explain:

The Porter stemmer, as originally defined, reduces "s" to null, and is
faithfully implemented in the stemmer at

http://snowball.tartarus.org/porter/stemmer.html

The version of the Porter stemmer which I distributed for many years
stems "s" to "s" however. This is because it has a couple of
improvements (points of DEPARTURE) from the published algorithm which
everyone has come to accept. These improvements are in the slightly
different version of the stemmer at

http://www.tartarus.org/~martin/PorterStemmer/

and are clearly marked DEPARTURE in the comments in the ANSI C version of the
stemmer - as well as being described in the accompanying text.
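
To make the difference concrete, here is a minimal Python sketch of step 1a
of the published algorithm, next to a variant with a length guard of the kind
that leaves "s" unchanged. The function names and the min_length threshold
are illustrative assumptions, not taken from the distributed sources, and the
later steps of the stemmer are omitted.

```python
def step_1a_published(word: str) -> str:
    """Step 1a as published: SSES -> SS, IES -> I, SS -> SS, S -> (null)."""
    if word.endswith("sses"):
        return word[:-2]          # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]          # ponies -> poni
    if word.endswith("ss"):
        return word               # caress -> caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat, but also "s" -> ""
    return word


def step_1a_guarded(word: str, min_length: int = 3) -> str:
    """Same rules, but very short words bypass stemming entirely.

    min_length is an assumed threshold chosen for illustration.
    """
    if len(word) < min_length:
        return word
    return step_1a_published(word)


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats", "s"]:
        print(repr(w), repr(step_1a_published(w)), repr(step_1a_guarded(w)))
```

With the guard in place a one-letter input never reaches the S rule, so the
empty stem cannot arise.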

I can't alter this now, bugs or not, because of the status of the Porter
stemmer as a described algorithm, but the Snowball Porter2 stemmer fixes
these problems and many others besides.

I would agree that it is not helpful to stem "s" to null, but would not
agree that stemming to null is invariably bad (although none of the
Snowball stemmers on current release do so). See the notes introducing
the Russian stemmer.

I can't explain the problems you had with email, I'm afraid. I've certainly
received executables, and files containing viruses, as unwanted attachments,
within the past few months.

Martin

> I found a phrase
>
> "In any case a string of length 1 will be unchanged if passed
>through the algorithm".
>
>Indeed, I always thought a stemmer should NOT produce empty stems, no? This
>is very inconvenient in practice since it changes file formats, word counts,
>etc.
>
>However, it seems the algorithm does strip "s" -> "". (This is the only rule
>producing empty strings.) In effect, the program at
>http://snowball.tartarus.org/porter/stemmer.html does it; I attach the
>corresponding files (I found no way to send the executable due to
>paranoid antivirus software at Tartarus).
>
>Is this correct? Wouldn't you rather change the unconditional rule
>
>    S ->  (null)                   cats -> cat
>
>to
>
>    (*v or *c) S ->  (null)        cats -> cat
>
>Thank you!
>Alexander
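
For reference, the conditional rule proposed in the quoted message can be
sketched in Python as below. This reads the condition "(*v or *c)" as "the
remaining stem contains at least one letter", which is an interpretation and
not standard Porter notation; the function name is hypothetical, and the rule
is shown in isolation from the rest of step 1a.

```python
def strip_final_s(word: str) -> str:
    """Remove a final 's' only when a non-empty stem would remain."""
    if word.endswith("ss"):
        return word               # SS -> SS takes precedence in step 1a
    if word.endswith("s") and len(word) > 1:
        return word[:-1]          # cats -> cat
    return word                   # "s" stays "s"; no empty stem is produced


if __name__ == "__main__":
    for w in ["cats", "s", "as", "caress"]:
        print(repr(w), "->", repr(strip_final_s(w)))
```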

