Re: Snowball API versioning: msg#00002

search.snowball

Subject: Re: Snowball API versioning

Oleg Bartunov wrote:
Hi there,

I'm asking about API versioning to let third-party products track
snowball changes. We use the snowball stemmer in our full-text search
engine for PostgreSQL. Currently it's an extension module, but we expect
it will become the built-in core FTS in the next release, which should
happen in two months. There were several (we noticed 2) API changes this
year. It's a headache!
I suggest following the standard major.minor versioning scheme.

Some kind of versioning would indeed be a good idea, but it's not clear to me which API changes you're referring to. As far as I can see, these are the ways in which the code available from snowball can change:

1. Internal changes to the compiler, resulting in the generated stemmer code being different, but behaving the same.
2. New features being added to the snowball language, but old .sbl files will still produce equivalent output.
3. Changes to the definition of the snowball language, resulting in .sbl files no longer producing equivalent output.
4. Changes to a snowball script, such that it produces different output.
5. Changes to the libstemmer interface (i.e., the libstemmer.h file, for C).


IIRC, there have been several changes of type 1, but none of type 2 or 3 in recent months/years. There have been no changes to libstemmer.h since August 2005.

Therefore, I suspect you're talking about changes of type 4. I would like to add versioning to the stemming algorithms at some point, such that each change to an algorithm increments the version number, but haven't had time to do this yet.

Also, I would like to modify the libstemmer interface such that the current version of a stemming algorithm can be obtained, and also such that a particular version of a stemming algorithm can be requested. It would also be possible to compile a version of libstemmer in which several old versions of a particular stemmer are available. This would allow a database to store the version number of the stemmer that was used for indexing, so that searches can use the same stemmer version. A newly created database, however, would simply use the latest stemmer version. Again, I simply haven't had time to do this yet.
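
To make the idea concrete, here is a rough Python sketch of how per-algorithm version numbers could be used. Everything in it is hypothetical: the StemmerRegistry class, its methods, and the two toy "stemmers" are invented for illustration and are not part of libstemmer or snowball, which currently offer nothing like this.

```python
from typing import Callable, Dict, Optional, Tuple

Stemmer = Callable[[str], str]


class StemmerRegistry:
    """Hypothetical registry of stemmer versions (not a libstemmer API)."""

    def __init__(self) -> None:
        # algorithm name -> {version number -> stemming function}
        self._versions: Dict[str, Dict[int, Stemmer]] = {}

    def register(self, algorithm: str, version: int, stem: Stemmer) -> None:
        self._versions.setdefault(algorithm, {})[version] = stem

    def latest_version(self, algorithm: str) -> int:
        return max(self._versions[algorithm])

    def get(self, algorithm: str,
            version: Optional[int] = None) -> Tuple[int, Stemmer]:
        """Return (version, stemmer); default to the latest version."""
        if version is None:
            version = self.latest_version(algorithm)
        return version, self._versions[algorithm][version]


# Two toy "algorithm versions" that deliberately produce different output.
def toy_v1(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


def toy_v2(word: str) -> str:
    if word.endswith("ing"):
        return word[:-3]
    return toy_v1(word)


registry = StemmerRegistry()
registry.register("toy", 1, toy_v1)
registry.register("toy", 2, toy_v2)

# A newly created database takes the latest version and records its number.
index_version, stem = registry.get("toy")

# A database indexed earlier with version 1 keeps requesting version 1,
# so query terms are stemmed the same way as the indexed terms.
_, old_stem = registry.get("toy", 1)
```

The point of the design is that the version number travels with the index: as long as the stored number is passed back in at search time, a change of type 4 in a later release cannot silently make old indexes and new queries disagree.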

For now, I recommend that you simply take a copy of libstemmer into your distribution, and update that static version of libstemmer as appropriate when you make new releases of your distribution.

I don't think that a major.minor versioning scheme would be appropriate here, but maybe you have something different in mind (in which case, please enlighten me).

--
Richard

