Re: a simple algorithm problem (search.snowball, msg#00005)
On Thu, Jan 06, 2005 at 10:20:43AM +0000, Martin Porter wrote:

> >Presumably this still restricts Snowball to code points in the BMP? Or
> >does it just restrict it to recognising and doing things with
> >characters at code points in the BMP, passing through any others?
>
> It would be the latter. Since stemming is applicable to a system of
> languages, all of whose characters are, I would assert, in the BMP, I do
> not think that is a problem.

I'd agree that, right now, acting on code points outside the BMP shouldn't be needed. Provided other characters are passed through, it will cope happily with anything I can reasonably think of throwing at it :-)

> >What's the character encoding of snowball scripts at the moment?
>
> The scripts themselves are in ASCII, and ASCII assumptions are made in the
> Snowball compiler.

So when you were talking about strings being in UTF-8, were you talking about input and output only? I wasn't aware the concept of 'string' applied to anything other than things in the Snowball language itself ... or do you mean that strings would be stored internally to a running snowball stemmer in UTF-8?

Cheers,
James

--
/--------------------------------------------------------------------------\
James Aylett                                                      xapian.org
james@xxxxxxxxxxxx                                   uncertaintydivision.org
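
For anyone following the thread, here is a rough sketch of what "act on BMP code points, pass through the rest" could look like. It is in Python rather than Snowball, and it is not how the Snowball runtime is implemented; `is_bmp` and `transform_bmp_only` are names made up for this illustration, and `str.upper` merely stands in for whatever a stemmer would do to a character.

```python
def is_bmp(ch: str) -> bool:
    """True if the single character's code point is in the BMP (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def transform_bmp_only(text: str, transform) -> str:
    """Apply `transform` to BMP characters; copy all others through untouched.

    Hypothetical helper for illustration only, not part of Snowball.
    """
    return "".join(transform(ch) if is_bmp(ch) else ch for ch in text)


# "naive" with a diaeresis (all BMP) followed by U+1D11E, which lies outside the BMP.
word = "na\u00efve\U0001D11E"

# Only the BMP characters are handed to the transform.
result = transform_bmp_only(word, str.upper)

# In UTF-8, BMP code points take 1 to 3 bytes; code points beyond the BMP take 4.
utf8_lengths = [len(ch.encode("utf-8")) for ch in word]
```

The byte lengths bear on the UTF-8 question: a stemmer that stores strings internally as UTF-8 has to step over whole multi-byte sequences even for characters it never inspects.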