Re: Searching: msg#00006

audio.freedb.devel

Subject: Re: Searching

Hi,

Wednesday, August 28, 2002, 12:57:08 AM, vilm0001@xxxxxxxxxxxxxxxxxxxxxxx wrote:

> OK, I hope this has the correct "From" field...

Yes, was correct :)

> About UTF-8: it doesn't affect the index, but it does affect the hashing
> function. Also, there has to be a decision as to whether to strip some UTF-8
> characters down to ASCII for misspelling/ASCII client reasons, or leave them
> as-is: e.g. a good part of the Latin-1 Supplement (0080-00FF) and most of
> Latin Extended-A (0100-017F) consist of "modified" letters which might
> often be replaced by their "base" letters in user input. IMHO these should be
> converted to their base characters in the fulltext search index, to improve
> the number of 'valid' matches. This doesn't mean we try to convert everything
> to ASCII (that would be silly at best), just that e.g. u with umlaut (00FC)
> ends up as a plain u. The diacritical marks (0300-036F I think) can just
> be dropped. I'm not sure what to do about things like the control pictures
> (2400-243F): should these be 'applied' and the result 'read', or are they
> intended to be 'read' verbatim?

I don't know enough about UTF-8 to be able to comment on this. :(
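
For illustration only, the folding proposed in the quoted paragraph could look roughly like the sketch below. It is written in Python rather than the C module discussed in this thread, and the function names fold_to_base and index_key are made up for the example. It assumes the text has already been decoded from UTF-8, and it only handles letters that Unicode can decompose into a base letter plus combining marks; letters such as U+00F8 (o with stroke) have no such decomposition and would need an explicit mapping table.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Return text with combining diacritical marks removed (hypothetical helper)."""
    # NFD splits precomposed letters such as U+00FC into a base letter
    # followed by combining marks (here "u" + U+0308).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop characters with a non-zero canonical combining class, which
    # includes the common marks in the U+0300-U+036F block.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose what is left so untouched characters keep their usual form.
    return unicodedata.normalize("NFC", stripped)


def index_key(text: str) -> str:
    """Folded, case-insensitive form for a search index (hypothetical helper)."""
    return fold_to_base(text).casefold()
```

The same function would be applied both to the indexed text and to the user's query, so that a query typed without the umlaut still finds the accented entry.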

> As regards the existing HTML search, I tried to give it a spin, but it dumps
> out with "Can't locate Net/freedb/file.pm" (which I assume is a freedb file
> parser from a freedb Perl binding?), and I couldn't find any reference to
> this in CVS or on the freedb site, and it doesn't seem to be part of the
> CDDB::File or Net::freedb Perl modules ... (help?)

Seems like you forgot to get the required stuff from the hyx-tools at
http://sourceforge.net/projects/hyx-tools
The p5-net-freedb package contains the necessary modules. lmd is also
needed - for generating the index.

> Linking files with different discids would probably be a good idea (e.g. a file
> could consist of one line: "LINK=<discid>", or maybe include track offsets too).

Yes, we should _definitely_ keep the track offsets of the entries to
be linked.
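
To make the idea concrete, here is a rough sketch of reading such a link file. Only the "LINK=<discid>" line comes from the proposal above; the "OFFSETS=" key, its comma-separated layout and the function name parse_link_file are assumptions invented for this example, not an agreed format. The sketch relies on freedb disc IDs being 8 hexadecimal digits.

```python
import re

# freedb disc IDs are 8 hexadecimal digits.
DISCID_RE = re.compile(r"[0-9a-f]{8}")


def parse_link_file(text):
    """Parse a hypothetical link file into (discid, offsets).

    Recognised lines (the OFFSETS key is an assumption for this sketch):
        LINK=<discid>
        OFFSETS=<frame>,<frame>,...
    """
    discid = None
    offsets = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines and anything without key=value
        key = key.strip()
        value = value.strip()
        if key == "LINK":
            candidate = value.lower()
            if not DISCID_RE.fullmatch(candidate):
                raise ValueError("not a valid discid: %r" % value)
            discid = candidate
        elif key == "OFFSETS":
            offsets = [int(v) for v in value.split(",") if v.strip()]
    if discid is None:
        raise ValueError("no LINK= line found")
    return discid, offsets
```

Keeping the offsets next to the link would let the server check later whether the linked entry still plausibly describes the same disc.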

> The "hard" ;) bit is what to do
> about it ... let the user confirm the match, link automatically if it is a
> good match, or set some sort of certainty threshold below which the user
> chooses? Anyway, this kind of basic database admin is a slightly different
> problem, it just happens to be a lot easier with fulltext searching...

I'd say link automatically if the match is "good enough" and the track
offsets are at least "fuzzy matching".
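
As a sketch of how that rule could be combined with the certainty threshold from the quoted text: the function names, the 0.9 threshold and the one-second tolerance below are all assumptions picked for illustration, and text_score stands for whatever similarity value the fulltext search ends up producing. Track offsets are taken to be in CD frames (75 per second).

```python
FRAMES_PER_SECOND = 75  # CD audio is addressed in frames, 75 per second


def offsets_fuzzy_match(a, b, tolerance_frames=FRAMES_PER_SECOND):
    """True if both discs have the same number of tracks and every pair of
    track offsets differs by at most tolerance_frames (assumed tolerance)."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tolerance_frames for x, y in zip(a, b))


def link_decision(text_score, offsets_a, offsets_b, auto_threshold=0.9):
    """Return "auto-link", "ask-user" or "no-link" (hypothetical policy)."""
    # Offsets that do not even roughly agree rule out a link entirely.
    if not offsets_fuzzy_match(offsets_a, offsets_b):
        return "no-link"
    # A good enough text match is linked without asking.
    if text_score >= auto_threshold:
        return "auto-link"
    # Below the threshold the user gets to confirm the match.
    return "ask-user"
```

For example, link_decision(0.95, [150, 20000], [152, 20010]) would link automatically, while the same offsets with a score of 0.5 would be handed to the user.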

> For the time being, I'll just write the module in C for ASCII as a standalone
> app for easy testing. I'll set it up so that integration into the server
> software will consist merely of adding a couple of function calls and including
> the module sources. You can probably expect a working prototype in a few days
> to a few weeks (I just started work experience, 9-5, 5 days/week, 20 weeks, and
> I'm still getting used to it after the light hours at Uni), which I'll put up
> on CVS or whatever so you can have a tinker with it.

Great :) If you want to use our CVS repository, please give me your
Sourceforge nick and I'll add you as a developer.

Well, it would be great to hear some comments from other people as
well ;)

- Jörg

