|
|
Choosing A Webhost: |
Re: RDBMS schema (MySQL): msg#00011audio.freedb.devel
Hello, Thursday, August 29, 2002, 6:21:10 PM, Ryan Fox wrote: > I've created a schema for cddb data storage that I submit for > comments/feedback. It is below, followed by my comments. In case it > gets mangled, it's also available at > http://www.amerisuk.com/~rfox/cddb/schema.sql . If you like, you can > plop it into MySQL (version >= 3.23.23) using a command like 'mysql -u > root -p < ./schema.sql' after you save it to a file. Hmm, I'm not sure if I understand everything correctly (I'm pretty sure that I don't) so here are my questions: > create database cddb; > use cddb; > create table disc ( > id int unique auto_increment not null, > length int not null, > discid char(8) not null unique primary key, You cannot make the discid a unique key - currently the discid/genre category pair is unique, which is certainly not optimal. As there are lots of collisions we would loose too many entries by using that solution - this is simply not feasible. You would need a new discid for that! > revisionlead varchar(255), > revisionversion varchar(255) not null, > revisionrelease varchar(255), > revisionlevel varchar(255), What are these fields meant for? An entry has a revision, but why 4 different fields titles revision-something? btw: We also need to store the name of the submitting program and it's version. Perhaps we should also store when an entry was submitted (important for generating update-archives (we should still think about releasing the database archives). > artist varchar(255) not null, > title varchar(255) not null, > releaseyear decimal(4), > maingenre char(16) not null, > genre varchar(255) not null, > numtitles int not null, > extendeddata text, ok. > playorder varchar(255), We don't need to store anything here - the PLAYORDER= line is always empty. > fulltext key(artist), fulltext key(title), fulltext (artist,title), > key(maingenre), key(genre), key(releaseyear), key(discid) > ); Looks like you want to implement search via the database. As database searches can be pretty slow (especially for left hand wildcard searches) we should IMHO rather use Yuri's method for search. > create table discoffsets ( > disc_id int not null, > subid int not null, > offset int not null, > key(disc_id), key(subid) > ); > create table disctracks ( > disc_id int not null, > number int not null, > name varchar(255) not null, At least currently track names can have a zero length. This is useful e.g. for "fake tracks" like on CDs, which have a bonus track on track 99. We should also have a separate field for track artist name - e.g for samplers. > extendeddata text, > key(disc_id), key(name) > ); > There are 3 tables. Disc, discoffsets, and disctracks. Disc contains > info such as album title, genre, release year, etc., and there would be > 1 record per album. In the interest of data normalization, I think > artist name and genre should have separate tables. This would make it > easier to search for albums by the same artist or genre, as well as use > less disk space. I did not implement the disc table in this fashion > because of the difficulty in converting from the current format to that > normalized format. It would probably be easier if we moved from the > current format to this intermediate sql format, and then moved towards > data normalization in a later version. For the production system we shouldn't make changes too often - as all the mirrors have to do them and we need to verify that the changes were done properly etc. Question is, if we really need this separate tables, if we use the method proposed by Yuri for searching. > The discoffsets table references an id of an album, and a subid (more on > that in a minute), and an offset. There would be 1 record per album, > per unique variation, per offset. For example, if the cddb database > contained to different versions of album A, there would be 1 record in > the disc table (as all of that info is the same between the 2 albums), > and 1 set of records in the disc offset table (1 per offset) for each > version of the album. So far so good. But have you thought about how you want to find the matching entry easily? At least for exact matches the server software needs to find a certain discid - which may not be the discid of the master CD. searching in the discoffsets table for that might not be optimal, I think. > Using this method, this should take care of the current linked album > discussion, as well aas move us down a path towards better data storage. > Hopefully I'm not too far off track with this. Please respond with > thoughts/questions/flames/whatever. :) Well, this were my first thoughts. Now you can flame me or whatever ;) - Joerg
|
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | RDBMS schema (MySQL), Ryan Fox |
|---|---|
| Next by Date: | Re: Searching, Joerg Hevers |
| Previous by Thread: | RDBMS schema (MySQL), Ryan Fox |
| Next by Thread: | Re: RDBMS schema (MySQL), Ryan Fox |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
Free MagazinesCisco NewsReceive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business. subscribe Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field. subscribe The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business. subscribe Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company. subscribe Total Telecom Total Telecom is "The Economist of the communications industry". subscribe |