Re: Update for tv_grab_se for additional description (msg#00101, tv.xmltv.devel)
Hi!

Thanks for the patch.

There is now a new grabber for Sweden in CVS. The new grabber is called
tv_grab_se_swedb, and it grabs all its data from the site tv.swedb.se, which I
maintain together with Fredrik Högberg and Oscar Carlsson. We fetch data
directly from the press services of each TV company, process it, and upload it
to tv.swedb.se in xmltv format. This means that the tv_grab_se_swedb grabber
can be kept very simple and doesn't need to be updated just because some TV
company changes the format of its data.

The biggest drawback with tv_grab_se_swedb right now is that we don't have
data for SvT, but we hope to have that shortly. We only have to clarify the
rights issues with SvT.

tv.swedb.se contains data for the following channels:

* C More Film
* Canal+
* Canal+ Film1
* Canal+ Film2
* Canal+ Sport
* Kanal 5
* Ticket 1
* TV 1000
* TV 1000 Action
* TV 1000 Classic
* TV 1000 Family
* TV 1000 Nordic
* Tv 3
* TV4
* TV4 Film
* TV4 Med i tv
* TV4 Plus
* TV 8
* Viasat Explorer
* Viasat History
* Viasat Nature/Action
* Viasat Sport 1
* Viasat Sport 2
* Viasat Sport 3
* ZTV

We have start, stop, title and description for all channels.

If you want to have SvT as well, you can always do something like this for now:

  tv_grab_se_swedb > file1.xml
  tv_grab_se > file2.xml
  tv_cat file1.xml file2.xml > output.xml

You can download tv_grab_se_swedb from

http://cvs.sourceforge.net/viewcvs.py/xmltv/xmltv/grab/se_swedb/tv_grab_se_swedb?rev=1.1&view=auto

Please give it a try and report success or failure. I hope to include it in
the next release of xmltv, but I need more testers to be able to do that.

I will hold on to your patch and apply it if there is interest in keeping
tv_grab_se around. I hope to make tv_grab_se_swedb superior to tv_grab_se in
every way.

/Mattias

On Sun, 2004-11-14 at 01:04 +0100, Mårtensson Roger wrote:
> Hello XMLTV'ers.
>
> Being a very satisfied user I thought I should contribute something. :-)
> This is my first patch (ever), so please be gentle.
>
> tv_grab_se does not currently support <desc> for Swedish TV3 (and other
> Viasat channels). The homepage has the information, but the grabber does not
> do the extra page loading required.
>
> This is a diff file (diff -u) against tv_grab_se which adds description
> support for Swedish TV3 (and maybe all other Swedish Viasat channels
> currently supported; not tested, though).
>
> I'm not saying this is the absolute best way to add the support, but you have
> to start somewhere. :)
>
> The state of the patch is: It works for me
>
> It has one drawback: it will increase the load on the web server.
> This is because we have to grab a new web page for every program in the
> web guide. If the web page has 20 programs, then tv_grab_se will fetch at
> least 21 web pages.
>
> What I did:
> Added two new subroutines:
> * get_html_desc_viasat
>   Does almost the same as get_html_viasat except it builds the URL
>   differently (almost a raw cut and paste).
> * get_data_desc_viasat
>   Lots of cut and paste here too, from get_data_viasat, but it does the
>   searching differently.
> Changed one existing subroutine:
> * get_data_viasat
>   Now does the extra page loading required.
>
> For me it does what it is supposed to do. It is not long-time tested. I don't
> think it will break the other Viasat channels (those using the
> get_data_viasat subroutine), but that is not tested.
>
> There are some things I'm not so sure about that might need some more eyes.
>
> One of the things I have hard-coded is the way the description page is
> interpreted. I take for granted that the page contains only one description.
>
> The XPath search filter I use to get the node list is:
> "//tr/td[div/\@class='head']"
> Most of the time this is not a problem, and I assumed that this will only
> appear once.
> The problem is that sometimes there is no description, and in that case I
> return a warning. Should it return a warning? There are some programs that do
> not have a description, so the warning is not valid.
>
> I have also noted that sometimes the search returns more than one hit. I
> haven't looked at it in depth, so I'm not sure if it's two descriptions or
> only one. I return a warning for this too but, as above, it might not
> warrant a warning.
>
> The last thing I'm not so sure about, because I haven't studied the XMLTV
> DTD: can you have both <desc> and <url> at the same time?
>
> Anyway, here is my work and I hope some of it will be used. I want my
> descriptions! :)
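
The quoted patch asks how to treat zero, one, or several matches for the description cell. As a rough sketch only (in Python rather than the grabber's Perl, and assuming a hypothetical, well-formed page layout that is not taken from the real Viasat site), one option is to treat "no match" as a normal result and to flag only the ambiguous case. The function name extract_description and the sample markup are illustrative assumptions, not part of tv_grab_se.

```python
import xml.etree.ElementTree as ET


def extract_description(page_xml):
    """Return (description, note) for one program page.

    description is None when no matching cell exists, which is treated
    as a normal outcome (some programs have no description).
    note is a string only when the match is ambiguous.
    """
    root = ET.fromstring(page_xml)
    # Same idea as the XPath //tr/td[div/@class='head']:
    # keep table cells that have a direct child <div class="head">.
    cells = [
        td for td in root.findall(".//tr/td")
        if td.find("div[@class='head']") is not None
    ]
    if not cells:
        return None, None
    # Collapse all text in the first matching cell into one line.
    text = " ".join(" ".join(cells[0].itertext()).split())
    note = None
    if len(cells) > 1:
        note = "%d candidate cells found; using the first" % len(cells)
    return text, note


# Hypothetical sample page, for illustration only.
sample = (
    "<html><body><table><tr>"
    "<td><div class='head'>Title</div>A film about trains.</td>"
    "</tr></table></body></html>"
)
print(extract_description(sample))
```

With this shape the caller decides whether the note is worth printing as a warning, and a missing description simply leaves <desc> out of the output.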