Re: Update for tv_grab_se for additional description: msg#00101

tv.xmltv.devel

Hi!

Thanks for the patch. There is now a new grabber for Sweden in CVS. The
new grabber is called tv_grab_se_swedb and it grabs all data from the
site tv.swedb.se that I maintain together with Fredrik Högberg and Oscar
Carlsson. We fetch data directly from the press service of each TV
company, process it and upload it to tv.swedb.se in XMLTV format. This
means that the tv_grab_se_swedb grabber can be kept very simple and
doesn't need to be updated just because some TV company changes the
format of its data.

The biggest drawback of tv_grab_se_swedb right now is that we don't
have data for SvT, but we hope to have that shortly. We just have to
clarify the rights issues with SvT first.

tv.swedb.se contains data for the following channels:

* C More Film
* Canal+
* Canal+ Film1
* Canal+ Film2
* Canal+ Sport
* Kanal 5
* Ticket 1
* TV 1000
* TV 1000 Action
* TV 1000 Classic
* TV 1000 Family
* TV 1000 Nordic
* TV3
* TV4
* TV4 Film
* TV4 Med i tv
* TV4 Plus
* TV 8
* Viasat Explorer
* Viasat History
* Viasat Nature/Action
* Viasat Sport 1
* Viasat Sport 2
* Viasat Sport 3
* ZTV

We have start, stop, title and description for all channels.

If you want to have SvT you can always do something like this for now:

tv_grab_se_swedb > file1.xml
tv_grab_se > file2.xml
tv_cat file1.xml file2.xml > output.xml
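
As a rough illustration of what the merge step amounts to, here is a
minimal Python sketch that combines two XMLTV documents under one <tv>
root, with all channels placed before all programmes as the XMLTV DTD
expects. It is not how tv_cat is implemented, and the channel ids and
listings in the sample data are made up for the example.

```python
import xml.etree.ElementTree as ET


def merge_xmltv(first, second):
    """Combine two <tv> roots: unique channels first, then all programmes."""
    merged = ET.Element("tv", dict(first.attrib))
    seen_ids = set()
    for root in (first, second):
        for channel in root.findall("channel"):
            channel_id = channel.get("id")
            if channel_id not in seen_ids:  # keep each channel only once
                seen_ids.add(channel_id)
                merged.append(channel)
    for root in (first, second):
        for programme in root.findall("programme"):
            merged.append(programme)
    return merged


# Hypothetical sample data standing in for file1.xml and file2.xml.
SWEDB_SAMPLE = """<tv>
  <channel id="tv4.example"><display-name>TV4</display-name></channel>
  <programme start="20041114180000 +0100" stop="20041114190000 +0100"
             channel="tv4.example">
    <title>Example news</title>
    <desc>Example description.</desc>
  </programme>
</tv>"""

SE_SAMPLE = """<tv>
  <channel id="svt1.example"><display-name>SvT 1</display-name></channel>
  <programme start="20041114190000 +0100" stop="20041114200000 +0100"
             channel="svt1.example">
    <title>Example film</title>
  </programme>
</tv>"""

if __name__ == "__main__":
    merged = merge_xmltv(ET.fromstring(SWEDB_SAMPLE), ET.fromstring(SE_SAMPLE))
    print(ET.tostring(merged, encoding="unicode"))
```

For real use tv_cat is still the right tool; the sketch only shows the
shape of the combined file.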

You can download tv_grab_se_swedb from

http://cvs.sourceforge.net/viewcvs.py/xmltv/xmltv/grab/se_swedb/tv_grab_se_swedb?rev=1.1&view=auto

Please give it a try and report success or failure. I hope to include it
in the next release of xmltv, but I need more testers to be able to do
that.

I will hold on to your patch and apply it if there is interest in
keeping tv_grab_se around. I hope to make tv_grab_se_swedb superior to
tv_grab_se in every way.

/Mattias

On Sun, 2004-11-14 at 01:04 +0100, Mårtensson Roger wrote:
> Hello XMLTV'ers.
>
> Being a very satisfied user, I thought I should contribute something. :-)
> This is my first patch (ever), so please be gentle.
>
> tv_grab_se does not currently support <desc> for Swedish TV3 (and other Viasat
> channels). The homepage has the information, but the grabber does not do the
> extra page load required.
>
> This is a diff file (diff -u) against tv_grab_se which adds description support
> for Swedish TV3 (and maybe all the other Swedish Viasat channels currently
> supported; not tested though).
>
> I'm not saying this is the absolute best way to add the support but you have
> to start somewhere. :)
>
> The state of the patch is: It works for me
>
> It has one drawback: it will increase the load on the web server. This is
> because we have to fetch an extra web page for every program in the web
> guide. If the guide page lists 20 programs, tv_grab_se will fetch at least
> 21 web pages.
>
> What I did:
> Added two new subroutines and changed an existing one:
> * get_html_desc_viasat (new)
> Does almost the same as get_html_viasat except that it builds the URL
> differently (almost a raw cut and paste).
> * get_data_desc_viasat (new)
> Lots of cut and paste here too, from get_data_viasat, but it searches
> differently.
> * get_data_viasat (changed)
> Adds the extra web page load required.
>
> For me it does what it is supposed to do. Not long-time tested. I don't think
> it will break the other Viasat channels (those that use the get_data_viasat
> subroutine), but that is not tested.
>
> There are some things I'm not so sure about that might need some more eyes.
>
> One of the things I have hard-coded is the way the description page is
> interpreted. I take for granted that the page only contains one
> description.
>
> The XPath search filter I use to get the node list is:
> "//tr/td[div/\@class='head']"
> Most of the time this is not a problem, and I assumed that it will only
> match once.
> The problem is that sometimes there is no description, and in that case I
> return a warning. Should it return a warning? There are some programs that do
> not have a description, so the warning is not valid for them.
>
> I have also noted that sometimes the search returns more than one hit. I
> haven't looked at it in depth, so I'm not sure if it's two descriptions or
> only one. I return a warning for this too but, as above, it might not
> deserve one.
>
> The last thing I'm not so sure about, because I haven't studied the XMLTV DTD:
> can you have both <desc> and <url> at the same time?
>
> Anyway.. Here is my work and I hope some of it will be used. I want my
> descriptions! :)
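
The zero, one or several matches question raised in the quoted message
can be sketched as follows. This is only an illustration in Python, not
the code from the patch: the function name, the page layout (description
text directly after the heading div) and the sample pages are all
assumptions. The standard library's ElementTree cannot evaluate the
predicate div/@class='head' directly, so the sketch filters the cells by
hand. It treats a missing description as normal and warns only when
there is more than one candidate.

```python
import warnings
import xml.etree.ElementTree as ET


def find_description(page):
    """Return the description text, or None when the page has none."""
    matches = []
    # Emulates //tr/td[div/@class='head'] for cells below the root element.
    for cell in page.findall(".//tr/td"):
        heads = [d for d in cell.findall("div") if d.get("class") == "head"]
        if heads:
            # Assumed layout: the description follows the heading div.
            matches.append((heads[0].tail or "").strip())
    if not matches:
        return None  # many programs simply have no description
    if len(matches) > 1:
        warnings.warn(
            f"expected one description, found {len(matches)}; using the first"
        )
    return matches[0] or None


# Hypothetical, well-formed sample pages.
PAGE_ONE = ('<table><tr><td><div class="head">Film</div>'
            'A thriller.</td></tr></table>')
PAGE_NONE = '<table><tr><td>No details</td></tr></table>'
PAGE_TWO = ('<table><tr><td><div class="head">Film</div>First.</td></tr>'
            '<tr><td><div class="head">Film</div>Second.</td></tr></table>')

if __name__ == "__main__":
    for sample in (PAGE_ONE, PAGE_NONE, PAGE_TWO):
        print(find_description(ET.fromstring(sample)))
```

Real guide pages are rarely well-formed XML, so an actual grabber would
need a forgiving HTML parser in front of this.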



