
Re: Re: rfe: tb_grab_hu/tv_sort (fwd): msg#00181

tv.xmltv.devel

Subject: Re: Re: rfe: tb_grab_hu/tv_sort (fwd)

Stefan Siegl wrote:

see attached patch below. In case you're in need of a complete file,

it's available from [1].

I got it.

It'd be nice if you could give it a try and test whether its output
is okay.
Testing now; it will take some time, because port.hu listings are slow.
Meanwhile, I have subscribed to xml-devel. It seems a nice community and responses are fast, so you really deserve nice users who cause no extra trouble (e.g., asking to be CC-ed).

Comparing the website to the output is a little bit painful

I usually do this with tvime and a web browser :-))

(I've compared only a couple of channels per source so far).

In particular, is anyone here using the Romanian grabber?

Not me, unfortunately.

Your port.ro page is even more distorted. There are images after
some shows, however the admins prefer to begin a new <table> after
each image instead of surrounding each with a <td> ... strange.
Would be nice if you could compare some pages as well - even though it
should be okay ..


Since the grabber's sources provide three days of data per page I

furthermore changed it to increase by three days in the bumping
function nextday().
Seems suspicious to me.
The URLs used are as follows, e.g. for station "mtv" today ("mtv" stands for "Magyar Televízió", not "Music Television"):
http://www.port.hu/pls/tv/tv.prog?i_days=1&i_ch_nr=1&i_ch=1

i_days is the day offset from the current date; the current day has an offset of 1 (one). Beware: the page always shows ONE DAY.

i_ch is the channel identifier

i_ch_nr is the number of channels per page. It is used for customizable programme pages. Using a web browser and the usual combobox on the web page, i_ch_nr is equal to or greater than 3, that is, the page displays channels (i_ch), (i_ch+1) and (i_ch+2) to speed things up. Three channel programmes fit nicely on a 1024x768 screen.
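
As a rough illustration of the three parameters described above, here is a minimal Python sketch. It is not part of tv_grab_huro; the function names `listing_url` and `listing_urls` are made up for this example, and only the base URL and the parameter names come from the URL quoted in this message. It builds one URL per day, on the assumption stated above that each page covers a single day and that today has offset 1.

```python
from urllib.parse import urlencode

# Base URL copied from the example URL in this message.
BASE_URL = "http://www.port.hu/pls/tv/tv.prog"


def listing_url(channel_id, day_offset=1, channels_per_page=1):
    """Build a listing URL (hypothetical helper).

    day_offset: 1 means today, 2 tomorrow, and so on.
    channels_per_page: how many consecutive channels the page shows.
    """
    query = urlencode({
        "i_days": day_offset,
        "i_ch_nr": channels_per_page,
        "i_ch": channel_id,
    })
    return f"{BASE_URL}?{query}"


def listing_urls(channel_id, days, start_offset=0):
    """Return one URL per day, stepping the day offset by one each time."""
    return [listing_url(channel_id, 1 + start_offset + n) for n in range(days)]


if __name__ == "__main__":
    for url in listing_urls(1, 3):
        print(url)
```

If each page really does show only one day, stepping the offset by one per request (as in this sketch) is what covers every day in the requested range.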

Gabor

Seems to me that it works, and speeds up things
a lot ...


@{Ed,Robert}: As I'm not the maintainer of tv_grab_huro, is it okay
for you that I commit the patch to CVS?


best regards,
Stef@n



[1] http://home.vr-web.de/stefan-siegl/xmltv/tv_grab_huro/

------------------------------------------------------------------------

? new
? new2
? new3
? new3s
? new4
? new4s
? new5
? new5s
? new6
? new6s
? new7
? out
? patch
? test
? tv_grab_huro.cache
Index: tv_grab_huro
===================================================================
RCS file: /cvsroot/xmltv/xmltv/grab/huro/tv_grab_huro,v
retrieving revision 1.6
diff -u -5 -r1.6 tv_grab_huro
--- tv_grab_huro 19 Sep 2004 14:16:01 -0000 1.6
+++ tv_grab_huro 29 Nov 2004 19:37:21 -0000
@@ -322,11 +322,11 @@
}

# Make list of pages to fetch for each day.
my @days;
my $day=UnixDate($now,'%Q');
-for (my $i=1+$opt_offset;$i<$opt_days+$opt_offset+1;$i++) {
+for (my $i=1+$opt_offset;$i<$opt_days+$opt_offset+1;$i+=3) {
push @days, [ $day, $i ];
$day=nextday($day); die if not defined $day;
}

# This progress bar is for both downloading and parsing. Maybe
@@ -393,11 +393,89 @@
};

# parse the page to a document object
my $tree = HTML::TreeBuilder->new();
$tree->parse($data);
- my @program_data = get_program_data($tree);
+
+ my @datatables;
+ # page consists of two main tables, split by an advertisement
+ # we need to reorder those tables, to grab continued column by column
+ #
+ # actually we assign to @datatables like this:
+
+ # UPPER MAJOR TABLE:
+ # 0 10 20
+ # 1 11 21
+ # 12
+ #
+ # <<< the ad >>>
+ # LOWER MAJOR TABLE:
+ # 5 15 25
+ # 16
+ #
+
+ my $i = 0;
+ my $lasttime = 0;
+ # assign to @datatables in order: 0, 2, 4, 1, 3, 5, etc.
+ foreach my $tab($tree->look_down
+ # "width"=>215 unfortunately isn't specified all the time
+ ("_tag"=>"table", "cellspacing"=>2)) {
+ my $width = $tab->attr(qw(width));
+ next unless($width eq "100%" || $width == 215);
+
+ # time is printed in <strong /> tags, require those to skip the
+ # headings - which we don't really care for ...
+ next unless($tab->look_down("_tag"=>"strong"));
+
+
+ # especially on port.ro there aren't only two tables per day column,
+ # there are even more, split by images, etc.
+ #
+ # why the hell don't they continue the table and put the image
+ # right into a <tr><td> <img> </td></tr> thingy??
+ #
+ # tsts...
+ #
+
+
+ # extract the first time specified in this table-piece ...
+ $tab->as_text() =~ m/([012][0-9]):([0-5][0-9])/
+ or die "unable to parse returned html page";
+ my $time = $1 * 60 + $2;
+ $time += 24 * 60 if($time < 6 * 60 && ($i % 10 > 4));
+
+ #print "this: $time, last: $lasttime ...\n";
+
+ if($time < $lasttime) {
+ # this table is in the same major table, but in the next column
+ # since it's first time is before the last time of the prev. tab.
+ $i = $i - ($i % 5) + 10;
+ }
+
+ if($time > 19 * 60 && ($i % 10 < 5)) {
+ # first time time's after 19 o'clock => lower table
+ $i = 5;
+ }
+
+
+ # lookup last time in this minor table ... (as base for comparing)
+ $tab->as_text() =~ m/.*([012][0-9]):([0-5][0-9])/
+ or die "unable to parse returned html page";
+ $lasttime = $1 * 60 + $2;
+ $lasttime += 24 * 60 if($lasttime < 6 * 60 && ($i % 10 > 4));
+
+
+ #print "assigning datatables entry ", $i, ".\n";
+ #$tab->dump();
+ $datatables[$i++] = $tab;
+ }
+
+ my @program_data;
+ foreach(@datatables) {
+ push @program_data, get_program_data($_)
+ if defined;
+ }

if (not @program_data) {
warn "no programs found, skipping\n";
return ();
}
@@ -586,9 +664,9 @@

# Bump a YYYYMMDD date by one.
sub nextday( $ ) {
my $d = shift;
my $p = parse_date($d);
- my $n = DateCalc($p, '+ 1 day');
+ my $n = DateCalc($p, '+ 3 day');
return UnixDate($n, '%Q');
}
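
The clock-time handling in the second hunk of the patch can be looked at in isolation. The following is a minimal Python sketch, not the grabber's Perl code: the function name `first_time_in_minutes` and the `in_lower_table` flag are assumptions for illustration, with the flag standing in for the patch's `$i % 10 > 4` test. It shows the same idea: take the first HH:MM in a table's text, convert it to minutes since midnight, and push early-morning times in the lower table past 24:00 so that they sort after the evening programmes.

```python
import re

# Same pattern as in the patch: hours 00-29 (loosely), minutes 00-59.
TIME_RE = re.compile(r"([012][0-9]):([0-5][0-9])")


def first_time_in_minutes(text, in_lower_table):
    """Return the first HH:MM found in text as minutes since midnight.

    Times before 06:00 in the lower table are treated as past midnight
    and shifted by 24 hours, mirroring the patch's adjustment.
    """
    match = TIME_RE.search(text)
    if match is None:
        raise ValueError("no HH:MM time found in table text")
    minutes = int(match.group(1)) * 60 + int(match.group(2))
    if minutes < 6 * 60 and in_lower_table:
        minutes += 24 * 60
    return minutes
```

With this normalisation, a table starting at 00:45 in the lower half compares as later than one ending at 23:30, which is what the "next column" check in the patch relies on.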



Attachment: ziegler.vcf
Description: Vcard
