
Re: rfe: tb_grab_hu/tv_sort (fwd): msg#00167

tv.xmltv.devel

Subject: Re: rfe: tb_grab_hu/tv_sort (fwd)



Ed Avis wrote:

---------- Forwarded message ----------
Date: Sun, 28 Nov 2004 14:47:25 +0100
From: "Ziegler Gábor" <ziegler@xxxxxxxxxxxxxxxxx>
To: Ed Avis <ed@xxxxxxxxxxx>
Subject: rfe: tb_grab_hu/tv_sort
Dear Ed,

It seems that Port.hu, the data source for tv_grab_hu, has changed(?) the HTML page layout that tv_grab_hu parses. It has introduced(?) a new ad banner into its page before the main evening film of each channel. This splits the HTML table that holds the programme data into two separate HTML tables, with the advertisement in between.

This seems to result in an XML output file that confuses tv_sort: it mis-detects each programme listed *after* the table split as an "overlapping" programme and simply removes it.

Some additional details:

I have read a little further: it is not tv_sort's fault. The missing programme data do not even reach the output of tv_grab_huro. It finishes parsing just before the second table, which contains the advertisement, and fails to read the rest of the programme data (i.e., the third table).

It seems to me that tv_grab_huro uses the look_down() method of the Perl HTML::Element module.
The table split by port.hu results in the following, oversimplified, page layout:
-----------------------------------------------------------------------------------------------------------------
<html><body> ....

<table><!-- first table-half of programme data -->
....<td align="right" valign="top" ... >...programme-data row....</td> </table>

<table>.....ad banner placement....</table>

<table><!-- second table-half of programme data, tv_grab_huro fails to parse it -->
....<td align="right" valign="top"...>...programme-data row....</td> </table>

</body></html>
---------------------------------------------------------------------------------------------------
IIRC, tv_grab_huro "looks down" as follows:

my @txt_cont = $tree->look_down("_tag"=>"td", "align"=>"right",
"valign"=>"top");

If I read the docs correctly, this should work for both tables (i.e., the 1st and
the 3rd) as long as $tree points to a common parent of both tables.
The perldoc page says that look_down() returns only the first result in certain
circumstances, but I am not a Perl programmer.
However, it apparently does not work.
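
For reference, the HTML::Element documentation describes look_down() as returning every matching element in list context and only the first one in scalar context; the assignment to @txt_cont above is a list-context call, so the lookup itself should not stop at the first table as long as $tree covers the whole page. The behaviour I would expect is roughly what the following sketch does. It is only an illustration in Python, using the standard html.parser module rather than the grabber's real code; the CellCollector class, the collect_cells helper and the sample page (including the programme names) are made up for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td align="right" valign="top"> on the page,
    regardless of which <table> the cell sits in."""

    def __init__(self):
        super().__init__()
        self.cells = []       # text of each matching cell, in document order
        self._inside = False  # True while we are inside a matching <td>
        self._buf = []        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "td" and a.get("align") == "right" and a.get("valign") == "top":
            self._inside = True
            self._buf = []

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._inside:
            self.cells.append("".join(self._buf).strip())
            self._inside = False


# Hypothetical page with the same shape as the layout described above:
# programme table, ad banner table, programme table.
PAGE = """
<html><body>
<table><tr><td align="right" valign="top">19:00 News</td></tr></table>
<table><tr><td>ad banner</td></tr></table>
<table><tr><td align="right" valign="top">21:00 Film</td></tr></table>
</body></html>
"""


def collect_cells(html):
    """Return the text of all matching cells found anywhere in the page."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


print(collect_cells(PAGE))
```

The point of the sketch is that the search runs over the whole document, so the ad banner table in the middle is skipped without ending the collection. If the grabber instead starts its search from the first programme table only, the second half would be missed.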

Any idea?

Thanks, Gabor
P.S.: Please keep me in the CC. I already receive 40+ mails daily (on top of 70+ spams), and I would rather not subscribe to yet another mailing list, especially one where the technology (Perl) is beyond my knowledge.



The final symptom is that tvtime keeps displaying the programme just before the table split for the whole evening and the whole night. That unfortunate programme usually seems to start around 19:00...19:30 and seems to last until dawn.
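
A possible explanation for this symptom, assuming (I have not verified this in the code) that a programme without an explicit stop time gets its stop time from the start of the next programme on the same channel: if the entries after the table split are missing, the next known start is the following morning. The sketch below is only an illustration in Python; the infer_stops function and the sample listing are made up and are not taken from tv_sort or tvtime.

```python
from datetime import datetime


def infer_stops(programmes):
    """Given (start, title) pairs for one channel, return (start, stop, title)
    triples where each stop is the start of the next programme.
    The last programme has no successor, so its stop is None."""
    ordered = sorted(programmes)  # sort by start time
    result = []
    for i, (start, title) in enumerate(ordered):
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else None
        result.append((start, stop, title))
    return result


# Hypothetical listing for one channel. The evening entries that would come
# from the second table half are absent, as in the grabber output described.
SAMPLE = [
    (datetime(2004, 11, 28, 18, 0), "Early evening show"),
    (datetime(2004, 11, 28, 19, 30), "Last programme before the split"),
    (datetime(2004, 11, 29, 6, 0), "Morning show"),
]

for start, stop, title in infer_stops(SAMPLE):
    print(start, stop, title)
```

With this sample, the programme starting at 19:30 is given a stop time of 06:00 the next day, which matches what tvtime shows.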

It is clear to me that the table-parsing code has to be upgraded to handle tables split in two instead of single, monolithic tables.

I am familiar with XML but totally unfamiliar with Perl, so I cannot solve this problem easily; however, I guess that it has to be an easy hack.

To whom should I report this problem, who is willing and able to fix it?

Thanks in advance,
Gabor






Attachment: ziegler.vcf
Description: Vcard
