
Subject: RFC: CGI::Application::Plugin::Output::PDF

Hello folks,

I'm working on an output plugin that will convert html content to
pdf. It handles setting the content-type and content-disposition
headers before returning.

You can see preliminary work at:

http://zacks.org/cgiapp/pdf/

The pod is inline at the end of this message.

Right now the actual conversion is done in a helper module. For
the moment, only HTMLDoc (via HTML::HTMLDoc) is supported. I am
planning support for PDF::FromHTML and html2ps/ps2pdf.

Apologies for the long discussion below, but I'm looking for
advice on how to best proceed with development.

I am not sure of the best way to call the helper module that
performs the conversion. For now, I have this code in place:

# $opts{converter} is HTMLDoc, for example

my $pack= "CGI::Application::Plugin::Output::PDF::$opts{converter}";
eval "require $pack";
croak "Can't load converter [$pack]" if $@;

return $pack->convert( ... );

Is this filthy? One of the things I don't like about it is calling
convert() as a class method -- it seems unnatural since (as of
now) none of the modules are object-oriented. On the other hand,
I believe 'no strict "refs"' would be necessary to call it as a
function (with the package name as a variable).
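For comparison, the plain-function call described above would look roughly like the following. This is only a minimal sketch: My::Converter::Stub and its convert() are hypothetical stand-ins for a real helper module, and call_convert() is a name made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a converter helper module.
package My::Converter::Stub;

sub convert {
    my ($html_ref, $args) = @_;
    return "PDF(" . $$html_ref . ")";
}

package main;

# Call Package::convert() as a plain function when the package
# name is only known at run time.
sub call_convert {
    my ($pack, @args) = @_;
    no strict 'refs';    # required for the symbolic reference below
    return &{"${pack}::convert"}(@args);
}

my $html = '<p>hi</p>';
my $pdf  = call_convert('My::Converter::Stub', \$html, {});
```

The 'no strict "refs"' is confined to one small sub, so the rest of the plugin stays under full strictures.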

Another approach is similar but doesn't invoke the helper routine
via the converter class:

# set $pack as above
my $convert= $pack->can('convert')
    or croak "converter [$pack] doesn't know how to 'convert'";

return $convert->( \$html, $args->{converter_args} );

Is one way better than the other? I feel like I'm missing
something obvious and both approaches are poor.
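To make the second approach concrete, here is a self-contained sketch of loading a converter by name and fetching its convert() with can(). The My::PDF::Converter namespace, the Stub converter, and load_converter() are all assumptions made for illustration; the name check before the string eval is an addition, not something the current code does.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical converter, present only so the sketch runs on its own.
package My::PDF::Converter::Stub;

sub convert {
    my ($html_ref, $args) = @_;
    return "PDF:" . length($$html_ref);
}

package main;

my $BASE = 'My::PDF::Converter';    # assumed namespace prefix

sub load_converter {
    my ($name) = @_;

    # Accept only plain identifiers, so nothing unexpected reaches
    # the string eval below.
    croak "Invalid converter name [$name]" unless $name =~ /\A\w+\z/;

    my $pack = "${BASE}::$name";

    # Only require the module if the package is not already loaded.
    unless ($pack->can('convert')) {
        eval "require $pack; 1"
            or croak "Can't load converter [$pack]: $@";
    }

    my $convert = $pack->can('convert')
        or croak "converter [$pack] doesn't know how to 'convert'";
    return $convert;
}

my $convert = load_converter('Stub');
my $html    = '<p>hello</p>';
my $pdf     = $convert->( \$html, {} );
```

Because can() returns an ordinary code reference, the helper is called as a function and never sees the package name as its first argument.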


Another issue is how to handle configuration. Right now the user
has the option to import the pdf_output() method, which should be
called at the end of a runmode. This method will set the
content-type header, convert html content to pdf, and return the
pdf content.

This method takes some optional named parameters which can select
the converter to use and specify options specific to that
converter. So a user importing the method can pass parameters for
configuration purposes.

If the user has CGI::Application version 4 or newer, and he does
not request any symbols for import, pdf_output() is automatically
installed as a postrun callback. This allows for transparent
conversion from html to pdf:

use CGI::Application::Plugin::Output::PDF;

# ...

return $template->output; # sent to browser as pdf


This is convenient, but it limits the user's ability to configure
the behavior of the plugin. For example, the user doesn't have a
way to specify which converter to use.
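The automatic installation could be sketched as below. My::StubApp is a hypothetical stand-in that mimics only add_callback() and call_hook() so the example runs without CGI::Application installed; My::Plugin::PDF and the fake conversion in _postrun() are likewise illustrative, and checking can('add_callback') is an assumed substitute for a real version test.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI::Application >= 4.0, reduced to the
# two hook methods this sketch relies on.
package My::StubApp;

my %HOOKS;

sub add_callback {
    my ($class, $hook, $code) = @_;
    push @{ $HOOKS{$hook} }, $code;
    return;
}

sub call_hook {
    my ($self, $hook, @args) = @_;
    $_->($self, @args) for @{ $HOOKS{$hook} || [] };
    return;
}

# Hypothetical plugin: registers a postrun callback unless the
# caller asked for symbols.
package My::Plugin::PDF;

sub import {
    my ($class, @symbols) = @_;
    my $caller = caller;
    return if @symbols;                            # explicit import, no callback
    return unless $caller->can('add_callback');    # framework has no hooks
    $caller->add_callback( postrun => \&_postrun );
    return;
}

sub _postrun {
    my ($app, $body_ref) = @_;
    # Placeholder for the real html-to-pdf conversion.
    $$body_ref = "%PDF-stub " . $$body_ref;
    return;
}

package My::App;
our @ISA = ('My::StubApp');
My::Plugin::PDF->import;    # what "use My::Plugin::PDF;" would do

package main;

my $body = '<h1>Report</h1>';
My::App->call_hook( 'postrun', \$body );
```

The callback receives a reference to the runmode output and rewrites it in place, which is why the runmode itself does not need to know about the plugin.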

One option would be to use arguments to import to configure the
plugin. For example:

use CGI::Application::Plugin::Output::PDF converter => 'HTMLDoc';

This can get a bit messy, however. It would also be nice for the
user to be able to specify the output filename, or specify some
parameters specific to the selected converter. I don't know how
many options are too many to handle in the import() method.
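One way to keep import() manageable is to separate requested symbols from configuration and to accept a single hash reference for anything beyond a couple of keys. The sketch below assumes a hypothetical package My::Plugin::Configurable and a per-caller config store; none of these names come from the plugin itself.

```perl
use strict;
use warnings;

# Hypothetical plugin package; all names are illustrative only.
package My::Plugin::Configurable;

my %CONFIG;                                            # per-caller options
my %EXPORTABLE = map { $_ => 1 } qw(pdf_output html_to_pdf);

sub import {
    my ($class, @args) = @_;
    my $caller = caller;
    my (@symbols, %config);

    while (@args) {
        my $arg = shift @args;
        if (ref $arg eq 'HASH') {
            %config = (%config, %$arg);      # one hashref of options
        }
        elsif ($EXPORTABLE{$arg} or $arg =~ /\A:/) {
            push @symbols, $arg;             # a symbol or an export tag
        }
        else {
            $config{$arg} = shift @args;     # key => value pair
        }
    }

    $CONFIG{$caller} = \%config;
    return (\@symbols, \%config);            # returned only to ease testing
}

# Look up the options stored for a given calling package.
sub config_for {
    my ($class, $caller) = @_;
    return $CONFIG{$caller} || {};
}

package main;

# Equivalent to:
#   use My::Plugin::Configurable converter => 'HTMLDoc',
#                                { filename => 'report.pdf' };
my ($symbols, $config) = My::Plugin::Configurable->import(
    converter => 'HTMLDoc',
    { filename => 'report.pdf' },
);
```

The postrun callback could then call config_for() with the application's class name to find the converter and filename chosen at import time.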

When calling pdf_output() directly, this is not a problem, as it
takes an optional hash reference of named parameters to handle
these configuration options, among others.

What is the best practice for those who are using the transparent
postrun callback?

Thanks for reading and for any advice you may have.

-E


NAME
CGI::Application::Plugin::Output::PDF - Generate PDF output from a
CGI::Application runmode

SYNOPSIS
For CGI::Application >= 4.0:

use CGI::Application::Plugin::Output::PDF;

# in some runmode...

# html content will be automatically converted to pdf
return $template->output;

For CGI::Application < 4.0:

use CGI::Application::Plugin::Output::PDF qw(pdf_output);

# in some runmode...

return $self->pdf_output( \$template->output );

DESCRIPTION
"CGI::Application::Plugin::Output::PDF" provides a method, "pdf_output",
and a function, "html_to_pdf", to convert html content to pdf.

The "pdf_output" method may be called directly, or, for
CGI::Application(3) version 4 and above, a postrun callback will be
added to automatically convert html content to pdf, unless the user
requests that any symbols be exported.

XXX should this be the case? or always add the callback?

EXPORT
This module does not export any symbols by default. You may import the
"pdf_output" method and/or the "html_to_pdf" function on request:

use CGI::Application::Plugin::Output::PDF qw(pdf_output);

You may import both routines using the export tag ":all":

use CGI::Application::Plugin::Output::PDF qw(:all);

NOTE: For CGI::Application(3) version 4 and above, a postrun callback
will be added to automatically convert html content to pdf, unless the
user requests that any symbols be exported.

Subclasses of previous versions of CGI::Application(3) will need to
export the "pdf_output" method and call it directly:

return $self->pdf_output( \$template->output );

METHODS
pdf_output
# in a runmode

# $template is an HTML::Template object, for example
my $html_output= $template->output;

return $self->pdf_output( \$html_output,
                          { filename  => 'download.pdf',
                            converter => 'HTMLDoc', }
                        );

This method generates a pdf file from html content and sends it
directly to the user's browser. It sets the content-type header to
'application/pdf' and sets the content-disposition header to
'attachment'.

It should be invoked through a CGI::Application(3) subclass object.

It takes two parameters. The first, which is required, is a
reference to a scalar containing the html content for conversion.
The second is a reference to a hash of named parameters, all of
which are optional:

converter
The module to be used for converting html content to pdf.
The current options are "HTMLDoc" (default), "HTML2PS", and
"PDFFromHTML".

See CONVERTERS below for further discussion of the merits of
each.

filename
The name of the file which will be sent in the HTTP
content-disposition header. The default is "download.pdf".
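The header handling described above might look roughly like this. My::StubApp is a hypothetical stand-in that only records what is passed to header_add(), so the sketch runs without CGI::Application; set_pdf_headers() is an illustrative helper name, not part of the plugin. In a real application, header_add() hands these properties to CGI.pm's header(), where -attachment produces the content-disposition header.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI::Application subclass object;
# it only records the header properties it is given.
package My::StubApp;

sub new { return bless { headers => {} }, shift }

sub header_add {
    my ($self, %props) = @_;
    $self->{headers}{$_} = $props{$_} for keys %props;
    return;
}

package main;

# Set the headers for a PDF download, falling back to the
# documented default filename.
sub set_pdf_headers {
    my ($app, $opts) = @_;
    my $filename = $opts->{filename} || 'download.pdf';
    $app->header_add(
        -type       => 'application/pdf',
        -attachment => $filename,
    );
    return $filename;
}

my $app  = My::StubApp->new;
my $name = set_pdf_headers( $app, { converter => 'HTMLDoc' } );
```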

FUNCTIONS
html_to_pdf
my $pdf= html_to_pdf( \$html_content,
                      { filename  => 'download.pdf',
                        converter => 'HTMLDoc', }
                    );

# do something with $pdf

This function converts html content to pdf content and returns it.
It takes the same parameters as "pdf_output" (above), except that it
is a function, so it should not be invoked through an object.

In addition, the named parameter "filename" is ignored, as it is not
applicable to this function.

CONVERTERS
NOTE: This section is incomplete.

In general, css is not well-supported.

In addition, it may be necessary to use full paths for images and links
in your html to get a close representation of your web page marked up as
pdf.

HTMLDoc
This converter uses the HTML::HTMLDoc(3) module.

From "http://www.htmldoc.org":

HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements,
and can generate title and table of contents pages. The 1.8.x
releases do not support stylesheets.

css/stylesheets
Unsupported

paths Under a web environment, I had success passing
"$ENV{DOCUMENT_ROOT}" to the HTML::HTMLDoc(3) object to fix
relative image paths.

PDFFromHTML
This converter uses the PDF::FromHTML(3) module.

css/stylesheets
PDF::FromHTML does not support css.

paths XXX Unknown.

HTML2PS
This converter passes the html content to html2ps(1) and then to
ps2pdf(1).

Be aware that large table cells may not render as expected. From
"http://user.it.uu.se/~jan/html2psug.html":

Rendering HTML tables well is a non-trivial task. For
"real" tables, that is representation of tabular data,
html2ps usually generates reasonably good output. When
tables are used for layout purposes, the result varies
from good to useless. This is because a table cell is
never broken across pages. So if a table contains a cell
with a lot of content, the entire table may have to be
scaled down in size in order to make this cell fit on a
single page. Sometimes this may even result in unreadable
output.

css/stylesheets
html2ps supports css to a limited extent, but the styles
must be specified on the command line or in a configuration
file.

paths html2ps allows the user to specify either a root file path
or a base URL to be used for relative paths in the html
content.

AUTHOR
Evan A. Zacks "<zackse@xxxxxxxx>"

SEE ALSO
PDF::FromHTML(3), HTML::HTMLDoc, html2ps(1), CGI::Application(3)

COPYRIGHT & LICENSE
Copyright 2005 Evan A. Zacks, All rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

REVISION
$Id: PDF.pm 2 2005-09-22 06:57:17Z zackse $





