Subject: the heart and soul of project gutenberg - msg#00332
the heart and soul of project gutenberg is the plain-text file.
over the years, it has been scorned and even attacked outright.
some people say it's ugly. and it's far too low-tech for others.
but somehow, it has survived and even thrived in a way that
no other e-book technology ever has. in the process, i have
grown to appreciate its tenacity, and grasp its inner beauty.
this thread is for those having a love-affair with plain-text.
thank you, michael hart, for having the smarts and tenacity
to stick to your guns on plain-text. you were right all along...
-bowerbird
---
For subscription help visit http://listserv.unc.edu
Previous Message by Date:
Re: A Gutenberg Todo list
> A UTF-8 file should *definitely* have a byte order mark.
This is an old argument. On Unix, UTF-8 files do not have a byte order
mark. Given that that is where they are most popular (Windows mostly
uses UTF-16), there's certainly an argument for UTF-8 files not having
a byte-order mark.
> If the file needs characters outside of ASCII, then you have already broken
> ASCII compatibility.
You don't understand what ASCII compatibility is. ASCII compatibility means
two things: that an ASCII file appears as ASCII, and that any ASCII bytes
mean what they mean in ASCII. Given that, most programs can process UTF-8
or ISO-8859-1 or EUC-JP files without worrying about the encoding.
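To make the point concrete, here is a minimal sketch in Python (not part of the original message). It assumes hypothetical sample strings and searches the raw bytes for an ASCII word without decoding them first. In all three encodings, every byte of a non-ASCII character is 0x80 or above, so a byte-level hit on an ASCII word is always a real hit. UTF-16 is included only as a contrast, since it does not have this property.

```python
# Hypothetical sample texts, one per encoding named in the message.
samples = {
    "utf-8": "Noël: Christmas in français",
    "iso-8859-1": "Noël: Christmas in français",
    "euc_jp": "クリスマス Christmas 物語",
}

NEEDLE = b"Christmas"  # plain ASCII bytes, no encoding knowledge needed

results = {}
for encoding, text in samples.items():
    data = text.encode(encoding)
    offset = data.find(NEEDLE)  # byte-oriented search, no decoding
    # The bytes before the hit decode to exactly the text before the word,
    # so the match did not land in the middle of a multi-byte character.
    prefix_ok = data[:offset].decode(encoding) == text[:text.find("Christmas")]
    results[encoding] = (offset >= 0, prefix_ok)

# Contrast: UTF-16 interleaves zero bytes, so the same search misses.
utf16_offset = "Christmas".encode("utf-16-le").find(NEEDLE)

for encoding, (found, prefix_ok) in results.items():
    print(encoding, "found:", found, "aligned:", prefix_ok)
print("utf-16-le offset:", utf16_offset)
```

This is the sense in which a tool like grep can stay ignorant of the encoding for these files, as long as the pattern itself is ASCII.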
> The existence of three non-ASCII bytes at the beginning
> of the file (specifically EF BB BF) does not seem a major problem to me even if
> you did include the BOM in a file that only needed ASCII.
grep Christmas *.txt > holiday.txt
grep Holiday *.txt >> holiday.txt
If *.txt was all the PG books, and your files had BOMs, you've just
spread BOMs throughout holiday.txt.
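A rough Python simulation of the two grep commands above (a sketch, not part of the original message). The file names and contents are hypothetical stand-ins for PG books, and `grep_bytes` is an illustrative helper that only mimics grep's `file:line` output for multiple files. A BOM travels into the output wherever the first line of a file matches, which is likely when the first line carries the title.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def grep_bytes(pattern: bytes, files: dict) -> bytes:
    """Byte-oriented stand-in for `grep pattern *.txt`."""
    out = []
    for name in sorted(files):
        for line in files[name].splitlines(keepends=True):
            if pattern in line:
                out.append(name.encode("ascii") + b":" + line)
    return b"".join(out)

# Hypothetical books, each saved with a leading BOM.
files = {
    "carol.txt": BOM + b"A Christmas Carol\nby Charles Dickens\n",
    "night.txt": BOM + b"The Night Before Christmas\nA Holiday poem\n",
}

# grep Christmas *.txt > holiday.txt ; grep Holiday *.txt >> holiday.txt
holiday = grep_bytes(b"Christmas", files) + grep_bytes(b"Holiday", files)

# Every place a BOM ended up inside the combined output.
bom_offsets = [i for i in range(len(holiday)) if holiday.startswith(BOM, i)]
print(bom_offsets)
```

None of the BOMs sits at offset 0 of holiday.txt, so they no longer mark the start of a file; they are just stray bytes in the middle of lines.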
> Yes, most definitely. Making the file 50% bigger vs. handling UTF-16, I'd
> say it is definitely worth it.
If you were worried about size, why not use a compressed format?
> Handling UTF-16 and UTF-32 should be trivial for your software. It certainly
> is in Java.
Java isn't Perl or C. Supporting multiple encodings certainly adds complexity.
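As an illustration of the extra work involved, here is a minimal Python sketch (not part of the original message) of what a program has to do once it accepts several Unicode encodings: sniff the BOM, pick a decoder, and fall back to an assumed default. The function name `decode_text` and the UTF-8 default are assumptions made for the example.

```python
import codecs

# Checked in this order on purpose: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE), so UTF-32 must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_text(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes using a leading BOM if present, else an assumed default."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)

print(decode_text(codecs.BOM_UTF8 + b"plain text"))
print(decode_text("plain text".encode("utf-16")))
print(decode_text(b"plain text"))
```

Even this small version has an ordering trap and a guess for BOM-less input, and byte-oriented tools written in C would need the same logic before they could search the text at all.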
---
Next Message by Date:
Re: the heart and soul of project gutenberg
On Fri, 30 Jan 2004 Bowerbird-YDxpq3io04c@xxxxxxxxxxxxxxxx wrote:
> the heart and soul of project gutenberg is the plain-text file.
>
> over the years, it has been scorned and even attacked outright.
> some people say it's ugly. and it's far too low-tech for others.
>
> but somehow, it has survived and even thrived in a way that
> no other e-book technology ever has. in the process, i have
> grown to appreciate its tenacity, and grasp its inner beauty.
>
> this thread is for those having a love-affair with plain-text.
>
> thank you, michael hart, for having the smarts and tenacity
> to stick to your guns on plain-text. you were right all along...
>
> -bowerbird
>
While I still support plain text as much as ever, if only because
it can be so easily used in virtually every search engine, mailer,
word processor, text editor, read aloud program, file reader, etc.
. . .I have been oft misquoted as being against [all] other formats.
Not true.
I am willing to try any and all formats, much to the chagrin of
some of our people who are even more anti-proprietary formats
than I am. Some of these have VERY good reasons, such as the
designed impossibility of making corrections in these formats.
However, I figure that putting up samples of these formats,
and including a note that indicates that these files will not
be updated as often, if at all, would be worthwhile.
Whatever formats our readers want, I think we should try,
at least for experimental examples. If we don't get a lot
of feedback on these experiments, we just let them go until
someone wants to experiment with them again.
The major concern here is the cost/benefit ratio. . . .
We can get a LOT more done in the easier formats. . . .
Someday we might be able to automate moving the corrections
from the easier formats to the more difficult on the fly,
and then this becomes pretty much a moot point.
Thanks!!!
So Nice To Hear From You!
Michael
Give eBooks!!!
As of January 30, 2004
~11,150 FreeBooks at:
http://gutenberg.net
~8,850 to go to 20,000
We are ~1/9 of the way
from 10,000 to 20,000.
Michael S. Hart
<hart-e+AXbWqSrlAAvxtiuMwx3w@xxxxxxxxxxxxxxxx>
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"
---
Previous Message by Thread:
Mail to DP
I tried to send a message to Charles at DP and got the following message.
This is the address on the DP page. Is there a new one?
----- The following addresses had permanent fatal errors -----
<charlz-J/4ZKqteuv4kHnnMZ6rJPA@xxxxxxxxxxxxxxxx>
no such address.
nwolcott2-cIpcPs7DjqbWs/AcZQh2Cw@xxxxxxxxxxxxxxxx Friar Wolcott, Gutenberg
Abbey, Sherwood Forrest
---
Next Message by Thread:
Re: the heart and soul of project gutenberg