Subject: the heart and soul of project gutenberg -
msg#00332

List: culture.literature.e-books.gutenberg.volunteers

the heart and soul of project gutenberg is the plain-text file.

over the years, it has been scorned and even attacked outright.
some people say it's ugly. and it's far too low-tech for others.

but somehow, it has survived and even thrived in a way that
no other e-book technology ever has. in the process, i have
grown to appreciate its tenacity, and grasp its inner beauty.

this thread is for those having a love-affair with plain-text.

thank you, michael hart, for having the smarts and tenacity
to stick to your guns on plain-text. you were right all along...

-bowerbird

---
For subscription help visit http://listserv.unc.edu



Thread at a glance:

Previous Message by Date:

Re: A Gutenberg Todo list

> A UTF-8 file should *definitely* have a byte order mark.

This is an old argument. On Unix, UTF-8 files do not have a byte order
mark. Given that that is where they are most popular (Windows mostly
uses UTF-16), there's certainly an argument for UTF-8 files not having
a byte-order mark.

> If the file needs characters outside of ASCII, then you have already broken
> ASCII compatibility.

You don't understand what ASCII compatibility is. ASCII compatibility
means two things: that an ASCII file appears as ASCII, and that any
ASCII bytes mean what they mean in ASCII. Given that, most programs can
process UTF-8 or ISO-8859-1 or EUC-JP files without worrying about the
encoding.

> The existence of three non-ASCII bytes at the beginning
> of the file (specifically ) does not seem a major problem to me even if
> you did include the BOM in a file that only needed ASCII.

    grep Christmas *.txt > holiday.txt
    grep Holiday *.txt >> holiday.txt

If *.txt was all the PG books, and your files had BOMs, you've just
spread BOMs throughout holiday.txt.

> Yes, most definitely. Making the file 50% bigger vs. handling UTF-16, I'd
> say it is definitely worth it.

If you were worried about size, why not use a compressed format?

> Handling UTF-16 and UTF-32 should be trivial for your software. It certainly
> is in Java.

Java isn't Perl or C. Supporting multiple encodings certainly adds
complexity.
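The grep point above can be checked with a small byte-level sketch. This is only an illustration under stated assumptions: the file names and contents are made up, and grep_lines is a hypothetical stand-in for running grep over several files (which prefixes each match with its file name), not grep itself.

```python
import codecs
from typing import Dict

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def grep_lines(pattern: bytes, files: Dict[str, bytes]) -> bytes:
    """Hypothetical stand-in for `grep pattern *.txt` on raw bytes.

    Emits b"name:line" for every line containing the pattern, the way
    grep labels matches when it is given more than one file.
    """
    out = []
    for name, data in sorted(files.items()):
        for line in data.splitlines(keepends=True):
            if pattern in line:
                out.append(name.encode("ascii") + b":" + line)
    return b"".join(out)


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


# Made-up sample "books", each saved with a BOM at the start.
books = {
    "a.txt": BOM + b"A Christmas Carol\nStave One\n",
    "b.txt": BOM + b"Christmas Eve\nHoliday Tales\n",
}

# Equivalent of the two grep commands writing and appending to holiday.txt.
holiday = grep_lines(b"Christmas", books) + grep_lines(b"Holiday", books)

# Each BOM now sits in the middle of a line, after the file-name prefix.
stray = holiday.count(BOM)
print(stray)  # 2

# Stripping the BOM before searching keeps the combined output clean.
clean = {name: strip_bom(data) for name, data in books.items()}
holiday_clean = grep_lines(b"Christmas", clean) + grep_lines(b"Holiday", clean)
```

Because the marks end up mid-line rather than at the start of holiday.txt, a reader that only skips a BOM at the beginning of a file would not remove them.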

Next Message by Date:

Re: the heart and soul of project gutenberg

On Fri, 30 Jan 2004 Bowerbird-YDxpq3io04c@xxxxxxxxxxxxxxxx wrote:

> the heart and soul of project gutenberg is the plain-text file.
>
> over the years, it has been scorned and even attacked outright.
> some people say it's ugly. and it's far too low-tech for others.
>
> but somehow, it has survived and even thrived in a way that
> no other e-book technology ever has. in the process, i have
> grown to appreciate its tenacity, and grasp its inner beauty.
>
> this thread is for those having a love-affair with plain-text.
>
> thank you, michael hart, for having the smarts and tenacity
> to stick to your guns on plain-text. you were right all along...
>
> -bowerbird
>

While I still support plain text as much as ever, if only because it
can be so easily used in virtually every search engine, mailer, word
processor, text editor, read aloud program, file reader, etc. . . .
I have been oft misquoted as being against [all] other formats.

Not true.

I am willing to try any and all formats, much to the chagrin of some
of our people who are even more anti-proprietary formats than I am.

Some of these have VERY good reasons, such as the designed
impossibility of making corrections in these formats.

However, I figure that putting up samples of these formats, and
including a note that indicates that these files will not be updated
as often, if at all, would be worthwhile.

Whatever formats our readers want, I think we should try, at least
for experimental examples.

If we don't get a lot of feedback on these experiments, we just let
them go until someone wants to experiment with them again.

The major concern here is the cost/benefit ratio. . . .

We can get a LOT more done in the easier formats. . . .

Someday we might be able to automate moving the corrections from the
easier formats to the more difficult on the fly, and then this becomes
pretty much a moot point.

Thanks!!!

So Nice To Hear From You!

Michael

Give eBooks!!!

As of January 30, 2004
~11,150 FreeBooks at: http://gutenberg.net
~8,850 to go to 20,000

We are ~1/9 of the way from 10,000 to 20,000.

Michael S. Hart
<hart-e+AXbWqSrlAAvxtiuMwx3w@xxxxxxxxxxxxxxxx>
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"

Previous Message by Thread:

Mail to DP

I tried to send a message to Charles at DP and got the following
message. This is the address on the DP page. Is there a new one?

----- The following addresses had permanent fatal errors -----
<charlz-J/4ZKqteuv4kHnnMZ6rJPA@xxxxxxxxxxxxxxxx>
no such address.

nwolcott2-cIpcPs7DjqbWs/AcZQh2Cw@xxxxxxxxxxxxxxxx
Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
