
Re: Controlling access to pdf/doc files (db "better" than filesystem?): msg#00120

security.web-applications

Subject: Re: Controlling access to pdf/doc files (db "better" than filesystem?)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 28 Feb 2004 11:13:21 -0800
"David Wall @ Yozons, Inc." <dwall@xxxxxxxxxx> wrote:

> > that in SQL Server is that all data in SQL Server is split over ~8k
> > pages. When you add a BLOB it needs to be split into 8k chunks. When you
>
> But filesystems also store data in pages, often much smaller than 8k
> chunks.

I agree that, for a solution like this, storing files together with their
metadata in a database is better than storing them on the filesystem. It's also
probably more secure, since the web developer is less likely to botch
permissions, security, or sanity checks, and since most database systems
already have some sanity checks built in. Your reasoning in that last sentence
is a bit off, though: database systems (such as MySQL, PgSQL, ThinkSQL, and
MSSQL) all must use the filesystem themselves, so their 8k pages may not line
up with the filesystem's pages, and the storage may be out of phase. This is
just a result of overlaying one storage paradigm on another, and shouldn't
cause too much trouble speed-wise. Still, by adding a layer on top of the
filesystem, you do increase the likelihood of inefficiency.
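
To put rough numbers on the "out of phase" point, here is a minimal sketch in
Python. The 8k page size comes from the quoted message; the 4k filesystem block
size, the 512-byte offset, the 100,000-byte file, and the function names are
assumptions made up for illustration, not measurements from any particular
database or filesystem.

```python
DB_PAGE = 8 * 1024    # database page size mentioned in the thread
FS_BLOCK = 4 * 1024   # assumed filesystem block size (hypothetical)


def pages_needed(blob_size: int, page_size: int) -> int:
    """Number of fixed-size pages needed to hold blob_size bytes."""
    return -(-blob_size // page_size)  # integer ceiling division


def fs_blocks_touched(offset: int, length: int, block_size: int) -> int:
    """Number of filesystem blocks overlapped by bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


blob_size = 100_000  # hypothetical PDF size in bytes
db_pages = pages_needed(blob_size, DB_PAGE)

# One database page that starts on a filesystem block boundary...
aligned = fs_blocks_touched(0, DB_PAGE, FS_BLOCK)
# ...versus one that starts 512 bytes into a block (out of phase).
misaligned = fs_blocks_touched(512, DB_PAGE, FS_BLOCK)

print(db_pages, aligned, misaligned)
```

Under these assumptions an out-of-phase page touches one extra filesystem
block per read, which is the kind of small, bounded overhead described above.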

That said, there's a counterargument: databases, or at least smart ones, are
built to cache data efficiently in memory. If your database server has
enough memory, it may even become faster than serving the file off of the
filesystem directly. The reasoning for this is that the filesystem cache (if
there is any at all) also holds shared libraries and other files which are
currently executing, and these are given priority over any sort of data
caching. This cache is also limited in space, in most implementations, so as
not to take too much precious RAM. Databases, however, are generally built with
the assumption that if you are using a database server for anything that could
use significant caching, or for major resource-intensive tasks (like serving
hundreds of thousands of users), then the database server will be the prime
service of the machine, and therefore may take up significant amounts of
resources (specifically, cache more stuff in memory). So, in some situations
I'd imagine database file storage would in fact be _faster_ for retrieval than
filesystem storage. This is based on too many assumptions regarding the
database server's design, the operating system underlying the database
server, and the server machine being used, and so I don't give it much credit.

Then again, I may be wrong...

>
> Our Signed & Secured application stores all files as BLOBs in a database for
> all of the transactional and backup capabilities, but we've never run tests of
> 100+ concurrent web users downloading files to see if the database or the
> filesystem would be faster. In general, faster was less important to us than
> being able to support lots of concurrent requests because the speed of
> retrieval from the db was always assumed to be faster than it could be
> streamed back across typically slower Internet links. After all, the data
> has to be sent back to a user's web browser, so the speed of the transfer is
> limited by the slowest link between the browser and the web server.

This is the right attitude. Speed where it is useful, administrative
efficiency whenever possible.
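
The "slowest link" argument can be sketched as simple arithmetic in Python.
This assumes the file is streamed, so the stages overlap and the slowest one
dominates; every rate below is a made-up illustrative figure, not a
measurement, and the function name is hypothetical.

```python
def transfer_seconds(size_bytes: int, stage_rates: dict[str, float]) -> float:
    """Approximate streaming time: size divided by the slowest stage's rate.

    stage_rates maps a stage name to its throughput in bytes per second.
    """
    bottleneck = min(stage_rates.values())
    return size_bytes / bottleneck


# Hypothetical throughputs in bytes per second (assumptions for illustration).
via_database = {"db_read": 20_000_000, "lan": 10_000_000, "client_link": 100_000}
via_filesystem = {"fs_read": 50_000_000, "lan": 10_000_000, "client_link": 100_000}

size = 1_000_000  # a 1 MB document

print(transfer_seconds(size, via_database))
print(transfer_seconds(size, via_filesystem))
```

With these numbers both paths take the same time, because the client's link is
the bottleneck in each case; the storage backend only matters once it becomes
the slowest stage.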

Ido

>
> David
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAQQAhmhQsAkXAJP0RAtsIAJ0YEU2nqXhbrrEEbjuJ6ENNPnBuGwCgo1gS
z2SccYIaCJwsvmk2bnpgZmw=
=0tLv
-----END PGP SIGNATURE-----


