logo       
Google Custom Search
    AddThis Social Bookmark Button
-->

Re: A trivial patch, and a question about large objects.: msg#00036

Subject: Re: A trivial patch, and a question about large objects.

On 11/09/2004, at 4:00 AM, Federico Di Gregorio wrote:

thank you very much for the patch. second, are you sure you really need
large objects? they will reside in the database anyway, so why don't
just use BYTEA fields?

I can't answer for Marco, but (before I managed to get our spec
changed so all the large binaries are stored on the filesystem
and the problem went away) the major problems with using BYTEA
fields were:
        - Memory overhead. I think my benchmarks showed memory required
          for the psycopg.BINARY() object was about 3 or 4 times the
          size of the original (I suspect every character was being quoted,
          rather than just the required ones, or memory was being preallocated
          for the worst case). This would make it impossible to store
          binaries over 500MB size on some architectures due to 2GB
          memory address space limitations.
        - CPU overhead. Doing all that quoting takes a looong time.
          Inserting BYTEA with psycopg took over twice as long as
          inserting using OID using pyPgSQL (20 seconds vs 9 seconds
          for a 18MB file on a 1GHz G4 PowerBook running OSX 10.3).

I think for serious work, psycopg.BINARY objects need to work as
streams, accepting either a string os a file-like object as their
constructor and quoting the data as it is actually sent to postgres
(so an exception would be raised if you tried to use the same
psycopg.BINARY object in two different queries and you constructed
it from a file-like object that doesn't support seek).

conn = psycopg.connect('user=postgres host=localhost dbname=test')

# create a new large object and write data to it
lobj = conn.openlo(None)
lobj.write("datadatadatadatadatadata...")
print "My oid is", lobj.oid
lobj.close()

# open an existing large object and read some data:
lobj = conn.openlo(oid)
all_the_content = lopbj.read()
lobj.seek(0)
all_the_content_again = lopbj.read()
lobj.close()

they are files, after all: give them a file-like interface!

Much nicer - no need to duplicate the C API, since this isn't C :-)
For most uses, the following could even be the standard syntax:

>>> cur = con.cursor()
>>> blob = psycopg.OID(open('foo.dump','r'))
>>> cur.execute('INSERT INTO FOO (x) VALUES (%(blob)s)', vars())
>>> cur.execute('SELECT x FROM FOO')
>>> r = cur.fetchone()
>>> blob = r[0].read()

psycopg would do the open and write for you if the psycopg.OID
object does not already have a valid OID number (avoiding duplicates
in the database - is this wanted? Might get tricky using the same
psycopg.OID object with multiple database connections).

-- Stuart Bishop <stuart-4lSSOvFEPszcGevYNGnvVw@xxxxxxxxxxxxxxxx>
http://www.stuartbishop.net/

Attachment: PGP.sig
Description: This is a digitally signed message part

_______________________________________________
Psycopg mailing list
Psycopg-IAPFreCvJWPBWskQ1e/+sw@xxxxxxxxxxxxxxxx
http://lists.initd.org/mailman/listinfo/psycopg
<Prev in Thread] Current Thread [Next in Thread>