|
osdir.com mailing list archive F.A.Q. -since 2001! |
|
|
|
Subject: Re: pysqlite design decisions - msg#00015List: python.db.pysqlite.user
by Date: Prev Next Date Index by Thread: Prev Next Thread Index
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1 glyph-TyWPi3/qHqPQT0dZR+AlfA@xxxxxxxxxxxxxxxx wrote: | As I said before, it's not the performance difference that concerns me | most. It's that I want to keep the object around (and yes, | re-concatenating all thoses strings together to re-create the statement | _is_ a significant cost), and I would rather it be the appropriate type | that actually supports the operation that I need than a string. I don't see the big difference between saving one string vs saving one statement that is the prepared result of the string. Since the concatenation is the work, you are only doing it once either way. | Axiom does this by just forcing every cursor to run to completion all | the time, but that's not a good solution. I'd prefer what you are | describing quite a bit. In that case I will provide the option for the Connection to close all the cursors :-) | Great! Personally I would prefer sharing everything, to facilitate | eventual sharing of code as well. Not my call though ;). Not mine either. I guess Gerhard needs to work with the initd people or lump in with his stuff after moving things around a little bit, adding fields to bug tracker etc. | The percentage of development effort given to maintenance as opposed to | initial development approaches 100% as T approaches infinity :). There are cases where that isn't true especially if you are sitting on top of other layers. drh does not expect there to ever be a SQLite 4. The vast majority of changes are expected to be in the SQL which is transparent to apsw since I don't care what it is. Since writing the initial version of apsw, I haven't changed the core as the core API of SQLite has not changed, with the exception of the Python 2.5 64 bit changes. All the maintenance has been adding new code to provide access to some new features of SQLite. | A gradual merging could still save lots and lots of time in | future releases. I think that having an initial test suite is probably the first thing that is needed. That way underlying code can be re-engineered and breakages detected. That is one reason drh like his test suite - he doesn't have to be too nervous about changing and refactoring code, or adding new features. Is there a DBAPI test suite available? Roger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFc/ScmOOfHg372QQRAqvwAJ9ARov+c+SKu6dJKsuQo6Mr5MU81ACeJgPS y0BL8COsVkhEm0igSYYjSfE= =IyjC -----END PGP SIGNATURE-----
Thread at a glance:
Previous Message by Date:Re: pysqlite design decisions-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 glyph-TyWPi3/qHqPQT0dZR+AlfA@xxxxxxxxxxxxxxxx wrote: | The more sqlite-like semantics were handy, but the fact | that strings would randomly come back as 8-bit or unicode depending on | their contents was annoying :-\. They should only ever come back as 8 bit if they are pure ASCII (ie 7bit). How did this cause any problems? It was intentional this way so that simple apps that only ever used ASCII only ever had ASCII. | I hope that is no longer the case. Have there been any similar tests | recently? I haven't tested since then, but you will note that I said it was my workload with my data. I am sure other results will vary. I even found that you should be careful about compiling SQLite with -DNDEBUG. If not present in the compile flags then you get ~30% better performance, but not assertions. | >SQLite itself sets the bar really high. Its test suite has something | >like 98% coverage. | | As well it should! To achieve that drh has special filesystem code that fakes disk fulls, power being pulled etc. Sadly we can't get Python to randomly fail. I would love to have something like a valgrind extension that could examine the call stack, look for apsw being present and cause a failure once. (That way the top level test can be repeated and then fail at the next malloc point). | There's very little python code in pysqlite. (or do you mean there is | actually python code in sqlite itself somewhere? I don't see any.) I meant the former. I just saw that pysqlite has a bunch of .py files. ~ From further observation, it looks like dbapi2.py is the only real one (88 lines) with the rest being test code. | Keep it locked on the GIL? That would happen as a side effect anyway. I just got very nervous when considering doing the same thing for apsw. The combinations of when destructors run, invalidation etc can leave loopholes unless you are very diligent. | I wouldn't know though, from what I've | heard, sqlite itself is only middling good at dealing with multiple | threads. I use multiple processes. You heard wrong. If you only use the db objects in the same thread you obtained them then there is no difference between what you should get from multiple threads vs multiple processes. The Python layers above are serialized due to the GIL but SQLite itself is fine. | Hmm. I'm not so sure about that. I don't know if it belongs where it | is in particular in pysqlite, but the operations offered by the DB layer | should be "compile this statement" and "execute this statement". The SQLite API wouldn't change. The reason for it being part of SQLite core is that it keeps the cache behind the scenes and returns cached prepared statements for the same compile string. And since SQLite understands SQL, it has a far better idea of what constitutes the same string. It can also correctly handle invalidation and cleanup. This is something regular expression libraries have done for ages. | I have considered that a few times, but pysqlite and apsw both do some | useful stuff too - converting exceptions, mapping types, etc. Converting exceptions is easy. A simple table of integers to exception type objects. For the type mapping, the SQLite builtin types are trivial. Trying to pretend there are more types is something that should be in Python code anyway. | Maybe I could do it better tuned for Axiom if I spent a few months on my own | custom ctypes binding, but the point is I don't want to spend a lot of | time hacking at this level :). Attached is one that took just over an hour. It is also the first time I have used ctypes. In a 200 line file you get your prepared statements that you can fling around and reuse as you wish. All the types are supported except for buffer/blob and that is just a time issue. I don't know what you would spend the rest of the several months on :-) | Also, there is a fair amount of interest in eventually making Axiom work | with different databases. I don't want to diverge unnecessarily far | from DB-API bindings to make this more difficult than it needs to be | (and it is already very difficult). My personal opinion is that DBAPI is a crock. It should be a guideline, not a requirement. The reason is that developers will end up having to write database specific code no matter what. They all have their own quirks, performance issues, random compliance with SQL standards etc. Every database using project has made distinct back ends. If DBAPI was actually good then that mostly wouldn't be necessary. Where DBAPI is useful is in defining things that would otherwise be assigned arbitrarily. For example it says that the function that takes SQL and runs it is called 'execute'. Getting consistency that way is convenient for developers. (ie it should be a SHOULD standard not a MUST standard) | This is very cool. For those following along, the reason for having C functions as part of the traceback is to give you more context about what went wrong. Once entering SQLite code there are circumstances when calls come back out (eg user defined functions collations, virtual tables) and so the traceback lets you see what is inbetween. Having local variables gives you the potential for more information in the traceback since they can be printed out. See this recipe in the Python Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52215 I use an enhanced version of that in BitPim. This is an example of the output: http://sourceforge.net/mailarchive/forum.php?thread_id=30834376&forum_id=31264 | I assume that it would have to go into a new major rev, though, because | I do think the main Python devs would object to a whole new module being | included. (Best to ask them though, and not speculate). My current APSW already requires SQLite 3.3.8 and due to bugs I found in that (memory leaks), I expect you'd be best off with the next release of SQLite. I assume Python is going to have to hold the SQLite version the same for the lifetime of 2.5 otherwise code will break between different versions. Roger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFc/DlmOOfHg372QQRAs62AKCmFd1iZN2HZuENo9P5BZF7jhqsdgCfZF+k 2uso8EwFoWdrECjWGR99hvg= =RUwR -----END PGP SIGNATURE----- #!/usr/bin/env python # A ctypes binding to SQLite import ctypes def utf8me(s): return unicode(s).encode("utf_8") def fromutf8(s): return s.decode("utf_8") SQLITE_OK=0 SQLITE_ERROR=1 SQLITE_ROW=100 SQLITE_DONE=101 # todo: fill out rest and make distinct exceptions for each def SQLiteError(Exception): pass def getexception(code, connection): if code==SQLITE_OK: return None # todo: this would need to switch on the error code to get the right exception return SQLiteError(fromutf8(connection.library.dll.sqlite3_errmsg(connection.db))) def raiseexception(code, db): e=getexception(code, db) if e is not None: raise e class CSqlite: # SQlite constants SQLITE_STATIC=ctypes.c_void_p(0) SQLITE_TRANSIENT=ctypes.c_void_p(-1) class Statement: # states we can be in STATE_END=0 STATE_FIRST_ROW=1 STATE_ROW=2 STATE_BEGIN=3 def __init__(self, connection, stmt): self.connection=connection self.dll=self.connection.library.dll self.stmt=stmt self.db=connection.db self.state=self.STATE_BEGIN def __del__(self): if self.stmt: self.dll.sqlite3_finalize(self.stmt) self.stmt=None def __iter__(self): return self def reset(self): self.state=self.STATE_BEGIN res=self.dll.sqlite3_reset(self.stmt) raiseexception(res, self.connection) try: res=self.dll.sqlite3_clear_bindings(self.stmt) raiseexception(res, self.connection) except AttributeError: pass # needs more recent version of SQLite def _bind(self, n, value): # haven't done blob/buffer if isinstance(value, (unicode, str)): utfvalue=utf8me(value) res=self.dll.sqlite3_bind_text(self.stmt, n, utfvalue, len(utfvalue), CSqlite.SQLITE_TRANSIENT) elif isinstance(value, int): res=self.dll.sqlite3_bind_int(self.stmt, n, value) elif isinstance(value, long): res=self.dll.sqlite3_bind_int64(self.stmt, n, ctypes.c_longlong(value)) elif isinstance(value, float): res=self.dll.sqlite3_bind_double(self.stmt, n, ctypes.c_double(value)) elif value is None: res=self.dll.sqlite3_bind_null(self.stmt, n) else: raise Exception("unsupported data type "+`value`) raiseexception(res, self.connection) def execute(self, *bindings): self.reset() for i in range(self.dll.sqlite3_bind_parameter_count(self.stmt)): name=self.dll.sqlite3_bind_parameter_name(self.stmt, i+1) if name: name=fromutf8(name) self._bind(i+1, bindings[0]['name']) # must be dict like else: self._bind(i+1, bindings[i]) self.state=self.STATE_BEGIN try: self.next() # greedy execution since user won't call next() with statements that don't return data except StopIteration: pass return self def _getfield(self, field): t=self.dll.sqlite3_column_type(self.stmt, field) if t==1: # integer return self.dll.sqlite3_column_int64(self.stmt, field) elif t==2: # float return self.dll.sqlite3_column_double(self.stmt, field) elif t==3: # text # this should use sqlite3_column_bytes so that embedded nulls are retained return fromutf8(self.dll.sqlite3_column_text(self.stmt, field)) elif t==4: # blob pass else: # 5 = null return None def next(self): if self.state==self.STATE_END: raise StopIteration() if self.state!=self.STATE_FIRST_ROW: res=self.dll.sqlite3_step(self.stmt) if res==SQLITE_DONE: self.state=self.STATE_END raise StopIteration() if res==SQLITE_ERROR: # get actual error code from sqlite3_reset res=self.dll.sqlite3_reset(self.stmt) if res!=SQLITE_ROW: raiseexception(res, self.connection) if self.state==self.STATE_BEGIN: self.state=self.STATE_FIRST_ROW return self.state=self.STATE_ROW return tuple([self._getfield(field) for field in range(self.dll.sqlite3_column_count(self.stmt))]) class Connection: def __init__(self, library, filename): self.library=library self.db=ctypes.c_void_p() res=self.library.dll.sqlite3_open(utf8me(filename), ctypes.byref(self.db)) e=getexception(res, self.db) # db is always allocated even on failure if e: self.library.dll.sqlite3_close(self.db) self.db=None raise e def __del__(self): if self.db: self.library.dll.sqlite3_close(self.db) self.db=None self.library=None def prepare(self, sql): zsql=utf8me(sql) stmt=ctypes.c_void_p() tail=ctypes.c_char_p() res=self.library.dll.sqlite3_prepare(self.db, zsql, -1, ctypes.byref(stmt), ctypes.byref(tail)) if len(tail.value): if stmt: self.library.dll.sqlite3_finalize(stmt) raise Exception("Only one SQL statement can be provided: left "+`tail`) raiseexception(res, self) return CSqlite.Statement(self, stmt) def __init__(self, filename="libsqlite3.so"): dll=ctypes.CDLL(filename) # some annotations dll.sqlite3_errmsg.restype = ctypes.c_char_p dll.sqlite3_bind_parameter_name.restype = ctypes.c_char_p dll.sqlite3_column_double.restype = ctypes.c_double dll.sqlite3_column_int64.restype = ctypes.c_longlong dll.sqlite3_column_text.restype = ctypes.c_char_p self.dll=dll def connection(self, filename): return CSqlite.Connection(self, filename) if __name__=='__main__': import os try: os.remove("testdb") except: pass c=CSqlite() connection=c.connection("testdb") stmt=connection.prepare("create table foo(x,y)") stmt.execute() stmt=connection.prepare("insert into foo values(?,?)") stmt.execute("some text", u"\N{BLACK STAR} \N{WHITE STAR} \N{LIGHTNING} \N{COMET} ") stmt.execute(1, 2) stmt.execute(1.1, 2.1) stmt.execute(None, None) stmt=connection.prepare("select * from foo") for row in stmt.execute(): print row try: # check exceptions stmt=connection.prepare("blah blah blah") except SQLiteError: pass _______________________________________________ pysqlite mailing list pysqlite-IAPFreCvJWPBWskQ1e/+sw@xxxxxxxxxxxxxxxx http://lists.initd.org/mailman/listinfo/pysqlite Next Message by Date:Re: pysqlite design decisionsOn 09:56 am, rogerb-JFdGOZ7s+BwWQnjQ7V0W7w@xxxxxxxxxxxxxxxx wrote:>glyph@xxxxxxxxxx wrote:>| The more sqlite-like semantics were handy, but the fact>| that strings would randomly come back as 8-bit or unicode depending on>| their contents was annoying :-\.>>They should only ever come back as 8 bit if they are pure ASCII (ie>7bit). How did this cause any problems? It was intentional this way so>that simple apps that only ever used ASCII only ever had ASCII.Axiom explicitly forbids use of str objects to represent text anywhere. You get unicode back from text fields, you get str objects back from blob fields.If you let your application APIs mix str and unicode freely, then there's no way to debug transcoding bugs. You get some string like "foo-b2o" and you think, "oh how nice, ASCII, no need to encode that", but actually it's u'foo\N{ETHIOPIC SYLLABLE SEE}' encoded with punycode.The size optimization that you get from switching from unicode to str objects really ought to be an implementation detail of the unicode type implementation.>To achieve that drh has special filesystem code that fakes disk fulls,>power being pulled etc. Sadly we can't get Python to randomly fail.Getting Python's internal test coverage up to a decent level would be a time-consuming project, and a bit out of scope for this discussion. Still, I wish somebody would do it - I'm really happy to hear that drh has put that much thought into testing.>| Keep it locked on the GIL?>>That would happen as a side effect anyway. I just got very nervous when>considering doing the same thing for apsw. The combinations of when>destructors run, invalidation etc can leave loopholes unless you are>very diligent.Threads are always scary. Pretty much any threaded code that involves locked, shared data structures is going to be impossible to test thoroughly - you can call any function at any point from any other function... you have to reason really hard about where everything's going to happen.The statement cache doesn't strike me as especially scary for any reason though.>| I've heard, sqlite itself is only middling good at dealing with multiple>| threads.>You heard wrong.The impression may simply have been garnered from the proximity of threading bugs to database bugs in extremely poorly tested projects such as Trac.>My personal opinion is that DBAPI is a crock.This opinion seems really common. Is there a DB-API 3.0 in the works that fixes some of the more obvious issues?_______________________________________________ pysqlite mailing list pysqlite-IAPFreCvJWPBWskQ1e/+sw@xxxxxxxxxxxxxxxx http://lists.initd.org/mailman/listinfo/pysqlite Previous Message by Thread:Re: pysqlite design decisionsOn 03:19 am, rogerb-JFdGOZ7s+BwWQnjQ7V0W7w@xxxxxxxxxxxxxxxx wrote:>glyph@xxxxxxxxxx wrote:>| Just a thought: one way to do this would be to simply have a>| (cursor/connection).prepare() method, which returns an object that can>| be used interchangeably with a string passed to "execute".>>The model for this would be compile() and exec/eval in Python core.>>However I am somewhat skeptical of exposing these compiled methods. Do>they really make that much of a performance difference to you? My>expectation is that if they become necessary then they will be added to>the SQLite core itself transparently, just like the various regular>_expression_ libraries do.As I said before, it's not the performance difference that concerns me most. It's that I want to keep the object around (and yes, re-concatenating all thoses strings together to re-create the statement _is_ a significant cost), and I would rather it be the appropriate type that actually supports the operation that I need than a string.>You also can't close the containing database object at the SQLite level>either. I am about to add close() methods to apsw because of Python 2.5>lazy finalization means any amount of time can lapse between the final>reference being dropped and the GC running. Something I am considering>is having the ability for the Connection close method to go off and>close all stray cursors.Axiom does this by just forcing every cursor to run to completion all the time, but that's not a good solution. I'd prefer what you are describing quite a bit.>| Huh, that looks a lot more complicated and specific than what I want. I>| would just like to be able to specify a python callable that gets called>| with a 2-tuple of (sql, args) each time a statement was executed. (or>| really, (statement, args), if there were statement objects).>>http://www.rogerbinns.com/apsw.html#tracers :-)Exactly :)>I am totally happy for APSW to be hosted with pysqlite. Someone else>will need to work out things like should there be separate trac instance>or the same, shared or the same Subversion etc.Great! Personally I would prefer sharing everything, to facilitate eventual sharing of code as well. Not my call though ;).>As for the duplication of effort, that has already been done (sunk>cost). I can certainly see a layer of Python code on top to do silly>transaction semantics required by DBAPI as well as pretending that>SQLite has more than 5 types.The percentage of development effort given to maintenance as opposed to initial development approaches 100% as T approaches infinity :). Both apsw and pysqlite have *years* of maintenance ahead of them, if not decades. A gradual merging could still save lots and lots of time in future releases.>My future plans are to track updates in SQLite in a timely manner,>remain as entirely C code, and to keep improving the testing. I also>make no attempt to support older versions of SQLite. ie there is no>SQLite version dependent code.This all sounds fine to me... version control can probably do an OK job of maintaining branches for earlier releases for use with earlier versions of sqlite._______________________________________________ pysqlite mailing list pysqlite-IAPFreCvJWPBWskQ1e/+sw@xxxxxxxxxxxxxxxx http://lists.initd.org/mailman/listinfo/pysqlite Next Message by Thread:Re: pysqlite design decisionsOn 09:56 am, rogerb-JFdGOZ7s+BwWQnjQ7V0W7w@xxxxxxxxxxxxxxxx wrote:>glyph@xxxxxxxxxx wrote:>| The more sqlite-like semantics were handy, but the fact>| that strings would randomly come back as 8-bit or unicode depending on>| their contents was annoying :-\.>>They should only ever come back as 8 bit if they are pure ASCII (ie>7bit). How did this cause any problems? It was intentional this way so>that simple apps that only ever used ASCII only ever had ASCII.Axiom explicitly forbids use of str objects to represent text anywhere. You get unicode back from text fields, you get str objects back from blob fields.If you let your application APIs mix str and unicode freely, then there's no way to debug transcoding bugs. You get some string like "foo-b2o" and you think, "oh how nice, ASCII, no need to encode that", but actually it's u'foo\N{ETHIOPIC SYLLABLE SEE}' encoded with punycode.The size optimization that you get from switching from unicode to str objects really ought to be an implementation detail of the unicode type implementation.>To achieve that drh has special filesystem code that fakes disk fulls,>power being pulled etc. Sadly we can't get Python to randomly fail.Getting Python's internal test coverage up to a decent level would be a time-consuming project, and a bit out of scope for this discussion. Still, I wish somebody would do it - I'm really happy to hear that drh has put that much thought into testing.>| Keep it locked on the GIL?>>That would happen as a side effect anyway. I just got very nervous when>considering doing the same thing for apsw. The combinations of when>destructors run, invalidation etc can leave loopholes unless you are>very diligent.Threads are always scary. Pretty much any threaded code that involves locked, shared data structures is going to be impossible to test thoroughly - you can call any function at any point from any other function... you have to reason really hard about where everything's going to happen.The statement cache doesn't strike me as especially scary for any reason though.>| I've heard, sqlite itself is only middling good at dealing with multiple>| threads.>You heard wrong.The impression may simply have been garnered from the proximity of threading bugs to database bugs in extremely poorly tested projects such as Trac.>My personal opinion is that DBAPI is a crock.This opinion seems really common. Is there a DB-API 3.0 in the works that fixes some of the more obvious issues?_______________________________________________ pysqlite mailing list pysqlite-IAPFreCvJWPBWskQ1e/+sw@xxxxxxxxxxxxxxxx http://lists.initd.org/mailman/listinfo/pysqlite
blog comments powered by Disqus
|
|