logo       
Google Custom Search
    AddThis Social Bookmark Button
-->

Re: My ridiculous "ID" fixation :-): msg#00231

Subject: Re: My ridiculous "ID" fixation :-)
On Saturday 29 January 2005 01:02, S Clark wrote:
> On Friday 28 January 2005 05:28 pm, Joerg Ostertag wrote:
> [...blah blah blah not needing to find out the secret mystery id blah
> blah...]
>
> > Wouldn't an index on the desired fields (MACADDR,WEP) do the same job?
>
> [...]
>
> > > ID#. This makes the one advantage I see to "mystery number" ID's
> > > (automatic assignment of the ID)
> >
> > If you make the id autoincrement it will be assigned by mysql.
>
> That's what I meant - that's the one useful "trick" that you can get with
> the "secret mystery integer" key that you just can't get with a more
> flexible unique ID - on the other hand, the INTENT for the meaningful
> unique ID would be that the ID is predictable (e.g. you always know it's
> the MAC address if the type it "WLAN" or "WLAN-WEP", which is a piece of
> information that you have before you even go to update the database.  Being
> predictable, there'd never be a NEED to "auto-assign" a random ID code,
> since you'd always know what the key should be before you even contact the
> database server.
>
> The issue of what to use for unique id's of other feature types is,
> definitely, a non-trivial one admittedly.  For some there are obvious ones
> ("macaddr" for WLAN*, "waypoint" for geocaches on Navicache.com) but for
> some of the more generic ones (e.g. GNIS and "geonames" data) it's a bit
> more problematic.  I'm hoping you're about to point out a much easier way
> to do what I'm trying to do...
>
> [...]
>
> > If I use an integer (which is automatically gernerated by the DB) i have
> > a short easy to handle and fast index for the relational database. This
> > way I can easily make joins between the tables. And I have no problems
> > findung a unique key, because the database itself handles generating a
> > unique PRIMARY key for me. This won't stop us from generating other index
> > on different fields for updates of specific data.
>
> [...]
>
> > As of my understanding, you want the 'meaningfull' index for updating
> > purpose. So you can replace existing entries by more acurate/corrected
> > ones. And you don't want to fiddle with getting the ID and then replacing
> > the values?
> >
> > I would think we can accomplish this by creating an index on the field
> > you want (MAC-Address for Example). This should enable you to do an
> > update of your keys the same way you plan it.
> >
> > What do you think about this.
>
> I think that perhaps you're about to point out some replace/update syntax
> that I have stupidly missed that does the "insert or update as appropriate"
> with a single query without needing to replace the "secret mystery number"
> ID...
>
> This issue for me isn't just updating, but importing an updated set of data
> that may contain some NEW data that needs to be added AND some updated
> versions of existing data.  I'm looking to be able to tell the database
> (for example) "here's the current location of GEOCACHE 'N012345'" and have
> the database able to either update the existing N012345 cache information
> if it's already there OR add a new row for N012345, without having to do
> multiple queries for each row.  (The current Navicache data includes around
> 5200 rows, which isn't all THAT big, but it's growing...)
>
> I had THOUGHT that "indexing" just created an ordered file for faster
> searching, rather than for finding unique rows (i.e. you can have duplicate
> values for multiple rows in an indexed field) - therefore you still need to
> do either two queries for every insert (select to see if the row already
> exists, then another query to either insert or update) or one massive
> select and store the results in memory to determine whether to insert or
> update.  In other words, you still have to do a "pre-insert check" on every
> row, the database just answers each initial "select" a bit more
> quickly...or so I thought.
>
> The short version of what I'm trying to get to is that I'd like to be able
> to tell the database "if this row's feature (i.e. WLAN, TOWER, GEOCACHE...)
> already exists, just update it with this information, otherwise add a new
> row" without having to do multiple queries for reach row (going back to the
> WLAN example, using the "macaddr" field as the unique id, I don't NEED to
> know in advance if the record is already in the table - I just issue a
> "REPLACE" and that's the end of it.  In it's current "secret mystery
> number" structure, GPSKismet does two queries - a select to see if the row
> exists, then a branch to either an insert or update depending on what the
> first query returned...)
>
> Incidentally, in case it's not obvious, at this stage I'm not really that
> worried about convincing people to implement the concept.  It's pretty
> clear that I'm the only one worried about it, which to me implies that I'm
> just missing something.  Besides, there ARE workarounds (it's not THAT big
> of a deal for me to just do a "DELETE" for all rows from a particular
> dataset and then INSERT all of the updated rows...).  Just trying to cure
> my own ignorance at this point, that's all...

I don't think you're missing anything.

I don't know of any proper solution to the problem.  For POIs, there is a 
unique identifier for widely-spaced items; the lat/lon position, with a 
little leeway for collision detection with other entries of the same POI with 
slightly different lat/lon (+10 feet N, for example).  But, that collision 
detection takes a lot of work.  I think it's been done already, but it will 
be CPU-intensive when performing large updates to the set.

Christopher Jastram
http://www.livingresource.net
cej102937
--
This message was part of the gpsdrive mailinglist
unsubscribing can be done by sending a mail containing a body of:
-quote--
unsubscribe gpsdrive
-unquote--
to majordomo-ahljRfkWE0EwtzFM8z8+AfP6llvjuJOh@xxxxxxxxxxxxxxxx



<Prev in Thread] Current Thread [Next in Thread>