
Re: ByteOrdered partitioner when using sha-1 as partition key



On Sat, Feb 11, 2017 at 1:47 PM, Micha <micha-1@xxxxxxxxxxxxxx> wrote:
I think I was not clear enough...

I have *one* table for which the row data contains (among other values)
a sha-1 sum. There are no collisions.  I thought computing a murmur hash
for a sha-1 sum is just wasted time, as the murmur hash doesn't make the
data more random than it already is.   So it's just one table where this
matters.


 Michael


On 11.02.2017 at 16:54, Jonathan Haddad wrote:
> The odds of only using a sha1 as your partition key for every table you
> ever create are low. You will regret BOP until the end of time.
> On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxguru@xxxxxxxxx
> <mailto:edlinuxguru@xxxxxxxxx>> wrote:
>
>     Probably best to avoid BOP even if you are already hashing keys
>     yourself. What do you do when checksums collide? It is possible, right?
>
>     On Saturday, February 11, 2017, Micha <micha-1@xxxxxxxxxxxxxx
>     <mailto:micha-1@xxxxxxxxxxxxxx>> wrote:
>
>         Hi,
>
>         my table has a sha-1 sum as partition key. In this case, would the
>         ByteOrderedPartitioner be a better choice than the
>         Murmur3Partitioner,
>         since the keys are already quite random?
>
>
>         cheers,
>          Michael
>
>
>
>     --
>     Sorry this was sent from mobile. Will do less grammar and spell
>     check than usual.
>

The problem with using BOP is that the partitioner is not set at the table/keyspace level; it is set cluster-wide. So if you have two tables with different key distributions, there is no way to balance them both.
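
To make that concrete, here is a rough sketch (not Cassandra code). It assumes a hypothetical 4-node cluster whose byte-ordered ranges are split evenly on the first key byte; the function names and the sample keys are made up for illustration. The sha-1 keyed table spreads out fine, while a second table keyed by raw strings lands on a single node:

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size, chosen only for illustration


def byte_ordered_node(key: bytes) -> int:
    """Toy stand-in for byte-ordered placement: split the range of the
    first key byte (0..255) evenly across NODES."""
    first = key[0] if key else 0
    return first * NODES // 256


def distribution(keys):
    """Return the number of keys that land on each node."""
    counts = Counter(byte_ordered_node(k) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# Table 1: partition key is a sha-1 digest, so the bytes are already uniform.
sha1_keys = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]

# Table 2: partition key is a raw string sharing a common prefix.
domain_keys = [f"com.example.www/page{i}".encode() for i in range(10000)]

sha1_dist = distribution(sha1_keys)
domain_dist = distribution(domain_keys)

print("sha-1 keys per node: ", sha1_dist)
print("domain keys per node:", domain_dist)
```

Since both tables share the one cluster-wide partitioner, moving the range boundaries to even out the second table would unbalance the first.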

I would almost consider BOP quasi-supported at this point:

http://stackoverflow.com/questions/27939234/cassandra-byteorderedpartitioner

"no seriously you're doing it wrong"

I have thought about this often. If you really need BOP, for example because you're generating a web index and want to co-locate data for the same domain so you can scan it, Cassandra is a bad fit. I'm not convinced that a secondary index or materialized view fills the need. HBase seems a more logical choice (to me), since there the data is logically ordered by key and regions are split as they grow.
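
For what it's worth, the access pattern I mean looks roughly like the sketch below. It is plain Python over a sorted list, not HBase or Cassandra client code; the reversed-domain row keys and the prefix_scan helper are made up for illustration. When rows are stored in key order, everything for one domain is contiguous, so a prefix scan is just a range lookup:

```python
import bisect

# Hypothetical row keys, stored sorted the way an ordered key store keeps them.
rows = sorted([
    "com.example/about",
    "com.example/index",
    "org.apache/cassandra",
    "org.apache/hbase",
    "org.wikipedia/Main_Page",
])


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix using two binary searches."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character used in these keys, so it marks
    # the end of the prefix range.
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


apache_rows = prefix_scan(rows, "org.apache/")
print(apache_rows)
```

With a hashing partitioner those same rows would be scattered across the ring, and the scan turns into one lookup per key.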