
Re: ByteOrdered partitioner when using sha-1 as partition key

On Sat, Feb 11, 2017 at 1:47 PM, Micha <micha-1@xxxxxxxxxxxxxx> wrote:
I think I was not clear enough...

I have *one* table for which the row data contains (among other values)
a SHA-1 sum. There are no collisions. I thought computing a Murmur hash
of a SHA-1 sum was just wasted time, as the Murmur hash doesn't make the
data any more random than it already is. So it's just one table where this


On 11.02.2017 at 16:54, Jonathan Haddad wrote:
> The odds of using only a SHA-1 as your partition key for every table you
> ever create are low. You will regret BOP until the end of time.
> On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxguru@xxxxxxxxx
> <mailto:edlinuxguru@xxxxxxxxx>> wrote:
>     Probably best to avoid BOP even if you are already hashing keys
>     yourself. What do you do when checksums collide? It is possible, right?
>     On Saturday, February 11, 2017, Micha <micha-1@xxxxxxxxxxxxxx
>     <mailto:micha-1@xxxxxxxxxxxxxx>> wrote:
>         Hi,
>         My table has a SHA-1 sum as its partition key. Would the
>         ByteOrdered partitioner be a better choice than the
>         Murmur3Partitioner in this case,
>         since the keys are quite random?
>         cheers,
>          Michael
>     --
>     Sorry this was sent from mobile. Will do less grammar and spell
>     check than usual.
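Edward's collision question can be put in numbers with the standard birthday bound. This is an illustrative estimate, not something from the thread: it assumes SHA-1 outputs behave like independent, uniformly random 160-bit values, which does not hold for deliberately constructed collisions. For n distinct keys hashed into N = 2^{160} values:

$$
P(\text{collision}) \approx 1 - e^{-n(n-1)/(2N)} = 1 - e^{-n(n-1)/2^{161}} \approx \frac{n^2}{2^{161}}
$$

$$
n = 10^{12}: \quad P \approx \frac{10^{24}}{2.92 \times 10^{48}} \approx 3.4 \times 10^{-25}
$$

So accidental collisions are not the practical concern for non-adversarial data; the objections to BOP below are about how the partitioner is configured.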

The problem with BOP is that the partitioner is not set at the table or keyspace level; it is set cluster-wide. So if you have two tables with different key distributions, there is no way to balance both of them.
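The imbalance can be illustrated with a toy model. It is a sketch under simplifying assumptions, not Cassandra's actual token logic: the cluster size `NUM_NODES` is hypothetical, each node owns an equal slice of the first key byte (real BOP tokens are the full key bytes, with ranges set by node tokens), and MD5 stands in for the hashing step (RandomPartitioner used MD5; Murmur3Partitioner uses MurmurHash3). Both tables share one partitioner, as they would in a real cluster.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def bop_owner(key: bytes) -> int:
    """Toy byte-ordered placement: the token is the raw key, and each node
    owns an equal slice of the first-byte range 0..255."""
    return key[0] * NUM_NODES // 256


def hashed_owner(key: bytes) -> int:
    """Toy hash placement: hash the key first (MD5 as a stand-in for
    Murmur3), then place the digest on the same evenly split ring."""
    return bop_owner(hashlib.md5(key).digest())


def distribution(keys, owner):
    """Number of keys each node would own under the given placement."""
    counts = Counter(owner(k) for k in keys)
    return [counts.get(n, 0) for n in range(NUM_NODES)]


# Table 1: SHA-1 digests as partition keys (already uniformly distributed).
sha1_keys = [hashlib.sha1(str(i).encode()).digest() for i in range(10_000)]
# Table 2 (hypothetical): domain names, whose first bytes cluster heavily.
domain_keys = [f"www.site{i}.com".encode() for i in range(10_000)]

for name, keys in [("sha1", sha1_keys), ("domains", domain_keys)]:
    print(name,
          "BOP:", distribution(keys, bop_owner),
          "hashed:", distribution(keys, hashed_owner))
```

Under the toy BOP placement the SHA-1 table spreads roughly evenly, but every domain key starts with `w`, so all 10,000 land on one node (`[0, 10000, 0, 0]`). Moving node boundaries to fix the second table would unbalance the first; hashing keys first balances both.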

I would consider BOP only quasi-supported at this point:

"No, seriously, you're doing it wrong."

I have thought about this often. If you really need BOP (for example, you're generating a web index and want to co-locate data for the same domain so you can scan it), Cassandra is a bad fit. I'm not convinced that a secondary index or materialized view fills the need. HBase seems a more logical choice (to me): there the data is logically ordered by key, and regions are split as they grow.
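What ordered keys buy you in the web-index case can be sketched in a few lines. This is a minimal, hypothetical illustration, not HBase's or Cassandra's API: `row_key` uses a reversed-domain key layout chosen for the example, and a sorted Python list stands in for byte-ordered storage. Because all pages of one domain share a key prefix, they form one contiguous range that a single scan can read.

```python
import bisect


def row_key(domain: str, path: str) -> bytes:
    """Hypothetical web-index row key: reversed domain plus path, so pages
    of one domain (and its subdomains) sort next to each other."""
    reversed_domain = ".".join(reversed(domain.split(".")))
    return f"{reversed_domain}/{path}".encode()


pages = [
    ("www.example.com", "index.html"),
    ("blog.example.com", "post1"),
    ("www.example.com", "about.html"),
    ("www.other.org", "index.html"),
]

# Stand-in for byte-ordered storage: keys kept sorted by their raw bytes.
ordered = sorted(row_key(d, p) for d, p in pages)


def prefix_scan(sorted_keys, prefix: bytes):
    """Return every key starting with prefix. Such keys are contiguous in
    sorted order, so seek to the first candidate and stop at the first miss."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


print(prefix_scan(ordered, b"com.example."))
```

With a hash partitioner the same three `com.example.` keys would be scattered across the ring, so gathering one domain's pages would mean querying every node rather than reading one range.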