osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Built in trigger: double-write for app migration


It reminds me of “shadow writes” described in [1].
During data migration the coordinator forwards  a copy of any write request regarding tokens that are being transferred to the new node.

[1] Incremental Elasticity for NoSQL Data Stores, SRDS’17,  https://ieeexplore.ieee.org/document/8069080


> On 18 Oct 2018, at 18:53, Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx.INVALID> wrote:
> 
> tl;dr: a generic trigger on TABLES that will mirror all writes to
> facilitate data migrations between clusters or systems. What is necessary
> to ensure full write mirroring/coherency?
> 
> When cassandra clusters have several "apps" aka keyspaces serving
> applications colocated on them, but the app/keyspace bandwidth and size
> demands begin impacting other keyspaces/apps, then one strategy is to
> migrate the keyspace to its own dedicated cluster.
> 
> With backups/sstableloading, this will entail a delay and therefore a
> "coherency" shortfall between the clusters. So typically one would employ a
> "double write, read once":
> 
> - all updates are mirrored to both clusters
> - writes come from the current most coherent.
> 
> Often two sstable loads are done:
> 
> 1) first load
> 2) turn on double writes/write mirroring
> 3) a second load is done to finalize coherency
> 4) switch the app to point to the new cluster now that it is coherent
> 
> The double writes and read is the sticking point. We could do it at the app
> layer, but if the app wasn't written with that, it is a lot of testing and
> customization specific to the framework.
> 
> We could theoretically do some sort of proxying of the java-driver somehow,
> but all the async structures and complex interfaces/apis would be difficult
> to proxy. Maybe there is a lower level in the java-driver that is possible.
> This also would only apply to the java-driver, and not
> python/go/javascript/other drivers.
> 
> Finally, I suppose we could do a trigger on the tables. It would be really
> nice if we could add to the cassandra toolbox the basics of a write
> mirroring trigger that could be activated "fairly easily"... now I know
> there are the complexities of inter-cluster access, and if we are even
> using cassandra as the target mirror system (for example there is an
> article on triggers write-mirroring to kafka:
> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> 
> And this starts to get into the complexities of hinted handoff as well. But
> fundamentally this seems something that would be a very nice feature
> (especially when you NEED it) to have in the core of cassandra.
> 
> Finally, is the mutation hook in triggers sufficient to track all incoming
> mutations (outside of "shudder" other triggers generating data)