
Re: Built in trigger: double-write for app migration


I might be missing something, but we’ve done this operation on a few
occasions as follows:
1) Commission the new cluster and join it to the existing cluster as a 2nd
DC
2) Replicate just the keyspace that you want to move to the 2nd DC
3) Make app changes to read the moved tables from the 2nd DC
4) Change the keyspace definition so the moved keyspace is no longer
replicated to the first DC
5) Split the two DCs into separate clusters (sever network connections,
change seeds)

If it’s just a table you’re moving and not a whole keyspace, then you can
skip step 4 and drop the unneeded tables from either side after splitting.
This might mean the new cluster needs to be temporarily bigger than its
end state during the migration process.
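
Steps 2 and 4 above both come down to changing the keyspace's replication
settings. A minimal sketch of the CQL involved, assuming the keyspace uses
NetworkTopologyStrategy; the keyspace name, DC names and replication factors
below are hypothetical placeholders, and the helper only builds the statement
text rather than executing it:

```python
def alter_replication_cql(keyspace, dc_replicas):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    dc_replicas maps a data center name to its replication factor.
    """
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replicas.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical names: "app_ks" is the keyspace being moved,
# "dc_old" is the existing DC and "dc_new" is the newly joined one.
step2 = alter_replication_cql("app_ks", {"dc_old": 3, "dc_new": 3})
step4 = alter_replication_cql("app_ks", {"dc_new": 3})

print(step2)  # step 2: replicate the keyspace to both DCs
print(step4)  # step 4: stop replicating it to the first DC
```

After a change like the step 2 statement, existing data is usually streamed
to the new DC by running `nodetool rebuild` on each new node with the old DC
named as the source.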

Cheers
Ben

On Fri, 19 Oct 2018 at 07:04 Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:

> Could be done with CDC
> Could be done with triggers
> (Could be done with vtables — double writes or double reads — if they were
> extended to be user facing)
>
> Would be very hard to generalize properly, especially handling failure
> cases (write succeeds in one cluster/table but not the other) which are
> often app specific
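
To make the failure case above concrete (a write succeeds in one cluster but
not the other), here is a minimal app-level sketch. Everything in it is an
assumption for illustration: the two "clusters" are plain dicts, `DualWriter`
is a hypothetical helper, and the chosen policy (fail the request if the
primary fails, record and retry later if only the mirror fails) is just one
of several an application might pick.

```python
class DualWriter:
    """Mirror each write to two targets and remember mirror failures."""

    def __init__(self, primary, mirror):
        self.primary = primary      # callable(key, value) for the old cluster
        self.mirror = mirror        # callable(key, value) for the new cluster
        self.failed_mirror_writes = []

    def write(self, key, value):
        # A primary failure propagates to the caller, as it would today.
        self.primary(key, value)
        try:
            self.mirror(key, value)
        except Exception as exc:
            # The primary succeeded but the mirror did not: record for replay.
            self.failed_mirror_writes.append((key, value, repr(exc)))
            return False
        return True

    def replay_failed(self):
        """Retry recorded mirror writes; return how many are still failing."""
        pending, self.failed_mirror_writes = self.failed_mirror_writes, []
        for key, value, _ in pending:
            try:
                self.mirror(key, value)
            except Exception as exc:
                self.failed_mirror_writes.append((key, value, repr(exc)))
        return len(self.failed_mirror_writes)


# Stand-ins for the two clusters.
old_cluster, new_cluster = {}, {}
mirror_up = {"ok": False}

def write_old(key, value):
    old_cluster[key] = value

def write_new(key, value):
    if not mirror_up["ok"]:
        raise ConnectionError("new cluster unreachable")
    new_cluster[key] = value

writer = DualWriter(write_old, write_new)
first_ok = writer.write("user:1", "alice")   # mirror is down, write recorded
mirror_up["ok"] = True
second_ok = writer.write("user:2", "bob")    # both writes succeed
remaining = writer.replay_failed()           # catches up the missed write
```

Note that a naive replay like this can overwrite a newer value with an older
one unless the original write timestamp is carried along, which is part of
why the right handling tends to be app specific.
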
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 18, 2018, at 6:47 PM, Jonathan Ellis <jbellis@xxxxxxxxx> wrote:
> >
> > Isn't this what CDC was designed for?
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-8844
> >
> > On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
> > <carl.mueller@xxxxxxxxxxxxxxx.invalid> wrote:
> >
> >> tl;dr: a generic trigger on TABLES that will mirror all writes to
> >> facilitate data migrations between clusters or systems. What is necessary
> >> to ensure full write mirroring/coherency?
> >>
> >> When cassandra clusters have several "apps" aka keyspaces serving
> >> applications colocated on them, but the app/keyspace bandwidth and size
> >> demands begin impacting other keyspaces/apps, then one strategy is to
> >> migrate the keyspace to its own dedicated cluster.
> >>
> >> With backups/sstableloading, this will entail a delay and therefore a
> >> "coherency" shortfall between the clusters. So typically one would employ
> >> a "double write, read once":
> >>
> >> - all updates are mirrored to both clusters
> >> - reads come from the current most coherent cluster.
> >>
> >> Often two sstable loads are done:
> >>
> >> 1) first load
> >> 2) turn on double writes/write mirroring
> >> 3) a second load is done to finalize coherency
> >> 4) switch the app to point to the new cluster now that it is coherent
> >>
> >> The double writes and read is the sticking point. We could do it at the
> >> app layer, but if the app wasn't written with that, it is a lot of
> >> testing and customization specific to the framework.
> >>
> >> We could theoretically do some sort of proxying of the java-driver
> >> somehow, but all the async structures and complex interfaces/apis would
> >> be difficult to proxy. Maybe there is a lower level in the java-driver
> >> that is possible. This also would only apply to the java-driver, and not
> >> python/go/javascript/other drivers.
> >>
> >> Finally, I suppose we could do a trigger on the tables. It would be
> >> really nice if we could add to the cassandra toolbox the basics of a
> >> write mirroring trigger that could be activated "fairly easily"... now I
> >> know there are the complexities of inter-cluster access, and if we are
> >> even using cassandra as the target mirror system (for example there is an
> >> article on triggers write-mirroring to kafka:
> >> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> >>
> >> And this starts to get into the complexities of hinted handoff as well.
> >> But fundamentally this seems something that would be a very nice feature
> >> (especially when you NEED it) to have in the core of cassandra.
> >>
> >> Finally, is the mutation hook in triggers sufficient to track all
> >> incoming mutations (outside of "shudder" other triggers generating data)?
> >>
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
>


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*
