
Re: Evolving the client protocol


Dor,

Setting the Thread Per Core code aside, will your developers commit to
contributing back both https://issues.apache.org/jira/browse/CASSANDRA-2848
and https://issues.apache.org/jira/browse/CASSANDRA-14311?

Looks like CASSANDRA-2848 has stalled even though some respectable work was
done, and CASSANDRA-14311 hasn't been started yet. Some material
contributions from your team in these two areas would be appreciated.

On Mon, Apr 23, 2018 at 6:17 PM, Dor Laor <dor@xxxxxxxxxxxx> wrote:

> On Mon, Apr 23, 2018 at 5:03 PM, Sankalp Kohli <kohlisankalp@xxxxxxxxx>
> wrote:
>
> > Is one of the “abuses” of the Apache license ScyllaDB, which is using
> > Cassandra but not contributing back?
> >
>
> It's not as if we have a private version of Cassandra and don't release
> all of it or some of it back.
>
> We didn't contribute because we have a different server base. We always
> contribute where it makes sense.
> I'll be happy to have several beers or emails about the pros and cons of
> open source licensing but I don't think
> this is the case. The discussion is about whether the community wishes to
> accept our contributions, and we initiated it,
> didn't we?
>
> Let's be practical. I think it's not reasonable to commit C* protocol
> changes that the community doesn't intend
> to implement in C* in the short term (thread-per-core like), and it's not
> reasonable to expect Scylla to contribute
> such a huge effort to the C* server. It is reasonable to collaborate around
> protocol enhancements that are acceptable,
> even without coding, and to make sure the protocol is enhanceable in a way
> that is forward compatible.
>
>
> Happy to be proved wrong as I am not a lawyer and don’t understand various
> > licenses ..
> >
> > > On Apr 23, 2018, at 16:55, Dor Laor <dor@xxxxxxxxxxxx> wrote:
> > >
> > >> On Mon, Apr 23, 2018 at 4:13 PM, Jonathan Haddad <jon@xxxxxxxxxxxxx>
> > wrote:
> > >>
> > >> From where I stand it looks like you've got only two options for any
> > >> feature that involves updating the protocol:
> > >>
> > >> 1. Don't build the feature
> > >> 2. Build it in Cassandra & ScyllaDB, update the drivers accordingly
> > >>
> > >> I don't think you have a third option, which is built it only in
> > ScyllaDB,
> > >> because that means you have to fork *all* the drivers and make it
> work,
> > >> then maintain them.  Your business model appears to be built on not
> > doing
> > >> any of the driver work yourself, and you certainly aren't giving back
> to
> > >> the open source community via a permissive license on ScyllaDB itself,
> > so
> > >> I'm a bit lost here.
> > >>
> > >
> > > It's totally not about business model.
> > > Scylla itself is 99% open source with AGPL license that prevents abuse
> > and
> > > forces changes to be contributed back to the project. We also have our core
> engine
> > > (seastar) licensed
> > > as Apache since it needs to be integrated with the core application.
> > > Recently one of our community members even created a new Seastar based,
> > C++
> > > driver.
> > >
> > > Scylla chose to be compatible with the drivers in order to leverage the
> > > existing infrastructure
> > > and (let's be frank) in order to allow smooth migration.
> > > We would have loved to contribute more to the drivers but up to
> recently
> > we:
> > > 1. Were busy on top of our heads with the server
> > > 2. Happy w/ the existing drivers
> > > 3. Developed extensions - GoCQLX - our own contribution
> > >
> > > Finally we can contribute back to the same driver project, we want to
> do
> > it
> > > the right way,
> > > without forking and without duplicated efforts.
> > >
> > > Many times, having a private fork is way easier than proper open source
> > > work, so from
> > > a pure business perspective, we don't select the shortest path.
> > >
> > >
> > >>
> > >> To me it looks like you're asking a bunch of volunteers that work on
> > >> Cassandra to accommodate you.  What exactly do we get out of this
> > >> relationship?  What incentive do I or anyone else have to spend time
> > >> helping you instead of working on something that interests me?
> > >>
> > >
> > > Jon, this is certainly not the case.
> > > We genuinely wish to make true *open source* work on:
> > > a. Cassandra drivers
> > > b. Client protocol
> > > c. Scylla server side.
> > > d. Cassandra community related work: mailing list, Jira, design
> > >
> > > But not
> > > e. Cassandra server side
> > >
> > > While I wouldn't mind doing the Cassandra server work, we don't have
> the
> > > resources or
> > > the expertise. The Cassandra _developer_ community is welcome to decide
> > > whether
> > > we get to contribute a/b/c/d. Avi has enumerated the options of
> > > cooperation, passive cooperation
> > > and zero cooperation (below).
> > >
> > > 1. The protocol change is developed using the Cassandra process in a
> JIRA
> > > ticket, culminating in a patch to doc/native_protocol*.spec when
> > consensus
> > > is achieved.
> > > 2. The protocol change is developed outside the Cassandra process.
> > > 3. No cooperation.
> > >
> > > Look, I can understand the hostility and suspicion; however, from the
> C*
> > > project POV, it makes no
> > > sense to ignore us, otherwise we'll fork the drivers and you won't get
> > > anything back. There is
> > > at least one other vendor today with their server fork and driver fork, and it
> > > makes sense to keep the protocol
> > > unified in an extensible way and to discuss new features _together_.
> > >
> > >
> > >
> > >>
> > >> Jon
> > >>
> > >>
> > >> On Mon, Apr 23, 2018 at 7:59 AM Ben Bromhead <ben@xxxxxxxxxxxxxxx>
> > wrote:
> > >>
> > >>>>>> This doesn't work without additional changes, for RF>1. The token
> > >> ring
> > >>>> could place two replicas of the same token range on the same
> physical
> > >>>> server, even though those are two separate cores of the same server.
> > >> You
> > >>>> could add another element to the hierarchy (cluster -> datacenter ->
> > >> rack
> > >>>> -> node -> core/shard), but that generates unneeded range movements
> > >> when
> > >>> a
> > >>>> node is added.
> > >>>>> I have seen rack awareness used/abused to solve this.
> > >>>>>
> > >>>>
> > >>>> But then you lose real rack awareness. It's fine for a quick hack,
> but
> > >>>> not a long-term solution.
> > >>>>
> > >>>> (it also creates a lot more tokens, something nobody needs)
> > >>>>
> > >>>
> > >>> I'm having trouble understanding how you lose "real" rack awareness,
> > as
> > >>> these shards are in the same rack anyway, because the address and
> port
> > >> are
> > >>> on the same server in the same rack. So it behaves as expected. Could
> > you
> > >>> explain a situation where the shards on a single server would be in
> > >>> different racks (or fault domains)?
> > >>>
> > >>> If you wanted to support a situation where you have a single rack per
> > DC
> > >>> for simple deployments, extending NetworkTopologyStrategy to behave
> the
> > >> way
> > >>> it did before https://issues.apache.org/jira/browse/CASSANDRA-7544
> > with
> > >>> respect to treating InetAddresses as servers rather than the address
> > and
> > >>> port would be simple. Both this implementation in Apache Cassandra
> and
> > >> the
> > >>> respective load balancing classes in the drivers are explicitly
> > designed
> > >> to
> > >>> be pluggable so that would be an easier integration point for you.
> > >>>
> > >>> I'm not sure how it creates more tokens? If a server normally owns
> 256
> > >>> tokens, each shard on a different port would just advertise ownership
> > of
> > >>> 256/# of cores (e.g. 4 tokens if you had 64 cores).
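[Editor's note: Ben's arithmetic can be sketched directly. The helper below is hypothetical, not driver or server code; it just illustrates dividing a node's vnode tokens evenly across per-port shards.]

```python
# Hypothetical illustration: a node owning 256 vnode tokens splits them
# across 64 shards, each shard advertising its slice on its own port.
def split_tokens_among_shards(tokens, num_shards):
    """Round-robin a node's sorted token list into one slice per shard."""
    tokens = sorted(tokens)
    return [tokens[i::num_shards] for i in range(num_shards)]

node_tokens = range(256)                       # stand-in for 256 vnode tokens
shard_slices = split_tokens_among_shards(node_tokens, 64)

assert len(shard_slices) == 64
assert all(len(s) == 4 for s in shard_slices)  # 256 / 64 = 4 tokens per shard
```

Note the total token count on the box stays at 256; only the ownership is subdivided.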
> > >>>
> > >>>
> > >>>>
> > >>>>> Regards,
> > >>>>> Ariel
> > >>>>>
> > >>>>>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <avi@xxxxxxxxxxxx> wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> On 2018-04-19 21:15, Ben Bromhead wrote:
> > >>>>>>> Re #3:
> > >>>>>>>
> > >>>>>>> Yup I was thinking each shard/port would appear as a discrete
> > >> server
> > >>>> to the
> > >>>>>>> client.
> > >>>>>> This doesn't work without additional changes, for RF>1. The token
> > >> ring
> > >>>> could place two replicas of the same token range on the same
> physical
> > >>>> server, even though those are two separate cores of the same server.
> > >> You
> > >>>> could add another element to the hierarchy (cluster -> datacenter ->
> > >> rack
> > >>>> -> node -> core/shard), but that generates unneeded range movements
> > >> when
> > >>> a
> > >>>> node is added.
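[Editor's note: the RF>1 hazard Avi describes can be shown with a toy sketch. The ring ordering and endpoints are hard-coded for illustration; this is not Cassandra's actual placement code.]

```python
# Toy ring in which every shard registers as its own (host, port) endpoint,
# and two shards of host 10.0.0.1 happen to sit adjacent on the ring.
ring = [("10.0.0.1", 9042), ("10.0.0.1", 9043),   # two shards of host A
        ("10.0.0.2", 9042), ("10.0.0.2", 9043)]   # two shards of host B

def naive_replicas(ring, rf):
    """Shard-blind placement: take the next rf endpoints clockwise."""
    return ring[:rf]

def host_aware_replicas(ring, rf):
    """Deduplicate on host so all shards of one machine count as one node."""
    chosen, seen_hosts = [], set()
    for host, port in ring:
        if host not in seen_hosts:
            seen_hosts.add(host)
            chosen.append((host, port))
        if len(chosen) == rf:
            break
    return chosen

# The naive walk puts both replicas on host A -- one machine failure
# loses both copies; the host-aware walk spreads them across machines.
assert {h for h, _ in naive_replicas(ring, 2)} == {"10.0.0.1"}
assert {h for h, _ in host_aware_replicas(ring, 2)} == {"10.0.0.1", "10.0.0.2"}
```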
> > >>>>>>
> > >>>>>>> If the per port suggestion is unacceptable due to hardware
> > >>>> requirements,
> > >>>>>>> remembering that Cassandra is built with the concept of scaling
> > >>>> *commodity*
> > >>>>>>> hardware horizontally, you'll have to spend your time and energy
> > >>>> convincing
> > >>>>>>> the community to support a protocol feature it has no (current)
> use
> > >>>> for or
> > >>>>>>> find another interim solution.
> > >>>>>> Those servers are commodity servers (not x86, but still
> commodity).
> > >> In
> > >>>> any case 60+ logical cores are common now (hello AWS i3.16xlarge or
> > >> even
> > >>>> i3.metal), and we can only expect logical core count to continue to
> > >>>> increase (there are 48-core ARM processors now).
> > >>>>>>
> > >>>>>>> Another way, would be to build support and consensus around a
> clear
> > >>>>>>> technical need in the Apache Cassandra project as it stands
> today.
> > >>>>>>>
> > >>>>>>> One way to build community support might be to contribute an
> Apache
> > >>>>>>> licensed thread per core implementation in Java that matches the
> > >>>> protocol
> > >>>>>>> change and shard concept you are looking for ;P
> > >>>>>> I doubt I'll survive the egregious top-posting that is going on in
> > >>> this
> > >>>> list.
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <
> ariel@xxxxxxxxxxx
> > >>>
> > >>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> So at a technical level I don't understand this yet.
> > >>>>>>>>
> > >>>>>>>> So you have a database consisting of single threaded shards and
> a
> > >>>> socket
> > >>>>>>>> for accept that is generating TCP connections and in advance you
> > >>>> don't know
> > >>>>>>>> which connection is going to send messages to which shard.
> > >>>>>>>>
> > >>>>>>>> What is the mechanism by which you get the packets for a given
> TCP
> > >>>>>>>> connection delivered to a specific core? I know that a given TCP
> > >>>> connection
> > >>>>>>>> will normally have all of its packets delivered to the same
> queue
> > >>>> from the
> > >>>>>>>> NIC because the tuple of source address + port and destination
> > >>>> address +
> > >>>>>>>> port is typically hashed to pick one of the queues the NIC
> > >>> presents. I
> > >>>>>>>> might have the contents of the tuple slightly wrong, but it
> always
> > >>>> includes
> > >>>>>>>> a component you don't get to control.
> > >>>>>>>>
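[Editor's note: the queue-selection mechanism Ariel describes can be sketched as follows. Real NICs use a Toeplitz hash with a configurable key; `crc32` is a stand-in here purely to illustrate his point that the hash input includes the client's source address and ephemeral port, which the server does not control.]

```python
import struct
import zlib

def rss_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """RSS-style steering: hash the TCP 4-tuple to pick a receive queue."""
    key = struct.pack(">IHIH", src_ip, src_port, dst_ip, dst_port)
    return zlib.crc32(key) % num_queues

# Two connections to the same server endpoint (10.0.0.2:9042) that differ
# only in the client's ephemeral source port may land on different queues,
# and hence different cores -- the server can't steer them by choice of
# listening socket alone.
q1 = rss_queue(0x0A000001, 50001, 0x0A000002, 9042, 64)
q2 = rss_queue(0x0A000001, 50002, 0x0A000002, 9042, 64)
assert 0 <= q1 < 64 and 0 <= q2 < 64
```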
> > >>>>>>>> Since it's hashing how do you manipulate which queue packets
> for a
> > >>> TCP
> > >>>>>>>> connection go to and how is it made worse by having an accept
> > >> socket
> > >>>> per
> > >>>>>>>> shard?
> > >>>>>>>>
> > >>>>>>>> You also mention 160 ports as bad, but it doesn't sound like a
> big
> > >>>> number
> > >>>>>>>> resource wise. Is it an operational headache?
> > >>>>>>>>
> > >>>>>>>> RE tokens distributed amongst shards. The way that would work
> > >> right
> > >>>> now is
> > >>>>>>>> that each port number appears to be a discrete instance of the
> > >>>> server. So
> > >>>>>>>> you could have shards be actual shards that are simply colocated
> > >> on
> > >>>> the
> > >>>>>>>> same box, run in the same process, and share resources. I know
> > >> this
> > >>>> pushes
> > >>>>>>>> more of the complexity into the server vs the driver as the
> server
> > >>>> expects
> > >>>>>>>> all shards to share some client-visible state like system tables and
> > >>> certain
> > >>>>>>>> identifiers.
> > >>>>>>>>
> > >>>>>>>> Ariel
> > >>>>>>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> > >>>>>>>>> Port-per-shard is likely the easiest option but it's too ugly
> to
> > >>>>>>>>> contemplate. We run on machines with 160 shards (IBM POWER
> > >>> 2s20c160t
> > >>>>>>>>> IIRC), it will be just horrible to have 160 open ports.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> It also doesn't fit well with the NIC's ability to automatically
> > >>>>>>>>> distribute packets among cores using multiple queues, so the
> > >> kernel
> > >>>>>>>>> would have to shuffle those packets around. Much better to have
> > >>> those
> > >>>>>>>>> packets delivered directly to the core that will service them.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> (also, some protocol changes are needed so the driver knows how
> > >>>> tokens
> > >>>>>>>>> are distributed among shards)
> > >>>>>>>>>
> > >>>>>>>>>> On 2018-04-19 19:46, Ben Bromhead wrote:
> > >>>>>>>>>> WRT to #3
> > >>>>>>>>>> To fit in the existing protocol, could you have each shard
> > >> listen
> > >>>> on a
> > >>>>>>>>>> different port? Drivers are likely going to support this due
> to
> > >>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > >>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11596).  I'm
> > >> not
> > >>>> super
> > >>>>>>>>>> familiar with the ticket so there might be something I'm
> missing
> > >>>> but it
> > >>>>>>>>>> sounds like a potential approach.
> > >>>>>>>>>>
> > >>>>>>>>>> This would give you a path forward at least for the short
> term.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <
> > >>> ariel@xxxxxxxxxxx>
> > >>>>>>>> wrote:
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think that updating the protocol spec to Cassandra puts the
> > >>> onus
> > >>>> on
> > >>>>>>>> the
> > >>>>>>>>>>> party changing the protocol specification to have an
> > >>> implementation
> > >>>>>>>> of the
> > >>>>>>>>>>> spec in Cassandra as well as the Java and Python driver
> (those
> > >>> are
> > >>>>>>>> both
> > >>>>>>>>>>> used in the Cassandra repo). Until it's implemented in
> > >> Cassandra
> > >>> we
> > >>>>>>>> haven't
> > >>>>>>>>>>> fully evaluated the specification change. There is no
> > >> substitute
> > >>>> for
> > >>>>>>>> trying
> > >>>>>>>>>>> to make it work.
> > >>>>>>>>>>>
> > >>>>>>>>>>> There are also realities to consider as to what the
> maintainers
> > >>> of
> > >>>> the
> > >>>>>>>>>>> drivers are willing to commit.
> > >>>>>>>>>>>
> > >>>>>>>>>>> RE #1,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I am +1 on the fact that we shouldn't require an extra hop
> for
> > >>>> range
> > >>>>>>>> scans.
> > >>>>>>>>>>> In JIRA Jeremiah made the point that you can still do this
> from
> > >>> the
> > >>>>>>>> client
> > >>>>>>>>>>> by breaking up the token ranges, but it's a leaky abstraction
> > >> to
> > >>>> have
> > >>>>>>>> a
> > >>>>>>>>>>> paging interface that isn't a vanilla ResultSet interface.
> > >> Serial
> > >>>> vs.
> > >>>>>>>>>>> parallel is kind of orthogonal as the driver can do either.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I agree it looks like the current specification doesn't make
> > >> what
> > >>>>>>>> should
> > >>>>>>>>>>> be simple as simple as it could be for driver implementers.
> > >>>>>>>>>>>
> > >>>>>>>>>>> RE #2,
> > >>>>>>>>>>>
> > >>>>>>>>>>> +1 on this change assuming an implementation in Cassandra and
> > >> the
> > >>>>>>>> Java and
> > >>>>>>>>>>> Python drivers.
> > >>>>>>>>>>>
> > >>>>>>>>>>> RE #3,
> > >>>>>>>>>>>
> > >>>>>>>>>>> It's hard to be +1 on this because we don't benefit by boxing
> > >>>>>>>> ourselves in
> > >>>>>>>>>>> by defining a spec we haven't implemented, tested, and
> decided
> > >> we
> > >>>> are
> > >>>>>>>>>>> satisfied with. Having it in ScyllaDB de-risks it to a
> certain
> > >>>>>>>> extent, but
> > >>>>>>>>>>> what if Cassandra decides to go a different direction in some
> > >>> way?
> > >>>>>>>>>>>
> > >>>>>>>>>>> I don't think there is much discussion to be had without an
> > >>> example
> > >>>>>>>> of the
> > >>>>>>>>>>> changes to the CQL specification to look at, but even
> then
> > >> if
> > >>>> it
> > >>>>>>>> looks
> > >>>>>>>>>>> risky I am not likely to be in favor of it.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Ariel
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Apr 19, 2018, at 9:33 AM, glommer@xxxxxxxxxxxx
> wrote:
> > >>>>>>>>>>>> On 2018/04/19 07:19:27, kurt greaves <kurt@xxxxxxxxxxxxxxx>
> > >>>> wrote:
> > >>>>>>>>>>>>>> 1. The protocol change is developed using the Cassandra
> > >>> process
> > >>>> in
> > >>>>>>>>>>>>>>     a JIRA ticket, culminating in a patch to
> > >>>>>>>>>>>>>>     doc/native_protocol*.spec when consensus is achieved.
> > >>>>>>>>>>>>> I don't think forking would be desirable (for anyone) so
> this
> > >>>> seems
> > >>>>>>>>>>>>> the most reasonable to me. For 1 and 2 it certainly makes
> > >> sense
> > >>>> but
> > >>>>>>>>>>>>> can't say I know enough about sharding to comment on 3 -
> > >> seems
> > >>>> to me
> > >>>>>>>>>>>>> like it could be locking in a design before anyone truly
> > >> knows
> > >>>> what
> > >>>>>>>>>>>>> sharding in C* looks like. But hopefully I'm wrong and
> there
> > >>> are
> > >>>>>>>>>>>>> devs out there that have already thought that through.
> > >>>>>>>>>>>> Thanks. That is our view and is great to hear.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> About our proposal number 3: In my view, good protocol
> designs
> > >>> are
> > >>>>>>>>>>>> future proof and flexible. We certainly don't want to
> propose
> > >> a
> > >>>>>>>> design
> > >>>>>>>>>>>> that works just for Scylla, but would support reasonable
> > >>>>>>>>>>>> implementations regardless of how they may look.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Do we have driver authors who wish to support both
> projects?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Surely, but I imagine it would be a minority.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >>>>>>>>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>> Ben Bromhead
> > >>>>>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
> > >>>>>>>>>> +1 650 284 9692
> > >>>>>>>>>> Reliability at Scale
> > >>>>>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and
> Softlayer
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
> >
>