
Re: Evolving the client protocol


If you donate a thread-per-core implementation to C*, I am sure someone can
help you review it and get it committed.

On Thu, Apr 19, 2018 at 11:15 AM, Ben Bromhead <ben@xxxxxxxxxxxxxxx> wrote:

> Re #3:
>
> Yup, I was thinking each shard/port would appear as a discrete server to
> the client.
>
> If the per-port suggestion is unacceptable due to hardware requirements,
> remembering that Cassandra is built around the concept of scaling
> *commodity* hardware horizontally, you'll have to spend your time and
> energy convincing the community to support a protocol feature it has no
> (current) use for, or find another interim solution.
>
> Another way would be to build support and consensus around a clear
> technical need in the Apache Cassandra project as it stands today.
>
> One way to build community support might be to contribute an
> Apache-licensed thread-per-core implementation in Java that matches the
> protocol change and shard concept you are looking for ;P
>
>
> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > So at a technical level I don't understand this yet.
> >
> > So you have a database consisting of single-threaded shards and an
> > accept socket that is generating TCP connections, and in advance you
> > don't know which connection is going to send messages to which shard.
> >
> > What is the mechanism by which you get the packets for a given TCP
> > connection delivered to a specific core? I know that a given TCP
> > connection will normally have all of its packets delivered to the same
> > queue from the NIC, because the tuple of source address + port and
> > destination address + port is typically hashed to pick one of the
> > queues the NIC presents. I might have the contents of the tuple
> > slightly wrong, but it always includes a component you don't get to
> > control.
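> >
> > Roughly, the selection works like this (a minimal sketch in Java; real
> > NICs use a Toeplitz hash with a configurable key, this is just to show
> > the shape of it -- one element of the tuple, the client's ephemeral
> > port, is the component outside your control):
> >
> >     // Pick an RX queue from the TCP connection 4-tuple, RSS-style.
> >     static int rxQueue(int srcIp, int srcPort, int dstIp, int dstPort,
> >                        int numQueues) {
> >         int h = srcIp;
> >         h = 31 * h + srcPort;  // ephemeral port, chosen by the client
> >         h = 31 * h + dstIp;
> >         h = 31 * h + dstPort;  // server port; per-shard ports change this
> >         return Math.floorMod(h, numQueues);
> >     }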
> >
> > Since it's hashing, how do you manipulate which queue the packets for a
> > TCP connection go to, and how is it made worse by having an accept
> > socket per shard?
> >
> > You also mention 160 ports as bad, but it doesn't sound like a big
> > number resource-wise. Is it an operational headache?
> >
> > RE tokens distributed amongst shards. The way that would work right now
> > is that each port number appears to be a discrete instance of the
> > server. So you could have shards be actual shards that are simply
> > colocated on the same box, run in the same process, and share
> > resources. I know this pushes more of the complexity into the server vs
> > the driver, as the server expects all shards to share some
> > client-visible state, like system tables and certain identifiers.
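> >
> > As a sketch of what that looks like from the client side today (the
> > addresses and ports here are hypothetical; with the 3.x Java driver,
> > each host:port pair is simply treated as a separate node):
> >
> >     import com.datastax.driver.core.Cluster;
> >     import java.net.InetSocketAddress;
> >     import java.util.Arrays;
> >
> >     // One process, three shards, three native-transport ports.
> >     Cluster cluster = Cluster.builder()
> >         .addContactPointsWithPorts(Arrays.asList(
> >             new InetSocketAddress("10.0.0.1", 9042),   // shard 0
> >             new InetSocketAddress("10.0.0.1", 9043),   // shard 1
> >             new InetSocketAddress("10.0.0.1", 9044)))  // shard 2
> >         .build();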
> >
> > Ariel
> > On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> > > Port-per-shard is likely the easiest option, but it's too ugly to
> > > contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
> > > IIRC); it would be just horrible to have 160 open ports.
> > >
> > >
> > > It also doesn't fit well with the NIC's ability to automatically
> > > distribute packets among cores using multiple queues, so the kernel
> > > would have to shuffle those packets around. Much better to have those
> > > packets delivered directly to the core that will service them.
> > >
> > >
> > > (also, some protocol changes are needed so the driver knows how tokens
> > > are distributed among shards)
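> > >
> > > (For concreteness, one hypothetical shape such a mapping could take,
> > > written in Java for illustration only -- slicing the token ring into
> > > equal per-shard intervals. The actual mapping and its parameters are
> > > exactly what the protocol change would need to communicate:)
> > >
> > >     // Hypothetical token -> shard mapping; not a concrete proposal.
> > >     static int shardOf(long murmur3Token, int nrShards) {
> > >         long u = murmur3Token ^ Long.MIN_VALUE;          // signed -> unsigned order
> > >         double frac = (u >>> 11) / (double) (1L << 53);  // ring position in [0, 1)
> > >         return (int) (frac * nrShards);
> > >     }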
> > >
> > > On 2018-04-19 19:46, Ben Bromhead wrote:
> > > > WRT #3:
> > > > To fit in the existing protocol, could you have each shard listen
> > > > on a different port? Drivers are likely going to support this due to
> > > > https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
> > > > super familiar with the ticket, so there might be something I'm
> > > > missing, but it sounds like a potential approach.
> > > >
> > > > This would give you a path forward at least for the short term.
> > > >
> > > >
> > > > On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ariel@xxxxxxxxxxx>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I think that updating the Cassandra protocol spec puts the onus on
> > > >> the party changing the protocol specification to have an
> > > >> implementation of the spec in Cassandra as well as in the Java and
> > > >> Python drivers (those are both used in the Cassandra repo). Until
> > > >> it's implemented in Cassandra we haven't fully evaluated the
> > > >> specification change. There is no substitute for trying to make it
> > > >> work.
> > > >>
> > > >> There are also realities to consider as to what the maintainers of
> > > >> the drivers are willing to commit.
> > > >>
> > > >> RE #1,
> > > >>
> > > >> I am +1 on the fact that we shouldn't require an extra hop for
> > > >> range scans.
> > > >>
> > > >> In JIRA, Jeremiah made the point that you can still do this from
> > > >> the client by breaking up the token ranges, but it's a leaky
> > > >> abstraction to have a paging interface that isn't a vanilla
> > > >> ResultSet interface. Serial vs. parallel is kind of orthogonal, as
> > > >> the driver can do either.
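> > > >>
> > > >> For reference, the client-side version Jeremiah describes looks
> > > >> roughly like this with the 3.x Java driver (a sketch, assuming a
> > > >> connected Cluster/Session and a hypothetical table ks.tbl with
> > > >> partition key pk):
> > > >>
> > > >>     PreparedStatement ps = session.prepare(
> > > >>         "SELECT * FROM ks.tbl WHERE token(pk) > :start AND token(pk) <= :end");
> > > >>     for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
> > > >>         for (TokenRange sub : range.unwrap()) {  // handle ring wraparound
> > > >>             ResultSet rs = session.execute(ps.bind()
> > > >>                 .setToken("start", sub.getStart())
> > > >>                 .setToken("end", sub.getEnd()));
> > > >>             // consume rs; sub-scans could also be issued in parallel
> > > >>         }
> > > >>     }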
> > > >>
> > > >> I agree it looks like the current specification doesn't make what
> > > >> should be simple as simple as it could be for driver implementers.
> > > >>
> > > >> RE #2,
> > > >>
> > > >> +1 on this change assuming an implementation in Cassandra and the
> > > >> Java and Python drivers.
> > > >>
> > > >> RE #3,
> > > >>
> > > >> It's hard to be +1 on this because we don't benefit by boxing
> > > >> ourselves in by defining a spec we haven't implemented, tested, and
> > > >> decided we are satisfied with. Having it in ScyllaDB de-risks it to
> > > >> a certain extent, but what if Cassandra decides to go a different
> > > >> direction in some way?
> > > >>
> > > >> I don't think there is much discussion to be had without an example
> > > >> of the changes to the CQL specification to look at, but even then,
> > > >> if it looks risky I am not likely to be in favor of it.
> > > >>
> > > >> Regards,
> > > >> Ariel
> > > >>
> > > >> On Thu, Apr 19, 2018, at 9:33 AM, glommer@xxxxxxxxxxxx wrote:
> > > >>>
> > > >>> On 2018/04/19 07:19:27, kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
> > > >>>>> 1. The protocol change is developed using the Cassandra process
> > > >>>>>    in a JIRA ticket, culminating in a patch to
> > > >>>>>    doc/native_protocol*.spec when consensus is achieved.
> > > >>>> I don't think forking would be desirable (for anyone), so this
> > > >>>> seems the most reasonable to me. For 1 and 2 it certainly makes
> > > >>>> sense, but I can't say I know enough about sharding to comment on
> > > >>>> 3 - it seems to me like it could be locking in a design before
> > > >>>> anyone truly knows what sharding in C* looks like. But hopefully
> > > >>>> I'm wrong and there are devs out there that have already thought
> > > >>>> that through.
> > > >>> Thanks. That is our view, and it is great to hear.
> > > >>>
> > > >>> About our proposal number 3: In my view, good protocol designs are
> > > >>> future-proof and flexible. We certainly don't want to propose a
> > > >>> design that works just for Scylla, but one that would support
> > > >>> reasonable implementations regardless of what they may look like.
> > > >>>
> > > >>>> Do we have driver authors who wish to support both projects?
> > > >>>>
> > > >>>> Surely, but I imagine it would be a minority.
> > > >>>>
> > > >
> > > > --
> > > > Ben Bromhead
> > > > CTO | Instaclustr <https://www.instaclustr.com/>
> > > > +1 650 284 9692
> > > > Reliability at Scale
> > > > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> > > >
> >
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>