
Re: Druid 0.12.2-rc1 vote


Well, it's never good if a WTH?! message actually gets logged. They are
usually meant to be things that should "never" happen. I am ok with holding
off 0.12.2-rc1 until this fix is in.

On Wed, Jul 11, 2018 at 1:04 PM Jihoon Son <jihoonson@xxxxxxxxxx> wrote:

> Thanks everyone for voting.
>
> Unfortunately, I found another bug in the Kafka indexing service
> (https://github.com/apache/incubator-druid/issues/5992). I think it's worth
> including in 0.12.2.
> I'm currently working on that issue and can probably finish it by the end of
> this week.
>
> Can we add it to 0.12.2 and vote again once a patch fixing it is merged?
>
> Jihoon
>
> On Wed, Jul 11, 2018 at 10:02 AM Jonathan Wei <jonwei@xxxxxxxxxx> wrote:
>
> > +1
> >
> > On Wed, Jul 11, 2018 at 9:44 AM, Gian Merlino <gian@xxxxxxxxxx> wrote:
> >
> > > +1 from me too!
> > >
> > > On Wed, Jul 11, 2018 at 7:28 AM Charles Allen <crallen@xxxxxxxxxx> wrote:
> > >
> > > > That is very helpful, thank you!
> > > >
> > > > +1 for continuing with 0.12.2-RC1
> > > >
> > > > On Tue, Jul 10, 2018 at 6:51 PM Clint Wylie <clint.wylie@xxxxxxxx> wrote:
> > > >
> > > > > Heya, sorry for the delay (and for missing the sync; I'll try to get
> > > > > better about showing up). I've fixed a handful of coordinator bugs since
> > > > > 0.12.0 (not backported to 0.12.1). Some of these issues go far back, some
> > > > > go back to when segment assignment priority for different tiers of
> > > > > historicals was introduced, and some are just oddities in the behavior of
> > > > > the balancer that I'm unsure when they were introduced. This is the
> > > > > complete list of fixes that are currently in 0.12.2 afaik, with a short
> > > > > description of each (see the PRs and associated issues for more details):
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5528 fixed an issue where
> > > > > segment movement did not drop the segment from the server it was being
> > > > > moved from (this one goes waaaay back, to batch segment announcements)
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5529 changed the behavior
> > > > > of drop to use the balancer to choose which server to drop segments from,
> > > > > based on behavior we observed as a result of the issue fixed in 5528
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5532 fixes an issue where
> > > > > primary assignment during load rule processing would assign an
> > > > > unavailable segment to every server with capacity until at least 1
> > > > > historical had the segment (and then drop it from all the others if they
> > > > > all loaded at the same time), choking load queues and keeping them from
> > > > > doing useful work
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5555 fixed a way for the
> > > > > HTTP-based coordinator to get stuck loading or dropping segments, and a
> > > > > companion PR fixed a lambda that wasn't friendly to older JVM versions:
> > > > > https://github.com/apache/incubator-druid/pull/5591
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5888 makes balancing honor
> > > > > the load rule max load queue depth setting, to help prevent movement from
> > > > > starving loading
> > > > >
> > > > > https://github.com/apache/incubator-druid/pull/5928 doesn't really fix
> > > > > anything; it just does an early return to avoid doing pointless work
> > > > >
> > > > > Additionally, there are a couple of pairs of PRs that are not currently
> > > > > in 0.12.2: https://github.com/druid-io/druid/pull/5927 and
> > > > > https://github.com/apache/incubator-druid/pull/5929, plus their
> > > > > respective fixes, which have yet to be merged but have been performing
> > > > > well on our test cluster: https://github.com/apache/incubator-druid/pull/5987
> > > > > and https://github.com/apache/incubator-druid/pull/5988. One of them makes
> > > > > balancing behave more consistently with expectations by always trying to
> > > > > move maxSegmentsToMove segments and by tracking more accurately what the
> > > > > balancer is doing; the other just adds better logging (without much extra
> > > > > log volume), prompted by the frustrations I had chasing down all these
> > > > > other issues. Both were slated for 0.12.2 but were pulled out because of
> > > > > the issues (which the open PRs fix, afaict). I would be in favor of
> > > > > sliding them in, pending review of the fixes, but I understand if they
> > > > > don't make the cut, since they arguably fall a bit more on the cosmetic
> > > > > side of things. I'm pretty happy with the state of things on our test
> > > > > cluster right now, but even without these 4 patches things should still
> > > > > be operating more correctly than they were before. The only differences
> > > > > would be balancing moving somewhere between 0 and the max, and less useful
> > > > > logging making future issues (which I have no doubt still lurk) harder
> > > > > to diagnose.
> > > > >
> > > > > Cheers,
> > > > > Clint
> > > > >
> > > > > On Tue, Jul 10, 2018 at 10:30 AM, Charles Allen <crallen@xxxxxxxxxx> wrote:
> > > > >
> > > > > > Brought this up in the dev sync:
> > > > > >
> > > > > > I saw a lot of PRs and fixes for coordinator segment balancing related
> > > > > > to some regressions that happened in 0.12.x. Is anyone able to give a
> > > > > > rundown of the state of coordinator segment management for the 0.12.2 RC?
> > > > > >
> > > > > > On Tue, Jul 10, 2018 at 10:26 AM Nishant Bangarwa <nbangarwa@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > --
> > > > > > > Nishant Bangarwa
> > > > > > >
> > > > > > > Hortonworks
> > > > > > >
> > > > > > > On 7/10/18, 3:57 AM, "Jihoon Son" <jihoonson@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >     Related thread:
> > > > > > >
> > > > > > >     https://lists.apache.org/thread.html/76755aecfddb1210fcc3f08b1d4631784a8a5eede64d22718c271841@%3Cdev.druid.apache.org%3E
> > > > > > >
> > > > > > >     Jihoon
> > > > > > >
> > > > > > >     On Mon, Jul 9, 2018 at 3:25 PM Jihoon Son <jihoonson@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >     > Hi all,
> > > > > > >     >
> > > > > > >     > We have no open issues or PRs for 0.12.2
> > > > > > >     > (https://github.com/apache/incubator-druid/milestone/27). The 0.12.2
> > > > > > >     > branch is already available, and all PRs for 0.12.2 have been
> > > > > > >     > merged into that branch.
> > > > > > >     >
> > > > > > >     > Let's vote on releasing RC1. Here is my +1.
> > > > > > >     >
> > > > > > >     > This is a non-ASF release.
> > > > > > >     >
> > > > > > >     > Best,
> > > > > > >     > Jihoon
> > > > > > >     >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>