
Re: Druid 0.12.2-rc1 vote


Heya, sorry for the delay (and for missing the sync; I'll try to get better
about showing up). I've fixed a handful of coordinator bugs since 0.12.0
(not backported to 0.12.1). Some of these issues go far back, some go back
to when segment assignment priority for different tiers of historicals was
introduced, and some are just oddities in the balancer's behavior that I'm
not sure when they were introduced. This is the complete list of fixes
currently in 0.12.2 afaik, each with a short description (see the PRs and
associated issues for more details):

https://github.com/apache/incubator-druid/pull/5528 fixed an issue where
segment movement did not drop the segment from the server it was being
moved from (this one goes waaaay back, to batch segment announcements)
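To make the move invariant concrete, here is a minimal sketch in Java
(Druid's language) using a hypothetical, heavily simplified model:
`MoveServer`, `SegmentMover.move`, and the synchronous `loadThen` callback
are illustrative names, not the real coordinator classes. The only point is
that a move is a load on the destination followed by a drop on the source
once that load completes:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a historical; not Druid's real class.
class MoveServer {
  final String name;
  final Set<String> segments = new HashSet<>();

  MoveServer(String name) {
    this.name = name;
  }
}

class SegmentMover {
  // A move = load on the destination, then drop on the source after the
  // load succeeds. The bug described above was the second half not happening.
  static void move(String segmentId, MoveServer from, MoveServer to) {
    if (!from.segments.contains(segmentId) || to.segments.contains(segmentId)) {
      return; // nothing to move, or the destination already serves it
    }
    loadThen(to, segmentId, () -> from.segments.remove(segmentId));
  }

  // Stand-in for an asynchronous load-queue callback: in this sketch the
  // load completes immediately and then runs the follow-up action.
  static void loadThen(MoveServer server, String segmentId, Runnable onLoaded) {
    server.segments.add(segmentId);
    onLoaded.run();
  }
}
```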

https://github.com/apache/incubator-druid/pull/5529 changed drop behavior
to use the balancer to choose which server to drop segments from, based on
behavior we observed as a result of the issue fixed in 5528
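A rough sketch of balancer-driven drop selection, assuming a hypothetical
cost function (how many segments of the same datasource the server already
holds) in place of Druid's actual cost-based balancer; `DropPicker`,
`cost`, and `pickServerToDrop` are illustrative names only. The idea is to
drop the replica where keeping it is most "expensive" rather than from an
arbitrary server:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class DropPicker {
  // Hypothetical cost: number of segments of the same datasource already on
  // the server. Druid's real cost balancer uses a different function.
  static int cost(Map<String, Integer> segmentsPerDatasource, String datasource) {
    return segmentsPerDatasource.getOrDefault(datasource, 0);
  }

  // Among the servers holding the segment, pick the highest-cost one to drop from.
  static Optional<String> pickServerToDrop(
      Map<String, Map<String, Integer>> serverLoads,
      List<String> holders,
      String datasource) {
    return holders.stream()
        .max(Comparator.comparingInt(
            s -> cost(serverLoads.getOrDefault(s, Collections.emptyMap()), datasource)));
  }
}
```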

https://github.com/apache/incubator-druid/pull/5532 fixes an issue where
primary assignment during load rule processing would assign an unavailable
segment to every server with capacity until at least one historical had the
segment (and then drop it from all the others if they all loaded at the
same time), choking load queues and keeping them from doing useful work
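A minimal sketch of the intended primary-assignment behavior, under a
hypothetical model where each candidate exposes its free bytes and whether
it is already loading or serving the segment. Picking the first server with
room stands in for the balancer's real choice, and `PrimaryAssigner` and
`Candidate` are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

class PrimaryAssigner {
  // Hypothetical view of a historical for one unavailable segment.
  static final class Candidate {
    final String name;
    final long freeBytes;
    boolean hasOrLoading;

    Candidate(String name, long freeBytes) {
      this.name = name;
      this.freeBytes = freeBytes;
    }
  }

  // Assign an unavailable segment to at most one server per run and return
  // the names of the servers that were asked to load it.
  static List<String> assignPrimary(List<Candidate> candidates, long segmentBytes) {
    List<String> assigned = new ArrayList<>();
    for (Candidate c : candidates) {
      if (c.hasOrLoading) {
        return assigned; // a primary is already loading or serving it
      }
    }
    for (Candidate c : candidates) {
      if (c.freeBytes >= segmentBytes) {
        c.hasOrLoading = true;
        assigned.add(c.name);
        break; // the buggy behavior kept going, queueing it on every server with room
      }
    }
    return assigned;
  }
}
```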

https://github.com/apache/incubator-druid/pull/5555 fixed a way for the
HTTP-based coordinator to get stuck loading or dropping segments, and a
companion PR, https://github.com/apache/incubator-druid/pull/5591, fixed a
lambda that wasn't friendly to older JVM versions

https://github.com/apache/incubator-druid/pull/5888 makes balancing honor
the max load queue depth setting used by load rules, to help prevent
segment movement from starving loading
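A sketch of how a queue-depth limit can gate balancer destinations. The
`maxQueueDepth` parameter, the `Dest` type, the choose-most-free-space rule,
and treating a non-positive limit as "no limit" are all assumptions for
illustration; the real setting's name and semantics may differ:

```java
import java.util.List;
import java.util.Optional;

class BalancerDestination {
  // Hypothetical view of a possible move target.
  static final class Dest {
    final String name;
    final int queuedLoads;
    final long freeBytes;

    Dest(String name, int queuedLoads, long freeBytes) {
      this.name = name;
      this.queuedLoads = queuedLoads;
      this.freeBytes = freeBytes;
    }
  }

  // Only servers whose load queue is below the limit are eligible, so moves
  // cannot fill queues that loading unavailable segments also needs.
  // In this sketch, maxQueueDepth <= 0 means "no limit".
  static Optional<Dest> pickDestination(List<Dest> servers, int maxQueueDepth, long segmentBytes) {
    Dest best = null;
    for (Dest d : servers) {
      boolean queueOk = maxQueueDepth <= 0 || d.queuedLoads < maxQueueDepth;
      boolean fits = d.freeBytes >= segmentBytes;
      if (queueOk && fits && (best == null || d.freeBytes > best.freeBytes)) {
        best = d; // most free space stands in for the balancer's real cost
      }
    }
    return Optional.ofNullable(best);
  }
}
```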

https://github.com/apache/incubator-druid/pull/5928 doesn't really fix
anything; it just does an early return to avoid doing pointless work

Additionally, there are a couple of pairs of PRs that are not currently in
0.12.2: https://github.com/druid-io/druid/pull/5927 and
https://github.com/apache/incubator-druid/pull/5929, plus their respective
fixes, https://github.com/apache/incubator-druid/pull/5987 and
https://github.com/apache/incubator-druid/pull/5988, which have yet to be
merged but have been performing well on our test cluster. One pair makes
balancing behave more consistently with expectations by always trying to
move maxSegmentsToMove segments and by tracking more accurately what the
balancer is doing; the other just adds better logging (without much extra
log volume), prompted by the frustrations I had chasing down all these
other issues. Both were slated for 0.12.2 but were pulled out because of
the issues that the open PRs fix (afaict). I would be in favor of sliding
them in, pending review of the fixes, but I understand if they don't make
the cut, since they arguably fall more on the cosmetic side of things. I'm
pretty happy with the state of things on our test cluster right now, but
even without these 4 patches things should still operate more correctly
than they did before. The only differences are that balancing will move
somewhere between 0 and maxSegmentsToMove segments per run, and that less
useful logging will make future issues (which I have no doubt still lurk)
harder to diagnose.
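A sketch of the "always try to move maxSegmentsToMove" idea, assuming a
hypothetical `tryMove` predicate that reports whether a move was actually
queued; `BalanceLoop` and `Stats` are illustrative names, and this is not
the code in the open PRs. It keeps going until the target number of moves
is reached or candidates run out, and counts moves and skips separately so
the run can report what the balancer really did:

```java
import java.util.List;
import java.util.function.Predicate;

class BalanceLoop {
  // Counters so a run can log what the balancer actually did.
  static final class Stats {
    int moved;
    int skipped;
  }

  // Stop only when maxSegmentsToMove moves have been queued or the
  // candidates are exhausted; attempts that don't result in a move are
  // counted as skips, not as moves.
  static Stats balance(List<String> candidates, int maxSegmentsToMove, Predicate<String> tryMove) {
    Stats stats = new Stats();
    for (String segment : candidates) {
      if (stats.moved >= maxSegmentsToMove) {
        break;
      }
      if (tryMove.test(segment)) {
        stats.moved++;
      } else {
        stats.skipped++;
      }
    }
    return stats;
  }
}
```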

Cheers,
Clint

On Tue, Jul 10, 2018 at 10:30 AM, Charles Allen <crallen@xxxxxxxxxx> wrote:

> Brought this up in the dev sync:
>
> I saw a lot of PRs and fixes for Coordinator segment balancing related to
> some regressions that happened in 0.12.x. Is anyone able to give a rundown
> of the state of coordinator segment management for the 0.12.2 RC?
>
> On Tue, Jul 10, 2018 at 10:26 AM Nishant Bangarwa <nbangarwa@xxxxxxxxxxxxxxx> wrote:
>
> > +1
> >
> > --
> > Nishant Bangarwa
> >
> > Hortonworks
> >
> > On 7/10/18, 3:57 AM, "Jihoon Son" <jihoonson@xxxxxxxxxx> wrote:
> >
> >     Related thread:
> >
> >     https://lists.apache.org/thread.html/76755aecfddb1210fcc3f08b1d4631784a8a5eede64d22718c271841@%3Cdev.druid.apache.org%3E
> >
> >     Jihoon
> >
> >     On Mon, Jul 9, 2018 at 3:25 PM Jihoon Son <jihoonson@xxxxxxxxxx> wrote:
> >
> >     > Hi all,
> >     >
> >     > We have no open issues and PRs for 0.12.2 (
> >     > https://github.com/apache/incubator-druid/milestone/27). The 0.12.2
> >     > branch is already available and all PRs for 0.12.2 have merged
> >     > into that branch.
> >     >
> >     > Let's vote on releasing RC1. Here is my +1.
> >     >
> >     > This is a non-ASF release.
> >     >
> >     > Best,
> >     > Jihoon
> >     >