
Re: Duplicate key unique constraint error


Max,

Yes, the stack trace points to a race condition. What can be done to fix
this? Can someone from the dev team look into it? Should I raise a JIRA
issue for it?



On 2 November 2018 at 11:14:36 AM, Maxime Beauchemin (
maximebeauchemin@xxxxxxxxx) wrote:

Wait, the title of this thread is "Duplicate key unique constraint error",
to me that screams that something is not ok. If the check+insert were atomic
(isolated) this error wouldn't happen. Also, I'm pretty sure that when I
looked, the stack trace looked like a scheduler-specific stack trace. It may
be a rare race condition, but doesn't the stack trace prove the existence of
a race condition?

Max
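
For illustration, here is a minimal sketch of the point above: a separate "check, then insert" can race, whereas an insert guarded by the unique constraint and wrapped in a transaction either creates the row or reports that it already exists, without crashing. This is not Airflow's scheduler code. It uses an in-memory SQLite database, and the table layout and the names make_db and insert_if_absent are hypothetical; the unique key on task id, dag id and execution date is taken from the discussion further down the thread.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for the metadata database (assumption: SQLite,
    # with a unique key on task_id + dag_id + execution_date).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT,"
        " UNIQUE (task_id, dag_id, execution_date))"
    )
    return conn


def insert_if_absent(conn, task_id, dag_id, execution_date):
    """Return True if this call created the row, False if it already existed."""
    try:
        # "with conn" commits on success and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO task_instance VALUES (?, ?, ?, 'scheduled')",
                (task_id, dag_id, execution_date),
            )
        return True
    except sqlite3.IntegrityError:
        # Someone else inserted the same key first: treat it as "already there"
        # instead of letting the error propagate and kill the caller.
        return False


if __name__ == "__main__":
    conn = make_db()
    print(insert_if_absent(conn, "my_task", "my_dag", "2018-11-02"))
    print(insert_if_absent(conn, "my_task", "my_dag", "2018-11-02"))
```

The design choice in this sketch is to let the database's unique constraint be the arbiter rather than relying on a prior SELECT, so two writers racing on the same key cannot both succeed and neither one fails hard.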

On Fri, Nov 2, 2018 at 10:19 AM Abhishek Sinha <abhishek@xxxxxxxxxxxx>
wrote:

> Max,
>
> If check+insert works correctly, then even multiple instances of the
> scheduler running in parallel should not throw this error. I am not sure,
> then, when this error can happen.
>
>
>
> On 2 November 2018 at 8:37:20 AM, Maxime Beauchemin (
> maximebeauchemin@xxxxxxxxx) wrote:
>
> The scheduler should never fail hard. The schedule logic that tries to
> insert the new task instance should only insert a new one if it doesn't
> exist already and isolate that check+insert inside a database transaction.
>
> Max
>
> On Fri, Nov 2, 2018 at 5:38 AM Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> wrote:
>
> > Brian,
> >
> > We use the trigger dag CLI command to trigger it manually.
> >
> > Even when you have custom operators, the duplicate key error should not
> > happen, right? Shouldn't the combination of task id, dag id and execution
> > date be unique?
> >
> >
> > On 30 October 2018 at 10:23:27 PM, Abhishek Sinha (abhishek@xxxxxxxxxxxx)
> > wrote:
> >
> > Max,
> >
> > The schedule interval is 1 day.
> >
> >
> >
> >
> > > On 30-Oct-2018, at 9:29 PM, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx>
> > > wrote:
> > >
> > > > Also what's your schedule interval? I'm just trying to confirm that this
> > > > isn't a "run every minute, or anytime someone blinks" kind of DAG.
> > >
> > > Max
> > >
> > > On Tue, Oct 30, 2018 at 5:48 AM Brian Greene <
> > > brian@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >> How do you trigger it externally?
> > >>
> > > >> We have several custom operators that trigger other jobs and we had to be
> > > >> really careful or we’d get duplicate keys for the dag run and it would fail
> > > >> to kick off.
> > >>
> > > >> One scheduler, but we saw it repeatedly and have it noted as a thing to
> > > >> watch out for.
> > >>
> > >> Brian
> > >>
> > >> Sent from a device with less than stellar autocorrect
> > >>
> > > >>> On Oct 29, 2018, at 2:03 PM, Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>> wrote:
> > >>>
> > >>> Attaching the scheduler crash logs as well.
> > >>>
> > >>> https://pastebin.com/B2WEJKRB
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Regards,
> > >>>
> > >>> Abhishek Sinha | m: +919035191078 | e: abhishek@xxxxxxxxxxxx
> > >>>
> > >>>
> > > >>> On Tue, Oct 30, 2018 at 12:19 AM Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>> wrote:
> > >>>
> > >>>> Max,
> > >>>>
> > > >>>> We always trigger the DAG externally. I am not sure if there is still any
> > > >>>> backfill involved.
> > > >>>>
> > > >>>> Is there a way to find out from the logs whether more than one instance of
> > > >>>> the scheduler is running?
> > >>>>
> > >>>>
> > >>>> On 29 October 2018 at 10:43:19 PM, Maxime Beauchemin (
> > >>>> maximebeauchemin@xxxxxxxxx) wrote:
> > >>>>
> > > >>>> The stacktrace seems to be pointing in that direction. I'd check that
> > > >>>> first. It seems like it **could** be a race condition with a backfill as
> > > >>>> well, unclear.
> > > >>>>
> > > >>>> It's still a bug though, and the scheduler should make sure to handle this
> > > >>>> and not raise/crash.
> > >>>>
> > > >>>> On Mon, Oct 29, 2018, 10:05 AM Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>>> wrote:
> > >>>>
> > >>>>> Max,
> > >>>>>
> > > >>>>> I do not think there was more than one instance of the scheduler running.
> > > >>>>> Since the scheduler crashed and has been restarted, I cannot confirm it
> > > >>>>> now. Is there any log that can provide this information?
> > > >>>>>
> > > >>>>> Could there be a different cause apart from multiple scheduler instances
> > > >>>>> running?
> > >>>>>
> > >>>>>
> > >>>>> On 29 October 2018 at 9:30:56 PM, Maxime Beauchemin (
> > >>>>> maximebeauchemin@xxxxxxxxx) wrote:
> > >>>>>
> > >>>>> Abhishek, are you running more than one scheduler instance at once?
> > >>>>>
> > >>>>> Max
> > >>>>>
> > > >>>>> On Mon, Oct 29, 2018 at 8:17 AM Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>>>> wrote:
> > >>>>>
> > > >>>>>> The issue is happening more frequently now. Can someone please look into
> > > >>>>>> this?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > > >>>>>> On 24 September 2018 at 12:42:49 PM, Abhishek Sinha (abhishek@xxxxxxxxxxxx)
> > > >>>>>> wrote:
> > >>>>>>
> > > >>>>>> Can someone please help in looking into this issue? It is critical since
> > > >>>>>> this has come up in one of our production environments. Also, this issue
> > > >>>>>> has appeared only once till now.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 20-Sep-2018, at 10:18 PM, Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Any update on this?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 18-Sep-2018, at 12:48 AM, Abhishek Sinha <abhishek@xxxxxxxxxxxx>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Pastebin: https://pastebin.com/K6BMTb5K
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 18-Sep-2018, at 12:31 AM, Stefan Seelmann <mail@xxxxxxxxxxxxxxxxxx>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> On 9/17/18 8:19 PM, Abhishek Sinha wrote:
> > >>>>>>
> > >>>>>> Any update on this?
> > >>>>>>
> > >>>>>> Please find the scheduler error log attached.
> > >>>>>>
> > >>>>>> Can you share the full python stack trace?
> > >>>>>>
> > >>>>>>
> > >>>>>> Seems the mailing list doesn't allow attachments. Either post the
> > >>>>>> stacktrace inline, or post it somewhere at pastebin or so.
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>
> >
>
>