
Re: Flip23


Hey Boris,

We developed something very similar for our own needs, but we ran into
problems when running it in HA mode. The main cause was that TensorFlow
uses native functions, which did not work well with automatic job
restarts.

As far as I remember, when the job was restarted after a failure and
tried to reload the model, it failed with:

Cannot register 2 metrics with the same name:
/tensorflow/cc/saved_model/load_attempt_count

I am not sure whether the issue still exists in the newest TensorFlow, as
we have since switched to a different mechanism.

It is probably worth verifying whether this error is still present.
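
For illustration only, here is a minimal sketch of one possible mitigation,
assuming the failure comes from the model load running a second time inside
the same JVM process after a restart (this is an assumption, not something
confirmed in this thread). LoadOnceRegistry and its loader function are
hypothetical names and not part of TensorFlow or Flink. The cache only
survives a job restart if the class is loaded by a classloader that
outlives the job, which is also an assumption about the deployment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: runs the (possibly native) model loader at most once
 * per path for as long as this registry instance stays alive.
 */
final class LoadOnceRegistry<M> {
    // Models already loaded, keyed by the path they were loaded from.
    private final ConcurrentHashMap<String, M> loaded = new ConcurrentHashMap<>();
    // Caller-supplied loader; stands in for the real model-loading call.
    private final Function<String, M> loader;

    LoadOnceRegistry(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the cached model, invoking the loader only on first use. */
    M getOrLoad(String path) {
        return loaded.computeIfAbsent(path, loader);
    }
}
```

A restarted task would call getOrLoad with the same path and get the
already loaded model back instead of triggering a second native load.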

Best Regards,
Dom.

On Mon, Nov 5, 2018 at 11:29 Fabian Hueske <fhueske@xxxxxxxxx> wrote:

> Hi Boris,
>
> Thanks for sharing the code that you'd like to contribute for FLIP-23.
>
> I had a quick look at the repository and collected some stats to estimate
> the reviewing effort for the contribution.
> There are approx 1900 lines of Java and 2000 lines of Scala code.
> This is a reasonable size that shouldn't be too hard to review.
>
> Are the Java and Scala parts completely separated or does the Java API
> depend on the Scala code (or the other way round)?
> Would it be possible to merge the code in two separate steps?
> I'm asking because it's typically faster to incrementally review and merge
> larger features.
> Another way to split the contribution would be to first add the simple
> version and then evolve it into the speculative one as a follow-up (not
> sure if that would make sense).
>
> Btw. you posted the same link for simple and speculative model serving.
>
> Best, Fabian
>
> On Thu, Nov 1, 2018 at 11:27 AM Till Rohrmann <
> trohrmann@xxxxxxxxxx> wrote:
>
> > Thanks for sharing the code with the community Boris!
> >
> > Cheers,
> > Till
> >
> > On Thu, Nov 1, 2018 at 10:29 AM Boris Lublinsky <
> > boris.lublinsky@xxxxxxxxxxxxx> wrote:
> >
> > > For those who want to see the actual code, it is here:
> > > https://github.com/FlinkML/flink-modelServer for simple model serving
> > > and https://github.com/FlinkML/flink-modelServer for the speculative one
> > >
> > > Boris Lublinsky
> > > FDP Architect
> > > boris.lublinsky@xxxxxxxxxxxxx
> > > https://www.lightbend.com/
> > >
> > >
> >
>