


Re: [SURVEY] Usage of flink-python and flink-streaming-python


Hi Folks,
To avoid polluting the survey thread with discussions, we have started a
separate thread; maybe we can continue the discussion over there.

Regards,
Xianda

On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen <sewen@xxxxxxxxxx> wrote:

> I like that we are having a general discussion about how to use Python and
> Flink together in the future.
> The current Python support has some shortcomings that were mentioned
> before, so we clearly need something better.
>
> Parts of the community have worked together with the Apache Beam project,
> which is already far along in adding a portability layer to support Python.
> Before we dive deep into a design proposal for a new Python API in Flink, I
> think we should figure out in which general direction Python support should
> go.
>
> *Option (1): Language portability via Apache Beam*
>
> Pro:
>   - already exists to a large extent and already has users
>   - portability layer offers other languages in addition to python. Go is
> in the making, NodeJS has been speculated, etc.
>   - collaboration with another project / community which means more
> manpower and exposure. Beam currently has a strong focus on Flink as a
> runner for Python.
>   - Python API is used for existing ML libraries from the TensorFlow
> ecosystem
>
> Con:
>   - Not Flink's API. Python users need to learn the syntax of another API
> (a Python API is inherently different, but here it is even more different).
>
> *Option (2): Implement own Python API*
>
> Pro:
>   - Python API will be closer to Flink Java / Scala APIs
>
> Con:
>   - We will only have Python.
>   - Need to rebuild the Python language bridge (significant work to get
> stable)
>   - might lose tight collaboration with Beam and the other parties in Beam
>   - not benefiting from Beam's ecosystem
>
> *Option (3): Implement own portability layer*
>
> Pro:
>   - Flexibility to align APIs across languages within Flink ecosystem
>
> Con:
>   - A lot of work (for context, to get this feature complete, Beam has
> worked on that for a year now)
>   - Replicating work that already exists
>   - good chance to lose tight collaboration with Beam and parties in that
> project
>   - not benefiting from Beam's ecosystem
>
> Best,
> Stephan
>
>
> On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise <thw@xxxxxxxxxx> wrote:
>
> > Did you take a look at Apache Beam? It already provides a comprehensive
> > Python SDK and can be used with Flink:
> > https://beam.apache.org/roadmap/portability/#python-on-flink
> >
> > We are using it at Lyft for Python streaming pipelines.
> >
> > Thomas
> >
> > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <kexianda@xxxxxxxxx> wrote:
> >
> > > Hi Till,
> > >
> > > 1. As far as I know, most of the users at Alibaba are using SQL. Some
> > > users at Alibaba want to integrate Python libraries with Flink for
> > > stream processing, and Jython is unusable for them.
> > >
> > > 2. Python UDFs for SQL:
> > > * Declare the Python UDF using Alibaba's internal DDL syntax.
> > > * Start a Python process in open().
> > > * Communicate with the JVM process via a socket.
> > > * Yes, it supports Python libraries; users can upload a virtualenv/conda
> > > Python runtime.
> > >
> > > 3. We've drafted a design doc for the Python API:
> > > [DISCUSS] Flink Python API
> > > <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> > >
> > > Python UDFs for SQL are not discussed in this document; we'll create a
> > > new proposal when the SQL DDL is ready.
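
For illustration only: point 2 above describes the Python UDF mechanism at a high level (a Python process started in open() that talks to the JVM over a socket). The following is a minimal sketch of what the Python worker side of such a bridge might look like. It assumes a hypothetical wire format (a 4-byte big-endian length prefix followed by a UTF-8 JSON payload), and send_message, recv_message and serve_udf are hypothetical names; this is not Alibaba's or Flink's actual protocol or API.

```python
import json
import socket
import struct
import threading
from typing import Any, Callable, Optional

# Hypothetical wire format (an assumption for this sketch, not Alibaba's
# actual protocol): a 4-byte big-endian length, then a UTF-8 JSON payload.
_HEADER = struct.Struct(">I")


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early only if the peer closes the stream."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection
            break
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock: socket.socket, payload: Any) -> None:
    """Serialize payload as JSON and send it with a length prefix."""
    data = json.dumps(payload).encode("utf-8")
    sock.sendall(_HEADER.pack(len(data)) + data)


def recv_message(sock: socket.socket) -> Optional[Any]:
    """Return the next decoded message, or None on a clean end of stream."""
    header = _recv_exact(sock, _HEADER.size)
    if not header:
        return None
    if len(header) < _HEADER.size:
        raise ConnectionError("stream closed inside a message header")
    (length,) = _HEADER.unpack(header)
    body = _recv_exact(sock, length)
    if len(body) < length:
        raise ConnectionError("stream closed inside a message body")
    return json.loads(body.decode("utf-8"))


def serve_udf(sock: socket.socket, udf: Callable[..., Any]) -> int:
    """Worker loop: call udf(*args) for each incoming argument list and send
    the result back until the JVM side closes its end. Returns the row count."""
    processed = 0
    while True:
        args = recv_message(sock)
        if args is None:
            break
        send_message(sock, udf(*args))
        processed += 1
    return processed


if __name__ == "__main__":
    # Simulate the JVM side with a connected socket pair in one process.
    jvm_side, python_side = socket.socketpair()
    worker = threading.Thread(target=serve_udf, args=(python_side, lambda a, b: a + b))
    worker.start()
    for row in ([1, 2], [10, 20]):
        send_message(jvm_side, row)
        print(recv_message(jvm_side))  # prints 3, then 30
    jvm_side.shutdown(socket.SHUT_WR)  # no more rows: the worker loop ends
    worker.join()
    jvm_side.close()
    python_side.close()
```

The length prefix is needed because a socket delivers a byte stream with no message boundaries. JSON keeps the sketch dependency-free; a production bridge would more likely use a compact binary serialization and send rows in batches to amortize the per-row round trip.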
> > >
> > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:
> > >
> > > > Hi Xianda,
> > > >
> > > > thanks for sharing this detailed feedback. Do I understand you correctly
> > > > that flink-python and flink-streaming-python are not usable for the use
> > > > cases at Alibaba atm?
> > > >
> > > > Could you share a bit more detail about the Python UDFs for SQL? How do
> > > > you execute the Python code? Will it work with any Python library? If you
> > > > are about to publish the design document, then feel free to refer me to
> > > > this document.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <kexianda@xxxxxxxxx> wrote:
> > > >
> > > > > After communicating with some of the internal users at Alibaba, my
> > > > > impression is that:
> > > > >
> > > > >    - Most of them need C extension support; they want to integrate
> > > > >    their algorithms with stream processing, but Jython is unacceptable
> > > > >    for them.
> > > > >    - For some users who are only familiar with SQL/Python, developing
> > > > >    a Java API application/UDF is too complex. Writing a Python UDF and
> > > > >    declaring it in SQL is preferred.
> > > > >    - Machine learning users need richer Python APIs, such as Python
> > > > >    support for the Table API.
> > > > >
> > > > >
> > > > > From my point of view, Python support in Flink currently has a few
> > > > > caveats:
> > > > >
> > > > >    - For batch, there is only the DataSet Python API.
> > > > >    - For streaming, where Flink really shines, only Jython is
> > > > >    supported, but Jython has lots of limitations.
> > > > >    - For most big data users, the SQL/Table API is more friendly, but
> > > > >    Python users have no such APIs right now.
> > > > >    - An interactive Python shell is very user-friendly. It can be used
> > > > >    to test interactively and is a simple way to learn the API. However,
> > > > >    there is no such interactive Python shell in Flink now.
> > > > >
> > > > >
> > > > > At Alibaba, Python UDFs for SQL have been developed and delivered to
> > > > > internal users. We are now starting to develop the Python API; we've
> > > > > drafted a design document and will publish it to the community soon
> > > > > for discussion.
> > > > >
> > > > > Regards,
> > > > > Xianda
> > > > >
> > > > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:
> > > > >
> > > > > > Dear Flink community,
> > > > > >
> > > > > > in order to better understand the needs of our users and to plan for
> > > > > > the future, I wanted to reach out to you and ask how much you use
> > > > > > Flink's Python API, namely flink-python and flink-streaming-python.
> > > > > >
> > > > > > In order to gather feedback, I would like to ask all Python users to
> > > > > > respond to this thread and quickly outline how you use Python in
> > > > > > combination with Flink. Thanks a lot for your help!
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Ke, Xianda
> > > > >
> > > >
> > >
> > >
> > > --
> > > Ke, Xianda
> > >
> >
>


-- 
Ke, Xianda