
Re: Pentaho to Airflow


Besides the link Ben shared, I think the average Airflow user in this case
would write the upsert query/logic themselves, then run it in a
PythonOperator (or a DockerOperator if your database requires a lot of
dependencies).
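
As a rough illustration of writing the upsert logic yourself, here is a minimal sketch that assumes MySQL (matching the INSERT ... ON DUPLICATE KEY UPDATE approach mentioned in this thread) and a DB-API driver that uses %s placeholders. The helper names build_upsert_sql and upsert_rows, and the table and column names in the usage note, are hypothetical and not part of Airflow.

```python
def build_upsert_sql(table, columns, key_columns):
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement.

    Assumptions: the table has a PRIMARY KEY or UNIQUE index covering
    key_columns, and all identifiers come from trusted code (they are
    interpolated directly into the SQL text, not passed as parameters).
    """
    update_columns = [c for c in columns if c not in key_columns]
    if not update_columns:
        raise ValueError("need at least one non-key column to update")
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # On a key collision, overwrite each non-key column with the incoming value.
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in update_columns)
    return "INSERT INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}".format(
        table, column_list, placeholders, updates
    )


def upsert_rows(conn, table, columns, key_columns, rows):
    """Upsert rows through any DB-API connection; returns the number of rows sent."""
    rows = list(rows)
    if not rows:
        return 0
    sql = build_upsert_sql(table, columns, key_columns)
    cursor = conn.cursor()
    try:
        # executemany sends the whole batch through a single statement template.
        cursor.executemany(sql, rows)
        conn.commit()
    finally:
        cursor.close()
    return len(rows)
```

In a DAG, a small wrapper that opens a connection (for example through a MySQL hook) and calls upsert_rows(conn, "users", ["id", "name"], ["id"], rows) could be passed as the python_callable of a PythonOperator. Treat that wiring as an assumption to adapt to your own setup.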

T

On Wed, Jun 6, 2018 at 6:28 PM Arash Soheili <tonyarash@xxxxxxxxx> wrote:

> Thanks. I'll take a look.
>
> On Wed, Jun 6, 2018, 3:37 PM Ben Gregory <ben@xxxxxxxxxxxxx> wrote:
>
> > Hey Arash --
> >
> > We wrote this for a similar use case to yours (as I understand it). It's
> > an opinionated operator (assumes loading data from AWS S3) but it has a
> > pseudo-"upsert" (INSERT ... ON DUPLICATE KEY UPDATE) method for loading
> > data, so you might be able to adapt it to your needs.
> >
> > https://github.com/airflow-plugins/mysql_plugin/blob/master/operators/s3_to_mysql_operator.py#L9
> >
> > -Ben
> >
> > On Tue, Jun 5, 2018 at 8:55 PM Arash Soheili <tonyarash@xxxxxxxxx> wrote:
> >
> > > I have looked through those and didn't find what I needed, although
> > > there is the MySQL operator and I have used that to implement an insert
> > > or update.
> > >
> > > I was looking for something like this:
> > >
> > > https://wiki.pentaho.com/plugins/servlet/mobile?contentId=8292089#content/view/8292089
> > >
> > > A way to bulk insert or update based on a lookup key. What would be the
> > > most optimized way to do this in Airflow?
> > >
> > > On Tue, Jun 5, 2018, 9:47 PM Taylor Edmiston <tedmiston@xxxxxxxxx> wrote:
> > >
> > > > Hey Arash -
> > > >
> > > > There are some common operators built-in
> > > > <https://github.com/apache/incubator-airflow/tree/master/airflow/operators>
> > > > to Airflow and some in contrib
> > > > <https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators>
> > > > as well.
> > > >
> > > > We also maintain a community-sourced GitHub org of Airflow plugins
> > > > (mostly hooks and operators) at https://github.com/airflow-plugins.
> > > >
> > > > Are there specific sources/destinations you're looking for to match
> > > > what you use in Pentaho?
> > > >
> > > > Best,
> > > > Taylor
> > > >
> > > > *Taylor Edmiston*
> > > > Blog <https://blog.tedmiston.com/> | CV
> > > > <https://stackoverflow.com/cv/taylor> | LinkedIn
> > > > <https://www.linkedin.com/in/tedmiston/> | AngelList
> > > > <https://angel.co/taylor> | Stack Overflow
> > > > <https://stackoverflow.com/users/149428/taylor-edmiston>
> > > >
> > > >
> > > > On Tue, Jun 5, 2018 at 8:57 PM, Arash Soheili <tonyarash@xxxxxxxxx> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > I'm new to Airflow and helping to set up our organization to
> > > > > > transition away from using Pentaho Data Integration for our ETL.
> > > > > > Although there are a lot of things I don't like about Pentaho, they
> > > > > > do have some nice standard modules like batch database
> > > > > > insert/update, which are common ETL tasks.
> > > > > >
> > > > > > As I'm new to Airflow, I haven't seen any standard Operators for
> > > > > > this kind of task, which I would think would be a common use case in
> > > > > > Airflow or any ETL. Am I missing this information, or is each
> > > > > > Airflow user expected to implement their own standard operators for
> > > > > > this kind of operation? I would think this should at some point
> > > > > > become part of the Airflow codebase.
> > > > >
> > > > > Arash
> > > > >
> > > >
> > >
> >
> >