Re: Pentaho to Airflow


Thanks. I'll take a look.

On Wed, Jun 6, 2018, 3:37 PM Ben Gregory <ben@xxxxxxxxxxxxx> wrote:

> Hey Arash --
>
> We wrote this for a similar use case to yours (as I understand it). It's an
> opinionated operator (it assumes the data is loaded from AWS S3), but it has a
> pseudo-"upsert" (INSERT ... ON DUPLICATE KEY UPDATE) method for loading
> data, so you might be able to adapt it to your needs.
>
>
> https://github.com/airflow-plugins/mysql_plugin/blob/master/operators/s3_to_mysql_operator.py#L9
>
> -Ben
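
As a rough illustration of the INSERT ... ON DUPLICATE KEY UPDATE approach
mentioned above, here is a minimal Python sketch that only builds the
parameterized statement. It is not taken from the linked operator: the
function name build_upsert_sql and the example table and columns are
hypothetical, and it assumes the table already has a PRIMARY KEY or UNIQUE
index on the key columns.

```python
from typing import Sequence


def build_upsert_sql(table: str, columns: Sequence[str],
                     key_columns: Sequence[str]) -> str:
    """Build a parameterized MySQL INSERT ... ON DUPLICATE KEY UPDATE statement.

    Hypothetical helper for illustration. MySQL decides what counts as a
    duplicate from the table's PRIMARY KEY / UNIQUE indexes; `key_columns`
    only tells this function which columns to leave out of the UPDATE part.
    Identifiers are interpolated directly, so they must come from trusted
    configuration, never from user input.
    """
    if not columns:
        raise ValueError("at least one column is required")

    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))

    update_cols = [c for c in columns if c not in key_columns]
    if update_cols:
        # VALUES(col) refers to the value that would have been inserted.
        # (Newer MySQL 8.0 releases deprecate it in favor of a row alias.)
        updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in update_cols)
    else:
        # Only key columns: use a no-op assignment so duplicates are skipped.
        updates = f"`{columns[0]}` = `{columns[0]}`"

    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


# Example with a made-up table: `id` is the lookup key.
sql = build_upsert_sql("users", ["id", "name", "email"], ["id"])
print(sql)
```

The resulting string, together with a list of row tuples, could then be
handed to a DB-API cursor's executemany() from inside whatever task does
the load.
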
>
> On Tue, Jun 5, 2018 at 8:55 PM Arash Soheili <tonyarash@xxxxxxxxx> wrote:
>
> > I have looked through those and didn't find what I needed, although there
> > is the MySQL operator, and I have used that to implement an insert or
> > update.
> >
> > I was looking for something like this:
> >
> > https://wiki.pentaho.com/plugins/servlet/mobile?contentId=8292089#content/view/8292089
> >
> > A way to bulk insert or update based on a lookup key. What would be the
> > most optimized way to do this in Airflow?
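
One way to picture a bulk insert-or-update keyed on a lookup column is the
sketch below. It uses Python's built-in sqlite3 module purely as a stand-in
database so that it runs anywhere; the customers table, the chunked and
bulk_upsert helpers, and the batch size are all assumptions for
illustration. SQLite's ON CONFLICT ... DO UPDATE syntax (SQLite 3.24 or
newer) differs from MySQL's, and this is not an Airflow operator, only the
core idea that a task could wrap.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (id, name); `id` is the lookup key


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield rows in lists of at most `size` items."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_upsert(conn: sqlite3.Connection, rows: Iterable[Row],
                batch_size: int = 500) -> int:
    """Insert new rows and update existing ones, one transaction per batch."""
    sql = (
        "INSERT INTO customers (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name"
    )
    total = 0
    for batch in chunked(rows, batch_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(sql, batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

bulk_upsert(conn, [(1, "Ada"), (2, "Bob")])                  # two inserts
bulk_upsert(conn, [(2, "Robert"), (3, "Cy")], batch_size=1)  # one update, one insert

result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(result)
```

Batching keeps each statement and transaction a manageable size; the right
batch size depends on the database and row width, so 500 here is only a
placeholder.
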
> >
> > On Tue, Jun 5, 2018, 9:47 PM Taylor Edmiston <tedmiston@xxxxxxxxx> wrote:
> >
> > > Hey Arash -
> > >
> > > There are some common operators built-in
> > > <https://github.com/apache/incubator-airflow/tree/master/airflow/operators>
> > > to Airflow and some in contrib
> > > <https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators>
> > > as well.
> > >
> > > We also maintain a community sourced GitHub org of Airflow plugins
> > > (mostly hooks and operators) at https://github.com/airflow-plugins.
> > >
> > > Are there specific sources/destinations you're looking for to match
> > > what you use in Pentaho?
> > >
> > > Best,
> > > Taylor
> > >
> > > *Taylor Edmiston*
> > > Blog <https://blog.tedmiston.com/> | CV
> > > <https://stackoverflow.com/cv/taylor> | LinkedIn
> > > <https://www.linkedin.com/in/tedmiston/> | AngelList
> > > <https://angel.co/taylor> | Stack Overflow
> > > <https://stackoverflow.com/users/149428/taylor-edmiston>
> > >
> > >
> > > On Tue, Jun 5, 2018 at 8:57 PM, Arash Soheili <tonyarash@xxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm new to Airflow and helping to set up our organization to transition
> > > > away from using Pentaho Data Integration for our ETL. Although there are
> > > > a lot of things I don't like about Pentaho, they do have some nice
> > > > standard modules, like batch database insert/update, which are common
> > > > ETL tasks.
> > > >
> > > > As I'm new to Airflow, I haven't seen any standard operators for this
> > > > kind of task, which I would think would be a common use case in Airflow
> > > > or any ETL. Am I missing this information, or is each Airflow user
> > > > expected to implement their own standard operators for this kind of
> > > > operation? I would think this should at some point become part of the
> > > > Airflow codebase.
> > > >
> > > > Arash
> > > >
> > >
> >
>
>
> --
>
> *Ben Gregory*
> Data Engineer
>
> Mobile: +1-615-483-3653 • Online: astronomer.io <https://www.astronomer.io/>
>