OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reducing database connection with JdbcIO


side note: try to do a thread dump on the workers maybe


Romain Manni-Bucau
@rmannibucau |  Blog | Old BlogGithub | LinkedIn | Book

2018-03-14 20:21 GMT+01:00 Eugene Kirpichov <kirpichov@xxxxxxxxxx>:
"Jdbcio will create for each prepared statement new connection" - this is not the case: the connection is created in @Setup and deleted in @Teardown.

Something else must be going wrong.

On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <aleksandr.vl@xxxxxxxxx> wrote:
Hello, we had similar problem. Current jdbcio will cause alot of connection errors. 

Typically you have more than one prepared statement. Jdbcio will create for each prepared statement new connection(and close only in teardown) So it is possible that connection will get timeot or in case in case of auto scaling you will get to many connections to sql.
Our solution was to create connection pool in setup and get connection and return back to pool in processElement.

Best Regards,
Aleksandr Gortujev.

14. märts 2018 8:52 PM kirjutas kuupäeval "Jean-Baptiste Onofré" <jb@xxxxxxxxxxxx>:
Agree especially using the current JdbcIO impl that creates connection in the @Setup. Or it means that @Teardown is never called ?

Regards
JB
Le 14 mars 2018, à 11:40, Eugene Kirpichov <kirpichov@xxxxxxxxxx> a écrit:
Hi Derek - could you explain where does the "3000 connections" number come from, i.e. how did you measure it? It's weird that 5-6 workers would use 3000 connections.

On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <derekcsy@xxxxxxxxx> wrote:
Hi,

We are new to Beam and need some help.

We are working on a flow to ingest events and writes the aggregated
counts to a database. The input rate is rather low (~2000 message per
sec), but the processing is relatively heavy, that we need to scale out
to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
low. But because of the number of workers, it keeps 3000 connections to
the database and it keeps hitting the database connection limits.

Is there a way that we can reduce the concurrency only at the output
stage? (In Spark we would have done a repartition/coalesce).

And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
Dataflow.

Thank you in advance!