Re: [JIRA] -ARROW-1780 - JDBC Adapter - resolved.


Hello Atul,

Sorry for the long turnaround time. I finally had the time to try the code from Python. I ran some tests with a table of New York Taxi trip data in Apache Drill. Using the bundled JDBC driver and JayDeBeApi, the default way to access JDBC from Python, it took 11 minutes to retrieve 739373 rows from the DB into Pandas. Using the Arrow JDBC adapter instead, the same retrieval ran in 3.8s on my laptop. That is only 4 times slower than loading the backing Parquet file directly. This is a massive improvement.
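As a rough check of the figures above, the timings can be turned into throughput and a speedup factor. This is a minimal sketch: the helper names `rows_per_second` and `speedup` are hypothetical and exist only for illustration, and the only inputs are the row count and the two timings quoted in this mail.

```python
# Back-of-the-envelope comparison of the two timings reported above.
# The helper names are hypothetical; the numbers come from the mail itself.

ROWS = 739373                  # rows retrieved from Drill
JAYDEBEAPI_SECONDS = 11 * 60   # 11 minutes via JayDeBeApi
ARROW_JDBC_SECONDS = 3.8       # 3.8 s via the Arrow JDBC adapter


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput for a transfer of `rows` rows in `seconds`."""
    return rows / seconds


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    print(f"JayDeBeApi: {rows_per_second(ROWS, JAYDEBEAPI_SECONDS):,.0f} rows/s")
    print(f"Arrow JDBC: {rows_per_second(ROWS, ARROW_JDBC_SECONDS):,.0f} rows/s")
    print(f"Speedup:    {speedup(JAYDEBEAPI_SECONDS, ARROW_JDBC_SECONDS):.1f}x")
```

With these inputs the speedup works out to roughly 170x, and the "4 times slower than Parquet" remark implies a direct Parquet load of a bit under one second. These are single runs on one laptop, so the numbers are indicative rather than a benchmark.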

I will try to look a bit more into making it simpler to use from Python, but this is a really great example of how Arrow connects ecosystems at speed.

Regards,
Uwe

On Fri, Jun 22, 2018, at 12:41 PM, Atul Dambalkar wrote:
> Hi Wes, Uwe, Sid, Laurent,
> 
> I have now marked the JDBC Adapter related JIRA 
> (https://issues.apache.org/jira/browse/ARROW-1780) as resolved. Uwe/Wes 
> had already marked the feature for 0.10.0 release. I will continue to 
> monitor and support the feature for any issues. I remember, Uwe wanted 
> to use it in his development.
> 
> Appreciate your inputs during the development of this feature.
> 
> Regards,
> -Atul