Re: How to optimize the performance of Beam on Spark(Internet mail)
Did you compare the stages in the Spark UI in order to identify which
stage is taking the time?
Do you use spark-submit in both cases for the bootstrapping?
I will do a test here as well.
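For the stage comparison, something along these lines might help. It is
only a rough sketch: it assumes the stage list comes from Spark's
monitoring REST API (/api/v1/applications/<app-id>/stages on the UI port,
4040 by default) and that each entry carries "stageId", "name" and
"executorRunTime" (milliseconds). The helper names and the sample values
are made up for illustration, so please check the field names against
your Spark version.

```python
import json
from urllib.request import urlopen


def fetch_stages(base_url, app_id):
    """Fetch stage summaries from the Spark monitoring REST API.

    base_url is something like "http://localhost:4040" (assumed default UI port).
    """
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/stages") as resp:
        return json.load(resp)


def slowest_stages(stages, top=3):
    """Return (stageId, name, executorRunTime in ms) for the slowest stages.

    Stages without an "executorRunTime" field are treated as 0 ms.
    """
    ranked = sorted(stages, key=lambda s: s.get("executorRunTime", 0), reverse=True)
    return [(s["stageId"], s["name"], s.get("executorRunTime", 0)) for s in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical sample in the assumed shape, not real measurements.
    sample = [
        {"stageId": 0, "name": "read", "executorRunTime": 1200},
        {"stageId": 1, "name": "shuffle", "executorRunTime": 95000},
        {"stageId": 2, "name": "write", "executorRunTime": 4000},
    ]
    for stage_id, name, millis in slowest_stages(sample):
        print(f"stage {stage_id} ({name}): {millis} ms")
```

Running it once against the Beam job and once against the plain Spark job
should show which stage accounts for most of the gap.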
On 19/09/2018 05:34, devinduan(段丁瑞) wrote:
> Thanks for your reply.
> Our team plans to use Beam instead of Spark, so I'm testing the
> performance of the Beam API.
> I'm coding some examples with both the Spark API and the Beam API, like
> "WordCount", "Join", "OrderBy", "Union" ...
> I use the same resources and configuration to run these jobs.
> Tim said I should remove "withNumShards(1)" and
> set spark.default.parallelism=32. I did that and tried again, but the Beam
> job is still running very slowly.
> Here is my Beam code and Spark code:
> Beam "WordCount":
> Spark "WordCount":
> I will try the other examples later.
> *From:* Jean-Baptiste Onofré <mailto:jb@xxxxxxxxxxxx>
> *Date:* 2018-09-18 22:43
> *To:* dev@xxxxxxxxxxxxxxx <mailto:dev@xxxxxxxxxxxxxxx>
> *Subject:* Re: How to optimize the performance of Beam on
> Spark(Internet mail)
> The first big difference is that the Spark runner still uses RDDs,
> whereas when using Spark directly you are using Datasets. Many of
> Spark's optimizations are tied to Datasets.
> I started a large refactoring of the Spark runner to leverage Spark 2.x
> (and Datasets).
> It's not ready yet, as it includes other improvements (the portability
> layer with the Job API, a first check of the State API, ...).
> Anyway, by Spark wordcount, do you mean the one included in the Spark
> distribution?
> On 18/09/2018 08:39, devinduan(段丁瑞) wrote:
> > Hi,
> > I'm testing Beam on Spark.
> > Using the Spark example WordCount code to process a 1 GB data file
> > takes 1 minute.
> > However, using the Beam example WordCount code to process the same
> > file takes 30 minutes.
> > My Spark parameters are: --deploy-mode client --executor-memory 1g
> > --num-executors 1 --driver-memory 1g
> > My Spark version is 2.3.1 and my Beam version is 2.5.
> > Is there any way to optimize this?
> > Thank you.
> Jean-Baptiste Onofré
> Talend - http://www.talend.com