Re: Announcement & Proposal: HDFS tests on large cluster.


Hi all,

As a positive outcome of extending the Kubernetes cluster, we can observe better stability of the tests after the cluster resize. The results are visible at the bottom of https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_Analysis/37/consoleText and on the dedicated Slack channel https://apachebeam.slack.com/messages/CAB3W69SS/. Most of the execution times decreased slightly and, finally, all tests were executed and analysed.

Thanks,
Kamil Szewczyk



2018-06-08 13:13 GMT+02:00 Łukasz Gajowy <lukasz.gajowy@xxxxxxxxx>:
@Pablo this is exactly as Chamikara says. In fact, there is a dedicated Gcloud project for the whole testing infrastructure (called "apache-beam-testing"). It provides the Kubernetes cluster for the data stores as well as BigQuery storage for the test results presented in the testing dashboard.

@Alan thanks a lot!  

Best regards, 
Łukasz 



Thu, 7 Jun 2018 at 22:37 Chamikara Jayalath <chamikara@xxxxxxxxxx> wrote:
We still use Jenkins machines to execute the tests, but the data stores are hosted in Kubernetes.

On Thu, Jun 7, 2018 at 1:35 PM Pablo Estrada <pabloem@xxxxxxxxxx> wrote:
Just out of curiosity: This does not use the Jenkins machines then?
-P.

On Thu, Jun 7, 2018 at 1:33 PM Alan Myrvold <amyrvold@xxxxxxxxxx> wrote:
Done. Changed the size of the io-datastores kubernetes cluster in apache-beam-testing to 3 nodes.
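
For readers who want to reproduce a resize like this one, here is a minimal sketch that only builds the corresponding gcloud command line and prints it. It assumes a current gcloud CLI, where `gcloud container clusters resize` accepts `--num-nodes`. The helper name `build_resize_command` and its `zone` and `node_pool` parameters are hypothetical; the thread does not say which zone or node pool was used, nor how the change was actually made.

```python
import shlex


def build_resize_command(cluster, num_nodes, project, zone=None, node_pool=None):
    """Build (but do not run) a gcloud command that resizes a GKE cluster.

    Hypothetical helper for illustration; zone and node_pool are optional
    because the thread does not mention them.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    argv = [
        "gcloud", "container", "clusters", "resize", cluster,
        f"--num-nodes={num_nodes}",
        f"--project={project}",
    ]
    if zone:
        argv.append(f"--zone={zone}")
    if node_pool:
        argv.append(f"--node-pool={node_pool}")
    return argv


# Cluster and project names are the ones mentioned in this thread.
command = build_resize_command("io-datastores", 3, "apache-beam-testing")
print(shlex.join(command))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would be the natural next step for someone with access to the project.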

On Thu, Jun 7, 2018 at 1:45 AM Kamil Szewczyk <szewinho@xxxxxxxxx> wrote:
Hi, 

The node pool size of the io-datastores Kubernetes cluster in the apache-beam-testing project must be changed from 1 to 3 (or another value).
@Alan Myrvold has already helped with the Kubernetes cluster settings, but I don't know who makes decisions about this, as
it will increase the monthly billing.

Kamil Szewczyk

2018-06-07 6:27 GMT+02:00 Kenneth Knowles <klk@xxxxxxxxxx>:
This is rad. Another +1 from me for a bigger cluster. What do you need to make that happen?

Kenn

On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada <pabloem@xxxxxxxxxx> wrote:
This is really cool!

+1 for having a cluster with more than one machine run the test.

-P.

On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath <chamikara@xxxxxxxxxx> wrote:
On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy <lukasz.gajowy@xxxxxxxxx> wrote:

Hi all,

I'd like to announce that thanks to Kamil Szewczyk, since this PR we have 4 file-based HDFS tests run on a "Large HDFS Cluster"! More specifically I mean:

- beam_PerformanceTests_TextIOIT_HDFS
- beam_PerformanceTests_Compressed_TextIOIT_HDFS
- beam_PerformanceTests_AvroIOIT_HDFS
- beam_PerformanceTests_XmlIOIT_HDFS

The "Large HDFS Cluster" (in contrast to the small one, which is also available) consists of a master node and three data nodes, all in separate pods. Thanks to that, we can mimic more real-life scenarios on HDFS (3 distributed nodes) and possibly run bigger tests, so there's progress! :)


This is great. Also, it looks like the results are available in the test dashboard: https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
(BTW we should add information about the dashboard to the testing doc: https://beam.apache.org/contribute/testing/)

I'm currently working on proper documentation for this so that everyone can use it in IOITs (stay tuned).

Regarding the above, I'd like to propose scaling up the Kubernetes cluster. AFAIK, it currently consists of 1 node. If we scale it up to e.g. 3 nodes, the HDFS Kubernetes pods will be distributed across different machines rather than all sharing one, making it an even more "real-life" scenario (and possibly a more efficient one). Moreover, other Performance Tests (such as JDBC or MongoDB) could use more space for their infrastructure as well. Scaling up the cluster could also prove useful for some future efforts, like BEAM-4508[1] (adapting and running some old IOITs on Jenkins).
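
To make the difference between 1 and 3 nodes concrete, here is a toy sketch that places the four HDFS pods (one master, three data nodes) on the available machines in round-robin order. This is a simplification for illustration only: the real Kubernetes scheduler uses resource requests and affinity rules, not round-robin, and the pod and node names below are hypothetical.

```python
from collections import defaultdict


def place_round_robin(pods, nodes):
    """Assign pods to nodes in round-robin order (toy model, not the real scheduler)."""
    placement = defaultdict(list)
    for index, pod in enumerate(pods):
        placement[nodes[index % len(nodes)]].append(pod)
    return dict(placement)


# Hypothetical pod names: one HDFS master and three data nodes.
pods = ["hdfs-master", "hdfs-datanode-0", "hdfs-datanode-1", "hdfs-datanode-2"]

one_node = place_round_robin(pods, ["node-0"])
three_nodes = place_round_robin(pods, ["node-0", "node-1", "node-2"])

print(one_node)     # every pod shares the single machine
print(three_nodes)  # pods are spread over three machines
```

With a single node all four pods compete for the same machine, while with three nodes at most two pods share one, which is the effect the proposal is after.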

WDYT? Are there any objections?

+1 for increasing the size of the Kubernetes cluster.
--
Got feedback? go/pabloem-feedback