
Re: Announcement & Proposal: HDFS tests on large cluster.


The node pool size of the io-datastores Kubernetes cluster in the apache-beam-testing project must be changed from 1 to 3 (or another value).
@Alan Myrvold has already been helpful with the Kubernetes cluster settings so far, but I don't know who makes decisions about this, since
it will increase the monthly bill.

Kamil Szewczyk

2018-06-07 6:27 GMT+02:00 Kenneth Knowles <klk@xxxxxxxxxx>:
This is rad. Another +1 from me for a bigger cluster. What do you need to make that happen?


On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada <pabloem@xxxxxxxxxx> wrote:
This is really cool!

+1 for having a cluster with more than one machine run the test.


On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath <chamikara@xxxxxxxxxx> wrote:
On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy <lukasz.gajowy@xxxxxxxxx> wrote:

Hi all,

I'd like to announce that, thanks to Kamil Szewczyk, since this PR we have 4 file-based HDFS tests running on a "Large HDFS Cluster"! More specifically, I mean:

- beam_PerformanceTests_TextIOIT_HDFS
- beam_PerformanceTests_Compressed_TextIOIT_HDFS
- beam_PerformanceTests_AvroIOIT_HDFS
- beam_PerformanceTests_XmlIOIT_HDFS

The "Large HDFS Cluster" (in contrast to the small one, which is also available) consists of a master node and three data nodes, each in a separate pod. Thanks to that, we can mimic more realistic HDFS scenarios (3 distributed data nodes) and possibly run bigger tests, so there's progress! :)

This is great. Also, looks like results are available in test dashboard: https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
(BTW, we should add information about the dashboard to the testing doc: https://beam.apache.org/contribute/testing/)

I'm currently working on proper documentation for this so that everyone can use it in IOITs (stay tuned).

Regarding the above, I'd like to propose scaling up the Kubernetes cluster. AFAIK, it currently consists of 1 node. If we scale it up to, e.g., 3 nodes, the HDFS Kubernetes pods can be scheduled on different machines rather than on one, making it an even more "real-life" scenario (and possibly a more efficient one?). Moreover, other performance tests (such as JDBC or MongoDB) could use more resources for their infrastructure as well. Scaling up the cluster could also turn out to be useful for future efforts, like BEAM-4508[1] (adapting and running some old IOITs on Jenkins).

WDYT? Are there any objections?

+1 for increasing the size of the Kubernetes cluster.