
Announcement & Proposal: HDFS tests on large cluster.

Hi all,

I'd like to announce that, thanks to Kamil Szewczyk, since this PR we have 4 file-based HDFS tests running on a "Large HDFS Cluster"! Specifically, these are:

- beam_PerformanceTests_TextIOIT_HDFS
- beam_PerformanceTests_Compressed_TextIOIT_HDFS
- beam_PerformanceTests_AvroIOIT_HDFS
- beam_PerformanceTests_XmlIOIT_HDFS

The "Large HDFS Cluster" (in contrast to the small one, that is also available) consists of a master node and three data nodes all in separate pods. Thanks to that we can mimic more real-life scenarios on HDFS (3 distributed nodes) and possibly run bigger tests so there's progress! :)

I'm currently working on proper documentation for this, so that everyone can use it in IOITs (stay tuned).

Regarding the above, I'd like to propose scaling up the Kubernetes cluster. AFAIK it currently consists of 1 node. If we scale it up to, e.g., 3 nodes, the HDFS Kubernetes pods will be distributed across different machines rather than a single one, making the scenario even more "real-life" (and possibly more efficient?). Moreover, other performance tests (such as JDBC or MongoDB) could use the additional capacity for their infrastructure as well. Scaling up the cluster could also prove useful for future efforts like BEAM-4508 [1] (adapting some old IOITs and running them on Jenkins).
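To illustrate what spreading the pods could look like once the cluster has multiple nodes: a pod anti-affinity rule asks the scheduler to prefer placing each data-node pod on a different machine. This is only a minimal sketch; the resource, label, and image names below are assumptions, not the actual Beam configuration.

```yaml
# Sketch only: names and image are hypothetical, not Beam's real manifests.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-datanode
spec:
  serviceName: hdfs-datanode
  replicas: 3
  selector:
    matchLabels:
      app: hdfs-datanode
  template:
    metadata:
      labels:
        app: hdfs-datanode
    spec:
      affinity:
        podAntiAffinity:
          # Prefer scheduling each data-node pod on a different cluster node,
          # so the 3 HDFS data nodes end up on 3 separate machines.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: hdfs-datanode
              topologyKey: kubernetes.io/hostname
      containers:
      - name: datanode
        image: some-hadoop-datanode-image  # placeholder
```

With "preferred" (rather than "required") anti-affinity, the pods would still all schedule on today's single node, so this change would be safe to add before the cluster is actually scaled up.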

WDYT? Are there any objections?

[1] https://issues.apache.org/jira/browse/BEAM-4508