Thank you, everybody, for your feedback! I think we can conclude that the most popular option, according to the discussion above, is number 3. I'm not sure whether we need a separate vote for that, but please let me know if we do.

So, for now, I'd split the work into the following steps:

a) Create a new module "hadoop-mapreduce-format" which implements support for MapReduce OutputFormat through a new HadoopMapreduceFormat.Write class. For that, I just need to slightly change PR 6306, which I created recently (renaming the module and the classes).

b) Move all source and test classes of "hadoop-input-format" into the module "hadoop-mapreduce-format" and create a new class HadoopMapreduceFormat.Read there to support MapReduce InputFormat.

c) Deprecate the old HadoopInputFormat.Read (in the old "hadoop-input-format" module) and turn it into a proxy class for the newly created HadoopMapreduceFormat.Read (to keep API compatibility).

These three steps should be completed within one release cycle (approx. in 2.8). For steps "b" and "c" I'd create another PR, to avoid a huge commit that also includes step "a".

Then, in the next release after that:

d) Completely remove the module "hadoop-input-format" (approx. in 2.9).

The other two Hadoop modules (common and file-system) we leave as they are.

I hope this is a correct summary of what the community decided and that I can move forward. Please let me know if there are any objections to this plan or other suggestions.