Luke, Alex, I have some questions about portable metrics. Can you confirm my understanding?
1 - Since the SDK harness is what runs the UDF code, if a UDF defines a metric, the SDK harness will send updates to the runner through gRPC calls so that the runner can update its metrics cells, right?
2 - Alex, you mentioned in the proto and the design doc that there will be no aggregation of metrics. But some runners (Spark/Flink) rely on accumulators, and when those are merged, it triggers merging along the whole chain down to the metric cells. I know that Dataflow does not work this way: it uses non-aggregated metrics and sends them to an aggregation service. Will portability bring a change of paradigm for runners that do the merging themselves?
There will be local aggregation of metrics scoped to a bundle; once the bundle has finished processing, they are discarded. This will require some kind of global aggregation support from the runner. Whether the runner does that via accumulators or via an aggregation service is up to the runner.
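To make the bundle-scoped aggregation above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `BundleMetrics`, `RunnerAggregator`, and `process_bundle` are hypothetical names, not part of the Beam API or the portability protos, and a plain in-process method call stands in for the gRPC update.

```python
from collections import Counter


class BundleMetrics:
    """Hypothetical SDK-harness-side container; lives only for one bundle."""

    def __init__(self):
        self._counters = Counter()

    def inc(self, name, n=1):
        # Local aggregation: updates are summed within the bundle.
        self._counters[name] += n

    def report_and_discard(self):
        # Snapshot the bundle-local values, then drop them.
        snapshot = dict(self._counters)
        self._counters.clear()
        return snapshot


class RunnerAggregator:
    """Hypothetical runner-side global aggregation (accumulator-like)."""

    def __init__(self):
        self.totals = Counter()

    def merge(self, bundle_update):
        # Counter.update adds counts instead of overwriting them.
        self.totals.update(bundle_update)


def process_bundle(elements, aggregator):
    # Assumed stand-in for the SDK harness running a UDF over one bundle.
    metrics = BundleMetrics()
    for element in elements:
        metrics.inc("elements")
        if element < 0:
            metrics.inc("negatives")
    # In a real setup this hand-off would be a gRPC call to the runner.
    aggregator.merge(metrics.report_and_discard())
```

A runner that uses an aggregation service instead of accumulators would replace `RunnerAggregator.merge` with a call to that service; the bundle-local side would stay the same.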
3 - Please confirm that the distinction between attempted and committed metrics is outside the scope of portable metrics. It does not involve communication between the runner harness and the SDK harness, since it is a runner-only matter. I mean, when a runner commits a bundle, it just updates its committed metrics and does not need to inform the SDK harness. But, of course, when the user requests committed metrics through the SDK, the SDK harness will ask the runner harness to provide them.
You are correct in saying that during execution the SDK does not differentiate between attempted and committed metrics, and only the runner does. We still lack an API definition and contract for how an SDK would query a runner for metrics, but you're right in saying that an SDK could request committed metrics and the runner would supply them somehow.
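The attempted/committed split described above could be sketched on the runner side roughly as follows. This is an assumption-laden illustration: `RunnerMetrics`, `on_bundle_finished`, and `query` are hypothetical names, and since no query API or contract has been defined yet, `query` is only a placeholder for whatever that API ends up being.

```python
from collections import Counter


class RunnerMetrics:
    """Hypothetical runner-side bookkeeping; the SDK harness never sees it."""

    def __init__(self):
        self.attempted = Counter()
        self.committed = Counter()

    def on_bundle_finished(self, bundle_update, committed):
        # Every bundle attempt counts toward attempted metrics,
        # including attempts that fail and are retried.
        self.attempted.update(bundle_update)
        if committed:
            # Only bundles the runner actually commits count here.
            self.committed.update(bundle_update)

    def query(self, name):
        # Placeholder for the not-yet-defined SDK-to-runner query.
        return {
            "attempted": self.attempted[name],
            "committed": self.committed[name],
        }
```

For example, if a bundle reporting `{"elements": 3}` fails once and then succeeds on retry, the runner would hold an attempted count of 6 and a committed count of 3, and the SDK harness would not be involved in either update.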