[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Follow up ideas, to simplify creating MonitoringInfos.

Hi Robert and community, :)

I was starting to code this up, but I wasn't sure exactly how to do some of the proto syntax. Would you mind taking a look at this doc and let me know if you know how to resolve any of these issues:
  • Using a map in an option.
  • Referring to string "constants" from other enum options in .proto files.
Please see the comments I have listed in the doc, and a few alternative suggestions.

On Wed, Oct 24, 2018 at 10:08 AM Alex Amato <ajamato@xxxxxxxxxx> wrote:
Okay. That makes sense. Using runtime validation and protos is what I was thinking as well.
I'll include you as a reviewer in my PRs.

As for the choice of a builder/constructor/factory. If we go with factory methods/constructor then we will need a method for each metric type (intCounter, latestInt64, ...). Plus, then I don't think we can have any constructor parameters for labels, we will just need to accept a dictionary for label keys+values like you say. This is because we cannot offer a method for each URN and its combination of labels, and we should avoid such static detection, as you say.

The builder however, allows us to add a method for setting each label. Since there are a small number of labels I think this should be fine. A specific metric URN will have a specific set of labels which we can validate being set. Any variant of this should use a different label (or a new version in the label). Thus, the builder can give an advantage, making it easier to set label values without needing to provide the correct key for the label, when its set. You just need to call the correct method.

Perhaps it might be best to leave this open to each SDK based on its language style (Builder, Factory, etc.) , but make sure we use the protos+runtime validation.

On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw <robertwb@xxxxxxxxxx> wrote:
Thanks for bringing this to the list; it's a good question.

I think the difficulty comes from trying to statically define a lists
of possibilities that should instead be runtime values. E.g. we
currently we're up to about a dozen distinct types, and having a
setter for each is both verbose and not very extensible (especially
replicated cross language). I'm not sure the set of possible labels is
fixed either. Generally in the FnAPI we've been using shared constants
for this kind of thing instead. (I was wary about the protos for the
same reasons; it would be good to avoid leaking this through to each
of the various SDKs.) The amount of static typing/validation one gets
depends on how much logic you build into each of these methods (e.g.
does it (almost?) always "metric" which is assumed to already be
encoded correctly, or a specific type that has tradeoffs with the
amount you can do generically (e.g. we have code that first loops over
counters, then over distributions, then over gauges, and I don't think
we want continue that pattern all M places for all N types)).

I would probably lean towards mostly doing runtime validation here.
Specifically, one would have a data file that defines, for each known
URN, its type along with the set of permitted/expected/required
labels. On construction a MonitoringInfo data point, one would provide
a URN + value + labelMap, and optionally a type. On construction, the
constructor (method, factory, whatever) would look up the URN to
determine the type (or throw an error if it's both not known and not
provided), which could be then used to fetch an encoder of sorts (that
can go from value <-> proto encoding, possibly with some validation).
If labels and/or annotations are provided and the URN is known, we can
validate these as well.

As for proto/enums vs. yaml, the former is nice because code
generation comes for free, but has turned out to be much more verbose
(both the definition and the use) and I'm still on the fence whether
it's a net win.

(Unfortunately AutoValue won't help much here, as the ultimate goal is
to wrap a very nested proto OneOf, ideally with some validation.
(Also, in Python, generally, having optional, named arguments makes
this kind of builder pattern less needed.))

On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
> FWIW AutoValue will build most of that class for you, if it is as you say.
> Kenn
> On Tue, Oct 23, 2018 at 6:04 PM Alex Amato <ajamato@xxxxxxxxxx> wrote:
>> Hi Robert + beam dev list,
>> I was thinking about your feedback in PR#6205, and agree that this monitoring_infos.py became a bit big.
>> I'm working on the Java Implementation of this now, and want to incorporate some of these ideas and improve on this.
>> I that that I should make something like a MonitoringInfoBuilder class. With a few methods
>> setUrn
>> setTimestamp
>> setting the value (One method for each Type we support. Setting this will also set the type string)
>> setInt64CounterValue
>> setDoubleCounterValue
>> setLatestInt64
>> setTopNInt64
>> setMonitoringDataTable
>> setDistributionInt64
>> ...
>> setting labels (will set the key and value properly for the label)
>> setPTransform(value)
>> setPcollection(value)
>> ...
>> I think this will make building a metric much easier, you would just call the 4 methods and the .build(). These builders are common in Java. (I guess there is a similar thing way we could do in python? I'd like to go back and refactor that as well when I am done)
>> -------------
>> As for your other suggestion to define metrics in the proto/enum file instead of the yaml file. I am not too sure about the best strategy for this. My initial thoughts are:
>> Make a proto extension allowing you to describe/define a MonitoringInfo's (the same info as the metric_definitions.yaml file):
>> URN
>> Type
>> Labels required
>> Annotations: Description, Units, etc.
>> Make the builder read in that MonitoringInfo definision/description assert everything is set properly? I think this would be a decent data driven approach.
>> I was wondering if you had something else in mind?
>> Thanks
>> Alex