osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Follow up ideas, to simplify creating MonitoringInfos.


I was unable to get this to compile and could find no examples of this on the proto github.

Would it be helpful to post the compiler output?

-Max

On 31.10.18 19:19, Lukasz Cwik wrote:
I see and don't know how to help you beyond what your already suggesting. From what I remember, maps were added as syntactic sugar of lists of key value pairs.

On Tue, Oct 30, 2018 at 5:37 PM Alex Amato <ajamato@xxxxxxxxxx <mailto:ajamato@xxxxxxxxxx>> wrote:

    I am not sure on the correct syntax to populate the instances of my
    MonitoringInfoSpec messages

    message MonitoringInfoSpec {

    string urn = 1;

    string type_urn = 2;

    repeated string required_labels = 3;

    *map<string, string> annotations = 4;*

    }


    Notice how the annotations field is not used anywhere. I was unable
    to get this to compile and could find no examples of this on the
    proto github. Perhaps I'll have to reach out to them. I was
    wondering if anyone here was familiar first.


    message MonitoringInfoSpecs {

    enum MonitoringInfoSpecsEnum {

       USER_COUNTER = 0 [(monitoring_info_spec) = {

         urn: "beam:metric:user",

         type_urn: "beam:metrics:sum_int_64",

       }];


       ELEMENT_COUNT = 1 [(monitoring_info_spec) = {

         urn: "beam:metric:element_count:v1",

         type_urn: "beam:metrics:sum_int_64",

         required_labels: ["PTRANSFORM"],

       }];


       START_BUNDLE_MSECS = 2 [(monitoring_info_spec) = {

         urn: "beam:metric:pardo_execution_time:start_bundle_msecs:v1",

         type_urn: "beam:metrics:sum_int_64",

         required_labels: ["PTRANSFORM"],

       }];


       PROCESS_BUNDLE_MSECS = 3 [(monitoring_info_spec) = {

         urn: "beam:metric:pardo_execution_time:process_bundle_msecs:v1",

         type_urn: "beam:metrics:sum_int_64",

         required_labels: ["PTRANSFORM"],

       }];


       FINISH_BUNDLE_MSECS = 4 [(monitoring_info_spec) = {

         urn: "beam:metric:pardo_execution_time:finish_bundle_msecs:v1",

         type_urn: "beam:metrics:sum_int_64",

         required_labels: ["PTRANSFORM"],

       }];


       TOTAL_MSECS = 5 [(monitoring_info_spec) = {

         urn: "beam:metric:ptransform_execution_time:total_msecs:v1",

         type_urn: "beam:metrics:sum_int_64",

         required_labels: ["PTRANSFORM"],

       }];

    }

    }




    On Tue, Oct 30, 2018 at 2:01 PM Lukasz Cwik <lcwik@xxxxxxxxxx
    <mailto:lcwik@xxxxxxxxxx>> wrote:

        I'm not sure what you mean by "Using a map in an option."

        For your second issue, the google docs around this show[1]:

        Note that if you want to use a custom option in a package other
        than the one in which it was defined, you must prefix the option
        name with the package name, just as you would for type names.
        For example:

        // foo.proto
        import "google/protobuf/descriptor.proto";
        package foo;
        extend google.protobuf.MessageOptions {
           optional string my_option = 51234;
        }

        // bar.proto
        import "foo.proto";
        package bar;
        message MyMessage {
           option (foo.my_option) = "Hello world!";
        }


        1:
        https://developers.google.com/protocol-buffers/docs/proto#customoptions


        On Mon, Oct 29, 2018 at 5:19 PM Alex Amato <ajamato@xxxxxxxxxx
        <mailto:ajamato@xxxxxxxxxx>> wrote:

            Hi Robert and community, :)

            I was starting to code this up, but I wasn't sure exactly
            how to do some of the proto syntax. Would you mind taking a
            look at this doc
            <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>
            and let me know if you know how to resolve any of these issues:

              * Using a map in an option.
              * Referring to string "constants" from other enum options
                in .proto files.

            Please see the comments I have listed in the doc
            <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>,
            and a few alternative suggestions.
            Thanks

            On Wed, Oct 24, 2018 at 10:08 AM Alex Amato
            <ajamato@xxxxxxxxxx <mailto:ajamato@xxxxxxxxxx>> wrote:

                Okay. That makes sense. Using runtime validation and
                protos is what I was thinking as well.
                I'll include you as a reviewer in my PRs.

                As for the choice of a builder/constructor/factory. If
                we go with factory methods/constructor then we will need
                a method for each metric type (intCounter, latestInt64,
                ...). Plus, then I don't think we can have any
                constructor parameters for labels, we will just need to
                accept a dictionary for label keys+values like you say.
                This is because we cannot offer a method for each URN
                and its combination of labels, and we should avoid such
                static detection, as you say.

                The builder however, allows us to add a method for
                setting each label. Since there are a small number of
                labels I think this should be fine. A specific metric
                URN will have a specific set of labels which we can
                validate being set. Any variant of this should use a
                different label (or a new version in the label). Thus,
                the builder can give an advantage, making it easier to
                set label values without needing to provide the correct
                key for the label, when its set. You just need to call
                the correct method.

                Perhaps it might be best to leave this open to each SDK
                based on its language style (Builder, Factory, etc.) ,
                but make sure we use the protos+runtime validation.

                On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw
                <robertwb@xxxxxxxxxx <mailto:robertwb@xxxxxxxxxx>> wrote:

                    Thanks for bringing this to the list; it's a good
                    question.

                    I think the difficulty comes from trying to
                    statically define a lists
                    of possibilities that should instead be runtime
                    values. E.g. we
                    currently we're up to about a dozen distinct types,
                    and having a
                    setter for each is both verbose and not very
                    extensible (especially
                    replicated cross language). I'm not sure the set of
                    possible labels is
                    fixed either. Generally in the FnAPI we've been
                    using shared constants
                    for this kind of thing instead. (I was wary about
                    the protos for the
                    same reasons; it would be good to avoid leaking this
                    through to each
                    of the various SDKs.) The amount of static
                    typing/validation one gets
                    depends on how much logic you build into each of
                    these methods (e.g.
                    does it (almost?) always "metric" which is assumed
                    to already be
                    encoded correctly, or a specific type that has
                    tradeoffs with the
                    amount you can do generically (e.g. we have code
                    that first loops over
                    counters, then over distributions, then over gauges,
                    and I don't think
                    we want continue that pattern all M places for all N
                    types)).

                    I would probably lean towards mostly doing runtime
                    validation here.
                    Specifically, one would have a data file that
                    defines, for each known
                    URN, its type along with the set of
                    permitted/expected/required
                    labels. On construction a MonitoringInfo data point,
                    one would provide
                    a URN + value + labelMap, and optionally a type. On
                    construction, the
                    constructor (method, factory, whatever) would look
                    up the URN to
                    determine the type (or throw an error if it's both
                    not known and not
                    provided), which could be then used to fetch an
                    encoder of sorts (that
                    can go from value <-> proto encoding, possibly with
                    some validation).
                    If labels and/or annotations are provided and the
                    URN is known, we can
                    validate these as well.

                    As for proto/enums vs. yaml, the former is nice
                    because code
                    generation comes for free, but has turned out to be
                    much more verbose
                    (both the definition and the use) and I'm still on
                    the fence whether
                    it's a net win.

                    (Unfortunately AutoValue won't help much here, as
                    the ultimate goal is
                    to wrap a very nested proto OneOf, ideally with some
                    validation.
                    (Also, in Python, generally, having optional, named
                    arguments makes
                    this kind of builder pattern less needed.))

                    On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles
                    <kenn@xxxxxxxxxx <mailto:kenn@xxxxxxxxxx>> wrote:
                     >
                     > FWIW AutoValue will build most of that class for
                    you, if it is as you say.
                     >
                     > Kenn
                     >
                     > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato
                    <ajamato@xxxxxxxxxx <mailto:ajamato@xxxxxxxxxx>> wrote:
                     >>
                     >> Hi Robert + beam dev list,
                     >>
                     >> I was thinking about your feedback in PR#6205,
                    and agree that this monitoring_infos.py became a bit
                    big.
                     >>
                     >> I'm working on the Java Implementation of this
                    now, and want to incorporate some of these ideas and
                    improve on this.
                     >>
                     >> I that that I should make something like a
                    MonitoringInfoBuilder class. With a few methods
                     >>
                     >> setUrn
                     >> setTimestamp
                     >> setting the value (One method for each Type we
                    support. Setting this will also set the type string)
                     >>
                     >> setInt64CounterValue
                     >> setDoubleCounterValue
                     >> setLatestInt64
                     >> setTopNInt64
                     >> setMonitoringDataTable
                     >> setDistributionInt64
                     >> ...
                     >>
                     >> setting labels (will set the key and value
                    properly for the label)
                     >>
                     >> setPTransform(value)
                     >> setPcollection(value)
                     >> ...
                     >>
                     >>
                     >> I think this will make building a metric much
                    easier, you would just call the 4 methods and the
                    .build(). These builders are common in Java. (I
                    guess there is a similar thing way we could do in
                    python? I'd like to go back and refactor that as
                    well when I am done)
                     >>
                     >> -------------
                     >>
                     >> As for your other suggestion to define metrics
                    in the proto/enum file instead of the yaml file. I
                    am not too sure about the best strategy for this. My
                    initial thoughts are:
                     >>
                     >> Make a proto extension allowing you to
                    describe/define a MonitoringInfo's (the same info as
                    the metric_definitions.yaml file):
                     >>
                     >> URN
                     >> Type
                     >> Labels required
                     >> Annotations: Description, Units, etc.
                     >>
                     >> Make the builder read in that MonitoringInfo
                    definision/description assert everything is set
                    properly? I think this would be a decent data driven
                    approach.
                     >>
                     >> I was wondering if you had something else in mind?
                     >>
                     >> Thanks
                     >> Alex
                     >>
                     >>