Re: [DISCUSS] More precision supported by DATETIME field in Schema
Robert - unfortunately I think changing Beam's element timestamps is not backwards compatible, and will have to wait till Beam 3.0.
+1 to offering more granular timestamps in general. I think it will be
odd if setting the element timestamp from a row DATETIME field is
lossy, so we should seriously consider upgrading that as well.
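To make the lossiness concrete, here is a minimal sketch using only java.time. It is not Beam code: the class and method names are hypothetical, and it assumes the element timestamp keeps only millisecond precision, as Joda-based timestamps do.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical demo class; not part of Beam.
public class TimestampPrecisionDemo {

    /** Truncates to millisecond precision, as a millisecond-based timestamp would. */
    public static Instant toMillisPrecision(Instant fine) {
        return fine.truncatedTo(ChronoUnit.MILLIS);
    }

    /** Returns the sub-millisecond nanoseconds dropped by the truncation. */
    public static long lostNanos(Instant fine) {
        return ChronoUnit.NANOS.between(toMillisPrecision(fine), fine);
    }

    public static void main(String[] args) {
        // A value with nanosecond precision, as a finer-grained DATETIME field might hold.
        Instant fine = Instant.parse("2018-11-06T06:42:00.123456789Z");
        System.out.println("original:  " + fine);
        System.out.println("truncated: " + toMillisPrecision(fine));
        System.out.println("lost ns:   " + lostNanos(fine));
    }
}
```

Any value that differs only below the millisecond collapses to the same element timestamp, which is why upgrading the field alone would still leave this conversion lossy.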
On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <ccy@xxxxxxxxxx> wrote:
> One related issue that came up before is that we (perhaps unnecessarily) restrict the precision of timestamps in the Python SDK to milliseconds because of legacy reasons related to the Java runner's use of Joda time. Perhaps Beam portability should natively use a more granular timestamp unit.
> On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruwang@xxxxxxxxxx> wrote:
>> Thanks Reuven!
>> I think Reuven is proposing a third option:
>> Change the internal representation of the DATETIME field in Row, but keep the public ReadableDateTime getDateTime(String fieldName) API so that existing code stays compatible. I think we could also add one more API, getDataTimeNanosecond. This differs from option one because option one actually maintains two implementations of time.
>> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <relax@xxxxxxxxxx> wrote:
>>> I would vote that we change the internal representation of Row to something other than Joda. The Java 8 time classes would give us at least microseconds, and if we want nanoseconds we could simply store the value as a number.
>>> We should still keep accessor methods that return and take Joda objects, as the rest of Beam still depends on Joda.
>>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruwang@xxxxxxxxxx> wrote:
>>>> Hi Community,
>>>> The DATETIME field in Beam Schema/Row is implemented with Joda's DateTime (see Row.java#L611 and Row.java#L169). Joda's DateTime is limited to millisecond precision. That is precise enough to represent event-time timestamps, but not enough for real "time" data. For "time"-typed data, we probably need to support precision up to nanoseconds.
>>>> Unfortunately, Joda decided to stay at millisecond precision: https://github.com/JodaOrg/joda-time/issues/139.
>>>> If we want to support nanosecond precision, we have two options:
>>>> Option one: use the existing metadata field on FieldType. We could set a value in the metadata, and Row could check it to decide what is stored in the DATETIME field: Joda's DateTime or an implementation that supports nanoseconds.
>>>> Option two: add another field type (maybe called TIMESTAMP?) backed by an implementation that supports higher-precision time.
>>>> What do you think about the need for higher precision in time types, and which option do you prefer?