


Re: [DISCUSS] More precision supported by DATETIME field in Schema


Thanks Reuven!

I think Reuven is proposing a third option:

Change the internal representation of the DATETIME field in Row, but keep the public ReadableDateTime getDateTime(String fieldName) API so that existing code stays compatible. I think we could also add one more API, something like getDateTimeNanosecond. This differs from option one because option one actually maintains two implementations of time.
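To make the third option concrete, here is a minimal sketch, not Beam's actual code. It assumes the value is stored as a java.time.Instant; the class DateTimeValue and both accessor names are hypothetical. The legacy accessor returns plain epoch milliseconds to keep the sketch free of dependencies, whereas the real accessor would wrap that value in a Joda object.

```java
import java.time.Instant;

// Hypothetical stand-in for a DATETIME value held inside a Row; not Beam's actual class.
final class DateTimeValue {
    // Internal representation: java.time.Instant, which carries nanosecond precision.
    private final Instant value;

    DateTimeValue(Instant value) {
        this.value = value;
    }

    // Legacy-style accessor: the millisecond view, which is all a Joda object can hold.
    // Assumption: the real method would wrap this long in a Joda type.
    long getDateTimeMillis() {
        return value.toEpochMilli();
    }

    // Hypothetical new accessor: nanoseconds since the epoch as a single long.
    // The exact-arithmetic helpers throw ArithmeticException instead of silently overflowing.
    long getDateTimeNanos() {
        long secondsAsNanos = Math.multiplyExact(value.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, (long) value.getNano());
    }
}
```

Existing callers keep seeing millisecond values, while new callers can ask for the full precision of the same stored value, so only one implementation of time has to be maintained.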

-Rui

On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <relax@xxxxxxxxxx> wrote:
I would vote that we change the internal representation of Row to something other than Joda. Java 8's java.time classes would give us at least microseconds, and if we want nanoseconds we could simply store it as a number.

We should still keep accessor methods that return and take Joda objects, as the rest of Beam still depends on Joda.

Reuven

On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruwang@xxxxxxxxxx> wrote:
Hi Community,

The DATETIME field in Beam Schema/Row is implemented with Joda's DateTime (see Row.java#L611 and Row.java#L169). Joda's DateTime is limited to millisecond precision. That is precise enough to represent event-time timestamps, but it is not enough for real "time" data. For "time"-typed data, we probably need to support precision up to nanoseconds.
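As a rough illustration of what millisecond precision discards, the following sketch uses only java.time (not Joda) to truncate a nanosecond-precision instant to milliseconds and measure what is lost. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper showing the information lost when a timestamp is kept at millisecond precision.
final class PrecisionLoss {
    // Drops everything below one millisecond, as a millisecond-only type has to.
    static Instant toMillisPrecision(Instant t) {
        return t.truncatedTo(ChronoUnit.MILLIS);
    }

    // Number of nanoseconds that do not survive the truncation (0 to 999,999).
    static long lostNanos(Instant t) {
        return ChronoUnit.NANOS.between(toMillisPrecision(t), t);
    }
}
```

For an instant with a fractional second of 0.123456789, the millisecond view keeps 0.123 and the remaining 456,789 nanoseconds are dropped.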

Unfortunately, Joda decided to stay at millisecond precision: https://github.com/JodaOrg/joda-time/issues/139.

If we want to support nanosecond precision, we have two options:

Option one: use the existing metadata field of FieldType, so that we could set something in the metadata and Row could check it to decide what is stored in the DATETIME field: Joda's DateTime or an implementation that supports nanoseconds.

Option two: add another field type (maybe called TIMESTAMP?) backed by an implementation that supports higher time precision.
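The two options could be sketched roughly as follows. This is a simplified model, not Beam's FieldType: the class FieldTypeSketch, the string-keyed metadata map, the "precision" key, and the TIMESTAMP type name are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical, simplified model of a schema field type; not Beam's FieldType.
final class FieldTypeSketch {
    // TIMESTAMP would exist only under option two.
    enum TypeName { DATETIME, TIMESTAMP }

    enum Precision { MILLIS, NANOS }

    // Hypothetical metadata key used by option one.
    static final String PRECISION_KEY = "precision";

    final TypeName typeName;
    final Map<String, String> metadata;

    FieldTypeSketch(TypeName typeName, Map<String, String> metadata) {
        this.typeName = typeName;
        this.metadata = metadata;
    }

    // Option one: one DATETIME type; the metadata decides which implementation backs it.
    static Precision precisionOptionOne(FieldTypeSketch type) {
        return "nanos".equals(type.metadata.get(PRECISION_KEY)) ? Precision.NANOS : Precision.MILLIS;
    }

    // Option two: the type name alone decides; DATETIME stays at milliseconds.
    static Precision precisionOptionTwo(FieldTypeSketch type) {
        return type.typeName == TypeName.TIMESTAMP ? Precision.NANOS : Precision.MILLIS;
    }
}
```

Under option one every reader of a DATETIME field has to consult the metadata before interpreting the value, while under option two the choice is visible in the schema itself at the cost of a second time-like type.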

What do you think about the need for higher precision for time types, and which option do you prefer?

-Rui