[GitHub] stevedlawrence opened a new pull request #77: Modifications to IO layer to support streaming input data
- Modify the ByteBufferDataInputStream to no longer depend on
ByteBuffers. It is now an InputSourceDataInputStream, and one can
implement a new InputSource interface to provide Bytes. This also
includes various changes like how bitPos and bitLimit are stored to
simplify code (e.g. no offsets) and no longer requires a bitLimit to
be set, since not all inputs may know a bit limit. Moves TLState
members into the PState--the InputSourceDataInputStream class is now
created by the user and so isn't necessarily created in the thread from
which it will be used, breaking the ThreadLocal functionality.
- Create two InputSource implementations, one using a ByteBuffer as the
data store and one using a bucketing algorithm to support files larger
than Int.MaxValue bytes and that can free up data that can no longer be
- Remove the DataLimits, most of these values weren't actually used.
Instead, create new tunables for those that were and use those where
- When we decode data to characters, we need to know exactly how many
bits were used to decode each character. The Java decoders do not
provide this information, requiring a lot of complex code to keep
track, and even then there were bugs. This creates our own Decoders
that provide the exact information we need and allow for further
modifications that may be needed for things like
dfdl:errorEncodingPolicies and dfdl:utf16Width.
- Remove the reporting and replacing decoders. Instead, we now just have
a single decoder and it handles replacing/reporting based on the
format info. Another benefit of our custom Decoders.
- Modifies the parse Scala/Java API to expect an
InputSourceDataInputStream, created by the API user. Other public API
methods are deprecated and are modified to
create an InputSourceDataInputStream behind the scenes. API functions
are also simplified to not take in a bit starting position or bit
limit. These were really only used for testing, and were used fairly
rarely. Alternative methods are used to set these values where
- Adds the --stream option to the CLI parse subcommand. When this is
provided, if there is left over data at the end of a parse, the CLI
will perform a new parse continuing where the previous left off.
- Modify the TDMLRunner to be based on java.io.InputStreams rather than
Channels. It was already using streams for everything and then just
wrapping with a channel without providing any actual benefit.
- Fix issue where isAtEnd sometimes does not return a correct value. It
now queries the underlying data stream to determine if there is more
data or not, rather than relying on bitLimit which might not always be
DAFFODIL-934, DAFFODIL-931, DAFFODIL-1065, DAFFODIL-1565
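
For readers unfamiliar with the bucketing idea mentioned above, here is a minimal sketch of how such an input source might work. It is not the code in this pull request: the class name BucketingSource and the methods byteAt, hasData, releaseBefore and bucketCount are hypothetical names chosen for illustration, and the real InputSource interface may look quite different. The sketch assumes data is pulled lazily from an InputStream into fixed-size buckets keyed by a long index, so positions can exceed Int.MaxValue and old buckets can be dropped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.TreeMap;

// Hypothetical sketch of a bucketing input source; names are illustrative
// assumptions and do not reflect the actual API added by this pull request.
final class BucketingSource {
    private final InputStream in;
    private final int bucketSize;
    // Bucket index -> bucket contents. Long keys allow absolute byte
    // positions beyond Int.MaxValue.
    private final TreeMap<Long, byte[]> buckets = new TreeMap<>();
    private long bytesRead = 0;   // total bytes pulled from the stream so far
    private boolean eof = false;  // set once the stream reports end of data

    BucketingSource(InputStream in, int bucketSize) {
        this.in = in;
        this.bucketSize = bucketSize;
    }

    // Reads one more bucket (possibly partial at end of stream).
    private void fillNextBucket() {
        byte[] b = new byte[bucketSize];
        int n = 0;
        try {
            while (n < bucketSize) {
                int r = in.read(b, n, bucketSize - n);
                if (r < 0) { eof = true; break; }
                n += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // All earlier buckets are full, so bytesRead is a multiple of bucketSize.
        if (n > 0) buckets.put(bytesRead / bucketSize, b);
        bytesRead += n;
    }

    // True if a byte exists at pos. This asks the underlying stream for more
    // data when needed instead of relying on a limit known in advance.
    boolean hasData(long pos) {
        while (pos >= bytesRead && !eof) fillNextBucket();
        return pos < bytesRead;
    }

    // Returns the byte (0..255) at absolute position pos, or -1 at end of data.
    int byteAt(long pos) {
        if (!hasData(pos)) return -1;
        byte[] b = buckets.get(pos / bucketSize);
        if (b == null) {
            throw new IllegalStateException("position " + pos + " was already released");
        }
        return b[(int) (pos % bucketSize)] & 0xFF;
    }

    // Drops every bucket that lies entirely before pos so its memory can be reclaimed.
    void releaseBefore(long pos) {
        buckets.headMap(pos / bucketSize).clear();
    }

    // Number of buckets currently held in memory.
    int bucketCount() {
        return buckets.size();
    }
}
```

In this sketch a caller would invoke releaseBefore once it knows it will never return to earlier data, which keeps memory use bounded on long-running streams. Wrapping IOException in UncheckedIOException is only a convenience for the example.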