
[GitHub] stevedlawrence opened a new pull request #77: Modifications to IO layer to support streaming input data

URL: https://github.com/apache/incubator-daffodil/pull/77
   - Modify the ByteBufferDataInputStream to no longer depend on
     ByteBuffers. It is now an InputSourceDataInputStream, and one can
     implement a new InputSource interface to provide Bytes. This also
     includes various changes like how bitPos and bitLimit are stored to
     simplify code (e.g. no offsets) and no longer requires a bitLimit to
     be set, since not all inputs may know a bit limit. Moves TLState
     members into the PState, because the InputSourceDataInputStream is
     now created by the user and so isn't necessarily created in the
     thread from which it will be used, which breaks the ThreadLocal
     functionality.
   - Create two InputSource implementations: one uses a ByteBuffer as the
     data store, and the other uses a bucketing algorithm that supports
     files larger than Int.MaxValue and can free data that can no longer
     be backtracked to.
   - Remove the DataLimits; most of these values weren't actually used.
     Instead, create new tunables for those that were and use those where
   - When we decode data to characters, we need to know exactly how many
     bits were used to decode each character. The Java decoders do not
     provide this information, requiring a lot of complex code to keep
     track, and even then there were bugs. This creates our own Decoders
     that provide the exact information we need and allow for further
     modifications that may be needed for things like
     dfdl:encodingErrorPolicy and dfdl:utf16Width.
   - Remove the reporting and replacing decoders. Instead, we now just have
     a single decoder and it handles replacing/reporting based on the
     format info. Another benefit of our custom Decoders.
   - Modifies the parse Scala/Java API to expect an
     InputSourceDataInputStream, created by the API user. Other public API
     methods are deprecated and are modified to
     create an InputSourceDataInputStream behind the scenes. API functions
     are also simplified to not take in a bit starting position or bit
     limit. These were really only used for testing, and were used fairly
     rarely. Alternative methods are used to set these values where
   - Adds the --stream option to the CLI parse subcommand. When this is
     provided, if there is left over data at the end of a parse, the CLI
     will perform a new parse continuing where the previous left off.
   - Modify the TDMLRunner to be based on java.io.InputStreams rather than
     Channels. It was already using streams for everything and then just
     wrapping them with a channel without providing any actual benefit.
   - Fix issue where isAtEnd sometimes does not return a correct value. It
     now queries the underlying data stream to determine if there is more
     data or not, rather than relying on bitLimit which might not always be
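
To illustrate the bucketing idea and the end-of-data check described in the
list above, here is a minimal sketch in Java. It is not the code from this
pull request: the class name BucketingSource, its methods (hasByteAt, get,
releaseBefore, retainedBuckets), and the bucket size parameter are all
hypothetical names chosen for this example. It assumes the input arrives as
a java.io.InputStream, addresses bytes with a long so positions can exceed
the int range, and asks the underlying stream whether more data exists
instead of relying on a preset limit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not Daffodil's actual InputSource API.
final class BucketingSource {
    private final InputStream in;
    private final int bucketSize;
    // Retained buckets, keyed by bucket index (byte position / bucketSize).
    private final Map<Long, byte[]> buckets = new HashMap<>();
    private long bytesFilled = 0;   // total bytes read from the stream so far
    private boolean eof = false;    // true once the stream reported end of data

    BucketingSource(InputStream in, int bucketSize) {
        this.in = in;
        this.bucketSize = bucketSize;
    }

    // Reads from the stream until byte `pos` is buffered or the stream ends.
    private void fillTo(long pos) {
        try {
            while (!eof && bytesFilled <= pos) {
                long index = bytesFilled / bucketSize;
                int offset = (int) (bytesFilled % bucketSize);
                byte[] bucket = buckets.computeIfAbsent(index, k -> new byte[bucketSize]);
                int n = in.read(bucket, offset, bucketSize - offset);
                if (n < 0) eof = true; else bytesFilled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Queries the underlying stream to decide whether byte `pos` exists.
    boolean hasByteAt(long pos) {
        fillTo(pos);
        return pos < bytesFilled;
    }

    // Returns the unsigned byte at `pos`, or -1 if the data ends before it.
    int get(long pos) {
        if (!hasByteAt(pos)) return -1;
        byte[] bucket = buckets.get(pos / bucketSize);
        if (bucket == null) {
            throw new IllegalStateException("byte " + pos + " was already released");
        }
        return bucket[(int) (pos % bucketSize)] & 0xFF;
    }

    // Frees every bucket that lies wholly before `pos`, for use once the
    // caller knows it can no longer backtrack to earlier positions.
    void releaseBefore(long pos) {
        long keep = Math.min(pos, bytesFilled) / bucketSize;
        buckets.keySet().removeIf(index -> index < keep);
    }

    int retainedBuckets() {
        return buckets.size();
    }
}
```

In this sketch a caller would invoke releaseBefore once no backtracking point
remains before a given position, so memory use is bounded by the distance
back to the oldest such point rather than by the total input size.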
