multiple JSON documents in one file, change proposal

On Sat, Dec 1, 2018 at 10:16 PM Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
> > On Sat, Dec 1, 2018 at 9:16 PM Marko Rauhamaa <marko at pacujo.net> wrote:
> >> The need for the format to be "typable" (and editable) is essential
> >> for ad-hoc manual testing of components. That precludes all framing
> >> formats that would necessitate a length prefix. HTTP would be
> >> horrible to have to type even without the content-length problem, but
> >> BEEP (RFC 3080) would suffer from the content-length (and CRLF!)
> >> issue as well.
> >
> > I dunno, I type HTTP manually often enough that it can't be all *that*
> > horrible.
> Say I want to send this piece of JSON:
>    {
>        "msgtype": "echo-req",
>        "opid": 3487547843
>    }
> and the framing format is HTTP. I will need to type something like this:
>    POST / HTTP/1.1^M
>    Host: localhost^M
>    Content-type: application/json^M
>    Content-length: 54^M
>    ^M
>    {
>        "msgtype": "echo-req",
>        "opid": 3487547843
>    }
> That's almost impossible to type without a syntax error.

1) Set your Enter key to send CR-LF, at least for this operation.
That's half your problem solved.
2) Send the request like this:

Content-type: application/json

{"msgtype": "echo-req", "opid": 3487547843}

Then shut down your end of the connection, probably with Ctrl-D. I'm
fairly sure I can type that without bugs, and any compliant HTTP
server should be fine with it.

> >> Finally, couldn't any whitespace character work as a terminator? Yes,
> >> it could, but it would force you to use a special JSON parser that is
> >> prepared to handle the self-delineation. A NUL gives you many more
> >> degrees of freedom in choosing your JSON tools.
> >
> > Either non-delimited or newline-delimited JSON is supported in a lot
> > of tools. I'm quite at a loss here as to how an unprintable character
> > gives you more freedom.
> As stated by Paul in another context, newline-delimited is a no-go
> because it forces you to restrict JSON to a subset that doesn't contain
> newlines (see the JSON example above).
> Of course, you could say that the terminating newline is only
> interpreted as a terminator after a complete JSON value, but that's not
> the format "supported in a lot of tools".

The subset in question is simply "JSON without any newlines between
tokens", which has exactly the same meaning as it would have *with*
those newlines. So all you lose is the human-readability of being able
to break an object over multiple lines. Is that a problem? Use
non-delimited instead.
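
As a small illustration of that point (a sketch only, using the example
message from earlier in this thread), Python's standard json module
emits a value on a single line by default, and the result parses back
to the same value as the multi-line form:

```python
import json

# The multi-line form of the example message from this thread.
pretty = """{
    "msgtype": "echo-req",
    "opid": 3487547843
}"""

value = json.loads(pretty)

# json.dumps() without an indent argument never emits a raw newline;
# newlines inside strings come out escaped as \n.
one_line = json.dumps(value)
print(one_line)  # {"msgtype": "echo-req", "opid": 3487547843}

# Same value either way; only the layout differs.
assert "\n" not in one_line
assert json.loads(one_line) == value
```

A literal newline can only ever appear between tokens, because JSON
requires control characters inside strings to be escaped.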

> If you use any legal JSON character as a terminator, you have to make it
> contextual or add an escaping syntax.

Or just use non-delimited, strip all whitespace between objects, and
then special-case the one otherwise-ambiguous situation of two Numbers
back to back. Anything that sends newline-delimited JSON will work
with that.
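
A minimal sketch of that approach in Python, assuming the whole input
fits in one string. It relies on json.JSONDecoder.raw_decode(), which
parses one value starting at a given index and returns the index where
it stopped. The name iter_json_values is made up for illustration; this
is not the example code referred to elsewhere in the thread.

```python
import json

def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # Skip whitespace (including newlines) between values;
        # raw_decode() does not skip leading whitespace itself.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            return
        # Parse exactly one value and remember where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value

stream = '{"msgtype": "echo-req",\n "opid": 3487547843}\n[1, 2]\n3\n4\n'
for doc in iter_json_values(stream):
    print(doc)
```

Newline-delimited input works unchanged, and so do objects broken over
several lines. Two Numbers with nothing between them (3 then 4 sent as
34) cannot be told apart from a single Number, which is the one
ambiguous case; a sender that puts a newline after each value never
hits it.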

> > I get it: you have a bizarre set of tools and the normal solutions
> > don't work for you. But you can't complain about the tools not
> > supporting your use-cases. Just code up your own styles of doing
> > things that are unique to you.
> There are numerous tools that parse complete JSON documents fine.
> Framing JSON values with NUL-termination is trivial to add in any
> programming environment. For example:
>    import json
>
>    def json_docs(path):
>        with open(path) as f:
>            for doc in f.read().split("\0")[:-1]:
>                yield json.loads(doc)

Yes, but many text-processing tools don't let you manually insert
NULs. Of *course* you can put anything you like in there when you
control both ends and everything in between; that's kinda the point of
coding. But I'm going to use newlines and parse as non-delimited,
since that will behave as text in most applications. It can be done
just as easily: my example code earlier could be converted into the
same style of generator as you have here, and would be about as many
lines.