Fwd: cjson.erl
db.couchdb.devel
And another round:
Begin forwarded message:
From: "Joe Armstrong"
Date: July 7, 2008 9:16:50 PM GMT+02:00
To: "Jan Lehnardt" <jan-1oDqGaOF3Lkdnm+yROfE0A@xxxxxxxxxxxxxxxx>
Cc: "Bob Ippolito"
Subject: Re: cjson.erl
Re Damien's comments on binaries: they are not *that* ugly, just <<"abc">> instead of "abc". They were ugly a couple of years ago, before we changed the format.
The important thing to note is that *long* strings should almost always be represented as binaries.
What is a long string? It depends; I guess anything more than 50 bytes should be a binary. Storing (say) XML as a {Tag, [Attrs], Data} tree, where Data is a string or binary, has great implications for performance. Basically it doesn't matter how you represent Tag and Attrs, but Data should be a binary and NOT a string.
To be on the safe side I'd choose binaries.
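A minimal sketch of the size argument, with a hypothetical module name: it measures the same text held as a list string and as a binary using erts_debug:flat_size/1, an undocumented debugging BIF that returns a term's heap size in machine words. The exact numbers depend on word size and OTP version, so only the comparison matters. Binaries of up to 64 bytes live on the process heap and are counted in full; larger ones are stored off-heap, and flat_size/1 then sees only the small on-heap reference.

```erlang
-module(string_size_sketch).
-export([compare/1]).

%% Return {ListWords, BinaryWords}: the heap size, in machine words,
%% of Text as a list string and of the same bytes as a binary.
%% Assumes Text contains only Latin-1 characters (0..255), which is
%% what list_to_binary/1 accepts.
compare(Text) when is_list(Text) ->
    Bin = list_to_binary(Text),
    %% erts_debug:flat_size/1 is an undocumented debugging BIF.
    {erts_debug:flat_size(Text), erts_debug:flat_size(Bin)}.
```

For example, compare(lists:duplicate(50, $x)) should report a list footprint several times larger than the binary one, because each character of a list string occupies its own two-word cons cell.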
Use of atoms should be discouraged, since you don't want to stress the atom table (which is not garbage collected).
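To illustrate the atom-table concern, here is a sketch (module and function names are hypothetical) contrasting a decoder that turns every object key into an atom with one that only reuses atoms that already exist. It relies on binary_to_atom/2 and binary_to_existing_atom/2 as found in current OTP releases.

```erlang
-module(json_key_sketch).
-export([unsafe_key/1, safe_key/1]).

%% Risky: every distinct key seen on the wire creates a new atom, and
%% atoms are never garbage collected, so hostile or merely large input
%% can eventually fill the atom table and bring down the node.
unsafe_key(Key) when is_binary(Key) ->
    binary_to_atom(Key, utf8).

%% Safer: reuse an atom only if it already exists; otherwise keep the
%% key as a binary. binary_to_existing_atom/2 raises badarg for names
%% that are not atoms yet, so the atom table never grows here.
safe_key(Key) when is_binary(Key) ->
    try
        binary_to_existing_atom(Key, utf8)
    catch
        error:badarg -> Key
    end.
```

Calling safe_key(<<"ok">>) returns the atom ok, since that atom already exists, while an unfamiliar key comes back unchanged as a binary.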
As it stands, the CouchDB distribution has three different modules for JSON terms, which might confuse the unwary ...
Cheers
/Joe
On Mon, Jul 7, 2008 at 8:05 PM, Jan Lehnardt <jan-1oDqGaOF3Lkdnm+yROfE0A@xxxxxxxxxxxxxxxx> wrote:
Hello Bob & Joe,
here's Damien's take on the JSON issue.
Cheers
Jan
--
Begin forwarded message:
From: Damien Katz <damienkatz-Re5JQEeQqe8AvxtiuMwx3w@xxxxxxxxxxxxxxxx>
Date: July 7, 2008 7:25:28 PM GMT+02:00
To: couchdb-dev-d1GL8uUpDdXTxqt0kkDzDmD2FQJk+8+b@xxxxxxxxxxxxxxxx
Subject: Re: cjson.erl
Reply-To: couchdb-dev-d1GL8uUpDdXTxqt0kkDzDmD2FQJk+8+b@xxxxxxxxxxxxxxxx
So, the sad history of cjson.erl: I started with the Erlang JSON library I found on the json.org website (which now appears to be a dead link), and used that for a while. For reasons I cannot remember (bugs or performance), I switched to using the mochiweb JSON library. However, it used slightly different conventions for using Erlang terms to represent JSON. For one thing, objects were {struct, [...]}, while the json.org library used {obj, [...]}. I think there was one other thing, but I can't remember now.
Anyway, rather than change all my code to use the new convention, I changed the mochi library to use the json.org conventions and renamed it cjson.erl (for reasons I again cannot remember). Some of the comments in the library are likely wrong because of this.
Now, switching libraries is easy, but switching the Erlang representation of JSON objects is not. However, I'd be glad to switch CouchDB over to a different Erlang representation of JSON if there is a "blessed" Erlang format. Otherwise, I'll need practical reasons for doing so. Performance is one such reason.
One thing I'm not happy about is the idea of representing strings using binaries. From a code aesthetics point of view it uglifies the source dramatically, and I think it might also cause lots of extra conversions between binary strings and the normal list strings used in most Erlang libraries and APIs. The memory and performance improvements will have to be big to make up for the extra complexity in the source.
-Damien
On Jul 7, 2008, at 12:24 PM, Jan Lehnardt wrote:
Heya,
Joe Armstrong is trying to get the Erlang community to agree
on a single JSON library that fits everybody's needs. The
biggest players here (according to Joe, I guess) are
MochiMedia and ourselves.
Hence the dialogue I quote below:
Begin forwarded message:
From: "Joe Armstrong"
Date: July 7, 2008 10:51:07 AM GMT+02:00
To: "Jan Lehnardt" <jan-1oDqGaOF3Lkdnm+yROfE0A@xxxxxxxxxxxxxxxx>
Cc: "Bob Ippolito"
Subject: cjson.erl
Hi Jan,
[CC'd to Bob Ippolito (Glad to see the facebook stuff taking off -
great work :-)) ]
I've been staring at cjson.erl ...
The comments say it's derived from mochijson.erl.
In mochiweb there are two JSON representations:
mochijson2.erl and mochijson.erl.
I think the "2" is the better one :-)
I think it would be a good idea if you could come to some agreement with the mochiweb people as to the best representation of JSON terms in Erlang, and both go out with a single library.
cjson.erl lacks a type declaration in the documentation, which it needs (reading the code is hopeless).
mochijson2.erl has this type declaration:
%% @type json_string() = atom | binary()
%% @type json_number() = integer() | float()
%% @type json_array() = [json_term()]
%% @type json_object() = {struct, [{json_string(), json_term()}]}
%% @type json_term() = json_string() | json_number() | json_array() |
%%                     json_object()
I'm not sure about the additional "struct" tag, nor the additional atom tag in json_string.
How about ...
@type json_object() = {[{json_tag::binary(), json_term()}]}
@type json_string() = binary()
This makes the Erlang term map to JSON in an unambiguous manner, and the compiler should be able to generate faster code, since
unpack(Json) when is_binary(Json) -> ...
will only have disjoint branches.
I think that:
lists should *only* be used for json_arrays
binaries should *only* be used for json_strings
json objs should *only* be tuples (of pairs)
{{Tag,Val},{Tag,Val},...}
(possibly {Tag1,Val1,Tag2,Val2,...} might be better???)
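The rules above can be sketched as a classifier plus a validator (module and function names are hypothetical). The sketch assumes the {[{Key, Value}]} object form from Joe's proposed type, binary keys, and, following mochijson2's convention, the atoms true, false and null for JSON literals; the thread does not settle how literals are represented. Because every JSON kind maps to a distinct Erlang type, no two clauses of kind/1 can match the same term.

```erlang
-module(json_shape_sketch).
-export([kind/1, valid/1]).

%% Classify one decoded JSON term. Each JSON kind has its own Erlang
%% type, so the clauses are disjoint; anything else fails with
%% function_clause.
kind(true) -> boolean;
kind(false) -> boolean;
kind(null) -> null;
kind(B) when is_binary(B) -> string;
kind(N) when is_number(N) -> number;
kind(L) when is_list(L) -> array;
kind({Props}) when is_list(Props) -> object.

%% Recursively check that a term follows the proposed representation:
%% binaries for strings, lists for arrays, {[{BinaryKey, Value}]}
%% for objects.
valid(true) -> true;
valid(false) -> true;
valid(null) -> true;
valid(B) when is_binary(B) -> true;
valid(N) when is_number(N) -> true;
valid(L) when is_list(L) -> lists:all(fun valid/1, L);
valid({Props}) when is_list(Props) ->
    lists:all(fun({K, V}) when is_binary(K) -> valid(V);
                 (_) -> false
              end, Props);
valid(_) -> false.
```

One consequence shows up in kind("abc"): a list string is classified as the array [97,98,99], which is exactly why the proposal reserves binaries for strings. A legacy {struct, Props} term fails valid/1, since objects are one-element tuples in this scheme.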
I think it would be a good idea to isolate this problem, agree (having done some measurements) on the fastest and *prettiest* way to do this, jointly change your code bases (at the same time), then tell the world and issue ONE library.
Just for fun I've downloaded the Wikipedia using the ideas in
http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html
(I want to convert the XML representation of the Wikipedia into JSON, inject it into CouchDB and serve it up with mochiweb. I need to write a rendering engine to convert wiki markup to HTML, which is said to be tricky since there is no spec :-). This should be a good test of CouchDB and mochiweb.)
Cheers
/Joe Armstrong
And Bob's reply:
From: "Bob Ippolito"
Date: July 7, 2008 6:12:32 PM GMT+02:00
To: "Joe Armstrong"
Cc: "Jan Lehnardt" <jan-1oDqGaOF3Lkdnm+yROfE0A@xxxxxxxxxxxxxxxx>
Subject: Re: cjson.erl
{struct, ...} is what the library that ships with Yaws does, which is why I used that. Using just {[{Key, Value}]} looks fine to me too and should be doable without breaking compatibility immediately.
The reason atoms are accepted is only for encoding purposes, not for decoding. There is an unambiguous format from JSON -> Erlang, but for Erlang -> JSON some conveniences are allowed for practical reasons.
I'm fine with the {struct, ...} -> {...} change that Joe proposed
because I can do that in a backwards compatible way.
-bob
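One way to get the backwards-compatible migration Bob describes is a normalizer that rewrites old-style terms into the new shape. This is only a sketch (module and function names are hypothetical), not Bob's implementation: it assumes mochijson2-style input, where strings are already binaries and arrays are plain lists, turns {struct, Props} into {Props}, and converts atom keys, which mochijson2 accepts as an encoding convenience, into binaries with atom_to_binary/2. Atom values are left alone, because true, false and null are atoms too.

```erlang
-module(json_compat_sketch).
-export([normalize/1]).

%% Rewrite a mochijson2-style term into the proposed representation.
%% Old-style objects {struct, Props} become {Props}; new-style objects
%% pass through with their members normalized.
normalize({struct, Props}) when is_list(Props) ->
    {[normalize_pair(P) || P <- Props]};
normalize({Props}) when is_list(Props) ->
    {[normalize_pair(P) || P <- Props]};
normalize(L) when is_list(L) ->
    [normalize(X) || X <- L];
%% Binaries, numbers and atom values (true, false, null) are kept as-is.
normalize(Other) ->
    Other.

%% Atom keys are an encoding convenience; store them as binaries.
normalize_pair({K, V}) when is_atom(K) ->
    {atom_to_binary(K, utf8), normalize(V)};
normalize_pair({K, V}) ->
    {K, normalize(V)}.
```

Running it on an already-normalized term returns that term unchanged, so it could sit at the entry point of an encoder that still accepts the old form.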
What is our take on this? :) Damien?
I'll forward our discussions back to Joe and Bob (in case they don't read this list).
Cheers
Jan