
[Python-Dev] Can I make marshal.dumps() slower but stabler?

On Thu, Jul 12, 2018 at 3:22 PM Serhiy Storchaka <storchaka at gmail.com> wrote:
> 12.07.18 08:43, INADA Naoki wrote:
> > I'm working on making pyc stable, via stabilizing marshal.dumps()
> > https://bugs.python.org/issue34093
> This is not enough for making pyc stable. The order in frozensets is
> still arbitrary.

But we can set PYTHONHASHSEED to make that order stable in pyc files.
Currently, the dependency on reference counts (refcnt) is the only known
issue for reproducible pyc builds.
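
A minimal sketch of how the refcnt issue can be observed, assuming a
CPython where marshal decides whether to flag an object as referenceable
based on its reference count. The helper name make() is hypothetical, and
whether the two outputs actually differ depends on the interpreter version:

```python
import marshal

def make():
    # Build the string at runtime so it is not a shared compile-time constant.
    return "".join(["spam", "eggs"])

held = make()  # an extra reference to the string stays alive in `held`
with_extra_ref = marshal.dumps((held,))
without_extra_ref = marshal.dumps((make(),))  # the tuple holds the only reference

# Both payloads describe the same value; the bytes may still differ.
print("identical bytes:", with_extra_ref == without_extra_ref)
print("identical values:", marshal.loads(with_extra_ref) == marshal.loads(without_extra_ref))
```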

> > Sadly, it makes marshal.dumps() 40% slower.
> > Luckily, this overhead is small (only 4%) for dumps(compile(source)) case.
> What about the memory consumption?

No overhead, because we already use the same hashtable for w_ref.
I just made it two-pass instead of one-pass.
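
The two-pass idea can be sketched in pure Python. This is a toy token
format, not marshal's real format or the actual patch; the names
count_occurrences() and stable_dump() are hypothetical. The first pass
counts how often each object appears in the tree, and the second pass marks
an object as shared only when it appears more than once in the tree, so the
result does not depend on references held elsewhere:

```python
def count_occurrences(obj, counts=None):
    """Pass 1: count how many times each object appears in the tree."""
    if counts is None:
        counts = {}
    key = id(obj)
    counts[key] = counts.get(key, 0) + 1
    # Recurse only on the first visit, which also guards against cycles.
    if counts[key] == 1 and isinstance(obj, (tuple, list)):
        for item in obj:
            count_occurrences(item, counts)
    return counts

def stable_dump(obj):
    """Pass 2: emit tokens, flagging only objects seen more than once."""
    counts = count_occurrences(obj)
    refs = {}  # id(object) -> index of its first occurrence
    out = []

    def write(o):
        key = id(o)
        if key in refs:
            out.append(("REF", refs[key]))
            return
        shared = counts[key] > 1
        if shared:
            refs[key] = len(refs)
        if isinstance(o, (tuple, list)):
            out.append(("SEQ", len(o), shared))
            for item in o:
                write(item)
        else:
            out.append(("ATOM", o, shared))

    write(obj)
    return out

def build():
    s = "".join(["spam", "eggs"])
    return (s, s)

first = build()
second = build()
outside_reference = second[0]  # extra reference from outside the tree
print(stable_dump(first) == stable_dump(second))
```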

> > So my question is:  May I remove unstable but faster code?
> >
> > Or should I make this optional, so that we maintain two complex code paths?
> > If so, should this option be enabled by default or not?
> My concern is that even if it is not made optional, this will complicate
> the code.

Even when it's not optional, it needs an almost exact duplicate of
w_object to count references in the object tree.

> > For example, xmlrpc uses marshal.  But xmlrpc has significant overhead
> > other than marshaling, like the dumps(compile(source)) case.  So I expect
> > marshal.dumps() performance is not critical for it either.
> xmlrpc doesn't use the marshal module. It uses the terms marshalling and
> unmarshalling, but with a different meaning.

Oh, I just grepped and misunderstood.

> > Is there any real application for which marshal.dumps() performance is critical?
> EVE Online is a well known example.

Do they use version>=3?
In version 3, FLAG_REF was introduced, and it already added significant
runtime overhead.
If marshaling speed is very important, version<3 should be used.
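
For illustration, the format version is the optional second argument of
marshal.dumps(). A small sketch, with made-up sample data, comparing
version 2 against the interpreter's current marshal.version:

```python
import marshal

shared = [1, 2, 3]
data = [shared, shared]  # the same list object appears twice

old_format = marshal.dumps(data, 2)                # no object references in this format
new_format = marshal.dumps(data, marshal.version)  # current default format

loaded_old = marshal.loads(old_format)
loaded_new = marshal.loads(new_format)

# Version 2 writes each occurrence in full, so the loaded copies are separate objects.
print("v2 keeps identity:", loaded_old[0] is loaded_old[1])
print("current version keeps identity:", loaded_new[0] is loaded_new[1])
print("sizes:", len(old_format), len(new_format))
```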

> What if we write a script which loads .pyc files and stabilizes them? This
> could solve the problem for applications which need stable .pyc files,
> with zero impact on common use.

Hmm, which of these do you mean?

* Adding marshal.dump_stable_pyc() and use it like
* Implementing pure Python marshal.dumps in distutils
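
A rough sketch of the "load and rewrite" shape such a script could take,
assuming the 16-byte .pyc header used since Python 3.7. The function name
redump_pyc_bytes() is hypothetical, and whether a plain re-dump is enough
to make the bytes stable is an assumption that this thread does not settle:

```python
import marshal
import os
import py_compile
import tempfile
import types

PYC_HEADER_SIZE = 16  # magic (4) + flags (4) + mtime/size or source hash (8), Python 3.7+

def redump_pyc_bytes(data):
    """Keep the header, then reload and re-serialize the code object."""
    header, body = data[:PYC_HEADER_SIZE], data[PYC_HEADER_SIZE:]
    code = marshal.loads(body)
    if not isinstance(code, types.CodeType):
        raise ValueError("payload is not a code object")
    return header + marshal.dumps(code)

# Compile a throwaway module so the example has a .pyc to work on.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("x = 'spam'\ny = ('spam', x)\n")
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "example.pyc"))
    with open(pyc_path, "rb") as f:
        original = f.read()

rewritten = redump_pyc_bytes(original)
print("bytes changed by re-dump:", rewritten != original)
```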

INADA Naoki  <songofacandy at gmail.com>