[Python-Dev] Proposal: dict.with_values(iterable)


Hi,

On 12/04/2019 2:44 pm, Inada Naoki wrote:
> Hi, all.
> 
> I propose adding a new method: dict.with_values(iterable)

You can already do something like this, if memory saving is the main 
concern. This should work on all versions from 3.3.


import sys

def shared_keys_dict_maker(keys):
    class C: pass
    instance = C()
    # Setting the attributes populates the instance's key-sharing __dict__.
    for key in keys:
        setattr(instance, key, None)
    prototype = instance.__dict__
    def maker(values):
        # Copy the prototype, then overwrite the placeholder values.
        result = prototype.copy()
        result.update(zip(keys, values))
        return result
    return maker

m = shared_keys_dict_maker(('a', 'b'))

>>> d1 = {'a':1, 'b':2}
>>> print(sys.getsizeof(d1))
248

>>> d2 = m((1,2))
>>> print(sys.getsizeof(d2))
120

>>> d3 = m((None,"Hi"))
>>> print(sys.getsizeof(d3))
120
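
If you also want the length check that the proposal quoted below describes (ValueError when the number of values differs from the number of keys), a variation along these lines should work. It is only a sketch: checked_shared_keys_dict_maker is a name made up for illustration, and whether the copies really share their keys depends on the CPython version, so measure with sys.getsizeof before relying on it.

```python
import sys

def checked_shared_keys_dict_maker(keys):
    # Hypothetical variant of shared_keys_dict_maker with a length check.
    keys = tuple(keys)
    class C: pass
    instance = C()
    for key in keys:
        setattr(instance, key, None)
    prototype = instance.__dict__
    def maker(values):
        values = tuple(values)
        # zip() would silently truncate, so compare the lengths first.
        if len(values) != len(keys):
            raise ValueError(
                "expected %d values, got %d" % (len(keys), len(values)))
        result = prototype.copy()
        result.update(zip(keys, values))
        return result
    return maker

make = checked_shared_keys_dict_maker(("a", "b"))
d = make((1, 2))
print(d, sys.getsizeof(d))
```

Converting values to a tuple costs an extra copy, but it lets the maker accept any iterable and still check the length before touching the result.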



> 
> # Motivation
> 
> Python is used to handle data.
> While dict is not an efficient way to handle many records, it is still
> a convenient way.
> 
> When creating many dicts with the same keys, dict needs to
> look up the internal hash table while inserting each key.
> 
> It is a costly operation.  If we can reuse the existing keys of a dict,
> we can skip this insertion cost.
> 
> Additionally, we have "Key-Sharing Dictionary (PEP 412)".
> When all keys are strings, many dicts can share one set of keys.
> It reduces memory consumption.
> 
> This might be usable for:
> 
> * csv.DictReader
> * namedtuple._asdict()
> * DB-API 2.0 implementations:  (e.g. DictCursor of mysqlclient-python)
> 
> 
> # Draft implementation
> 
> pull request: https://github.com/python/cpython/pull/12802
> 
> with_values(self, iterable, /)
>      Create a new dictionary with keys from this dict and values from iterable.
> 
>      When length of iterable is different from len(self), ValueError is raised.
>      This method does not support dict subclass.
> 
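
To make the semantics described above concrete, here is a rough pure-Python equivalent. It is only a sketch: with_values here is a plain function made up for illustration, not the C implementation in the pull request, and it gives none of the speed or memory benefits. Raising TypeError for dict subclasses is an assumption; the description only says they are not supported.

```python
def with_values(d, iterable):
    # Hypothetical stand-in for the proposed dict.with_values().
    if type(d) is not dict:
        # Assumption: the proposal does not say which exception is raised.
        raise TypeError("dict subclasses are not supported")
    values = list(iterable)
    if len(values) != len(d):
        raise ValueError("length of iterable differs from len(d)")
    # Iterating a dict yields its keys in insertion order.
    return dict(zip(d, values))

keys = dict.fromkeys("abc")
print(with_values(keys, range(3)))
```
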
> 
> ## Memory usage (Key-Sharing dict)
> 
> >>> import sys
> >>> keys = tuple("abcdefg")
> >>> keys
> ('a', 'b', 'c', 'd', 'e', 'f', 'g')
> >>> d = dict(zip(keys, range(7)))
> >>> d
> {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
> >>> sys.getsizeof(d)
> 360
> 
> >>> keys = dict.fromkeys("abcdefg")
> >>> d = keys.with_values(range(7))
> >>> d
> {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
> >>> sys.getsizeof(d)
> 144
> 
> ## Speed
> 
> $ ./python -m perf timeit -o zip_dict.json -s 'keys = tuple("abcdefg"); values=[*range(7)]' 'dict(zip(keys, values))'
> 
> $ ./python -m perf timeit -o with_values.json -s 'keys = dict.fromkeys("abcdefg"); values=[*range(7)]' 'keys.with_values(values)'
> 
> $ ./python -m perf compare_to zip_dict.json with_values.json
> Mean +- std dev: [zip_dict] 935 ns +- 9 ns -> [with_values] 109 ns +- 2 ns: 8.59x faster (-88%)
> 
> 
> What do you think?
> Any comments are appreciated.
> 
> Regards,
>