[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

syntax difference

On 2018-06-27 11:11:37 +1200, Gregory Ewing wrote:
> Bart wrote:
> >    x = set(range(10_000_000))
> > 
> > This used up 460MB of RAM (the original 100M I tried exhausted the memory).
> > 
> > The advantage of Pascal-style sets is that that same set will occupy
> > only 1.25MB, as it is a bit-map.
> That's true, but they're also extremely limited compared to
> the things you can do with Python sets. They're really two
> quite different things, designed for different use cases.
> Compact sets of integers are possible in Python, but not
> as a built-in type. This suggests that they're not needed
> much

Also, when such sets are needed, a simple array of bits may not be
sufficient. For example, sets may be sparse, or they may have almost all
except a few bits set. In these cases a tree or RLE representation is
much smaller and faster. There are a number of such implementations
(Judy Arrays and Roaring Bitmaps come to mind[1]). Each has its
advantages and limitations, so its probably better to leave the choice
to the author.


[1] Judy is a C library. Roaring bitmaps are a data structure which has
    been implemented in several languages. I haven't checked whether
    there are packages on pypi. Shouldn't be too hard to implement if
    one needs it.

   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20180627/70ce2e42/attachment.sig>