[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Write this accumuator in a functional style

On Thu, 13 Jul 2017 10:10 pm, Rustom Mody wrote:

> Yeah I know append method is supposedly O(1).
> I find that surprising...
> More so when the article
> https://wiki.python.org/moin/TimeComplexity
> talks of average case Vs amortized-worst case(!) Whatever does that mean?

"Average case" refers to the average over the possible data values you are
dealing with. For example, if you ask for the average cost of inserting into a

mylist.insert(i, obj)

you average over i=0, i=1, i=2, ..., i=len(mylist).

"Worst case" refers to the most expensive case, which in the case of inserting
into a list, will be mylist.insert(0, obj).

"Amortized" refers to averaging over many repeated operations. For example,
start with an empty list, and keep inserting over and over again:

mylist = []
for i in range(10000000):  # ideally, N -> ?
    mylist.insert(0, obj)

That's equivalent to averaging over:

[].insert(0, obj)
[1].insert(0, obj)
[1, 1].insert(0, obj)
[1, 1, 1].insert(0, obj)

etc. The average cost of each of those insertions is the amortized cost.

Amortized costs are relevant where the cost of an operation is usually cheap,
but occasionally you hit a special case which makes it expensive. In the case
of list.append, lists are over-allocated:

mylist = [1, 2, 3]
# allocates (let's say) 50 slots, but only 3 are in use

Appending to an over-allocated list takes constant time, because that's just
writing to a slot in an array. So we can append to mylist 47 times before the
array is full, then we get an expensive resize:

mylist 50 slots -> 100 slots

That's very costly, proportional to the size of the list, but it doesn't happen
every append. After the resize, we can do another 50 appends before needing to
do it again.

CPython's allocation strategy for lists doubles[1] the list on each resize, so
the worst case is that your list is 50% (plus one item) full, and you're using
twice as much memory as needed. But the advantage is that resizes happen less
and less frequently: after each resize, it takes twice as many appends before
you need to do the next.

If you do the calculations, each resize is twice as expensive as the last, but
happens half as frequently, so the cost of those resizes are amortized out over
all the appends to be a constant (and relatively small) cost.

[1] Actually, CPython's lists initially quadruple the size of the array, up to a
certain point, and then switch to doubling. This ensures that small lists have
even fewer expensive resizes, at the cost of wasting a bit more memory, but its
only a small array so who cares?

?Cheer up,? they said, ?things could be worse.? So I cheered up, and sure
enough, things got worse.