random.sample with large weighted sample-sets?


I'm not coming up with the right keywords to find what I'm hunting.
I'd like to randomly sample a modestly compact list with weighted
distributions, so I might have

  data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
    )

and I'd like to random.sample() it as if it were a 100-element list.
However, ideally, this could be done in O(size-of-data) storage
rather than requiring the build-out of the entire set just for
sampling purposes, as the actual data can get a bit large.  For this
small toy data-set, I can use

  sample_me = sum(([s]*n for s,n in data), [])
  random.sample(sample_me, k)

but for large counts, the list returned from sum() grinds my system
because I start swapping.  What am I missing? (links to relevant
keywords/searches/algorithms welcome in lieu of actually answering
in-line)
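
For reference, here is a minimal sketch of one approach that seems to fit the O(size-of-data) requirement. It assumes Python 3, where range() is a lazy sequence: pick k distinct positions out of range(total), then map each position back to its item with a running total of the counts and bisect. The name weighted_sample and its rng parameter are made up for illustration.

```python
import bisect
import itertools
import random

data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)

def weighted_sample(data, k, rng=random):
    """Sample k items without replacement, as if every (item, n)
    pair had been expanded into n copies of item.

    weighted_sample is a hypothetical helper name, not a stdlib function.
    """
    items = [item for item, _ in data]
    # Running totals of the counts: [20, 70, 100] for the toy data.
    cumulative = list(itertools.accumulate(n for _, n in data))
    total = cumulative[-1] if cumulative else 0
    # range(total) is never materialized, so extra storage is O(len(data) + k).
    positions = rng.sample(range(total), k)
    # Position p belongs to the first item whose running total exceeds p.
    return [items[bisect.bisect_right(cumulative, p)] for p in positions]

picked = weighted_sample(data, 10)
```

As with random.sample() on the expanded list, asking for more than the total count (k > 100 here) raises ValueError. Python 3.9 and later also accept a counts= argument to random.sample(), which may make the helper unnecessary there.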

Thanks,

-tkc



