


sampling from frequency distribution / histogram without replacement


Hello,
Just checking to see if anyone has attacked this problem before for
cases where the population size is infeasibly large, i.e. the number of
categories is manageable, but the sum of the frequencies, N, rules out
simple solutions such as creating a list of all N items, shuffling it,
and using the first n items to populate the sample (itself a frequency
distribution / histogram).

I note that numpy.random.hypergeometric will allow me to generate a
sample when I only have two categories, and that I could probably
implement some kind of iterative / partitioning approach that calls it
repeatedly. Before I do, I thought I'd ask whether anyone has tackled
this already. I can't find much on the web. Cheers.
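
A rough sketch of that iterative approach, offered as an illustration rather than a tested solution: treat one category at a time as "good" and all the remaining categories as "bad", draw the count for that category with numpy.random.hypergeometric, then shrink the population and the sample size and move on. The function name `sample_counts_without_replacement` and its parameters are made up for this example and are not part of numpy.

```python
import numpy as np


def sample_counts_without_replacement(freqs, n, rng=None):
    """Return per-category counts for n items drawn without replacement.

    freqs: non-negative integer frequency for each category.
    n:     sample size, 0 <= n <= sum(freqs).
    rng:   anything with a .hypergeometric(ngood, nbad, nsample) method
           (assumption: defaults to the legacy numpy.random module).
    """
    rng = np.random if rng is None else rng
    freqs = [int(f) for f in freqs]
    if any(f < 0 for f in freqs):
        raise ValueError("frequencies must be non-negative")
    remaining_pop = sum(freqs)
    if not 0 <= n <= remaining_pop:
        raise ValueError("n must be between 0 and sum(freqs)")

    sample = [0] * len(freqs)
    remaining_n = n
    for i, f in enumerate(freqs):
        if remaining_n == 0:
            break  # sample is full; later categories stay at 0
        others = remaining_pop - f  # items in categories not yet visited
        if f == 0:
            k = 0  # nothing to draw from an empty category
        elif others == 0:
            k = remaining_n  # last non-empty category takes the rest
        else:
            # Two-category draw: category i versus everything after it.
            k = int(rng.hypergeometric(f, others, remaining_n))
        sample[i] = k
        remaining_n -= k
        remaining_pop -= f
    return sample


if __name__ == "__main__":
    # Hypothetical histogram: N = 10,000,000 items in three categories.
    freqs = [5_000_000, 3_000_000, 2_000_000]
    print(sample_counts_without_replacement(freqs, 1000))
```

The cost is one hypergeometric draw per category, so it depends on the number of categories and not on N. How well numpy.random.hypergeometric behaves for very large arguments is worth checking against the numpy documentation for the version in use. Newer numpy releases also provide `Generator.multivariate_hypergeometric`, which may cover this case directly where it is available.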

Duncan