
On 16/02/14 16:35, Charles Allen wrote:
> How efficient does this thing need to be?
>
> You can always just turn it into a two-dimensional sampling problem by
> thinking of the data as a function f(x=item), generating a random x=xr
> in [0,x], then generating a random y in [0,max(f(x))]. The xr is
> accepted if 0 < y <= max(f(xr)), or rejected (and another attempt made)
> if y > max(f(xr)).

You can avoid rejection by constructing an alias table. A list can be
constructed such that each list element contains a pair of values and a
cutoff. e.g.

  [("apple", 20), ("orange", 50), ("grape", 30)]

would become (using one particular algorithm)

  [(("apple", "orange"), 0.6), (("orange", "apple"), 1.0),
   (("grape", "orange"), 0.9)]

Generate a random index, then select one of the values on the basis of
the cutoff. For short enough lists you can generate a single 0-1 random
variate, u, and use int(n*u) for the index and compare n*u - int(n*u)
to the cutoff, where n is the length of the list.

It's still sampling with replacement though.

Duncan
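A minimal sketch of what "one particular algorithm" might look like, using Vose's alias method; the function names and the exact ordering of the resulting table are my own, not from the post, but the table it builds for the apple/orange/grape example assigns the same cutoffs (0.6, 1.0, 0.9) described above:

```python
import random

def build_alias_table(weighted_items):
    """Build an alias table from [(value, weight), ...] pairs (Vose's method)."""
    n = len(weighted_items)
    total = sum(w for _, w in weighted_items)
    # Scale weights so the average bucket holds probability exactly 1.
    scaled = [(v, w * n / total) for v, w in weighted_items]
    small = [e for e in scaled if e[1] < 1.0]
    large = [e for e in scaled if e[1] >= 1.0]
    table = []
    while small and large:
        sv, sw = small.pop()
        lv, lw = large.pop()
        # Below the cutoff sw pick sv; above it, pick the alias lv.
        table.append(((sv, lv), sw))
        lw -= 1.0 - sw  # the large entry donates the remainder of this bucket
        if lw < 1.0:
            small.append((lv, lw))
        else:
            large.append((lv, lw))
    # Whatever remains fills its bucket completely; the alias is never used.
    for v, _ in small + large:
        table.append(((v, v), 1.0))
    return table

def alias_sample(table):
    """Draw one value from a single 0-1 variate, as described in the post."""
    n = len(table)
    u = random.random() * n
    i = int(u)                      # int(n*u) picks the bucket
    (primary, alias), cutoff = table[i]
    return primary if u - i <= cutoff else alias  # n*u - int(n*u) vs. cutoff
```

Each draw costs one uniform variate and one table lookup, with no rejection loop; note it is still sampling with replacement.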
