[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

On 2014-02-16 04:12, Terry Reedy wrote: > On 2/15/2014 11:41 PM, Tim Chase wrote: > > data = ( > > ("apple", 20), > > ("orange", 50), > > ("grape", 30), > > ) To Ben, yes, this was just some sample data; the original gets built from an external (i.e., client-supplied, thus the need to gracefully support crazy-large numbers) data source and is indeed actually a list rather than a tuple. When entering static data like this, I often default to an outer tuple (rather than list) just as a hint/reminder to myself that I don't expect this to change at runtime (and have Python yell at me if I accidentally try). > If you actually start with date in this form, write the few lines > needed to produce the form below. > > import bisect > import random > > data = [ > (0, 'apple'), > (20, 'orange'), > (70, 'grape'), > ] > > for i in range(10): > r = random.randrange(0, 100) > i = bisect.bisect(data, (r, 'zzzzz')) - 1 > print(data[i][1]) Trying to read what may be implicit assumptions in your code: 1) your code calculates "100" as sum(item[0] for item in data) 2) the data has to be sorted for bisect to work 3) you meant to write "(10, 'apple')" rather than 0. With my original example code, a 0-probability shouldn't ever show up in the sampling, where it looks like it might when using this sample code. In my particular use case, I can limit/ensure that 0-probability items never appear in the list, filtering them upon loading. 4) that "zzzzzz" is some arbitrary value that should come after any string that could appear in the data; perhaps using some custom "InfinityString" class where everything compared to it is always less than it. So it would be class InfinityString: def __gt__(self, other): True __ge__ = __gt__ def __lt__(self, other): False __eq__ = __le__ = __ne__ = __lt__ infinity_string = InfinityString() data = load_data() # list of (quantity, value) tuples data.sort() total = sum(qty for qty, value in data) for i in range(num_to_sample): r = random.randrange(0, total) i = bisect.bisect(data, (r, infinity_string)) - 1 use(data[i][1]) Some long-running testing on this code seems to show that if two items have the same probability, bisect only appears to find the last one. Tested with data = [ (10, "apple"), (20, "banana"), # I never get any bananas, even after thousands of iterations (20, "grape"), (50, "orange"), ] Thanks, -tkc

- Prev by Date:
**Explanation of list reference** - Next by Date:
**How to turn a package into something pip can install** - Previous by thread:
**random.sample with large weighted sample-sets?** - Next by thread:
**random.sample with large weighted sample-sets?** - Index(es):