Re: extracting a random subset of a vector: msg#00082

python.numeric.general

Subject: Re: extracting a random subset of a vector

Curzio Basso wrote:

import numarray as NA
import numarray.random_array as RA

N = 1000
M = 100
full = NA.arange(N)
subset = full[RA.permutation(N)][:M]

---------------------------------------------------------

However, it's quite slow (at least with N~40k),

you can speed it up a tiny bit by subsetting the permutation array first:
subset = full[ RA.permutation(N)[:M] ]

and from the hotshot output it looks like it's the indexing, not the permutation, which takes time.

not from my tests:

import numarray.random_array as RA
import numarray as NA
import time

N = 1000000
M = 100
full = NA.arange(N)

start = time.clock()
P = RA.permutation(N)
print "permutation took %F seconds"%(time.clock() - start)
start = time.clock()
subset = full[P[:M]]
print "subsetting took %F seconds"%(time.clock() - start)

which results in:
permutation took 1.640000 seconds
subsetting took 0.000000 seconds

so it's the permutation that takes the time, as I suspected. What would really speed this up is a random_array.non-repeat-randint() function, written in C. That way you wouldn't have to permute all N values when you really just need M of them.

Does anyone else think this would be a useful function? I can't imagine it would be that hard to write.
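
For illustration, here is a rough pure-Python sketch of what such a function might do: a partial Fisher-Yates shuffle that stops after M swaps instead of shuffling all N positions. The name partial_shuffle_sample is made up for this example and is not part of numarray; it also still builds the full list of N indices, so it only saves the random draws, not the memory.

```python
import random

def partial_shuffle_sample(n, m, rng=random):
    """Return m distinct integers from range(n), in random order.

    Hypothetical helper for illustration only; not a numarray function.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    pool = list(range(n))
    for i in range(m):
        # Choose one of the positions not yet fixed (i .. n-1) and swap it
        # into slot i. After m steps the first m slots hold the sample.
        j = rng.randrange(i, n)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:m]

if __name__ == "__main__":
    indices = partial_shuffle_sample(1000, 100)
    print(len(indices), len(set(indices)))
```

The returned indices could then be used to index the full array, as in full[indices]. The standard library's random.sample(range(N), M) also draws M values without repeats and may be all that is needed.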

If M << N, then you could probably write a little function in Python that calls randint and removes the repeats. If M is only a little smaller than N, though, this would be slow.
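
A minimal sketch of that idea, using the standard library's random module rather than numarray's randint so that it stands alone. The function name sample_by_rejection is invented for this example. It keeps drawing until it has M distinct values, so it is cheap when M is much smaller than N and gets slower as M approaches N, because most draws are then repeats.

```python
import random

def sample_by_rejection(n, m, rng=random):
    """Draw m distinct integers from range(n) by discarding repeats.

    Hypothetical helper for illustration only.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    seen = set()      # fast membership test for values already drawn
    result = []       # keeps the values in the order they were drawn
    while len(result) < m:
        k = rng.randrange(n)
        if k not in seen:
            seen.add(k)
            result.append(k)
    return result

if __name__ == "__main__":
    indices = sample_by_rejection(1000000, 100)
    print(len(indices), len(set(indices)))
```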

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@xxxxxxxx

