|
|
Subject: RE: Getting the indexes of the myarray.min() - msg#00038
List: python.numeric.general
Hi,
Raymond D. Hettinger is writing a general statistics module 'statistics.py A
collection of functions for summarizing data' that is somewhere in a Python
CVS (I can not find the exact reference but it appeared in a fairly recent
Python thread). He uses a one-pass algorithm from Knuth for the variance that
has good numerical stability.
Below is a rather rough version modified from my situation (masked arrays) which
uses Knuth's algorithm for the variance. It lacks features like checking
dimensions (assumes variance can be computed) and documentation.
Regards
Bruce Southey
import numarray
def SummaryStats(Matrix):
mshape=Matrix.getshape()
nrows=mshape[0]
ncols=mshape[1]
#print nrows, ncols
# Create matrices to hold statistics
N_obs =numarray.zeros(ncols, type='Float64')
Sum =numarray.zeros(ncols, type='Float64')
Var =numarray.zeros(ncols, type='Float64')
Min =numarray.zeros(ncols, type='Float64')
Max =numarray.zeros(ncols, type='Float64')
Mean =numarray.zeros(ncols, type='Float64')
AdjM =numarray.zeros(ncols, type='Float64')
NewM =numarray.zeros(ncols, type='Float64')
DifM =numarray.zeros(ncols, type='Float64')
for row in range(nrows):
for col in range(ncols):
t_value=Matrix[row,col]
N_obs[col] = N_obs[col] + 1
Sum[col] = Sum[col] + t_value
if t_value > Max[col]:
Max[col]=t_value
if t_value < Min[col]:
Min[col]=t_value
if N_obs[col]==1:
Mean[col]=t_value
AdjM[col]=(t_value-Mean[col])/(N_obs[col])-DifM[col]
NewM[col]=Mean[col]+AdjM[col]
DifM[col]=(NewM[col]-Mean[col])-AdjM[col]
Var[col] = Var[col] +
(t_value-Mean[col])*(t_value-NewM[col])
Mean[col] = NewM[col]
print 'N_obs\n', N_obs
print 'Sum\n', Sum
print 'Mean\n', Mean
print 'Var\n', Var/(nrows-1)
if __name__ == '__main__':
MValues=numarray.array([[1,2,1],[3,2,2],[5,1,1],[4,3,2]])
SummaryStats(MValues)
---- Original message ----
> Date: Thu, 13 May 2004 15:42:30 -0400
> From: "Perry Greenfield" <perry@xxxxxxxxx>
> Subject: RE: [Numpy-discussion] Getting the indexes of the myarray.min()
> To: "Russell E Owen" <rowen@xxxxxxxxxxxxxxxx>, "numarray"
<numpy-discussion@xxxxxxxxxxxxxxxxxxxxx>
>
> > Russell E Owen wrote:
> >
> > At 9:27 AM -0400 2004-05-13, Perry Greenfield wrote:
> > >... One has to trade off the number of such functions
> > >against the speed savings. Another example is getting max and min values
> > >for an array. I've long thought that this is so often done they could
> > >be done in one pass. There isn't a function that does this yet though.
> >
> > Statistics is another area where multiple return values could be of
> > interest -- one may want the mean and std dev, and making two passes
> > is wasteful (since some of the same info needs to be computed both
> > times).
> >
> > A do-all function that computes min, min location, max, max location,
> > mean and std dev all at once would be nice (especially if the
> > returned values were accessed by name, rather than just being a tuple
> > of values, so they could be referenced safely and readably).
> >
> > -- Russell
> >
> We will definitely add something like this for 1.0 or 1.1.
> (but probably for min and max location, it will just be
> for the first encountered).
>
> Perry
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: SourceForge.net Broadband
> Sign-up now for SourceForge Broadband and get the fastest
> 6.0/768 connection for only $19.95/mo for the first 3 months!
> http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
Was this page helpful?
Thread at a glance:
Previous Message by Date:
click to view message preview
RE: Getting the indexes of the myarray.min()
On Thu, 13 May 2004, Russell E Owen wrote:
> Statistics is another area where multiple return values could be of
> interest -- one may want the mean and std dev, and making two passes
> is wasteful (since some of the same info needs to be computed both
> times).
Single pass std deviations don't work very well if you've got a lot of
data points and the std deviation is small compared to the average.
I'm not arguing aginst including them, but maybe the documentation for
such a function should include a caveat.
Warren Focke
-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
Next Message by Date:
click to view message preview
RE: outer idiom (was Re: Getting the indexes of themyarray.min())
Álvaro Tejero Cantero wrote:
>
> but this gives
> >>> subtract.outer(r,r).shape
> (10, 3, 10, 3)
>
> that is, subtracts y coordinates to x coordinates which is not intended.
> AFAIK the outer solution is MUCH faster than the nested for loops, so
> what I do now is
>
> >>> r_rel = transpose(array([subtract.outer(r[:,0],r[:,0]),
> subtract.outer(r[:,1],r[:,1]),
> subtract.outer(r[:,2],r[:,2])]))
> >>> r_rel.shape #as with the double loop
> (10,10,3)
>
>
> My question is then if there is any more elegant way to do this,
> especially giving as a result independence of the number of dimensions).
>
Not that I can think of at the moment.
> Maybe an "axis" (=0 in this case?) keyword for the outer function would
> be useful in this context?
>
Perhaps. Right now these ufunc methods are pretty complicated so
it may not be easy to do, but I understand that there is certainly
utility in being able to do that. We'll look into it (but not
right away so don't hold your breath).
Perry
-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
Previous Message by Thread:
click to view message preview
Re: Getting the indexes of the myarray.min()
Álvaro Tejero Cantero wrote:
Hello,
I am new to numarray. I have been searching the documentation, and see
no evident vectorial and concise way to get the indexes for the minimum
of an array times.min().
If I understand your question correctly, you want argmin:
>>> import numarray as na
>>> from numarray import random_array
>>> times = random_array.randint(0, 16, [4,4])
>>> times
array([[15, 3, 11, 10],
[ 1, 1, 15, 7],
[ 4, 3, 5, 6],
[10, 15, 12, 3]])
>>> na.argmin(times) # Takes min along axis 0
array([1, 1, 2, 3])
>>> na.argmin(times, 1) # Take min along axis 1
array([1, 1, 1, 3])
-tim
Currently my two alternatives are: (times is NxN array)
husband, wife = nonzero(equal(times.min(),times))
#gives tuple of 1x1 arrays, each containing one index
and (even uglier)
husband = compress(times.min()==times,indices([N,N])[0])
wife = compress(times.min()==times,indices([N,N])[1])
These are weird ways to get something as simple. I am surely missing
something, but I have tried several slicing strategies before without
success.
For getting the minimum times in each row I use:
choose(argmin(times),transpose(times))
What are the idioms in numpy for these tasks?
Thank you very much in advance, á.
I would
-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
Next Message by Thread:
click to view message preview
puzzling compiler warning for numeric extension
I modified an existing numeric c extension today, adding a function
that returns a new array. This is the first time I've used
NA_NewArray. The code seems to run fine, but I get a disturbing
compiler warning:
[rowen:Shared Files/Guider/PyGuide] rowen% python setup.py build
...
src/radProfModule.c: In function `Py_radSqByRadInd':
src/radProfModule.c:378: warning: return from incompatible pointer type
The associated code can be summarized as follows (and it is simple
enough that I've also appended a copy). The error is thrown on the
last line:
static PyObject *Py_radSqByRadInd(PyObject *self, PyObject *args) {
long nElt;
Long *radSqByRadInd;
if (!PyArg_ParseTuple(args, "l", &nElt))
return NULL;
...
return NA_NewArray((void *)radSqByRadInd, tLong, 1, nElt);
}
So...am I actually doing something wrong, or is this warning normal?
-- Russell
P.S. the full code:
static PyObject *Py_radSqByRadInd(PyObject *self, PyObject *args) {
long nElt;
Long radInd;
char ModName[] = "radSqByRadInd";
Long *radSqByRadInd;
if (!PyArg_ParseTuple(args, "l", &nElt))
return NULL;
if (nElt < 0) {
PyErr_Format(PyExc_ValueError, "%s: nPts < 0", ModName);
return NULL;
}
// allocate max(3, nElt) elements (simplifies the computation
// and avoids calling calloc on 0 elements)
radSqByRadInd = calloc(MAX(nElt, 3), sizeof *radSqByRadInd);
if (radSqByRadInd == NULL) {
PyErr_Format(PyExc_MemoryError, "%s: insufficient
memory", ModName);
return NULL;
}
for (radInd=0; radInd<3; radInd++) {
radSqByRadInd[radInd] = radInd;
}
for (radInd=3; radInd<nElt; radInd++) {
radSqByRadInd[radInd] = (radInd - 1) * (radInd - 1);
}
return NA_NewArray((void *)radSqByRadInd, tLong, 1, nElt);
}
-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
|
|