
Re: rebin (corrected): msg#00078

python.numeric.general

Subject: Re: rebin (corrected)

Russell E Owen wrote:

At 10:56 AM -0700 2004-08-30, Tim Hochberg wrote:

[SNIP]


But I still agree with Perry that we ought to provide a built-in rebin
function. It is particularly useful for large multi-dimensional arrays
where it is wasteful (in both CPU and memory) to create a full-size
copy of the array before resampling it down to the desired rebinned
size. I appended the .copy() so that at least the big array is not
still hanging around in memory (remember that the slice creates a
view rather than a copy.)
Rick

A reasonable facsimile of this should be doable without dropping into C. Something like:

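import numarray as na

def rebin_sum(a, (m, n)):
    # Split each axis into (number of bins, bin size).
    M, N = a.shape
    a = na.reshape(a, (M/m, m, N/n, n))
    # Sum over the two bin-size axes; dividing by m*n gives the block average.
    return na.sum(na.sum(a, 3), 1) / float(m*n)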

This does create some temps, but they're smaller than in the boxcar case and it doesn't do all the extra calculation. This doesn't handle the case where a.shape isn't an exact multiple of (m,n). However, I don't think that would be all that hard to implement, if there is a consensus on what should happen then.
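For readers on current NumPy rather than numarray, the same reshape idea might look roughly like the sketch below. The name `rebin_blocks` and the `average` flag are made up for illustration and are not from the thread; the sketch assumes a 2-D array whose shape is an exact multiple of the bin factors and raises otherwise.

```python
import numpy as np

def rebin_blocks(a, factors, average=False):
    """Rebin a 2-D array by integer factors (m, n) via reshape.

    Hypothetical helper: assumes a.shape is an exact multiple of (m, n).
    """
    m, n = factors
    M, N = a.shape
    if M % m or N % n:
        raise ValueError("array shape must be an exact multiple of the bin factors")
    # View the array as (row bins, m, column bins, n) and reduce the bin-size axes.
    blocks = a.reshape(M // m, m, N // n, n)
    out = blocks.sum(axis=(1, 3))
    # Optionally convert each block sum into a block average.
    return out / float(m * n) if average else out

a = np.arange(16).reshape(4, 4)
print(rebin_blocks(a, (2, 2)))                # block sums
print(rebin_blocks(a, (2, 2), average=True))  # block averages
```

With `average=False` the grand total of the array is unchanged by rebinning, which is the counts-per-bin behaviour; `average=True` matches what the implementations posted so far do.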
I can think of at least two different ways this might be done: tacking on values that match the last value as already proposed and tacking on zeros. There may be others as well. It should probably get a boundary condition argument like convolve and friends.
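The two boundary choices mentioned here can be sketched with modern NumPy's `np.pad`, which did not exist in numarray. The helper name `pad_to_multiple` is an assumption for illustration; `mode="edge"` repeats the last value and `mode="constant"` tacks on zeros.

```python
import numpy as np

def pad_to_multiple(a, factors, mode="edge"):
    """Pad a 2-D array at the bottom/right so its shape is a multiple of (m, n).

    Hypothetical helper: mode="edge" repeats the last row/column,
    mode="constant" pads with zeros (np.pad's default fill value).
    """
    m, n = factors
    M, N = a.shape
    pad_rows = (-M) % m   # rows needed to reach the next multiple of m
    pad_cols = (-N) % n   # columns needed to reach the next multiple of n
    return np.pad(a, ((0, pad_rows), (0, pad_cols)), mode=mode)

a = np.arange(6).reshape(2, 3)
print(pad_to_multiple(a, (2, 2), mode="edge"))
print(pad_to_multiple(a, (2, 2), mode="constant"))
```

Note that this builds a padded temporary the size of the enlarged array, so it trades memory for simplicity.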
Personally, I'd find rebin a little surprising if it resulted in an average, as all the implementations thus far have done, rather than a simple sum over the stencil. When I think of rebinning I'm thinking of number of occurrences per bin, and rebinning should keep the total occurrences the same, not change them by the inverse of the stencil size.

My 2.3 cents anyway


I agree that it would be nice to avoid the extra calculation involved in convolution or boxcar averaging, and the extra temp storage.

Your algorithm certainly looks promising, but I'm not sure there's any space saving when the array shape is not an exact multiple of the bin factor. Duplicating the last value is probably the most reasonable alternative for my own applications (imaging). To use your algorithm, I guess one has to enlarge the array first, creating a new temporary array that is the same as the original except expanded to an exact multiple of the bin factor. In theory one could avoid duplication, but I suspect to do this efficiently one really needs to use C code.
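As one possible way to skip the enlarged temporary when only a sum is wanted, modern NumPy's `np.add.reduceat` can sum over bins whose last member is short. This is a different technique from the reshape approach above, and it is not edge duplication: the trailing partial bin simply sums the values that exist, which gives the same result as padding with zeros. The function name `rebin_sum_ragged` is hypothetical.

```python
import numpy as np

def rebin_sum_ragged(a, factors):
    """Sum a 2-D array over (m, n) bins, allowing a short final bin on each axis.

    Hypothetical sketch: no padded copy of the input is made; the result
    matches zero padding, not duplication of the last value.
    """
    m, n = factors
    M, N = a.shape
    # Sum groups of m rows; the last group runs to the end of the array.
    rows = np.add.reduceat(a, np.arange(0, M, m), axis=0)
    # Then sum groups of n columns of the row-reduced result.
    return np.add.reduceat(rows, np.arange(0, N, n), axis=1)

a = np.arange(6).reshape(2, 3)
print(rebin_sum_ragged(a, (2, 2)))
```

The intermediate `rows` array is already reduced along the first axis, so the temporaries are no larger than in the exact-multiple case.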

I think you could probably do considerably better than the boxcar code, but it looks like it would get fairly messy once you start worrying about array sizes that aren't an exact multiple of the bin size. It might end up being simpler to implement it in C, so that's probably a better idea in the long run.

I personally have no strong opinion on averaging vs summing. Summing retains precision but risks overflow. Averaging potentially has the opposite advantages, though avoiding overflow is tricky. Note that Nadav Horesh's suggested solution (convolution with a mask of 1s instead of boxcar averaging) computed the sum.
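The overflow trade-off can be seen on a tiny 8-bit example. This is a sketch in modern NumPy, where `sum` widens small integer types by default, so the 8-bit accumulator is forced with `dtype=np.uint8` purely to show the wraparound; the variable names are illustrative only.

```python
import numpy as np

# A 2x2 "image" of bright 8-bit pixels, viewed as a single (2, 2) bin.
a = np.full((2, 2), 200, dtype=np.uint8)
blocks = a.reshape(1, 2, 1, 2)

# Forcing an 8-bit accumulator makes the sum wrap around modulo 256.
wrapped = blocks.sum(axis=(1, 3), dtype=np.uint8)

# A wider accumulator holds the true total.
total = blocks.sum(axis=(1, 3), dtype=np.int64)

# Averaging stays within the input range but produces floating point.
mean = blocks.mean(axis=(1, 3))

print(wrapped, total, mean)
```

Summing into a wider type keeps the exact counts at the cost of a larger output; averaging keeps the value range but gives up the integer type.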

I really have no strong feelings since I have no use for rebinning at the moment. Back when I did, it would have been for rebinning data from particle detectors. So for instance, you would change the bin size so that you had enough data in each bin that you could attempt to do statistics on it or plot it or whatever. In that domain it would make no sense to average on rebinning. However, I can see how it makes sense for imaging applications.

In the absence of any compelling reason to do otherwise, I imagine the thing to do is copy what everyone else is doing, as long as they're consistent. Do you know what Matlab and friends do?

-tim


