osdir.com
mailing list archive

Subject: Re: numarray speed problem - msg#00027

List: python.numeric.general

Hi H,

I did some work on this problem based on your previous post, but apparently my response never made it to numpy-discussion. In a nutshell, I made numarray 12x faster for a benchmark like your numarray_pb_sample.py by speeding up string comparisons and improving all(). The changes are in numarray CVS, but there is no SourceForge release that contains them yet; numarray-1.4.0 is still several weeks away. If you want to try CVS from UNIX/Linux, just do:

% cvs -d:pserver:anonymous@xxxxxxxxxxxxxxxxxxx:/cvsroot/numpy login
% cvs -z3 -d:pserver:anonymous@xxxxxxxxxxxxxxxxxxx:/cvsroot/numpy co -P numarray

Regards,
Todd

Humufr wrote:

Hello,

I have a problem with numarray, specifically with the function numarray.all.

I want to compare two files. To do this, I read the files with the function readcol (in my module readcol2), which can return them as a list or as a numarray array (string or numeric).

I compare each line of one file against each line of the other.
If I use the array format and the numarray.all function, the comparison takes forever for two big files. If I use Python list objects, it is very fast. I think there is a problem here, or at least room for improvement. If I understand the goal of numarray correctly, it was written to speed up some parts of Python, but here it slows things down a lot.

A very simple sample that shows the effect is at the bottom of this mail.

Thanks for numarray; I hope I am not bothering you. My comments are only meant to help improve numarray. I have been able to find the problem, so now I can avoid it.

H.




def readcol(fname,comments='%',columns=None,delimiter=None,dep=0,arraytype='list'):
    """
    Load ASCII data from fname into an array and return the array.
    The data must be regular: the same number of values in every row.
    fname must be a filename (this version opens it with file()).

    Input:
    - fname : the name of the file to read

    Optional input:
    - comments  : a string giving the character that starts a comment.
                  The default is the matlab character '%'.
    - columns   : list or tuple containing the (1-based) columns to use.
    - delimiter : a string that separates the columns
    - dep       : an integer giving the number of leading lines to skip
                  (useful to skip description lines)
    - arraytype : a string indicating which kind of result you want:
                  numeric array ('numeric'), character array ('numstring')
                  or list ('list'). The default is 'list'.

    matfile data is not currently supported, but see
    Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz

    Example usage:

    x,y = transpose(readcol('test.dat')) # data in two columns
    X = readcol('test.dat') # a matrix of data
    x = readcol('test.dat') # a single column of data
    x = readcol('test.dat','#') # use '#' as the comment delimiter

    Initial function from pylab (J. Hunter), changed for my specific needs.
    """
    from numarray import array

    fh = file(fname)

    X = []
    numCols = None
    nline = 0
    if columns is None:
        for line in fh:
            nline += 1
            if dep is not None and nline <= dep: continue
            # keep only the text before the comment character
            line = line.split(comments)[0].strip()
            if not len(line): continue
            if arraytype=='numeric':
                row = [float(val) for val in line.split(delimiter)]
            else:
                row = [val.strip() for val in line.split(delimiter)]
            thisLen = len(row)
            if numCols is None:
                numCols = thisLen # remember the width of the first row
            elif thisLen != numCols:
                raise ValueError('All rows must have the same number of columns')
            X.append(row)
    else:
        for line in fh:
            nline += 1
            if dep is not None and nline <= dep: continue
            # keep only the text before the comment character
            line = line.split(comments)[0].strip()
            if not len(line): continue
            row = line.split(delimiter)
            if arraytype=='numeric':
                row = [float(row[i-1]) for i in columns]
            else:
                # 'numstring' and 'list' both keep the values as strings
                row = [row[i-1].strip() for i in columns]
            thisLen = len(row)
            if numCols is None:
                numCols = thisLen # remember the width of the first row
            elif thisLen != numCols:
                raise ValueError('All rows must have the same number of columns')
            X.append(row)
    fh.close()

    if arraytype=='numeric':
        X = array(X)
        r,c = X.shape
        if r==1 or c==1:
            X.shape = max([r,c]),
    elif arraytype == 'numstring':
        import numarray.strings # pb if numeric+pylab
        X = numarray.strings.array(X)
        r,c = X.shape
        if r==1 or c==1:
            X.shape = max([r,c]),
    return X
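
A side note on comment stripping in line-oriented readers such as readcol: str.find returns -1 when the comment character is absent, so a slice built from its result drops the last character of the line. That is harmless when the line still ends in a newline, but it can eat a digit on a final line that has no trailing newline. The sketch below contrasts the two approaches; it is written in Python 3 syntax with the standard library only, and the helper names strip_comment and strip_comment_with_find are hypothetical, chosen for illustration.

```python
def strip_comment(line, comment="%"):
    """Return the text before the first comment marker, stripped."""
    # partition() yields the whole line as its first element when the
    # marker is absent, so nothing is cut off by accident.
    return line.partition(comment)[0].strip()


def strip_comment_with_find(line, comment="%"):
    """Slice-based variant, shown only to illustrate the pitfall."""
    # find() returns -1 when the marker is absent; line[:-1] then
    # silently drops the final character of the line.
    return line[:line.find(comment)].strip()


if __name__ == "__main__":
    last_line = "7 8 9"  # a final line with no trailing newline
    print(repr(strip_comment(last_line)))
    print(repr(strip_comment_with_find(last_line)))
```

Both helpers agree whenever the comment marker is present or the line ends in a newline; they differ only on a marker-free line without one.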


-------------------------------------------
files_test_creation.py

-------------------------------------------

f1 = file('test1.dat','w')
for i in range(10000):
    f1.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
f1.close()


f2 = file('test2.dat','w')
for i in range(10000):
    f2.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
f2.close()

-------------------------------------------
numarray_pb_sample.py

-------------------------------------------

import numarray
import readcol2
data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numstring')
data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numstring')

# or in non-string array form (same result)
## data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numeric')
## data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='numeric')

for a_i in range(data1.shape[0]):
    for b_i in range(data2.shape[0]):
        if numarray.all(data1[a_i,:] == data2[b_i,:]):
            print a_i,b_i

-------------------------------------------
python_list_sample.py

-------------------------------------------

import readcol2
data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='list')
data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' ',dep=1,arraytype='list')

for a_i in range(len(data1)):
    for b_i in range(len(data2)):
        if data1[a_i] == data2[b_i]:
            print a_i,b_i
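
Both samples above compare every row of one file against every row of the other, so the number of comparisons grows with the product of the two file lengths (10000 x 10000 here). If the goal is only to find the matching row pairs, one possible alternative is to index one file by row content and look each row of the other file up in that index. This is a different technique from the benchmark, not what numarray does internally. The sketch below is Python 3 with the standard library only; the function name matching_row_pairs is hypothetical, and it assumes each row is a sequence of hashable values such as the strings returned in 'list' mode.

```python
from collections import defaultdict


def matching_row_pairs(rows_a, rows_b):
    """Return (i, j) index pairs for which rows_a[i] == rows_b[j]."""
    # Index rows_b once: row contents (as a hashable tuple) -> positions.
    positions = defaultdict(list)
    for j, row in enumerate(rows_b):
        positions[tuple(row)].append(j)

    # One dictionary lookup per row of rows_a replaces the inner loop.
    pairs = []
    for i, row in enumerate(rows_a):
        for j in positions.get(tuple(row), ()):
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Same layout as test1.dat / test2.dat, built in memory.
    rows = [[str(i), str(i + 1), str(i + 2)] for i in range(5)]
    for a_i, b_i in matching_row_pairs(rows, rows):
        print(a_i, b_i)
```

The pairs come out in the same order as the nested loops would print them: ascending in the first index, then ascending in the second.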






_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/numpy-discussion



