Best parallelisation strategy in Python
On 04/18/2018 07:16 PM, simona bellavista wrote:
> I have a Fortran 90 code that is parallelised with MPI. I would like to translate it into Python, but I am not sure about the parallelisation strategy and libraries. I work on clusters where each node has 5GB of memory and 12 or 24 processors (depending on the cluster I am using). Ideally I would like to split the computation across several nodes.
> Let me explain what this code does: it reads ~100GB of data, divided into HDF5 files of ~25GB each. The code should read the data, go through it, select a fraction of it (~1GB), do some CPU-intensive work on that fraction, and repeat this process many times, say 1000 times, then write the results to a single final file.
> I was thinking that the CPU-intensive part would be written as a shared object in C.
> Do you have suggestions about which library to use?
Since your Fortran code already uses MPI, why not use MPI with Python as
well? I know there are Python bindings for MPI. That way you could use
Python while keeping the MPI workflow.
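
To make that concrete, here is a rough sketch of how the ~1000 repetitions might be spread over MPI ranks. It assumes mpi4py as the binding (one option, not necessarily the one you will settle on) and a launch along the lines of `mpiexec -n 24 python script.py`. The names `my_iterations` and `process` are made up for illustration; `process` is only a placeholder for the real "select ~1GB, then call the C routine" step, and the HDF5 reading is left out entirely.

```python
# Sketch only, assuming mpi4py is installed and the script is started via mpiexec.
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Serial fallback so the sketch still runs on a machine without MPI.
    comm, rank, size = None, 0, 1

N_ITERATIONS = 1000  # "say 1000 times" in the original question


def my_iterations(rank, size, n=N_ITERATIONS):
    """Round-robin split: rank r gets iterations r, r + size, r + 2*size, ..."""
    return range(rank, n, size)


def process(i):
    """Hypothetical placeholder for: select the data subset, run the CPU-heavy C code."""
    return i * i


def main():
    # Each rank works only on its own share of the iterations.
    local = [(i, process(i)) for i in my_iterations(rank, size)]
    # gather() returns a list of per-rank lists on the root rank and None elsewhere.
    gathered = comm.gather(local, root=0) if comm is not None else [local]
    if rank == 0:
        results = sorted(pair for part in gathered for pair in part)
        # The root rank would write the single final file at this point.
        print(len(results), "results collected")


if __name__ == "__main__":
    main()
```

One thing to watch: with 5GB per node, 12 ranks on a node that each hold a ~1GB selection would not fit in memory, so you may need to run fewer ranks per node than there are processors.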