More CPUs doesn't equal more speed
On 2019-05-23 22:41, Avi Gross via Python-list wrote:
> As others have noted, you have not made it clear how what you are doing is
> running "in parallel."
> I have a similar need where I have thousands of folders and need to do an
> analysis based on the contents of one at a time and have 8 cores available
> but the process may run for months if run linearly. The results are placed
> within the same folder so each part can run independently as long as shared
> resources like memory are not abused.
> Your need is conceptually simple. Break up the list of filenames into N
> batches of about equal length. A simple approach might be to open N terminal
> or command windows and in each one start a python interpreter by hand
> running the same program which gets one of the file lists and works on it.
> Some may finish way ahead of others, of course. If anything they do writes
> to shared resources such as log files, you may want to be careful. And there
> is no guarantee that several will not run on the same CPU. There is also
> plenty of overhead associated with running full processes. I am not
> suggesting this but it is fairly easy to do and may get you enough speedup.
> But since you only seem to need a few minutes, this won't be much.
> Quite a few other solutions involve using some form of threads running
> within a process perhaps using a queue manager. Python has multiple ways to
> do this. You would simply feed all the info needed (file names in your case)
> to a thread that manages a queue. It would allow up to N threads to be
> started and whenever one finishes, would be woken to start a replacement
> till done. Unless one such thread takes very long, they should all finish
> reasonably close to each other. Again, lots of details to make sure the
> threads do not conflict with each other. But, no guarantee which core they
> get unless you use an underlying package that manages that.
Because of the GIL in CPython, only one thread will actually be running
Python code at any time, so if the work is processor-intensive, it's better
to use multiprocessing.
Of course, if the job is already maxing out the disk, then using more cores
won't make it faster.
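
As a rough sketch of the multiprocessing route (not a drop-in solution),
something like the following hands one folder at a time to a pool of worker
processes. The names analyse_folder and run_all are made up for
illustration, and summing file sizes is only a stand-in for the real
per-folder analysis:

```python
import multiprocessing
import os
import sys


def analyse_folder(path):
    """Placeholder job (assumption): total size in bytes of the files in one folder."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_file():
            total += entry.stat().st_size
    return path, total


def run_all(folders, workers=None):
    """Process every folder in a pool of worker processes.

    workers=None lets Pool choose a worker count based on the number of CPUs.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        # imap_unordered gives a worker the next folder as soon as it is free
        # and yields results in completion order, not input order.
        return dict(pool.imap_unordered(analyse_folder, folders))


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, worker processes
    # re-import this module and must not start pools of their own.
    folders = [p for p in sys.argv[1:] if os.path.isdir(p)]
    if folders:
        for path, size in sorted(run_all(folders).items()):
            print(path, size)
    else:
        print("usage: pass one or more folder paths as arguments")
```

Because each worker picks up a new folder as soon as it finishes the last
one, a few slow folders don't leave the other cores idle the way fixed,
pre-split batches can.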