Running multiple processes in Python

I recently got a multi-core laptop, so I was keen to try some parallel processing in Python. Getting started is pretty simple; you just need to use:

os.fork()

However, the difficult part is working out what happens after the fork and how to build a program around it. When the program reaches the os.fork() call, it splits into two identical copies. But generally you don't want two copies of a program doing exactly the same thing; you want two programs doing slightly different things. Even trying to create a difference using random numbers is problematic: a random number generator created before the fork is copied along with everything else, so both copies can draw exactly the same "random" numbers.
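To see the random-number problem concretely, here is a minimal sketch (the function name draw_in_both and the seed value are illustrative assumptions). It creates a random.Random generator before forking, lets the parent and the child (the copy in which os.fork() returns 0) each draw one number, and sends the child's draw back through an os.pipe(). Because the fork copies the generator's state, both draws come out identical. Recent Python 3 releases reseed the module-level random functions in the child automatically, but a generator object created before the fork is still copied as-is.

```python
import os
import random


def draw_in_both(seed=42):
    """Fork once and return (parent_draw, child_draw) from one pre-fork generator."""
    rng = random.Random(seed)  # created *before* the fork, so both copies share its state
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: draw a number and send its exact repr back to the parent.
        os.close(read_fd)
        os.write(write_fd, repr(rng.random()).encode('ascii'))
        os.close(write_fd)
        os._exit(0)
    # Parent: draw its own number, then read the child's until end-of-file.
    os.close(write_fd)
    parent_draw = rng.random()
    data = b''
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:  # empty read: the child has closed its end of the pipe
            break
        data += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    return parent_draw, float(data.decode('ascii'))
```

Calling draw_in_both() returns a pair of equal numbers, which is exactly the "two copies doing the same thing" behaviour described above.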

Differentiating between processes

Naturally, there is a way to differentiate between the parent and child processes: os.fork() returns 0 in the child process and the child's process ID in the parent.

import os
pid = os.fork()
# Both processes run this line: the child prints 0, the parent prints the child's PID
print pid
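A quick way to confirm which value each process gets is to have the child report os.getpid() and os.getppid() back to the parent through a pipe. This is a minimal sketch; the function name fork_and_compare is hypothetical.

```python
import os


def fork_and_compare():
    """Return (pid seen by parent, child's getpid, child's getppid, parent's getpid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: os.fork() returned 0 here, so ask the OS for the real IDs.
        os.close(read_fd)
        message = '%d %d' % (os.getpid(), os.getppid())
        os.write(write_fd, message.encode('ascii'))
        os.close(write_fd)
        os._exit(0)
    # Parent: os.fork() returned the child's process ID.
    os.close(write_fd)
    data = b''
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:  # empty read: the child has closed its end of the pipe
            break
        data += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, child_ppid = [int(field) for field in data.decode('ascii').split()]
    return pid, child_pid, child_ppid, os.getpid()
```

The value the parent received equals the child's own os.getpid(), and the child's os.getppid() equals the parent's os.getpid().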

As a result, it's possible to make the parent and child processes do different things. For example, the following will write two different files with different outputs:

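import os
pid = os.fork()

if pid > 0:
    # os.fork() returned the child's process ID, so this is the parent
    fout = open('parent.txt', 'w')
    fout.write('File created by parent process')
else:
    # os.fork() returned 0, so this is the child
    fout = open('child.txt', 'w')
    fout.write('File created by child process %d' % os.getpid())

# Both processes run this, each on its own file
fout.write('\nEnd of file')
fout.close()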
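Because the parent and child run in separate copies of the program, a variable changed in one branch is not visible in the other, so anything a child computes has to leave the process some other way, such as a file. A minimal sketch to check this, assuming a temporary file from tempfile.mkstemp() as the channel; the function name is hypothetical, and os.waitpid() (covered in the next section) makes the parent wait until the child has written its file.

```python
import os
import tempfile


def child_sees_private_copy():
    """Return (parent's value, child's value) after only the child modifies `counter`."""
    counter = {'value': 1}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid == 0:
        try:
            counter['value'] += 100  # changes the child's copy only
            with open(path, 'w') as fout:
                fout.write(str(counter['value']))
        finally:
            os._exit(0)  # never fall through into the parent's code
    os.waitpid(pid, 0)  # wait until the child has finished writing
    with open(path) as fin:
        child_value = int(fin.read())
    os.remove(path)
    return counter['value'], child_value
```

The parent still sees 1 while the child saw 101.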

Waiting for a child process

If you've created a child process, the chances are you want the parent to wait for it to finish whatever it's doing before the parent continues. For this you need os.waitpid(pid, 0), which blocks until the child with that process ID has exited. For example:

import os, time

def timeConsumingFunction():
    x = 1
    for n in xrange(10000000):
        x += 1

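pid = os.fork()

if pid > 0:
    # Parent: remember the child's process ID
    child = pid
else:
    # Child: do the work, then exit without running the parent's code below
    timeConsumingFunction()
    os._exit(0)

t = time.time()
os.waitpid(child, 0)  # block until the child has exited
print time.time() - t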

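Here, the parent process splits off a child which counts to ten million while the parent waits. Once the child has finished running timeConsumingFunction, it exits with os._exit(0). Note that child processes should use os._exit(0) rather than the usual sys.exit(0): os._exit ends the process immediately, without raising SystemExit, calling cleanup handlers or flushing stdio buffers, so the child doesn't repeat cleanup work (such as flushing output buffered before the fork) that belongs to the parent. The 0 indicates that the process exited without errors. Once the child has finished, the parent prints the time it spent waiting for it.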
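The exit code passed to os._exit() does reach the parent, but encoded inside the status value that os.waitpid() returns alongside the child's PID. A minimal sketch of decoding it with os.WIFEXITED() and os.WEXITSTATUS(); the function name is hypothetical.

```python
import os


def exit_status_of_child(code):
    """Fork a child that exits with `code`; return the code as seen by the parent."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # skip cleanup handlers, as described above
    _, status = os.waitpid(pid, 0)  # status is an encoded int, not the code itself
    if os.WIFEXITED(status):  # True when the child ended by calling exit/_exit
        return os.WEXITSTATUS(status)
    return None  # e.g. the child was killed by a signal
```

For example, exit_status_of_child(0) returns 0 and exit_status_of_child(3) returns 3, so the parent can tell a clean exit from an error.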

Multiple forks

To create multiple forks, we can use a loop. In this case, using os._exit(0) is vital to ensure that the child processes don't continue the loop, forking off even more children.

import os, time

NUM_PROCESSES = 7

def timeConsumingFunction():
    x = 1
    for n in xrange(10000000):
        x += 1

children = []

start_time = time.time()
for process in range(NUM_PROCESSES):
    pid = os.fork()
    if pid:
        children.append(pid)
    else:
        timeConsumingFunction()
        os._exit(0)

# Wait for every child to finish
for child in children:
    os.waitpid(child, 0)

print time.time() - start_time
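The loop above gives every child the same job. To have the children do slightly different things, each one can work on its own slice of a problem and send its result back to the parent. A minimal sketch, assuming a simple summation stands in for real work and using one os.pipe() per child; parallel_sum and its num_processes parameter are hypothetical.

```python
import os


def parallel_sum(n, num_processes=4):
    """Sum range(n) by giving each child a different slice of the work."""
    chunk = (n + num_processes - 1) // num_processes
    children = []  # (pid, read_fd) pairs
    for i in range(num_processes):
        start, stop = i * chunk, min((i + 1) * chunk, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: sum only its own slice and write the result to the pipe.
            os.close(read_fd)
            os.write(write_fd, str(sum(range(start, stop))).encode('ascii'))
            os.close(write_fd)
            os._exit(0)
        os.close(write_fd)  # parent only reads from this pipe
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        data = b''
        while True:
            part = os.read(read_fd, 4096)
            if not part:  # end-of-file: this child has closed its end
                break
            data += part
        os.close(read_fd)
        os.waitpid(pid, 0)
        total += int(data.decode('ascii'))
    return total
```

The parent closes each pipe's write end right after forking, so later children don't inherit it and each read loop ends as soon as its own child exits.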


Comments

Thanks dude. I'm still using Python 2.4 due to Centos 5 and OS restrictions. :(
This is better than trying to install the backported processing module, making the script more portable :)

Awesome explanation man! Thanks!

This is somewhat incorrect :

pid = os.fork()

if pid > 0:
    fout = open('child.txt', 'w')
    fout.write('File created by child process %d' % pid)
else:
    fout = open('parent.txt', 'w')
    fout.write('File created by parent process')
From the docs, it's the other way around:

"Return 0 in the child and the child’s process id in the parent."

Sorry to make a fuss about 3-year-old code :)

Thanks anyway to all!!! Old, but there are SOOO many things to learn...!! Pufff...

A few years later but I come with a question:)

How do you limit for example that you have maximum 5 processes running a task at a time?

If you waitpid on the process, you block the main thread execution.

Is there any way to check in a loop the number of running forked processes and if number < "predefined number", then fork a new one and if not just sleep again for a while?
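One way to do what the question describes is to poll with os.waitpid(-1, os.WNOHANG), which reaps any child that has already exited and returns (0, 0) straight away when children exist but none has finished, so the parent never blocks on one particular child. A minimal sketch, assuming the program has no other child processes (waitpid(-1, ...) would reap those too) and max_running is at least 1; run_limited, reap_finished and MAX_RUNNING are hypothetical names.

```python
import os
import time

MAX_RUNNING = 5  # the limit from the question above


def reap_finished(running):
    """Drop children that have already exited from `running`, without blocking."""
    while running:
        pid, _ = os.waitpid(-1, os.WNOHANG)
        if pid == 0:  # children still running, none finished yet
            break
        running.discard(pid)


def run_limited(tasks, max_running=MAX_RUNNING, poll_interval=0.01):
    """Run each callable in `tasks` in its own child, at most `max_running` at once.

    Returns the largest number of unreaped children at any point, which is an
    upper bound on how many ran at the same time.
    """
    running = set()
    peak = 0
    for task in tasks:
        # Poll for a free slot, sleeping briefly while all slots are taken.
        while len(running) >= max_running:
            reap_finished(running)
            if len(running) >= max_running:
                time.sleep(poll_interval)
        pid = os.fork()
        if pid == 0:
            code = 1  # reported if task() raises
            try:
                task()
                code = 0
            finally:
                os._exit(code)  # never fall back into the parent's loop
        running.add(pid)
        peak = max(peak, len(running))
    for pid in running:  # wait for the remaining children
        os.waitpid(pid, 0)
    return peak
```

With twelve short tasks and a limit of 5, run_limited forks five children, then starts each further task only after reaping a finished one.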

