
Python Multiprocessing Module
Ali Alzabarah

1. Introduction
2. Python and concurrency
3. Multiprocessing vs. threading
4. Multiprocessing module
5. Pool of workers
6. Distributed concurrency
7. Credit where credit is due
8. References

Introduction

A thread is a thread of execution in a program, also known as a lightweight process. A process is an instance of a computer program that is being executed.

Threads share the memory and state of their parent process. Processes share nothing. Processes therefore use inter-process communication (IPC) to communicate, while threads do not need it.

Python and Concurrency

Python has three concurrency modules: thread, threading and multiprocessing.

thread: provides low-level primitives for working with multiple threads. It was Python's first threading implementation and is old. In Python 3000 (Python 3) it is no longer a public module; it was renamed to _thread.

threading: builds a higher-level threading interface on top of the thread module.

multiprocessing: supports spawning processes and offers both local and remote concurrency. It is new in Python 2.6 and solves the main limitation of the threading module.

Why a new module? The Global Interpreter Lock (GIL) prevents true parallelism on multiprocessor machines.

What is the GIL? It is a lock that a thread must acquire before it can enter the interpreter. The lock ensures that only one thread executes in the CPython virtual machine at a time.

How does the GIL work? It governs the transfer of control between threads. The Python interpreter decides how long each thread's turn runs, NOT the hardware timer. Python uses OS threads underneath, but Python itself controls when control passes from one thread to another. For this reason, the threading module cannot achieve true parallelism for CPU-bound Python code, so the multiprocessing module was introduced to solve this issue.

"Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities."
Guido van Rossum
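The introduction claims that threads share their parent's memory while processes share nothing and must use IPC. A small experiment makes this concrete. The sketch below is an illustration, not code from the original slides. The names counter, increment, increment_and_report, thread_changes_parent and process_changes_parent are hypothetical. It uses multiprocessing.Pipe, one of the IPC channels discussed later, to bring the child's value back to the parent.

```python
import multiprocessing as mp
import threading

counter = 0  # hypothetical module-level state used only for this demo


def increment():
    global counter
    counter += 1


def increment_and_report(conn):
    # Runs in a child process: it changes the child's own copy of counter,
    # then sends that value to the parent over the pipe (IPC).
    increment()
    conn.send(counter)
    conn.close()


def thread_changes_parent():
    """A thread runs inside the parent's memory, so its change is visible here."""
    global counter
    counter = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter


def process_changes_parent():
    """A child process works on its own copy; the parent's counter is untouched."""
    global counter
    counter = 0
    parent_end, child_end = mp.Pipe()
    p = mp.Process(target=increment_and_report, args=(child_end,))
    p.start()
    child_value = parent_end.recv()  # the only way the child's result gets back
    p.join()
    return counter, child_value


if __name__ == "__main__":
    print(thread_changes_parent())   # prints 1
    print(process_changes_parent())  # prints (0, 1)
```

The `if __name__ == "__main__":` guard matters on platforms that start children with the spawn method (Windows, and macOS by default in recent Python 3 releases). Under spawn, each child re-imports the main module.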
Multiprocessing vs. Threading

Let's see the problem in action. I analyzed a program written by Jesse Noller in depth. I used the cProfile and pstats modules to get an idea of how Python handled the code. The tests ran on a quad-core machine (8 CPUs).

Single thread: the program took 52.810 CPU seconds. Most of the time was spent executing the isPrime and sum_primes functions.

Multiple threads: the program took 59.337 CPU seconds. That is more than the single-threaded version of the same program took. Most of the time was spent in a built-in method, acquire. Wait: the code never calls the threading acquire method. This built-in acquire must be the GIL.

Multiple processes: the program took only 11.968 seconds. That is about 5 times faster than the multithreaded version and about 4.4 times faster than the single-threaded one. Most of that time was spent waiting for the other processes to finish.

So how does the multiprocessing module solve the problem? It uses subprocesses instead of threads. This lets the programmer fully leverage multiple processors on a given machine.

Differences between threading and multiprocessing syntax: they are almost the same.
Threading:       threading.Thread(target=do_work, args=(work_queue,))
Multiprocessing: multiprocessing.Process(target=do_work, args=(work_queue,))

I am not going to cover all the functionality the multiprocessing module provides, only what is new. The multiprocessing module mirrors the threading API, so the functionality the threading module provides is also available in the multiprocessing module.

Multiprocessing Module

Remember: processes share nothing, and they communicate over an inter-process communication channel. This was not an issue with the threading module. The Python developers had to find a way for processes to communicate and share data; otherwise the module would not be as efficient as it is.

Exchanging Objects between Processes

The multiprocessing module has two communication channels: Queues and Pipes.

Queue: returns a process-shared queue. Any picklable object can pass through it, and it is both thread-safe and process-safe.

Pipe: returns a pair of connection objects connected by a pipe. Each connection object has send() and recv() methods that the processes use to communicate.

Let's see an example. Queues, simple example: the program creates two queues. The tasks queue holds a range of integers. The results queue starts empty and is used to store the results. The program then creates n workers. Each worker gets a number from the shared tasks queue, multiplies it by 2 and puts the product on the results queue.

Observation: the results are not in order, even though the tasks queue was in order. This is because the workers run in parallel. Queue.get() returns an item to the worker and removes it from the queue.

[Figure: part of the output]

Sharing State between Processes

The multiprocessing module has two ways to share state (shared data) between processes: shared memory and a server process.

Shared memory: Python provides two ways to store data in a shared memory map.
Value: returns a synchronized wrapper for the object.
Array: returns a synchronized wrapper for the array.

Server process: a Manager object controls a server process that holds Python objects and allows other processes to manipulate them.

What is a Manager? It controls the server process that manages the shared objects. It makes sure a shared object is updated in all processes whenever any process modifies it.

Let's see an example of sharing state between processes. The program creates a Manager list and shares it among n workers, and each worker updates one index. After all workers finish, the new list is printed to stdout.

[Figure: server process, simple example]

Observation: we did not have to worry about synchronizing access to the list. The manager took care of that: all processes see the same list and act on one shared list.

[Figure: result when n = 10000]

Multiprocessing Module: Summary

Summary of the last 10 slides:
Communication channels: Queues, Pipes
Shared data: shared memory (Value, Array) and server process (Manager)

Let's discover other cool features of the module.

Pool of Workers

The multiprocessing module has a Pool class that distributes the work among the workers and collects the return values as a list. You do not have to worry about managing queues, processes, shared data, stats
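The slides describe four pieces of multiprocessing: a Queue example, the Value/Array shared-memory wrappers, a Manager list example and the Pool class. The code shown on those slides did not survive. The following is a minimal sketch of each, written against the standard multiprocessing API, and it is not the author's code. It assumes Python 3.3 or later, which allows Manager and Pool in a with statement. All function names are hypothetical. The value each Manager worker writes (i * i) is also an assumption, since the slides do not say what is written. The original used n = 10000 workers; this sketch scales that down.

```python
import multiprocessing as mp


# --- Queues: workers double numbers taken from a shared tasks queue ---
def double_worker(tasks, results):
    # iter(tasks.get, None) keeps calling tasks.get() until it returns the None sentinel
    for n in iter(tasks.get, None):
        results.put(n * 2)


def run_queue_example(count=10, n_workers=4):
    tasks, results = mp.Queue(), mp.Queue()
    for i in range(count):
        tasks.put(i)
    for _ in range(n_workers):
        tasks.put(None)  # one stop sentinel per worker
    workers = [mp.Process(target=double_worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # Drain the results before join(): joining a process whose queued items
    # have not been consumed yet can deadlock.
    out = [results.get() for _ in range(count)]
    for w in workers:
        w.join()
    return out  # order depends on scheduling, as the slides observe


# --- Shared memory: Value and Array return synchronized wrappers ---
def bump(counter, arr, i):
    with counter.get_lock():  # += is read-modify-write, so hold the lock explicitly
        counter.value += 1
    arr[i] = arr[i] * 10


def run_shared_memory_example(n=4):
    counter = mp.Value("i", 0)      # shared C int
    arr = mp.Array("d", [1.0] * n)  # shared array of C doubles
    workers = [mp.Process(target=bump, args=(counter, arr, i)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value, arr[:]


# --- Server process: a Manager list where each worker updates one index ---
def set_index(shared, i):
    shared[i] = i * i  # hypothetical update; the slides do not say what is written


def run_manager_example(n=8):
    with mp.Manager() as manager:
        shared = manager.list([0] * n)
        workers = [mp.Process(target=set_index, args=(shared, i)) for i in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return shared[:]  # copy out a plain list before the manager shuts down


# --- Pool: distributes the calls and collects the return values as a list ---
def square(x):
    return x * x


def run_pool_example(values):
    with mp.Pool(processes=4) as pool:
        return pool.map(square, values)  # results keep the input order


if __name__ == "__main__":
    print(sorted(run_queue_example()))
    print(run_shared_memory_example())
    print(run_manager_example())
    print(run_pool_example(range(10)))
```

Pool.map returns results in input order. In the Queue example, by contrast, the order depends on which worker finishes first. Pool also removes the bookkeeping visible in run_queue_example: no sentinels, no manual draining, no join loop.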



CU-Boulder CSCI 5828 - Python Multiprocessing Module
