Multiprocessing, Multithreading, and GIL: Essential concepts for every Python developer

Multithreading vs Multiprocessing

Multithreading

  • A single process containing multiple code segments that can run concurrently
  • Each code segment is called a thread. A process with multiple threads is called a multi-threaded process
  • The process memory is shared among the threads, so thread A can access variables created by thread B
  • Gives the impression of parallel execution, but this is concurrency, which is not the same as parallelism. Threads can, however, run in parallel in a multi-core environment (more on this later)
  • Threads are cheaper to create and to throw away than processes
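
To make the shared-memory point concrete, here is a minimal sketch in which two threads append to the same list. The names `shared_log`, `worker`, and `run_threads` are made up for this illustration, and the lock is there only to keep the appends orderly.

```python
import threading

shared_log = []               # lives in the process, visible to every thread
log_lock = threading.Lock()   # guards concurrent appends to the shared list


def worker(name, count):
    # Each thread writes into the same list object; nothing is copied.
    for i in range(count):
        with log_lock:
            shared_log.append((name, i))


def run_threads():
    shared_log.clear()
    threads = [
        threading.Thread(target=worker, args=(f"thread-{n}", 3))
        for n in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until both threads have finished
    return list(shared_log)


if __name__ == "__main__":
    print(run_threads())
```

Both threads' entries end up in one list, although the order in which they interleave can differ from run to run.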

Multiprocessing

  • Multiple processes working independently of each other. Each process can have one or more threads; a single thread is the default.
  • Each process has its own memory space, so process A cannot directly access the memory of process B
  • Two different processes can run on two different cores in parallel, independently of each other
  • Creating and throwing away processes carries significant overhead
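
The separate memory spaces can be seen in a small sketch like the one below. The names `data`, `child`, and `run_isolation_demo` are hypothetical, and the optional `ctx` argument is only a convenience for choosing a start method. The child changes its own copy of the list and reports it back through a queue, while the parent's list stays untouched.

```python
import multiprocessing

data = [1, 2, 3]  # module-level state; every process works on its own copy


def child(queue):
    data.append(99)        # modifies only the child's copy of the list
    queue.put(list(data))  # send the child's view back to the parent


def run_isolation_demo(ctx=None):
    # ctx is an assumption of this sketch: pass a multiprocessing context
    # to pick a start method, or leave it as None for the platform default.
    ctx = ctx or multiprocessing.get_context()
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    child_view = queue.get(timeout=10)  # read before join so the child can exit
    proc.join()
    return child_view, list(data)


if __name__ == "__main__":
    child_view, parent_view = run_isolation_demo()
    print("child sees: ", child_view)   # [1, 2, 3, 99]
    print("parent sees:", parent_view)  # [1, 2, 3]
```

Passing data between processes therefore needs an explicit channel such as the queue used here, which is part of the overhead mentioned above.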

What’s so different in Python?

Introducing the Global Interpreter Lock (GIL)

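import sys

a = 'Hello World'
b = a
c = a
# typically 4 in an interactive session: a, b, c, plus the temporary reference
# held by the call itself; the exact number depends on the Python version
sys.getrefcount(a)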

The impact of the GIL in multithreaded Python programs

The Remedy

Multithreading with ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor, wait


def scrape_page(url):
    # ... scraping logic
    # remove the following line when you have written the logic
    raise NotImplementedError


def batch_scrape(urls):
    tasks = []
    with ThreadPoolExecutor(max_workers=8) as executor:
        for url in urls:
            # executor.submit takes the function to run as its first argument;
            # every argument after that is passed on to that function
            tasks.append(executor.submit(scrape_page, url))
        wait(tasks)


if __name__ == "__main__":
    urls = ['https://google.com', 'https://facebook.com']
    batch_scrape(urls)
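
One thing to keep in mind with the pattern above: `wait` only blocks until the futures finish. An exception raised inside a worker is stored on its future and stays silent until `result()` is called. The sketch below shows one possible way to collect results and errors. `fake_fetch` and `batch_fetch` are hypothetical stand-ins, with a short sleep mimicking network latency instead of real scraping.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url):
    """Hypothetical stand-in for scrape_page: sleeps instead of doing I/O."""
    if not url.startswith("https://"):
        raise ValueError(f"unsupported URL: {url}")
    time.sleep(0.1)   # pretend to wait on the network
    return len(url)   # pretend result


def batch_fetch(urls, max_workers=8):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for.
        future_to_url = {executor.submit(fake_fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()  # re-raises the worker's exception
            except Exception as exc:
                errors[url] = exc
    return results, errors


if __name__ == "__main__":
    ok, failed = batch_fetch(["https://google.com", "htpps://facebook.com"])
    print(ok)
    print(failed)
```

Here the misspelled scheme in the second URL ends up in `failed` rather than being silently dropped.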

Multiprocessing with ProcessPoolExecutor

from concurrent.futures import ProcessPoolExecutor, wait


def encode_video(file):
    # ... encoding logic
    # remove the following line when you have written the logic
    raise NotImplementedError


def batch_encode(files):
    tasks = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for file in files:
            tasks.append(executor.submit(encode_video, file))
        wait(tasks)


if __name__ == "__main__":
    filePaths = ['file1.mp4', 'file2.mp4']
    batch_encode(filePaths)
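
For a version that can be run without any video files, the sketch below swaps the encoder for a small CPU-bound function. `count_primes` and `batch_count` are hypothetical names chosen for this illustration, and `executor.map` is used in place of `submit` plus `wait` so that results come back in input order.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound stand-in for real work such as video encoding."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    )


def batch_count(limits, max_workers=2, mp_context=None):
    # mp_context is forwarded to ProcessPoolExecutor; None means the
    # platform's default start method.
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=mp_context) as executor:
        # map keeps the input order and re-raises any worker exception
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    print(batch_count([10, 100, 1000]))  # [4, 25, 168]
```

The worker function is defined at module level because the pool has to pickle it to send work to the child processes, and the `if __name__ == "__main__":` guard keeps child processes from re-running the batch on platforms that start workers by importing the module.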

Why GIL?

  1. Other languages such as Java and C++ take different approaches to thread safety, for example finer-grained locks, which can cost single-threaded performance. Runtimes like the JVM offset that cost with techniques such as JIT compilation.
  2. If you add multiple locks, deadlocks become possible. Constantly acquiring and releasing locks also hurts performance. These problems are hard to overcome while keeping the language features people like. The GIL is a single lock and simple to implement.
  3. Python is popular and widely used partly because of its support for C extension libraries. Those C libraries needed a thread-safe solution. The GIL is a single lock on the interpreter, so it cannot deadlock with itself, and it is simpler to implement and maintain. So ultimately the GIL was chosen to support all those C extensions.
  4. Developers and researchers have tried to remove the GIL in the past, but the result was a significant performance drop for single-threaded applications, and most general applications are single-threaded. Removing it also broke the underlying C libraries on which Python heavily depends. Something as central as the GIL cannot be removed without causing backward compatibility issues or slowing down performance. Work on removing it continues: CPython 3.13 ships an optional, experimental free-threaded build (PEP 703).
  5. Ultimately, the GIL's limitations have rarely stood in the way of writing large and complex applications. Multiprocessing is still there for CPU-bound work, and modern computers have enough resources and memory to absorb the overhead that comes with it.

Ahmed Sadman Muhib

I’m a Software Engineer interested in Web, Cloud and Cross-platform technologies. Favorite things of mine are reading, watching movies and eating burgers