Fading Coder

One Final Commit for the Last Sprint


Understanding the Global Interpreter Lock and Concurrency in Python

Tech · May 16

Process Definition

A process is a running instance of a program. It is the basic unit of resource allocation within an operating system.

Thread Definition

A thread is the smallest unit responsible for execution within a process. It acts as an entity within the process, scheduled independently by the system. While threads do not own system resources themselves, they share the resources owned by the process with other threads in the same group. Threads can create or terminate other threads, and multiple threads within a single process can execute concurrently.

Differences Between Processes and Threads

  • A process is the basic unit of resource allocation by the OS; a thread is the basic unit of execution (CPU scheduling) inside a process. A process must contain at least one thread (the main thread).
  • Creating and destroying threads is lightweight; doing the same for processes is resource-intensive.
  • Context switching is faster between threads than between processes.
  • Threads within the same process can communicate directly via shared memory, whereas processes often require intermediaries like Queue or Pipe for inter-process communication (IPC).
  • Multiple processes can leverage multi-core CPUs (true parallelism), whereas standard Python threads are limited to a single core due to the GIL.
  • A new process is typically a clone of the parent process, whereas a new thread is spawned within the existing memory space.
  • The primary goal of multiprocessing is to utilize multiple CPU cores, while multithreading aims to handle waiting tasks (like I/O) on a single core efficiently.

In multiprocessing, although the memory spaces are distinct, the initial resources are often copied from the parent process.
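A quick way to see the last two points in practice: threads run inside the parent's process, while a new process gets its own PID. A minimal sketch (the function name and labels are ours):

```python
import os
import threading
from multiprocessing import Process

def show_pid(label):
    # Print which OS process this code is running in
    print(label, os.getpid())

if __name__ == '__main__':
    show_pid('main')
    t = threading.Thread(target=show_pid, args=('thread',))
    t.start()
    t.join()  # prints the same PID as 'main'
    p = Process(target=show_pid, args=('process',))
    p.start()
    p.join()  # prints a different PID
```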

Process Pools

A process pool provides a controlled way to manage a specific number of worker processes. When a request is submitted to the pool:

  • If the pool is not full, a new process is created to handle the task.
  • If the pool has reached its limit, the request waits until a process becomes available.

Key Benefits:

  • Limits the maximum number of concurrent processes to prevent resource exhaustion.
  • Automates resource cleanup, saving memory.
  • Improves efficiency by reusing processes rather than creating and destroying them for every task.

import os
import time
from multiprocessing import Pool

counter = 100

def task_handler(worker_name):
    print(f'Worker PID: {os.getpid()}, Name: {worker_name}')
    global counter
    for _ in range(3):
        counter += 1
        print(counter)
    time.sleep(1)

if __name__ == '__main__':
    print(f'Main Process PID: {os.getpid()}')
    worker_pool = Pool(processes=5)
    
    worker_pool.apply_async(task_handler, args=('Worker-A',))
    worker_pool.apply_async(task_handler, args=('Worker-B',))
    
    worker_pool.close()  # Prevent new tasks from being submitted
    worker_pool.join()  # Wait for all tasks to complete
    
    print('Main execution finished')
    print(f'Global counter in main: {counter}')

Output Analysis:
The main process counter remains 100 because the subprocesses operate on their own memory copy of the variable.
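Since each subprocess works on its own copy of the data, getting results back to the parent requires explicit IPC, for example a multiprocessing.Queue. A minimal sketch (the worker function is ours):

```python
from multiprocessing import Process, Queue

def worker(q):
    # Runs in a separate memory space; results must be sent back explicitly
    q.put('hello from child')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # blocks until the child puts a message
    p.join()
```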

CPU-Bound vs I/O-Bound Tasks

  • CPU-Bound: Tasks that require heavy computation, such as calculating pi, floating-point operations, or video rendering. These benefit from multi-core processing (Multiprocessing).
  • I/O-Bound: Tasks involving network requests, disk access, or user input (e.g., Web applications). These benefit from threading as the CPU can switch to other threads while waiting.
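The I/O-bound case can be sketched with simulated waits: five threads sleeping 0.2 s each finish in roughly 0.2 s total, not 1 s, because a sleeping thread releases the GIL (timing numbers are illustrative):

```python
import threading
import time

def io_task():
    time.sleep(0.2)  # stands in for a network or disk wait; releases the GIL

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f'5 overlapping 0.2s waits took {elapsed:.2f}s')
```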

The Global Interpreter Lock (GIL)

The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes simultaneously within a single process.

  • When a thread runs, it holds the GIL, blocking other threads in the same process.
  • If a thread encounters an I/O operation, or its switch interval (5 ms by default in CPython) expires, it releases the GIL so another thread can run.
  • Consequently, Python threads in a single process are concurrent (taking turns) rather than parallel (running at the exact same time on multiple cores).
  • For I/O-bound programs, multithreading can still be faster because the GIL is released during waiting periods.
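The "switch interval" mentioned above is inspectable and tunable through the sys module (a small sketch):

```python
import sys

# A running thread is asked to release the GIL roughly every
# switch interval; the CPython default is 5 ms.
print(sys.getswitchinterval())   # 0.005 by default
sys.setswitchinterval(0.01)      # longer slices: fewer switches, worse latency
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)     # restore the default
```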

Why Synchronization Locks Are Needed Despite the GIL

While the GIL ensures that only one thread executes Python bytecode at a time, it does not protect specific data operations from race conditions. The GIL is applied at the interpreter level, not the logic level.

  • GIL: A mechanism to manage byte-code execution permission.
  • Synchronization Lock (Mutex): A mechanism to protect shared data. It ensures that a specific block of code (e.g., modifying a variable) completes without interruption from other threads.

Data Corruption Example (Without Lock)

The operation value += 1 is not atomic: it involves reading the current value, adding one, and writing the result back. Without a lock, threads can interleave these steps and lose updates.
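You can see those separate steps in the bytecode with the dis module (a small sketch; the function name is ours):

```python
import dis

def bump(value):
    # "value += 1" compiles into separate load, add, and store
    # instructions, and a thread switch can happen between them
    value += 1
    return value

dis.dis(bump)
```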

import threading

shared_value = 0

def increment_task():
    global shared_value
    for _ in range(1000000):
        shared_value += 1

if __name__ == '__main__':
    thread_a = threading.Thread(target=increment_task)
    thread_b = threading.Thread(target=increment_task)
    
    thread_a.start()
    thread_b.start()
    
    thread_a.join()
    thread_b.join()
    
    print(f'Final value: {shared_value}')

Result: The final value will likely be less than 2,000,000, because concurrent read-modify-write updates overwrite each other. (On some recent CPython versions the race is harder to trigger, but the code is still incorrect.)

Implementing a Synchronization Lock (Context Manager)

Using with ensures the lock is acquired before entering the block and released automatically on exit, even if an exception occurs.

import threading
from threading import Lock

shared_value = 0
mutex = Lock()

def safe_increment():
    global shared_value
    for _ in range(1000000):
        with mutex:
            shared_value += 1

if __name__ == '__main__':
    t1 = threading.Thread(target=safe_increment)
    t2 = threading.Thread(target=safe_increment)
    
    t1.start()
    t2.start()
    
    t1.join()
    t2.join()
    
    print(f'Corrected Final value: {shared_value}')

Implementing a Synchronization Lock (Manual Acquire/Release)

The same protection can be written by calling acquire() and release() explicitly.

import threading
from threading import Lock

shared_value = 0
mutex = Lock()

def safe_increment_manual():
    global shared_value
    for _ in range(1000000):
        mutex.acquire()
        shared_value += 1
        mutex.release()

if __name__ == '__main__':
    t1 = threading.Thread(target=safe_increment_manual)
    t2 = threading.Thread(target=safe_increment_manual)
    
    t1.start()
    t2.start()
    
    t1.join()
    t2.join()
    
    print(f'Corrected Final value: {shared_value}')
