Optimized Python IPC: Uses shared memory to bypass multiprocessing queue I/O bottlenecks, ideal for large data (1MB+) in scientific computing, RL, etc. Reduces system load and improves latency
Python's standard `multiprocessing.Queue` relies on pipe-based IPC (on Windows, `_winapi.CreateFile`) for inter-process communication, introducing significant I/O overhead. This can become a performance bottleneck in demanding applications like Reinforcement Learning, scientific computing, or distributed systems that transfer large amounts of data (e.g. NumPy arrays or tensors) between processes (actors, replay buffers, trainers, etc.).
py-sharedmemory provides an alternative that utilizes `multiprocessing.shared_memory` (and therefore `_winapi.CreateFileMapping` on Windows) for the main data and sends only lightweight metadata through the queues. This eliminates most inter-process I/O, reducing system load and latency. If you're hitting performance limits with standard queues, py-sharedmemory may help.
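To illustrate the underlying idea (this is a simplified, single-process sketch using only the standard library, not py-sharedmemory's actual internals): the bulk payload goes into a named `SharedMemory` block, and only a small metadata tuple would travel through the queue.

```python
from multiprocessing import shared_memory

payload = b"x" * 1_000_000  # stand-in for a large NumPy array or tensor

# Sender side: copy the payload into a named shared memory block.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload
metadata = (shm.name, len(payload))  # only this tiny tuple goes through the queue

# Receiver side: reattach to the block by name and read the payload
# without pushing the bytes themselves through a pipe.
name, size = metadata
peer = shared_memory.SharedMemory(name=name)
received = bytes(peer.buf[:size])

peer.close()
shm.close()
shm.unlink()  # free the OS resources once every process is done with the block
```

The receiver gets the data by name lookup rather than by byte transfer, which is why the per-message I/O cost stays flat regardless of payload size.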
Usage example
```python
import multiprocessing as mp
from memory import create_shared_memory_pair, SharedMemorySender, SharedMemoryReceiver

def producer_sm(sender: SharedMemorySender):
    your_data = "your data"
    sender.put(your_data)               # blocks until space is available
    sender.put(your_data, timeout=3)    # raises queue.Full exception after 3s
    sender.put(your_data, block=False)  # raises queue.Full exception if no space available
    sender.put_nowait(your_data)        # ^ equivalent to above
    # ...
    # wait for all data to be received before closing the sender
    # to properly close all shared memory objects
    sender.wait_for_all_ack()

def consumer_sm(receiver: SharedMemoryReceiver):
    data = receiver.get()             # blocks
    data = receiver.get(timeout=3)    # raises queue.Empty exception after 3s
    data = receiver.get(block=False)  # raises queue.Empty exception if no data available
    data = receiver.get_nowait()      # ^ equivalent to above
    # ...

if __name__ == '__main__':
    sender, receiver = create_shared_memory_pair(capacity=5)
    mp.Process(target=producer_sm, args=(sender,)).start()
    mp.Process(target=consumer_sm, args=(receiver,)).start()
```
Considerations
There is a certain overhead to allocating shared memory, which is especially noticeable for smaller objects. Use the following heuristic depending on the size of the data you are handling:
| | 10B | 100B | 1KB | 10KB | 100KB | 1MB | 10MB | 100MB | 1GB | 10GB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `mp.Queue()` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `py-sharedmemory` | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
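If message sizes vary at runtime, the heuristic above can be applied per payload. A minimal sketch (the `ONE_MB` threshold comes from the table; the function name is illustrative, not part of the library):

```python
ONE_MB = 1_000_000

def use_shared_memory(payload_size_bytes: int) -> bool:
    """Per the table above: shared memory pays off from roughly 1MB upward;
    below that, the allocation overhead outweighs the saved queue I/O."""
    return payload_size_bytes >= ONE_MB
```

A dispatcher could then route small control messages through a plain `mp.Queue()` and large tensors through py-sharedmemory.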
Performance Testing
I benchmarked data transfer performance using both the standard `multiprocessing.Queue` and my py-sharedmemory implementation:
Starting around 1MB per message, py-sharedmemory matches or slightly trails the standard queue in speed. However, the key advantage is that it avoids generating I/O, which becomes critical at larger data sizes. Notably, the standard implementation fails with 10GB messages, while py-sharedmemory handles them reliably.
Here’s the I/O load on my Windows system using the standard queue:
And here’s the I/O load using py-sharedmemory:
In practice, py-sharedmemory delivers smoother and more stable performance, with consistent put/get times and no slowdowns, especially under high data throughput.