Sharing Python data between processes using mmap

I’ve been toying with the idea of exposing statistics for a Python application via shared memory, to keep the performance impact on the application as low as possible. The goal is for an application to passively expose a number of metrics that can be polled periodically by munin/Icinga/etc. plugins, or read by interactive tools when diagnosing issues on a system.

But first things first: I need to put data into shared memory from Python. mmap is an excellent widely-implemented POSIX system call for creating a shared memory space backed by an on-disk file.

Usually in the UNIX world you have 2 ways of accessing/manipulating data: memory addresses or streams (files). Manipulating data via memory addresses means pointers, offsets, malloc/free, etc. Stream interfaces manipulate data via read/write/seek system calls for files and send/recv/etc for sockets.

mmap gives you both interfaces. A memory mapped file can be manipulated via read/write/seek or by directly accessing its mapped memory region. The advantage of the latter is that this memory region is in userspace — meaning you can manipulate a file without incurring the overhead of write system calls for every manipulation.
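
To make the two interfaces concrete, here’s a minimal sketch that touches the same mapping both ways. It is written for Python 3 (the full listings in this post target 2.7), and the function name and scratch file are made up for illustration.

```python
import mmap
import os
import tempfile


def demo_both_interfaces():
    # Hypothetical scratch file; any regular file of nonzero size would do.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'\x00' * mmap.PAGESIZE)
        # With no extra arguments the mapping is shared and read/write.
        buf = mmap.mmap(fd, mmap.PAGESIZE)

        # Stream style: the mmap object keeps a file position.
        buf.seek(0)
        buf.write(b'hello')
        buf.seek(0)
        via_stream = buf.read(5)

        # Memory style: index and slice it like a mutable byte array.
        buf[0:5] = b'HELLO'
        via_slice = buf[0:5]

        buf.close()
        return via_stream, via_slice
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo_both_interfaces() hands back the bytes read through each interface, both coming from the same five bytes of the mapping.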

Anyway, enough exposition, let’s see some code. (Despite mmap’s nice feature set, I’m only using it here as a simple memory-sharing mechanism.) The following code shares a tiny bit of data between 2 Python processes using the excellent mmap module in the stdlib: a.py writes to the memory-mapped region, and b.py reads the data out. ctypes offers an easy way to create values in a memory-mapped region and manipulate them like “normal” Python objects.

These code samples were written using Python 2.7 on Linux. They should work fine on any POSIX system, but Windows users will have to change the mmap calls, because on Windows mmap.mmap takes tagname and access arguments rather than flags and prot.
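
One way to sidestep the platform difference is the access keyword, which mmap.mmap accepts on both Unix and Windows. The sketch below is written for Python 3 and is not what the listings in this post use; open_shared_map and demo are hypothetical helper names, and the file is assumed to already be at least size bytes long.

```python
import mmap
import os
import tempfile


def open_shared_map(path, size=mmap.PAGESIZE, writable=True):
    """Map `path` using the portable `access` keyword instead of flags/prot."""
    mode = 'r+b' if writable else 'rb'
    access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
    with open(path, mode) as f:
        # CPython duplicates the descriptor, so the mapping stays usable
        # after the file object is closed.
        return mmap.mmap(f.fileno(), size, access=access)


def demo():
    # Hypothetical scratch file, pre-sized to one page.
    fd, path = tempfile.mkstemp()
    os.write(fd, b'\x00' * mmap.PAGESIZE)
    os.close(fd)
    try:
        writer = open_shared_map(path)
        reader = open_shared_map(path, writable=False)
        writer[0:3] = b'foo'
        seen = reader[0:3]  # the read-only mapping sees the write
        reader.close()
        writer.close()
        return seen
    finally:
        os.unlink(path)
```

ACCESS_WRITE gives a shared, write-through mapping and ACCESS_READ a read-only one, which roughly covers the roles a.py and b.py play.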

a.py

#!/usr/bin/env python
import ctypes
import mmap
import os
import struct
 
 
def main():
    # Create new empty file to back memory map on disk
    fd = os.open('/tmp/mmaptest', os.O_CREAT | os.O_TRUNC | os.O_RDWR)
 
    # Zero out the file to ensure it's the right size. The write is kept
    # outside the assert so it still runs under "python -O".
    written = os.write(fd, '\x00' * mmap.PAGESIZE)
    assert written == mmap.PAGESIZE
 
    # Create the mmap instance with the following params:
    # fd: File descriptor which backs the mapping or -1 for anonymous mapping
    # length: Size of the mapping in bytes; here one page (usually 4 KB)
    # flags: MAP_SHARED means other processes can share this mmap
    # prot: PROT_READ | PROT_WRITE means this process can read and write this
    #       mmap (PROT_WRITE alone is not guaranteed to allow reads)
    buf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED,
                    mmap.PROT_READ | mmap.PROT_WRITE)
 
    # Now create an int in the memory mapping
    i = ctypes.c_int.from_buffer(buf)
 
    # Set a value
    i.value = 10
 
    # And manipulate it for kicks
    i.value += 1
 
    assert i.value == 11
 
    # Before we create a new value, we need to find the offset of the next free
    # memory address within the mmap
    offset = struct.calcsize(i._type_)
 
    # The byte at that offset has not been written yet, so it is still '\x00'
    assert buf[offset] == '\x00'
 
    # Now create a string containing 'foo' by first creating a c_char array type
    s_type = ctypes.c_char * len('foo')
 
    # Now create the ctypes instance
    s = s_type.from_buffer(buf, offset)
 
    # And finally set it
    s.raw = 'foo'
 
    print 'First 10 bytes of memory mapping: %r' % buf[:10]
    raw_input('Now run b.py and press ENTER')
 
    print
    print 'Changing i'
    i.value *= i.value
 
    print 'Changing s'
    s.raw = 'bar'
 
    new_i = raw_input('Enter a new value for i: ')
    i.value = int(new_i)
 
 
if __name__ == '__main__':
    main()

b.py

import mmap
import os
import struct
import time
 
def main():
    # Open the file for reading
    fd = os.open('/tmp/mmaptest', os.O_RDONLY)
 
    # Memory map the file
    buf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ)
 
    i = None
    s = None
 
    while 1:
        new_i, = struct.unpack('i', buf[:4])
        new_s, = struct.unpack('3s', buf[4:7])
 
        if i != new_i or s != new_s:
            print 'i: %s => %d' % (i, new_i)
            print 's: %s => %s' % (s, new_s)
            print 'Press Ctrl-C to exit'
            i = new_i
            s = new_s
 
        time.sleep(1)
 
 
if __name__ == '__main__':
    main()
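
Tracking offsets by hand gets tedious once there are more than a couple of values. A possible alternative, sketched here for Python 3 and not something the scripts above do, is to describe the layout once as a ctypes.Structure. The Stats class and its field names are hypothetical; both mappings live in one process only to keep the example short.

```python
import ctypes
import mmap
import os
import tempfile


class Stats(ctypes.Structure):
    # Hypothetical layout mirroring a.py: an int followed by 3 chars.
    _fields_ = [('i', ctypes.c_int),
                ('s', ctypes.c_char * 3)]


def demo_structure():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'\x00' * mmap.PAGESIZE)
        wbuf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
        rbuf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ)

        # Writer: a live view onto the mapping; assignments land in shared memory.
        stats = Stats.from_buffer(wbuf)
        stats.i = 11
        stats.s = b'foo'

        # Reader: from_buffer() needs a writable buffer, so a read-only
        # mapping takes a snapshot copy of the bytes instead.
        snapshot = Stats.from_buffer_copy(rbuf[:ctypes.sizeof(Stats)])
        result = (snapshot.i, snapshot.s)

        del stats  # release the ctypes view before closing the mapping
        wbuf.close()
        rbuf.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)
```

The reader gets a copy rather than a live view, so it has to re-read the mapping each time it polls, much like the struct.unpack loop in b.py.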

(Note that I cruelly don’t clean up /tmp/mmaptest after the scripts finish. Consider it a 4KB tax for anyone who runs arbitrary code they found on the Internet without reading it first.)
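
For the tax-averse, cleanup could look roughly like the sketch below. It is written for Python 3 and the function names are hypothetical. The one subtlety is ordering: on Python 3, closing a mmap while a ctypes view created with from_buffer is still alive raises BufferError, so the view has to go first.

```python
import ctypes
import mmap
import os
import tempfile


def write_and_clean_up(path):
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_RDWR, 0o600)
    try:
        os.write(fd, b'\x00' * mmap.PAGESIZE)
        buf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE)
        i = ctypes.c_int.from_buffer(buf)
        i.value = 11
        value = i.value
        del i        # drop the ctypes view first
        buf.close()  # then unmap
    finally:
        os.close(fd)     # then close the descriptor
        os.unlink(path)  # and finally remove the backing file
    return value


def demo():
    # Hypothetical scratch location standing in for /tmp/mmaptest.
    path = os.path.join(tempfile.mkdtemp(), 'mmaptest')
    value = write_and_clean_up(path)
    leftover = os.path.exists(path)
    os.rmdir(os.path.dirname(path))
    return value, leftover
```

In the two-script setup the writer would only unlink the file once it no longer needs new readers to find it. Processes that already have the file mapped keep their mapping after the name is removed.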

  • A.jhonson

    Please correct me if I’m wrong, but you’ve used the file mmaptest to exchange the data between the 2 scripts? I’m new to Python.

  • Andrew Rowe

    The file name is used as the name of the memory map.

  • Shailesh Goswami

    Thank you, this is exactly the solution I needed.

  • Tom Hancock

    Is there a way to share between scripts without writing to file?
    I want to share data between processes, but I want no trace of this happening.
    Is there a way to create several memory maps and reference them individually? I want to have several different instances of memory and be able to read from each one at different times. But I do not want to write anything to file as I am passing sensitive data between the processes.

  • Ray Luo

    So in your scenario you use mmap as an IPC mechanism. The programming interface might be convenient (especially after you further wrap it in your MemoryMapFile helper). But this IPC actually uses a file as its medium, so it might not meet your minimal-performance-impact design goal. It is not necessarily superior to simply writing your data into a normal file. Or did I miss something?

  • tomerfiliba

    the mmap’ed pages don’t necessarily go to disk, so long as there’s enough memory available and not all processes have unmapped the file