lmdb

http://github.com/dw/py-lmdb

This is a Python wrapper for the OpenLDAP MDB ‘Lightning’ Database. Two versions are provided and automatically selected during installation: a cffi implementation that supports PyPy, and a C extension module for CPython. Python 3.x is not supported yet.

Since no operating system packages are available, the MDB library is bundled with the wrapper and built statically.

Introduction

MDB is a tiny database with some excellent properties:

  • Ordered-map interface (keys are always sorted)
  • Reader/writer transactions: readers don’t block writers and writers don’t block readers. Each environment supports one concurrent write transaction.
  • Read transactions are extremely cheap: under 400 nanoseconds on CPython.
  • Environments may be opened by multiple processes on the same host, making it ideal for working around Python’s GIL.
  • Multiple sub-databases may be created with transactions covering all sub-databases.
  • Memory mapped, allowing for zero copy lookup and iteration. This is optionally exposed to Python using the buffer() interface.
  • Maintenance requires no external process or background threads.
  • No application-level caching is required: MDB relies entirely on the operating system’s buffer cache.
  • 32KB of object code and 6KLOC of C.

Installation

To install the Python module, ensure a C compiler and pip or easy_install are available and type:

pip install lmdb
# or
easy_install lmdb

Note: on PyPy the wrapper depends on cffi which in turn depends on libffi, so you may need to install the development package for it. On Debian/Ubuntu:

apt-get install libffi-dev

You may also use the cffi version on CPython. This is accomplished by setting the LMDB_FORCE_CFFI environment variable before installation or before module import with an existing installation:

>>> import os
>>> os.environ['LMDB_FORCE_CFFI'] = '1'

>>> # cffi version is loaded.
>>> import lmdb

Sub-databases

To use the sub-database feature you must call lmdb.open() or lmdb.Environment() with a max_dbs= parameter set to the number of databases required. This must be done by the first process or thread opening the environment, since the value is used to allocate resources kept in shared memory.

Caution: MDB implements sub-databases by storing a special descriptor key in the main database. All databases in an environment share the same file. Because a sub-database is just a key in the main database, attempts to create one will fail if this key already exists. Furthermore the key is visible to lookups and enumerations. If your main database keyspace conflicts with the names you are using for sub-databases then consider moving the contents of your main database to another sub-database.

>>> env = lmdb.open('/tmp/test', max_dbs=2)
>>> with env.begin(write=True) as txn:
...     txn.put('somename', 'somedata')

>>> # Error: database cannot share name of existing key!
>>> subdb = env.open_db('somename')

Caution: when a sub-database has been opened with Environment.open_db() the resulting handle is shared with all environment users. In particular this means any user calling Environment.close_db() will invalidate the handle for all users. For this reason databases are never closed automatically, you must do it explicitly.

There is little reason to close a handle: open handles consume only a slot in the shared environment, and repeated calls to Environment.open_db() for the same name return the same handle. Simply setting max_dbs= higher than the maximum number of handles required eliminates any need to coordinate handle management amongst users.

Storage efficiency & limits

MDB groups records into pages matching the operating system memory manager’s page size, which is usually 4096 bytes. To maintain its internal structure, each page must hold a minimum of 2 records; each record additionally costs 8 bytes of overhead, and each page carries a 16-byte header. The engine is therefore most space-efficient when the combined size of any (8+key+value) combination does not exceed 2040 bytes, i.e. (4096 - 16) / 2.

When an attempt to store a record would exceed this maximum, its value part is written separately to one or more overflow pages. Since the tail of the last page containing the record value cannot be shared with other records, large values are stored most efficiently when they are an approximate multiple of 4096 bytes, minus 16 bytes for an initial header.
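The arithmetic above can be sketched as follows. The 4096-byte page size, 16-byte page header, and 8-byte per-record overhead are the usual defaults described here; verify the page size against env.stat()['psize'] on your system:

```python
PAGE_SIZE = 4096          # typical OS page size; check env.stat()['psize']
PAGE_HEADER = 16          # per-page header
RECORD_OVERHEAD = 8       # per-record overhead

# A page must hold at least 2 records, so the largest record that
# avoids overflow pages is half the usable space on a page:
max_inline = (PAGE_SIZE - PAGE_HEADER) // 2

def fits_inline(key, value):
    """True if (key, value) can be stored without overflow pages."""
    return RECORD_OVERHEAD + len(key) + len(value) <= max_inline

print(max_inline)                            # 2040
print(fits_inline(b'k' * 32, b'v' * 2000))   # True: 8 + 32 + 2000 == 2040
print(fits_inline(b'', b'x' * 2033))         # False: 8 + 2033 == 2041
```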

Space usage can be monitored using Environment.stat():

>>> pprint(env.stat())
{'branch_pages': 1040L,
 'depth': 4L,
 'entries': 3761848L,
 'leaf_pages': 73658L,
 'overflow_pages': 0L,
 'psize': 4096L}

This database contains 3,761,848 records, and no values were large enough to be spilled to overflow pages (overflow_pages is 0).
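The page counts can also be combined into a rough estimate of the data size. A sketch, using the example stat() output above (not live data):

```python
def approx_size(stat):
    """Rough data size in bytes implied by an Environment.stat() dict."""
    pages = stat['branch_pages'] + stat['leaf_pages'] + stat['overflow_pages']
    return pages * stat['psize']

# The example output shown above, as plain ints.
stat = {'branch_pages': 1040, 'depth': 4, 'entries': 3761848,
        'leaf_pages': 73658, 'overflow_pages': 0, 'psize': 4096}

print(approx_size(stat))   # 305963008 bytes, about 292 MiB
```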

By default record keys are limited to 511 bytes in length, however this can be adjusted by rebuilding the library.

Buffers

Since MDB is memory mapped it is possible to access record data without keys or values ever being copied by the kernel, database library, or application. To exploit this the library can be instructed to return buffer() objects instead of strings by passing buffers=True to Environment.begin() or Transaction.

In Python, buffer() objects can be used in many places where strings are expected. They act like regular sequences: they support slicing, indexing, iteration, and taking their length. Many Python APIs will automatically convert them to bytestrings as necessary, since they also implement __str__():

>>> txn = env.begin(buffers=True)
>>> buf = txn.get('somekey')
>>> buf
<read-only buffer ptr 0x12e266010, size 4096 at 0x10d93b970>

>>> len(buf)
4096
>>> buf[0]
'a'
>>> buf[:2]
'ab'
>>> value = str(buf)
>>> len(value)
4096
>>> type(value)
<type 'str'>

It is also possible to pass buffers directly to many native APIs, for example file.write(), socket.send(), zlib.decompress() and so on.

A buffer may be sliced without copying by passing it to buffer():

>>> # Extract 200 bytes starting at offset 10 (bytes 10 through 209):
>>> sub_buf = buffer(buf, 10, 200)
>>> len(sub_buf)
200

Caution: in CPython buffers returned by Transaction and Cursor are reused, so that consecutive calls to Transaction.get or any of the Cursor methods will overwrite the objects that have already been returned. To preserve a value returned in a buffer, convert it to a string using str().

>>> txn = env.begin(write=True, buffers=True)
>>> txn.put('key1', 'value1')
>>> txn.put('key2', 'value2')

>>> val1 = txn.get('key1')
>>> vals1 = str(val1)
>>> vals1
'value1'
>>> val2 = txn.get('key2')
>>> str(val2)
'value2'

>>> # Caution: the buffer object is reused!
>>> str(val1)
'value2'

>>> # But our string copy was preserved!
>>> vals1
'value1'

Caution: in both PyPy and CPython, returned buffers must never be used after their generating transaction has completed, or after the database has been modified within that transaction!

Interface

lmdb.open(path, **kwargs)

Shortcut for Environment constructor.

Environment class

class lmdb.Environment(path, map_size=10485760, subdir=True, readonly=False, metasync=True, sync=True, map_async=False, mode=420, create=True, max_readers=126, max_dbs=0)

Structure for a database environment. An environment may contain multiple databases, all residing in the same shared-memory map and underlying disk file.

To write to the environment a Transaction must be created. One simultaneous write transaction is allowed, however there is no limit on the number of read transactions even when a write transaction exists. Due to this, write transactions should be kept as short as possible.

Equivalent to mdb_env_open()

path:
Location of directory (if subdir=True) or file prefix to store the database.
map_size:

Maximum size database may grow to; used to size the memory mapping. If database grows larger than map_size, an exception will be raised and the user must close and reopen Environment. On 64-bit there is no penalty for making this huge (say 1TB). Must be <2GB on 32-bit.

Note: the default map size is set low to encourage a crash, so users can figure out a good value before learning about this option too late.

subdir:
If True, path refers to a subdirectory to store the data and lock files in, otherwise it refers to a filename prefix.
readonly:
If True, disallow any write operations. Note the lock file is still modified.
metasync:
If False, never explicitly flush metadata pages to disk. OS will flush at its discretion, or user can flush with sync().
sync:
If False, never explicitly flush data pages to disk. OS will flush at its discretion, or user can flush with sync(). This optimization means a system crash can corrupt the database or lose the last transactions if buffers are not yet flushed to disk.
mode:
File creation mode (the default of 420 is 0644 in octal).
create:
If False, do not create the directory path if it is missing.
max_readers:
Slots to allocate in lock file for read threads; attempts to open the environment by more than this many clients simultaneously will fail. Only meaningful for environments that aren’t already open.
max_dbs:
Maximum number of databases available. If 0, assume environment will be used as a single database.

begin(**kwargs)

Shortcut for lmdb.Transaction

close()

Close the environment, invalidating any open iterators, cursors, and transactions.

Equivalent to mdb_env_close()

copy(path)

Make a consistent copy of the environment in the given destination directory.

Equivalent to mdb_env_copy()

info()

Return some nice environment information as a dict:

map_addr Address of database map in RAM.
map_size Size of database map in RAM.
last_pgno ID of last used page.
last_txnid ID of last committed transaction.
max_readers Maximum number of threads.
num_readers Number of threads in use.

Equivalent to mdb_env_info()

open_db(name=None, txn=None, reverse_key=False, dupsort=False, create=True)

Open a database, returning an opaque handle. Repeat open_db() calls for the same name will return the same handle. As a special case, the main database is always open.

Equivalent to mdb_dbi_open()

A newly created database will not exist if the transaction that created it aborted, nor if another process deleted it. The handle resides in the shared environment, it is not owned by the current transaction or process. Only one thread should call this function; it is not mutex-protected in a read-only transaction.

Preexisting transactions, other than the current transaction and any parents, must not use the new handle, nor must their children.

name:
Database name. If None, indicates the main database should be returned, otherwise indicates a sub-database should be created inside the main database. In other words, a key representing the database will be visible in the main database, and the database name cannot conflict with any existing key.
txn:
Transaction used to create the database if it does not exist. If unspecified, a temporary write transaction is used. Do not call open_db() from inside an existing transaction without supplying it here; note the passed transaction must have write=True.
reverse_key:
If True, keys are compared from right to left (e.g. DNS names).
dupsort:

Duplicate keys may be used in the database. (Or, from another perspective, keys may have multiple data items, stored in sorted order.) By default keys must be unique and may have only a single data item.

dupsort is not yet fully supported.

create:
If True, create the database if it doesn’t exist, otherwise raise an exception.

path()

Directory path or file name prefix where this environment is stored.

Equivalent to mdb_env_get_path()

stat()

Return some nice environment statistics as a dict:

psize Size of a database page in bytes.
depth Height of the B-tree.
branch_pages Number of internal (non-leaf) pages.
leaf_pages Number of leaf pages.
overflow_pages Number of overflow pages.
entries Number of data items.

Equivalent to mdb_env_stat()

sync(force=False)

Flush the data buffers to disk.

Equivalent to mdb_env_sync()

Data is always written to disk when Transaction.commit() is called, but the operating system may keep it buffered. MDB always flushes the OS buffers upon commit as well, unless the environment was opened with sync=False or metasync=False.

force:
If True, force a synchronous flush. Otherwise if the environment was opened with sync=False the flushes will be omitted, and with map_async=True they will be asynchronous.

Transaction class

class lmdb.Transaction(env, db=None, parent=None, write=False, buffers=False)

A transaction handle.

All operations require a transaction handle; transactions may be read-only or read-write. Transactions may not span threads: a transaction must only be used by a single thread.

Threads may only have a single transaction open for each environment.

Cursors may not span transactions; each cursor must be opened and closed within a single transaction.

Equivalent to mdb_txn_begin()

env:
Environment the transaction should be on.
db:
Default sub-database to operate on. If unspecified, defaults to the environment’s main database. Can be overridden on a per-call basis below.
parent:
None, or a parent transaction (see lmdb.h).
write:
Transactions are read-only by default. To modify the database, you must pass write=True.
buffers:

If True, indicates buffer() objects should be yielded instead of strings. This setting applies to the Transaction instance itself and any Cursors created within the transaction.

This feature significantly improves performance, since MDB has a zero-copy design, but it requires care when manipulating the returned buffer objects. The benefit of this facility is diminished when using small keys and values.

abort()

Abort the pending transaction.

Equivalent to mdb_txn_abort()

commit()

Commit the pending transaction.

Equivalent to mdb_txn_commit()

cursor(db=None)

Shortcut for lmdb.Cursor(db, self)

delete(key, value='', db=None)

Delete a key from the database.

Equivalent to mdb_del()

key:
The key to delete.
value:
If the database was opened with dupsort=True and value is not the empty string, then delete elements matching only this (key, value) pair, otherwise all values for key are deleted.

Returns True if at least one key was deleted.

drop(db, delete=True)

Delete all keys in a sub-database, and optionally delete the sub-database itself. Deleting the sub-database causes it to become unavailable, and invalidates existing cursors.

Equivalent to mdb_drop()

get(key, default=None, db=None)

Fetch the first value matching key, otherwise return default. A cursor must be used to fetch all values for a key in a dupsort=True database.

Equivalent to mdb_get()

put(key, value, dupdata=False, overwrite=True, append=False, db=None)

Store a record, returning True if it was written, or False to indicate the key was already present and overwrite=False.

Equivalent to mdb_put()

key:
String key to store.
value:
String value to store.
dupdata:
If True and database was opened with dupsort=True, add pair as a duplicate if the given key already exists. Otherwise overwrite any existing matching key.
overwrite:
If False, do not overwrite any existing matching key.
append:
If True, append the pair to the end of the database without comparing its order first. Appending a key that is not greater than the highest existing key will cause corruption.

Cursor class

class lmdb.Cursor(db, txn)

Structure for navigating a database.

Equivalent to mdb_cursor_open()

db:
Database to navigate.
txn:
Transaction to navigate.

As a convenience, Transaction.cursor() can be used to quickly return a cursor:

>>> env = lmdb.open('/tmp/foo')
>>> child_db = env.open_db('child_db')
>>> with env.begin() as txn:
...     cursor = txn.cursor()           # Cursor on main database.
...     cursor2 = txn.cursor(child_db)  # Cursor on child database.

Cursors start in an unpositioned state: if iternext() or iterprev() are used in this state, iteration proceeds from the start or end respectively. Iterators directly position using the cursor, meaning strange behavior results when multiple iterators exist on the same cursor.

>>> with env.begin() as txn:
...     for i, (key, value) in enumerate(txn.cursor().iterprev()):
...         print '%dth last item is (%r, %r)' % (1 + i, key, value)

Both iternext() and iterprev() accept keys and values arguments. If both are True, then the value of item() is yielded on each iteration. If only keys is True, key() is yielded, otherwise only value() is yielded.

Prior to iteration, a cursor can be positioned anywhere in the database:

>>> with env.begin() as txn:
...     cursor = txn.cursor()
...     if not cursor.set_range('5'): # Position at first key >= '5'.
...         print 'Not found!'
...     else:
...         for key, value in cursor: # Iterate from first key >= '5'.
...             print key, value

Iteration is not required to navigate, and sometimes results in ugly or inefficient code. In cases where the iteration order is not obvious, or is related to the data being read, use of set_key(), set_range(), key(), value(), and item() are often preferable:

>>> # Record the path from a child to the root of a tree.
>>> path = ['child14123']
>>> while path[-1] != 'root':
...     assert cursor.set_key(path[-1]), \
...         'Tree is broken! Path: %s' % (path,)
...     path.append(cursor.value())

count()

Return the number of duplicates for the current key. This is only meaningful for databases opened with dupsort=True.

Equivalent to mdb_cursor_count()

delete()

Delete the current element and move to the next element, returning True on success or False if the database was empty.

Equivalent to mdb_cursor_del()

first()

Move to the first element, returning True on success or False if the database is empty.

Equivalent to mdb_cursor_get() with MDB_FIRST

item()

Return the current (key, value) pair.

iternext(keys=True, values=True)

Return a forward iterator that yields the current element before calling next(), repeating until the end of the database is reached. As a convenience, Cursor implements the iterator protocol by automatically returning a forward iterator when invoked:

>>> # Equivalent:
>>> it = iter(cursor)
>>> it = cursor.iternext(keys=True, values=True)

If the cursor was not yet positioned, it is moved to the first record in the database, otherwise iteration proceeds from the current position.

iterprev(keys=True, values=True)

Return a reverse iterator that yields the current element before calling prev(), until the start of the database is reached.

If the cursor was not yet positioned, it is moved to the last record in the database, otherwise iteration proceeds from the current position.

key()

Return the current key.

last()

Move to the last element, returning True on success or False if the database is empty.

Equivalent to mdb_cursor_get() with MDB_LAST

next()

Move to the next element, returning True on success or False if there is no next element.

Equivalent to mdb_cursor_get() with MDB_NEXT

prev()

Move to the previous element, returning True on success or False if there is no previous element.

Equivalent to mdb_cursor_get() with MDB_PREV

put(key, val, dupdata=False, overwrite=True, append=False)

Store a record, returning True if it was written, or False to indicate the key was already present and overwrite=False. On success, the cursor is positioned on the key.

Equivalent to mdb_cursor_put()

key:
String key to store.
val:
String value to store.
dupdata:
If True and database was opened with dupsort=True, add pair as a duplicate if the given key already exists. Otherwise overwrite any existing matching key.
overwrite:
If False, do not overwrite any existing matching key.
append:
If True, append the pair to the end of the database without comparing its order first. Appending a key that is not greater than the highest existing key will cause corruption.

set_key(key)

Seek exactly to key, returning True on success or False if the exact key was not found.

It is an error to set_key() the empty string.

Equivalent to mdb_cursor_get() with MDB_SET_KEY

set_range(key)

Seek to the first key greater than or equal to key, returning True on success, or False to indicate key was past the end of the database.

Behaves like first() if key is the empty string.

Equivalent to mdb_cursor_get() with MDB_SET_RANGE

value()

Return the current value.

Exceptions

class lmdb.Error(what, code=0)

Raised when any MDB error occurs.