lmdb
This is a universal Python binding for the LMDB ‘Lightning’ Database. Two variants are provided and automatically selected during install: a CFFI variant that supports PyPy and all versions of CPython >=3.9, and a C extension that supports CPython >=3.9. Both variants provide the same interface.
LMDB is a tiny database with some excellent properties:
Ordered map interface (keys are always lexicographically sorted).
Reader/writer transactions: readers don’t block writers, writers don’t block readers. Each environment supports one concurrent write transaction.
Read transactions are extremely cheap.
Environments may be opened by multiple processes on the same host, making it ideal for working around Python’s GIL.
Multiple named databases may be created with transactions covering all named databases.
Memory mapped, allowing for zero copy lookup and iteration. This is optionally exposed to Python using the memoryview() interface.
Maintenance requires no external process or background threads.
No application-level caching is required: LMDB fully exploits the operating system’s buffer cache.
When to use py-lmdb
Python has several key-value and embedded database options. LMDB fills a specific niche — here is how it compares.
vs. dbm (shelve, gdbm, ndbm)
The dbm family ships with Python and provides a simple dict-like
interface. LMDB is faster for read-heavy workloads (zero-copy reads via
mmap), supports concurrent readers without locking, provides ACID
transactions, and allows multiple processes to share a database safely.
dbm databases do not support transactions and offer no crash-safety
guarantees. Choose dbm only when simplicity matters more than
performance or durability.
vs. SQLite
SQLite is a full relational database with SQL, joins, indexes, and aggregation. If your data is naturally relational, SQLite is the better choice. LMDB is a key-value store — it has no query language, no schema, and no secondary indexes. Where LMDB excels is raw throughput: reads are memory-mapped with zero system calls, and readers never block writers. LMDB is a good fit when you need a fast persistent ordered map rather than a relational database — for example, caches, lookup tables, message queues, or ML feature stores.
vs. Redis
Redis is an in-memory data-structure server accessed over the network. LMDB is an embedded library — no server, no network round-trips, no serialization overhead. LMDB data is persistent on disk by default, while Redis persistence is optional and adds latency. LMDB can also handle datasets much larger than available RAM (the OS pages data in and out transparently), whereas Redis requires all data to fit in memory. Use Redis when you need shared state across machines, pub/sub, expiration, or its rich data structures (lists, sets, sorted sets). Use LMDB when you need fast local persistence within a single machine.
vs. RocksDB (python-rocksdb)
RocksDB is an LSM-tree store optimized for write-heavy workloads. It
can sustain higher write throughput than LMDB, especially for large
datasets that exceed RAM. LMDB uses a B+ tree with copy-on-write, giving
it consistently fast reads and predictable latency — there are no
background compactions that can cause latency spikes. LMDB’s read path
is a simple mmap lookup with no copying, making it significantly
faster for read-dominated workloads. RocksDB also has a much larger
dependency footprint.
vs. pickle (shelve, JSON files)
Serializing a Python dict to a pickle or JSON file is the simplest
form of persistence, but it requires loading the entire dataset into memory
and rewriting the whole file on every save. Pickle files are typically more
space-efficient since they have no page overhead or free-space tracking.
LMDB reads and writes individual records without loading the full dataset,
supports concurrent readers, and provides crash-safe transactions. For
anything beyond a small configuration file, LMDB is dramatically faster
and more robust.
Summary
| | py-lmdb | pickle | dbm | SQLite | Redis | RocksDB |
|---|---|---|---|---|---|---|
| ACID transactions | Yes | No | No | Yes | No | Yes |
| Concurrent readers | Lock-free | No | No | WAL mode | Yes | Yes |
| Read performance | Excellent | Poor | Fair | Good | Good | Good |
| Write performance | Good | Poor | Fair | Good | Excellent | Excellent |
| Larger than RAM | Yes | No | Yes | Yes | No | Yes |
| Embedded (no server) | Yes | Yes | Yes | Yes | No | Yes |
| Multi-process safe | Yes | No | No | Yes | N/A | Yes |
| Zero-copy reads | Yes | No | No | No | No | No |
Installation: Windows
Binary wheels are published via PyPI for Windows, allowing the binding to be installed via pip without the need for a compiler to be present. The binary releases statically link against the bundled version of LMDB.
To install:
pip install lmdb
Installation: UNIX
For convenience, a supported version of LMDB is bundled with the binding and
built statically by default. If your system distribution includes LMDB, set the
LMDB_FORCE_SYSTEM environment variable, and optionally LMDB_INCLUDEDIR
and LMDB_LIBDIR prior to invoking setup.py.
By default, the bundled LMDB library is patched before building. The patches
(located in lib/py-lmdb/) provide security hardening, bug fixes, and the
ability to copy/backup an environment under a particular transaction. If you
prefer to build without patches, set the environment variable LMDB_PURE.
The CFFI variant depends on CFFI, which in turn depends on libffi, which
may need to be installed from a package. On CPython, both variants additionally
depend on the CPython development headers. On Debian/Ubuntu:
apt-get install libffi-dev python3-dev build-essential
To install the C extension, ensure a C compiler and pip are available and type:
pip install lmdb
The CFFI variant may be used on CPython by setting the LMDB_FORCE_CFFI
environment variable before installation, or before module import with an
existing installation:
>>> import os
>>> os.environ['LMDB_FORCE_CFFI'] = '1'
>>> # CFFI variant is loaded.
>>> import lmdb
Getting Help
Before getting in contact, please ensure you have thoroughly reviewed this documentation, and if applicable, the associated official Doxygen documentation.
If you have found a bug or have a question, please report it on the GitHub issue tracker.
Named Databases
Named databases require the max_dbs= parameter to be provided when calling
lmdb.open() or lmdb.Environment. This must be done by the
first process or thread opening the environment.
Once a correctly configured Environment is created, new named
databases may be created via Environment.open_db().
Existing named databases can be listed with Environment.dbs(). This
works by iterating the main database and attempting to open each key as a named
database. It only returns reliable results when the main database is not
also used to store regular key-value pairs — do not mix named databases with
application keys in the main database.
Cursors & iteration
A Cursor provides fine-grained navigation over the key-value pairs
in a database. Cursors are created from a transaction and share its lifetime.
with env.begin() as txn:
    with txn.cursor() as cur:
        pass  # use the cursor ...
Full scan
Iterate every key-value pair in the database:
with env.begin() as txn:
    with txn.cursor() as cur:
        for key, value in cur.iternext():
            print(key, value)
iternext() starts from the first key if the cursor is
unpositioned, or from the current position otherwise. Use
iterprev() for reverse iteration (starts from the last key):
with env.begin() as txn:
    with txn.cursor() as cur:
        for key, value in cur.iterprev():
            print(key, value)  # last key first
The keys and values parameters control what is yielded. Passing
values=False avoids touching value data, which can be faster when you only
need keys:
with env.begin() as txn:
    with txn.cursor() as cur:
        for key in cur.iternext(values=False):
            print(key)
Range queries
set_range() seeks to the first key greater than or equal
to the given key, enabling range scans and prefix scans:
with env.begin() as txn:
    with txn.cursor() as cur:
        # Iterate all keys from b'order-100' onward:
        if cur.set_range(b'order-100'):
            for key, value in cur.iternext():
                print(key, value)
For prefix scanning, combine set_range with a key check:
with env.begin() as txn:
    with txn.cursor() as cur:
        prefix = b'user:'
        if cur.set_range(prefix):
            for key, value in cur.iternext():
                if not key.startswith(prefix):
                    break
                print(key, value)
Exact lookup
set_key() seeks to an exact key. It returns True if the
key exists, False otherwise. get() is a convenience
wrapper that returns the value directly (or a default):
with env.begin() as txn:
    with txn.cursor() as cur:
        if cur.set_key(b'mykey'):
            print(cur.value())  # the value
    # Or more concisely via Transaction.get():
    val = txn.get(b'mykey')  # returns None if missing
Batch reads
getmulti() looks up multiple keys in a single call, returning
a list of (key, value) tuples for each key found:
with env.begin() as txn:
    with txn.cursor() as cur:
        results = cur.getmulti([b'key1', b'key2', b'key3'])
        for key, value in results:
            print(key, value)
Keys that do not exist are silently skipped.
Batch writes
putmulti() writes multiple records efficiently. It returns
a (consumed, added) tuple:
with env.begin(write=True) as txn:
    with txn.cursor() as cur:
        items = [(b'key1', b'val1'), (b'key2', b'val2')]
        consumed, added = cur.putmulti(items)
If the keys are in sorted order, pass append=True for a significant speed
improvement — LMDB skips its internal search and appends directly:
with env.begin(write=True) as txn:
    with txn.cursor() as cur:
        sorted_items = sorted(items)
        cur.putmulti(sorted_items, append=True)
Cursor positioning
Navigation methods like first(), last(),
next(), prev(), set_key(),
and set_range() all return True on success and False
when there is no matching element. After a False return (or before any
positioning call) the cursor is unpositioned — key() and
value() return empty bytestrings.
with env.begin() as txn:
    with txn.cursor() as cur:
        cur.first()  # True (unless DB is empty)
        print(cur.key())  # first key
        print(cur.value())  # first value
        cur.last()
        print(cur.item())  # (last_key, last_value)
        while cur.prev():  # walk backward from last
            print(cur.key())
Writing & deleting at the cursor
put() stores a record and positions the cursor on it.
delete() removes the current record and advances to the next.
replace() and pop() combine a read and write
in a single call:
with env.begin(write=True) as txn:
    with txn.cursor() as cur:
        cur.put(b'key', b'value')
        # Overwrite, returning old value:
        old = cur.replace(b'key', b'new_value')  # returns b'value'
        # Fetch and delete in one step:
        val = cur.pop(b'key')  # returns b'new_value'
Duplicate-sort databases
By default each key in a database maps to exactly one value. Opening a database
with dupsort=True allows a key to have multiple values, stored in sorted
order. This is useful for modelling one-to-many relationships (e.g. tags per
document, edges per vertex) without encoding multiple values into a single
record.
import lmdb
env = lmdb.open('/tmp/test', max_dbs=1)
db = env.open_db(b'edges', dupsort=True)
with env.begin(write=True, db=db) as txn:
    txn.put(b'node1', b'node2')
    txn.put(b'node1', b'node3')
    txn.put(b'node1', b'node4')
    txn.put(b'node2', b'node5')
Duplicate values for a key are always sorted lexicographically, just like keys. The maximum size of a duplicate value is limited to 511 bytes (the same as the maximum key size), because LMDB stores duplicates in a nested B-tree that treats each value as a key internally.
Reading
Transaction.get() returns the first (lowest-sorted) value for a key.
To access all values, use a Cursor:
with env.begin(db=db) as txn:
    # Position on key, then iterate its values:
    with txn.cursor() as cur:
        if cur.set_key(b'node1'):
            for value in cur.iternext_dup(values=True):
                print(value)  # b'node2', b'node3', b'node4'
    # Count values for a key:
    with txn.cursor() as cur:
        if cur.set_key(b'node1'):
            print(cur.count())  # 3
    # Check if a specific (key, value) pair exists:
    with txn.cursor() as cur:
        print(cur.set_key_dup(b'node1', b'node3'))  # True
    # Iterate all unique keys (skipping duplicates):
    with txn.cursor() as cur:
        for key in cur.iternext_nodup():
            print(key)  # b'node1', b'node2'
Batch reading with Cursor.getmulti() supports dupdata=True to
return all values per key in a single call:
with env.begin(db=db) as txn:
    with txn.cursor() as cur:
        results = cur.getmulti([b'node1', b'node2'], dupdata=True)
        # [(b'node1', b'node2'), (b'node1', b'node3'),
        #  (b'node1', b'node4'), (b'node2', b'node5')]
Deleting
Transaction.delete() accepts a value parameter to remove a
specific duplicate. Without it, all values for the key are removed.
Cursor.delete() takes dupdata=True to remove all values for the
current key, or dupdata=False (the default) to remove only the current
value.
with env.begin(write=True, db=db) as txn:
    txn.delete(b'node1', b'node3')  # remove one value
    txn.delete(b'node2')  # remove all values for key
Cursor methods
The following cursor methods are specific to dupsort=True databases:
first_dup() / last_dup(): move to the first/last value for the current key.
next_dup() / prev_dup(): move to the next/previous value for the current key.
next_nodup() / prev_nodup(): skip to the next/previous distinct key.
iternext_dup() / iterprev_dup(): iterate values for the current key.
iternext_nodup() / iterprev_nodup(): iterate distinct keys only, skipping duplicates.
set_key_dup(): seek to an exact (key, value) pair.
set_range_dup(): seek to the first (key, value) pair greater than or equal to a given pair.
count(): return the number of values for the current key.
Storage efficiency & limits
Records are grouped into pages matching the operating system’s VM page size, which is usually 4096 bytes. Each page must contain at least 2 records, in addition to 8 bytes per record and a 16 byte header. Due to this the engine is most space-efficient when the combined size of any (8+key+value) combination does not exceed 2040 bytes.
When an attempt to store a record would exceed the maximum size, its value part is written separately to one or more dedicated pages. Since the trailer of the last page containing the record value cannot be shared with other records, it is more efficient when large values are an approximate multiple of 4096 bytes, minus 16 bytes for an initial header.
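The figures quoted above can be reproduced with a little arithmetic. This is only a rough model built from the numbers in the text (4096-byte pages, a 16-byte header, 8 bytes per record); the function names are hypothetical and the real on-disk layout has more detail.

```python
PAGE_SIZE = 4096    # typical VM page size, as stated in the text
PAGE_HEADER = 16    # bytes of header per page
PER_RECORD = 8      # bookkeeping bytes per record


def max_inline_payload(page_size=PAGE_SIZE):
    """Largest key+value size that still lets two records share one page."""
    per_record_budget = (page_size - PAGE_HEADER) // 2  # 2040 for 4096
    return per_record_budget - PER_RECORD


def overflow_pages(value_len, page_size=PAGE_SIZE):
    """Approximate number of dedicated pages needed for a spilled value."""
    return -(-(value_len + PAGE_HEADER) // page_size)  # ceiling division


inline_limit = max_inline_payload()
exact_fit = overflow_pages(4096 - 16)      # fills one page exactly
one_byte_over = overflow_pages(4096 - 15)  # spills into a second page
```

Under this model a value of 4080 bytes occupies one dedicated page, while 4081 bytes needs two, which is why sizes just under a multiple of the page size are the most economical.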
Space usage can be monitored using Environment.stat():
>>> pprint(env.stat())
{'branch_pages': 1040,
 'depth': 4,
 'entries': 3761848,
 'leaf_pages': 73658,
 'overflow_pages': 0,
 'psize': 4096}
This database contains 3,761,848 records and no values were spilled
(overflow_pages). Environment.stat() only returns information for the
default database. If named databases are used, you must add the results
from Transaction.stat() on each named database.
By default record keys are limited to 511 bytes in length, however this can be
adjusted by rebuilding the library. The compile-time key length can be queried
via Environment.max_key_size().
Memory usage
Diagnostic tools often overreport the memory usage of LMDB databases, since the
tools poorly classify that memory. The Linux ps command RSS measurement
may report a process as having an entire database resident, causing user alarm.
While the entire database may really be resident, that is only half the story.
Unlike heap memory, pages in file-backed memory maps, such as those used by LMDB, may be efficiently reclaimed by the OS at any moment so long as the pages in the map are clean. Clean simply means that the resident pages’ contents match the associated pages that live in the disk file that backs the mapping. A clean mapping works exactly like a cache, and in fact it is a cache: the OS page cache.
On Linux, the /proc/<pid>/smaps file contains one section for each memory
mapping in a process. To inspect the actual memory usage of an LMDB database,
look for a data.mdb entry, then observe its Dirty and Clean values.
When no write transaction is active, all pages in an LMDB database should be
marked clean, unless the Environment was opened with sync=False, and no
explicit Environment.sync() has been called since the last write
transaction, and the OS writeback mechanism has not yet opportunistically
written the dirty pages to disk.
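The inspection described above can be scripted. The parser below is a sketch: lmdb_page_state is a hypothetical helper, and the sample text is an abbreviated, made-up excerpt in the smaps layout rather than real output. On Linux the input would be the contents of /proc/<pid>/smaps.

```python
import re

# Mapping header lines look like "start-end perms offset dev inode path".
_HEADER = re.compile(r'^[0-9a-f]+-[0-9a-f]+\s')


def lmdb_page_state(smaps_text, filename='data.mdb'):
    """Sum the *_Clean and *_Dirty counters (in kB) for mappings of filename."""
    totals = {'clean_kb': 0, 'dirty_kb': 0}
    in_lmdb_map = False
    for line in smaps_text.splitlines():
        if _HEADER.match(line):
            in_lmdb_map = line.rstrip().endswith(filename)
            continue
        if not in_lmdb_map:
            continue
        field, _, rest = line.partition(':')
        if field.endswith('_Clean'):
            totals['clean_kb'] += int(rest.split()[0])
        elif field.endswith('_Dirty'):
            totals['dirty_kb'] += int(rest.split()[0])
    return totals


# Abbreviated, invented sample for demonstration only.
sample = """\
00400000-00452000 r-xp 00000000 08:01 1234 /usr/bin/python3
Private_Clean:       300 kB
Private_Dirty:         0 kB
7f3c00000000-7f3c40000000 r--s 00000000 08:01 5678 /tmp/mydb/data.mdb
Size:            1048576 kB
Rss:                2048 kB
Shared_Clean:       2000 kB
Shared_Dirty:         48 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
"""
state = lmdb_page_state(sample)
```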
Bytestrings
This documentation uses bytestring to mean the bytes() type.
All keys and values must be bytes() (not str()).
Buffers
Since LMDB is memory mapped it is possible to access record data without keys
or values ever being copied by the kernel, database library, or application. To
exploit this the library can be instructed to return memoryview()
objects instead of bytestrings by passing buffers=True to
Environment.begin() or Transaction.
memoryview() objects can be used in many places where bytestrings are
expected. They support slicing, indexing, iteration, and taking their length.
Many Python APIs will automatically convert them to bytestrings as necessary:
>>> txn = env.begin(buffers=True)
>>> buf = txn.get(b'somekey')
>>> buf
<memory at 0x10d93b970>
>>> len(buf)
4096
>>> buf[0]
97
>>> bytes(buf[:2])
b'ab'
>>> value = bytes(buf)
>>> len(value)
4096
>>> type(value)
<class 'bytes'>
It is also possible to pass buffers directly to many native APIs, for example
file.write(), socket.send(), zlib.decompress() and
so on. A buffer may be sliced without copying:
>>> # Extract bytes 10 through 210:
>>> sub_buf = buf[10:210]
>>> len(sub_buf)
200
In both PyPy and CPython, returned buffers must be discarded after their
producing transaction has completed or been modified in any way. To preserve
a buffer’s contents, copy it using bytes():
with env.begin(write=True, buffers=True) as txn:
    buf = txn.get(b'foo')  # only valid until the next write.
    buf_copy = bytes(buf)  # valid forever
    txn.delete(b'foo')  # this is a write!
    txn.put(b'foo2', b'bar2')  # this is also a write!
    print('foo: %r' % (buf,))  # ERROR! invalidated by write
    print('foo: %r' % (buf_copy,))  # OK
print('foo: %r' % (buf,))  # ERROR! also invalidated by txn end
print('foo: %r' % (buf_copy,))  # still OK
writemap mode
When Environment or open() is invoked with
writemap=True, the library will use a writeable memory mapping to directly
update storage. This improves performance at a cost to safety: it is possible
(though fairly unlikely) for buggy C code in the Python process to accidentally
overwrite the map, resulting in database corruption.
Caution
This option may cause filesystems that don’t support sparse files, such as HFS+ on older macOS, to immediately preallocate map_size= bytes of underlying storage when the environment is opened or closed for the first time.
Caution
A filesystem failure (such as running out of space) will crash the Python process if this option is enabled. This is a general OS limitation and is not specific to LMDB.
Resource Management
Environment, Transaction, and Cursor
support the context manager protocol, allowing for robust resource cleanup in
the case of exceptions.
with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff
        print('key is:', curs.get(b'key'))
On CFFI it is important to use the Cursor context manager, or
explicitly call Cursor.close() if many cursors are created within a
single transaction. Failure to close a cursor on CFFI may cause many dead
objects to accumulate until the parent transaction is aborted or committed.
Transaction management
While any reader exists, writers cannot reuse space in the database file that
has become unused in later versions. Due to this, continual use of long-lived
read transactions may cause the database to grow without bound. A lost
reference to a read transaction will simply be aborted (and its reader slot
freed) when the Transaction is eventually garbage collected. This
should occur immediately on CPython, but may be deferred indefinitely on PyPy.
However the same is not true for write transactions: losing a reference to a
write transaction can lead to deadlock, particularly on PyPy, since if the same
process that lost the Transaction reference immediately starts
another write transaction, it will deadlock on its own lock. Subsequently the
lost transaction may never be garbage collected (since the process is now
blocked on itself) and the database will become unusable.
These problems are easily avoided by always wrapping Transaction in
a with statement somewhere on the stack:
# Even if this crashes, txn will be correctly finalized.
with env.begin() as txn:
    if txn.get(b'foo'):
        function_that_stashes_away_txn_ref(txn)
        function_that_leaks_txn_refs(txn)
        crash()
Threads
MDB_NOTLS mode is used exclusively, which allows read transactions to
freely migrate across threads and for a single thread to maintain multiple read
transactions. This enables mostly care-free use of read transactions, for
example when using gevent.
Most objects can be safely called by a single caller from a single thread, and
usually it only makes sense to have a single caller, except in the case of
Environment.
Most Environment methods are thread-safe, and may be called
concurrently, except for Environment.close(). Running close at the
same time as other database operations may crash the interpreter.
A write Transaction may only be used from the thread it was created
on.
A read-only Transaction can move across threads, but it cannot be
used concurrently from multiple threads.
Cursor is not thread-safe, but it does not make sense to use it on
any thread except the thread that currently owns its associated
Transaction.
Forking & multiprocessing
LMDB environments must not be used across a fork() call. The memory map,
lock table, and file descriptors are not valid in the child process. Attempting
to use an inherited Environment in the child will corrupt the
database or crash.
py-lmdb detects fork() via a cached process ID and skips cleanup of
inherited transactions in the child, preventing the child from corrupting the
parent’s state. However, inherited Environment objects are not
usable in the child — they must be discarded and re-opened.
The safe patterns for multiprocessing are:
Open after fork: Open a fresh Environment in each child process. The children can safely share the same database path.
Use spawn: Use multiprocessing.get_context('spawn') instead of the fork context. spawn starts a new Python interpreter, avoiding the problem entirely. spawn is already the default on Windows and macOS; on other POSIX platforms the default changed from fork to forkserver in Python 3.14.
import multiprocessing
import lmdb
def worker(path):
    env = lmdb.open(path)  # open fresh in child
    with env.begin() as txn:
        print(txn.get(b'key'))
    env.close()
if __name__ == '__main__':
    env = lmdb.open('/tmp/mydb')
    with env.begin(write=True) as txn:
        txn.put(b'key', b'value')
    env.close()  # close before forking
    p = multiprocessing.Process(target=worker, args=('/tmp/mydb',))
    p.start()
    p.join()
Caution
If the process will be forked while an Environment is open,
set max_spare_txns=0 when opening the environment. Cached read-only
transactions hold a slot in the LMDB reader lock table; after fork(),
the child inherits these stale slots that it cannot clean up, which can
exhaust the reader table.
Asyncio
LMDB is fundamentally synchronous (memory-mapped file I/O), but py-lmdb’s C
extension releases the GIL during all database operations, making it safe to
offload calls to a thread pool via asyncio.loop.run_in_executor().
The lmdb.aio module provides thin async wrappers that do this
automatically. The synchronous code path is completely unaffected — this is
an opt-in import.
import asyncio
import lmdb
import lmdb.aio
env = lmdb.open('/tmp/mydb')
aenv = lmdb.aio.wrap(env)
async def main():
    # async context managers work naturally:
    async with aenv.begin(write=True) as txn:
        await txn.put(b'key', b'value')
    async with aenv.begin() as txn:
        val = await txn.get(b'key')
        async with txn.cursor() as cur:
            await cur.first()
            items = await cur.iternext()
asyncio.run(main())
Low-overhead accessors like key(), value(), item(), id(),
path(), and max_key_size() are called directly without dispatching to
the executor.
Iterators (iternext(), iterprev(), etc.) are consumed in the executor
and returned as a list.
All objects support async with for lifetime management. Write transactions
are committed on clean exit and aborted on exception, matching the synchronous
behavior.
Because LMDB transactions are not thread-safe, each
AsyncTransaction holds an asyncio.Lock that serializes all
operations dispatched through it (including operations on its cursors). This
means asyncio.gather() and other forms of concurrency are safe on the
same transaction — calls are automatically queued.
Caution
Do not mix synchronous and async access to the same
Environment. Once an environment is wrapped with
lmdb.aio.wrap(), all access should go through the async wrapper.
Handling database growth
LMDB uses a fixed-size memory map. When the database is full, write operations
raise lmdb.MapFullError. There are two strategies for handling
this.
Set a large initial map size
On 64-bit systems the map size is a virtual address reservation, not a physical
allocation — the OS only allocates pages as they are written. It is safe to
set map_size to a value much larger than the current data (e.g. several
GiB) when opening the environment:
# Reserve 1 GiB of address space. Only pages actually written
# consume physical memory/disk.
env = lmdb.open('/tmp/mydb', map_size=1024 * 1024 * 1024)
Caution
On filesystems that don’t support sparse files — notably older macOS
(HFS+) — the full map_size may be preallocated on disk. On such
systems, choose a map size closer to the expected data size and use
set_mapsize() to grow as needed.
py-lmdb enables sparse files on Windows (NTFS) automatically, so large
map_size values do not waste disk space there.
Resize at runtime with set_mapsize()
If the database outgrows its map, call Environment.set_mapsize() to
enlarge it:
import lmdb
env = lmdb.open('/tmp/mydb', map_size=1024 * 1024)
large_value = b'x' * (1024 * 1024)  # example payload too big for a 1 MiB map
try:
    with env.begin(write=True) as txn:
        txn.put(b'key', large_value)
except lmdb.MapFullError:
    # Double the map size and retry.
    env.set_mapsize(env.info()['map_size'] * 2)
    with env.begin(write=True) as txn:
        txn.put(b'key', large_value)
Calling set_mapsize() replaces the underlying memory map. This means:
All open transactions, cursors, and iterators are invalidated. Any attempt to use them after the call will raise an exception. Always finish or abort transactions before calling set_mapsize().
A write transaction must not be active: the call will raise an error if one is.
The environment itself remains usable: new transactions can be opened immediately after the resize.
A typical pattern for applications that grow organically is a retry loop:
def put_with_retry(env, key, value):
    """Put a key/value pair, growing the map if necessary."""
    while True:
        try:
            with env.begin(write=True) as txn:
                txn.put(key, value)
            return
        except lmdb.MapFullError:
            env.set_mapsize(env.info()['map_size'] * 2)
Caution
In multi-process scenarios, all processes sharing the environment should
use the same map size. If one process resizes the map, other processes
will receive lmdb.MapResizedError on their next transaction
and must call set_mapsize(0) (which re-reads the current size from the
file) or set_mapsize(new_size) to pick up the change.
32-bit processes
32-bit processes are severely limited in the amount of virtual memory that can be mapped. The maximum file that can be mapped is around 1.1 GiB, and that ceiling decreases as the process runs due to address space fragmentation. LMDB requires a contiguous range of virtual addresses for its map, so fragmentation is especially harmful. See this analysis for more information.
On Windows, you can inspect the precise maximum mapping size using the SysInternals tool VMMap: select your Python process, select the “free” row, and sort by size.
This is not a concern for 64-bit processes.
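An application that must run on both kinds of interpreter could pick its map size at startup. The sizes below are arbitrary assumptions chosen for illustration, and choose_map_size is a hypothetical helper:

```python
import sys

GIB = 1024 ** 3
MIB = 1024 ** 2


def choose_map_size():
    """Pick a generous map on 64-bit, a conservative one on 32-bit."""
    is_64bit = sys.maxsize > 2 ** 32
    # Illustrative values only: tune them to the expected data size.
    return 64 * GIB if is_64bit else 512 * MIB


map_size = choose_map_size()
```

The result would then be passed as map_size= when opening the environment.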
Interface
- lmdb.open(path, **kwargs)
Shortcut for Environment constructor.
- lmdb.version(subpatch=False)
Return a tuple of integers (major, minor, patch) describing the LMDB library version that the binding is linked against. The version of the binding itself is available from lmdb.__version__.
- subpatch:
If true, returns a 4 integer tuple consisting of the same plus an extra integer that represents any patches applied by py-lmdb itself (0 representing no patches).
Environment class
- class lmdb.Environment(path, map_size=10485760, subdir=True, readonly=False, metasync=True, sync=True, map_async=False, mode=493, create=True, readahead=True, writemap=False, meminit=True, max_readers=126, max_dbs=0, max_spare_txns=1, lock=True)
Structure for a database environment. An environment may contain multiple databases, all residing in the same shared-memory map and underlying disk file.
To write to the environment a Transaction must be created. One simultaneous write transaction is allowed, however there is no limit on the number of read transactions even when a write transaction exists.
This class is aliased to lmdb.open.
It is a serious error to have the same LMDB file open more than once in the same process at the same time. Failure to heed this may lead to data corruption and interpreter crash.
Equivalent to mdb_env_open()
- path:
Location of directory (if subdir=True) or file prefix to store the database.
- map_size:
Maximum size database may grow to; used to size the memory mapping. If database grows larger than
map_size, an exception will be raised and the user must close and reopenEnvironment. On 64-bit there is no penalty for making this huge (say 1TB). Must be <2GB on 32-bit.Note
The default map size is set low to encourage a crash, so users can figure out a good value before learning about this option too late.
- subdir:
If
True, path refers to a subdirectory to store the data and lock files in, otherwise it refers to a filename prefix.- readonly:
If
True, disallow any write operations. Note the lock file is still modified. If specified, thewriteflag tobegin()orTransactionis ignored.- metasync:
If
False, flush system buffers to disk only once per transaction, omit the metadata flush. Defer that until the system flushes files to disk, or next commit orsync().This optimization maintains database integrity, but a system crash may undo the last committed transaction. I.e. it preserves the ACI (atomicity, consistency, isolation) but not D (durability) database property.
- sync:
If
False, don’t flush system buffers to disk when committing a transaction. This optimization means a system crash can corrupt the database or lose the last transactions if buffers are not yet flushed to disk.The risk is governed by how often the system flushes dirty buffers to disk and how often
sync()is called. However, if the filesystem preserves write order and writemap=False, transactions exhibit ACI (atomicity, consistency, isolation) properties and only lose D (durability). I.e. database integrity is maintained, but a system crash may undo the final transactions.Note that sync=False, writemap=True leaves the system with no hint for when to write transactions to disk, unless
sync()is called. map_async=True, writemap=True may be preferable.- mode:
File creation mode.
- create:
If
False, do not create the directory path if it is missing.- readahead:
If
False, LMDB will disable the OS filesystem readahead mechanism, which may improve random read performance when a database is larger than RAM.- writemap:
If
True, use a writeable memory map unless readonly=True. This is faster and uses fewer mallocs, but loses protection from application bugs like wild pointer writes and other bad updates into the database. Incompatible with nested transactions.Processes with and without writemap on the same environment do not cooperate well.
- meminit:
If
FalseLMDB will not zero-initialize buffers prior to writing them to disk. This improves performance but may cause old heap data to be written saved in the unused portion of the buffer. Do not use this option if your application manipulates confidential data (e.g. plaintext passwords) in memory. This option is only meaningful when writemap=False; new pages are always zero-initialized when writemap=True.- map_async:
When
writemap=True, use asynchronous flushes to disk. As withsync=False, a system crash can then corrupt the database or lose the last transactions. Callingsync()ensures on-disk database integrity until next commit.- max_readers:
Maximum number of simultaneous read transactions. Can only be set by the first process to open an environment, as it affects the size of the lock file and shared memory area. Attempts to simultaneously start more than this many read transactions will fail.
- max_dbs:
Maximum number of databases available. If 0, assume environment will be used as a single database.
- max_spare_txns:
Read-only transactions to cache after becoming unused. Caching transactions avoids two allocations, one lock, and a linear scan of the shared environment per invocation of begin(), Transaction, get(), gets(), or cursor(). Should match the process’s maximum expected concurrent transactions (e.g. thread count).
- lock:
If False, don’t do any locking. If concurrent access is anticipated, the caller must manage all concurrency itself. For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins.
- begin(db=None, parent=None, write=False, buffers=False)¶
Shortcut for lmdb.Transaction
- close()¶
Close the environment, invalidating any open iterators, cursors, and transactions. Repeat calls to close() have no effect.
Equivalent to mdb_env_close()
- copy(path, compact=False, txn=None)¶
Make a consistent copy of the environment in the given destination directory.
- compact:
If True, perform compaction while copying: omit free pages and sequentially renumber all pages in output. This option consumes more CPU and runs more slowly than the default, but may produce a smaller output database.
- txn:
If provided, the backup will be taken from the database with respect to that transaction, otherwise a temporary read-only transaction will be created. Note: a non-None value is not supported if the module was built with LMDB_PURE. Note: this parameter may be set only if compact=True.
Equivalent to mdb_env_copy2() or mdb_env_copy3()
- copyfd(fd, compact=False, txn=None)¶
Copy a consistent version of the environment to file descriptor fd.
- compact:
If True, perform compaction while copying: omit free pages and sequentially renumber all pages in output. This option consumes more CPU and runs more slowly than the default, but may produce a smaller output database.
- txn:
If provided, the backup will be taken from the database with respect to that transaction, otherwise a temporary read-only transaction will be created. Note: a non-None value is not supported if the module was built with LMDB_PURE.
Equivalent to mdb_env_copyfd2() or mdb_env_copyfd3()
- dbs(txn=None)¶
Return a list of named databases in the environment, as a list of bytestrings.
This works by iterating the main database and attempting to open each key as a named database. It only returns reliable results when the main database is not used to store regular key-value pairs.
- txn:
Read-only or read-write Transaction to use. If None, a temporary read-only transaction is created and released automatically.
- flags()¶
Return a dict describing Environment constructor flags used to instantiate this environment.
- info()¶
Return a dict of environment information:
map_addr: Address of database map in RAM.
map_size: Size of database map in RAM.
last_pgno: ID of last used page.
last_txnid: ID of last committed transaction.
max_readers: Number of reader slots allocated in the lock file. Equivalent to the value of max_readers= specified by the first process opening the Environment.
num_readers: Maximum number of reader slots in simultaneous use since the lock file was initialized.
Equivalent to mdb_env_info()
- max_key_size()¶
Return the maximum size in bytes of a record’s key part. This matches the MDB_MAXKEYSIZE constant set at compile time.
- max_readers()¶
Return the maximum number of readers specified during open of the environment by the first process. This is the same as max_readers= specified to the constructor if this process was the first to open the environment.
- open_db(key=None, txn=None, reverse_key=False, dupsort=False, create=True, integerkey=False, integerdup=False, dupfixed=False)¶
Open a database, returning an instance of _Database. Repeat Environment.open_db() calls for the same name will return the same handle. As a special case, the main database is always open.
Equivalent to mdb_dbi_open()
Named databases are implemented by storing a special descriptor in the main database. All databases in an environment share the same file. Because the descriptor is present in the main database, attempts to create a named database will fail if a key matching the database’s name already exists. Furthermore the key is visible to lookups and enumerations. If your main database keyspace conflicts with the names you use for named databases, then move the contents of your main database to another named database.
>>> env = lmdb.open('/tmp/test', max_dbs=2)
>>> with env.begin(write=True) as txn:
...     txn.put(b'somename', b'somedata')
>>> # Error: database cannot share name of existing key!
>>> subdb = env.open_db(b'somename')
A newly created database will not exist if the transaction that created it aborted, nor if another process deleted it. The handle resides in the shared environment, it is not owned by the current transaction or process. Only one thread should call this function; it is not mutex-protected in a read-only transaction.
The dupsort, integerkey, integerdup, and dupfixed parameters are ignored if the database already exists. The state of those settings is persistent and immutable per database. See _Database.flags() to view the state of those options for an opened database. A consequence of the immutability of these flags is that the default non-named database will never have these flags set.
Preexisting transactions, other than the current transaction and any parents, must not use the new handle, nor must their children.
- key:
Bytestring database name. If None, indicates the main database should be returned, otherwise indicates a named database should be created inside the main database.
In other words, a key representing the database will be visible in the main database, and the database name cannot conflict with any existing key.
- txn:
Transaction used to create the database if it does not exist. If unspecified, a temporary write transaction is used. Do not call open_db() from inside an existing transaction without supplying it here. Note the passed transaction must have write=True.
- reverse_key:
If True, keys are compared from right to left (e.g. DNS names).
- dupsort:
Duplicate keys may be used in the database. (Or, from another perspective, keys may have multiple data items, stored in sorted order.) By default keys must be unique and may have only a single data item.
- create:
If True, create the database if it doesn’t exist, otherwise raise an exception.
- integerkey:
If True, indicates keys in the database are C unsigned or size_t integers encoded in native byte order. Keys must all be either unsigned or size_t; they cannot be mixed in a single database.
- integerdup:
If True, values in the database are C unsigned or size_t integers encoded in native byte order. Implies dupsort and dupfixed are True.
- dupfixed:
If True, values for each key in database are of fixed size, allowing each additional duplicate value for a key to be stored without a header indicating its size. Implies dupsort is True.
- path()¶
Directory path or file name prefix where this environment is stored.
Equivalent to mdb_env_get_path()
- reader_check()¶
Search the reader lock table for stale entries, for example due to a crashed process. Returns the number of stale entries that were cleared.
- readers()¶
Return a multi line Unicode string describing the current state of the reader lock table.
- set_mapsize(map_size)¶
Change the maximum size of the map file.
All open transactions, cursors, and iterators on this environment are invalidated, and the memory map is replaced. A write transaction must not be active.
- map_size:
The new size in bytes.
Equivalent to mdb_env_set_mapsize()
- stat()¶
Return some environment statistics for the default database as a dict:
psize: Size of a database page in bytes.
depth: Height of the B-tree.
branch_pages: Number of internal (non-leaf) pages.
leaf_pages: Number of leaf pages.
overflow_pages: Number of overflow pages.
entries: Number of data items.
Equivalent to mdb_env_stat()
- sync(force=False)¶
Flush the data buffers to disk.
Equivalent to mdb_env_sync()
Data is always written to disk when Transaction.commit() is called, but the operating system may keep it buffered. MDB always flushes the OS buffers upon commit as well, unless the environment was opened with sync=False or metasync=False.
- force:
If True, force a synchronous flush. Otherwise if the environment was opened with sync=False the flushes will be omitted, and with map_async=True they will be asynchronous.
Database class¶
- class lmdb._Database(env, txn, name, reverse_key, dupsort, create, integerkey, integerdup, dupfixed)¶
Internal database handle. This class is opaque, save a single method.
Should not be constructed directly. Use Environment.open_db() instead.
- flags(*args)¶
Return the database’s associated flags as a dict of _Database constructor kwargs.
Transaction class¶
- class lmdb.Transaction(env, db=None, parent=None, write=False, buffers=False)¶
A transaction object. All operations require a transaction handle; transactions may be read-only or read-write. Write transactions may not span threads. Transaction objects implement the context manager protocol, so that reliable release of the transaction happens even in the face of unhandled exceptions:
# Transaction aborts correctly:
with env.begin(write=True) as txn:
    crash()

# Transaction commits automatically:
with env.begin(write=True) as txn:
    txn.put(b'a', b'b')
Equivalent to mdb_txn_begin()
- env:
Environment the transaction should be on.
- db:
Default named database to operate on. If unspecified, defaults to the environment’s main database. Can be overridden on a per-call basis below.
- parent:
None, or a parent transaction (see lmdb.h).
- write:
Transactions are read-only by default. To modify the database, you must pass write=True. This flag is ignored if Environment was opened with readonly=True.
- buffers:
If True, indicates memoryview() objects should be yielded instead of bytestrings. This setting applies to the Transaction instance itself and any Cursors created within the transaction.
This feature significantly improves performance, since MDB has a zero-copy design, but it requires care when manipulating the returned buffer objects. The benefit of this facility is diminished when using small keys and values.
- abort()¶
Abort the pending transaction. Repeat calls to abort() have no effect after a previously successful commit() or abort(), or after the associated Environment has been closed.
Equivalent to mdb_txn_abort()
- commit()¶
Commit the pending transaction.
Equivalent to mdb_txn_commit()
- cursor(db=None)¶
Shortcut for lmdb.Cursor(db, self)
- delete(key, value=b'', db=None)¶
Delete a key from the database.
Equivalent to mdb_del()
- key:
The key to delete.
- value:
If the database was opened with dupsort=True and value is not the empty bytestring, then delete elements matching only this (key, value) pair, otherwise all values for key are deleted.
Returns True if at least one key was deleted.
- drop(db, delete=True)¶
Delete all keys in a named database and optionally delete the named database itself. Deleting the named database causes it to become unavailable, and invalidates existing cursors.
Equivalent to mdb_drop()
- get(key, default=None, db=None)¶
Fetch the first value matching key, returning default if key does not exist. A cursor must be used to fetch all values for a key in a dupsort=True database.
Equivalent to mdb_get()
- id()¶
Return the transaction’s ID.
This returns the identifier associated with this transaction. For a read-only transaction, this corresponds to the snapshot being read; concurrent readers will frequently have the same transaction ID.
- pop(key, db=None)¶
Use a temporary cursor to invoke Cursor.pop().
- db:
Named database to operate on. If unspecified, defaults to the database given to the Transaction constructor.
- put(key, value, dupdata=True, overwrite=True, append=False, db=None)¶
Store a record, returning True if it was written, or False to indicate the key was already present and overwrite=False. On success, the cursor is positioned on the new record.
Equivalent to mdb_put()
- key:
Bytestring key to store.
- value:
Bytestring value to store.
- dupdata:
If False and database was opened with dupsort=True, will return False if the key already has that value. In other words, this only affects the return value.
- overwrite:
If False, do not overwrite any existing matching key. If False and writing to a dupsort=True database, this will not add a value to the key and this function will return False.
- append:
If True, append the pair to the end of the database without comparing its order first. Appending a key that is not greater than the highest existing key will fail and return False.
- db:
Named database to operate on. If unspecified, defaults to the database given to the Transaction constructor.
- replace(key, value, db=None)¶
Use a temporary cursor to invoke Cursor.replace().
- db:
Named database to operate on. If unspecified, defaults to the database given to the Transaction constructor.
- stat(db=None)¶
Return statistics like Environment.stat(), except for a single DBI. db must be a database handle returned by open_db(). If db is None, the transaction’s default database is used.
Cursor class¶
- class lmdb.Cursor(db, txn)¶
Structure for navigating a database.
Equivalent to mdb_cursor_open()
- db:
_Database to navigate.
- txn:
Transaction to navigate.
As a convenience, Transaction.cursor() can be used to quickly return a cursor:
>>> env = lmdb.open('/tmp/foo', max_dbs=1)
>>> child_db = env.open_db(b'child_db')
>>> with env.begin() as txn:
...     cursor = txn.cursor()           # Cursor on main database.
...     cursor2 = txn.cursor(child_db)  # Cursor on child database.
Cursors start in an unpositioned state. If iternext() or iterprev() are used in this state, iteration proceeds from the start or end respectively. Iterators directly position using the cursor, meaning strange behavior results when multiple iterators exist on the same cursor.
Note
From the perspective of the Python binding, cursors return to an ‘unpositioned’ state once any scanning or seeking method (e.g. next(), prev_nodup(), set_range()) returns False or raises an exception. This is primarily to ensure safe, consistent semantics in the face of any error condition.
When the Cursor returns to an unpositioned state, its key() and value() return empty strings to indicate there is no active position, although internally the LMDB cursor may still have a valid position.
This may lead to slightly surprising behaviour when iterating the values for a dupsort=True database’s keys, since methods such as iternext_dup() will cause Cursor to appear unpositioned, despite it returning False only to indicate there are no more values for the current key. In that case, simply calling next() would cause iteration to resume at the next available key.
This behaviour may change in future.
Iterator methods such as iternext() and iterprev() accept keys and values arguments. If both are True, then the value of item() is yielded on each iteration. If only keys is True, key() is yielded, otherwise only value() is yielded.
Prior to iteration, a cursor can be positioned anywhere in the database:
>>> with env.begin() as txn:
...     cursor = txn.cursor()
...     if not cursor.set_range(b'5'):  # Position at first key >= b'5'.
...         print('Not found!')
...     else:
...         for key, value in cursor:  # Iterate from first key >= b'5'.
...             print((key, value))
Iteration is not required to navigate, and sometimes results in ugly or inefficient code. In cases where the iteration order is not obvious, or is related to the data being read, use of set_key(), set_range(), key(), value(), and item() may be preferable:
>>> # Record the path from a child to the root of a tree.
>>> path = [b'child14123']
>>> while path[-1] != b'root':
...     assert cursor.set_key(path[-1]), \
...         'Tree is broken! Path: %s' % (path,)
...     path.append(cursor.value())
- close()¶
Close the cursor, freeing its associated resources.
- count()¶
Return the number of values (“duplicates”) for the current key.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_count()
- delete(dupdata=False)¶
Delete the current element and move to the next, returning True on success or False if the database was empty.
If dupdata is True, delete all values (“duplicates”) for the current key, otherwise delete only the currently positioned value. Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_del()
- first()¶
Move to the first key in the database, returning True on success or False if the database is empty.
If the database was opened with dupsort=True and the key contains duplicates, the cursor is positioned on the first value (“duplicate”).
Equivalent to mdb_cursor_get() with MDB_FIRST
- first_dup()¶
Move to the first value (“duplicate”) for the current key, returning True on success or False if the database is empty.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_FIRST_DUP
- get(key, default=None)¶
Equivalent to set_key(), except value() is returned when key is found, otherwise default.
- getmulti(keys, dupdata=False, dupfixed_bytes=None, keyfixed=False, values=True)¶
Returns an iterable of (key, value) 2-tuples containing results for each key in the iterable keys.
- keys:
Iterable to read keys from.
- dupdata:
If True and database was opened with dupsort=True, read all duplicate values for each matching key.
- dupfixed_bytes:
If database was opened with dupsort=True and dupfixed=True, accepts the size of each value, in bytes, and applies an optimization reducing the number of database lookups.
- keyfixed:
If dupfixed_bytes is set and database key size is fixed, setting keyfixed=True will result in this function returning a memoryview to the results as a structured array of bytes. The structured array can be instantiated by passing the memoryview buffer to NumPy:
import numpy as np
key_bytes, val_bytes = 4, 8
# The field names 'key' and 'value' are illustrative.
dtype = np.dtype([('key', f'S{key_bytes}'), ('value', f'S{val_bytes}')])
arr = np.frombuffer(
    cur.getmulti(keys, dupdata=True, dupfixed_bytes=val_bytes, keyfixed=True),
    dtype=dtype,
)
- values:
If False, return a flat list of keys that exist in the database instead of (key, value) tuples. Value data is never touched, avoiding page faults on large values. Incompatible with dupdata=True.
- item()¶
Return the current (key, value) pair.
- iternext(keys=True, values=True)¶
Return a forward iterator that yields the current element before calling next(), repeating until the end of the database is reached. As a convenience, Cursor implements the iterator protocol by automatically returning a forward iterator when invoked:
>>> # Equivalent:
>>> it = iter(cursor)
>>> it = cursor.iternext(keys=True, values=True)
If the cursor is not yet positioned, it is moved to the first key in the database, otherwise iteration proceeds from the current position.
- iternext_dup(keys=False, values=True)¶
Return a forward iterator that yields the current value (“duplicate”) of the current key before calling next_dup(), repeating until the last value of the current key is reached.
Only meaningful for databases opened with dupsort=True.
if not cursor.set_key(b"foo"):
    print("No values found for 'foo'")
else:
    for idx, data in enumerate(cursor.iternext_dup()):
        print("%d'th value for 'foo': %s" % (idx, data))
- iternext_nodup(keys=True, values=False)¶
Return a forward iterator that yields the current value (“duplicate”) of the current key before calling next_nodup(), repeating until the end of the database is reached.
Only meaningful for databases opened with dupsort=True.
If the cursor is not yet positioned, it is moved to the first key in the database, otherwise iteration proceeds from the current position.
for key in cursor.iternext_nodup():
    print("Key '%s' has %d values" % (key, cursor.count()))
- iterprev(keys=True, values=True)¶
Return a reverse iterator that yields the current element before calling prev(), until the start of the database is reached.
If the cursor is not yet positioned, it is moved to the last key in the database, otherwise iteration proceeds from the current position.
>>> with env.begin() as txn:
...     for i, (key, value) in enumerate(txn.cursor().iterprev()):
...         print('%dth last item is (%r, %r)' % (1+i, key, value))
- iterprev_dup(keys=False, values=True)¶
Return a reverse iterator that yields the current value (“duplicate”) of the current key before calling prev_dup(), repeating until the first value of the current key is reached.
Only meaningful for databases opened with dupsort=True.
- iterprev_nodup(keys=True, values=False)¶
Return a reverse iterator that yields the current value (“duplicate”) of the current key before calling prev_nodup(), repeating until the start of the database is reached.
If the cursor is not yet positioned, it is moved to the last key in the database, otherwise iteration proceeds from the current position.
Only meaningful for databases opened with dupsort=True.
- key()¶
Return the current key.
- last()¶
Move to the last key in the database, returning True on success or False if the database is empty.
If the database was opened with dupsort=True and the key contains duplicates, the cursor is positioned on the last value (“duplicate”).
Equivalent to mdb_cursor_get() with MDB_LAST
- last_dup()¶
Move to the last value (“duplicate”) for the current key, returning True on success or False if the database is empty.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_LAST_DUP
- next()¶
Move to the next element, returning True on success or False if there is no next element.
For databases opened with dupsort=True, moves to the next value (“duplicate”) for the current key if one exists, otherwise moves to the first value of the next key.
Equivalent to mdb_cursor_get() with MDB_NEXT
- next_dup()¶
Move to the next value (“duplicate”) of the current key, returning True on success or False if there is no next value.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_NEXT_DUP
- next_nodup()¶
Move to the first value (“duplicate”) of the next key, returning True on success or False if there is no next key.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_NEXT_NODUP
- pop(key)¶
Fetch a record’s value then delete it. Returns None if no previous value existed. This uses the best available mechanism to minimize the cost of a delete-and-return-previous operation.
For databases opened with dupsort=True, the first data element (“duplicate”) for the key will be popped.
- key:
Bytestring key to delete.
- prev()¶
Move to the previous element, returning True on success or False if there is no previous item.
For databases opened with dupsort=True, moves to the previous data item (“duplicate”) for the current key if one exists, otherwise moves to the previous key.
Equivalent to mdb_cursor_get() with MDB_PREV
- prev_dup()¶
Move to the previous value (“duplicate”) of the current key, returning True on success or False if there is no previous value.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_PREV_DUP
- prev_nodup()¶
Move to the last value (“duplicate”) of the previous key, returning True on success or False if there is no previous key.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_PREV_NODUP
- put(key, val, dupdata=True, overwrite=True, append=False)¶
Store a record, returning True if it was written, or False to indicate the key was already present and overwrite=False. On success, the cursor is positioned on the key.
Equivalent to mdb_cursor_put()
- key:
Bytestring key to store.
- val:
Bytestring value to store.
- dupdata:
If False and database was opened with dupsort=True, will return False if the key already has that value. In other words, this only affects the return value.
- overwrite:
If False, do not overwrite the value for the key if it exists, just return False. For databases opened with dupsort=True, False will always be returned if a duplicate key/value pair is inserted, regardless of the setting for overwrite.
- append:
If True, append the pair to the end of the database without comparing its order first. Appending a key that is not greater than the highest existing key will fail and return False.
- putmulti(items, dupdata=True, overwrite=True, append=False)¶
Invoke put() for each (key, value) 2-tuple from the iterable items. Elements must be exactly 2-tuples; they may not be of any other type or a tuple subclass.
Returns a tuple (consumed, added), where consumed is the number of elements read from the iterable, and added is the number of new entries added to the database. added may be less than consumed when overwrite=False.
- items:
Iterable to read records from.
- dupdata:
If True and database was opened with dupsort=True, add pair as a duplicate if the given key already exists. Otherwise overwrite any existing matching key.
- overwrite:
If False, do not overwrite the value for the key if it exists, just return False. For databases opened with dupsort=True, False will always be returned if a duplicate key/value pair is inserted, regardless of the setting for overwrite.
- append:
If True, append records to the end of the database without comparing their order first. Appending a key that is not greater than the highest existing key will cause corruption.
- replace(key, val)¶
Store a record, returning its previous value if one existed. Returns None if no previous value existed. This uses the best available mechanism to minimize the cost of a set-and-return-previous operation.
For databases opened with dupsort=True, only the first data element (“duplicate”) is returned if it existed, all data elements are removed and the new (key, data) pair is inserted.
- key:
Bytestring key to store.
- val:
Bytestring value to store.
- set_key(key)¶
Seek exactly to key, returning True on success or False if the exact key was not found. It is an error to set_key() the empty bytestring.
For databases opened with dupsort=True, moves to the first value (“duplicate”) for the key.
Equivalent to mdb_cursor_get() with MDB_SET_KEY
- set_key_dup(key, value)¶
Seek exactly to (key, value), returning True on success or False if the exact key and value were not found. It is an error to set_key() the empty bytestring.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_GET_BOTH
- set_range(key)¶
Seek to the first key greater than or equal to key, returning True on success, or False to indicate key was past end of database. Behaves like first() if key is the empty bytestring.
For databases opened with dupsort=True, moves to the first value (“duplicate”) for the key.
Equivalent to mdb_cursor_get() with MDB_SET_RANGE
- set_range_dup(key, value)¶
Seek to the first key/value pair greater than or equal to key, returning True on success, or False to indicate that value was past the last value of key or that (key, value) was past the end of database.
Only meaningful for databases opened with dupsort=True.
Equivalent to mdb_cursor_get() with MDB_GET_BOTH_RANGE
- value()¶
Return the current value.
Async classes¶
- lmdb.aio.wrap(env, executor=None)¶
Wrap an lmdb.Environment for async use.
executor is passed to loop.run_in_executor(). None (the default) uses the loop’s default executor.
- class lmdb.aio.AsyncEnvironment(env, executor=None)¶
Async wrapper for lmdb.Environment.
Created by wrap(). All methods of the underlying Environment are available and are dispatched to an executor, except for the low-overhead accessors path(), max_key_size(), max_readers(), and flags(), which are called directly.
Supports async with for lifetime management: the environment is closed on exit.
All other Environment methods are available as coroutines via __getattr__ proxy.
- begin(*args, **kwargs)¶
Start a new transaction, returning an AsyncTransaction.
Accepts the same arguments as lmdb.Environment.begin(). Can be used with await or async with:
async with aenv.begin(write=True) as txn:
    await txn.put(b'key', b'value')
- class lmdb.aio.AsyncTransaction(txn, executor=None)¶
Async wrapper for lmdb.Transaction.
All methods of the underlying Transaction are available. Most are dispatched to an executor; id() is called directly.
An asyncio.Lock serializes all operations dispatched through this transaction, including operations on its cursors. This makes asyncio.gather() safe on the same transaction.
Supports async with: write transactions are committed on clean exit and aborted on exception.
All other Transaction methods are available as coroutines via __getattr__ proxy.
- cursor(*args, **kwargs)¶
Open a cursor, returning an AsyncCursor.
Accepts the same arguments as lmdb.Transaction.cursor(). Can be used with await or async with:
async with txn.cursor() as cur:
    await cur.first()
    items = await cur.iternext()
- class lmdb.aio.AsyncCursor(cursor, executor=None, lock=None)¶
Async wrapper for lmdb.Cursor.
All methods of the underlying Cursor are available. Most are dispatched to an executor; key(), value(), and item() are called directly.
Iterator methods (iternext(), iterprev(), etc.) are consumed in the executor and returned as a list.
Shares the parent transaction’s asyncio.Lock.
Supports async with: the cursor is closed on exit.
All other Cursor methods are available as coroutines via __getattr__ proxy.
- async iternext(keys=True, values=True)¶
Return a forward iterator that yields the current element before calling next(), repeating until the end of the database is reached. As a convenience, Cursor implements the iterator protocol by automatically returning a forward iterator when invoked:
>>> # Equivalent:
>>> it = iter(cursor)
>>> it = cursor.iternext(keys=True, values=True)
If the cursor is not yet positioned, it is moved to the first key in the database, otherwise iteration proceeds from the current position.
- async iternext_dup(keys=False, values=True)¶
Return a forward iterator that yields the current value (“duplicate”) of the current key before calling next_dup(), repeating until the last value of the current key is reached.
Only meaningful for databases opened with dupsort=True.
if not cursor.set_key(b"foo"):
    print("No values found for 'foo'")
else:
    for idx, data in enumerate(cursor.iternext_dup()):
        print("%d'th value for 'foo': %s" % (idx, data))
- async iternext_nodup(keys=True, values=False)¶
Return a forward iterator that yields the current value (“duplicate”) of the current key before calling next_nodup(), repeating until the end of the database is reached.
Only meaningful for databases opened with dupsort=True.
If the cursor is not yet positioned, it is moved to the first key in the database, otherwise iteration proceeds from the current position.
for key in cursor.iternext_nodup():
    print("Key '%s' has %d values" % (key, cursor.count()))
- async iterprev(keys=True, values=True)¶
Return a reverse iterator that yields the current element before calling prev(), until the start of the database is reached.
If the cursor is not yet positioned, it is moved to the last key in the database, otherwise iteration proceeds from the current position.
>>> with env.begin() as txn:
...     for i, (key, value) in enumerate(txn.cursor().iterprev()):
...         print('%dth last item is (%r, %r)' % (1+i, key, value))
- async iterprev_dup(keys=False, values=True)¶
Return a reverse iterator that yields the current value (“duplicate”) of the current key before calling prev_dup(), repeating until the first value of the current key is reached.
Only meaningful for databases opened with dupsort=True.
- async iterprev_nodup(keys=True, values=False)¶
Return a reverse iterator that yields the current value (“duplicate”) of the current key before calling prev_nodup(), repeating until the start of the database is reached.
If the cursor is not yet positioned, it is moved to the last key in the database, otherwise iteration proceeds from the current position.
Only meaningful for databases opened with dupsort=True.
Exceptions¶
- class lmdb.Error¶
Raised when an LMDB-related error occurs, and no more specific
lmdb.Error subclass exists.
- class lmdb.KeyExistsError¶
Key/data pair already exists.
- class lmdb.NotFoundError¶
No matching key/data pair found.
Normally py-lmdb indicates a missing key by returning
None or a user-supplied default value. However, LMDB may return this error in cases where py-lmdb does not know to convert it into a non-exceptional return.
- class lmdb.PageNotFoundError¶
Requested page not found.
- class lmdb.CorruptedError¶
Located page was of the wrong type.
- class lmdb.PanicError¶
Update of meta page failed.
- class lmdb.VersionMismatchError¶
Database environment version mismatch.
- class lmdb.InvalidError¶
File is not an MDB file.
- class lmdb.MapFullError¶
Environment map_size= limit reached.
- class lmdb.DbsFullError¶
Environment max_dbs= limit reached.
- class lmdb.ReadersFullError¶
Environment max_readers= limit reached.
- class lmdb.TlsFullError¶
Thread-local storage keys full - too many environments open.
- class lmdb.TxnFullError¶
Transaction has too many dirty pages - transaction too big.
- class lmdb.CursorFullError¶
Internal error - cursor stack limit reached.
- class lmdb.PageFullError¶
Internal error - page has no more space.
- class lmdb.MapResizedError¶
Database contents grew beyond environment map_size=.
- class lmdb.IncompatibleError¶
Operation and DB incompatible, or DB flags changed.
- class lmdb.BadDbiError¶
The specified DBI was changed unexpectedly.
- class lmdb.BadRslotError¶
Invalid reuse of reader locktable slot.
- class lmdb.BadTxnError¶
Transaction cannot recover - it must be aborted.
- class lmdb.BadValsizeError¶
Too big key/data, key is empty, or wrong DUPFIXED size.
- class lmdb.ReadonlyError¶
An attempt was made to modify a read-only database.
- class lmdb.InvalidParameterError¶
An invalid parameter was specified.
- class lmdb.LockError¶
The environment was locked by another process.
- class lmdb.MemoryError¶
Out of memory.
- class lmdb.DiskError¶
No more disk space.
Command line tools¶
A rudimentary interface to most of the binding’s functionality is provided. These functions are useful for e.g. backup jobs.
$ python -mlmdb --help
Usage: python -mlmdb [options] <command>
Basic tools for working with LMDB.
copy: Consistent high speed backup an environment.
python -mlmdb copy -e source.lmdb target.lmdb
copyfd: Consistent high speed backup an environment to stdout.
python -mlmdb copyfd -e source.lmdb > target.lmdb/data.mdb
drop: Delete one or more sub-databases.
python -mlmdb drop db1
dump: Dump one or more databases to disk in 'cdbmake' format.
Usage: dump [db1=file1.cdbmake db2=file2.cdbmake]
If no databases are given, dumps the main database to 'main.cdbmake'.
edit: Add/delete/replace values from a database.
python -mlmdb edit --set key=value --set-file key=/path \
--add key=value --add-file key=/path/to/file \
--delete key
get: Read one or more values from a database.
python -mlmdb get [<key1> [<keyN> [..]]]
readers: Display readers in the lock table
python -mlmdb readers -e /path/to/db [-c]
If -c is specified, clear stale readers.
restore: Read one or more database from disk in 'cdbmake' format.
python -mlmdb restore db1=file1.cdbmake db2=file2.cdbmake
The special db name ":main:" may be used to indicate the main DB.
rewrite: Re-create an environment using MDB_APPEND
python -mlmdb rewrite -e src.lmdb -E dst.lmdb [<db1> [<dbN> ..]]
If no databases are given, rewrites only the main database.
shell: Open interactive console with ENV set to the open environment.
stat: Print environment statistics.
warm: Read environment into page cache sequentially.
watch: Show live environment statistics
Options:
-h, --help show this help message and exit
-e ENV, --env=ENV Environment file to open
-d DB, --db=DB Database to open (default: main)
-r READ, --read=READ Open environment read-only
-S MAP_SIZE, --map_size=MAP_SIZE
Map size in megabytes (default: 10)
-a, --all Make "dump" dump all databases
-E TARGET_ENV, --target_env=TARGET_ENV
Target environment file for "dumpfd"
-x, --xxd Print values in xxd format
-M MAX_DBS, --max-dbs=MAX_DBS
Maximum open DBs (default: 128)
--out-fd=OUT_FD "copyfd" command target fd
Options for "copy" command:
--compact Perform compaction while copying.
Options for "edit" command:
--set=SET List of key=value pairs to set.
--set-file=SET_FILE
List of key pairs to read from files.
--add=ADD List of key=value pairs to add.
--add-file=ADD_FILE
List of key pairs to read from files.
--delete=DELETE List of key=value pairs to delete.
Options for "readers" command:
-c, --clean Clean stale readers? (default: no)
Options for "watch" command:
--csv Generate CSV instead of terminal output.
--interval=INTERVAL
Interval size (default: 1sec)
--window=WINDOW Average window size (default: 10)
Implementation Notes¶
Iterators¶
It was tempting to make Cursor directly act as an iterator, however
that would require overloading its next() method to mean something other than
the natural definition of next() on an LMDB cursor. It would additionally
introduce unintuitive state tied to the cursor that does not exist in LMDB:
such as iteration direction and the type of value yielded.
Instead a separate iterator is produced by __iter__(), iternext(), and iterprev(), with easily described semantics regarding how they interact with the cursor.
Memsink Protocol¶
If the memsink package is available during installation of the CPython
extension, then the resulting module’s Transaction object will act
as a source for the Memsink Protocol. This is an experimental protocol to
allow extension of LMDB’s zero-copy design outward to other C types, without
requiring explicit management by the user.
This design is a work in progress; if you have an application that would benefit from it, please leave a comment on the ticket above.
Deviations from LMDB API¶
- mdb_dbi_close():
This is not exposed since its use is perilous at best. Users must ensure all activity on the DBI has ceased in all threads before closing the handle. Failure to do this could result in “impossible” errors, or the DBI slot becoming reused, resulting in operations being serviced by the wrong named database. Leaving handles open wastes a tiny amount of memory, which seems a good price to avoid subtle data corruption.
- Cursor.replace(), Cursor.pop():
There are no native equivalents to these calls; they implement common operations in C so that error-prone boilerplate Python does not have to do the same.
- mdb_set_compare(), mdb_set_dupsort():
Neither function is exposed for a variety of reasons. In particular, neither can be supported safely, since exceptions cannot be propagated through LMDB callbacks, and can lead to database corruption if used incorrectly. Secondarily, since both functions are repeatedly invoked for every single lookup in the LMDB read path, most of the performance benefit of LMDB is lost by introducing Python interpreter callbacks to its hot path.
There are a variety of workarounds that could make both functions useful, but not without either punishing binding users who do not require these features (especially on CFFI), or needlessly complicating the binding for what is essentially an edge case.
In all cases where mdb_set_compare() might be useful, use of a special key encoding that encodes your custom order is usually desirable. See issue #79 for example approaches.
The answer is not so clear for mdb_set_dupsort(), since a custom encoding there may necessitate wasted storage space, or complicating record decoding in an application’s hot path. Please file a ticket if you think you have a use for mdb_set_dupsort().
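As one possible illustration of the key-encoding approach suggested in place of mdb_set_compare(), the sketch below encodes signed integers so that LMDB's default lexicographic byte ordering matches numeric ordering. The helper names and the composite event key are hypothetical, not part of py-lmdb.

```python
import struct

BIAS = 1 << 63  # shifts the signed 64-bit range onto 0 .. 2**64 - 1


def encode_signed(n: int) -> bytes:
    """Encode a signed 64-bit integer so byte order equals numeric order."""
    # Big-endian packing puts the most significant byte first, so comparing
    # the bytes lexicographically agrees with comparing the numbers.
    return struct.pack(">Q", n + BIAS)


def decode_signed(raw: bytes) -> int:
    """Inverse of encode_signed()."""
    return struct.unpack(">Q", raw)[0] - BIAS


def encode_event_key(timestamp: int, event_id: int) -> bytes:
    """Hypothetical composite key: ordered by timestamp, then by event id."""
    return encode_signed(timestamp) + struct.pack(">I", event_id)


values = [5, -1, 300, 0, -70000]
ordered = [decode_signed(k) for k in sorted(encode_signed(v) for v in values)]
print(ordered)
```

Because the ordering lives in the bytes themselves, no Python callback ever runs inside LMDB's lookup path.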
Technology¶
The binding is implemented twice: once using CFFI, and once as a native C extension. This is because a CFFI binding is necessary for PyPy, but its performance on CPython is very poor. For good performance on CPython, only Cython and a native extension are viable options. Initially Cython was used, but it was abandoned due to the effort and relative mismatch involved compared to writing a native extension.
Invalidation lists¶
Much effort has gone into avoiding crashes: when some object is invalidated
(e.g. due to Transaction.abort()), child objects are updated to ensure
they don’t access memory of the no-longer-existent resource, and that they
correspondingly free their own resources. On CPython this is accomplished by
weaving a linked list into all PyObject structures. This avoids the need to
maintain a separate heap-allocated structure, or produce excess weakref
objects (which internally simply manage their own lists).
With CFFI this isn’t possible. Instead each object has a _deps dict that
maps dependent object IDs to the corresponding objects. Weakrefs are avoided
since they are very inefficient on PyPy. Prior to invalidation _deps is
walked to notify each dependent that the resource is about to disappear.
Finally, each object may either store an explicit _invalid attribute and
check it prior to every operation, or rely on another mechanism to avoid the
crash resulting from using an invalidated resource. Instead of performing these
explicit tests continuously, on CFFI a magic
Some_LMDB_Resource_That_Was_Deleted_Or_Closed object is used. During
invalidation, all native handles are replaced with an instance of this object.
Since CFFI cannot convert the magical object to a C type, any attempt to make a
native call will raise TypeError with a nice descriptive type name
indicating the problem. Hacky but efficient, and mission accomplished.
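The CFFI-side scheme can be pictured with the following pure-Python sketch. It is not the binding's actual code: the Resource class, its use() method, and native_call() are hypothetical stand-ins, and native_call() merely imitates a CFFI call rejecting an object it cannot convert to a C type.

```python
class Some_LMDB_Resource_That_Was_Deleted_Or_Closed:
    """Placeholder swapped in for a native handle after invalidation."""


def native_call(handle):
    # Stand-in for a CFFI call: only "real" handles (ints here) are accepted.
    if not isinstance(handle, int):
        raise TypeError("expected native handle, got %s" % type(handle).__name__)
    return handle


class Resource:
    def __init__(self, handle, parent=None):
        self._handle = handle
        self._deps = {}  # id(dependent) -> dependent
        self._parent = parent
        if parent is not None:
            parent._deps[id(self)] = self

    def _invalidate(self):
        # Notify dependents before this resource's own handle disappears.
        for dep in list(self._deps.values()):
            dep._invalidate()
        self._deps.clear()
        if self._parent is not None:
            self._parent._deps.pop(id(self), None)
            self._parent = None
        # No per-operation validity flag: later use fails in native_call().
        self._handle = Some_LMDB_Resource_That_Was_Deleted_Or_Closed()

    def use(self):
        return native_call(self._handle)


txn = Resource(handle=1)
cursor = Resource(handle=2, parent=txn)
txn._invalidate()  # e.g. the transaction was aborted
try:
    cursor.use()
except TypeError as exc:
    message = str(exc)
print(message)
```

The descriptive class name ends up in the error message, which is the whole point of the trick.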
Argument parsing¶
The CPython module’s parse_args() may look “special”, at best. The alternative, PyArg_ParseTupleAndKeywords, performs continuous heap allocations and string copies, which cost about 10,000 lookups/sec in a particular microbenchmark. That 10k/sec slowdown could become negligible in a sufficiently large application, so this decision needs to be revisited at some stage.
ChangeLog¶
Unreleased¶
Fixes¶
- Fix segmentation fault on Linux/aarch64 caused by closing an
Environment from a different thread than the one holding an active write transaction. LMDB requires all transactions to be closed before mdb_env_close and ties the write lock to the owning OS thread; closing from another thread aborted the write transaction on the wrong thread, which could not release the robust process-shared write mutex, leaving a dangling entry on the owning thread’s robust list when the lock file was unmapped. A later begin() then crashed inside pthread_mutex_lock (glibc tolerates the dangling entry on x86_64 but faults on aarch64). Environment.close() now waits for the owning thread to release an active write transaction before tearing down the mapping. (#465)
2026-03-29 2.2.0¶
New features¶
- New lmdb.aio module providing asyncio support with AsyncEnvironment and AsyncTransaction. (#434)
- New Environment.dbs() lists all named databases in an environment. (#253, #438)
- Cursor.getmulti() accepts values=False for batch key-existence checks without fetching values. (#435)
- Transaction.stat() now defaults to the main database when no db argument is given. (#366, #439)
- Environment.set_mapsize() now safely invalidates open transactions and cursors before remapping, preventing use-after-free crashes. (#443)
- Enable sparse files on Windows so NTFS does not preallocate the full map_size on disk. (#444)
Bug fixes¶
- Fix type stubs: return types now use Union[bytes, memoryview] to reflect the buffers= parameter, and dbs() is included. (#448, #449)
- Declare python_requires='>=3.9' so pip skips this package on Python 2. (#447)
Documentation¶
Modernize documentation for Python 3 and current project state. (#436)
Add API reference for lmdb.aio module. (#440)
Document fork safety and multiprocessing guidance. (#441)
Document duplicate-sort (dupsort) databases. (#442)
Document cursor iteration and navigation patterns. (#445)
Add “When to use py-lmdb” comparison section. (#446)
Render ChangeLog as native HTML instead of a code block. (#451)
Other¶
- Add LMDB data layout validation tests and patched/pure equivalence
checks. (#433)
2026-03-19 2.1.1¶
- Fix false MDB_CORRUPTED error when overwriting values larger than the page
size (overflow/bigdata values) with txn.put(key, value, overwrite=True). Two hardening checks from 2.1.0 did not account for F_BIGDATA nodes where NODEDSZ() returns the logical data size, not the on-page size. (#431)
2026-03-19 2.1.0¶
Security release. All users who open LMDB databases from untrusted or potentially-tampered sources should upgrade immediately. Applications that only open databases they created themselves are not at risk, but upgrading is still recommended for defense-in-depth.
Security fixes¶
The bundled LMDB 0.9.35 trusts many on-disk fields without validation. A crafted data.mdb file can exploit this to crash the process, read arbitrary memory, or corrupt the heap. These are all upstream LMDB bugs; py-lmdb ships patches to address them.
- CVE-2019-16224: heap buffer overflow via MDB_DUPFIXED without
MDB_DUPSORT in on-disk md_flags. (#429)
- CVE-2019-16225: SIGSEGV from P_DIRTY flag set on mmap’d disk pages,
causing mdb_page_touch() to skip copy-on-write. (#429)
- CVE-2019-16226: out-of-bounds memmove in mdb_node_del via corrupt
mn_hi making NODEDSZ() huge. (#429)
- CVE-2019-16227: NULL pointer dereference of mc_xcursor when
F_DUPDATA is set on a node in a non-DUPSORT database. (#429)
- CVE-2019-16228: divide-by-zero from zero mm_psize in meta page
header. (#429)
- 13 additional hardening patches from variant analysis of the same code
(#430), including:
Validate mp_lower/mp_upper bounds on every page fetch — prevents NUMKEYS() unsigned wrap and SIZELEFT() underflow.
Bounds-check NODEDSZ() in mdb_node_read, mdb_cursor_put, and mdb_page_split — prevents OOB reads and heap overflows from corrupt node sizes.
Validate DUPSORT sub-page headers before copying — prevents memcpy size underflow (unsigned wrap to huge value).
Check NODEDSZ >= sizeof(MDB_db) before memcpy in mdb_xcursor_init1 — prevents read past node boundary.
Validate LEAF2 mp_pad (key size) — zero or huge values cause OOB via LEAF2KEY().
Guard mc_xcursor NULL in MDB_GET_CURRENT and _mdb_cursor_del — two call sites missed by the CVE-2019-16227 fix.
Guard nsize underflow in mdb_node_shrink.
Validate overflow page extent (pgno + mp_pages) stays within the database.
Reject meta page numbers (0, 1) as B-tree roots.
Validate md_depth <= CURSOR_STACK to prevent stack buffer overrun.
Bug fixes¶
- Cross-thread write transactions now block instead of raising
lmdb.Error(“Attempt to operate on closed/deleted/dropped object.”). The check added in 1.8.0 was overly strict: it rejected all concurrent write transactions, but only same-thread re-entrance is actually incorrect. The cpython implementation now releases the GIL during mdb_txn_begin for write transactions, allowing another thread to block until the first commits. (#427, #428)
Other¶
- Refactored setup.py patch application into a single loop with a shared
patch list, replacing duplicated os.system() blocks.
2026-03-17 2.0.0¶
This is the largest py-lmdb release in years, made possible by a new coding partner: Claude (Anthropic). Nearly every change below was co-authored by Claude, turning months of backlogged issues into a week of focused work.
Potentially breaking changes¶
- Thread-safety overhaul (#180). Environment.close(), Transaction.abort(),
Transaction.commit(), and cursor operations are now serialized with locks to prevent use-after-free and double-free crashes when called concurrently. Code that previously “worked” by luck with racy close/abort patterns may now block where it previously crashed or silently corrupted memory. If you relied on the old undefined behavior, review your threading model.
- Duplicate environment path rejection (#230). Opening the same LMDB path
twice in one process now raises lmdb.Error instead of silently proceeding to a likely segfault. This will surface latent bugs in code that accidentally opened the same environment twice.
- Minimum Python version is now 3.9. Python 2.7 and 3.5-3.8 are no longer
supported. Python 2 compatibility shims have been removed from the codebase.
New features¶
- PEP 561 type stubs (py.typed) are now shipped (#257). IDEs and type
checkers will pick up lmdb’s types automatically.
- Cache getpid() result to avoid a syscall on every transaction dealloc.
Since glibc 2.25, getpid() is no longer cached in userspace; this showed up in profiles for workloads with many short-lived transactions. Contributed by @ltfish (#421).
Bug fixes¶
- Fix memory safety issues across cpython and CFFI implementations (#420):
multiple reference count leaks in cpython.c (env_readers_callback, db_flags, env_flags, env_copy, env_new, make_arg_cache, cursor_get_multi), a getmulti validation bug in cffi.py, and a cursor leak on mdb_cursor_open failure.
- Fix unsafe buffer protocol usage in cpython.c that could read freed memory
(#372).
- Fix cmd_copyfd fd validation to use os.fstat() instead of os.fdopen(),
which consumed the fd.
- Add fork detection to CFFI implementation. Previously, Transaction.__del__
and Environment.__del__ would call mdb_txn_abort / mdb_env_close in forked child processes, risking corruption of the parent’s environment.
- Release GIL during mdb_env_close (#418) so the event loop isn’t blocked
while msync/fsync runs on large writemap databases.
Other¶
Comprehensive test suite for lmdb.tool (#148).
Fix ReadTheDocs build (#414, #172).
Fix pyright errors and remove Python 2 compat code.
2026-03-12 1.8.1¶
CI Fix
2026-03-12 1.8.0¶
Update bundled LMDB from 0.9.33 to 0.9.35.
- Fix Windows heap corruption caused by recursive write mutex. Replace
Windows Mutex objects with Semaphores for LMDB’s read and write locks, preventing concurrent write transactions on the same thread. Reported by @RogueZamboni (#394).
- Fix infinite loop with next_nodup/prev_nodup on the sole key in a
dupsort database after delete+put. Reported by @RogerMarsh (#388).
- Fix METH_NOARGS function signatures to include the required second
parameter (CPython 3.14 compatibility). Reported by @dw (#182).
- Fix installing and running on PyPy.
Reported by and fix contributed by @mgorny (#403).
Fix PreloadTest on openSUSE. Contributed by @mgrossu (#400).
CI-only: Fix PyPy wheel upload by adding auditwheel repair.
2025-10-14 1.7.5¶
CI-only: Fix generation of 3.14 binaries.
2025-10-14 1.7.4 – yanked¶
CI-only: Generate Python 3.14 binaries.
2025-07-15 1.7.3¶
Fix CFFI build on some platforms by ensuring paths are absolute.
Correct CI badge URL in README.
2025-07-10 1.7.2¶
CI-only fix
2025-07-09 1.7.1¶
CI-only fix
2025-07-09 1.7.0¶
Rewrite CI to use cibuildwheel.
Update bundled LMDB to 0.9.33, plus a patch to fix ITS#10346.
- Prevent some accidental use of LMDB objects by child processes.
Contributed by Callum Walker.
2025-01-05 1.6.2¶
CI-only fix.
2025-01-05 1.6.1¶
CI-only fix.
2025-01-05 1.6.0¶
Support for Python 3.13. Contributed by Miro Hrončok and Adam Williamson.
CI: Publish 3.13 binaries and Linux aarch64 wheels for multiple versions.
2024-07-01 1.5.1¶
CI-only fix.
2024-06-30 1.5.0¶
Add Python 3.12 binaries.
Update bundled LMDB to 0.9.31.
Remove Python 2.7 support.
2023-04-04 v1.4.1¶
Update CI to build manylinux binaries.
2022-12-06 v1.4.0¶
Add Python 3.11 support.
2021-12-30 v1.3.0¶
Add aarch64 architecture builds. Contributed by odidev.
Add Python 3.10 support.
- Fix crash relating to caching of transactions. The ‘max_spare_txns’
parameter to Environment/open is currently ignored in cpython.
2021-04-19 v1.2.1¶
Resolve CI bug where non-Linux wheels were not being published to PyPI.
2021-04-15 v1.2.0¶
Update bundled LMDB to 0.9.29.
Add non-bundled testing to CI.
- Remove wheel generation for 2.7 because the manylinux images no longer
support it.
- Allow passing None as a value to transaction.del in CFFI implementation
for parity with cpython implementation.
Fix Cursor.put behavior on a dupsort DB with append=True.
- Add warning to docs about use of Environment.set_mapsize. This is currently
an unresolved issue with upstream LMDB.
CFFI implementation: fix a seg fault when open_db returns map full.
CFFI implementation: fix a bug in open_db in a read-only environment.
2021-02-05 v1.1.1¶
- Downgrade underlying LMDB to 0.9.26. 0.9.27 has a minor defect that will
need to get resolved.
2021-02-04 v1.1.0¶
- Migrate CI pipeline from Travis and AppVeyor to Github Actions. Now
includes comprehensive testing across 4 dimensions (OS, Python version, cpython/CFFI, pure/with mods). Also includes publishing to PyPI.
Prevent invalid flag combinations when creating a database.
- Add a Cursor.getmulti method with optional buffer support. Contributed by
Will Thompson <willsthompson@gmail.com>.
Upgrade underlying LMDB to 0.9.27.
2020-08-28 v1.0.0¶
- Start of new semantic versioning scheme. This would be a minor version
bump from the 0.99 release if it were semantically versioned.
- Allow environment copy to take a passed-in transaction. This is the
first released feature that requires a (very small) patch to the underlying C library. By default, the patch will be applied unless this module is built with LMDB_PURE environment variable set.
2020-08-13 v0.99¶
Fix lmdb.tool encoding issues.
Fix -l lmdb invocation issue.
Minor documentation improvements.
Update LMDB to version 0.9.24.
Update for Python 3.9 (current release candidate) support.
Resolve a bug when using cursor.putmulti and append=True on dupsort DBs.
- Allow _Database.flags method to take no arguments since the one argument
wasn’t being used.
2019-11-06 v0.98¶
Fix that a duplicate argument to a lmdb method would cause an assert.
- Solaris needs #include "python.h" as soon as possible. Fix contributed by Jesús Cea.
Fix crash under debug cpython when mdb_cursor_open failed
2019-08-11 v0.97¶
Fix a missed GIL unlock sequence. Reported by ajschorr.
- Fix argv check in JEP (cpython under Java) environment. Contributed by
de-code.
2019-07-14 v0.96¶
First release under new maintainer, Nic Watson.
Doc updates.
More removal of code for now-unsupported Python versions.
- Only preload the value with the GIL unlocked when the value is actually
requested. This significantly improves read performance to retrieve keys with large values when the value isn’t retrieved. Reported by Dan Patton.
2019-06-08 v0.95¶
The minimum supported version of Python is now 2.7.
The library is no longer tested on Python 3.2.
- The address-book.py example was updated for Python 3. Contributed by Jamie
Bliss.
Development-related files were removed from the distribution tarball.
- Handling of the Environment(create=True) flag was improved. Fix contributed
by Nir Soffer.
- Database names may be reused after they are dropped on CFFI, without
reopening the environment. Fix contributed by Gareth Bult.
2018-04-09 v0.94¶
- CPython argument parsing now matches the behaviour of CFFI, and most sane
Python APIs: a bool parameter is considered to be true if it is any truthy value, not just if it is exactly True. Reported by Nic Watson.
Removed Python 2.6 support due to urllib3 warnings and pytest dropping it.
Updated LMDB to version 0.9.22.
Fixed several 2.7/3 bugs in command line tool.
2017-07-16 v0.93¶
- py-lmdb is now built with AppVeyor CI, providing early feedback on Windows
build health. Egg and wheel artifacts are being generated, removing the need for a dedicated Windows build machine; however, there is no mechanism to publish these to PyPI yet.
- The “warm” tool command did not function on Python 3.x. Reported by Github
user dev351.
- Tests now pass on non-4kb page-sized machines, such as ppc64le. Reported by
Jonathan J. Helmus.
- Windows 3.6 eggs and wheels are now available on PyPI, and tests are run
against 3.6. Reported by Ofek Lev.
- Python 3.2 is no longer supported, due to yet more pointless breakage
introduced in pip/pkg_resources.
- py-lmdb currently does not support LMDB >=0.9.19 due to interface changes in
LMDB. Support will appear in a future release.
2016-10-17 v0.92¶
- Changes to support __all__ caused the CPython module to fail to import at
runtime on Python 3. This was hidden during testing as the CFFI module was successfully imported.
2016-10-17 v0.91¶
- The docstring for NotFoundError was clarified to indicate that it is
not raised in normal circumstances.
- CFFI open_db() would always attempt to use a write transaction, even if the
environment was opened with readonly=True. Now both CPython and CFFI will use a read-only transaction in this case. Reported by Github user handloomweaver.
- The source distribution previously did not include a LICENSE file, and may
have included random cached junk from the source tree during build. Reported by Thomas Petazzoni.
Transaction.id() was broken on Python 2.5.
Repair Travis CI build again.
- CFFI Cursor did not correctly return empty strings for key()/value()/item()
when iternext()/iterprev() had reached the start/end of the database. Detected by tests contributed by Ong Teck Wu.
- The package can now be imported from within a CPython subinterpreter. Fix
contributed by Vitaly Repin.
- lmdb.tool --delete would not delete keys in some circumstances. Fix
contributed by Vitaly Repin.
- Calls to Cursor.set_range_dup() could lead to memory corruption due to
Cursor’s idea of the key and value failing to be updated correctly. Reported by Michael Lazarev.
- The lmdb.tool copy command now supports a --compact flag. Contributed by
Achal Dave.
- The lmdb.tool edit command selects the correct database when --delete is
specified. Contributed by ispequalnp.
- lmdb.tool correctly supports the -r flag to select a read-only environment.
Contributed by ispequalnp.
- The lmdb.tool --txn_size parameter was removed, as it was never implemented,
and its original function is no longer necessary with modern LMDB. Reported by Achal Dave.
- The documentation template was updated to fix broken links. Contributed by
Adam Chainz.
- The Travis CI build configuration was heavily refactored by Alexander Zhukov.
Automated tests are running under Travis CI once more.
- The CPython extension module did not define __all__. It is now defined to
contain the same names as on CFFI.
- Both implementations were updated to remove lmdb.open() from __all__,
ensuring
from lmdb import * does not shadow the builtin open(). The function can still be invoked using its fully qualified name, and the alias “Environment” may be used when from lmdb import * is used. Reported by Alexander Zhukov.
- The CPython extension exported BadRSlotError, instead of BadRslotError. The
exception’s name was corrected to match CFFI.
- Environment.open_db() now supports integerdup=True, dupfixed=True, and
integerkey=True flags. Based on a patch by Jonathan Heyman.
2016-07-11 v0.90¶
- This release was deleted from PyPI due to an erroneous pull request
upgrading the bundled LMDB to mdb.master.
2016-02-12 v0.89¶
LMDB 0.9.18 is bundled.
- CPython Iterator.next() was incorrectly defined as pointing at the
implementation for Cursor.next(), triggering a crash if the method was ever invoked manually. Reported by Kimikazu Kato.
2016-01-24 v0.88¶
LMDB 0.9.17 is bundled.
Transaction.id() is exposed.
Binary wheels are built for Python 3.5 Windows 32/64-bit.
2015-08-11 v0.87¶
- Environment.set_mapsize() was added to allow runtime adjustment of the
environment map size.
- Remove non-determinism from setup.py, to support Debian’s reproducible
builds project. Patch by Chris Lamb.
Documentation correctness and typo fixes. Patch by Gustav Larsson.
- examples/keystore: beginnings of example that integrates py-lmdb with an
asynchronous IO loop.
2015-06-07 v0.86¶
- LMDB_FORCE_SYSTEM builds were broken by the GIL/page fault change. This
release fixes the problem.
Various cosmetic fixes to documentation.
2015-06-06 v0.85¶
New exception class: lmdb.BadDbiError.
- Environment.copy() and Environment.copyfd() now support compact=True, to
trigger database compaction while copying.
Various small documentation updates.
- CPython set_range_dup() and set_key_dup() both invoked MDB_GET_BOTH, however
set_range_dup() should have instead invoked MDB_GET_BOTH_RANGE. Fix by Matthew Battifarano.
- lmdb.tool module was broken on Win32, since Win32 lacks signal.SIGWINCH. Fix
suggested by David Khess.
- LMDB 0.9.14 is bundled along with extra fixes from mdb.RE/0.9 (release
engineering) branch.
- CPython previously lacked a Cursor.close() method. Problem was noticed by
Jos Vos.
- Several memory leaks affecting the CFFI implementation when running on
CPython were fixed, apparent only when repeatedly opening and discarding a large number of environments. Noticed by Jos Vos.
- The CPython extension previously did not support weakrefs on Environment
objects, and the implementation for Transaction objects was flawed. The extension now correctly invalidates weakrefs during deallocation.
- Both variants now try to avoid taking page faults with the GIL held,
accomplished by touching one byte of every page in a value during reads. This does not guarantee faults will never occur with the GIL held, but it drastically reduces the possibility. The binding should now be suitable for use in multi-threaded applications with databases containing >2KB values where the entire database does not fit in RAM.
2014-09-22 v0.84¶
LMDB 0.9.14 is bundled.
- CFFI Cursor.putmulti() could crash when append=False and a key already
existed.
2014-06-24 v0.83¶
LMDB 0.9.13 is bundled along with extra fixes from upstream Git.
- Environment.__enter__() and __exit__() are implemented, allowing
Environments to behave like context managers.
- Cursor.close(), __enter__() and __exit__() are implemented, allowing Cursors
to be explicitly closed. In CFFI this mechanism must be used when many cursors are used within a single transaction, otherwise a resource leak will occur.
- Dependency tracking in CFFI is now much faster, especially on PyPy, however
at a cost: Cursor use must always be wrapped in a context manager, or .close() must be manually invoked for discarded Cursors when the parent transaction is long lived.
Fixed crash in CFFI Cursor.putmulti().
2014-05-26 v0.82¶
- Both variants now implement max_spare_txns, reducing the cost of creating a
read-only transaction 4x for an uncontended database and by up to 20x for very read-busy environments. By default only 1 read-only transaction is cached; adjust the max_spare_txns= parameter if your script operates multiple simultaneous read transactions.
Patch from Vladimir Vladimirov implementing MDB_NOLOCK.
- The max_spare_iters and max_spare_cursors parameters were removed, neither
ever had any effect.
- Cursor.putmulti() implemented based on a patch from Luke Kenneth Casson
Leighton. This function moves the loop required to batch populate a database out of Python and into C.
- The bundled LMDB 0.9.11 has been updated with several fixes from upstream
Git.
- The cost of using keyword arguments in the CPython extension was
significantly reduced.
2014-04-26 v0.81¶
- On Python 2.x the extension module would silently interpret Unicode
instances as buffer objects, causing UCS-2/UCS-4 string data to end up in the database. This was never intentional and now raises TypeError. Any Unicode data passed to py-lmdb must explicitly be encoded with .encode() first.
- open_db()’s name argument was renamed to key, and its semantics now match
get() and put(): in other words the key must be a bytestring, and passing Unicode will raise TypeError.
The extension module now builds under Python 3.4 on Windows.
2014-04-21 v0.80¶
- Both variants now build successfully as 32 bit / 64bit binaries on
Windows under Visual Studio 9.0, the compiler for Python 2.7. This enables py-lmdb to be installed via pip on Windows without requiring a compiler to be available. In future, .egg/.whl releases will be pre-built for all recent Python versions on Windows.
Known bugs: Environment.copy() and Environment.copyfd() currently produce a database that cannot be reopened.
- The lmdb.enable_drop_gil() function was removed. Its purpose was
experimental at best, confusing at worst.
2014-03-17 v0.79¶
CPython Cursor.delete() lacked dupdata argument, fixed.
- Fixed minor bug where CFFI _get_cursor() did not record that its idea of
the current key and value was up to date.
- Cursor.replace() and Cursor.pop() updated for MDB_DUPSORT databases. For
pop(), the first data item is popped and returned. For replace(), the first data item is returned, and all duplicates for the key are replaced.
- Implement remaining Cursor methods necessary for working with MDB_DUPSORT
databases: next_dup(), next_nodup(), prev_dup(), prev_nodup(), first_dup(), last_dup(), set_key_dup(), set_range_dup(), iternext_dup(), iternext_nodup(), iterprev_dup(), iterprev_nodup().
- The default for Transaction.put(dupdata=…) and Cursor.put(dupdata=…) has
changed from False to True. The previous default did not reflect LMDB’s normal mode of operation.
LMDB 0.9.11 is bundled along with extra fixes from upstream Git.
2014-01-18 v0.78¶
Patch from bra-fsn to fix LMDB_LIBDIR.
Various improvements to inaccurate documentation.
Initial work towards Windows/Microsoft Visual C++ 9.0 build.
LMDB 0.9.11 is now bundled.
To work around install failures minimum CFFI version is now >=0.8.0.
- ticket #38: remove all buffer object hacks. This results in ~50% slowdown
for cursor enumeration, but results in far simpler object lifetimes. A future version may introduce a better mechanism for achieving the same performance without loss of sanity.
2013-11-30 v0.77¶
Added Environment.max_key_size(), Environment.max_readers().
- CFFI now raises the correct Error subclass associated with an MDB_* return
code.
Numerous CFFI vs. CPython behavioural inconsistencies have been fixed.
An endless variety of Unicode related 2.x/3.x/CPython/CFFI fixes were made.
LMDB 0.9.10 is now bundled, along with some extra fixes from Git.
Added Environment(meminit=…) option.
2013-10-28 v0.76¶
Added support for Environment(…, readahead=False).
LMDB 0.9.9 is now bundled.
- Many Python 2.5 and 3.x fixes were made. Future changes are automatically
tested via Travis CI (https://travis-ci.org/dw/py-lmdb).
- When multiple cursors existed and one cursor performed a mutation, the
remaining cursors could return corrupt results via key(), value(), or item(). Mutations are now explicitly tracked and cause the cursor’s data to be refreshed in this case.
- setup.py was adjusted to ensure the distutils default of ‘-DNDEBUG’ is never
defined while building LMDB. Previously, this define caused many important checks in the engine to be disabled.
- The old ‘transactionless’ API was removed. A future version may support the
same API, but the implementation will be different.
- Transaction.pop() and Cursor.pop() helpers added, to complement
Transaction.replace() and Cursor.replace().
License¶
The OpenLDAP Public License
Version 2.8, 17 August 2003
Redistribution and use of this software and associated documentation
("Software"), with or without modification, are permitted provided
that the following conditions are met:
1. Redistributions in source form must retain copyright statements
and notices,
2. Redistributions in binary form must reproduce applicable copyright
statements and notices, this list of conditions, and the following
disclaimer in the documentation and/or other materials provided
with the distribution, and
3. Redistributions must contain a verbatim copy of this document.
The OpenLDAP Foundation may revise this license from time to time.
Each revision is distinguished by a version number. You may use
this Software under terms of this license revision or under the
terms of any subsequent revision of the license.
THIS SOFTWARE IS PROVIDED BY THE OPENLDAP FOUNDATION AND ITS
CONTRIBUTORS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL THE OPENLDAP FOUNDATION, ITS CONTRIBUTORS, OR THE AUTHOR(S)
OR OWNER(S) OF THE SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
The names of the authors and copyright holders must not be used in
advertising or otherwise to promote the sale, use or other dealing
in this Software without specific, written prior permission. Title
to copyright in this Software shall at all times remain with copyright
holders.
OpenLDAP is a registered trademark of the OpenLDAP Foundation.
Copyright 1999-2003 The OpenLDAP Foundation, Redwood City,
California, USA. All Rights Reserved. Permission to copy and
distribute verbatim copies of this document is granted.