Description
Hey, we have a pretty simple LMDB setup: no env flags, all single-threaded, no nested transactions. It runs on many hosts that receive the same change streams and therefore make identical LMDB transactions. Each host makes at most ~400 write transactions per minute, and across the entire fleet there are roughly 100k transactions per minute. That is just to give an idea of how rare this is: I've seen it happen about once per month at this volume. Transaction size is variable and can include anywhere from 1 to 1000 operations. Given that every host makes the same transactions, and that we see nothing out of the ordinary in the logs, it looks like a race condition or other intermittent bug in lmdbjava, our implementation, or the OS. We've also confirmed that this doesn't match any existing bug in the C library.
The malformed records we see contain garbage data replacing the bytes from left to right:
Example normal record (key, value): (b'\x1akey-prefix-129632d9951a047d9\x1akey-prefix-8302eb7207a2c859d', b'\x01\x00\x00\x00\x00\x00\x00\x00\x00')
Example malformed record (key, value): (b'\x80\xe1.\xad\xc6\x7f\x00\x00ix-60cf8260ecfb1741f\x1akey-prefix-38357a98a4\xf0\x00\x00\x00\x00\x00', b'\xe0\xdd/\xad\xc6\x7f\x00\x00\x00')
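The \x7f\x00\x00 at the end of the first 8 bytes of both the key and the value seems to indicate that these are little-endian virtual memory addresses, i.e. pointers written where record data should be.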
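To make the address hypothesis concrete, here is a minimal sketch that reads the first 8 bytes of the malformed key and value as little-endian 64-bit integers. The helper name `leading_word` is made up for illustration, and the interpretation assumes 64-bit little-endian hosts, which the report does not state.

```python
# Malformed record copied verbatim from the report above.
bad_key = b'\x80\xe1.\xad\xc6\x7f\x00\x00ix-60cf8260ecfb1741f\x1akey-prefix-38357a98a4\xf0\x00\x00\x00\x00\x00'
bad_value = b'\xe0\xdd/\xad\xc6\x7f\x00\x00\x00'


def leading_word(buf: bytes) -> int:
    """Interpret the first 8 bytes as a little-endian unsigned 64-bit integer.

    Hypothetical helper, not part of lmdbjava or LMDB.
    """
    return int.from_bytes(buf[:8], "little")


key_word = leading_word(bad_key)
value_word = leading_word(bad_value)

print(hex(key_word))          # 0x7fc6ad2ee180
print(hex(value_word))        # 0x7fc6ad2fdde0
print(value_word - key_word)  # 64608
```

Both words fall in the 0x7f... range and sit about 63 KiB apart, which is consistent with two pointers into the same mapped region, though it does not prove that or say which component wrote them.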
I've also run the Verifier in our environment for 1 hour over 61855873 rows with no errors. We haven't been able to reproduce the issue.
Any pointers on where else to look?