Summary
When using `lmdbjava` in a multithreaded environment (~32 threads), I experience sporadic native crashes (SIGSEGV) across various JVMs and OSes during my CI matrix builds. The crashes consistently point to the native method `mdb_txn_renew0`, invoked during `env.txnRead()` or `dbi.get()`.
Environment
- Java versions: Various (e.g., Corretto 21.0.6.7.1, Temurin 21.0.6+7)
- OS: Linux, Windows, macOS (GitHub Actions runners)
- LMDBJava version: Latest via Maven (0.9.1)
- Concurrency: ~32 threads executing `txnRead()` and `get()` in tight loops
- Frequency: Millions of calls across all threads
- Crash frequency: Sporadic (impossible to reproduce locally, but more frequent in CI)
Crash Snippet example
```
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007f62e442b95e, pid=2053, tid=2149
# Problematic frame:
# C [lmdbjava-native-library-8897859339362759578.so+0x595e] mdb_txn_renew0+0x13e
...
Stack: [0x00007f62adddf000,0x00007f62adedf000], sp=0x00007f62adedd290, free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [lmdbjava-native-library-8897859339362759578.so+0x595e] mdb_txn_renew0+0x13e
C [lmdbjava-native-library-8897859339362759578.so+0x5eec] mdb_txn_begin+0x26c
C 0x00007f62e44257ec
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 2884 org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007f62cc409b02 [0x00007f62cc409aa0+0x0000000000000062]
J 3902 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007f62cc516f70 [0x00007f62cc5159c0+0x00000000000015b0]
J 3954% c1 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007f62c4f17b4c [0x00007f62c4f172a0+0x00000000000008ac]
```
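Since the CI matrix produces many crash logs, it may help to confirm mechanically that each one names the same native frame. The following is only a sketch using the JDK's regex classes; the class and method names (`ProblematicFrame`, `symbolWithOffset`) are hypothetical helpers and not part of lmdbjava or the JVM tooling.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "symbol+0xoffset" from a native frame line
// of an hs_err log, e.g. "C [libfoo.so+0x595e] some_function+0x13e".
final class ProblematicFrame {
    private static final Pattern NATIVE_FRAME = Pattern.compile(
            "C\\s+\\[(\\S+?)\\+0x([0-9a-fA-F]+)\\]\\s+(\\w+)\\+0x([0-9a-fA-F]+)");

    // Returns the symbol and its offset, or empty if the line carries no symbol
    // (for example a raw address frame such as "C 0x00007f62e44257ec").
    static Optional<String> symbolWithOffset(String logLine) {
        Matcher m = NATIVE_FRAME.matcher(logLine);
        return m.find()
                ? Optional.of(m.group(3) + "+0x" + m.group(4))
                : Optional.empty();
    }
}
```

Applied to the "Problematic frame" line of the snippet above, this yields `mdb_txn_renew0+0x13e`, which can then be compared across runs.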
Usage Pattern
I use the following method in high-throughput read scenarios:
```java
@Override
public boolean containsAddress(ByteBuffer hash160) {
    try (Txn<ByteBuffer> txn = env.txnRead()) {
        ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
        return byteBuffer != null;
    }
}
```
- `hash160` is not shared across threads (it’s either thread-local, duplicated, or allocated per call).
- The `Dbi<ByteBuffer>` (`lmdb_h160ToAmount`) and `Env<ByteBuffer>` are shared and initialized once.
I’ve tried:
- `hash160.duplicate().rewind()`
- Allocating a fresh `ByteBuffer.allocateDirect()` and copying the contents
None of this resolves the crash.
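For reference, the two buffer-isolation variants listed above can be sketched with plain `java.nio` calls. The class and method names (`KeyBuffers`, `rewoundView`, `directCopy`) are hypothetical and only illustrate the idea; they are not taken from the project.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers illustrating the two key-buffer variants.
final class KeyBuffers {

    // Variant 1: an independent view (own position/limit) over the SAME bytes,
    // rewound to position 0. The underlying content is still shared.
    static ByteBuffer rewoundView(ByteBuffer src) {
        ByteBuffer view = src.duplicate();
        view.rewind();
        return view;
    }

    // Variant 2: a fresh direct buffer holding a COPY of the remaining bytes.
    // Later changes to src do not affect the copy.
    static ByteBuffer directCopy(ByteBuffer src) {
        ByteBuffer copy = ByteBuffer.allocateDirect(src.remaining());
        copy.put(src.duplicate()); // duplicate so src's own position is untouched
        copy.flip();               // make the copied bytes readable from position 0
        return copy;
    }
}
```

The difference matters for ruling out shared-buffer effects: the view still aliases the original memory, while the direct copy does not.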
What I'm Doing
In my code, I'm calling `env.txnRead()` to create a read-only transaction and then immediately passing it to `lmdb_h160ToAmount.get(txn, hash160)` to check for the existence of a key. This is part of a high-throughput loop running in about 32 threads.
I'm currently trying to determine whether the native crash originates from the `env.txnRead()` call itself or from the subsequent `get(txn, hash160)` operation. Both calls are used extensively under heavy concurrent load, and isolating the failing component has been difficult due to the sporadic nature of the issue.
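The load pattern described here (about 32 threads, each calling the lookup in a tight loop with its own key buffer) can be sketched as a small harness. This is only an illustration under assumptions: `ReadStressHarness` is a hypothetical name, the lookup is passed in as a `Predicate<ByteBuffer>` stand-in rather than a real LMDB call, and the 20-byte key layout is made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical stress harness: N threads call a lookup in a tight loop.
final class ReadStressHarness {

    // Returns how many lookups reported "found".
    static long run(Predicate<ByteBuffer> lookup, int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong hits = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int threadId = t;
                futures.add(pool.submit(() -> {
                    // One direct 20-byte key buffer per thread, never shared.
                    ByteBuffer key = ByteBuffer.allocateDirect(20);
                    for (int i = 0; i < callsPerThread; i++) {
                        key.clear();
                        key.putInt(threadId).putInt(i); // made-up key content
                        key.rewind();                   // position 0, limit 20
                        if (lookup.test(key)) {
                            hits.incrementAndGet();
                        }
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surface any exception thrown inside a worker
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return hits.get();
    }
}
```

In a real reproducer the predicate would be replaced by the `containsAddress` method under test, for example `ReadStressHarness.run(consumer::containsAddress, 32, 1_000_000)`, where `consumer` is whatever object exposes that method.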
Try 1 - Fails
I tried synchronizing only the `get(txn, hash160)` call (while keeping `env.txnRead()` outside the synchronized block). Even with this partial synchronization, I experienced a native crash during CI testing.
```java
@Override
public boolean containsAddress(ByteBuffer hash160) {
    try (Txn<ByteBuffer> txn = env.txnRead()) {
        synchronized (this) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }
}
```
```
Error: Crashed tests:
Error: net.ladenthin.bitcoinaddressfinder.LMDBPersistencePerformanceTest
Error: org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error: Command was cmd.exe /X /C "C:\hostedtoolcache\windows\Java_Microsoft_jdk\21.0.2\x64\bin\java -javaagent:C:\\Users\\runneradmin\\.m2\\repository\\org\\jacoco\\org.jacoco.agent\\0.8.13\\org.jacoco.agent-0.8.13-runtime.jar=destfile=D:\\a\\BitcoinAddressFinder\\BitcoinAddressFinder\\target\\jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED -jar C:\Users\runneradmin\AppData\Local\Temp\surefire10771212932124905688\surefirebooter-20250409125554748_110.jar C:\Users\runneradmin\AppData\Local\Temp\surefire10771212932124905688 2025-04-09T12-55-54_486-jvmRun1 surefire-20250409125554748_108tmp surefire_14-20250409125554748_109tmp"
Error: Error occurred in starting fork, check output in log
Error: Process Exit Code: -1073741819
Error: Crashed tests:
Error: net.ladenthin.bitcoinaddressfinder.LMDBPersistencePerformanceTest
```
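The Windows exit code in this log is easier to read in hexadecimal: -1073741819 as an unsigned 32-bit value is 0xC0000005, the NTSTATUS code for an access violation, i.e. the Windows counterpart of the SIGSEGV seen on Linux. A minimal sketch of the conversion (the class and method names are hypothetical):

```java
// Hypothetical helper: renders a signed process exit code as the
// 32-bit hexadecimal form used for Windows NTSTATUS values.
final class ExitCodes {
    static String toNtStatusHex(int exitCode) {
        // %X on an int prints the two's-complement bit pattern as unsigned hex.
        return String.format("0x%08X", exitCode);
    }
}
```

Calling `ExitCodes.toNtStatusHex(-1073741819)` returns `0xC0000005`, so the Windows failure is consistent with the native crashes on the other platforms.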
Try 2 - Fails
This code is crashing, which shows me more or less that `env.txnRead()` has a problem with concurrent execution.
```java
@Override
public boolean containsAddress(ByteBuffer hash160) {
    try (Txn<ByteBuffer> txn = env.txnRead()) {
        synchronized (this) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }
}
```
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fb36801295e, pid=2040, tid=2168
#
# JRE version: OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-21.0.6.0.6+7-GA (21.0.6) (build 21.0.6.0.6)
# Java VM: OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-21.0.6.0.6+7-GA (21.0.6.0.6, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [lmdbjava-native-library-6392604367366445385.so+0x595e] mdb_txn_renew0+0x13e
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" (or dumping to /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/core.2040)
#
# If you would like to submit a bug report, please visit:
# mailto:dragonwell_use@googlegroups.com
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- S U M M A R Y ------------
Command Line: -javaagent:/home/runner/.m2/repository/org/jacoco/org.jacoco.agent/0.8.13/org.jacoco.agent-0.8.13-runtime.jar=destfile=/home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire/surefirebooter-20250409134947304_28.jar /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire 2025-04-09T13-49-47_148-jvmRun1 surefire-20250409134947304_26tmp surefire_3-20250409134947304_27tmp
Host: AMD EPYC 7763 64-Core Processor, 4 cores, 15G, Ubuntu 24.04.2 LTS
Time: Wed Apr 9 13:51:19 2025 UTC elapsed time: 65.874255 seconds (0d 0h 1m 5s)
--------------- T H R E A D ---------------
Current thread (0x00007fb27820c2d0): JavaThread "pool-3-thread-31" [_thread_in_native, id=2168, stack(0x00007fb2ed5e7000,0x00007fb2ed6e7000) (1024K)]
Stack: [0x00007fb2ed5e7000,0x00007fb2ed6e7000], sp=0x00007fb2ed6e5250, free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [lmdbjava-native-library-6392604367366445385.so+0x595e] mdb_txn_renew0+0x13e
C [lmdbjava-native-library-6392604367366445385.so+0x5eec] mdb_txn_begin+0x26c
C 0x00007fb36ca137ec
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 2752 org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007fb3543c0942 [0x00007fb3543c08e0+0x0000000000000062]
J 3664 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007fb35446f560 [0x00007fb35446d260+0x0000000000002300]
J 3891% c1 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007fb34cd57f84 [0x00007fb34cd572a0+0x0000000000000ce4]
```
Try 3 - Fails
```java
@Override
public boolean containsAddress(ByteBuffer hash160) {
    Txn<ByteBuffer> txn;
    synchronized (env) {
        txn = env.txnRead();
    }
    try (txn) {
        ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
        return byteBuffer != null;
    }
}
```
This version also crashes:
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ff3940e695e, pid=2026, tid=2134
#
# JRE version: OpenJDK Runtime Environment Zulu21.40+17-CRaC-CA (21.0.6+7) (build 21.0.6+7-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu21.40+17-CRaC-CA (21.0.6+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [lmdbjava-native-library-1362637522283092738.so+0x595e] mdb_txn_renew0+0x13e
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" (or dumping to /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/core.2026)
#
# If you would like to submit a bug report, please visit:
# http://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- S U M M A R Y ------------
Command Line: -javaagent:/home/runner/.m2/repository/org/jacoco/org.jacoco.agent/0.8.13/org.jacoco.agent-0.8.13-runtime.jar=destfile=/home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire/surefirebooter-20250409144845698_28.jar /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire 2025-04-09T14-48-45_583-jvmRun1 surefire-20250409144845698_26tmp surefire_3-20250409144845698_27tmp
Host: AMD EPYC 7763 64-Core Processor, 4 cores, 15G, Ubuntu 24.04.2 LTS
Time: Wed Apr 9 14:50:17 2025 UTC elapsed time: 65.782065 seconds (0d 0h 1m 5s)
--------------- T H R E A D ---------------
Current thread (0x00007ff2a0244340): JavaThread "pool-3-thread-12" [_thread_in_native, id=2134, stack(0x00007ff362ef0000,0x00007ff362ff0000) (1024K)]
Stack: [0x00007ff362ef0000,0x00007ff362ff0000], sp=0x00007ff362fee3a0, free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [lmdbjava-native-library-1362637522283092738.so+0x595e] mdb_txn_renew0+0x13e
C [lmdbjava-native-library-1362637522283092738.so+0x5eec] mdb_txn_begin+0x26c
C 0x00007ff3982137ec
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 2855 org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007ff380422d82 [0x00007ff380422d20+0x0000000000000062]
J 3842 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007ff38053d2b8 [0x00007ff38053c060+0x0000000000001258]
J 4064% c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007ff38057c3f0 [0x00007ff38057c1a0+0x0000000000000250]
j net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeysRunner()V+97
j net.ladenthin.bitcoinaddressfinder.ConsumerJava.lambda$startConsumer$1()Ljava/lang/Void;+8
j net.ladenthin.bitcoinaddressfinder.ConsumerJava$$Lambda+0x00007ff3181a7318.call()Ljava/lang/Object;+4
```
Usually works, but crashes in about 1 out of 100 runs
A crash has been observed with this version:
```java
@Override
public synchronized boolean containsAddress(ByteBuffer hash160) {
    try (Txn<ByteBuffer> txn = env.txnRead()) {
        ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
        return byteBuffer != null;
    }
}
```
surefire-reports-ubuntu-latest-corretto-java21_synchronized_method.zip
And also with this version, which synchronizes both `env.txnRead()` and `get()`:
```java
@Override
public boolean containsAddress(ByteBuffer hash160) {
    Txn<ByteBuffer> txn;
    synchronized (env) {
        txn = env.txnRead();
        try (txn) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }
}
```
surefire-reports-ubuntu-latest-microsoft-java21_synchronized_env.zip
This strongly suggests the issue is not purely a concurrency problem.
It may indicate a deeper issue in the implementation of one or more of the following:
- `env.txnRead()`
- `Dbi.get(txn, key)`
- Shared JNI-backed structures within `lmdbjava`
Questions
Before I try building a minimal reproducer, I’d appreciate clarification on:
- ❓ Is using a shared `Dbi<ByteBuffer>` across threads supported?
- ❓ Is `env.txnRead()` thread-safe?
- ❓ Are there known issues when mixing many parallel `Txn` instances with a shared `Dbi`?
- ❓ Is my opening for read-only access correct and thread-safe? https://github.com/bernardladenthin/BitcoinAddressFinder/blob/main/src/main/java/net/ladenthin/bitcoinaddressfinder/persistence/lmdb/LMDBPersistence.java#L114 It is documented here: https://github.com/lmdbjava/lmdbjava/wiki/Concurrency
Notes
- I've ruled out key buffer corruption, shared buffer issues, or GC-related problems.
- I’ve seen this crash in different JVMs and platforms (Linux, macOS, Windows) via GitHub Actions.
- I can provide logs, dump files, or a stripped-down reproducer if needed.
- Related issue in my project with logs attached: BitcoinAddressFinder Issue #50