Potential race condition or concurrency issue with env.txnRead() or dbi.get() under heavy multithreaded access #253

Open
@bernardladenthin

Summary

When using lmdbjava in a multithreaded environment (~32 threads), I experience sporadic native crashes (SIGSEGV) across various JVMs and OSes during my CI matrix builds. The crashes consistently point to the native function mdb_txn_renew0, invoked during env.txnRead() or dbi.get().

Environment

  • Java versions: Various (e.g., Corretto 21.0.6.7.1, Temurin 21.0.6+7)
  • OS: Linux, Windows, macOS (GitHub Actions runners)
  • LMDBJava version: Latest via Maven (0.9.1)
  • Concurrency: ~32 threads executing txnRead() and get() in tight loops
  • Frequency: Millions of calls across all threads
  • Crash frequency: Sporadic (impossible to reproduce locally, but more frequent in CI)
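
The load shape listed above (about 32 threads calling a lookup in a tight loop) could be sketched as a small standalone harness, which might serve as a starting point for a minimal reproducer. This is only a sketch: the `ReadStress` class and the `Lookup` interface are hypothetical names that are not part of lmdbjava or my project, and the harness itself does not touch LMDB at all.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class ReadStress {
    /** Hypothetical stand-in for the containsAddress(ByteBuffer) call under test. */
    interface Lookup {
        boolean contains(ByteBuffer key);
    }

    /**
     * Calls lookup from `threads` workers, `callsPerThread` times each.
     * Returns the total number of calls, and only after every worker has finished.
     */
    static long run(Lookup lookup, int threads, int callsPerThread)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong calls = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    // Each worker owns its key buffer, so no buffer is shared across threads.
                    ByteBuffer key = ByteBuffer.allocateDirect(20);
                    for (int i = 0; i < callsPerThread; i++) {
                        key.clear();      // position 0, limit 20
                        key.putInt(i);    // vary the first four bytes of the key
                        key.rewind();     // back to position 0 before the lookup
                        lookup.contains(key);
                        calls.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate worker failures and wait for completion
            }
        } finally {
            pool.shutdown();
        }
        return calls.get();
    }
}
```

With a real environment, the lookup would be something like `ReadStress.run(consumer::containsAddress, 32, 1_000_000)`, where `consumer` is a hypothetical instance of the class under test. Because `run` returns only after all workers have finished, the Env can then be closed without any lookup still in flight.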

Crash Snippet example

    # A fatal error has been detected by the Java Runtime Environment:
    #  SIGSEGV (0xb) at pc=0x00007f62e442b95e, pid=2053, tid=2149
    # Problematic frame:
    # C  [lmdbjava-native-library-8897859339362759578.so+0x595e]  mdb_txn_renew0+0x13e
    ...
    Stack: [0x00007f62adddf000,0x00007f62adedf000],  sp=0x00007f62adedd290,  free space=1016k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C  [lmdbjava-native-library-8897859339362759578.so+0x595e]  mdb_txn_renew0+0x13e
    C  [lmdbjava-native-library-8897859339362759578.so+0x5eec]  mdb_txn_begin+0x26c
    C  0x00007f62e44257ec
    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    J 2884  org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007f62cc409b02 [0x00007f62cc409aa0+0x0000000000000062]
    J 3902 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007f62cc516f70 [0x00007f62cc5159c0+0x00000000000015b0]
    J 3954% c1 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007f62c4f17b4c [0x00007f62c4f172a0+0x00000000000008ac]

Usage Pattern

I use the following method in high-throughput read scenarios:

    @Override
    public boolean containsAddress(ByteBuffer hash160) {
        // Read-only transaction, closed automatically by try-with-resources.
        try (Txn<ByteBuffer> txn = env.txnRead()) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }

  • hash160 is not shared across threads (it’s either thread-local, duplicated, or allocated per call).
  • The Dbi<ByteBuffer> (lmdb_h160ToAmount) and Env<ByteBuffer> are shared and initialized once.

I’ve tried:

  • hash160.duplicate().rewind()
  • Allocating a fresh buffer with ByteBuffer.allocateDirect() and copying the contents

None of this resolves the crash.
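
For reference, the second attempt (a fresh direct buffer with copied contents) can be sketched roughly as follows. The `KeyBuffers` class and `copyToDirect` method are hypothetical names used only for illustration; the sketch uses nothing beyond the standard `java.nio.ByteBuffer` API.

```java
import java.nio.ByteBuffer;

final class KeyBuffers {
    /**
     * Copies the remaining bytes of src into a new direct buffer.
     * The position and limit of src are left untouched.
     */
    static ByteBuffer copyToDirect(ByteBuffer src) {
        ByteBuffer copy = ByteBuffer.allocateDirect(src.remaining());
        // duplicate() shares the content but has its own position and limit,
        // so reading from it does not disturb the caller's buffer.
        copy.put(src.duplicate());
        copy.flip(); // position 0, limit = number of bytes copied
        return copy;
    }
}
```

The copy is fully independent of the source buffer's memory, so a crash that persists with this variant is unlikely to come from another thread mutating the key.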


What I'm Doing

In my code, I'm calling env.txnRead() to create a read-only transaction and then immediately passing it to lmdb_h160ToAmount.get(txn, hash160) to check for the existence of a key. This is part of a high-throughput loop running in about 32 threads.

I'm currently trying to determine whether the native crash originates from the env.txnRead() call itself or from the subsequent get(txn, hash160) operation. Both calls are used extensively under heavy concurrent load, and isolating the failing component has been difficult due to the sporadic nature of the issue.

Try 1 - Fails

I tried synchronizing only the get(txn, hash160) call (while keeping env.txnRead() outside the synchronized block). Even with this partial synchronization, I experienced a native crash during CI testing.

    @Override
    public boolean containsAddress(ByteBuffer hash160) {
        try (Txn<ByteBuffer> txn = env.txnRead()) {
            // Only the lookup is serialized; txnRead() still runs concurrently.
            synchronized (this) {
                ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
                return byteBuffer != null;
            }
        }
    }

    Error:  Crashed tests:
    Error:  net.ladenthin.bitcoinaddressfinder.LMDBPersistencePerformanceTest
    Error:  org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
    Error:  Command was cmd.exe /X /C "C:\hostedtoolcache\windows\Java_Microsoft_jdk\21.0.2\x64\bin\java -javaagent:C:\\Users\\runneradmin\\.m2\\repository\\org\\jacoco\\org.jacoco.agent\\0.8.13\\org.jacoco.agent-0.8.13-runtime.jar=destfile=D:\\a\\BitcoinAddressFinder\\BitcoinAddressFinder\\target\\jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED -jar C:\Users\runneradmin\AppData\Local\Temp\surefire10771212932124905688\surefirebooter-20250409125554748_110.jar C:\Users\runneradmin\AppData\Local\Temp\surefire10771212932124905688 2025-04-09T12-55-54_486-jvmRun1 surefire-20250409125554748_108tmp surefire_14-20250409125554748_109tmp"
    Error:  Error occurred in starting fork, check output in log
    Error:  Process Exit Code: -1073741819
    Error:  Crashed tests:
    Error:  net.ladenthin.bitcoinaddressfinder.LMDBPersistencePerformanceTest
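
The Windows exit code in the log above is printed as a signed 32-bit integer. Reinterpreted as an unsigned value it is 0xC0000005, the Windows access-violation status, which is the Windows counterpart of the SIGSEGV seen on Linux. A small sketch of the conversion (the `ExitCodes` class is a hypothetical helper, not part of any library):

```java
final class ExitCodes {
    /** Formats a signed 32-bit process exit code as an unsigned hexadecimal status. */
    static String toHex(int exitCode) {
        // %X treats a negative int as its unsigned 32-bit value.
        return String.format("0x%08X", exitCode);
    }
}
```

For example, `ExitCodes.toHex(-1073741819)` yields `0xC0000005`.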

Try 2 - Fails

This code also crashes, which more or less shows that env.txnRead() has a problem with concurrent execution.

    @Override
    public boolean containsAddress(ByteBuffer hash160) {
        try (Txn<ByteBuffer> txn = env.txnRead()) {
            synchronized (this) {
                ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
                return byteBuffer != null;
            }
        }
    }

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fb36801295e, pid=2040, tid=2168
    #
    # JRE version: OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-21.0.6.0.6+7-GA (21.0.6) (build 21.0.6.0.6)
    # Java VM: OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-21.0.6.0.6+7-GA (21.0.6.0.6, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # C  [lmdbjava-native-library-6392604367366445385.so+0x595e]  mdb_txn_renew0+0x13e
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" (or dumping to /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/core.2040)
    #
    # If you would like to submit a bug report, please visit:
    #   mailto:dragonwell_use@googlegroups.com
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    ---------------  S U M M A R Y ------------
    Command Line: -javaagent:/home/runner/.m2/repository/org/jacoco/org.jacoco.agent/0.8.13/org.jacoco.agent-0.8.13-runtime.jar=destfile=/home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire/surefirebooter-20250409134947304_28.jar /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire 2025-04-09T13-49-47_148-jvmRun1 surefire-20250409134947304_26tmp surefire_3-20250409134947304_27tmp
    Host: AMD EPYC 7763 64-Core Processor, 4 cores, 15G, Ubuntu 24.04.2 LTS
    Time: Wed Apr  9 13:51:19 2025 UTC elapsed time: 65.874255 seconds (0d 0h 1m 5s)
    ---------------  T H R E A D  ---------------
    Current thread (0x00007fb27820c2d0):  JavaThread "pool-3-thread-31"        [_thread_in_native, id=2168, stack(0x00007fb2ed5e7000,0x00007fb2ed6e7000) (1024K)]
    Stack: [0x00007fb2ed5e7000,0x00007fb2ed6e7000],  sp=0x00007fb2ed6e5250,  free space=1016k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C  [lmdbjava-native-library-6392604367366445385.so+0x595e]  mdb_txn_renew0+0x13e
    C  [lmdbjava-native-library-6392604367366445385.so+0x5eec]  mdb_txn_begin+0x26c
    C  0x00007fb36ca137ec
    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    J 2752  org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007fb3543c0942 [0x00007fb3543c08e0+0x0000000000000062]
    J 3664 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007fb35446f560 [0x00007fb35446d260+0x0000000000002300]
    J 3891% c1 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007fb34cd57f84 [0x00007fb34cd572a0+0x0000000000000ce4]

Try 3 - Fails

    @Override
    public boolean containsAddress(ByteBuffer hash160) {
        Txn<ByteBuffer> txn;
        // Only transaction creation is serialized; the lookup runs concurrently.
        synchronized (env) {
            txn = env.txnRead();
        }
        try (txn) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }

This also crashes:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007ff3940e695e, pid=2026, tid=2134
    #
    # JRE version: OpenJDK Runtime Environment Zulu21.40+17-CRaC-CA (21.0.6+7) (build 21.0.6+7-LTS)
    # Java VM: OpenJDK 64-Bit Server VM Zulu21.40+17-CRaC-CA (21.0.6+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # C  [lmdbjava-native-library-1362637522283092738.so+0x595e]  mdb_txn_renew0+0x13e
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" (or dumping to /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/core.2026)
    #
    # If you would like to submit a bug report, please visit:
    #   http://www.azul.com/support/
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    ---------------  S U M M A R Y ------------
    Command Line: -javaagent:/home/runner/.m2/repository/org/jacoco/org.jacoco.agent/0.8.13/org.jacoco.agent-0.8.13-runtime.jar=destfile=/home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/jacoco.exec -Xmx2g -Xms1g --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire/surefirebooter-20250409144845698_28.jar /home/runner/work/BitcoinAddressFinder/BitcoinAddressFinder/target/surefire 2025-04-09T14-48-45_583-jvmRun1 surefire-20250409144845698_26tmp surefire_3-20250409144845698_27tmp
    Host: AMD EPYC 7763 64-Core Processor, 4 cores, 15G, Ubuntu 24.04.2 LTS
    Time: Wed Apr  9 14:50:17 2025 UTC elapsed time: 65.782065 seconds (0d 0h 1m 5s)
    ---------------  T H R E A D  ---------------
    Current thread (0x00007ff2a0244340):  JavaThread "pool-3-thread-12"        [_thread_in_native, id=2134, stack(0x00007ff362ef0000,0x00007ff362ff0000) (1024K)]
    Stack: [0x00007ff362ef0000,0x00007ff362ff0000],  sp=0x00007ff362fee3a0,  free space=1016k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C  [lmdbjava-native-library-1362637522283092738.so+0x595e]  mdb_txn_renew0+0x13e
    C  [lmdbjava-native-library-1362637522283092738.so+0x5eec]  mdb_txn_begin+0x26c
    C  0x00007ff3982137ec
    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    J 2855  org.lmdbjava.Library$Lmdb$jnr$ffi$0.mdb_txn_begin$jni$29(JJIJ)I (0 bytes) @ 0x00007ff380422d82 [0x00007ff380422d20+0x0000000000000062]
    J 3842 c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.containsAddress(Ljava/nio/ByteBuffer;)Z (213 bytes) @ 0x00007ff38053d2b8 [0x00007ff38053c060+0x0000000000001258]
    J 4064% c2 net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeys(Ljava/nio/ByteBuffer;)V (1373 bytes) @ 0x00007ff38057c3f0 [0x00007ff38057c1a0+0x0000000000000250]
    j  net.ladenthin.bitcoinaddressfinder.ConsumerJava.consumeKeysRunner()V+97
    j  net.ladenthin.bitcoinaddressfinder.ConsumerJava.lambda$startConsumer$1()Ljava/lang/Void;+8
    j  net.ladenthin.bitcoinaddressfinder.ConsumerJava$$Lambda+0x00007ff3181a7318.call()Ljava/lang/Object;+4

Usually works, but crashes in about 1 out of 100 runs

A crash has been observed with this version:

    @Override
    public synchronized boolean containsAddress(ByteBuffer hash160) {
        // The whole method is serialized on this instance.
        try (Txn<ByteBuffer> txn = env.txnRead()) {
            ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
            return byteBuffer != null;
        }
    }

surefire-reports-ubuntu-latest-corretto-java21_synchronized_method.zip

And with this version also:

Synchronize both env.txnRead() and get()

    @Override
    public boolean containsAddress(ByteBuffer hash160) {
        Txn<ByteBuffer> txn;
        // Transaction creation, lookup, and close are all serialized on env.
        synchronized (env) {
            txn = env.txnRead();
            try (txn) {
                ByteBuffer byteBuffer = lmdb_h160ToAmount.get(txn, hash160);
                return byteBuffer != null;
            }
        }
    }

surefire-reports-ubuntu-latest-microsoft-java21_synchronized_env.zip

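Because crashes still occur when both calls are serialized under a single lock, this strongly suggests the issue is not purely a concurrency problem between callers of this method.
It may indicate a deeper issue in the implementation of one or more of the following: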

  • env.txnRead()
  • Dbi.get(txn, key)
  • Shared JNI-backed structures within lmdbjava

Questions

Before I try building a minimal reproducer, I’d appreciate clarification on:


Notes

  • I've ruled out key buffer corruption, shared buffer issues, or GC-related problems.
  • I’ve seen this crash in different JVMs and platforms (Linux, macOS, Windows) via GitHub Actions.
  • I can provide logs, dump files, or a stripped-down reproducer if needed.
  • Related issue in my project with logs attached:
    BitcoinAddressFinder Issue #50
