OOM on Large C Project & Incomplete File Tracing #20181

Unanswered
Roarcannotprogramming asked this question in Q&A

I am using CodeQL to analyze a very large, private C project and have encountered a couple of issues. I would be grateful for any help or guidance you can provide.

Problem 1: Out-of-Memory (OOM) During Database Creation

When I attempt to create a CodeQL database from the project's root directory, the process fails with an Out-of-Memory (OOM) error. This error occurs before the actual compilation phase begins.

Observations:

  • The project is extremely large.
  • The machine has over 40GB of free RAM at the time of the OOM error.
  • As a workaround, I can successfully create a database without an OOM error if I set --source-root to a smaller, specific subdirectory that I am interested in.

Questions:

  1. What pre-build operations does the CodeQL CLI perform that could consume so much memory, even when significant physical RAM is available?
  2. What is the recommended best practice or workflow for creating a database for a very large C project to avoid these OOM issues?

Problem 2: Incomplete Tracing for Most Compiled Files

I've noticed that most C files are not being fully analyzed, even though they are part of the compilation.

Observations:

  • A specific file, let's call it example.c, is definitely being compiled.
  • I have inspected the build-tracer.log and can confirm that the CodeQL tracer has captured its compilation process. The log contains the complete invocation, Command, and Processed command line entries for example.c.
  • This issue affects the majority of the files in the project. For these files, it seems that none of their internal contents (functions, expressions, variables, etc.) have been extracted.
  • When I query the database, the only result related to example.c is its file-level location (example.c:0:0:0:0). A small number of other files are partially traced: I can identify some expressions (expr) in them, but the analysis is still far from complete.

For example, running the following query on the database highlights this issue:

    import cpp

    from Locatable locb, Location loc
    where locb.getLocation() = loc
    select locb, loc

The query shows that for files like example.c, only the file itself is located, with no deeper elements available.

Question:

  • Why might the contents of most files not be extracted into the database, even though their build commands were successfully captured by the build tracer?

Replies: 3 comments 2 replies


Update:

After a closer look at build-tracer.log, I have a new finding that might be the root cause of Problem 2.

It appears that CodeQL is not tracing the llvm-ar commands that are used to archive .o object files (which are LLVM IR bitcode in my case) into static libraries (.a files).

Here's what I found in the log:

  1. The log shows the final linking step where the static library is used, but it also explicitly states that the archive file is being excluded:

    Command: /path/to/codeql/cpp/tools/linux64/extractor -mimic /path/to/clang-15 -o example.elf ... -Wl,--whole-archive /path/to/example.a ... excluded /path/to/build/example.a because it is an object
  2. I have searched the entire build-tracer.log and cannot find any trace of the llvm-ar command itself being executed.

  3. Furthermore, there are no log entries that contain both the archive name (example.a) and the object file name (example.o) together, which I would expect to see during an archiving step.
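The manual checks in points 1 to 3 can be scripted so they are repeatable on a very large log. This is a minimal sketch, assuming a bash shell with GNU grep; the function name check_archiving is hypothetical, and example.a / example.o are the placeholder names used in the post. It only searches the text of the log and says nothing about how the tracer decides what to exclude.

```shell
#!/usr/bin/env bash
# Sketch: scripted version of the manual checks on build-tracer.log.
# check_archiving is a hypothetical helper; pass the log path plus the
# archive and object names to look for (e.g. example.a and example.o).
set -euo pipefail

check_archiving() {
  local log="$1" archive="$2" object="$3"
  # Point 2: how many lines mention llvm-ar at all (0 means it never appears).
  echo "llvm-ar mentions: $(grep -c 'llvm-ar' "$log" || true)"
  # Point 1: archives the tracer reported as excluded during a link step.
  grep -o 'excluded [^ ]* because it is an object' "$log" | sort -u || true
  # Point 3: lines that mention both the archive and the object file.
  echo "lines with both names: $(grep -F -- "$archive" "$log" | grep -c -F -- "$object" || true)"
}
```

For example, check_archiving build-tracer.log example.a example.o prints the llvm-ar count, every distinct "excluded ... because it is an object" message, and the number of lines naming both files.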

This leads me to believe that if the creation of static libraries isn't traced, the extractor might not know which object files are contained within them, and therefore fails to analyze the corresponding source files. Could this be the reason why the contents of example.c and other files are missing from the database?

Is this expected behavior, or is there a specific configuration required to ensure that llvm-ar (or the archiver in general) is traced correctly?


Hello, let me ask a couple of questions to better clarify your scenario:

  • Which command line did you use to create the database?
  • Do you have any more detail on the type of the OOM error?

To follow up on @esteffin's questions:
Could you clarify whether this is a Java OutOfMemoryError exception being raised, or the system triggering an OOM situation? If the former, you can also try to increase the RAM available to the evaluator and JVM using the codeql resolve ram option.

Before starting the build, the CLI scans all source code to count lines of code. This scan can be suppressed with the --no-calculate-baseline argument to the CodeQL CLI.
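A minimal sketch of a database-creation command that applies this suggestion. It assumes a bash shell with codeql on PATH; the helper name create_db, the paths, the build command and the --ram value of 32000 MB are hypothetical placeholders. Apart from --no-calculate-baseline (named in the reply), it uses the standard codeql database create options --language, --source-root, --command and --ram.

```shell
#!/usr/bin/env bash
# Sketch: create a CodeQL database for a large C project without the
# pre-build lines-of-code baseline scan.
# Hypothetical placeholders: create_db, the paths, the build command, the RAM budget.
set -euo pipefail

create_db() {
  local db_dir="$1"     # output directory for the database
  local src_root="$2"   # project root, or a narrower subdirectory
  local build_cmd="$3"  # command that compiles the project
  codeql database create "$db_dir" \
    --language=cpp \
    --source-root="$src_root" \
    --command="$build_cmd" \
    --no-calculate-baseline \
    --ram=32000         # memory budget in MB for the import step (placeholder)
}
```

For example: create_db /tmp/project-db /path/to/project "make -j8". Keeping the source root at the project root while skipping the baseline scan avoids the subdirectory workaround from Problem 1.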

If you have purchased GitHub Advanced Security, I recommend that you also reach out to support.

@Roarcannotprogramming

Thanks for the suggestion! Using --no-calculate-baseline did resolve the OOM error.

However, Problem 2 is still a major blocker. I've analyzed the build-tracer.log again and found that it's filled with a large number of errors. For example:

[E 12:35:09 867372] Warning[extractor-c++]: In construct_text_message: "/path/to/xxx.c", line 56: error: expected a ";"

and

[E 12:35:09 867373] Warning[extractor-c++]: In construct_text_message: "/path/to/xxx.c", line 56: error: identifier "__u64" is undefined

There are many other variations as well, such as "a declaration here must declare a parameter" and "incomplete type xxx is not allowed".

These error messages seem to correspond directly to the locb results from the query:

    import cpp

    from Locatable locb, Location loc
    where locb.getLocation() = loc
    select locb, loc

For instance, one of the results is: expected a ';', file:///path/to/xxx.c:56:1:56:1.

The log file containing these extractor warnings is about 1.7 GB in size. This leads me to a question: is it possible that the CodeQL C++ extractor is failing to parse the syntax correctly? This is strange, because my project compiles successfully using the specified build command.

Any insights on why the extractor would report so many syntax errors on a codebase that compiles cleanly would be greatly appreciated.
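With a log this large, counting the diagnostics by message text is one way to see which errors dominate; the most frequent messages are probably the most useful starting point. This is a minimal sketch, assuming a bash shell with awk, sort and sed, and assuming every diagnostic line has the "Warning[extractor-c++]: ... error: <message>" shape quoted above. The function name top_extractor_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rank the extractor diagnostics in a (possibly multi-GB) build-tracer.log.
# Assumes lines look like the ones quoted in the thread:
#   ... Warning[extractor-c++]: ... line 56: error: <message>
set -euo pipefail

top_extractor_errors() {
  local log="$1" n="${2:-20}"
  # awk streams the file, so memory use grows with the number of distinct
  # messages rather than with the size of the log.
  LC_ALL=C awk '
    index($0, "Warning[extractor-c++]") {
      msg = $0
      # Keep only the text after the last ": error: " on the line.
      if (sub(/^.*: error: /, "", msg)) count[msg]++
    }
    END { for (m in count) printf "%d\t%s\n", count[m], m }
  ' "$log" | LC_ALL=C sort -rn | sed -n "1,${n}p"
}
```

For example, top_extractor_errors build-tracer.log 10 prints the ten most frequent messages, each preceded by its count. File names and line numbers are deliberately dropped, so a message such as identifier "__u64" is undefined is counted once per occurrence regardless of which file raised it.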

@andersfugmann

Happy to hear that the first part got resolved.
For the second part, could you share the command used to invoke CodeQL? You also mention the use of Clang-15: is that the only compiler used in the project for compiling C/C++ code? To better understand what is going on, it would also be very helpful if you could create and share a test case that reproduces the problem.
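One way to check which compilers the tracer actually intercepted is to tally the path that follows -mimic in the Command: lines, as in the log excerpt quoted earlier in the thread. This is a minimal sketch, assuming a bash shell and assuming every traced compilation appears in that "extractor -mimic <compiler>" form; that format is inferred from a single quoted log line, not from documentation. The function name list_traced_compilers is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the compilers the tracer intercepted, based on the
# "extractor -mimic <compiler>" form seen in build-tracer.log Command: lines.
set -euo pipefail

list_traced_compilers() {
  local log="$1"
  # The pipeline exits non-zero (via grep) if no -mimic entries are found.
  LC_ALL=C grep -o -- '-mimic [^ ]*' "$log" \
    | cut -d' ' -f2 \
    | LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -rn
}
```

Running list_traced_compilers build-tracer.log prints one line per compiler path with the number of traced invocations, which helps answer whether Clang-15 is the only compiler involved.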

Category: Q&A
3 participants: @Roarcannotprogramming, @andersfugmann, @esteffin