Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Support ignoring generated files#3318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
mbg wants to merge4 commits intomain
base:main
Choose a base branch
Loading
frommbg/ignore-generated
Open

Conversation

@mbg
Copy link
Member

@mbgmbg commentedNov 19, 2025

This PR adds experimental support for excluding files that are marked aslinguist-generated=true in a.gitattributes file from analysis.

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

  • Managed - Impacts users withdynamic workflows (Default Setup, CCR, ...).

Products:

  • CCR - The changes impact analyses for Copilot Code Reviews.

Environments:

  • Dotcom - Impacts CodeQL workflows ongithub.com.

How did/will you validate this change?

  • Test repository - This change will be tested on a test repository before merging.
  • Unit tests - I am depending on unit test coverage (i.e. tests in.test.ts files).
  • End-to-end tests - I am depending on PR checks (i.e. tests inpr-checks).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

  • Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.

How will you know if something goes wrong after this change is released?

  • Telemetry - I rely on existing telemetry or have made changes to the telemetry.
    • Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
    • Alerts - New or existing monitors will trip if something goes wrong with this change.

Are there any special considerations for merging or releasing this change?

  • No special considerations - This change can be merged at any time.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding achangelog entry for this change.
  • Confirm thereadme and docs have been updated if necessary.

@github-actionsgithub-actionsbot added the size/MShould be of average difficulty to review labelNov 19, 2025
@mbgmbg marked this pull request as ready for reviewNovember 19, 2025 19:44
@mbgmbg requested a review froma team as acode ownerNovember 19, 2025 19:44
CopilotAI review requested due to automatic review settingsNovember 19, 2025 19:44
Copy link
Contributor

CopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Pull Request Overview

This PR adds experimental support for excluding generated files (marked withlinguist-generated=true in.gitattributes) from CodeQL analysis. The feature is controlled by theignore_generated_files feature flag and is automatically enabled for Copilot Code Reviews (CCR).

Key Changes

  • New git utility functions (listFiles,getGeneratedFiles) to identify generated files via git attributes
  • NewisCCR() helper function to detect CCR execution context
  • Integration into config initialization to add generated files topaths-ignore

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.

Show a summary per file
FileDescription
src/git-utils.tsAdds functions to list tracked files and identify files marked as generated via linguist-generated attribute
src/git-utils.test.tsAdds unit tests for the new git utility functions
src/feature-flags.tsDefines the IgnoreGeneratedFiles feature flag with environment variable CODEQL_ACTION_IGNORE_GENERATED_FILES
src/config-utils.tsIntegrates generated file detection into config initialization, adding them to paths-ignore when feature is enabled or in CCR
src/actions-util.tsAdds isCCR() detection function and modifies isDefaultSetup() to exclude CCR scenarios
src/actions-util.test.tsAdds tests for the new workflow detection functions
lib/*.jsAuto-generated JavaScript code mirroring TypeScript changes (not reviewed per guidelines)

Comment on lines +426 to +430
constfiles=awaitlistFiles(workingDirectory);
conststdout=awaitrunGitCommand(
workingDirectory,
["check-attr","linguist-generated","--", ...files],
"Unable to check attributes of files.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Passing all files as command-line arguments via the spread operator could hit command-line length limits in repositories with very large numbers of files (e.g., tens of thousands). Consider usinggit check-attr --stdin instead, which reads file paths from stdin and avoids this limitation. This would involve piping the file list to git rather than passing it as arguments.

Copilot uses AI. Check for mistakes.
Comment on lines +397 to +407
test("listFiles returns array of file paths",async(t)=>{
sinon
.stub(gitUtils,"runGitCommand")
.resolves(["dir/file.txt","README.txt"].join(os.EOL));

awaitt.notThrowsAsync(async()=>{
constresult=awaitgitUtils.listFiles("/some/path");
t.is(result.length,2);
t.is(result[0],"dir/file.txt");
});
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The stub created in this test is not restored after the test completes. This could cause test pollution if other tests rely onrunGitCommand behavior. Consider wrapping the test in a try-finally block or usingt.teardown() to restore the stub, similar to how it's done in other tests in this file (e.g., lines 393-394).

Copilot uses AI. Check for mistakes.
["ls-files"],
"Unable to list tracked files.",
);
returnstdout.split(os.EOL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Splitting onos.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). This empty string will then be passed togit check-attr ingetGeneratedFiles. Consider filtering out empty strings:return stdout.split(os.EOL).filter((line) => line.length > 0);

Suggested change
returnstdout.split(os.EOL);
returnstdout.split(os.EOL).filter((line)=>line.length>0);

Copilot uses AI. Check for mistakes.

constgeneratedFiles:string[]=[];
constregex=/^([^:]+):linguist-generated:true$/;
for(constresultofstdout.split(os.EOL)){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Splitting onos.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). The regex won't match empty strings, but it's cleaner to filter them out explicitly:for (const result of stdout.split(os.EOL).filter((line) => line.length > 0)). This pattern is used elsewhere in the codebase (e.g., line 284 checksif (line) before processing).

Suggested change
for(constresultofstdout.split(os.EOL)){
for(constresultofstdout.split(os.EOL).filter((line)=>line.length>0)){

Copilot uses AI. Check for mistakes.
Comment on lines +423 to +443
exportasyncfunctiongetGeneratedFiles(
workingDirectory:string,
):Promise<string[]>{
constfiles=awaitlistFiles(workingDirectory);
conststdout=awaitrunGitCommand(
workingDirectory,
["check-attr","linguist-generated","--", ...files],
"Unable to check attributes of files.",
);

constgeneratedFiles:string[]=[];
constregex=/^([^:]+):linguist-generated:true$/;
for(constresultofstdout.split(os.EOL)){
constmatch=result.match(regex);
if(match){
generatedFiles.push(match[1]);
}
}

returngeneratedFiles;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Missing JSDoc documentation. Consider adding a comment explaining what this function does, its parameters, and return value. For example:/**\n * Returns a list of files marked as generated via the linguist-generated attribute in .gitattributes.\n * @param workingDirectory The directory to check for generated files.\n * @returns An array of file paths (relative to the working directory) that are marked as generated.\n */

Copilot uses AI. Check for mistakes.
Comment on lines +425 to +432
awaitt.notThrowsAsync(async()=>{
constresult=awaitgitUtils.getGeneratedFiles("/some/path");

t.assert(runGitCommandStub.calledTwice);

t.is(result.length,1);
t.is(result[0],"test.json");
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The stubrunGitCommandStub is not restored after the test completes. Add a try-finally block or uset.teardown() to ensure the stub is restored, similar to how it's done in other tests in this file (e.g., lines 393-394).

Suggested change
awaitt.notThrowsAsync(async()=>{
constresult=awaitgitUtils.getGeneratedFiles("/some/path");
t.assert(runGitCommandStub.calledTwice);
t.is(result.length,1);
t.is(result[0],"test.json");
});
try{
awaitt.notThrowsAsync(async()=>{
constresult=awaitgitUtils.getGeneratedFiles("/some/path");
t.assert(runGitCommandStub.calledTwice);
t.is(result.length,1);
t.is(result[0],"test.json");
});
}finally{
runGitCommandStub.restore();
}

Copilot uses AI. Check for mistakes.
Comment on lines +262 to +268
return(
(isDynamicWorkflow()&&
process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith(
"dynamic/copilot-pull-request-reviewer",
))||
false
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The logic can be simplified. The|| false is needed to convertundefined from optional chaining tofalse, but the outer parentheses are unnecessary. Consider:return isDynamicWorkflow() && (process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer") ?? false); This is clearer and uses the nullish coalescing operator which is more explicit about handlingundefined.

Suggested change
return(
(isDynamicWorkflow()&&
process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith(
"dynamic/copilot-pull-request-reviewer",
))||
false
);
returnisDynamicWorkflow()&&
(process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer")??false);

Copilot uses AI. Check for mistakes.
Comment on lines +254 to +257
process.env.GITHUB_EVENT_NAME="dynamic";
t.assert(isDynamicWorkflow());
process.env.GITHUB_EVENT_NAME="push";
t.false(isDynamicWorkflow());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This test directly modifiesprocess.env without proper cleanup, which could cause test pollution. Consider usingwithMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).

Suggested change
process.env.GITHUB_EVENT_NAME="dynamic";
t.assert(isDynamicWorkflow());
process.env.GITHUB_EVENT_NAME="push";
t.false(isDynamicWorkflow());
withMockedEnv(
{GITHUB_EVENT_NAME:"dynamic"},
()=>{
t.assert(isDynamicWorkflow());
},
);
withMockedEnv(
{GITHUB_EVENT_NAME:"push"},
()=>{
t.false(isDynamicWorkflow());
},
);

Copilot uses AI. Check for mistakes.
Comment on lines +254 to +272
process.env.GITHUB_EVENT_NAME="dynamic";
t.assert(isDynamicWorkflow());
process.env.GITHUB_EVENT_NAME="push";
t.false(isDynamicWorkflow());
});

test("isCCR() returns true when expected",(t)=>{
process.env.GITHUB_EVENT_NAME="dynamic";
process.env.CODEQL_ACTION_ANALYSIS_KEY=
"dynamic/copilot-pull-request-reviewer";
t.assert(isCCR());
t.false(isDefaultSetup());
});

test("isDefaultSetup() returns true when expected",(t)=>{
process.env.GITHUB_EVENT_NAME="dynamic";
process.env.CODEQL_ACTION_ANALYSIS_KEY="dynamic/github-code-scanning";
t.assert(isDefaultSetup());
t.false(isCCR());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This test directly modifiesprocess.env without proper cleanup, which could cause test pollution. Consider usingwithMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).

Suggested change
process.env.GITHUB_EVENT_NAME="dynamic";
t.assert(isDynamicWorkflow());
process.env.GITHUB_EVENT_NAME="push";
t.false(isDynamicWorkflow());
});
test("isCCR() returns true when expected",(t)=>{
process.env.GITHUB_EVENT_NAME="dynamic";
process.env.CODEQL_ACTION_ANALYSIS_KEY=
"dynamic/copilot-pull-request-reviewer";
t.assert(isCCR());
t.false(isDefaultSetup());
});
test("isDefaultSetup() returns true when expected",(t)=>{
process.env.GITHUB_EVENT_NAME="dynamic";
process.env.CODEQL_ACTION_ANALYSIS_KEY="dynamic/github-code-scanning";
t.assert(isDefaultSetup());
t.false(isCCR());
withMockedEnv(
{GITHUB_EVENT_NAME:"dynamic"},
()=>{
t.assert(isDynamicWorkflow());
},
);
withMockedEnv(
{GITHUB_EVENT_NAME:"push"},
()=>{
t.false(isDynamicWorkflow());
},
);
});
test("isCCR() returns true when expected",(t)=>{
withMockedEnv(
{
GITHUB_EVENT_NAME:"dynamic",
CODEQL_ACTION_ANALYSIS_KEY:"dynamic/copilot-pull-request-reviewer",
},
()=>{
t.assert(isCCR());
t.false(isDefaultSetup());
},
);
});
test("isDefaultSetup() returns true when expected",(t)=>{
withMockedEnv(
{
GITHUB_EVENT_NAME:"dynamic",
CODEQL_ACTION_ANALYSIS_KEY:"dynamic/github-code-scanning",
},
()=>{
t.assert(isDefaultSetup());
t.false(isCCR());
},
);

Copilot uses AI. Check for mistakes.
}

/** Determines whether we are running in CCR. */
exportfunctionisCCR():boolean{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What about introducing an environment variable we set in CCR, rather than relying on the analysis key?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As discussed elsewhere, that is sort of what we are doing here already (and we decide on whether are in CCR in other places in the same way currently).

A better solution would be to add a new analysis kind (or similar) for CCR, but that would be more work and currently the same ascode-quality in essentially everything but name.

I'd suggest we should stick with this for the moment (since we also use the same approach elsewhere) and look at improving it longer-term.

exportasyncfunctiongetGeneratedFiles(
workingDirectory:string,
):Promise<string[]>{
constfiles=awaitlistFiles(workingDirectory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This could potentially be a very large number of files, too many to pass on the command line.

If we're mainly interested in CCR, could we filter down to just the diff here?

Alternatively, we could parse globs from the.gitattributes file rather than finding all files that match. That would be more likely to contain one entry for a large directory rather than potentially hundreds.

Or we could add a limit on the number of files on which we'll runcheck-attr.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This could potentially be a very large number of files, too many to pass on the command line.

True. The files could be passed in viastdin as well I think, which might be a more robust solution.

If we're mainly interested in CCR, could we filter down to just the diff here?

Potentially. We should first look at whether there is actually enough of a performance hit for large repos for this to make sense though.

Alternatively, we could parse globs from the .gitattributes file rather than finding all files that match. That would be more likely to contain one entry for a large directory rather than potentially hundreds.

I considered this, but I'd like to avoid it if we can get the information fromgit itself so we don't have to try and duplicate what it does.

Comment on lines +423 to +425
exportasyncfunctiongetGeneratedFiles(
workingDirectory:string,
):Promise<string[]>{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Have you measured how long this operation takes overall on a large mono-repo?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I believe it should be relatively quick, since Git caches this information IIRC. That said, I haven't explicitly tested it on a large repo, but I can do that.

// If we are in CCR or the corresponding FF is enabled, try to determine
// which files in the repository are marked as generated and add them to
// the `paths-ignore` configuration.
if((awaitfeatures.getValue(Feature.IgnoreGeneratedFiles))||isCCR()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd advise rolling this out behind a feature flag first, even in CCR.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This was actually an&&, but I changed it for testing purposes 😅

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@henrymercerhenrymercerhenrymercer left review comments

Copilot code reviewCopilotCopilot left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

size/MShould be of average difficulty to review

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@mbg@henrymercer

[8]ページ先頭

©2009-2025 Movatter.jp