Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[WIP] Chromium OS fork's Portage Ebuild changes#2704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
Kangie wants to merge5 commits intokoalaman:master
base:master
Choose a base branch
Loading
fromKangie:portage

Conversation

@Kangie
Copy link

Hi all,

I've been working on bringing theChromium OS fork's Portage Ebuild changes to base shellcheck. It seems to be working pretty well and I was wondering if there's any interest in merging portage support.

I have a few questions:

  1. What changes would need to be made to this patch to get it into shape to merge and maintain?
  2. Are there any open issues that would block merging this (possiblyfiner grained control over shell dialect features #1341)?

It's been a while since I worked in Haskell, but I'm willing to give it a try if there's interest. I would like to be able to use this to validate the Gentoo ebuild repository!

I will need to redo the rebase by cherry picking individual commits to do proper attribution. I also plan to redo the chromium-overlay-sourced python scripts so that they can be consistently licensed.

Please review this work-in-progress PR and let me know your thoughts.

Thanks for your time!

thesamesam, ionenwks, and hololeap reacted with eyes emoji
@koalaman
Copy link
Owner

Sorry, it's been a busy year. Can you give me some background for this? Why does Gentoo's build system need build time variables in ShellCheck?

@Kangie
Copy link
Author

No worries!

The short version is that Gentoo's package manager, Portage, usesebuilds which are very bash-like.

Google's ChromeOS (and its open-source flavour ChromiumOS) use Portage to maintain... some parts of the system; I've never had to dig too far into it. At some point a ChromeOS or ChromiumOS developer added functionality to their fork of shellcheck so that it could be used to lint ebuilds without an excessive amount of false positives. The fork is outdated and for fun (?) I decided to update the patches to run on a more modern version of shellcheck.

I'd like to use it as part of my Gentoo ebuild QA, and several developers and members of the community seem interested as well. The ChromiumOS code seems to work well enough so I thought I'd see about getting those changes merged rather than maintaininganother set of patches.

I guess my questions at this point are:

  1. Would you be interested in merging the ebuild functionality?
  2. What changes would you like to see so that this code is maintainable and sane?
  3. Are there any open issues that would prevent this from proceeding?

Please note that I don't intend to suggest that this PR be merged in its current state; the PR isjust a monolithic patch submitted for feedback with all of the ChromiumOS fork's commits rebased, modified, and squashed.

Cheers!

@hololeap
Copy link

The main problem that the patch seems to be solving is that shellcheck catches false positives on well-defined constants in.ebuild files:

$shellcheck persistent-2.14.5.0.ebuildIn persistent-2.14.5.0.ebuild line 1:#Copyright 1999-2023 Gentoo Authors^-- SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.In persistent-2.14.5.0.ebuild line 4:EAPI=8^--^ SC2034 (warning): EAPI appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 9:CABAL_FEATURES="lib profile haddock hoogle hscolour test-suite"^------------^ SC2034 (warning): CABAL_FEATURES appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 12:DESCRIPTION="Type-safe, multi-backend data serialization"^---------^ SC2034 (warning): DESCRIPTION appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 13:HOMEPAGE="https://www.yesodweb.com/book/persistent"^------^ SC2034 (warning): HOMEPAGE appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 15:LICENSE="MIT"^-----^ SC2034 (warning): LICENSE appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 16:SLOT="0/${PV}"^--^ SC2034 (warning): SLOT appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 17:KEYWORDS="~amd64 ~x86"^------^ SC2034 (warning): KEYWORDS appears unused. Verify use (or export if used externally).In persistent-2.14.5.0.ebuild line 43:DEPEND="${RDEPEND}^----^ SC2034 (warning): DEPEND appears unused. Verify use (or export if used externally).For more information:  https://www.shellcheck.net/wiki/SC2148 -- Tips depend on target shell and y...  https://www.shellcheck.net/wiki/SC2034 -- CABAL_FEATURES appears unused. Ve...

@koalaman
Copy link
Owner

Is the set of variables known in advanced? How far would one get by adding.ebuild as a synonym for.bash and a hard coded list of known variables?

@Kangie
Copy link
Author

Kangie commentedApr 30, 2023
edited
Loading

Hi@koalaman,

While ebuild/eclass variables do change over time, it is possible to extract them automatically (which is howsrc/ShellCheck/PortageAutoInternalVariables.hs is constructed).

As best I can what you've described is basicallywhat we're doing with this patch; set ebuilds to be an alias for bash script, hardcode.ebuild variables (which do not change often), add some special case handling for 'is this eclass inherited' so that none of the ENVVARS there interfere with standardbash parsing, and generate a list of eclass variables.

@koalaman
Copy link
Owner

My understanding was that it essentially adds a de facto build time dependency on Python and out-of-tree ChromeOS build files, and is then tailored to work on ChromeOS ebuilds. Is that not the case?

@Kangie
Copy link
Author

Kangie commentedApr 30, 2023
edited
Loading

adds a de facto build time dependency on Python and out-of-tree ChromeOS build files

I don't intend for the python scripts to remain in their current state - They're ripped directly from the ChromeOS/ChromiumOS ebuild repository and have a different licence from the rest of your code. It'll be easy enough to rework those down the track, if required, and probably even port them to Haskell. They were included in the WIP for completeness, and also because if you're not interested in merging a revised version of this PR then there's no point in doing that porting work!

It may also make more sense for these scripts to live in the Gentoo ebuild repository (associated with the shellcheck package) so that package maintainers / Gentoo users can use them to assist with providing updates upstream without the scripts being checked into VCS here.

Itdoes add a dependency on the Gentoo ebuild repository (which is just a git repo that contains text files). This could be handled in a few ways:

  • as part of the (manual) packaging process for a release the automated variable list could be updated by scripts
  • Gentoo users/packagers can assist with maintenance of the eclass and ebuild-specific variables via PRs (etc), handle some of the 'eclass variable' stuff downstream between releases, and accept that if we miss getting updates to you by the time you decide to release an update then that's what gets packaged.
  • Via CI/CD automation, with obvious benefits and disadvantages to trusting automation.

is then tailored to work on ChromeOS ebuilds

The intent here is to build against the Gentoo ebuild repository. Almost everything downstream of Gentoo will be able to take advantage of the changes as-is - they tend to mirror Gentoo eclasses at the very least. Some projects (Likely ChromeOS) may wish to regeneratesrc/ShellCheck/PortageAutoInternalVariables.hs if they use eclasses that have diverged from Gentoo, however as Portage is a source-based package management tool generating and applying these patches will be trivial; At the very least downstream distributions won't have to maintain the framework to enable this as a separate fork, only update the eclass variable list to suit their distribution.

I'm interested in hearing your thoughts. Thanks for your time this morning!

@koalaman
Copy link
Owner

I'm not opposed to merging a polished version of the functionality, the source and upkeep of the variable data is my only real concern. Could/should these be collected at runtime instead?

I imagine a significant motivator for the current approach was to simplify the patch and its rebasing -- which it definitely does -- rather than any benefits of collecting the data at compile time.

@Kangie
Copy link
Author

Sorry, got hit by a migraine then swamped with work!

I'll see if I can come up with a performant approach that will gather the vast majority of the variables from eclasses at runtime - basically everything that goes intosrc/ShellCheck/PortageAutoInternalVariables.hs. It'll probably be a few weeks to a month or so before I have a change to push new code; I'll ping you here when I think it's looking good.

@koalaman
Copy link
Owner

Sounds good! Don't worry too much about threading it through into the right places, I can help out with that. The main thing is a function inIO that takes an ebuild directory and parses out the necessary information.

thesamesam and Kangie reacted with heart emoji

@Kangie
Copy link
Author

Quick update, it's not forgotten I just moved halfway across a continent and hurt both of my arms.

@hololeap has made some good progress on getting us a runtime ebuild repository parser that (if I'm remembering our discussions properly) should be a lot more flexible when working with non-gentoo repositories.

I'll try and get back into this "soon", recovery permitting :)

@hololeap
Copy link

@koalaman I created a function that scans the system and creates anIO (Map String [String]), whereString is the name of the eclass and[String] is the list of variables that are inherited from it. TheMap should be able to plug in here (from the ChromiumOS code):

portageAutoInternalVariables

The problem is that I don't see anywhere to run in theIO monad. The closest entry point is all the way in the mainshellcheck.hs:

result<- checkScript sys checkspec

Ideally, the loaded file needs to be determined to be an ebuildbefore attempting to scan for eclasses, which could also fail on non-Gentoo systems. If found, the eclass data would then need to be passed deeper into the code, and I don't see an existing way to do that.

Uses the `portageq` command to scan for repositories, which in turn arescanned for eclasses, which are then scanned for eclass variables.The variables are scanned using a heuristic which looks for    "# @ECLASS_VARIABLE: "at the start of each line, which means only properly documentedvariables will be found.Signed-off-by: hololeap <hololeap@users.noreply.github.com>
@hololeap
Copy link

Creates a Map of eclass names to eclass variables by scanning thesystem for repositories and their respective eclasses. Runs `portageq`to determine repository names and locations. Emits a warning if anIOException is caught when attempting to run `portageq`.This Map is passed via CheckSpec to AnalysisSpec and finally toParameters, where it is read by `checkUnusedAssignments` in order todetermine which variables can be safely ignored by this check.Signed-off-by: hololeap <hololeap@users.noreply.github.com>
The Gentoo eclass list is now populated using pure Haskell. The oldpython generators and generated module are no longer needed.Signed-off-by: hololeap <hololeap@users.noreply.github.com>
Signed-off-by: hololeap <hololeap@users.noreply.github.com>
@hololeap
Copy link

272ef81 is my attempt to pass the scannedIO (Map EclassName [EclassVar]) intoParameters so that they can be read viacheckScript.

dfa920c adds a dependency onattoparsec andtext.attoparsec seems to perform roughly 2x better thanparsec.

Kangie reacted with rocket emoji

@Kangie
Copy link
Author

Kangie commentedAug 6, 2023
edited
Loading

I will test these changes when I'm home from dog show stuff, thanks for committing them here.

Runsportageq to determine repository names and locations. Emits a warning if an IOException is caught when attempting to runportageq.

Can we avoid the external dependency by reading up on theGentoo PMS?. It's probably sane to assume that Gentoo / Portage users will have access toportageq but I do wonder if that holds true for Paludis and Pkgcore.

Regardless it looks great and I can't wait to test it soon

@koalaman
Copy link
Owner

I haven't had a chance to look deeply into your changes yet, but here are my thoughts for plumbing:

0138a6f

This adds a functionsiGetPortageVariables to ShellCheck's SystemInterface, which is used for system access type functions like "read file by name". The data can now be fetched on demand, and the core functionality can be robustly tested on non-Portage systems.

Some thoughts:

  • It would be fairly simple to look for (static)inherit commands and pick the relevant keys from the map based on that. Is that the plan? Do eclass files inherit from each other so that this would need to be done recursively?
  • I'm not familiar with how portage repos are structured. Would one ideally include a class name so thatsiGetPortageVariables "foo" could only look atfoo.eclass or whatever, instead of parsing all files (which I imagine was mostly done in the proof-of-concept because it happened at compile time).
  • A final version probably won't need to depend on attoparsec. The current parser is likely slow due to unintentionally excessive lookaheads like intakeLine = manyTill anyChar (try endOfLine) (compared totakeLine = many $ noneOf "\n"). If it just reads line by line, then it may not warrant a monadic parser at all. This can all be sorted out later when things work. Same with improving robustness with regard to spaces and such.

@hololeap
Copy link

hololeap commentedAug 16, 2023
edited
Loading

Thanks for the attention on this. I can give some insight on your questions.


It would be fairly simple to look for (static) inherit commands and pick the relevant keys from the map based on that. Is that the plan? Do eclass files inherit from each other so that this would need to be done recursively?

The ChromiumOS code has some kind of functionality in place to check forinherit commands:

getInheritedEclasses::Token-> [String]
getInheritedEclasses root= execWriter$ doAnalysis findInheritedEclasses root
where
findInheritedEclasses cmd
| cmd`isCommand`"inherit"= tell$ catMaybes$ getLiteralString<$> (arguments cmd)
findInheritedEclasses _=return()

eclass filesdo inherit from each other, so it would make sense to do this recursively


I'm not familiar with how portage repos are structured. Would one ideally include a class name so thatsiGetPortageVariables "foo" could only look atfoo.eclass or whatever, instead of parsing all files (which I imagine was mostly done in the proof-of-concept because it happened at compile time).

The eclasses are stored in aeclass/ directory at the root of the repo. It isn't possible to tell which repo holds a given eclass based on the name, but the repos could be enumerated in the order of their priority, in order to look for a.eclass which has been inherited. The list of available repos on the system is quickly found using a portage-specific command:

--| Run @portageq@ to gather a list of repo names and paths, then scan each
-- one for eclasses and ultimately eclass metadata.
scanRepos::IO [Repository]
scanRepos=do
let cmd="/usr/bin/portageq"
let args= ["repos_config","/"]
out<- runOrDie cmd args
case parse reposParser"scanRepos" outof
Left pe->fail$show pe
Right nps->do
forM nps$\(n,p)->Repository n p<$> getEclasses p

Parsingall.eclass files is not necessary but was indeed intended to produce a functioning proof-of-concept.


A final version probably won't need to depend on attoparsec. The current parser is likely slow due to unintentionally excessive lookaheads like intakeLine = manyTill anyChar (try endOfLine) (compared totakeLine = many $ noneOf "\n"). If it just reads line by line, then it may not warrant a monadic parser at all. This can all be sorted out later when things work. Same with improving robustness with regard to spaces and such.

I admittedly am not very familiar with optimizingparsec for speed, and I felt thatattoparsec would be a good fit here because

  1. It is optimized for doing bulk parses where you don't need easy-to-read error messages containing location state
  2. It has low-level functions such astakeWhile, which I cannot find equivalents of inparsec.

I understand that adding another dependency isn't ideal either, and that with the other optimizations (such as only scanning the needed.eclass files)parsec may be more than adequate.

@koalaman
Copy link
Owner

Thanks! I managed to get a Gentoo environment running in Docker, and got a bit further on this inhttps://github.com/koalaman/shellcheck/tree/ebuild

It now runs end-to-end, but I ended up reverting a number of changes when going from Parameters to SystemInterface, and I'm still working on adding them back one by one.

Kangie reacted with heart emoji

@koalaman
Copy link
Owner

It's been a busy September, buthttps://github.com/koalaman/shellcheck/tree/ebuild now has most of the changes from the original patch, and I replaced the lovely attoparsec code with some janky regex matching and regular IO to avoid pulling in the additional dependencies.

In the original PR, checking a small ebuild took 240ms, of which 200ms was invokingportageq. With this uglier approach it takes 310ms, but I think this is fine since it only happens when checking ebuild files, and only once per shellcheck invocation. This is all on a default Gentoo system though, I don't know how repos tend to scale.

The original PR had a check for KEYWORDS in 9999.ebuild files, but it was gated on inheriting an eclasscros-workon, which I'm sure is useful but it seemed a bit too niche to support upstream (it could be implemented in a site specificCustom.hs by looking up the filename intokenPositions in theParameters).

Could you try it out and let me know how it works in practice, and what I may have missed from the original PR?

@Kangie
Copy link
Author

I replaced the lovely attoparsec code with some janky regex matching and regular IO to avoid pulling in the additional dependencies.

It's your software!

In the original PR, checking a small ebuild took 240ms, of which 200ms was invokingportageq. With this uglier approach it takes 310ms, but I think this is fine since it only happens when checking ebuild files, and only once per shellcheck invocation. This is all on a default Gentoo system though, I don't know how repos tend to scale.

I'll get back to you, but::gentoo is probably the largest repository out there - I have about 10 enabled but they're orders of magnitude smaller than::gentoo,

The original PR had a check for KEYWORDS in 9999.ebuild files, but it was gated on inheriting an eclasscros-workon, which I'm sure is useful but it seemed a bit too niche to support upstream (it could be implemented in a site specificCustom.hs by looking up the filename intokenPositions in theParameters).

That seems very ChromiumOS specific. I wouldn't worry about it too much -9999 is a convention that means 'from VCS sources' and theeclass in question seems to be a helper for ChromiumOS to build things entirely from git sources.

Could you try it out and let me know how it works in practice, and what I may have missed from the original PR?

I'll build a copy right now!

@Kangie
Copy link
Author

It took ages to update my hackage index. I've got to run off to work shortly so detailed testing will have to wait for a bit but overall it looks great.

It's probably worth suppressing the 'eclass dir does not exist' message outside of verbose/debug - most repos just use gentoo eclasses, though there are a good number thatdo ship their own eclasses.

Unable to find .eclass files: /var/db/repos/crossdev/eclass: getDirectoryContents:openDirStream: does not exist (No such file or directory)Unable to find .eclass files: /var/db/repos/kangie/eclass: getDirectoryContents:openDirStream: does not exist (No such file or directory)Unable to find .eclass files: /var/db/repos/kubler/eclass: getDirectoryContents:openDirStream: does not exist (No such file or directory)Unable to find .eclass files: /var/db/repos/steam-overlay/eclass: getDirectoryContents:openDirStream: does not exist (No such file or directory)

Otherwise, when run across an ebuild that I maintain the branch only returned valid warnings - ebuild syntax isn't flagged as a warning.

I'll try and find some actual issues but that's a positive result so far!

@Kangie
Copy link
Author

Kangie commentedOct 10, 2023
edited
Loading

I've got a Curl ebuild bump to prepare for, which gave me a good opportunity to do some testing.

Fornet-misc/curl/curl-9999.ebuild:

  • Whitelist${PN} for SC2206 (portage won't provide something that can be split) (Edit:${P},${PV},${V} too now that I think about it; probably also SC2086)
  • Whitelist${ED} for SC2115 (portage will always ensure that$ED is not empty)
  • MULTILIB_* here comes frommultilib-build.eclass - that's inherited bymultilib-minimal, which is inherited by this ebuild.
In curl-9999.ebuild line 107:MULTILIB_WRAPPED_HEADERS=(^----------------------^ SC2034 (warning): MULTILIB_WRAPPED_HEADERS appears unused. Verify use (or export if used externally).In curl-9999.ebuild line 111:MULTILIB_CHOST_TOOLS=(^------------------^ SC2034 (warning): MULTILIB_CHOST_TOOLS appears unused. Verify use (or export if used externally).

This guy is consumed by portage! Can we whitelist it?

https://github.com/gentoo/portage/blob/18db7f8cc8c750b50100680dcf288f3177173669/bin/install-qa-check.d/90config-impl-decl#L36

In curl-9999.ebuild line 115:QA_CONFIG_IMPL_DECL_SKIP=(^----------------------^ SC2034 (warning): QA_CONFIG_IMPL_DECL_SKIP appears unused. Verify use (or export if used externally).

Edit: We can probably scan over/usr/lib/portage/${PYTHON_IMPL}/install-qa-check.d - I doubt it'll hurt to just scan everything under/usr/lib/portage/**/install-qa-check.d for stray variables. I don't know how that will impact non-portage ebuild package managers though! It's probably worth suppressing a 'not found' outside of debug/verbose mode to cover that use case.

I'll try and come up with a better way to find edge cases like this but it looks good so far!

@Kangie
Copy link
Author

Kangie commentedOct 28, 2023
edited
Loading

mycmakeargs isconsumed by the cmake eclass:

In createrepo_c/createrepo_c-1.0.0.ebuild line 44:        local mycmakeargs=(              ^---------^ SC2034 (warning): mycmakeargs appears unused. Verify use (or export if used externally).

emesonargs isconsumed by the meson eclass (this is a better example, we definitely callmeson_src_configure:

In zchunk/zchunk-1.3.1.ebuild line 32:        local emesonargs=(              ^--------^ SC2034 (warning): emesonargs appears unused. Verify use (or export if used externally).

These are pretty minor nits that could be worked around with documentation IMO, but there are a whole class of false-positives that might be easily addressable with some haskell trickery - overall everything I've checked so far hasso much less noise than it did proviously.

@koalaman
Copy link
Owner

I see. So, I guess there are two surmountable issues:

  • Support for multiple levels of inheritance
  • Support for@VARIABLE and maybe@USER_VARIABLE in addition to@ECLASS_VARIABLE

And some significantly more annoying issues:

  • There are a number of variables with very generic names likeP andV that ebuild checks want to have special handling for
  • There are special cases (bugs?) for declared variables where it says# @VARIABLE: MYCMAKEARGS but it actually looks formycmakeargs
  • There is a (presumably evolving) long tail of variables likeQA_CONFIG_IMPL_DECL_SKIP that are not declared or listed anywhere

I'm starting to wonder about the relative value of supporting and maintaining ebuild support with workarounds upstream, compared to some more flexible 80% solution like having CI awk/grep the variables into assignments file that shellcheck can read with# shellcheck source=myfile on asource /dev/null.

@thesamesam
Copy link

And some significantly more annoying issues:

* There are a number of variables with very generic names like `P` and `V` that ebuild checks want to have special handling for

Yeah, I can see how this part in particular is a nuisance. There's notthat many of them and they're well-defined at least though.

* There are special cases (bugs?) for declared variables where it says `# @VARIABLE: MYCMAKEARGS` but it actually looks for `mycmakeargs`

This one is a quirk because the user might setMYCMAKEARGS in the environment, whilemycmakeargs is to be used within ebuilds. Suppressing warnings/errors for@USER_VARIABLE would handle that.

* There is a (presumably evolving) long tail of variables like `QA_CONFIG_IMPL_DECL_SKIP` that are not declared or listed anywhere

We could probably just ignoreQA_*. They're documented inebuild(5) but that's all.

I'm starting to wonder about the relative value of supporting and maintaining ebuild support with workarounds upstream, compared to some more flexible 80% solution like having CI awk/grep the variables into assignments file that shellcheck can read with# shellcheck source=myfile on asource /dev/null.

I've not been involved in the discussion thus far so I don't want to say "yes, definitely do that", but it sounds appealing and like a reasonable fit given all the quirks.

@hololeap
Copy link

hololeap commentedNov 27, 2023
edited
Loading

An observation: all of the changes in this PR are essentially adding things to whitelists for various checks (SC2034,SC2206,SC2115, etc.)

What ifshellcheck looks for external whitelists based on the extension of the file (e.g..ebuild)?

For instance, the whitelists could exist in a directory tree, the structure looking like:

$find /usr/share/shellcheck/whitelists//usr/share/shellcheck/whitelists/ebuild/core.wl/usr/share/shellcheck/whitelists/ebuild/eclasses/cmake.wl/usr/share/shellcheck/whitelists/ebuild/eclasses/haskell-cabal.wl

The individual.wl (standing for "whitelist") files could look like:

# Comment@SC2206- PN- P- PV- V@SC2115- ED
  • If the extension is.ebuild and/usr/share/shellcheck/whitelists/ebuild/ doesn't exist, then just emit a warning
  • Everything under/usr/share/shellcheck/whitelists/ebuild/ is either a.wl file or a directory containing.wl files (this only helps keeps things organized and prevents a giant monolithic.wl file)
  • The whitelists would be maintained separately by Gentoo
    • The whitelists for eclasses could be generated by a script (similar toapp-doc/eclass-manpages)
    • Packaged separately (e.g.dev-util/shellcheck-ebuild-whitelists)
  • Whitelist files could be parsed usingparsec, so it won't add any extra dependencies.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@Kangie@koalaman@hololeap@thesamesam

[8]ページ先頭

©2009-2025 Movatter.jp