Package normalization ruleset for Repology
repology/repology-rules
There can be a huge discrepancy in how packages for a single project are named and versioned in different repositories, so Repology needs a flexible ruleset in order to overcome the differences, match packages, and make versions comparable.
You are welcome to submit pull requests with the rules you need. Here's a quick pointer on how to add specific rules:
- Choose a target name (prefer the least ambiguous and/or most widely used name)
- Open the corresponding yaml file under `800.renames-and-merges/` (if there's no existing yaml file relevant to your package, use the file named with the first letter of your target name, like `a.yaml`)
- Add a rule like `- { setname: <target name>, name: <original name> }`
- Open the corresponding yaml file under `900.version-fixes/`
- Add a rule like `- { name: <package name>, ver: <bad version>, ignore: true }`
- Consider using a `verpat` with a regular expression to match similar bad versions which may appear in the future. Examples:
  - `verpat: "20[0-9]{6}"` to match dates (`20110323`)
  - `verpat: "20[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}"` same, but for a delimited date (`2010.03.23`)
  - `verpat: ".*20[0-9]{6}.*"` to match dates anywhere in the version (`1.0.20110323`)
  - `verpat: "[0-9a-f]{7}"` to match something resembling a git commit (`a7b823f`)
  - `verpat: "[0-9]{4,}"` to match something resembling a build or revision number (`12345`)
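As a sketch of the steps above, an exact-version rule and a pattern rule for the same package might sit together like this. The package name `foo` and its versions are hypothetical; the keywords are the ones shown in the list above.

```yaml
# hypothetical package "foo", packaged somewhere as a dated snapshot
# ignore the one bad version seen so far
- { name: foo, ver: "20110323", ignore: true }
# also ignore any future 8-digit date version (verpat has to match the whole version)
- { name: foo, verpat: "20[0-9]{6}", ignore: true }
```

The second rule alone already covers `20110323`; both are shown only to contrast the two forms.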
- Open the corresponding yaml file under `850.split-ambiguities/`
- Add a group of rules to distinguish packages by upstream URL (`wwwpart`, `wwwpat` and `sourceforge` conditions are allowed). Such a group must end with a catch-all rule for packages not matched by the specific rules. Example:

      - { name: <ambiguous name>, wwwpart: <url part for project A>, setname: <name A> }
      - { name: <ambiguous name>, wwwpart: <url part for project B>, setname: <name B> }
      - { name: <ambiguous name>, addflag: unclassified }

Things to know if you're submitting a pull request or have push access to this repository.
- Repology is currently set up to automatically pull the latest ruleset from the `master` branch in this repository on each update, so everything committed here will be automatically applied to Repology within several hours.
- Repology runs `make check` after updating the repository and, if it fails, rolls back to the latest good commit, so it's somewhat protected from a broken ruleset.
- In the worst case, a broken ruleset will prevent Repology from updating until the problem is resolved.
- Still, please run `make check` before committing, and/or install the git hook in `scripts/pre-push`, which runs it for you (you can copy it into `.git/hooks` or just run `make install-hook`).
- The checker script requires the Python modules `voluptuous` and `PyYAML`. `pip install PyYAML voluptuous` should install them for you.
- In general, stay close to the style already used in the ruleset, use existing rules as examples, keep it simple and have fun!
- If in doubt, you can always just submit a report from the package's page on the website and avoid all the work!
Rules are stored in a set of files in YAML format, a flexible human-friendly markup format for structured data. Each rule is a single item of a big array, and may be written on a single line or on multiple lines (depending on what's more convenient for the particular case). For example, the following rule renames `etracer` into `extreme-tuxracer`:

    - { name: etracer, setname: extreme-tuxracer }

which is the same as:

    - name: etracer
      setname: extreme-tuxracer

Each rule has a set of keywords which specify how a package is matched (by name, version, repository, category etc.) and how it is modified (package is renamed, version scheme is changed, flags are applied, etc.).
Rule order matters, as multiple rules may match a single package, and they are applied in order. Furthermore, changes applied by earlier rules affect further matches: for instance, if a package is renamed, the new name will be matched for the following rules.
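A minimal sketch of this ordering effect, reusing the `etracer` rename from above. The second rule and its version pattern are made up for illustration:

```yaml
# the first rule renames the package
- { name: etracer, setname: extreme-tuxracer }
# a later rule has to match the new name; `name: etracer` would no longer match here
- { name: extreme-tuxracer, verpat: "[0-9]{4,}", ignore: true }
```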
While rules are basically arbitrary, it is practical to attribute each rule to a specific class of action, the most distinctive of which are:
- Rename or merge rules. Match a name, and set another name. The main purpose is to merge differently-named packages into the same project. For example, `etracer`, `extremetuxracer`, `extreme-tuxracer` → `extreme-tuxracer`.
- Split rules. Match a name and some additional property (version, homepage or repository), and set another name. Used to split similarly-named packages of different projects. For example, `clementine` → `clementine-wm`, `clementine-player`.
- Version fixes. Match a name but do not change it; instead, change versions or set some version-related flags. Used to fix an incorrect versioning scheme (`v1.0` → `1.0`), mark some versions as devel (such as beta versions), or ignore some versions (e.g. snapshots like `20130523` when there are official versions like `1.0`).
The ruleset is split into several distinctive parts, mostly based on the functional class of rules described above. They are arranged in such a way that when adding a rule into a specific part you don't need to be aware of the rest of the ruleset.
- 100.prefix-suffix - normalization of repository-specific prefixes and suffixes which are not part of the meaningful package name, such as removal of `lib32-` prefixes.
- 2xx.handpicked - a block where access to unmodified package names is needed, such as manual whitelists or blacklists.
- [45]xx.wildcard - wildcard rules which affect a lot of packages. These mostly handle modules for specific languages such as Perl (which may be named like `p5-Foo-Bar` or `libfoo-bar-perl` in different repositories) by adding a distinctive prefix (`perl:` in this case) to them, so they do not conflict with modules for other languages and other software. There are three subsets here:
  - pure rules which are known to not have any false positives (e.g. packages from `CPAN` are always Perl modules)
  - exceptions for the wildcard rules
  - wildcard rules themselves
- 750.exceptions - the small set of remaining exceptions. If a package needs a rule here, it is most definitely incorrectly named.
- 800.renames-and-merges - pure merge rules
- 850.split-ambiguities - pure split rules
- 900.version-fixes - pure version fixes
- 950.split-branches - additional split section for projects which have multiple development branches which are incompatible and may be present in a single repository at the same time for compatibility purposes. For example, `gtk2` and `gtk3`.

There are also some fixme subsets which are remnants of the previous generation of the ruleset. These files will eventually be refactored and removed.
This may seem complex, but in practice the most commonly used rulesets are 800, 850 and 900, which cleanly correspond to the three functional classes of rules described in the previous section.
Other parts of the ruleset may need attention when new repositories are introduced.
As already mentioned, the keywords that comprise rules are related to either matching packages or modifying them. Below are detailed descriptions for all of them.
Each repository that Repology supports has a set of rulesets associated with it. For instance, all Debian-based distros have the ruleset `debuntu`. This may be used to only match packages in specific repositories, but without the need to chase a specific repository version. You may look up repositories and their details in the repos.d directory of the main Repology repository.
You may specify a list of rulesets to match any of them.

    - { ruleset: freebsd, ... }
    - { ruleset: [ arch, openbsd ], ... }

Disable rule matching for specified ruleset(s).

    # applies to all Debian derivatives, but not Deepin
    - { ruleset: debuntu, noruleset: deepin, ... }

Deprecated. Same as ruleset, and may simply be changed into it.
Matches package category(ies). Note that category information is not available for all repositories, and each repository may have its own set of categories.

    - { category: games, ... }
    - { category: [ mail-client, mail-filter, mail-mta ], ... }

Matches package category(ies) against a regular expression. The whole category is matched; the match is case-insensitive.

    - { categorypat: "emacs[0-9]+Packages" }

Matches package maintainer(s). The matching is case-insensitive.

    - { maintainer: "nobody@nowhere.com" }

Matches exact package name(s).

    - { name: firefox, ... }
    - { name: [postgresql-client, postgresql-server, postgresql-contrib], ... }

Matches package name against a regular expression. The whole name is matched. May contain captures.

    - { namepat: "swig[0-9]+", ... }

Matches exact package version(s).

    - { name: firefox, ver: "50.0.1", ... }

The opposite of ver: matches if the package version is none of the specified version(s).

    - { name: firefox, notver: ["50.0.1", "50.0.2"] }

Matches a package version against a regular expression. The whole version is matched. Note that you need to escape periods, which mean "any symbol" in regular expressions. Matching is case-insensitive.

    - { name: firefox, verpat: "50\\.[0-9]+", ... }
    - { name: firefox, verpat: "50\\..*", ... }

Matches the number of components (dot-separated parts) of a version.

    - { name: gimp, vercomps: 3, ... }  # matches 1.2.3, but not 1.2 or 1.2.3.4

Matches versions longer than a given number of components (dot-separated parts). Mostly useful to match broken version schemes that add extra version components.

    - { name: gimp, verlonger: 3, ... }  # 2.9.8.12345 is something unofficial

Compares version to a given one and matches if it is:
- vergt: greater (>)
- verge: greater or equal (≥)
- verlt: lesser (<)
- verle: lesser or equal (≤)
- vereq: equal
- verne: not equal

    # match git >= 2.16
    - { name: git, verge: "2.16", ... }

Be careful when using this with regard to pre-release versions: `1.0beta1` is lesser than `1.0`, so it won't match `verge: 1.0`. You may use verpat instead.
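The pre-release caveat can be sketched as follows. The package name `foo` and the `devel` action are arbitrary choices for illustration:

```yaml
# matches 1.0, 1.0.1, 2.0 and so on, but not 1.0beta1, which compares lower than 1.0
- { name: foo, verge: "1.0", devel: true }
# pattern alternative: matches any version starting with "1.0", pre-releases included
- { name: foo, verpat: "1\\.0.*", devel: true }
```

The two rules are not equivalent: the pattern stops matching at `1.1` and later, so it has to be written for the specific range you have in mind.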
Similar to the verXX family, but checks how a package version relates to a specified release. A release includes all pre-releases and post-releases with a given prefix; e.g. `releq: "1.0"` would match `1.0alpha1`, `1.0`, `1.0patch`, `1.0.1`, but not `0.99` and `1.1`.
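For illustration, a rule built on the `releq` example above might look like this. The package name `foo` and the `devel` action are hypothetical. The names of the other members of the family are not spelled out here; by analogy with verXX they would presumably follow the same suffixes (for example `relge`), which is an assumption.

```yaml
# per the description above: matches 1.0alpha1, 1.0, 1.0patch and 1.0.1,
# but not 0.99 or 1.1
- { name: foo, releq: "1.0", devel: true }
```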
Matches the package homepage against a regular expression. Note that unlike namepat and verpat, a partial match is allowed here. Also note that dots should be escaped with a double backslash, as `.` means "any character" in regular expressions.

    - { name: firefox, wwwpat: "mozilla\\.org", ... }

Matches when a package homepage contains a given substring. This is usually more practical than wwwpat, as in most cases you just need to match a URL part and don't need complex patterns, and you don't need to worry about escaping here. Matching is case-insensitive.

    - { name: firefox, wwwpart: "mozilla.org", ... }

Matches when a package homepage is a sourceforge page for a given project name (`https://<project>.sourceforge.net`, `https://sourceforge.net/project/<project>` etc.):

    - { name: aterm, sourceforge: aterm, ... }

Matches when a package summary contains a given substring. Useful as an alternative to wwwpart for cases where the package homepage is not available. Matching is case-insensitive.

    - { name: firefox, summpart: "browser", ... }

Matches when a package has the `p_is_patch` flag set (see the `p_is_patch` action below).
Effectively renames the package. You may use the `$0` placeholder to substitute the original name, or `$1`, `$2` etc. to substitute the contents of the corresponding captures of the regular expression used in namepat. Note that you need neither name nor namepat for `$0` to work, but you must have namepat with corresponding captures to use `$1` and so on.

    # etracer→extreme-tuxracer
    - { name: etracer, setname: extreme-tuxracer }
    # aspell-dict-en→aspell-en, aspell-dict-ru→aspell-ru etc.
    - { namepat: "aspell-dict-(.*)", setname: "aspell-$1" }
    # all packages in dev-perl Gentoo category are prepended `perl:`
    # Locale-Msgfmt→perl:Locale-Msgfmt
    - { ruleset: gentoo, category: dev-perl, setname: "perl:$0" }

Changes the version of the package. As with setname, you may use the placeholders `$0`, `$1`, etc.

    # remove bogus leading version component
    - { verpat: "0\\.(.*)", setver: $1 }

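As a further sketch, the `v1.0` → `1.0` fix mentioned among the rule classes could be written with a capture in the same way. The package name `foo` is hypothetical:

```yaml
# strip a leading "v": v1.0 → 1.0, v2.3.4 → 2.3.4
- { name: foo, verpat: "v(.*)", setver: "$1" }
```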
Set to `true` to completely remove a package. It will not appear anywhere in Repology. Set to `false` to undo.

    # a metapackage which does not refer to any real project, we don't need it
    - { name: "x11-fonts", remove: true }

Set to `true` to mark the version of a matched package as a development or unstable version, so it does not make the latest stable version be marked as outdated. Set to `false` to undo.

    # mark versions with odd second component as devel
    - { name: gnome-terminal, verpat: "[0-9]+\\.[0-9]*[13579]\\..*", devel: true }

A project may use two parallel versioning schemes, one of which contains additional version components, such as a build number:

    0.17, 0.17.13509, 0.17.13541, 0.18, 0.18.16131

Normally, `0.18.16131` would be considered more recent than `0.18`, but if these refer to the same version, this is not the desired behavior. In such a case, a version scheme containing extra components (e.g. one which compares greater) may be marked as altver, which would allow both `0.18` and `0.18.16131` to be considered the latest, and both to be marked as outdated by the presence of either `0.19` or `0.19.x`.

    - { name: freecad, verlonger: 3, altver: true }

Similar to altver, but for the case where versioning schemes do not have a common prefix and are totally incompatible:

    3.2.1, 3207, 3.2.2, 3211

Marking either of the schemes with this flag results in completely independent processing, which would allow both `3.2.2` and `3211` to be treated as the newest version.

    - { name: sublime-text, verpat: "[0-9]+", altscheme: true }

Set to `true` to ignore specific package versions. This is meant for the cases where comparison is not possible - ignored versions are excluded from comparison and do not affect the status of other versions. There are multiple ignore flavors:
- `rolling` - the package is always fetched from the latest snapshot or VCS master/trunk. Its version has no meaning (like Gentoo's `9999`), and may contain repository-specific formats such as a commit hash, revision or date.
- `noscheme` - there's no official versioning scheme. Repositories may use random versions or dates, there's no point comparing them.
- `incorrect` - known incorrect version (e.g. a version which was not released yet)
- `untrusted` - used for repositories which are known for providing incorrect versions, to ignore them proactively. It's a common pattern to pair `incorrect` rules matching specific versions with an `untrusted` rule for the following versions in a given repository.
- `ignored` - general ignore action
- `successor` - currently an alias for `devel`, used to convey the additional meaning of this being a fork of an unmaintained original project
- `debianism` - currently an alias for `devel`, used to convey the additional meaning of this package using a distribution maintained at Debian (probably with a version addendum)
- `snapshot` - currently an alias for `ignored`

      # Fedora was known to use "6.0.0" version before it was actually released
      # mark as incorrect and prevent future problems
      - { name: llvm, ver: "6.0.0", ruleset: fedora, incorrect: true }
      - { name: llvm, ruleset: fedora, untrusted: true }

Set to `true` to indicate that this project uses the letter `p` in the version to indicate post- or patch releases. This fixes version comparison, as by default `p` is treated as pre-release.

    # sudo 1.8.21p2 > 1.8.21
    - { name: sudo, p_is_patch: true }

Set to `true` to indicate that this project uses any letter in the version to indicate post-releases.

    # rb here denotes a patchset, treat it as such
    - { name: webalizer, verpat: ".*rb.*", any_is_patch: true }

Set to `true` to force the package version to compare lower than any other package version. Useful to handle an upstream versioning schema change when new versions compare lower than legacy ones. Set to `false` to undo.

    # when 0.20 follows 0.193:
    - { ver: "0.193", sink: true }

Result: `0.20` (newest) > `0.193` (outdated)

Set to `true` to force the package to be outdated, even if it classifies as the most recent. Note that this does not lead to another version being selected as newest. Useful to convey that a version is outdated even when there are no newer versions (for instance, when a project is superseded by another project). Set to `false` to undo.

    # when 0.20 follows 0.193:
    - { ver: "0.193", outdated: true }

Result: `0.193` (outdated) > `0.20` (outdated)

Set to `true` to force the package to be legacy instead of outdated. Set to `false` to undo. Useful when a specific repository purposely contains an outdated version of a specific project for compatibility purposes.

    - { name: ruby-slack-notifier-1, ruleset: aur, legacy: true }

Set to `true` to prevent the package from ever having legacy status. This is useful for marking packages which declare to be of development version, but are nevertheless outdated.

    - { name: ffmpeg-git, nolegacy: true }

Output a given warning when matched.

    # will catch unexpected versions
    - { name: gtk, verpat: "1\\..*", setname: gtk1 }
    - { name: gtk, verpat: "2\\..*", setname: gtk2 }
    - { name: gtk, verpat: "3\\..*", setname: gtk3 }
    - { name: gtk, verpat: "4\\..*", setname: gtk4 }
    - { name: gtk, warning: "Neither of gtk1,2,3,4 - need a new rule or some weirdness is going on" }

    # will trigger a warning if new project called "tesseract" appears
    # ...or website changes, or just a package without website defined appears,
    # so it'll require another condition
    - { name: tesseract, setname: tesseract-game, wwwpart: tesseract.gg }
    - { name: tesseract, setname: tesseract-ocr, wwwpart: tesseract-ocr }
    - { name: tesseract, warning: "Please add rule for tesseract" }

Flavors are used to distinguish a set of packages denoting multiple versions of a project from a set of packages denoting multiple parts or variants of a project. Consider an example:
- `foo1 1.0` and `foo2 2.0` merged into `foo`. In this case they denote multiple versions of the same project, flavors are not needed here and `foo1` will have `legacy` status.
- `foo-client 1.0` and `foo-server 1.1` merged into `foo`. In this case they denote parts of the same project, which are expected to be of the same version. Flavors should be used in this case, so `foo-client` will have the `outdated` status.

Flavors are plain strings and may be arbitrary, for example `client` and `server` in the last example. You may specify a flavor explicitly, or use the `true` value to make the flavor be taken from the package name.

    - { name: postgresql-client, setname: postgresql, addflavor: client }
    - { name: postgresql-server, setname: postgresql, addflavor: server }
    # This works too
    - { name: [postgresql-client, postgresql-server], setname: postgresql, addflavor: true }

Same as addflavor, but replaces the flavor instead of appending to the flavors list.
Set to `true` to remove all previously added flavors.
Set to `true` to stop ruleset processing right after the current rule. Consider this a legacy feature; it should not be needed.
Takes a pattern and replacement strings, and applies them to the package name. Used for low-level normalization.

    # slashes in package names are not allowed
    - { replaceinname: { "/": "-" } }
    # also useful for some repositories
    - { replaceinname: { " ": "-" } }

Converts a package name to lowercase. This is called once in the very beginning of the ruleset. The purpose of having this as a rule action is to be able to have exceptions, e.g. packages which should be distinguished solely by the case of their names.

    - { tolowername: true }

Changes the subrepo property of the package. As with setname, you may use the placeholders `$0`, `$1`, etc.

    # split subrepo name from package name
    - { namepat: "([^-]+)-(.*)", setsubrepo: $1, setname: $2 }

For additional flexibility, a mechanism exists to toggle some rules based on the previous rules.
Sets a virtual flag (arbitrary string) which only exists for the duration of rule processing, and may be checked in the following rules.

    - { name: python, addflag: not_python_module }

Only matches if the specified flag is (or is not) set.

    - { name: python, addflag: not_python_module }
    ...
    # will add "python:" prefix to all packages in category "python",
    # but not for "python" package
    - { category: python, noflag: not_python_module, setname: "python:$0" }

These annotations do not affect package processing, but are related to ruleset maintenance.
Indicates that a rule needs manual maintenance. For example, when a development version cannot be determined from the version schema, one would need to revisit and update the version occasionally.

    - { name: tor, verge: "0.3.4", devel: true, maintenance: true }

Indicates that a rule should not be removed even if it doesn't match any packages. That is, a rule is likely to be useful some time in the future.
Indicates that a rule may be removed if it doesn't match any packages.
GPLv3 or later, see COPYING.