Movatterモバイル変換

Portal:Toolforge/Admin

From Wikitech

Documentation of backend components andadmin procedures for Toolforge. SeeHelp:Toolforge for user facing documentation about actually using Toolforge to run your bots and webservices.

Admin permissions

Performing admin procedures requires having admin permissions on Toolforge. There is not a single "admin" flag, but a set of interrelated permissions you can be granted. These are described in detail in the pageToolforge roots and Toolforge admins.

Failover

Tools should be able to survive the failure of any one virt* node. Some items may need manual failover

Static webserver

This is a stateless simple nginx http server. Simply switch the floating IP from tools-static-10 to tools-static-11 (or vice versa) to switch over. Recovery is also equally trivial - just bring the machine back up and make sure puppet is ok.

Checker service

This is the service that Icinga hits to check status of several services. It's totally stateless.

SeePortal:Toolforge/Admin/Toolschecker

Prometheus

SeePortal:Toolforge/Admin/Prometheus#Failover.

tools-service

Service nodes run the Toolforge internalaptly service, to serve .deb packages as a repository for all the other nodes.

Command orchestration

Toolforge and Toolsbeta both have a localcumin server.

Administrative tasks

Logging in as root

For normal login root access seeToolforge roots and Toolforge admins.

In case the normal login does not work for example due to an LDAP failure, administrators can also directly log in as root. To prepare for that occasion, generate a separate key withssh-keygen, add an entry to thepasswords::root::extra_keys hash in Horizon's 'Project Puppet' section with your shell username as key and your public key as value and wait a Puppet cycle to have your key added to theroot accounts. Add to your~/.ssh/config:

# Use different identity for Tools root.Match host *.tools.eqiad1.wikimedia.cloud user root     IdentityFile ~/.ssh/your_secret_root_key

The code that readspasswords::root::extra_keys is inlabs/private:modules/passwords/manifests/init.pp.

Disabling all ssh logins except root

Useful for dealing with security critical situations. Just touch/etc/nologin and PAM will prevent any and all non-root logins.

Complaints of bastion being slow

Tracked inPhabricator
Task T266300

Users are increasingly noticing slowness on tools-login due to either CPU or IOPS exhaustion caused by people running processes there instead of on Kubernetes. Here are some tips for finding the processes in need of killing:

Look for IOPS hogs
- $iotop
Look for abnormal processes:
- $psaxouser:32,pid,cmd|grep-Ev"^($USER|root|daemon|_lldpd|messagebus|nagios|nslcd|ntp|prometheus|statd|syslog|Debian-exim|www-data)"|grep-ivE'screen|tmux|-bash|mosh-server|sshd:|/bin/bash|/bin/zsh'
- If you seepyb.py kill with extreme prejudice.
If the rogue job is running as a tool,!log something like:!log tools.$TOOL Killed $PROC process running on tools-bastion-NN. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework for instructions on running jobs on Kubernetes.

Local package management

Local packages are provided by anaptly repository ontools-services-05.

Ontools-services-05, you can manipulate the package database by various commands; cf.aptly(1). Afterwards, you need to publish the database to the filePackages by (for thetrusty-tools repository)aptly publish --skip-signing update trusty-tools. To use the packages on the clients you need to wait 30 minutes again or runapt-get update. In general, you should never just delete packages, but move them to~tools.admin/archived-packages.

You can always see where a package is (would be) coming from withapt-cache showpkg $package.

Local package policy

Package repositories

We only install packages from trustworthy repositories.
- OK are
  - The official Debian and Ubuntu repositories, and
  - Self-built packages (apt.wikimedia.org and aptly)
- Not OK are:
  - PPAs
  - other 3rd party repositories

Packagers effectively get root on our systems, as they could add a rootkit to the package, or upload an unsafe sshd version, and apt-get will happily install it

Hardness clause: in extraordinary cases, and for 'grandfathered in' packages, we can deviate from this policy, as long as security and maintainability are kept in mind.

apt.wikimedia.org

We assume that whatever is good for production is also OK for Toolforge.

aptly

We manage the aptly repository ourselves.

Packages in aptly need to be built by Toolforge admins
- we cannot import .deb files from untrusted 3rd party sources

Package source files need to come from a trusted source
- a source file from a trusted source (i.e. backports), or
- we build the debian source files ourselves
- we cannot build .dcs files from untrusted 3rd party sources

Packages need to be easy to update and build
- cowbuilder/pdebuild OK
- fpm is OK
- SeeDeploy new jobutils package for an example walk through of building and adding packages to aptly.

We only package if strictly necessary
- infrastructure packages
- packages that should be available for effective development (e.g. composer or sbt)
- not: python-*, lib*-perl, ..., which should just be installed with the available platform-specific package managers

For each package, it should be clear who is responsible for keeping it up to date
- for infrastructure packages, this should be one of the paid staffers

A list of locally maintained packages can be found under/local packages.

Building packages

moved toPortal:Toolforge/Admin/Packaging

Deploy new misctools package

moved toPortal:Toolforge/Admin/Packaging

Testing/QA for a new tools-webservice package

There is a simple flask app in toolsbeta using the tooltest that is set up to be deployed via webservice on Kubernetes.

After runningbecome test, you can go to theqa/tools-webservice directory. This is checked out via anonymous https, and is suitable for checking out a patch you are reviewing. There is an untracked file in there that is useful here, usually. The webservicefile at the route is just a copy of the one in thescripts folder in the repo. The only difference is:

9d8< sys.path.insert(0, '')

That exchanges the distribution installed package in the python path for the local directory, so if you run./webservice $somecommand it will run what is in your local folder rather than what is in/usr/lib/python3/dist-packages/. If you are testing changes made directly toscripts/webservice in the repo, you will likely need to copy that over the file and addsys.path.insert(0, "") after the import sys line.

If there is noimport sys line in this version of the code, add one! This should let you bang on your new version without having to mess with packaging, yet.

Deploy new tools-webservice package

moved toPortal:Toolforge/Admin/Packaging

Webserver statistics

To get a look at webserver statistics,goaccess is installed on the webproxies. Usage:

goaccess --date-format="%d/%b/%Y" --log-format='%h - - [%d:%t %^] "%r" %s %b "%R" "%u"' -q -f/var/log/nginx/access.log

Interactive key bindings are documented onthe man page. HTML output is supported by piping to a file. Note that nginx logs are rotated (twice?) daily, so there is only very recent data available.

Banning an IP from tool labs

OnHiera:Tools, add the IP to the list of dynamicproxy::banned_ips, then force a puppet run on the webproxies. Add a note toHelp:Toolforge/Banned explaining why. The user will get a message like[1].

Deploying the main web page

This website (plus the 403/500/503 error pages) are hosted undertools.admin. To deploy,

$becomeadmin$cdtool-admin-web$gitpull

Regenerate replica.my.cnf

This requires access to the cloudcontrol hostwhich is running maintain-dbusers, and can be done as follows:

$sshcloudcontrolXXXX.eqiad.wmnet$sudo/usr/local/sbin/maintain-dbusersdeletetools.${NAME}--account-type=tool:# or$sudo/usr/local/sbin/maintain-dbusersdelete${USERNAME}--account-type=user

Once the account has been deleted, the maintain-dbusers service will automatically recreate the user account.

Debugging bad MariaDB credentials

Sometimes things go wrong and a user'sreplica.my.cnf credentials don't propigate everywhere. You can check the status on various servers to try and narrow down what went wrong.

The database credentials needed are in/etc/dbusers.yaml on the cloudcontrol host running maintain-dbusers.

$sshcloudcontrolXXXX.eqiad.wmnet$sudocat/etc/dbusers.yaml:# look for the accounts-backend['password'] for the m5-master connections (user: labsdbaccounts):# look for the labsdbs['password'] for the other connections (user: labsdbadmin)$CHECK_UID=u12345# User id to check for:# Check if the user is in our meta datastore$mariadb-hm5-master.eqiad.wmnet-ulabsdbaccounts-p-e"USE labsdbaccounts; SELECT * FROM account WHERE mysql_username='${CHECK_UID}'\G":# Check if all the accounts are created in the labsdb boxes from meta datastore.$ACCT_ID=....# Account_id is foreign key (id from account table)$mariadb-hm5-master.eqiad.wmnet-ulabsdbaccounts-p-e"USE labsdbaccounts; SELECT * FROM labsdbaccounts.account_host WHERE account_id=${ACCT_ID}\G":# Check the actual labsdbs if needed$mariadb-hclouddbXXXX.eqiad.wmnet-ulabsdbadmin-p-e'SELECT User, Password from mysql.user where User like "${CHECK_UID}";':# Resynchronize account state on the replicas by finding missing GRANTS on each db server$sudomaintain-dbusersharvest-replicas

Seephab:T183644 for an example of fixing automatic credential creation caused when a old LDAP user becomes a Toolforge member and has an untracked user account on toolsdb.

Regenerate kubernetes credentials for tools (.kube/config)

With admin credentials (root on a control plane node will do), runkubectl -n tool-<toolname> delete cm maintain-kubeusers-<toolname>; it should get regenerated within minutes.

Adding K8S Components

SeePortal:Toolforge/Admin/Kubernetes#Building new nodes

Deleting a tool

Tracked inPhabricator
Task T170355Resolved

For batch or CLI deletion of tools, use the 'mark_tool' command on a cloudcontrol node:

andrew@cloudcontrol1003:~$sudomark_toolusage: mark_tool [-h] [--ldap-user LDAP_USER] [--ldap-password LDAP_PASSWORD]                 [--ldap-base-dn LDAP_BASE_DN] [--project PROJECT] [--disable]                 [--delete] [--enable]                 toolmark_tool: error: the following arguments are required: tool

Maintainers can mark their tools for deletion using the "Disable tool" button on the tool's detail page onhttps://toolsadmin.wikimedia.org/. In either case, the immediate effect of disabling a tool is to stop any running jobs, prevent users from logging in as that tool, and schedule archiving and deletion for 40 days in the future.

A tool can be restored within 40 days of being disabled

Tool archives are stored on the tools NFS server, currentlytools-nfs-2.tools.eqiad1.wikimedia.cloud:

root@labstore1004:/srv/disable-tool#ls-ltrah/srv/tools/archivedtools/total 1.8Gdrwxr-xr-x 5 root root 4.0K Jun 21 19:37 ..-rw-r--r-- 1 root root 102K Jul 22 22:15 andrewtesttooltwo-rw-r--r-- 1 root root   45 Oct 13 00:47 andrewtesttooltwo.tgz-rw-r--r-- 1 root root 8.3M Oct 13 03:20 mediaplaycounts.tgz-rw-r--r-- 1 root root 1.8G Oct 13 04:01 projanalysis.tgz-rw-r--r-- 1 root root 1.3M Oct 13 21:05 reportsbot.tgzdrwxr-xr-x 2 root root 4.0K Oct 13 21:10 .-rw-r--r-- 1 root root 719K Oct 13 21:10 wsm.tgz-rw-r--r-- 1 root root 4.8K Oct 13 21:20 andrewtesttoolfour.tgz

The actual deletion process is shockingly complicated. A tool will only be archived and deleted if all of the prior steps succeed, but disabling of a tool should be a sure thing.

SSL certificates

SeePortal:Toolforge/Admin/SSL certificates.

Granting a tool write access to Elasticsearch

Generate a random password and the mkpassword crypt entry for it using the scriptnew-es-password.sh. (Must be run a host with the `mkpasswd` command installed. (The mkpasswd is part of the whois Debian package.)

$./new-es-password.shtools.exampletools.example elasticsearch.ini----[elasticsearch]user=tools.examplepassword=A3rJqgFKxa/x4NlnIhmw2cXcV92it/Zv0Yt+a7yhxCw=----tools.example puppet master private (hieradata/labs/tools/common.yaml)----profile::toolforge::elasticsearch::haproxy::elastic_users:  - name: 'tools.example'    password: '$6$FYwP3wxT4K7O9EE$OA3P5972NWJVG/WUnD240sal34/dsNabbcawItevMYO9uoR.fJBrjSABex0EDW0wlkWHID1Tf4oJoiNvYFGmy/'

Add the private SHA512 hash to thetools puppetserver:

$sshtools-puppetserver-01.tools.eqiad1.wikimedia.cloud$sudo-i#cd/srv/git/labs/private#vimhieradata/labs/tools/common.yaml... merge the hiera data with the existing key...:wq#gitaddhieradata/labs/tools/common.yaml#gitcommit-m"[local] Elasticsearch credentials for$TOOL"

Force a puppet run on tools-elastic nodes usingCumin

cloudcumin1001.eqiad.wmnet:~$sudocumin"O{project:tools name:.*elastic.*}""run-puppet-agent"

Make the credentials available to the tool asenvvars:

$sshdev.toolforge.org$sudo-ibecomeexample-tool$toolforgeenvvarscreateTOOL_ELASTICSEARCH_USEREnter the value of your envvar (Hit Ctrl+C to cancel): <insert user>$toolforgeenvvarscreateTOOL_ELASTICSEARCH_PASSWORDEnter the value of your envvar (Hit Ctrl+C to cancel): <insert password>

Note: An older procedure placed the credentials in/data/project/$TOOL/.elasticsearch.ini instead.

Resolve the ticket!

Package upgrades

SeeManaging package upgrades.

Creating a new Docker image (e.g. for new versions of Node.js)

SeePortal:Toolforge/Admin/Kubernetes#Docker Images

APIs

Toolforge is moving towards an API-oriented model where client tools (such as those installed on bastions) contact the Toolforge API to make changes instead of making them directly.

See theuser docs also.

Access

They APIs are presented as one single aggregated endpoint though theAPI Gateway.

The base endpoint ishttps://api.svc.[project].eqiad1.wikimedia.cloud:30003. Services are routed with subpaths, for example/jobs for theJobs API.

For authentication we currently use client certificates issued by the Kubernetes cluster internal CA via maintain-kubeusers. This will change in the future as we evolve how the APIs are accessed and used.

Components

Checker Service

This is the service that Icinga hits to check status of several services. It's totally stateless.

SeePortal:Toolforge/Admin/Toolschecker

Redis

SeePortal:Toolforge/Admin/Redis.

Elasticsearch

No dedicated page yet, we have somedashboards.

Prometheus

SeePortal:Toolforge/Admin/Prometheus.

Apt repository

SeePortal:Toolforge/Admin/Apt repository

Striker/Toolforge UI

SeePortal:Toolforge/Admin/Striker

ToolsDB

SeePortal:Toolforge/Admin/ToolsDB

Kubernetes infrastructure

SeePortal:Toolforge/Admin/Kubernetes

Users and community

Some information about how to manage users and general community and their relationship with Toolforge.

Project membership request approval

User access requests show up inhttps://toolsadmin.wikimedia.org/tools/membership/

Some guidelines for account approvals, based onadvice from scfc:

If the request contains any defamatory or abusive information as part of the username(s), reason, or comments → mark asDeclined and check the "Suppress this request (hide from non-admin users)" checkbox.
- You should also block the user on Wikitech and consider contacting aSteward for wider review of the SUL account.
If the user name "looks" like a bot or someone else who could not consent to theTerms of use andRules → mark asDeclined.
Check the status of the associatedSUL account. If the user is banned on one or more wikis → mark asDeclined.
If the stated purpose is "tangible" ("I want to move my bot x to Toolforge", "I want to build a web app that does y", etc.) → mark asApproved.
- If you know that someone else has been working on the same problem, add a message explaining who the user should contact or where they might find more information.
If the stated purpose is "abstract" ("research", "experimentation", etc.) and there is a hackathon ongoing or planned, the user has a non-throw-away mail address, the user has created a user page with coherent information about theirself or linked a SUL account of good standing, etc. → mark asApproved.
Otherwise add a comment asking for clarification of their reason for use and mark asFeedback needed. The request is not really "denied", but more (indefinitely) "delayed".

Requests left inFeedback needed for more information for more than 30 days should usually be declined with a message like "Feel free to apply again later with more complete information."

Quota management

Toolforge quotas are managed via maintain-kubeusers.

Have the user open a phabricator ticket, for the papertrail. See alsoHelp:Toolforge/Kubernetes#Quotas_and_Resources
Send a patch for maintain-kubeusers, have it reviewed and merged:https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/blob/main/components/maintain-kubeusers/values/tools.yaml
Deploy in the cluster, using the deployPortal:Cloud_VPS/Admin/Cookbooks

Other

How do Toolforge web services actually work?

SeePortal:Toolforge/Admin/Kubernetes#Ingress

What makes a root/Giving root access

SeeToolforge roots and Toolforge admins

Servicegroup log

tools.admin runs/data/project/admin/bin/toolhistory, which provides an hourly snapshot ofldaplist -l servicegroup as git repository in/data/project/admin/var/lib/git/servicegroups

Useful administrative tools

These tools offer useful information about Toolforge itself:

ToolsDB - Statistics about tables owned by tools
k8s-stats - examine what our tools are doing
OpenStack Browser - examine projects, instances, web proxies, and Puppet config

Brainstorming

/BotLicensing

Sub pages

Retrieved from "https://wikitech.wikimedia.org/w/index.php?title=Portal:Toolforge/Admin&oldid=2356940"

Categories:

[8]ページ先頭

Movatterモバイル変換

Admin permissions

Failover

Static webserver

Checker service

Prometheus

tools-service

Command orchestration

Administrative tasks

Logging in as root

Disabling all ssh logins except root

Complaints of bastion being slow

Local package management

Local package policy

Building packages

Deploy new misctools package

Testing/QA for a new tools-webservice package

Deploy new tools-webservice package

Webserver statistics

Banning an IP from tool labs

Deploying the main web page

Regenerate replica.my.cnf

Debugging bad MariaDB credentials

Regenerate kubernetes credentials for tools (.kube/config)

Adding K8S Components

Deleting a tool

SSL certificates

Granting a tool write access to Elasticsearch

Package upgrades

Creating a new Docker image (e.g. for new versions of Node.js)

APIs

Access

Components

Deploying a component

API Gateway

Jobs Service

Envvars Service

Build Service

Logs Service

Components service

Tools-mail / Exim

Checker Service

Redis

Elasticsearch

Prometheus

Apt repository

Striker/Toolforge UI

ToolsDB

Kubernetes infrastructure

Users and community

Project membership request approval

Quota management

Other

How do Toolforge web services actually work?

What makes a root/Giving root access

Servicegroup log

Useful administrative tools

Brainstorming

Sub pages