2023.2 Series Release Notes

17.9.0

New Features

  • kolla-ansible now validates the Prometheus configuration files when called via kolla-ansible -i $inventory validate-config. This validation is done by running the promtool check config command. See the documentation for the kolla-ansible validate-config command for details.

17.8.0

Bug Fixes

  • Fixes Apache and placement writing to the same log file. The Apache placement VirtualHost ErrorLog has been renamed to placement-api-error.log (similar to other services). LP#2095607

  • Fixes a bug where the etcd3gw backend_url in cinder.conf would be invalid when openstack_cacert was set. LP#2085908

  • Fixes an issue with Grafana datasource updates by removing the hardcoded version number. This ensures proper datasource configuration updates. LP#2096664

  • Fixed an issue with the prometheus.yml template which would break when deploying alertmanager.

17.7.0

Bug Fixes

  • Fixes cases where the fluentd parser fails on Python tracebacks. The OpenStack services regex has been reworked to include both global_request_id and handling of cases with Python tracebacks. LP#2044370

  • Fixes the libvirt secret volume being busy while secrets are changing. LP#2073678

  • Adds a check to stop deploying/upgrading the RabbitMQ containers if it will result in downgrading the version of RabbitMQ running.

  • Fixes a bug where the RabbitMQ version check would fail to pull the new image due to lack of auth. LP#2086171

  • Fixes a bug where the IP address comparison was not done properly for the variable kolla_same_external_internal_vip. The comparison now uses the ipaddr filter instead. For details see LP#2076889.
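
The note on kolla_same_external_internal_vip does not say which inputs broke the old comparison. As a rough illustration only, the Python sketch below shows one plausible way a plain string comparison of two addresses can disagree with a comparison by value. It uses the standard ipaddress module rather than the Ansible ipaddr filter, and the function name same_vip is hypothetical.

```python
import ipaddress


def same_vip(external: str, internal: str) -> bool:
    """Compare two VIP addresses by value rather than by spelling."""
    return ipaddress.ip_address(external) == ipaddress.ip_address(internal)


# Two spellings of the same IPv6 address are different strings...
print("fd00::10" == "fd00:0:0:0:0:0:0:10")
# ...but compare equal once both are parsed as addresses.
print(same_vip("fd00::10", "fd00:0:0:0:0:0:0:10"))
```

Parsing both sides before comparing means differences in notation no longer influence the result.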

17.6.0

New Features

  • Adds the ability to provide the NTP (time source) server for multiple DHCP ranges in the Ironic Inspector DHCP server.

Upgrade Notes

  • Support for failing execution early if fact collection fails on any of the hosts by setting kolla_ansible_setup_any_errors_fatal to true has been removed. This is due to Ansible’s any_errors_fatal parameter not being templated, resulting in the value always being interpreted as true, even though the default value of kolla_ansible_setup_any_errors_fatal is false.

    Equivalent behaviour is possible by setting the maximum failure percentage to 0. This may be done specifically for fact gathering using gather_facts_max_fail_percentage or globally using kolla_max_fail_percentage.

Bug Fixes

  • Fixes an issue with ironic dnsmasq failing to start in deployments using podman because it requires the NET_RAW capability. See LP#2055282.

  • Fixes keystone service configuration for haproxy when using federation. LP#2058656

  • Fixes mariadb’s backup failure due to missing CREATE privileges on the mariadb_backup_history table. LP#2061889

  • Fixes the MariaDB recovery issue when kolla-ansible is running from a docker container. LP#2073370

  • Fixes ProxySQL being unable to bind due to incorrect format of IPv6 addresses in the mysql_ifaces configuration. LP#2081106

  • Fixes an issue during fact gathering when using the --limit argument where a host that fails to gather facts could cause another host to fail during delegated fact gathering.

  • Adds skip_kpartx yes to the multipath.conf defaults section to prevent kpartx scanning multipath devices and to unlock the multipathd del map operation of os-brick for volume detaching operations. LP#2078973 <https://launchpad.net/bugs/2078973>

  • Adds octavia_interface_wait_timeout to control the octavia-interface.service timeout, so that the unit can wait until the openvswitch agent sync has finished and octavia-lb-net is reachable from the host. Also sets the restart policy for this unit to on-failure. LP#2067036

  • Fixes an Octavia service upgrade issue where the upgrade can fail when the Octavia persistence database user is missing. LP#2065591

  • Fixes unreliable health checks for neutron_ovn_agent and neutron_ovn_metadata_agent. The health checks now check the OVS DB connection instead of the OVN southbound DB connection. LP#2084128

  • Fixes an issue, when using podman, with named volumes that use a mode specifier. See LP#2054834 for more details.

  • Fixes parsing of JSON output of inner modules called by kolla-toolbox when data was returned on standard error. LP#2080544
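
The ProxySQL fix above concerns how IPv6 addresses are written in a listen address. The notes do not show the corrected format, so the following Python sketch only illustrates the general convention of wrapping an IPv6 literal in square brackets before appending a port. The helper name format_listen_address and the port 6033 are illustrative assumptions, not values taken from kolla-ansible.

```python
import ipaddress


def format_listen_address(address: str, port: int) -> str:
    """Return host:port, bracketing the host when it is an IPv6 literal."""
    ip = ipaddress.ip_address(address)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{host}:{port}"


print(format_listen_address("192.0.2.10", 6033))
print(format_listen_address("fd00::10", 6033))
```

Without the brackets, the colons inside an IPv6 address cannot be told apart from the colon that separates host and port.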

17.5.0

New Features

  • Modifies public API firewalld rules to be applied immediately to a running firewalld service. This requires firewalld to be running, but avoids reloading firewalld, which is disruptive due to the way in which firewalld builds its firewall chains.

Bug Fixes

  • Fixes deploying OpenSearch with TLS enabled on the internal VIP.

  • Fixes handling of openvswitch on manila-share nodes. LP#1993285

  • Fixes the Python requests library issue when using a custom CA by adding the REQUESTS_CA environment variable to the kolla-toolbox container. See LP#1967132

  • Fixes configuration of CloudKitty when internal TLS is enabled. LP#1998831

  • Fixes the detection of the Nova Compute Ironic service when a custom host option is set in the service config file. See LP#2056571

  • Removes the default /tmp/ mountpoint from the horizon container. This change is made to harden the container and prevent potential security issues. For more information, see the Bug Report: LP#2068126.

  • Fixes an issue where OVN northbound or southbound database deployment could fail when a new leader is elected. LP#2059124

17.4.0

Upgrade Notes

  • MariaDB backup now uses the same image as the running MariaDB server. The following variables relating to MariaDB backups are no longer used and have been removed:

    • mariabackup_image

    • mariabackup_tag

    • mariabackup_image_full

Deprecation Notes

  • Support for deploying Masakari is no longer deprecated. The Masakari CI scenarios are now working again, and commitment has been made to improve the health of the project.

Bug Fixes

  • Adds conditionals for IPv6 sysctl settings on systems that have IPv6 disabled in the kernel. Changing sysctl settings related to IPv6 on those systems leads to errors. LP#1906306

  • Fixes nova-cell not updating the cell0 database address when VIP changes. LP#1915302

  • Fixes trove module imports. The path to the modules needed by trove-api changed in the trove source package, so the configuration was updated. LP#1937120

  • Fixes an incorrect condition in the Podman part that prevented the retrieval of facts of all the containers when no names were provided. LP#2058492

  • Modifies the MariaDB backup procedure to use the same container image as the running MariaDB server container. This should prevent compatibility issues that may cause the backup to fail.

  • Fixes a bug in kolla_podman_worker, where missing commas in a list of strings created implicit concatenation of items that should be separate. LP#2067278

  • Fixes the ‘cinder-backup’ service when Swift is used with TLS enabled. LP#2051986

  • Fixes the dimensions comparison when values like 1g are set in the container dimensions configuration. Previously the docker container was restarted even with no changes, because 1g from the configuration was compared with 1073741824, which is the value displayed by docker inspect.

  • Fixes keystone port in skyline-console pointing to wrong endpoint port. LP#2069855

  • Fixes the kolla systemd unit template to prevent all kolla services being restarted when docker.service is restarted. LP#2065168

  • Fixes a bug in kolla-ansible where the keystone service role was not being created during an upgrade. This was due to the service-ks-register role not being imported in the upgrade.yml file. The service-ks-register role is now imported in the upgrade.yml file. See bug: https://bugs.launchpad.net/kolla-ansible/+bug/2056761

  • Fixed an issue where the MariaDB Cluster recovery process would fail if the sequence number was not found in the logs. The recovery process now checks the complete log file for the sequence number and recovers the cluster. See LP#1821173 for details.

  • Updates the default Grafana OpenSearch datasource configuration to use values for OpenSearch that work out of the box. Replaces the Elasticsearch values that were previously being used. The new configuration can be applied by deleting your datasource and reconfiguring Grafana through kolla-ansible. In order to prevent dashboards from breaking when the datasource is deleted, one should use datasource variables in Grafana. See bug 2039500.

  • All stable RabbitMQ feature flags are now enabled during deployments, reconfigures, and upgrades. As such, the variable rabbitmq_feature_flags is no longer required. This is a partial fix to RabbitMQ SLURP support. LP#2049512

  • Fixes an issue where the Keystone admin endpoint would be recreated when upgrading Keystone. The endpoint is now explicitly removed during the upgrade process.

  • Fixes skyline’s old format of stop task. It used docker_container, which would cause problems with podman deployments.
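
The container dimensions fix in the list above comes down to comparing sizes by value instead of by spelling. The Python sketch below is a minimal illustration of that idea and is not the kolla-ansible implementation: the helper names to_bytes and dimension_changed are hypothetical, and only the single-letter suffixes b, k, m and g are handled.

```python
# Multipliers for single-letter size suffixes (binary units).
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(value) -> int:
    """Convert a size such as '1g' or 1073741824 to a number of bytes."""
    text = str(value).strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def dimension_changed(configured, running) -> bool:
    """Report a change only when the normalised byte counts differ."""
    return to_bytes(configured) != to_bytes(running)


# '1g' in the configuration and 1073741824 from the engine are the same size.
print(dimension_changed("1g", 1073741824))
print(dimension_changed("2g", 1073741824))
```

Normalising both sides before comparing avoids treating 1g and 1073741824 as different values, which is the situation the note describes as causing needless restarts.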

17.3.0

Upgrade Notes

  • If credentials are updated in passwords.yml, kolla-ansible is now able to update these credentials in the keystone database and in the on-disk config files.

    The changes to passwords.yml are applied once kolla-ansible -i INVENTORY reconfigure has been run.

    If you want to revert to the old behavior - credentials not automatically updating during reconfigure if they changed in passwords.yml - you can specify this by setting update_keystone_service_user_passwords: false in your globals.yml.

    Notice that passwords are only changed if you change them in passwords.yml. This mechanism is not a complete solution for automatic credential rollover. No passwords are changed if you do not change them inside passwords.yml.

Bug Fixes

  • Fixes configuration of nova-compute and nova-compute-ironic to enable exposing vendordata over configdrive. LP#2049607

  • Fixes mariadb role deployment when using Ansible check mode. LP#2052501

  • Updated configuration of service user tokens for all Nova and Cinder services to stop using the admin role for service_token and use the service role.

    See LP#2004555 and LP#2049762 for more details.

  • Changes to service user passwords in passwords.yml will now be applied when reconfiguring services.

    This behaviour can be reverted by setting update_keystone_service_user_passwords: false.

    Fixes LP#2045990

17.2.0

Bug Fixes

  • Fixes usage audit notifications being enabled when they are not needed. See LP#2049503.

  • Fixes an idempotency issue in the OpenSearch upgrade tasks where subsequent runs of kolla-ansible upgrade would leave shard allocation disabled. LP#2049512

17.1.0

New Features

  • Sets a log retention policy for OpenSearch via Index State Management (ISM). See the documentation for details.

Upgrade Notes

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator. By default the soft and hard retention periods are 30 and 60 days respectively. If you are upgrading from Elasticsearch, and have previously configured elasticsearch_curator_soft_retention_period_days or elasticsearch_curator_hard_retention_period_days, those variables will be used instead of the defaults. You should migrate your configuration to use the new variable names before the Caracal release.

Bug Fixes

  • Fixes non-persistent Neutron agent state data. LP#2009884

  • Fixes long service restarts while using systemd. LP#2048130

  • Fixes an issue with high CPU usage of the cAdvisor container by setting the per-container housekeeping interval to the same value as the Prometheus scrape interval. LP#2048223

  • Fixes Nova operations using the scp command, such as cold migration or resize, on Debian Bookworm. LP#2048700

  • Fixes Docker health check for the sahara_engine container. LP#2046268

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator, now using the Index State Management (ISM) OpenSearch bundled plugin. LP#2047037

17.0.0

New Features

  • Adds http/2 support to HAProxy frontends.

  • Adds Let’s Encrypt TLS certificate service integration into OpenStack deployments. This enables a trusted TLS certificate generation option for secure communication with OpenStack HAProxy instances. Setting letsencrypt_email, kolla_internal_fqdn and/or kolla_external_fqdn is required. One container runs an Apache ACME client webserver and one runs Lego for certificate retrieval and renewal. The Lego container starts a cron job which attempts to renew certificates every 12 hours.

  • Added capability to specify custom kernel modules for Neutron: neutron_modules_default lists default modules, and neutron_modules_extra is for custom modules and parameters.

  • A custom event_pipeline.yaml file for the Ceilometer notification service is now processed with merge_yaml. This allows Jinja2 to be used. Furthermore, it is possible to have a global event_pipeline.yaml and host-specific event_pipeline.yaml files.

  • Supports Debian Bookworm (12) as host distribution.

  • Adds support for exposing Prometheus server on the external interface. This is disabled by default and can be enabled by setting enable_prometheus_server_external to true. Basic auth is used to protect the endpoint.

  • Adds prometheus_external_fqdn and prometheus_internal_fqdn to customise prometheus FQDNs.

  • Implements support for Podman deployment as an alternative to Docker. To perform deployment using Podman, set the variable kolla_container_engine to value podman inside of the globals.yml file.

  • Adds the ability for the instance label on prometheus metrics to be replaced with the inventory hostname as opposed to using the ip address as the metric label. The ip address is still used as the target address which means that there is no issue of the hostname being unresolvable.

    More information on how to use this feature can be found in the reference documentation for logging and monitoring.

  • HAProxy supports setting nbthread via the variable haproxy_threads. Threads are recommended instead of processes since HAProxy 1.8. They cannot both be used at the same time.

  • Adds single service external frontend feature to haproxy. Details are in the haproxy guide section of the documentation.

  • HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. This improves image uploads from clients, because these no longer use Horizon’s webserver as a staging area - the image upload goes directly to Glance API.

  • Updates apache grok pattern to match the size of response in bytes, time taken to serve the request and user agent.

  • With the parameter ironic_agent_files_directory it is possible to provide the directory for the ironic-agent.kernel and ironic-agent.initramfs files. By default the parameter is set to the value of node_custom_config. This corresponds to the existing behaviour.

  • Adds keystone_federation_oidc_additional_options, which allows passing additional OIDC options.

  • Adds support for copying in {{node_custom_config}}/magnum/kubeconfig to Magnum containers for the magnum-cluster-api driver.

  • You can now enable the usage of quorum queues in RabbitMQ for all services by setting the variable om_enable_rabbitmq_quorum_queues to true. Notice that you can’t use quorum queues and high availability at the same time. This is caught by a precheck. This feature is enabled by default to improve reliability of the messaging queues.

  • The etcd tooling has been updated to handle adding and removing nodes. Previously this was an undocumented manual process and required creating service containers. Operators can refer to the etcd admin guide for more details.

  • Added a neutron check for ML2/OVS and ML2/OVN presence at the start of the deploy phase. It will fail if neutron_plugin_agent is set to ovn and use of an ML2/OVS container is detected. In the case where neutron_plugin_agent is set to openvswitch, the check will fail when it detects an ML2/OVN container or any of the OVN-specific volumes.

  • Glance, cinder, manila services now support configuration of multiple ceph cluster backends. For nova and gnocchi there is the possibility to configure different ceph clusters - for gnocchi this is possible at the service level while for nova at the host level. See the external ceph guide docs on how to set multiple ceph backends for more details.

  • The flag --check-expiry has been added to the octavia-certificates command. kolla-ansible octavia-certificates --check-expiry <days> will check if the Octavia certificates are set to expire within a given number of days.

  • The Octavia amphora provider driver improves control plane resiliency. Should a control plane host go down during a load balancer provisioning operation, an alternate controller can resume the in-process provisioning and complete the request. This solves the issue with resources stuck in PENDING_* states by writing info about task states in persistent backend and monitoring job claims via jobboard. The jobboard feature is now enabled by default. It requires the Redis service to be enabled as a dependency. Use the enable_octavia_jobboard variable to override if needed.

  • With the parameter rabbitmq_datadir_volume it is possible to use a directory as volume for the rabbitmq service. By default, a volume named rabbitmq is used (the previous default).

  • Adds a new restart_policy called oneshot that does not create systemd units and is used for bootstrap tasks.

  • Added support for Cinder-Backup with S3 backend.

  • Added support for Glance with S3 backend.

  • In the configuration template of the Senlin service the cafile parameter is now set by default in the authentication section. This way the use of self-signed certificates on the internal Keystone endpoint is also usable in the Senlin service.

  • Adds support for ansible-core only installation (use kolla-ansible install-deps to install required collections).
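
The --check-expiry option in the list above asks whether a certificate expires within a given number of days. The notes do not describe how the command performs this check, so the Python sketch below only illustrates the date arithmetic behind such a test. The function name expires_within is hypothetical, and the certificate end date is passed in directly instead of being read from a certificate file.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_within(not_after: datetime, days: int,
                   now: Optional[datetime] = None) -> bool:
    """Return True if not_after falls within the next `days` days."""
    # Default to the current UTC time when no reference time is given.
    now = now or datetime.now(timezone.utc)
    return not_after <= now + timedelta(days=days)


reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
cert_end = datetime(2024, 1, 20, tzinfo=timezone.utc)
print(expires_within(cert_end, 30, now=reference))
print(expires_within(cert_end, 10, now=reference))
```

Passing the reference time explicitly keeps the check deterministic, which makes it easy to test.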

Upgrade Notes

  • Minimum supported Ansible version is now 7 (ansible-core 2.14) and maximum supported is 8 (ansible-core 2.15).

  • Default keystone user role has been changed from the deprecated role _member_ to the member role.

  • The ironic_tftp service no longer binds on 0.0.0.0; by default it uses the IP address of the api_interface. To revert to the old behaviour, please set ironic_tftp_interface_address: 0.0.0.0 in globals.yml.

  • Remnants of Monasca, Storm, Kafka and Zookeeper have been removed (including the kolla-ansible monasca_cleanup command).

  • etcd has been upgraded to version 3.4 in this release. Operators are highly encouraged to read the upgrade notes for impacts on etcd clients. Upgrades are only supported from etcd v3.3: skip version upgrades are not supported. Please ensure that adequate backups are taken before running the upgrade to guard against data loss.

  • etcd version 3.4 drops support for the v3alpha endpoint. Internal kolla-ansible endpoints have been updated, but operators are strongly encouraged to audit any customizations or external users of etcd.

  • Prometheus now uses basic auth. The password is under the key prometheus_password in the Kolla passwords file. The username is admin. The default set of users can be changed using the variable: prometheus_basic_auth_users.

  • Configure Nova libvirt.num_pcie_ports to 16 by default. Nova currently sets ‘num_pcie_ports’ to “0” (defaults to libvirt’s “1”), which is not sufficient for hotplug use with ‘q35’ machine type.

  • Configuring HAProxy nbproc setting via haproxy_processes and haproxy_process_cpu_map variables has been dropped since threads are the recommended way to scale CPU performance since 1.8. This covers haproxy, glance-tls-proxy and neutron-tls-proxy. Please use haproxy_threads and haproxy_thread_cpu_map instead (or glance_tls_proxy_threads and glance_tls_proxy_thread_cpu_map for Glance TLS proxy and neutron_tls_proxy_threads and neutron_tls_proxy_thread_cpu_map for Neutron TLS proxy).

  • HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. In order to retain the previous default ('legacy') - please set HORIZON_IMAGES_UPLOAD_MODE: 'legacy' in your custom_local_settings file.

  • Quorum queues in RabbitMQ (controlled by the om_enable_rabbitmq_quorum_queues variable) are enabled by default from now on. Support for non-HA RabbitMQ queues is dropped. Either quorum queues that are enabled by default, or classic mirrored queues are required now. See the migration procedure from non-HA to HA.

  • Removes the restriction on the maximum supported version of 2.14.2 for ansible-core. Any 2.14 series release is now supported.

  • The default value for ceph_cinder_keyring has been changed from: “ceph.client.cinder.keyring” to: “client.{{ ceph_cinder_user }}.keyring”

    The default value for ceph_cinder_backup_keyring has been changed from: “ceph.client.cinder-backup.keyring” to: “client.{{ ceph_cinder_backup_user }}.keyring”

    The default value for ceph_glance_keyring has been changed from: “ceph.client.glance.keyring” to: “client.{{ ceph_glance_user }}.keyring”

    The default value for ceph_manila_keyring has been changed from: “ceph.client.manila.keyring” to: “client.{{ ceph_manila_user }}.keyring”

    The default value for ceph_gnocchi_keyring has been changed from: “ceph.client.gnocchi.keyring” to: “client.{{ ceph_gnocchi_user }}.keyring”

    Users who overrode the default values for the above variables have to change them according to the new pattern.

  • The Octavia amphora provider by default is now deployed with the jobboard feature enabled. This requires the Redis service to be enabled as a dependency, please update your configuration accordingly if needed. For further information see the Amphorav2 docs.

  • Enabled the ovn_emit_need_to_frag setting by default. It is useful when the external network’s MTU is lower than that of the internal geneve networks. The host kernel needs to be version >= 5.2 for this option to work. All Kolla supported host operating systems have a higher kernel version.

  • Since kolla-ansible now also supports Podman, the ansible module kolla_docker has been renamed to kolla_container.

  • restart_policy: no will now create systemd units, but with the Restart property set to no.

  • Changes default value of nova libvirt driver setting skip_cpu_compare_on_dest to true. With the libvirt driver, during live migration, skip comparing guest CPU with the destination host. When using QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the correct thing with respect to checking CPU compatibility on the destination host during live migration.

  • Zun is currently incompatible with Debian Bookworm. This is because Zun currently has a hard dependency on a deprecated Docker feature. Operators upgrading from Bullseye are strongly encouraged to disable Zun first. While workarounds may be possible, none are currently tested in CI.

  • Support for Zun for this release has been provisionally dropped. This is due to a number of base dependencies that require updating. The Zun images remain buildable, and the roles have not been removed, but a precheck has been added to prevent breaking current deployments.

    Operators are strongly encouraged to hold off upgrading if Zun is a requirement. Please also consult the deprecation notes.

Deprecation Notes

  • Deprecates support for deploying Masakari. Support for deploying Masakari will be removed from Kolla Ansible in the Caracal Release.

  • Zun is currently provisionally deprecated but not removed. If Zun regains compatibility within the next release cycle, backports to this version of Kolla and Kolla-Ansible will be considered to provide a smooth upgrade path.

Security Issues

  • The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now create or update passwords.yml with correct permissions. They also display a warning message about incorrect permissions.

  • Restricts, by default, access to the /server-status path exposed by the HTTP OpenStack services through HAProxy on the public endpoint. Fixes the issue for Ubuntu/Debian installations. RockyLinux/CentOS are not affected. LP#1996913

Bug Fixes

  • Fixes issues with OVN NB/SB DB deployment, where the first node needs to be rebootstrapped. LP#1875223

  • Fixes MariaDB backup when enable_proxysql is enabled.

  • Fixes 504 timeout when scraping openstack exporter. Ensures that HAProxy server timeout is the same as the scrape timeout for the openstack exporter backend. LP#2006051

  • Fixes improper use of the --file parameter with the designate-manage pool update command. LP#2012292 <https://bugs.launchpad.net/kolla-ansible/+bug/2012292>

  • Sets correct permissions for the opensearch-dashboard data location. LP#2020152 <https://bugs.launchpad.net/kolla-ansible/+bug/2020152>

  • Fixes an issue with octavia security group rules creation when using IPv6 configuration for the octavia management network. See LP#2023502 for more details.

  • Fixes glance-api failing to start the privsep daemon when cinder_backend_ceph is set to true. See LP#2024541 for more details.

  • Adds host and mariadb_port to the wsrep sync status check, so that non-standard ports can be used for mariadb deployments. LP#2024554

  • Fixes enable_keystone_federation and keystone_enable_federation_openid not being explicitly handled as bool in various templates in the keystone role. LP#2036390

  • Fixes the inability to override horizon policy files, caused by the list concatenation format changing starting with ansible-core 2.13. See LP#2045660 for more details.

  • Fixes an issue where Kolla set the producer tasks to None, which disabled all designate producer tasks. LP#1879557

  • Fixes CloudKitty failing to query Prometheus now that basic authentication is required.

  • Fixes ironic_tftp binding to all IP addresses on the system. Added ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664

  • Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080

  • Fixes an issue when using enable_prometheus_server_external in conjunction with haproxy_single_external_frontend.

  • Fixes an issue where a Docker health check wasn’t configured for the OpenSearch Dashboards container. See bug 2028362.

  • Fixes an issue where Prometheus would fail to scrape the OpenStack exporter when using internal TLS with an FQDN. LP#2008208

  • Fixes prometheus grafana datasource using incorrect basic auth credentials.

  • Fixes an issue where ‘q35’ libvirt machine type VM could not hotplug more than one PCIe device at a time.

  • Fixes an issue with prometheus scraping itself now that basic auth has been enabled.

  • Fixes an issue where Fluentd was parsing Horizon WSGI application logs incorrectly. Horizon error logs are now written to horizon-error.log instead of horizon.log. See LP#1898174

  • Fixes an issue where the keepalived track script fails on a single controller environment and the keepalived VIP goes into BACKUP state. The keepalived_track_script_enabled variable has been introduced (default: true), which can be used to disable track scripts in keepalived configuration. LP#2025219

  • The etcd tooling has been updated to better serialize restarts when applying configuration or updates. Previously minor outages might have occurred since all services were restarted in the same task.

  • Fixes an issue where an OVS-DPDK task had a different name to how it was being notified.

  • Fixes an issue where Prometheus scraping of Etcd metrics would fail if Etcd TLS is enabled. LP#2036950

  • Fixes an issue where it wasn’t possible to customise Nova service config at the individual service level, which is required in some use cases.

  • Added ability to define address for a separate tgtd network interface.

Other Notes

  • Refactors the MariaDB and RabbitMQ restart procedures to be compatible with Ansible 2.14.3+. See Ansible issue 80848 for details.