This repository contains Go code and example YAML files to deploy vDPA VFs in a container running in Kubernetes.
VirtIO Data Path Acceleration (vDPA) is a technology that enables pods to use accelerated network interfaces without having to include vendor-specific drivers. This is possible because vDPA-capable NICs implement the virtIO datapath. The vDPA framework is in charge of translating the vendor-specific control path (that the NIC understands) to a vendor-agnostic protocol (to be exposed to the application).
For an overview of the technology, read the vDPA overview blog post. More technical blog entries can also be found in the Virtio-networking series two.
Note that, apart from the vDPA kernel framework implemented in the Linux kernel, there is another vDPA framework in DPDK. However, the DPDK framework is out of the scope of this repository for now.
This repo combines several other repos to enable vDPA VFs to be used in containers. The following diagram shows an overview of the end-to-end vDPA solution in Kubernetes:
More information about this solution can be found in the Design Document.
As shown in the diagram, the Kubernetes vDPA solution will support both the SR-IOV CNI (for legacy SR-IOV devices) and the Accelerated Bridge CNI (for switchdev devices). Currently, this repository focuses on using the SR-IOV CNI.
To leverage this repo, download it and run make all:

```
make all
```

make all builds the following images/binaries:
- sriov-device-plugin docker image: Located in the sriov-dp directory. This image takes the upstream SR-IOV Device Plugin and applies some local patches to enable it to work with vDPA as well. See sriov-dp.
- sriov-cni binary and docker image: Located in the sriov-cni directory. To install the sriov-cni, the binary must be copied to the default CNI directory, typically /opt/cni/bin/. Alternatively, a DaemonSet can be deployed which will take care of doing that in all the nodes (a manual-copy sketch follows this list). See sriov-cni.
- dpdk-app-devel docker image: This image contains a recent DPDK installation.
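If you prefer the manual copy over the DaemonSet, the following is a minimal sketch; the build output path, binary name, and remote user/host are assumptions, so adjust them to your environment:

```
# Copy a locally built sriov-cni binary into the default CNI directory on a node.
# The source path, binary name and user@HOSTNAME below are placeholders.
scp ./sriov-cni/build/sriov user@HOSTNAME:/tmp/sriov
ssh user@HOSTNAME 'sudo install -m 0755 /tmp/sriov /opt/cni/bin/sriov'
```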
If you don't want to build all the projects from source, docker images will be provided for convenience. See the Docker Hub section.
On multi-node clusters you might need to load the built images into the different nodes:
```
./scripts/load-image.sh nfvpe/sriov-device-plugin user@HOSTNAME
./scripts/load-image.sh nfvpe/sriov-cni user@HOSTNAME
```
The following set of commands will deploy the images above. NOTE: update configMap-vdpa.yaml to match the local hardware.
```
kubectl create -f ./deployment/netAttach-vdpa-vhost-mlx.yaml
kubectl create -f ./deployment/configMap-vdpa.yaml
kubectl create -f ./deployment/sriovdp-vdpa-daemonset.yaml
kubectl get node $HOSTNAME -o json | jq '.status.allocatable'
```
To deploy a sample application, see the Sample Applications section.
The following set of commands will tear down the deployment above.
```
kubectl delete -f ./deployment/sriovdp-vdpa-daemonset.yaml
kubectl delete -f ./deployment/configMap-vdpa.yaml
kubectl delete -f ./deployment/netAttach-vdpa-vhost-mlx.yaml
```
Once the SR-IOV Device Plugin and SR-IOV CNI have been installed, the application consuming the vDPA devices can be started. This repository will provide some sample applications:
- single-pod: A single DPDK pod using a vDPA interface
- vdpa-traffic-test: A simple test that deploys two pods that send packets to each other (using testpmd)
- More TBD
The single pod application deploys a pod that runs testpmd on the vDPA device. The testpmd arguments can be modified in deployments/vdpa-single.yaml.
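For reference, the essential pieces of such a pod spec are the Multus network annotation and the vDPA resource request. The snippet below is an illustrative sketch only, not the contents of the repository's vdpa-single.yaml; the image, resource, and network names are assumptions taken from examples elsewhere in this README:

```
# Sketch of a minimal pod consuming one vDPA VF; adjust the names to your setup.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: vdpa-mlx-vhost-net  # a Network-Attachment-Definition name
spec:
  containers:
  - name: vdpa-single
    image: nfvpe/dpdk-app-devel        # assumed image name
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true                 # needed to access /dev/vhost-vdpa-*
    resources:
      requests:
        intel.com/vdpa_mlx_vhost: '1'  # resource pool from configMap-vdpa.yaml
      limits:
        intel.com/vdpa_mlx_vhost: '1'
EOF
```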
To deploy the application run:
kubectl apply -f deployment/vdpa-single.yaml
Inspect the logs with:
kubectl logs -f vdpa-pod
Delete the application by running:
kubectl delete -f deployment/vdpa-single.yaml
The traffic test deploys two pods: one generates traffic and the other receives it.
Node selectors are used to choose where the generator and the sink run.
First, label the node you want the generator to run on and the node you want the sink to run on:
kubectl label node GEN_NODENAME vdpa-test-role-gen=true
kubectl label node SINK_NODENAME vdpa-test-role-sink=true
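If needed, confirm the labels before deploying:

```
# Show the two test-role labels as columns in the node list.
kubectl get nodes -L vdpa-test-role-gen,vdpa-test-role-sink
```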
Deploy the application by running:
kubectl apply -f deployment/vdpa-traffic-test.yaml
Delete the application by running:
kubectl delete -f deployment/vdpa-traffic-test.yaml
This setup assumes:
- Running on bare metal.
- Kubernetes is installed.
- Multus CNI is installed.
- vDPA VFs have already been created and bound to the vhost-vdpa driver (a rough sketch of this step follows this list)
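How vDPA VFs are created and bound to vhost-vdpa depends on the NIC vendor and kernel version, so the following is only a rough sketch for an mlx5 device on a recent kernel; the interface name, PCI address, and VF count are placeholders:

```
# Load the vendor vDPA driver and the vhost-vdpa bus driver.
modprobe mlx5_vdpa
modprobe vhost_vdpa

# Create the VFs on the physical function (interface name is a placeholder).
echo 2 > /sys/class/net/enp101s0f0/device/sriov_numvfs

# On kernels with the vdpa management API (iproute2 'vdpa' tool), list the
# management devices and create a vDPA device on top of a VF.
vdpa mgmtdev show
vdpa dev add name vdpa0 mgmtdev pci/0000:65:00.2

# The device should now appear as a /dev/vhost-vdpa-* character device.
vdpa dev list
ls /dev/vhost-vdpa-*
```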
For reference, this repo was developed and tested on:
- Fedora 32 - kernel 5.10.0+ (modified: see the HugePage Cgroup Known Issue)
- GO: go1.14.9
- Docker: 19.03.11
- Kubernetes: v1.19.3
This repo has been tested with:
- Nvidia Mellanox ConnectX-6 Dx
To deploy the Kubernetes-vDPA solution, the following steps must be taken:
- Install SR-IOV CNI
- Create Network-Attachment-Definition
- Create ConfigMap
- Start SR-IOV Device Plugin Daemonset
The changes to enable the SR-IOV CNI to also manage vDPA interfaces are in this repository:
https://github.com/amorenoz/sriov-cni/tree/rfe/vdpa
To build SR-IOV CNI in a Docker image:
make sriov-cni
To run:
kubectl create -f ./deployment/sriov-cni-daemonset.yaml
As with all DaemonSet YAML files, there is a version of the file for Kubernetes versions prior to 1.16 in the k8s-pre-1-16 subdirectory.
The Network-Attachment-Definition defines the attributes of the network for the interface (in this case a vDPA VF) that is being attached to the pod.
There are three sample Network-Attachment-Definitions in the deployment directory. You can modify them freely to match your setup. For more information, see the SR-IOV CNI Configuration reference.
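As a rough illustration only (the actual files in ./deployment may differ, and the names and IPAM settings below are assumptions), a vDPA Network-Attachment-Definition for the SR-IOV CNI ties a network name to a device-plugin resource pool via the resourceName annotation:

```
kubectl apply -f - <<'EOF'
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vdpa-mlx-vhost-net
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/vdpa_mlx_vhost
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "name": "vdpa-mlx-vhost-net",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24"
    }
  }'
EOF
```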
The following commands set up those networks:
```
kubectl create -f ./deployment/netAttach-vdpa-vhost-mlx.yaml
kubectl create -f ./deployment/netAttach-vdpa-vhost-mlx-1000.yaml
kubectl create -f ./deployment/netAttach-vdpa-vhost-mlx-2000.yaml
```
The following command can be used to determine the set of Network-Attachment-Definitions currently created on the system:
```
kubectl get network-attachment-definitions
NAME                      AGE
vdpa-mlx-vhost-net        24h
vdpa-mlx-vhost-net-1000   24h
vdpa-mlx-vhost-net-2000   24h
```
The following commands delete those networks:
```
kubectl delete -f ./deployment/netAttach-vdpa-vhost-mlx.yaml
kubectl delete -f ./deployment/netAttach-vdpa-vhost-mlx-1000.yaml
kubectl delete -f ./deployment/netAttach-vdpa-vhost-mlx-2000.yaml
```
The ConfigMap provides the filters to the SR-IOV Device Plugin to allow it to select the set of VFs that are available to a given Network-Attachment-Definition. The parameter 'resourceName' maps back to one of the Network-Attachment-Definitions defined earlier.
The SR-IOV Device Plugin has been extended to support an additional filter that is used to select the vDPA type to be used: vdpaType. Supported values are:
- vhost
- virtio
The following example configMap creates two pools of vDPA devices bound to the vhost-vdpa driver:
Example:
```
cat deployment/configMap-vdpa.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourceName": "vdpa_ifcvf_vhost",
          "selectors": {
              "vendors": ["1af4"],
              "devices": ["1041"],
              "drivers": ["ifcvf"],
              "vdpaType": "vhost"
          }
        },
        {
          "resourceName": "vdpa_mlx_vhost",
          "selectors": {
              "vendors": ["15b3"],
              "devices": ["101e"],
              "drivers": ["mlx5_core"],
              "vdpaType": "vhost"
          }
        }
      ]
    }
```
NOTE: This file will most likely need to be updated before use to match the interfaces on the deployed hardware. To obtain the required attributes, like the vendor and device IDs, use the lspci command:
```
lspci -nn | grep Ethernet
05:00.1 Ethernet controller [0200]: Intel Corporation Device [8086:15fe]
05:00.2 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1041] (rev 01)
05:00.3 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1041] (rev 01)
65:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
65:00.1 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
65:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
65:00.3 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
```
The following command creates the configMap:
```
cd $GOPATH/src/github.com/redhat-nfvpe/vdpa-deployment
kubectl create -f ./deployment/configMap-vdpa.yaml
```
The following command can be used to determine the set of configMaps currently created in the system:
```
kubectl get configmaps --all-namespaces
NAMESPACE     NAME                                 DATA   AGE
kube-public   cluster-info                         2      5d23h
kube-system   coredns                              1      5d23h
kube-system   extension-apiserver-authentication   6      5d23h
kube-system   multus-cni-config                    1      5d23h
kube-system   sriovdp-config                       1      4h24m
```
The following command deletes the configMap:
kubectl delete -f ./deployment/configMap-vdpa.yaml
The changes to enable the SR-IOV Device Plugin to also manage vDPA interfaces are currently in this repository:
https://github.com/amorenoz/sriov-network-device-plugin/tree/vdpaInfoProvider
To build the SR-IOV Device Plugin run:
make sriov-dp
To build from scratch:
make sriov-dp SCRATCH=y
Deploy the SR-IOV Device Plugin by running the following command:
kubectl create -f ./deployment/sriov-dp-daemonset.yaml
The SR-IOV Device Plugin runs as a DaemonSet (always running, as opposed to a CNI plugin, which is called and returns immediately). It is recommended that the SR-IOV Device Plugin run in a container, so this step starts the container the SR-IOV Device Plugin runs in.
The following command starts the SR-IOV Device Plugin DaemonSet:
```
cd $GOPATH/src/github.com/redhat-nfvpe/vdpa-deployment
kubectl create -f ./deployment/sriov-vdpa-daemonset.yaml
```
To determine if the SR-IOV Device Plugin is running, use the following command and find the kube-sriov-device-plugin-amd64-xxx pod:
```
kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c98db65d4-78v6k                1/1     Running   16         5d23h
kube-system   coredns-5c98db65d4-r5mmj                1/1     Running   16         5d23h
kube-system   etcd-nfvsdn-22-oot                      1/1     Running   16         5d23h
kube-system   kube-apiserver-nfvsdn-22-oot            1/1     Running   16         5d23h
kube-system   kube-controller-manager-nfvsdn-22-oot   1/1     Running   16         5d23h
kube-system   kube-flannel-ds-amd64-jvnm5             1/1     Running   16         5d23h
kube-system   kube-multus-ds-amd64-lxv5v              1/1     Running   16         5d23h
kube-system   kube-proxy-6w7sn                        1/1     Running   16         5d23h
kube-system   kube-scheduler-nfvsdn-22-oot            1/1     Running   16         5d23h
kube-system   kube-sriov-device-plugin-amd64-6cj7g    1/1     Running   0          4h6m
```
Once the SR-IOV Device Plugin is started, it probes the system looking for VFs that meet the selectors' criteria. This takes a couple of seconds to collect. The following command can be used to determine the number of detected VFs. (NOTE: These are the allocatable values and do not change as VFs are doled out.)
```
for node in $(kubectl get nodes | grep Ready | awk '{print $1}'); do echo "Node $node:"; kubectl get node $node -o json | jq '.status.allocatable'; done
Node virtlab711.virt.lab.eng.bos.redhat.com:
{
  "cpu": "32",
  "ephemeral-storage": "859332986687",
  "hugepages-1Gi": "10Gi",
  "hugepages-2Mi": "0",
  "intel.com/vdpa_intel_vhost": "0",
  "intel.com/vdpa_mlx_vhost": "2",
  "memory": "120946672Ki",
  "pods": "110"
}
Node virtlab712.virt.lab.eng.bos.redhat.com:
{
  "cpu": "32",
  "ephemeral-storage": "844837472087",
  "hugepages-1Gi": "10Gi",
  "hugepages-2Mi": "0",
  "intel.com/vdpa_intel_vhost": "0",
  "intel.com/vdpa_mlx_vhost": "2",
  "memory": "120950288Ki",
  "pods": "110"
}
```
All the images have been pushed to Docker Hub. TBD
There is an issue in recent kernels (>=5.7.0) that affects hugetlb cgroup reservation. There are two ways of working around this issue:
- Build a kernel with the patch that fixes the issue
- Disable hugepages in your applications. To do that, remove the hugepage mount and resource request in your pod deployment file and pass --no-huge to your DPDK app (see the sketch after this list).
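For example, a testpmd invocation without hugepages could look roughly like this; the exact EAL and application arguments used by the sample deployments may differ, and dpdk-testpmd is the binary name in recent DPDK releases (older releases use testpmd):

```
# Run testpmd with hugepages disabled; -m caps the amount of regular memory (MB).
dpdk-testpmd --no-huge -m 1024 -- -i
```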
This is a POC that was built (after significant re-work) based on the work done for KubeCon 2019. This work can be seen in this repository's history and in the archive docs.