diff --git a/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM.png b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM.png
new file mode 100644
index 0000000..81d4590
Binary files /dev/null and b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM.png differ
diff --git a/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM_VxLAN.png b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM_VxLAN.png
new file mode 100644
index 0000000..eed7d47
Binary files /dev/null and b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_VM-to-VM_VxLAN.png differ
diff --git a/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_physical.png b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_physical.png
new file mode 100644
index 0000000..da84a91
Binary files /dev/null and b/doc/source/test_plans/hardware_features/hardware_offloads/network_diagram_physical.png differ
diff --git a/doc/source/test_plans/hardware_features/hardware_offloads/test_plan.rst b/doc/source/test_plans/hardware_features/hardware_offloads/test_plan.rst
new file mode 100644
index 0000000..9db7b5e
--- /dev/null
+++ b/doc/source/test_plans/hardware_features/hardware_offloads/test_plan.rst
@@ -0,0 +1,519 @@
+.. _hardware_offloads_performance_analysis:
+
+====================================================
+Hardware Offloads - Comparative performance analysis
+====================================================
+
+:status: **draft**
+:version: 1.0
+
+:Abstract:
+
+   The aim of this document is to present scenarios for a comparative
+   performance analysis of various hardware offloads. We examine differences
+   in throughput as well as the effect on CPU utilization when these hardware
+   offloads are enabled.
+
+   The end goal is to provide documentation that a deployer of OpenStack
+   can use to answer a number of common questions when constructing their
+   environments:
+
+   - Enabling which hardware offloads will give me the biggest "bang for
+     the buck" when it comes to increased network throughput?
+
+   - What impact on CPU utilization does disabling certain hardware
+     offloads have?
+
+:Conventions:
+
+   - **VxLAN:** Virtual eXtensible LAN
+
+   - **NIC:** Network Interface Card
+
+   - **TX:** Transmit (packets transmitted out of the interface)
+
+   - **RX:** Receive (packets received on the interface)
+
+   - **TSO:** TCP Segmentation Offload
+
+   - **GRO:** Generic Receive Offload
+
+   - **GSO:** Generic Segmentation Offload
+
+   - **MTU:** Maximum Transmission Unit
+
+   - **OVS:** Open vSwitch
+
+Test Plan
+=========
+
+The following hardware offloads were examined in this performance
+comparison:
+
+- Tx checksumming
+
+- TCP segmentation offload (TSO)
+
+- Generic segmentation offload (GSO)
+
+- Tx UDP tunneling segmentation
+
+- Rx checksumming
+
+- Generic receive offload (GRO)
+
+The performance analysis is based on the results of running test
+scenarios in a lab consisting of two hardware nodes:
+
+- Scenario #1: Baseline physical
+
+- Scenario #2: Baseline physical over VxLAN tunnel
+
+- Scenario #3: VM-to-VM on different hardware nodes
+
+- Scenario #4: VM-to-VM on different nodes over VxLAN tunnel
+
+For each of the above scenarios, we used `netperf` to measure the network
+throughput and the `sar` utility to measure CPU usage.
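+
+A typical measurement pair looks like the following sketch (the peer
+address and run length here are illustrative, not the exact values used
+in every test):
+
+.. code-block:: bash
+
+   # receiver: start the netperf server daemon
+   $ netserver
+
+   # transmitter: 60-second TCP stream test; -c/-C also report
+   # local and remote CPU utilization alongside throughput
+   $ netperf -H 10.1.2.2 -t TCP_STREAM -l 60 -cC
+
+   # on either node, in parallel: sample CPU usage once per second
+   $ sar -u ALL 1 60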
+
+One node was used as the transmitter, while the other node served as the
+receiver and had a `netserver` daemon running on it.
+
+Performance validation involved running both TCP and UDP stream tests
+with all offloads on, and then turning the offloads off one by one.
+
+For the transmit side, the following offloads were toggled:
+
+- TSO
+
+- Tx checksumming
+
+- GSO (no VxLAN tunneling)
+
+- tx-udp_tnl-segmentation (VxLAN tunneling only)
+
+For the receive side:
+
+- GRO
+
+- Rx checksumming
+
+Baseline physical tests involved turning offloads on and off for the
+`p514p2` physical interface on both the transmitter and the receiver.
+
+Some hardware offloads require certain other offloads to be enabled or
+disabled in order to have any effect. For example, TSO cannot be enabled
+unless Tx checksumming is turned on, while GSO only kicks in when TSO is
+disabled. For that reason, the following order of turning offloads off
+was chosen for the tests:
+
++---------------------+------------------------------+------------------------------+------------------------------+----------------------------------+
+| **Transmitter**     | All on                       | TSO off                      | Tx checksumming off          | GSO/tx-udp_tnl-segmentation off  |
++=====================+==============================+==============================+==============================+==================================+
+| Offloads in effect  | TSO,                         | Tx checksumming,             | GSO/tx-udp_tnl-segmentation  | All off                          |
+|                     | Tx checksumming              | GSO/tx-udp_tnl-segmentation  |                              |                                  |
++---------------------+------------------------------+------------------------------+------------------------------+----------------------------------+
+| Offloads enabled    | TSO,                         | Tx checksumming,             | GSO/tx-udp_tnl-segmentation  | All off                          |
+|                     | Tx checksumming,             | GSO/tx-udp_tnl-segmentation  |                              |                                  |
+|                     | GSO/tx-udp_tnl-segmentation  |                              |                              |                                  |
++---------------------+------------------------------+------------------------------+------------------------------+----------------------------------+
+
++---------------------+-------------------+-------------------+-----------------------+
+| **Receiver**        | All on            | GRO off           | Rx checksumming off   |
++=====================+===================+===================+=======================+
+| Offloads in effect  | GRO,              | Rx checksumming   | All off               |
+|                     | Rx checksumming   |                   |                       |
++---------------------+-------------------+-------------------+-----------------------+
+| Offloads enabled    | GRO,              | Rx checksumming   | All off               |
+|                     | Rx checksumming   |                   |                       |
++---------------------+-------------------+-------------------+-----------------------+
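+
+The features were toggled per interface with ``ethtool``; a minimal
+sketch of the "off" sequence above (feature names are as reported by
+``ethtool -k`` for this NIC/driver combination):
+
+.. code-block:: bash
+
+   # list the current offload settings
+   $ ethtool -k p514p2
+
+   # transmitter: turn offloads off in the order used in the tests
+   $ sudo ethtool -K p514p2 tso off
+   $ sudo ethtool -K p514p2 tx off    # Tx checksumming
+   $ sudo ethtool -K p514p2 gso off
+   $ sudo ethtool -K p514p2 tx-udp_tnl-segmentation off
+
+   # receiver
+   $ sudo ethtool -K p514p2 gro off
+   $ sudo ethtool -K p514p2 rx off    # Rx checksumming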
+
+Test Environment
+----------------
+
+Preparation
+^^^^^^^^^^^
+
+Baseline physical host setup
+++++++++++++++++++++++++++++
+
+We force the physical device to use only one queue. This allows us to
+obtain consistent results between test runs, as we avoid the possibility
+of RSS assigning the flows to a different queue and, as a result, to a
+different CPU core:
+
+``$ sudo ethtool -L p514p2 combined 1``
+
+where `p514p2` is the physical interface name.
+
+Pin that queue to CPU0, either using the `set_irq_affinity` script or by
+running
+
+``$ echo 1 | sudo tee /proc/irq/<irq>/smp_affinity``
+
+where ``<irq>`` is the interrupt number assigned to the interface queue
+(see ``/proc/interrupts``).
+
+This guarantees that all interrupts related to the `p514p2` interface
+are processed on a designated CPU (CPU0 in our case), which allows us to
+isolate the tests.
+
+At the same time, any traffic generators, like `netperf`, were manually
+pinned to CPU1 using `taskset`:
+
+``$ sudo taskset -c 1 netperf -H <netserver host> -cC [-t UDP_STREAM]``
+
+Next, it is important to prevent the CPUs from switching frequencies and
+power states (or C-states), as this can affect CPU utilization and thus
+skew the test results.
+
+Make the processors keep the C0 C-state by writing 0 to the
+`/dev/cpu_dma_latency` file. This prevents any other C-states from being
+used as long as the file `/dev/cpu_dma_latency` is kept open. It can
+easily be done with a small helper program:
+
+``$ make setcpulatency``
+
+``$ sudo ./setcpulatency 0 &``
+
+We made the CPUs run at the maximum frequency by choosing the
+``performance`` scaling governor:
+
+.. code-block:: bash
+
+   $ echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+
+In order to set up a VxLAN tunnel, an OVS bridge (`virbr`) was created:
+
+``$ sudo ovs-vsctl add-br virbr``
+
+Next, we created a VxLAN port, specifying the corresponding remote IP:
+
+.. code-block:: bash
+
+   $ sudo ovs-vsctl add-port virbr vxlan -- set interface vxlan type=vxlan \
+     options:remote_ip=10.1.2.2 options:local_ip=10.1.2.1
+
+.. image:: network_diagram_physical.png
+   :width: 650px
+
+VM-to-VM on different nodes setup
++++++++++++++++++++++++++++++++++
+
+Scenarios 3 and 4 assess the performance impact of hardware offloads in
+a deployment with two VMs running on separate hardware nodes. The
+scenarios measure VM-to-VM TCP/UDP traffic throughput, with and without
+VxLAN encapsulation.
+
+In order to get more accurate CPU consumption metrics, the CPUs on which
+the VMs run should be isolated from the kernel scheduler. This prevents
+any other processes from running on those CPUs. In this installation,
+half of the CPUs were isolated by adding the following to the end of the
+``/etc/default/grub`` file:
+
+``GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX isolcpus=6-11"``
+
+These settings were applied and the nodes rebooted:
+
+``$ sudo update-grub``
+
+``$ sudo reboot``
+
+libvirt was installed and the default storage pool appropriately
+configured:
+
+``$ sudo apt-get install qemu-kvm libvirt-bin virtinst virt-manager``
+
+``$ sudo virsh -c qemu:///system pool-define-as store dir --target /home/elena/store``
+
+``$ sudo virsh -c qemu:///system pool-autostart store``
+
+``$ sudo virsh -c qemu:///system pool-start store``
+
+An OVS bridge (`virbr`) was created next:
+
+``$ sudo ovs-vsctl add-br virbr``
+
+Then, we added the physical interface `p514p1` as a port to `virbr` and
+removed all IP addresses from it:
+
+``$ sudo ovs-vsctl add-port virbr p514p1``
+
+``$ sudo ifconfig p514p1 0``
+
+An IP address from the same subnet as the VMs was then added to `virbr`;
+in this case, 10.1.6.0/24 was the subnet used for VM-to-VM
+communication:
+
+``$ sudo ip addr add 10.1.6.1/24 dev virbr``
+
+From that point, `virbr` had two IP addresses: 10.1.1.1 and 10.1.6.1
+(10.1.1.2 and 10.1.6.3 on Host 2). Finally, we created a tap port
+`vport1` that is used to connect the VM to `virbr`:
+
+``$ sudo ip tuntap add mode tap vport1``
+
+``$ sudo ovs-vsctl add-port virbr vport1``
+
+**Guest setup**
+
+In scenarios 3 and 4, an Ubuntu Trusty cloud image is used. The VMs were
+defined from an XML domain file.
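+
+Assuming the domain file for the first VM is named ``vm1.xml`` (the file
+name here is hypothetical), defining and starting it looks like:
+
+.. code-block:: bash
+
+   $ sudo virsh -c qemu:///system define vm1.xml
+   $ sudo virsh -c qemu:///system start vm1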
+
+Each VM was pinned to a pair of CPUs that were isolated from the kernel
+scheduler. In the `libvirt.xml` files this was done along the following
+lines (two vCPUs per VM; the exact ``cpuset`` values are illustrative):
+
+.. code-block:: xml
+
+   <vcpu placement='static'>2</vcpu>
+   <cputune>
+     <!-- pin each vCPU to one of the isolated host CPUs (6-11) -->
+     <vcpupin vcpu='0' cpuset='6'/>
+     <vcpupin vcpu='1' cpuset='7'/>
+   </cputune>
+
+Each VM has two interfaces: `eth0` connects through a tap device
+(`vport1`) to an OVS bridge (`virbr`), from which there is a link to the
+other host; `eth1` is connected to `em1` through a virtual network (with
+NAT), which allows the VM to access the Internet. In the domain XML, the
+interface configuration follows this pattern (attribute values are
+illustrative):
+
+.. code-block:: xml
+
+   <!-- eth0: the pre-created tap device attached to the OVS bridge -->
+   <interface type='ethernet'>
+     <target dev='vport1'/>
+     <model type='virtio'/>
+   </interface>
+
+   <!-- eth1: NAT-ed libvirt network (via em1) for Internet access -->
+   <interface type='network'>
+     <source network='default'/>
+     <model type='virtio'/>
+   </interface>
+
+Interface configuration for VM1 was done in the following way:
+
+/etc/network/interfaces.d/eth0.cfg
+
+.. code-block:: bash
+
+   # The primary network interface
+   auto eth0
+   iface eth0 inet static
+   address 10.1.6.2
+   netmask 255.255.255.0
+   network 10.1.6.0
+   gateway 10.1.6.1
+   broadcast 10.1.6.255
+
+/etc/network/interfaces.d/eth1.cfg
+
+.. code-block:: bash
+
+   auto eth1
+   iface eth1 inet static
+   address 192.168.100.2
+   netmask 255.255.255.0
+   network 192.168.100.0
+   gateway 192.168.100.1
+   broadcast 192.168.100.255
+   dns-nameservers 8.8.8.8 8.8.4.4
+
+And on VM2:
+
+/etc/network/interfaces.d/eth0.cfg
+
+.. code-block:: bash
+
+   # The primary network interface
+   auto eth0
+   iface eth0 inet static
+   address 10.1.6.4
+   netmask 255.255.255.0
+   network 10.1.6.0
+   gateway 10.1.6.3
+   broadcast 10.1.6.255
+
+/etc/network/interfaces.d/eth1.cfg
+
+.. code-block:: bash
+
+   auto eth1
+   iface eth1 inet static
+   address 192.168.100.129
+   netmask 255.255.255.0
+   network 192.168.100.0
+   gateway 192.168.100.1
+   broadcast 192.168.100.255
+   dns-nameservers 8.8.8.8 8.8.4.4
+
+.. image:: network_diagram_VM-to-VM.png
+   :width: 650px
+
+For scenario 4, the `p514p1` port was removed from `virbr` and its IP
+address restored:
+
+``$ sudo ovs-vsctl del-port virbr p514p1``
+
+``$ sudo ip addr add 10.1.1.1/24 dev p514p1``
+
+For setting up a VxLAN tunnel, we added a `vxlan`-type port to the
+`virbr` bridge on each host:
+
+.. code-block:: bash
+
+   # Host 1
+   $ sudo ovs-vsctl add-port virbr vxlan1 -- set interface vxlan1 type=vxlan \
+     options:remote_ip=10.1.1.2 options:local_ip=10.1.1.1
+
+   # Host 2
+   $ sudo ovs-vsctl add-port virbr vxlan1 -- set interface vxlan1 type=vxlan \
+     options:remote_ip=10.1.1.1 options:local_ip=10.1.1.2
+
+.. image:: network_diagram_VM-to-VM_VxLAN.png
+   :width: 650px
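+
+The tunnel can be sanity-checked by watching for VxLAN frames on the
+underlay interface while traffic runs between the VMs (4789 is the
+IANA-assigned VxLAN UDP port; this check is a suggested addition):
+
+.. code-block:: bash
+
+   $ sudo tcpdump -ni p514p1 udp port 4789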
+
+Environment description
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Hardware
+++++++++
+
+The environment consists of two hardware nodes with the following
+configuration:
+
+===========  ======================================================================
+Parameter    Value
+===========  ======================================================================
+Server       Supermicro SYS-5018R-WR, 1U, 1x CPU, 4/6 FAN, 4x 3.5" SAS/SATA hot-swap, 2x PS
+Motherboard  Supermicro X10SRW-F, 1x CPU (LGA 2011), Intel C612, 8x DIMM up to 512GB RAM, 10x SATA3, IPMI, 2x GbE LAN, s/n: NM15BS004776
+CPU          Intel Xeon E5-2620v3, 2.4GHz, Socket 2011, 15MB cache, 6 cores, 85W
+RAM          4x 16GB Samsung M393A2G40DB0-CPB, DDR4 PC4-2133P ECC Reg. CL13
+Storage      HDD: 2x 1TB Seagate Constellation ES.3, ST1000NM0033, SATA 6.0Gb/s, 7200 RPM, 128MB cache, 3.5"
+NIC          AOC-STG-i4S, PCI Express 3.0, STD 4-port 10 Gigabit Ethernet SFP+ (`Intel XL710 controller`_)
+===========  ======================================================================
+
+Software
+++++++++
+
+This section describes the installed software:
+
+============  ================
+Parameter     Value
+============  ================
+OS            Ubuntu 14.04
+Kernel        4.2.0-27-generic
+QEMU          2.0.0
+Libvirt       1.2.2
+Open vSwitch  2.0.2
+Netperf       2.7.0
+============  ================
+
+Test Case 1: Baseline physical scenario
+---------------------------------------
+
+Description
+^^^^^^^^^^^
+
+This test measures network performance with hardware offloads on/off for
+two hardware nodes when sending non-encapsulated traffic.
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+======== =============== ================= =====================================================
+Priority Value           Measurement Units Description
+======== =============== ================= =====================================================
+1        TCP throughput  10^6 bits/sec     Average throughput for TCP traffic
+1        UDP throughput  10^6 bits/sec     Average throughput for UDP traffic
+1        CPU consumption %                 Average utilization of CPU used for packet processing
+======== =============== ================= =====================================================
+
+Test Case 2: Baseline physical over VxLAN scenario
+--------------------------------------------------
+
+Description
+^^^^^^^^^^^
+
+This test measures network performance with hardware offloads on/off for
+two hardware nodes when sending VxLAN-encapsulated traffic.
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+======== =============== ================= =====================================================
+Priority Value           Measurement Units Description
+======== =============== ================= =====================================================
+1        TCP throughput  10^6 bits/sec     Average throughput for TCP traffic
+1        UDP throughput  10^6 bits/sec     Average throughput for UDP traffic
+1        CPU consumption %                 Average utilization of CPU used for packet processing
+======== =============== ================= =====================================================
+
+Test Case 3: VM-to-VM on different nodes scenario
+-------------------------------------------------
+
+Description
+^^^^^^^^^^^
+
+This test assesses the performance impact of hardware offloads in a
+deployment with two VMs running on separate hardware nodes. The scenario
+measures VM-to-VM TCP/UDP traffic throughput as well as host CPU
+consumption.
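+
+Inside the guests, the same `netperf` methodology applies; a sketch,
+using the VM addresses from the setup above (with `netserver` already
+running on VM2):
+
+.. code-block:: bash
+
+   # from VM1, stream to VM2 over the bridged path
+   $ netperf -H 10.1.6.4 -cC
+   $ netperf -H 10.1.6.4 -cC -t UDP_STREAM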
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+======== =============== ================= =====================================================
+Priority Value           Measurement Units Description
+======== =============== ================= =====================================================
+1        TCP throughput  10^6 bits/sec     Average throughput for TCP traffic
+1        UDP throughput  10^6 bits/sec     Average throughput for UDP traffic
+1        CPU consumption %                 Average utilization of CPU used for packet processing
+======== =============== ================= =====================================================
+
+Test Case 4: VM-to-VM on different nodes over VxLAN scenario
+------------------------------------------------------------
+
+Description
+^^^^^^^^^^^
+
+This test assesses the performance impact of hardware offloads in a
+deployment with two VMs running on separate hardware nodes. The scenario
+measures VM-to-VM TCP/UDP traffic throughput as well as host CPU
+consumption when VxLAN encapsulation is used.
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+======== =============== ================= =====================================================
+Priority Value           Measurement Units Description
+======== =============== ================= =====================================================
+1        TCP throughput  10^6 bits/sec     Average throughput for TCP traffic
+1        UDP throughput  10^6 bits/sec     Average throughput for UDP traffic
+1        CPU consumption %                 Average utilization of CPU used for packet processing
+======== =============== ================= =====================================================
+
+.. _Intel XL710 controller: http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-gbe-controller-brief.html
\ No newline at end of file
diff --git a/doc/source/test_plans/hardware_features/index.rst b/doc/source/test_plans/hardware_features/index.rst
new file mode 100644
index 0000000..22cd646
--- /dev/null
+++ b/doc/source/test_plans/hardware_features/index.rst
@@ -0,0 +1,12 @@
+.. raw:: pdf
+
+   PageBreak oneColumn
+
+============================
+Hardware features test plans
+============================
+
+.. toctree::
+   :maxdepth: 3
+
+   hardware_offloads/test_plan
\ No newline at end of file
diff --git a/doc/source/test_plans/index.rst b/doc/source/test_plans/index.rst
index 6427ecf..71050f6 100644
--- a/doc/source/test_plans/index.rst
+++ b/doc/source/test_plans/index.rst
@@ -18,4 +18,5 @@ Test Plans
    keystone/plan
    container_cluster_systems/plan
    neutron_features/l3_ha/test_plan
+   hardware_features/index
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption01.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption01.png
new file mode 100755
index 0000000..48299fb
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption01.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption02.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption02.png
new file mode 100755
index 0000000..607c9dd
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption02.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption03.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption03.png
new file mode 100755
index 0000000..e0dd0da
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption03.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption04.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption04.png
new file mode 100755
index 0000000..310c78e
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption04.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption05.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption05.png
new file mode 100755
index 0000000..77f4037
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption05.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption06.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption06.png
new file mode 100755
index 0000000..fb2e293
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption06.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption07.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption07.png
new file mode 100755
index 0000000..a5454c9
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption07.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption08.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption08.png
new file mode 100755
index 0000000..7a78ce4
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption08.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption09.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption09.png
new file mode 100755
index 0000000..b04f777
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption09.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption10.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption10.png
new file mode 100755
index 0000000..27d598f
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption10.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption11.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption11.png
new file mode 100755
index 0000000..5ff42e3
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption11.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption12.png b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption12.png
new file mode 100755
index 0000000..af3b83c
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/cpu_consumption12.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/test_results.rst b/doc/source/test_results/hardware_features/hardware_offloads/test_results.rst
new file mode 100644
index 0000000..06cb26c
--- /dev/null
+++ b/doc/source/test_results/hardware_features/hardware_offloads/test_results.rst
@@ -0,0 +1,197 @@
+================================
+Hardware Offloads - Test results
+================================
+
+Baseline physical scenario
+--------------------------
+
+In this scenario, hardware offloads provided a significant performance
+gain in terms of CPU usage and throughput for both the transmitter and
+the receiver. Throughput was affected in tests with a small MTU (1500),
+while CPU consumption changes were only visible with jumbo frames
+enabled (MTU 9000).
+
+- Enabling offloads on a transmitter with MTU 1500 gives a 60%
+  throughput gain for TCP and 5% for UDP, with TSO and GSO giving the
+  highest boost.
+
+.. image:: throughput01.png
+   :width: 650px
+
+- Enabling offloads on a transmitter with MTU 9000 gives a 55% CPU
+  performance gain for TCP and 11.6% for UDP, with Tx checksumming and
+  GSO giving the highest boost.
+
+.. image:: cpu_consumption01.png
+   :width: 650px
+
+- Enabling offloads on a receiver with MTU 1500 gives a 40% throughput
+  gain for TCP and 11.7% for UDP, with GRO and Rx checksumming giving
+  the highest boost, respectively.
+
+.. image:: throughput02.png
+   :width: 650px
+
+- Enabling offloads on a receiver with MTU 9000 gives a 44% CPU
+  performance gain for TCP and 27.3% for UDP, with GRO and Rx
+  checksumming giving the highest boost, respectively.
+
+.. image:: cpu_consumption02.png
+   :width: 650px
+
+Baseline physical over VxLAN scenario
+-------------------------------------
+
+As in the baseline physical scenario, hardware offloads introduced a
+significant performance gain in terms of CPU usage and throughput for
+both the transmitter and the receiver. For TCP tests, the effect was
+most noticeable on the receiver side, while for UDP the most significant
+improvements were achieved on the transmitter side.
+
+- Enabling offloads on a transmitter with MTU 1500 gives a 23.3%
+  throughput gain for TCP and 7.4% for UDP, with Tx checksumming and
+  Tx UDP tunneling segmentation giving the highest boost.
+
+.. image:: throughput03.png
+   :width: 650px
+
+- Enabling offloads on a transmitter with MTU 9000 gives a 25% CPU
+  performance gain for TCP and 17.4% for UDP, with Tx checksumming
+  giving the highest boost.
+
+.. image:: cpu_consumption03.png
+   :width: 650px
+
+- Enabling offloads on a receiver with MTU 1500 gives a 66% throughput
+  gain for TCP and 2.4% for UDP, with GRO giving the highest boost.
+
+.. image:: throughput04.png
+   :width: 650px
+
+- Enabling offloads on a receiver with MTU 9000 gives a 48% CPU
+  performance gain for TCP and 29% for UDP, with GRO and Rx
+  checksumming giving the highest boost, respectively.
+
+.. image:: cpu_consumption04.png
+   :width: 650px
+
+VM-to-VM on different nodes scenario
+------------------------------------
+
+- Enabling Tx checksumming and TSO on the transmit side increases
+  throughput by 44% for TCP and by 44.7% for UDP.
+
+.. image:: throughput05.png
+   :width: 650px
+
+- Enabling GRO on the receive side increases throughput by 59.7% for
+  TCP and by 61.6% for UDP.
+
+.. image:: throughput06.png
+   :width: 650px
+
+- The CPU performance improvement is most visible on the transmitter:
+  turning offloads off raises total CPU consumption by 64.8% in TCP and
+  by 54.5% in UDP stream tests.
+
+.. image:: cpu_consumption05.png
+   :width: 650px
+.. image:: cpu_consumption06.png
+   :width: 650px
+
+- Total CPU consumption on the receiver side in TCP stream tests does
+  not change significantly, but turning offloads off shifts CPU time
+  towards user space, with system time decreasing.
+
+.. image:: cpu_consumption07.png
+   :width: 650px
+
+- In UDP stream tests, the user-space share of CPU consumption on the
+  receiver drops by 3.5% with offloads on.
+
+.. image:: cpu_consumption08.png
+   :width: 650px
+
+VM-to-VM on different nodes over VxLAN scenario
+-----------------------------------------------
+
+- Enabling Tx checksumming and TSO on the transmit side increases
+  throughput by 35.4% for TCP and by 70% for UDP.
+
+.. image:: throughput07.png
+   :width: 650px
+
+- Enabling GRO on the receive side increases throughput by 26% for TCP
+  and by 4% for UDP.
+
+.. image:: throughput08.png
+   :width: 650px
+
+- The CPU performance improvement is most visible on the transmitter:
+  turning offloads off raises total CPU consumption by 78.6% in TCP and
+  by 72.5% in UDP stream tests.
+
+.. image:: cpu_consumption09.png
+   :width: 650px
+
+.. image:: cpu_consumption10.png
+   :width: 650px
+
+- Total CPU consumption on the receiver side in TCP stream tests does
+  not change significantly, but turning offloads off shifts CPU time
+  towards kernel space, with user time decreasing.
+
+.. image:: cpu_consumption11.png
+   :width: 650px
+
+- In UDP stream tests, the user-space share of CPU consumption on the
+  receiver drops by 7% with offloads on.
+
+.. image:: cpu_consumption12.png
+   :width: 650px
+
+Summary
+-------
+
+Network hardware offloads provide a significant performance improvement
+in terms of CPU usage and throughput on both the transmit and receive
+sides. The impact is particularly strong in the case of VM-to-VM
+communication.
+
+Based on the testing results, the following recommendations on using
+offloads to improve throughput and CPU performance can be made (see the
+``ethtool`` sketch after this list):
+
+- To increase TCP throughput:
+
+  - Enable TSO and GSO (tx-udp_tnl-segmentation for VxLAN
+    encapsulation) on the transmitter
+
+  - Enable GRO on the receiver
+
+- To increase UDP throughput:
+
+  - Enable Tx checksumming on the transmitter
+
+  - Enable GRO on the receiver
+
+- To improve TCP CPU performance:
+
+  - Enable TSO on the transmitter
+
+  - Enable GRO on the receiver
+
+- To improve UDP CPU performance:
+
+  - Enable Tx checksumming on the transmitter
+
+  - Enable Rx checksumming on the receiver
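+
+Expressed as ``ethtool`` settings (a sketch; the interface name is the
+one used in this plan, and the feature names are as reported by
+``ethtool -k``):
+
+.. code-block:: bash
+
+   # transmitter
+   $ sudo ethtool -K p514p2 tx on tso on gso on
+   # with VxLAN encapsulation, also:
+   $ sudo ethtool -K p514p2 tx-udp_tnl-segmentation on
+
+   # receiver
+   $ sudo ethtool -K p514p2 rx on gro on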
+
+Using kernel version 3.19 or higher and hardware capable of performing
+offloads for encapsulated traffic, such as the `Intel XL710 controller
+<http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-gbe-controller-brief.html>`__,
+means that these improvements can also be seen in deployments that
+involve VxLAN encapsulation.
\ No newline at end of file
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput01.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput01.png
new file mode 100755
index 0000000..47128e2
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput01.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput02.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput02.png
new file mode 100755
index 0000000..23b818c
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput02.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput03.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput03.png
new file mode 100755
index 0000000..cec9ecc
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput03.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput04.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput04.png
new file mode 100755
index 0000000..449161b
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput04.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput05.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput05.png
new file mode 100755
index 0000000..1b2310d
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput05.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput06.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput06.png
new file mode 100755
index 0000000..a1174f6
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput06.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput07.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput07.png
new file mode 100755
index 0000000..fd98fe4
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput07.png differ
diff --git a/doc/source/test_results/hardware_features/hardware_offloads/throughput08.png b/doc/source/test_results/hardware_features/hardware_offloads/throughput08.png
new file mode 100755
index 0000000..cf510e0
Binary files /dev/null and b/doc/source/test_results/hardware_features/hardware_offloads/throughput08.png differ
diff --git a/doc/source/test_results/hardware_features/index.rst b/doc/source/test_results/hardware_features/index.rst
new file mode 100644
index 0000000..b116a5f
--- /dev/null
+++ b/doc/source/test_results/hardware_features/index.rst
@@ -0,0 +1,12 @@
+.. raw:: pdf
+
+   PageBreak oneColumn
+
+=========================
+Hardware features testing
+=========================
+
+.. toctree::
+   :maxdepth: 3
+
+   hardware_offloads/test_results
\ No newline at end of file
diff --git a/doc/source/test_results/index.rst b/doc/source/test_results/index.rst
index 177af59..7f13828 100644
--- a/doc/source/test_results/index.rst
+++ b/doc/source/test_results/index.rst
@@ -16,4 +16,5 @@ Test Results
    keystone/index
    container_platforms/index
    neutron_features/index
+   hardware_features/index