Merge "Adding spec: OVS-DPDk containerization"

commit 05932b89d9

OVS-DPDK containerization
==========================================

Storyboard:
https://storyboard.openstack.org/#!/story/2005496

As StarlingX moves to containerization, most OpenStack components have already
been containerized, including OVS. OVS-DPDK, however, still runs directly on
the host. This story implements OVS-DPDK containerization.

Problem description
===================

Currently, StarlingX supports both OVS and OVS-DPDK. OVS is managed by
openstack-helm and runs in a container, while OVS-DPDK is managed by puppet and
runs directly on the host. Considering the benefits of containerization, we
would like to containerize OVS-DPDK as well. In addition, maintaining two
implementations and keeping them consistent costs more resources than
maintaining just one.

Use Cases
---------

Without OVS-DPDK containerization:

* If we want to make changes to OVS (for example, upgrading the OVS version or
  enabling some features), we need to make the same changes in two places.
* If we want to support another host OS distribution (e.g. Ubuntu), we need to
  build the OVS/DPDK packages for that distribution, because OVS-DPDK runs on
  the host.

Proposed change
===============

This story includes changes to StarlingX and to upstream openstack-helm. The
openstack-helm upstream patches are already in review.

'ovs-dpdk' and 'none' are the vswitch types we currently support. 'ovs-dpdk'
means running OVS-DPDK on the host; 'none' means running OVS (without DPDK) in
a container. For containerized OVS-DPDK we don't create a new vswitch type;
instead, we enhance the 'none' type to support DPDK, so the 'none' type will
support both OVS and OVS-DPDK (containerized). A new Kubernetes node label
(openvswitch-dpdk=enabled) will be used to control whether DPDK is enabled.
Once this story is completed, we will no longer maintain the 'ovs-dpdk' type.
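
A minimal Python sketch of how the node label could be evaluated when deciding
which nodes receive the DPDK-enabled OVS configuration. Only the label key and
value come from this spec. The node representation (dicts with "name" and
"labels" keys) and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Label key and value taken from this spec.
DPDK_LABEL_KEY = "openvswitch-dpdk"
DPDK_LABEL_VALUE = "enabled"


def dpdk_enabled(labels: Dict[str, str]) -> bool:
    """Return True if a node's labels request containerized OVS-DPDK."""
    return labels.get(DPDK_LABEL_KEY) == DPDK_LABEL_VALUE


def select_dpdk_nodes(nodes: Iterable[Dict]) -> List[str]:
    """Return the names of the nodes labelled openvswitch-dpdk=enabled.

    Each node is a hypothetical dict with "name" and "labels" keys; nodes
    without the label keep running plain OVS under the 'none' vswitch type.
    """
    return [node["name"] for node in nodes
            if dpdk_enabled(node.get("labels", {}))]


if __name__ == "__main__":
    nodes = [
        {"name": "compute-0", "labels": {"openvswitch-dpdk": "enabled"}},
        {"name": "compute-1", "labels": {}},
        {"name": "compute-2", "labels": {"openvswitch-dpdk": "disabled"}},
    ]
    print(select_dpdk_nodes(nodes))  # ['compute-0']
```

Any value other than exactly 'enabled', including a missing label, leaves the
node on plain OVS.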

Hugepages need to be reserved for DPDK. Currently, the reservation is done by
sysinv/puppet, and in this story it will still be handled by sysinv/puppet;
openstack-helm only consumes the hugepages. StarlingX reserves hugepages for
both DPDK and nova-compute, and we can run
'system host-memory-show controller-0' to show the hugepage information.
StarlingX has a default hugepage allocation policy, which users can override
with 'system host-memory-modify'. As k8s doesn't support multiple hugepage
sizes, we can only reserve hugepages of a single size.

::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-show controller-0 0
  +-------------------------------------+--------------------------------------+
  | Property                            | Value                                |
  +-------------------------------------+--------------------------------------+
  | Memory: Usable Total (MiB)          | 9181                                 |
  | Platform (MiB)                      | 7600                                 |
  | Available (MiB)                     | 9181                                 |
  | Huge Pages Configured               | True                                 |
  | vSwitch Huge Pages: Size (MiB)      | 2                                    |
  | Total                               | 512                                  |
  | Available                           | 0                                    |
  | Required                            | None                                 |
  | Application Pages (4K): Total       | 1826048                              |
  | Application Huge Pages (2M): Total  | 1024                                 |
  | Available                           | 1024                                 |
  | Application Huge Pages (1G): Total  | 0                                    |
  | Available                           | None                                 |
  | uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
  | ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
  | inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
  | created_at                          | 2019-04-28T06:08:42.884178+00:00     |
  | updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
  +-------------------------------------+--------------------------------------+

From the output above, we can see that 512 hugepages of 2 MiB (1024 MiB in
total) are reserved for OVS-DPDK. In this story, the `openvswitch helm plugin`_
will be updated to generate the memory configuration (dpdk-socket-mem) for the
openvswitch chart according to the reserved hugepage information. If the
compute node has multiple NUMA nodes, hugepages should be allocated on every
NUMA node.
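
The dpdk-socket-mem value is a comma-separated list giving, in MiB, the
hugepage memory that OVS-DPDK preallocates on each NUMA socket. The Python
sketch below shows one way a helm plugin could derive it from the reserved
vSwitch hugepages. The function name and the (page size, page count) input
format are hypothetical; this is not the actual sysinv plugin code.

```python
from typing import List, Tuple


def dpdk_socket_mem(vswitch_hugepages: List[Tuple[int, int]]) -> str:
    """Return a dpdk-socket-mem value such as "1024,1024".

    vswitch_hugepages[i] is a hypothetical (page_size_mib, page_count) pair
    for NUMA node i, e.g. the "vSwitch Huge Pages: Size (MiB)" and "Total"
    fields reported by 'system host-memory-show'.
    """
    if not vswitch_hugepages:
        raise ValueError("at least one NUMA node is required")
    # k8s supports only one hugepage size per node, so all sizes must match.
    if len({size for size, _ in vswitch_hugepages}) != 1:
        raise ValueError("all NUMA nodes must use the same hugepage size")
    for node, (_, count) in enumerate(vswitch_hugepages):
        # Assumption of this sketch: every NUMA node needs vSwitch hugepages.
        if count <= 0:
            raise ValueError("NUMA node %d has no vSwitch hugepages" % node)
    # One entry per socket, in NUMA node order: page size times page count.
    return ",".join(str(size * count) for size, count in vswitch_hugepages)


if __name__ == "__main__":
    # Single NUMA node from the output above: 512 pages of 2 MiB.
    print(dpdk_socket_mem([(2, 512)]))  # 1024
```

For the single NUMA node shown above, this yields "1024". A host with two NUMA
nodes and the same reservation on each would yield "1024,1024".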

To run OVS-DPDK in a container, we need to enable the Kubernetes hugepages
feature. Currently, Kubernetes doesn't support multiple hugepage sizes on a
single node; I have opened `the multiple size issue`_ to track it.

The OVS-DPDK process contains two types of threads: control path threads and
data path threads. The control path threads run on the Platform cores, just
like all other pods, but the data path threads, known as PMD threads, need to
run on one or more dedicated cores.

StarlingX needs to reserve CPU cores for the OVS-DPDK data path threads.
Currently, StarlingX reserves CPU cores for non-containerized OVS-DPDK through
sysinv, which generates the 'isolcpus' kernel parameter. For containerized
OVS-DPDK, CPU cores will be reserved in the same way. We can run
'system host-cpu-list controller-0' to show the CPU information. StarlingX has
a default CPU allocation policy, which users can override with
'system host-cpu-modify'.

::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
  |                                      | ore   |           | ore   |        |                                           |                   |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
  | 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+

From the output above, we can see that core 2 is allocated to the OVS-DPDK PMD
threads. In this story, the `openvswitch helm plugin`_ will be updated to
generate the CPU configurations (dpdk-lcore-mask and pmd-cpu-mask).
'pmd-cpu-mask' is the OVS parameter that specifies which CPU cores the PMD
threads run on. PMD threads can only be pinned to cores that are in the cpuset
cgroup of the OVS process. By default, all pods can only see the Platform
cores, so we need to change the cgroup of OVS at launch time. Similarly,
StarlingX also reserves CPU cores for nova-compute (assigned_function
'Applications'), and these are finally rendered as 'vcpu_pin_set' in
nova.conf.
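
Both dpdk-lcore-mask and pmd-cpu-mask are hexadecimal CPU bitmasks in which
bit N selects logical core N. The Python sketch below converts the core
assignment reported by 'system host-cpu-list' into these two masks. Using the
first Platform core for the lcore (non-PMD) threads is an assumption made for
illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cores_to_mask(cores: Iterable[int]) -> str:
    """Convert logical core IDs into a hex CPU mask, e.g. [2] -> '0x4'."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


def ovs_cpu_config(cpu_list: List[Tuple[int, str]]) -> Dict[str, str]:
    """Build example dpdk-lcore-mask / pmd-cpu-mask values.

    cpu_list holds (log_core, assigned_function) pairs, as in the
    'system host-cpu-list' output. Assumption of this sketch: the DPDK
    lcore threads use the first Platform core, and the PMD threads use
    every vSwitch core.
    """
    platform = [core for core, func in cpu_list if func == "Platform"]
    vswitch = [core for core, func in cpu_list if func == "vSwitch"]
    if not platform or not vswitch:
        raise ValueError("need at least one Platform and one vSwitch core")
    return {
        "dpdk-lcore-mask": cores_to_mask(platform[:1]),
        "pmd-cpu-mask": cores_to_mask(vswitch),
    }


if __name__ == "__main__":
    cpus = [(0, "Platform"), (1, "Platform"), (2, "vSwitch"),
            (3, "Applications"), (4, "Applications"), (5, "Applications")]
    print(ovs_cpu_config(cpus))
    # {'dpdk-lcore-mask': '0x1', 'pmd-cpu-mask': '0x4'}
```

With the CPU listing above, only core 2 is assigned to vSwitch, so the PMD
mask is 0x4. Adding core 3 to vSwitch would change it to 0xc.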

When a compute node is unlocked, vswitch.pp does the following OVS-related
work:

1) Bind the data network NICs to a Linux kernel module (vfio-pci by default in
   StarlingX).
2) Create the OVS bridges.
3) Add the NICs to the bridges.

In this story, the first item can be covered by puppet, by openstack-helm, or
by using NetworkDeviceAttachment, which leverages the existing SR-IOV CNI. The
second and third items will be covered by openstack-helm. To create OVS bridges
and add NICs to them, openstack-helm needs to know the bridge names and the NIC
PCI IDs (pci_id). These parameters will be generated by the
`neutron helm plugin`_ according to the information in sysinv.
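
The following Python sketch illustrates the kind of per-host data the neutron
helm plugin would need to hand to openstack-helm: one OVS bridge per data
network and, for each NIC, its PCI address and target bridge. The input keys,
the 'br-phy-<datanetwork>' bridge naming and the 'conf.ovs_dpdk' override
layout are assumptions for illustration, not the plugin's actual output.

```python
from typing import Dict, List


def ovs_dpdk_values(ports: List[Dict[str, str]]) -> Dict:
    """Build example OVS-DPDK NIC/bridge overrides from sysinv port data.

    Each port is a hypothetical dict with "ifname", "pci_address" and
    "datanetwork" keys. The "conf.ovs_dpdk" layout and the
    "br-phy-<datanetwork>" bridge naming are assumptions of this sketch.
    """
    nics = []
    bridges = []
    seen = set()
    for port in ports:
        bridge = "br-phy-%s" % port["datanetwork"]
        if bridge in seen:
            # Bonding several NICs into one bridge is out of scope here.
            raise ValueError(
                "more than one NIC on data network %s" % port["datanetwork"])
        seen.add(bridge)
        bridges.append({"name": bridge})
        nics.append({
            "name": port["ifname"],
            "pci_id": port["pci_address"],
            "bridge": bridge,
        })
    return {"conf": {"ovs_dpdk": {
        "enabled": True, "nics": nics, "bridges": bridges}}}


if __name__ == "__main__":
    ports = [
        {"ifname": "eth1", "pci_address": "0000:00:05.0",
         "datanetwork": "physnet0"},
        {"ifname": "eth2", "pci_address": "0000:00:06.0",
         "datanetwork": "physnet1"},
    ]
    print(ovs_dpdk_values(ports)["conf"]["ovs_dpdk"]["bridges"])
    # [{'name': 'br-phy-physnet0'}, {'name': 'br-phy-physnet1'}]
```

Rejecting a second NIC on the same data network keeps the sketch from silently
adding two unbonded ports to one bridge. A real plugin would have to handle
bonded interfaces explicitly.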

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Other end user impact
---------------------

As the k8s hugepage feature doesn't support multiple hugepage sizes for now, we
can allocate hugepages of only a single size. That means we can only create VMs
that use a single hugepage size. The limitation is described in the
`hugepage spec commit`_.

Performance Impact
------------------

No performance impact is expected.

For networking, the OVS-DPDK container uses the host's native network.

For CPU and memory, although container resources are limited, the resources
used by OVS are configured by OVS parameters rather than by container limits.

Other deployer impact
---------------------

The 'openvswitch-dpdk=enabled' label is required on compute nodes to enable
OVS-DPDK.

Developer impact
----------------

Once this feature is implemented, OVS-DPDK will no longer run on the host, so
the vswitch.pp file will be removed and openstack-helm will take over its job
of configuring OVS-DPDK.

Upgrade impact
--------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  chengli3 <cheng1.li@intel.com>

Other contributors:
  <launchpad-id or None>

Repos Impacted
--------------

starlingx/config, starlingx/integ

Work Items
----------

* Improve the OVS docker image to support DPDK (starlingx/integ).

  To support DPDK, DPDK should be installed in the OVS image, and OVS should be
  built and installed with the `dpdk install option`_ (--with-dpdk). The
  community OVS image already supports DPDK through the `image patch`_. To
  build our own OVS image, we can author our OVS dockerfile in the
  starlingx/integ project. The OVS/DPDK version will be the same as on the
  host. The docker image OS may need to be CentOS as well, because the OVS
  container mounts the host's /lib/modules.

* Make the OVS chart support DPDK (openstack-helm-infra).

  To support DPDK, OVS needs to be set up with the `dpdk setup options`_. The
  `ovs patch`_ is in review.

* Make the neutron chart support DPDK (openstack-helm).

  * `Extra neutron configurations`_ are needed to support DPDK.
  * In openstack-helm, the neutron chart is responsible for adding NICs to OVS
    bridges, so the neutron chart also takes care of the
    `dpdk interface initialization`_. The `neutron patch`_ is already in
    review.

* Reserve hugepages for OVS-DPDK and enable the k8s hugepage feature
  (starlingx/config).

  `huge pages`_ should be reserved for containerized OVS-DPDK in the same way
  as they are reserved for vswitch_type 'ovs-dpdk'.

* Generate DPDK-related configurations for the OpenStack deployment
  (starlingx/config).

  The `openvswitch helm plugin`_ needs to be updated to add DPDK
  configurations. The `neutron helm plugin`_ should be updated as well.

* Update the docs (starlingx/docs).

  Update the installation guide.

.. _dpdk install option: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#install-ovs
.. _image patch: https://review.opendev.org/#/c/665310/
.. _dpdk setup options: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-ovs
.. _ovs patch: https://review.openstack.org/#/c/626894/
.. _Extra neutron configurations: https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.html
.. _dpdk interface initialization: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-dpdk-devices-using-vfio
.. _neutron patch: https://review.openstack.org/#/c/643284/
.. _huge pages: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-hugepages
.. _openvswitch helm plugin: https://github.com/openstack/stx-config/tree/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/openvswitch.py
.. _neutron helm plugin: https://github.com/openstack/stx-config/blob/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
.. _ovs.py: https://opendev.org/starlingx/config/src/commit/e0d453a98b72606ec9a0b90a3acb5bbda546d2ff/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L318-L365
.. _the multiple size issue: https://github.com/kubernetes/kubernetes/issues/77251
.. _hugepage spec commit: https://github.com/kubernetes/community/pull/837/files#r133337110

Dependencies
============

* OVS version 2.6 or later is needed to support vhost-user reconnect.


Testing
=======

The host NICs planned for data networks must support DPDK. Multiple hosts are
needed to test connectivity across hosts.

The following cases are needed:

* Create VMs and test the network connectivity between VMs and to external
  networks.
* Check for any issues after a host reboot.

Documentation Impact
====================

The installation guides on the wiki need to be updated. There will be a small
difference for deployers in how the vswitch type is set.

References
==========

* http://docs.openvswitch.org/en/latest/intro/install/dpdk/

* https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch

* https://opendev.org/openstack/openstack-helm/src/branch/master/neutron

History
=======

Optional section intended to be used each time the spec is updated to describe
new design, API or any database schema updated. Useful to let reader understand
what's happened along the time.

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced