John Likes OpenStack

OpenInfra Live Episode 24: OpenStack and Ceph (2021-09-27)
<p>This Thursday at 14:00 UTC <a href="https://fmount.me">Francesco</a> and I will be in a panel on OpenInfra Live Episode 24: OpenStack and Ceph.</p>
<iframe class="BLOG_video_class" allowfullscreen="" youtube-src-id="zJVoleSpSOk" width="400" height="322" src="https://www.youtube.com/embed/zJVoleSpSOk"></iframe>

My tox cheat sheet (2020-09-04)
Install tox on a centos8 undercloud deployed by <a href="https://github.com/cjeanner/tripleo-lab">tripleo-lab</a>:
<pre>
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
pip install tox
</pre>
Render changes to tripleo docs:
<pre>
cd /home/stack/tripleo-docs
tox -e deploy-guide
</pre>
Check syntax errors before wasting CI time
<pre>
tox -e linters
tox -e pep8
</pre>
Run a specific unit test
<pre>
cd /home/stack/tripleo-common
tox -e py36 -- tripleo_common.tests.test_inventory.TestInventory.test_get_roles_by_service
cd /home/stack/tripleo-ansible
tox -e py36 -- tripleo_ansible.tests.modules.test_derive_hci_parameters.TestTripleoDeriveHciParameters
</pre>

Running tripleo-ansible molecule locally for dummies (2020-06-26)
<p>
I've had to re-teach myself how to do this so I'm writing my own notes.
<p>
Prerequisites:
<ol>
<li>Get a working undercloud (perhaps from <a href="https://github.com/cjeanner/tripleo-lab">tripleo-lab</a>)</li>
<li>git clone https://git.openstack.org/openstack/tripleo-ansible.git ; cd tripleo-ansible</li>
<li>Determine the test name: ls roles</li>
</ol>
<p>
Once you have your environment ready run a test with the name from step 3.
<pre>
./scripts/run-local-test tripleo_derived_parameters
</pre>
Some tests in CI are configured to use `--skip-tags`. You can do this for your local tests too by setting the appropriate environment variables. For example:
<pre>
export TRIPLEO_JOB_ANSIBLE_ARGS="--skip-tags run_ceph_ansible,run_uuid_ansible,ceph_client_rsync,clean_fetch_dir"
./scripts/run-local-test tripleo_ceph_run_ansible
</pre>
<p>This last tip should get <a href="https://review.opendev.org/738259">added</a> to <a href="https://docs.openstack.org/tripleo-ansible/latest/contributing_roles.html#local-testing-of-new-roles">the docs</a>.

Building a Ceph-powered Cloud: Deploying a containerized Red Hat Ceph Storage 4 cluster for Red Hat OpenStack Platform 16 (2020-06-03)
<a href="https://www.redhat.com/en/blog/building-ceph-powered-cloud-deploying-containerized-red-hat-ceph-storage-4-cluster-red-hat-open-stack-platform-16">https://www.redhat.com/en/blog/building-ceph-powered-cloud-deploying-containerized-red-hat-ceph-storage-4-cluster-red-hat-open-stack-platform-16</a>

Notes on testing a tripleo-common mistral patch (2019-07-03)
<p>
I recently ran into
<a href="https://bugs.launchpad.net/tripleo/+bug/1834094">
bug 1834094</a> and wanted to test the
<a href="https://review.opendev.org/#/c/668560/">
proposed fix</a>. These are my notes if I have to do this again.
</p>
<h3>Get a patched container</h3>
<p>
Because the mistral-executor is running as a container
on the undercloud I needed to build a new container and
TripleO's
<a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/container_image_prepare.html">Container Image Preparation</a>
helped me do this without too much trouble.
</p>
<p>
As described in the Container Image Preparation docs, I had already
downloaded a local copy of the containers to my undercloud by
running the following:
</p>
<pre>
time sudo openstack tripleo container image prepare \
-e ~/train/containers.yaml \
--output-env-file ~/containers-env-file.yaml
</pre>
where ~/train/containers.yaml has the following:
<pre>
---
parameter_defaults:
NeutronMechanismDrivers: ovn
ContainerImagePrepare:
- push_destination: 192.168.24.1:8787
set:
ceph_image: daemon
ceph_namespace: docker.io/ceph
ceph_tag: v4.0.0-stable-4.0-nautilus-centos-7-x86_64
name_prefix: centos-binary
namespace: docker.io/tripleomaster
tag: current-tripleo
</pre>
<p>
I now want to download the same set of containers to my undercloud
but I want the mistral-executor container to have the
<a href="https://review.opendev.org/#/c/668560/">
proposed fix</a>. If I visit the review and click download
I can see the patch is at refs/changes/60/668560/3
and I can pass this information to TripleO's
<a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/container_image_prepare.html">Container Image Preparation</a>
so that it builds me a container with that patch applied.
</p>
<p>
To do this I update my containers.yaml to exclude the mistral-executor
container from the usual tags with the excludes list directive and then
create a separate section with the includes directive specific to the
mistral-executor container.
</p>
<p>
Within this new section I ask that the tripleo-modify-image ansible
role pull that patch and apply it to that source image.
</p>
<pre>
---
parameter_defaults:
NeutronMechanismDrivers: ovn
ContainerImagePrepare:
- push_destination: 192.168.24.1:8787
set:
ceph_image: daemon
ceph_namespace: docker.io/ceph
ceph_tag: v4.0.0-stable-4.0-nautilus-centos-7-x86_64
name_prefix: centos-binary
namespace: docker.io/tripleomaster
tag: current-tripleo
excludes: [mistral-executor]
- push_destination: 192.168.24.1:8787
set:
name_prefix: centos-binary
namespace: docker.io/tripleomaster
tag: current-tripleo
modify_role: tripleo-modify-image
modify_append_tag: "-devel-ps3"
modify_vars:
tasks_from: dev_install.yml
source_image: docker.io/tripleomaster/centos-binary-mistral-executor:current-tripleo
refspecs:
-
project: tripleo-common
refspec: refs/changes/60/668560/3
includes: [mistral-executor]
</pre>
<p>
When I then run the `sudo openstack tripleo container image prepare` command
I see that it took a few extra steps to create my new container image.
</p>
<pre>
Writing manifest to image destination
Storing signatures
INFO[0005] created - from /var/lib/containers/storage/overlay/10c5e9ec709991e7eb6cbbf99c08d87f9f728c1644d64e3b070bc3c81adcbc03/diff
and /var/lib/containers/storage/overlay-layers/10c5e9ec709991e7eb6cbbf99c08d87f9f728c1644d64e3b070bc3c81adcbc03.tar-split.gz (wrote 150320640 bytes)
Completed modify and upload for image docker.io/tripleomaster/centos-binary-mistral-executor:current-tripleo
Removing local copy of 192.168.24.1:8787/tripleomaster/centos-binary-mistral-executor:current-tripleo
Removing local copy of 192.168.24.1:8787/tripleomaster/centos-binary-mistral-executor:current-tripleo-devel-ps3
Output env file exists, moving it to backup.
</pre>
<p>
If I were deploying the mistral container in the overcloud I could just
use 'openstack overcloud deploy ... -e ~/containers-env-file.yaml' and
be done, but because I need to replace my mistral-executor container on
my undercloud I have to do a few manual steps.
</p>
<h3>Run the patched container on the undercloud</h3>
<p>
My undercloud is ready to serve the patched mistral-executor container but it doesn't yet have its own copy of it to run; i.e. I only see the original container:
</p>
<pre>
(undercloud) [stack@undercloud train]$ sudo podman images | grep exec
docker.io/tripleomaster/centos-binary-mistral-executor current-tripleo 1f0ed5edc023 9 days ago 1.78 GB
(undercloud) [stack@undercloud train]$
</pre>
However, the same undercloud will serve it from the following URL:
<pre>
(undercloud) [stack@undercloud train]$ grep executor ~/containers-env-file.yaml
ContainerMistralExecutorImage: 192.168.24.1:8787/tripleomaster/centos-binary-mistral-executor:current-tripleo-devel-ps3
(undercloud) [stack@undercloud train]$
</pre>
So I pull it down in order to run it on the undercloud:
<pre>
sudo podman pull 192.168.24.1:8787/tripleomaster/centos-binary-mistral-executor:current-tripleo-devel-ps3
</pre>
I now want to stop the running mistral-executor container and start my new one in its place.
As per <a href="https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/tips_tricks.html#debugging-with-paunch">
Debugging with Paunch</a> I can use the print-cmd action to extract the command which is used
to start the mistral-executor container and save it to a shell script:
<pre>
sudo paunch debug --file /var/lib/tripleo-config/container-startup-config-step_4.json --container mistral_executor --action print-cmd > start_executor.sh
</pre>
I'll also append the exact container image name to the shell script:
<pre>
sudo podman images | grep ps3 >> start_executor.sh
</pre>
Next I'll edit the script so that it uses the new image and so that the container is named mistral_executor:
<pre>
vim start_executor.sh
</pre>
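<p>The exact edit depends on what print-cmd produced; the goal is to point the run command at the :current-tripleo-devel-ps3 image and to drop any generated suffix from the container name. A rough sketch of the same edit with sed (illustrative only; check the resulting script before running it):</p>
<pre>
# use the patched image tag in the run command
sed -i 's|mistral-executor:current-tripleo|mistral-executor:current-tripleo-devel-ps3|' start_executor.sh
# pin the container name to mistral_executor (no generated suffix)
sed -i 's|--name mistral_executor[^ ]*|--name mistral_executor|' start_executor.sh
</pre>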
Before I restart the container I'll prove that the current container isn't running the patch (the same command later will prove that it is).
<pre>
(undercloud) [stack@undercloud train]$ sudo podman exec mistral_executor grep render /usr/lib/python2.7/site-packages/tripleo_common/utils/config.py
# string so it's rendered in a readable format.
template_data = deployment_template.render(
template_data = host_var_server_template.render(
(undercloud) [stack@undercloud train]$
</pre>
Stop the mistral-executor container with systemd (otherwise it will automatically restart).
<pre>
sudo systemctl stop tripleo_mistral_executor.service
</pre>
Remove the container with podman to ensure the name is not in use:
<pre>
sudo podman rm mistral_executor
</pre>
Start the new container:
<pre>
sudo bash start_executor.sh
</pre>
and now I'll verify that my new container does have the patch:
<pre>
(undercloud) [stack@undercloud train]$ sudo podman exec mistral_executor grep render /usr/lib/python2.7/site-packages/tripleo_common/utils/config.py
def render_network_config(self, stack, config_dir, server_roles):
# string so it's rendered in a readable format.
template_data = deployment_template.render(
template_data = host_var_server_template.render(
self.render_network_config(stack, config_dir, server_roles)
(undercloud) [stack@undercloud train]$
</pre>
As a bonus, I can also see that it fixed the bug:
<pre>
(undercloud) [stack@undercloud tripleo-heat-templates]$ openstack overcloud config download --config-dir config-download
Starting config-download export...
config-download export successful
Finished config-download export.
Extracting config-download...
The TripleO configuration has been successfully generated into: config-download
(undercloud) [stack@undercloud tripleo-heat-templates]$
</pre>

How do I re-run only ceph-ansible when using tripleo config-download? (2019-01-29)
<p>After config-download runs the first time, you may do the following:</p>
<pre>
cd /var/lib/mistral/config-download/
bash ansible-playbook-command.sh --tags external_deploy_steps
</pre>
<p>The above runs only the external deploy steps, which for the ceph-ansible integration, means run the ansible which generates the inventory and then execute ceph-ansible.</p>
<p>More on this in <a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ansible_config_download.html#manual-config-download">TripleO config-download User’s Guide: Deploying with Ansible</a>.</p>
<p>
If you're using the <a href="https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html">standalone deployer</a>, then config-download does not provide the ansible-playbook-command.sh. You can workaround this by doing the following:</p>
<pre>
cd /root/undercloud-ansible-su_6px97
ansible -i inventory.yaml -m ping all
ansible-playbook -i inventory.yaml -b deploy_steps_playbook.yaml --tags external_deploy_steps
</pre>
<p>The above makes the following assumptions:</p>
<ul>
<li>You ran standalone with `--output-dir=$HOME` as root and that undercloud-ansible-su_6px97 was created by config download and contains the downloaded playbooks. Use `ls -ltr` to find the latest version.</li>
<li>If you're using the newer python3-only versions you ran something like `ln -s $(which ansible-3) /usr/local/bin/ansible`</li>
<li>That config-download already generated the overcloud inventory.yaml (the second command above is just to test that the inventory is working)</li>
</ul>

ceph-ansible podman with vagrant (2019-01-07)
<p>
These are just my notes on how I got vagrant with libvirt working on
CentOS7 and then used ceph-ansible's fedora29 podman tests to deploy a
containerized ceph nautilus preview cluster without docker. I'm doing
this in hopes of hooking Ceph into the
<a href="http://my1.fr/blog/openstack-containerization-with-podman-part-1-undercloud/">
new podman TripleO deploys</a>.
<h3>Configure Vagrant with libvirt on CentOS7</h3>
<p>
I already have a
<a href="http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html">
CentOS7 machine I used for tripleo quickstart</a>.
I did the following to get vagrant working on it with libvirt.
</p>
<p>1. Create a vagrant user</p>
<pre>
sudo useradd vagrant
sudo usermod -aG wheel vagrant
sudo usermod --append --groups libvirt vagrant
sudo su - vagrant
mkdir .ssh
chmod 700 .ssh/
cd .ssh/
curl https://github.com/fultonj.keys > authorized_keys
chmod 600 authorized_keys
</pre>
<p>Continue as the vagrant user.</p>
<p>2. Install the Vagrant and other RPMs</p>
Download the CentOS Vagrant RPM
from <a href="https://www.vagrantup.com/downloads.html">https://www.vagrantup.com/downloads.html</a> and install other RPMs needed for it to work with libvirt.
<pre>
sudo yum install vagrant_2.2.2_x86_64.rpm
sudo yum install qemu libvirt libvirt-devel ruby-devel gcc qemu-kvm
vagrant plugin install vagrant-libvirt
</pre>
<p>Note that I already had many of the libvirt deps above from quickstart.</p>
<p>3. Get a CentOS7 box for verification</p>
Download the <a href="https://app.vagrantup.com/centos/boxes/7">centos/7 box</a>.
<pre>
[vagrant@hamfast ~]$ vagrant box add centos/7
==> box: Loading metadata for box 'centos/7'
box: URL: https://vagrantcloud.com/centos/7
This box can work with multiple providers! The providers that it
can work with are listed below. Please review the list and choose
the provider you will be working with.
1) hyperv
2) libvirt
3) virtualbox
4) vmware_desktop
Enter your choice: 2
==> box: Adding box 'centos/7' (v1811.02) for provider: libvirt
box: Downloading: https://vagrantcloud.com/centos/boxes/7/versions/1811.02/providers/libvirt.box
box: Download redirected to host: cloud.centos.org
==> box: Successfully added box 'centos/7' (v1811.02) for 'libvirt'!
[vagrant@hamfast ~]$
</pre>
Create a Vagrant file for it with `vagrant init centos/7`.
<p>4. Configure Vagrant to use a custom storage pool (Optional)</p>
<p>
Because I was already using libvirt directly with an images pool,
vagrant was unable to download the centos/7 system. I like this as I
want to keep my images pool separate for when I use libvirt directly.
To make Vagrant happy I created my own pool for it and added the
following to my Vagrantfile:</p>
<pre>
Vagrant.configure("2") do |config|
config.vm.provider :libvirt do |libvirt|
libvirt.storage_pool_name = "vagrant_images"
end
end
</pre>
<p>After doing the above `vagrant up` worked for me.</p>
<!--
<h3>Details on creating my own storage pool</h3>
<p>The error:</p>
<pre>
[vagrant@hamfast ~]$ vagrant up
Bringing machine 'default' up with 'libvirt' provider...
==> default: Checking if box 'centos/7' is up to date...
There was error while creating libvirt storage pool: Call to virStoragePoolDefineXML failed: operation failed: Storage source conflict with pool: 'images'
[vagrant@hamfast ~]$ sudo virsh pool-list
Name State Autostart
-------------------------------------------
images active yes
oooq_pool active yes
[vagrant@hamfast ~]$
</pre>
<p>Create a new pool directory</p>
<pre>
mkdir /var/lib/libvirt/vagrant_images
</pre>
<p>Create a new pool definition file</p>
<pre>
[root@hamfast ~]# uuidgen
bad53f21-1e4f-4b5a-83c6-5dbe9ff335eb
[root@hamfast ~]# cd /etc/libvirt/storage
[root@hamfast storage]# vi vagrant_images.xml
[root@hamfast storage]# cat vagrant_images.xml
<pool type='dir'>
<name>vagrant_images</name>
<uuid>bad53f21-1e4f-4b5a-83c6-5dbe9ff335eb</uuid>
<capacity unit='bytes'>0</capacity>
<allocation unit='bytes'>0</allocation>
<available unit='bytes'>0</available>
<source>
</source>
<target>
<path>/var/lib/libvirt/vagrant_images</path>
</target>
</pool>
[root@hamfast storage]#
</pre>
<p>Configure the pool to start</p>
<pre>
sudo virsh pool-define /etc/libvirt/storage/vagrant_images.xml
sudo virsh pool-start vagrant_images
sudo virsh pool-autostart vagrant_images
</pre>
<p>`vagrant up` should now work after modifying the Vagrantfile to
use the new pool.</p>
-->
<h3>Run ceph-ansible's Fedora 29 podman tests</h3>
<p>1. Clone ceph-ansible master</p>
<pre>git clone git@github.com:ceph/ceph-ansible.git; cd ceph-ansible</pre>
<p>2. Satisfy dependencies</p>
<pre>
sudo pip install -r requirements.txt
sudo pip install tox
cp vagrant_variables.yml.sample vagrant_variables.yml
cp site.yml.sample site.yml
</pre>
<p>Optionally: modify Vagrantfile for vagrant_images storage pool</p>
<p>3. Deploy with the container_podman</p>
<pre>
tox -e dev-container_podman -- --provider=libvirt
</pre>
<p>The above will result in tox triggering vagrant to create the 12
virtual machines listed below and then ceph-ansible will install Ceph on
them.</p>
<p>4. Inspect Deployment</p>
<p>Verify the virtual machines are running:</p>
<pre>
[vagrant@hamfast ~]$ cd ~/ceph-ansible/tests/functional/fedora/29/container-podman
[vagrant@hamfast container-podman]$ cp vagrant_variables.yml.sample vagrant_variables.yml
[vagrant@hamfast container-podman]$ vagrant status
Current machine states:
mgr0 running (libvirt)
client0 running (libvirt)
client1 running (libvirt)
rgw0 running (libvirt)
mds0 running (libvirt)
rbd-mirror0 running (libvirt)
iscsi-gw0 running (libvirt)
mon0 running (libvirt)
mon1 running (libvirt)
mon2 running (libvirt)
osd0 running (libvirt)
osd1 running (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
[vagrant@hamfast container-podman]$
</pre>
<p>Connect to a monitor and see that it's running Ceph containers</p>
<pre>
[vagrant@hamfast container-podman]$ vagrant ssh mon0
Last login: Mon Jan 7 17:11:28 2019 from 192.168.121.1
[vagrant@mon0 ~]$
[vagrant@mon0 ~]$ sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c494695eb0c2 docker.io/ceph/daemon:latest-master /opt/ceph-container... 4 hours ago Up 4 hours ago ceph-mgr-mon0
dbabf02df984 docker.io/ceph/daemon:latest-master /opt/ceph-container... 4 hours ago Up 4 hours ago ceph-mon-mon0
[vagrant@mon0 ~]$
[vagrant@mon0 ~]$ sudo podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/ceph/daemon latest-master 24fdc8c3cb3f 4 weeks ago 726MB
[vagrant@mon0 ~]$
[vagrant@mon0 ~]$ which docker
/usr/bin/which: no docker in (/home/vagrant/.local/bin:/home/vagrant/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
[vagrant@mon0 ~]$
</pre>
Observe the status of the Ceph cluster:
<pre>
[vagrant@mon0 ~]$ sudo podman exec dbabf02df984 ceph -s
cluster:
id: 9d2599f2-aec7-4c7c-a88e-7a8d39ebb557
health: HEALTH_WARN
application not enabled on 1 pool(s)
services:
mon: 3 daemons, quorum mon0,mon1,mon2 (age 71m)
mgr: mon1(active, since 70m), standbys: mon2, mon0
mds: cephfs-1/1/1 up {0=mds0=up:active}
osd: 4 osds: 4 up (since 68m), 4 in (since 68m)
rbd-mirror: 1 daemon active
rgw: 1 daemon active
data:
pools: 13 pools, 124 pgs
objects: 194 objects, 3.5 KiB
usage: 54 GiB used, 71 GiB / 125 GiB avail
pgs: 124 active+clean
[vagrant@mon0 ~]$
</pre>
<p>Observe the installed versions:</p>
<pre>
[vagrant@mon0 ~]$ sudo podman exec -ti dbabf02df984 /bin/bash
[root@mon0 /]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@mon0 /]#
[root@mon0 /]# ceph --version
ceph version 14.0.1-1496-gaf96e16 (af96e16271b620ab87570b1190585fffc06daeac) nautilus (dev)
[root@mon0 /]#
</pre>
<p>Observe the OSDs</p>
<pre>
[vagrant@hamfast container-podman]$ vagrant ssh osd0
[vagrant@osd0 ~]$ sudo su -
[root@osd0 ~]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4fe23502592c docker.io/ceph/daemon:latest-master /opt/ceph-container... About an hour ago Up About an hour ago ceph-osd-2
f582b4311076 docker.io/ceph/daemon:latest-master /opt/ceph-container... About an hour ago Up About an hour ago ceph-osd-0
[root@osd0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
sdb 8:16 0 50G 0 disk
├─test_group-data--lv1 253:1 0 25G 0 lvm
└─test_group-data--lv2 253:2 0 12.5G 0 lvm
sdc 8:32 0 50G 0 disk
├─sdc1 8:33 0 25G 0 part
└─sdc2 8:34 0 25G 0 part
└─journals-journal1 253:3 0 25G 0 lvm
vda 252:0 0 41G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 40G 0 part
└─atomicos-root 253:0 0 40G 0 lvm /sysroot
[root@osd0 ~]#
[root@osd0 ~]# podman exec 4fe23502592c cat var/lib/ceph/osd/ceph-2/type
bluestore
[root@osd0 ~]#
</pre>

Simulate edge deployments using TripleO Standalone (2019-01-04)
<p>My colleagues presented at OpenStack Summit Berlin on <a href="https://www.youtube.com/watch?v=A4l3vPMaJew">Distributed Hyperconvergence</a>. This includes using TripleO to deploy a central controller
node, extracting information from that central node, and then passing that information as input to a second TripleO deployment at a remote location ("on the edge of the network"). This edge deployment could host its own Ceph cluster which is collocated with compute nodes in its own availability zone. A third TripleO deployment could be added for a second remote edge deployment and users could then use the central deployment to schedule workloads per availability zone closer to where the workloads are needed.</p>
<p>You can simulate this type of deployment today with a single hypervisor and TripleO's standalone installer as per the <a href="https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html#example-2-nodes-2-nic-using-remote-compute-with-tenant-and-provider-networks">newly merged upstream docs</a>.</p>

PC for tripleo quickstart (2018-08-28)
<p>I built a machine for running <a href="https://docs.openstack.org/tripleo-quickstart/latest/">TripleO Quickstart</a> at home.</p>
<p>My complete part list is on <a href="https://pcpartpicker.com/user/fultonj/saved/v9KLD3">pcpart picker</a> with the exception of the extra <a href="http://a.co/d/80QD1lp">Noctua NM-AM4 Mounting Kit</a> and <a href="http://a.co/d/8vO0rJ8">video card</a> (which I only used to install the OS).</p>
<p>I also have <a href="https://photos.app.goo.gl/Qh8yE8zrTsj355zT6">photos</a> from when I built it.</p>
<p>My <a href="https://github.com/fultonj/oooq/blob/b55063591208f10d3eacbf9c1cbbac8d4984b22e/under/nodes.yaml">nodes.yaml</a> gives me:
<ul>
<li>Three 9GB 2CPU controller nodes</li>
<li>Three 6GB 2CPU ceph storage nodes</li>
<li>One 3GB 2CPU compute node (that's enough to spawn one nested VM for a quick test)</li>
<li>One 13GB 8CPU undercloud node</li>
</ul>
That leaves less than 2GB of RAM for the hypervisor, and all 16 vCPUs (8 cores * 2 threads) are allocated to VMs, so I'm pushing it a little.
<p>
When using this system with the same nodes.yaml my run times are as follows for <a href="http://lists.openstack.org/pipermail/openstack-dev/2018-August/133792.html">Rocky RC1</a>:
<ul>
<li>undercloud install of rocky: 43m44.118s</li>
<li>overcloud install of rocky: 49m51.369s</li>
</ul>

Updating ceph-ansible in a containerized undercloud (2018-08-22)
<h3>Update</h3>
<p>What's below won't be the case for much longer because <a href="https://review.rdoproject.org/r/#/c/16362">ceph-ansible will become a dependency of TripleO</a> and <a href="https://review.openstack.org/#/c/604357">the mistral-executor container will bind mount the ceph-ansible source directory on the container host</a>. What's in this post could still be used as an example of updating a package in a TripleO container, but don't be misled into thinking it is still the way to update ceph-ansible.
<h3>Original Content</h3>
<p>
In Rocky the TripleO undercloud will run containers. If you're using TripleO to deploy Ceph in Rocky, this means that ceph-ansible shouldn't be installed on your undercloud server directly because your undercloud server is a container host. Instead ceph-ansible should be installed in the mistral-executor container because, as per <a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ansible_config_download.html">config-download</a>, that is the container which runs Ansible to configure the overcloud.
<p>
If you install ceph-ansible on your undercloud host it will lead to confusion about what version of ceph-ansible is being used when you try to debug it. Instead install it on the mistral-executor container.
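<p>A quick way to confirm which version the deployment will actually use is to query the package inside that container; a one-liner sketch, assuming the container is named mistral_executor as in the listing below:</p>
<pre>
sudo docker exec $(sudo docker ps -qf name=mistral_executor) rpm -q ceph-ansible
</pre>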
<p>
So this is the new normal in Rocky on an undercloud that can deploy Ceph:
<pre>
[root@undercloud-0 ~]# rpm -q ceph-ansible
package ceph-ansible is not installed
[root@undercloud-0 ~]#
[root@undercloud-0 ~]# docker ps | grep mistral
0a77642d8d10 192.168.24.1:8787/tripleomaster/openstack-mistral-api:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_api
c32898628b4b 192.168.24.1:8787/tripleomaster/openstack-mistral-engine:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_engine
c972b3e74cab 192.168.24.1:8787/tripleomaster/openstack-mistral-event-engine:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_event_engine
d52708e0bab0 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_executor
[root@undercloud-0 ~]#
[root@undercloud-0 ~]# docker exec -ti d52708e0bab0 rpm -q ceph-ansible
ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch
[root@undercloud-0 ~]#
</pre>
<p>
So what happens if you're in a situation where you want to try a
different ceph-ansible version on your undercloud?
<p>
In the next example I'll update my mistral-executor container from
ceph-ansible rc18 to rc21. These commands are just variations of the
upstream <a href="https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/tips_tricks.html#testing-a-code-fix-in-a-container">documentation</a>
but with a focus on updating the undercloud, not overcloud, container.
Here's the image I want to update:
<pre>
[root@undercloud-0 ~]# docker images | grep mistral-executor
192.168.24.1:8787/tripleomaster/openstack-mistral-executor 2018-08-20.1 740bb6f24755 2 days ago 1.05 GB
[root@undercloud-0 ~]#
</pre>
I have a copy of ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm in my current working directory
<pre>
[root@undercloud-0 ~]# mkdir -p rc21
[root@undercloud-0 ~]# cat > rc21/Dockerfile <<EOF
> FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
> USER root
> COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
> RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
> USER mistral
> EOF
[root@undercloud-0 ~]#
</pre>
So again that file is (for copy/paste later):
<pre>
[root@undercloud-0 ~]# cat rc21/Dockerfile
FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
USER root
COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
USER mistral
[root@undercloud-0 ~]#
</pre>
Build the new container
<pre>
[root@undercloud-0 ~]# docker build --rm -t 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 ~/rc21
Sending build context to Docker daemon 221.2 kB
Step 1/5 : FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
---> 740bb6f24755
Step 2/5 : USER root
---> Using cache
---> 8d7f2e7f9993
Step 3/5 : COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
---> 54fbf7185eec
Removing intermediate container 9afe4b16ba95
Step 4/5 : RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
---> Running in e80fce669471
Examining ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
Marking ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm as an update to ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch
Resolving Dependencies
--> Running transaction check
---> Package ceph-ansible.noarch 0:3.1.0-0.1.rc18.el7cp will be updated
---> Package ceph-ansible.noarch 0:3.1.0-0.1.rc21.el7cp will be an update
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package
Arch Version Repository Size
================================================================================
Updating:
ceph-ansible
noarch 3.1.0-0.1.rc21.el7cp /ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1.0 M
Transaction Summary
================================================================================
Upgrade 1 Package
Total size: 1.0 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1/2
Cleanup : ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch 2/2
Verifying : ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1/2
Verifying : ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch 2/2
Updated:
ceph-ansible.noarch 0:3.1.0-0.1.rc21.el7cp
Complete!
---> 41a804e032f5
Removing intermediate container e80fce669471
Step 5/5 : USER mistral
---> Running in bc0db608c299
---> f5ad6b3ed630
Removing intermediate container bc0db608c299
Successfully built f5ad6b3ed630
[root@undercloud-0 ~]#
</pre>
Upload the new container to the registry:
<pre>
[root@undercloud-0 ~]# docker push 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
The push refers to a repository [192.168.24.1:8787/tripleomaster/openstack-mistral-executor]
606ffb827a1b: Pushed
fc3710ffba43: Pushed
4e770d9096db: Layer already exists
4d7e8476e5cd: Layer already exists
9eef3d74eb8b: Layer already exists
977c2f6f6121: Layer already exists
00860a9b126f: Layer already exists
366de6e5861a: Layer already exists
2018-08-20.1: digest: sha256:50aae064d930e8d498702673c6703b70e331d09e966c6f436b683bb152e80337 size: 2007
[root@undercloud-0 ~]#
</pre>
Now we see the new f5ad6b3ed630 image in addition to the old one:
<pre>
[root@undercloud-0 ~]# docker images | grep mistral-executor
192.168.24.1:8787/tripleomaster/openstack-mistral-executor 2018-08-20.1 f5ad6b3ed630 4 minutes ago 1.09 GB
192.168.24.1:8787/tripleomaster/openstack-mistral-executor <none> 740bb6f24755 2 days ago 1.05 GB
[root@undercloud-0 ~]#
</pre>
The old container is still running though:
<pre>
[root@undercloud-0 ~]# docker ps | grep mistral
373f8c17ce74 192.168.24.1:8787/tripleomaster/openstack-mistral-api:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_api
4f171deef184 192.168.24.1:8787/tripleomaster/openstack-mistral-engine:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_engine
8f25657237cd 192.168.24.1:8787/tripleomaster/openstack-mistral-event-engine:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_event_engine
a7fb6df4e7cf 740bb6f24755 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_executor
[root@undercloud-0 ~]#
</pre>
Merely updating the image doesn't restart the container, and neither does `docker restart a7fb6df4e7cf`. Instead I need to stop it and start it again, but there's a lot that goes into starting these containers with the correct parameters.
<p>
The upstream docs section on <a href="https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/tips_tricks.html#debugging-with-paunch">Debugging with Paunch</a> shows me a command to get the exact command that was used to start my container. I just needed to use `paunch list | grep mistral` first to know I need to look at the tripleo_step4.
<pre>
[root@undercloud-0 ~]# paunch debug --file /var/lib/tripleo-config/docker-container-startup-config-step_4.json --container mistral_executor --action print-cmd
docker run --name mistral_executor-glzxsrmw --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --health-cmd=/openstack/healthcheck --privileged=false --restart=always --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/kolla/config_files/mistral_executor.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/mistral/:/var/lib/kolla/config_files/src:ro --volume=/run:/run --volume=/var/run/docker.sock:/var/run/docker.sock:rw --volume=/var/log/containers/mistral:/var/log/mistral --volume=/var/lib/mistral:/var/lib/mistral --volume=/usr/share/ansible/:/usr/share/ansible/:ro --volume=/var/lib/config-data/nova/etc/nova:/etc/nova:ro 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
[root@undercloud-0 ~]#
</pre>
Now that I know the command I can see my six-hour-old container:
<pre>
[root@undercloud-0 ~]# docker ps | grep mistral_executor
a7fb6df4e7cf 740bb6f24755 "kolla_start" 6 hours ago Up 12 minutes (healthy) mistral_executor
[root@undercloud-0 ~]#
</pre>
stop it
<pre>
[root@undercloud-0 ~]# docker stop a7fb6df4e7cf
a7fb6df4e7cf
[root@undercloud-0 ~]#
</pre>
ensure it's gone
<pre>
[root@undercloud-0 ~]# docker rm a7fb6df4e7cf
Error response from daemon: No such container: a7fb6df4e7cf
[root@undercloud-0 ~]#
</pre>
and then run the command I got from above to start the container and finally see my new container
<pre>
[root@undercloud-0 ~]# docker ps | grep mistral-executor
d8e4073441c0 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 "kolla_start" 14 seconds ago Up 13 seconds (health: starting) mistral_executor-glzxsrmw
[root@undercloud-0 ~]#
</pre>
Finally I confirm that my container has the new ceph-ansible package:
<pre>
(undercloud) [stack@undercloud-0 ~]$ docker exec -ti d8e4073441c0 rpm -q ceph-ansible
ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
(undercloud) [stack@undercloud-0 ~]$
</pre>
I was then able to deploy my overcloud and see that the rc21 version fixed a bug.

Tips on searching ceph-install-workflow.log on TripleO (2018-06-21)
<p>1. Only look at the logs relevant to the last run
<p>
/var/log/mistral/ceph-install-workflow.log will contain a concatenation of the ceph-ansible runs. The last N lines of the file will have what you're looking for, so what is N?
<p>
Determine how long the file is:
<pre>
[root@undercloud mistral]# wc -l ceph-install-workflow.log
20287 ceph-install-workflow.log
[root@undercloud mistral]#
</pre>
<p>Find the lines where previous ansible runs finished.
<pre>
[root@undercloud mistral]# grep -n failed=0 ceph-install-workflow.log
5425:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.21 : ok=118 changed=19 unreachable=0 failed=0
5426:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.23 : ok=81 changed=13 unreachable=0 failed=0
5427:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.25 : ok=113 changed=18 unreachable=0 failed=0
5428:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.27 : ok=38 changed=3 unreachable=0 failed=0
5429:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.28 : ok=77 changed=13 unreachable=0 failed=0
5430:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.29 : ok=58 changed=7 unreachable=0 failed=0
5431:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.30 : ok=83 changed=18 unreachable=0 failed=0
5432:2018-06-18 23:06:58,902 p=22256 u=mistral | 172.16.0.31 : ok=110 changed=17 unreachable=0 failed=0
9948:2018-06-20 12:06:38,325 p=11460 u=mistral | 172.16.0.21 : ok=107 changed=12 unreachable=0 failed=0
9949:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.23 : ok=69 changed=4 unreachable=0 failed=0
9950:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.25 : ok=102 changed=11 unreachable=0 failed=0
9951:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.27 : ok=26 changed=0 unreachable=0 failed=0
9952:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.29 : ok=46 changed=5 unreachable=0 failed=0
9953:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.30 : ok=70 changed=8 unreachable=0 failed=0
9954:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.31 : ok=99 changed=10 unreachable=0 failed=0
14927:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.23 : ok=118 changed=19 unreachable=0 failed=0
14928:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.27 : ok=110 changed=17 unreachable=0 failed=0
14932:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.34 : ok=113 changed=18 unreachable=0 failed=0
20255:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.22 : ok=118 changed=19 unreachable=0 failed=0
20256:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.26 : ok=134 changed=18 unreachable=0 failed=0
20257:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.27 : ok=102 changed=14 unreachable=0 failed=0
20258:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.28 : ok=113 changed=18 unreachable=0 failed=0
20260:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.34 : ok=110 changed=17 unreachable=0 failed=0
[root@undercloud mistral]#
</pre>
<p>Subtract the last run's line number from the total file lines:
<pre>
[root@undercloud mistral]# echo $(( 20260 - 14932))
5328
[root@undercloud mistral]#
</pre>
<p>
Tail that many lines from the end of the file to see only the last run.
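<p>For example, with the numbers above (recompute the 5328 for your own log):</p>
<pre>
tail -n 5328 ceph-install-workflow.log | less
</pre>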
<p>
2. Identify the node(s) where the playbook run failed:
<p>
I know the PLAY RECAP in the last 100 lines of the relevant run will show failed=1 if there was a failure. Doing a grep for that will also show me the host:
<pre>
[root@undercloud mistral]# tail -5328 ceph-install-workflow.log | tail -100 | grep failed=1
2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.32 : ok=66 changed=14 unreachable=0 failed=1
[root@undercloud mistral]#
</pre>
<p>
Now that I know the host, I want to see which task it failed on, so I grep for 'failed:'.
Just grepping for failed won't help, as the log will be full of '"failed": false'.
<p>
In this case I extract out the failure:
<pre>
[root@undercloud mistral]# tail -5328 ceph-install-workflow.log | grep 172.16.0.32 | grep failed:
2018-06-21 09:46:06,093 p=17564 u=mistral | failed: [172.16.0.32 -> 172.16.0.22] (item=[{u'rule_name': u'', u'pg_num': 128, u'name': u'metrics'},
{'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'metrics'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller02',
u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'metrics', u'size'], u'end': u'2018-06-21 13:46:01.070270', '_ansible_no_log': False,
'_ansible_delegated_vars': {'ansible_delegated_host': u'172.16.0.22', 'ansible_host': u'172.16.0.22'}, '_ansible_item_result': True, u'changed':
True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller02
ceph --cluster ceph osd pool get metrics size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start':
u'2018-06-21 13:46:00.729965', u'delta': u'0:00:00.340305', 'item': {u'rule_name': u'', u'pg_num': 128, u'name': u'metrics'}, u'rc': 2, u'msg':
u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'metrics'",
'_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller02", "ceph",
"--cluster", "ceph", "osd", "pool", "create", "metrics", "128", "128", "replicated_rule", "1"], "delta": "0:00:01.421755", "end":
"2018-06-21 13:46:06.390381", "item": [{"name": "metrics", "pg_num": 128, "rule_name": ""}, {"_ansible_delegated_vars":
{"ansible_delegated_host": "172.16.0.22", "ansible_host": "172.16.0.22"}, "_ansible_ignore_errors": null, "_ansible_item_result":
true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller02",
"ceph", "--cluster", "ceph", "osd", "pool", "get", "metrics", "size"], "delta": "0:00:00.340305", "end": "2018-06-21 13:46:01.070270",
"failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller02
ceph --cluster ceph osd pool get metrics size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null,
"removes": null, "stdin": null, "warn": true}}, "item": {"name": "metrics", "pg_num": 128, "rule_name": ""}, "msg":
"non-zero return code", "rc": 2, "start": "2018-06-21 13:46:00.729965", "stderr": "Error ENOENT: unrecognized pool
'metrics'", "stderr_lines": ["Error ENOENT: unrecognized pool 'metrics'"], "stdout": "", "stdout_lines": []}],
"msg": "non-zero return code", "rc": 34, "start": "2018-06-21 13:46:04.968626", "stderr": "Error ERANGE:
pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)",
"stderr_lines": ["Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600
(mon_max_pg_per_osd 200 * num_in_osds 3)"], "stdout": "", "stdout_lines": []}
...
[root@undercloud mistral]#
</pre>
<p>
So that's how I quickly find what went wrong in a ceph-ansible run when debugging a TripleO deployment.
<p>
3. Extra
<p>
You may be wondering what that error is.
<p>
There was a ceph-ansible issue where creating pools before the OSDs were running made the deployment fail because of the <a href="https://ceph.com/community/new-luminous-pg-overdose-protection">overdose protection check</a>. In the error above, creating the metrics pool with pg_num 128 at size 3 would have brought the cluster to 768 total PGs, which exceeds the limit of 600 (mon_max_pg_per_osd 200 * num_in_osds 3). This is something you can still hit if your PG numbers and OSDs are not aligned correctly (use <a href="https://ceph.com/pgcalc/">pgcalc</a>), but it is better to fail a deployment than to put production data on a misconfigured cluster. You could also hit it because of <a href="https://github.com/ceph/ceph-ansible/commit/9d5265fe11fb5c1d0058525e8508aba80a396a6b">this issue that ceph-ansible rc9 fixed</a> (technically it was fixed in an earlier version, but that version had other bugs so I recommend rc9).

TripleO Ceph Integration on the Road in June (2018-06-21)
<p>The first week of June I went to an upstream TripleO workshop in Brno. The labs we used are at <a href="https://github.com/redhat-openstack/tripleo-workshop">https://github.com/redhat-openstack/tripleo-workshop</a>.</p>
<p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk7y4QRS20hLzvV3lesMDkUMBkWRCCpEUa4mEIsIivlB6L6O6NkEqNwjLSer8JlYoDN22OC3-Yt9Z8ZbNCA0ssa5jEFKxrv5W_hBtz9t-7iKj0TfoUGaaM3udeFyXW0rPMY2v7K0jpdGGl/s1600/workshop-brno-june-2018.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk7y4QRS20hLzvV3lesMDkUMBkWRCCpEUa4mEIsIivlB6L6O6NkEqNwjLSer8JlYoDN22OC3-Yt9Z8ZbNCA0ssa5jEFKxrv5W_hBtz9t-7iKj0TfoUGaaM3udeFyXW0rPMY2v7K0jpdGGl/s320/workshop-brno-june-2018.jpg" width="320" height="213" data-original-width="1600" data-original-height="1067" /></a></p>
<p>The third week of June I went to a downstream Red Hat OpenStack Platform event in Montreal for those deploying the upcoming version 13 in the field. I covered similar topics with respect to Ceph deployment via TripleO.</p>

Red Hat Summit 2018: HCI Lab (2018-04-25)
I will be at Red Hat Summit in SFO on May 8th jointly hosting the lab <a href="https://agenda.summit.redhat.com/SessionDetail.aspx?id=153599">Deploy a containerized HCI IaaS with OpenStack and Ceph</a>.

Debugging TripleO Ceph-Ansible Deployments (2017-09-08)
<p>
Starting in Pike it is possible to <a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html">use TripleO to deploy Ceph in containers using ceph-ansible</a>. This is a guide to help you if there is a problem. It
asks questions, somewhat rhetorically, to help you track down the problem.
</p>
<h3>What does this error from openstack overcloud deploy... mean?</h3>
<p>
If TripleO's new Ceph deployment fails, then you'll see an error like the following:
<pre>
Stack overcloud CREATE_FAILED
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
resource_type: OS::Mistral::ExternalResource
physical_resource_id: bb9e685c-fbe9-4573-8d74-2c053bc5de0d
status: CREATE_FAILED
status_reason: |
resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack create failed.
</pre>
<p>
TripleO installs the OS and configures networking and other base
services for OpenStack for the nodes during step 1 of its
five-step deployment. During step 2, a new type of Heat
<a href="https://github.com/openstack/heat/commit/725b404468bdd2c1bdbaf16e594515475da7bace">OS::Mistral::ExternalResource</a> is created which calls a new
<a href="https://github.com/openstack/tripleo-common/commit/fa0b9f52080580b7408dc6f5f2da6fc1dc07d500">Mistral workflow</a> which uses
<a href="https://github.com/openstack/tripleo-common/commit/e6c8a46f00436edfa5de92e97c3a390d90c3ce54">a new Mistral action to call an Ansible playook</a>.
The playbook that is called is
<a href="https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample">
site-docker.yml.sample</a> from ceph-ansible.
Giulio covers this in more detail in
<a href="http://giuliofidente.com/2017/07/understanding-ceph-ansible-in-tripleo.html">
Understanding ceph-ansible in TripleO</a>.
The above error message indicates that Heat was able to call Mistral,
but that the Mistral workflow failed. So, the next place to look is
the Mistral logs on the undercloud to see if the ceph-ansible site-docker.yml
playbook ran.
</p>
<h3>Did the ceph-ansible playbook run?</h3>
<p>The most helpful file for debugging TripleO ceph-ansible deployments is:
<pre>
/var/log/mistral/ceph-install-workflow.log
</pre>
If it doesn't exist or is empty, then the ceph-ansible playbook run did not happen.
</p>
<p>
If it does exist, then it's the key to solving the
problem! Read it as it will contain the output of the ceph-ansible run
which you can use to debug ceph-ansible as you normally
would. The <a href="http://docs.ceph.com/ceph-ansible/master">
ceph-ansible docs</a> should help. Once you think the environment
has been changed so that you won't have the problem (details
on that below), then re-run the `openstack overcloud deploy ...`
command, and after TripleO does its normal checks, it will
re-run the playbook. Because ceph-ansible and TripleO are
idempotent, this process may be repeated as necessary.
</p>
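<p>For reference, "re-run the deploy command" just means repeating the exact same command and environment files used originally; a minimal sketch (the environment file paths are illustrative, use whatever your deployment already passes):</p>
<pre>
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e ~/ceph.yaml
</pre>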
<h3>Why didn't the ceph-ansible playbook run?</h3>
<p>The following will show the playbook call to ceph-ansible:</p>
<pre>
cd /var/log/mistral/
grep site-docker.yml.sample executor.log | grep ansible-playbook
</pre>
<p>If there's an error during the playbook run, then it should look something like this...</p>
<pre>
2017-09-06 12:13:22.181 20608 ERROR mistral.executors.default_executor Command:
ansible-playbook -v /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become ...
</pre>
<p>
If you don't see a playbook call like the above, then the Mistral
tasks that set up the environment for a ceph-ansible run failed.
</p>
<h3>What does Mistral do to prepare the environment to run ceph-ansible?</h3>
<p>
A copy of the Mistral workbook which prepares the overcloud and undercloud
to run ceph-ansible, and then runs it, is in:</p>
<pre>
/usr/share/tripleo-common/workbooks/ceph-ansible.yaml
</pre>
<p>The Mistral tasks do the following:</p>
<ul>
<li>Configure the SSH key-pairs so the undercloud can run Ansible
tasks on the overcloud nodes as the tripleo-admin user</li>
<li>Create a temporary fetch directory for ceph-ansible to use to copy
configs between overcloud nodes</li>
<li>Build a temporary Ansible inventory in a file like
/tmp/ansible-mistral-actionSYRh6Q/inventory.yaml</li>
<li>Set the <a href="http://docs.ansible.com/ansible/latest/intro_configuration.html#forks">Ansible fork count</a> to the number of nodes (but not >100)</li>
<li>Run the ceph-ansible site-docker.yml.sample playbook</li>
<li>Clean up temporary files</li>
</ul>
<p>
To check the details of the Mistral tasks used by ceph-ansible,
extract the workflow's UUID with the following:
</p>
<pre>
WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list | grep $WORKFLOW | awk {'print $2'} | tail -1)
</pre>
<p>Then use the ID to examine each task:</p>
<pre>
for TASK_ID in $(mistral task-list $UUID | awk {'print $2'} | egrep -v 'ID|^$'); do
mistral task-get $TASK_ID
mistral task-get-result $TASK_ID | jq . | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
done
</pre>
<p>
If you really need to update the workbook itself, you can modify a copy
and upload it with the following, but please see if your problem can instead
be solved by simply overriding the default values in a Heat environment file
as per the <a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html">documentation</a>.</p>
<pre>
source ~/stackrc
cp /usr/share/tripleo-common/workbooks/ceph-ansible.yaml .
vi ceph-ansible.yaml
mistral workbook-update ceph-ansible.yaml
</pre>
<h3>I already know ceph-ansible; how do I edit the files in group_vars?</h3>
<p>
Please don't. It will break the TripleO integration. Instead please use
TripleO as usual, and override the default values in a Heat environment file
like ceph.yaml, which you then add with -e to your openstack overcloud deploy
command as described in the <a href="https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html">documentation</a>.
</p>
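<p>For example, a minimal ceph.yaml which sets the pool defaults that show up in the playbook call below (the CephPoolDefaultSize and CephPoolDefaultPgNum parameter names come from the ceph-ansible templates in tripleo-heat-templates; pick values appropriate for your cluster):</p>
<pre>
---
parameter_defaults:
  CephPoolDefaultSize: 1
  CephPoolDefaultPgNum: 32
</pre>
<p>Then add it to the deploy command with `-e ~/ceph.yaml` as usual.</p>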
<h3>What changes does the TripleO ceph-ansible integration make to the files in ceph-ansible's group_vars?</h3>
<p>
None. Instead YAQL within tripleo-heat-templates builds a Mistral environment which
the ceph-ansible.yaml Mistral workbook may access when it calls ceph-ansible.
The workbook then passes those parameters as JSON with the ansible-playbook command's
<a href="http://docs.ansible.com/ansible/latest/playbooks_variables.html#passing-variables-on-the-command-line">--extra-vars</a> option. To see what parameters were passed using this method, grep the executor.log as above to see the ceph-ansible playbook call.
The sample file, site-docker.yml.sample, is called because that file is
shipped by ceph-ansible. This allows TripleO to avoid maintaining
its own fork of ceph-ansible.
</p>
<h3>What does a usual ceph-ansible playbook call look like when run by TripleO?</h3>
<pre>
ansible-playbook -v /usr/share/ceph-ansible/site-docker.yml.sample
--user tripleo-admin
--become
--become-user root
--extra-vars
{"monitor_secret": "***",
"ceph_conf_overrides":
{"global": {"osd_pool_default_pg_num": 32,
"osd_pool_default_size": 1}},
"osd_scenario": "non-collocated",
"fetch_directory": "/tmp/file-mistral-action3_a1Cb",
"user_config": true,
"ceph_docker_image_tag": "tag-build-master-jewel-centos-7",
"ceph_release": "jewel",
"containerized_deployment": true,
"public_network": "192.168.24.0/24",
"copy_admin_key": false,
"journal_collocation": false,
"monitor_interface": "eth0",
"admin_secret": "***",
"raw_journal_devices": ["/dev/vdd", "/dev/vdd"],
"keys": [{"mon_cap": "allow r",
"osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, ... ],
"openstack_keys": [{"mon_cap": "allow r", ... ],
"generate_fsid": false,
"osd_objectstore": "filestore",
"monitor_address_block": "192.168.24.0/24",
"ntp_service_enabled": false,
"ceph_docker_image": "ceph/daemon",
"docker": true,
"fsid": "2d87a5e8-8e72-11e7-a223-003da9b9b610",
"journal_size": 256,
"cephfs_metadata": "manila_metadata",
"openstack_config": true,
"ceph_docker_registry": "docker.io",
"pools": [],
"cephfs_data": "manila_data",
"ceph_stable": true,
"devices": ["/dev/vdb", "/dev/vdc"],
"ceph_origin": "distro",
"openstack_pools": [
{"rule_name": "", "pg_num": 32, "name": "volumes"},
{"rule_name": "", "pg_num": 32, "name": "backups"},
{"rule_name": "", "pg_num": 32, "name": "vms"},
{"rule_name": "", "pg_num": 32, "name": "images"},
{"rule_name": "", "pg_num": 32, "name": "metrics"}],
"ip_version": "ipv4",
"ireallymeanit": "yes",
"cluster_network": "192.168.24.0/24",
"cephfs": "cephfs",
"raw_multi_journal": true
}
--forks 6
--ssh-common-args "-o StrictHostKeyChecking=no"
--ssh-extra-args "-o UserKnownHostsFile=/dev/null"
--inventory-file /tmp/ansible-mistral-actiontrguE1/inventory.yaml
--private-key /tmp/ansible-mistral-actiontrguE1/ssh_private_key
--skip-tags package-install,with_pkg
</pre>
<p>You can get an unformatted version of the above
from a grep of /var/log/mistral/executor.log as described above.</p>
<h3>How can I re-run only the ceph-ansible playbook?</h3>
<p>Careful. This should not be done on a production deployment because
if you re-run the Mistral deployment directly after getting the error
posted under the first question, then the Heat Stack will not be
updated. Thus, Heat will believe the OS::Mistral::ExternalResource
resource has status CREATE_FAILED. If you are doing a practice
deployment or development, then you can use
<a href="https://specs.openstack.org/openstack/mistral-specs/specs/mitaka/approved/mistral-rerun-update-env.html">Mistral's task-rerun</a>.
But this only works if the task has failed.</p>
<p>First get the Task ID</p>
<pre>
WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list | grep $WORKFLOW | awk {'print $2'} | tail -1)
mistral task-list $UUID | grep ERROR
</pre>
For example:
<pre>
(undercloud) [stack@undercloud workbooks]$ mistral task-list $UUID | grep ERROR
| 31257437-c877-40f8-872f-2576da89a8ea | ceph_install | tripleo.storage.v1.ceph-install | a5287f5c-f781-40cf-8fce-c56c21c52918 | ERROR | Failed to run action [act... | 2017-09-07 15:31:43 | 2017-09-07 15:31:46 |
(undercloud) [stack@undercloud workbooks]$
</pre>
Then re-run the task
<pre>
(undercloud) [stack@undercloud workbooks]$ mistral task-rerun 31257437-c877-40f8-872f-2576da89a8ea
+---------------+--------------------------------------+
| Field | Value |
+---------------+--------------------------------------+
| ID | 31257437-c877-40f8-872f-2576da89a8ea |
| Name | ceph_install |
| Workflow name | tripleo.storage.v1.ceph-install |
| Execution ID | a5287f5c-f781-40cf-8fce-c56c21c52918 |
| State | RUNNING |
| State info | None |
| Created at | 2017-09-07 15:31:43 |
| Updated at | 2017-09-08 16:24:04 |
+---------------+--------------------------------------+
(undercloud) [stack@undercloud workbooks]$
</pre>
<p>
If you run the above and keep the following in another window:
<pre>
tail -f /var/log/mistral/ceph-install-workflow.log
</pre>
Then it's just like running `ansible-playbook site-docker.yml.sample ...`
but you don't need to pass all of the --extra-vars because the
same Mistral environment built by Heat is available.
</p>

Make a NUMA-aware VM with virsh (2017-09-07)
<p><a href="https://sysnet-adventures.blogspot.fr">Grégory</a> showed me how he uses `virsh edit` on a VM to add something like the following:</p>
<pre>
<cpu mode='custom' match='exact' check='partial'>
<model fallback='allow'>SandyBridge</model>
<feature policy='force' name='vmx'/>
<numa>
<cell id='0' cpus='0-1' memory='4096000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='4096000' unit='KiB'/>
</numa>
</cpu>
</pre>
<p>After that `lstopo` will show the NUMA nodes you can use, e.g. if you want to start a process on your VM with `numactl`.</p>
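<p>For example, something like this (a rough sketch; `stress` is just a stand-in for whatever process you want to pin) binds a process and its memory to the second cell defined above:</p>
<pre>
# run on NUMA node 1 only and allocate memory there too
numactl --cpunodebind=1 --membind=1 stress --cpu 2 --timeout 60
</pre>
<p>The `lstopo-no-graphics` output below shows the two cells created by the XML above.</p>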
<pre>
# lstopo-no-graphics
Machine (7999MB total)
NUMANode L#0 (P#0 3999MB)
Package L#0 + L3 L#0 (16MB) + L2 L#0 (4096KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
Package L#1 + L3 L#1 (16MB) + L2 L#1 (4096KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
NUMANode L#1 (P#1 4000MB)
Package L#2 + L3 L#2 (16MB) + L2 L#2 (4096KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
Package L#3 + L3 L#3 (16MB) + L2 L#3 (4096KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
Misc(MemoryModule)
HostBridge L#0
PCI 8086:7010
PCI 1013:00b8
GPU L#0 "card0"
GPU L#1 "controlD64"
3 x { PCI 1af4:1000 }
2 x { PCI 1af4:1001 }
</pre>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-41962856676464988372017-09-05T09:58:00.000-04:002017-09-05T10:44:56.563-04:00Trick to test external ceph clusters using only tripleo-quickstart<p>
TripleO can stand up a Ceph cluster as part of an overcloud. However, if all you have
is a <a href="https://docs.openstack.org/developer/tripleo-quickstart">tripleo-quickstart</a> env and want to test an overcloud feature which uses an external Ceph cluster, then
you can have quickstart stand up two Heat stacks: one to make a separate ceph
cluster and the other to stand up an overcloud which uses that ceph cluster.
</p>
<h3>Deploy stand alone ceph cluster</h3>
<p>
I use <a href="https://github.com/fultonj/oooq/blob/8e04565ad9d21d47f23d650a6f7361bb766e7314/deploy-ceph-only.sh">deploy-ceph-only.sh</a> with
<a href="https://github.com/fultonj/oooq/blob/master/tht/ceph-only.yaml">ceph-only.yaml</a>,
based on <a href="http://giuliofidente.com/2016/12/tripleo-to-deploy-ceph-standlone.html">Giulio's example</a>. I add `--stack ceph` to `openstack overcloud deploy ...` so that the Heat stack is
not called "overcloud". You cannot rename a Heat stack.</p>
<p>After deploying the ceph cluster, get the monitor node's IP (CephExternalMonHost), use `ceph auth list` to get the secret key for the client.openstack keyring (CephClientKey), and look at the ceph.conf to get the FSID (CephClusterFSID), so that <a href="https://github.com/fultonj/tripleo-ceph-ansible/blob/master/tht/overcloud-ceph-ansible-external.yaml">overcloud-ceph-ansible-external.yaml</a> may be updated accordingly.</p>
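<p>Gathering those three values on the ceph monitor looks roughly like this (a sketch; it assumes the default ceph.conf path and the client.openstack keyring created by the deployment):</p>
<pre>
grep fsid /etc/ceph/ceph.conf        # -> CephClusterFSID
ceph auth get-key client.openstack   # -> CephClientKey
ceph mon dump                        # monitor IPs -> CephExternalMonHost
</pre>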
<h3>Deploy an overcloud to use external ceph</h3>
<p>
I use <a href="https://github.com/fultonj/tripleo-ceph-ansible/blob/master/deploy-ext-ceph.sh">deploy-ext-ceph.sh</a> with <a href="https://github.com/fultonj/tripleo-ceph-ansible/blob/master/tht/overcloud-ceph-ansible-external.yaml">overcloud-ceph-ansible-external.yaml</a>.
This uses changes in
<a href="https://review.openstack.org/#/q/topic:bug/1714271">tripleo</a> and
<a href="https://github.com/ceph/ceph-ansible/pull/1850">ceph-ansible</a>
which are unmerged (at this time of writing).
</p>
<h3>Results</h3>
<a href="https://github.com/fultonj/oooq/blob/master/myconfigfile.yml"></a>
<pre>
(undercloud) [stack@undercloud ceph-ansible]$ openstack server list
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
| 28d57de8-8354-43e0-8d4e-46de33ea4672 | overcloud-controller-0 | BUILD | ctlplane=192.168.24.8 | overcloud-full | control |
| 298943dd-b3d2-4302-93fd-c45d8375ff16 | overcloud-novacompute-0 | BUILD | ctlplane=192.168.24.21 | overcloud-full | compute |
| f4d15186-775c-4cab-ae5d-c3fd48ecfccf | ceph-cephstorage-2 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | ceph-storage |
| 24da4c0f-f945-4489-bdeb-eb9b2cf70bc0 | ceph-cephstorage-0 | ACTIVE | ctlplane=192.168.24.9 | overcloud-full | ceph-storage |
| 248eacd5-e0ae-47b2-a3a9-2b4f3d0dfa6c | ceph-cephstorage-1 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | ceph-storage |
| 5af9a2ae-3492-4874-b8ab-2de2f8530b60 | ceph-controller-0 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | control |
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
(undercloud) [stack@undercloud ceph-ansible]$ openstack stack list
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
| c016b71d-0c73-468d-bed5-baf26d88ea23 | overcloud | d8e1f76b116f467cbe9e60b6c91c80b3 | CREATE_IN_PROGRESS | 2017-09-05T14:30:02Z | None |
| 91370b74-41bd-4923-bacb-c24d98ca148f | ceph | d8e1f76b116f467cbe9e60b6c91c80b3 | CREATE_COMPLETE | 2017-09-05T14:11:04Z | None |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
(undercloud) [stack@undercloud ceph-ansible]$
</pre>
<p>I had set up my virtual hardware by running `quickstart.sh -e @myconfigfile.yml` with <a href="https://github.com/fultonj/oooq/blob/master/myconfigfile.yml">myconfigfile.yml</a>.</p>
<p>
In this scenario I used puppet-ceph to deploy the ceph cluster
and ceph-ansible to deploy the ceph-client, which is the reverse
of a more popular scenario. All four combinations are possible,
though the puppet-ceph method will be deprecated.
</p>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-37131852643995503492017-06-06T14:34:00.001-04:002017-06-06T14:42:47.678-04:00Accessing a Mistral Environment in a CLI workflowRecently, with some help of the Mistral devs in freenode #openstack-mistral, I was able to create a simple environment and then write a workflow to access it. I will share my example below.
<p>
You can define a mistral environment file in YAML:
<pre>
(undercloud) [stack@undercloud 101]$ cat env.yaml
---
name: "my_env"
variables:
foo: bar
service_ips:
ceph_mon_ctlplane_node_ips:
- "192.168.24.13"
- "192.168.24.15"
(undercloud) [stack@undercloud 101]$
</pre>
You can then ask Mistral to store that environment:
<pre>
(undercloud) [stack@undercloud 101]$ mistral environment-create -f yaml env.yaml
Name: my_env
Description: null
Variables: "{\n \"foo\": \"bar\", \n \"service_ips\": {\n \"ceph_mon_ctlplane_node_ips\"\
: [\n \"192.168.24.13\", \n \"192.168.24.15\"\n ]\n\
\ }\n}"
Scope: private
Created at: '2017-06-06 16:31:01'
Updated at: null
(undercloud) [stack@undercloud 101]$
</pre>
Observe it in the environment list:
<pre>
(undercloud) [stack@undercloud 101]$ mistral environment-list
+-------------------+-------------------+---------+-------------------+---------------------+
| Name | Description | Scope | Created at | Updated at |
+-------------------+-------------------+---------+-------------------+---------------------+
| tripleo | None | private | 2017-06-02 | <none> |
| .undercloud- | | | 21:24:12 | |
| config | | | | |
| overcloud | None | private | 2017-06-02 | 2017-06-02 23:32:53 |
| | | | 21:24:21 | |
| ssh_keys | SSH keys for | private | 2017-06-02 | <none> |
| | TripleO | | 21:24:40 | |
| | validations | | | |
| my_env | None | private | 2017-06-06 | <none> |
| | | | 16:32:41 | |
+-------------------+-------------------+---------+-------------------+---------------------+
(undercloud) [stack@undercloud 101]$
</pre>
Look at it directly:
<pre>
(undercloud) [stack@undercloud 101]$ mistral environment-get my_env
+-------------+-----------------------------------------+
| Field | Value |
+-------------+-----------------------------------------+
| Name | my_env |
| Description | <none> |
| Variables | { |
| | "foo": "bar", |
| | "service_ips": { |
| | "ceph_mon_ctlplane_node_ips": [ |
| | "192.168.24.13", |
| | "192.168.24.15" |
| | ] |
| | } |
| | } |
| Scope | private |
| Created at | 2017-06-06 16:32:41 |
| Updated at | <none> |
+-------------+-----------------------------------------+
(undercloud) [stack@undercloud 101]$
</pre>
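<p>If you later edit env.yaml, the stored environment can be updated in place rather than deleted and recreated; a sketch using the matching update command:</p>
<pre>
mistral environment-update env.yaml
</pre>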
You can define a workflow which can access the variables in the Mistral environment:
<pre>
---
version: "2.0"
wf:
tasks:
show_env_synax1:
action: std.echo output=<% $.get('__env') %>
on-complete: show_env_synax2
show_env_synax2:
action: std.echo output=<% env() %>
on-complete: show_ips
show_ips:
action: std.echo output=<% env().get('service_ips', {}).get('ceph_mon_ctlplane_node_ips', []) %>
</pre>
You can then have a Mistral workflow use it by specifying it
as a param as per the
<a href="https://docs.openstack.org/cli-reference/mistral.html#mistral-execution-create">documentation</a>.
<pre>
mistral execution-create workflow_identifier [workflow_input] [params]
</pre>
In [params] we specify the environment name. If your workflow has no [workflow_input],
then pass '' to make it clear you are specifying the environment name with params as
the second argument.
<p>
First we create (or update) our workflow:
<pre>
(undercloud) [stack@undercloud 101]$ mistral workflow-update mistral-env.yaml
+----------------+------+----------------+--------+-------+----------------+----------------+
| ID | Name | Project ID | Tags | Input | Created at | Updated at |
+----------------+------+----------------+--------+-------+----------------+----------------+
| 18e9daee-06db- | wf | f282a331978146 | <none> | | 2017-06-05 | 2017-06-06 |
| 42bc-b0bf- | | ce988911bc5643 | | | 17:04:31 | 19:04:06 |
| 228c19bf2c99 | | 5db4 | | | | |
+----------------+------+----------------+--------+-------+----------------+----------------+
(undercloud) [stack@undercloud 101]$
</pre>
Next we execute our workflow and indicate that the [workflow_input] is empty by passing ''
and after that we pass some JSON specifying that the "env" key should be "my_env" as
defined above:
<pre>
(undercloud) [stack@undercloud 101]$ mistral execution-create wf '' '{"env": "my_env"}'
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| ID | f2c62c11-d5b6-4698-88af-3ef91240b837 |
| Workflow ID | 18e9daee-06db-42bc-b0bf-228c19bf2c99 |
| Workflow name | wf |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2017-06-06 19:05:17 |
| Updated at | 2017-06-06 19:05:17 |
+-------------------+--------------------------------------+
(undercloud) [stack@undercloud 101]$
</pre>
As a shortcut we save the UUID of the execution, and use it to get the IDs of the list of tasks:
<pre>
(undercloud) [stack@undercloud 101]$ UUID=f2c62c11-d5b6-4698-88af-3ef91240b837
(undercloud) [stack@undercloud 101]$ mistral task-list $UUID | awk {'print $2'} | egrep -v 'ID|^$'
edf9576b-e4b7-41c9-9d0d-2486e886ce96
5e6559d0-d875-4f30-8567-dfd1dbf7ac32
6a7f2793-41a4-4ef9-8366-4d59f936044d
(undercloud) [stack@undercloud 101]$
</pre>
Next we make sure our ID maps to the task we want to see the output for:
<pre>
(undercloud) [stack@undercloud 101]$ mistral task-get edf9576b-e4b7-41c9-9d0d-2486e886ce96
+---------------+--------------------------------------+
| Field | Value |
+---------------+--------------------------------------+
| ID | edf9576b-e4b7-41c9-9d0d-2486e886ce96 |
| Name | show_env_synax1 |
| Workflow name | wf |
| Execution ID | f2c62c11-d5b6-4698-88af-3ef91240b837 |
| State | SUCCESS |
| State info | None |
| Created at | 2017-06-06 19:05:17 |
| Updated at | 2017-06-06 19:05:18 |
+---------------+--------------------------------------+
(undercloud) [stack@undercloud 101]$
</pre>
So what was the result of using syntax1?
<pre>
(undercloud) [stack@undercloud 101]$ mistral task-get-result edf9576b-e4b7-41c9-9d0d-2486e886ce96
{
"foo": "bar",
"service_ips": {
"ceph_mon_ctlplane_node_ips": [
"192.168.24.13",
"192.168.24.15"
]
}
}
(undercloud) [stack@undercloud 101]$
</pre>
The result is the environment we passed. Note that the more compact syntax2 does the same thing:
<pre>
(undercloud) [stack@undercloud 101]$ mistral task-get-result 6a7f2793-41a4-4ef9-8366-4d59f936044d
{
"foo": "bar",
"service_ips": {
"ceph_mon_ctlplane_node_ips": [
"192.168.24.13",
"192.168.24.15"
]
}
}
(undercloud) [stack@undercloud 101]$
</pre>
What's nice is that we can specifically pick items out with the env() dictionary as shown in the show_ips task.
<pre>
(undercloud) [stack@undercloud 101]$ mistral task-get-result 5e6559d0-d875-4f30-8567-dfd1dbf7ac32
[
"192.168.24.13",
"192.168.24.15"
]
(undercloud) [stack@undercloud 101]$
</pre>
As a refresher, the output of the task above came from the following task:
<pre>
show_ips:
action: std.echo output=<% env().get('service_ips', {}).get('ceph_mon_ctlplane_node_ips', []) %>
</pre>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-51129909977063713392017-05-08T15:37:00.000-04:002017-05-09T14:21:46.240-04:00Red Hat Summit 2017: DPDK and HCI<ul>
<li>I am back from Red Hat Summit 2017</li>
<li><a href="https://www.linkedin.com/in/andrew-theurer-4b70385/">Andrew Theurer</a> and I did a <a href="https://rh2017.smarteventscloud.com/connect/sessionDetail.ww?SESSION_ID=104845&tclass=popup">presentation</a> on Hyper-converged OpenStack/Ceph and DPDK workloads</li>
<li>We achieved our goal of proving that you can run VMs, OSDs, and a DPDK workload on the same server.</li>
<li>Andrew was able to run a workload to maintain 5.5 million packets per second per interface, for a total of 11 Mpps for 11 hours, even with some Ceph storage activity in the middle of that time period.</li>
<li><a href="https://rh2017.smarteventscloud.com/connect/fileDownload/session/FFC04AC230870565C83E18304087CE3D/hci-nfv-rh-summit-2017.pdf">Slides</a> from this session</li>
</ul>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixy67y5bp8sNDfSGYsHWjaRP1BSY2w3wiA3P9UbIzhSzolzb5_65CzaPHwd5rbrlLvizhZB_38qDxqqZ3hPoR1Hy6rbzbjRAvrKPpn9dlB3oME8tt0lZSCKpMddfJqgRdqRPUFsapFiM_Y/s1600/dpdk-hci.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixy67y5bp8sNDfSGYsHWjaRP1BSY2w3wiA3P9UbIzhSzolzb5_65CzaPHwd5rbrlLvizhZB_38qDxqqZ3hPoR1Hy6rbzbjRAvrKPpn9dlB3oME8tt0lZSCKpMddfJqgRdqRPUFsapFiM_Y/s320/dpdk-hci.png" width="320" height="181" /></a></div>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-25203450339834007342017-04-18T09:26:00.000-04:002017-04-18T09:26:17.191-04:00openstack baremetal introspection data save<p>
I am happy with python-ironic-inspector-client 1.4.0 (Pike and newer) as I can more easily access my introspection data with:
<pre>
openstack baremetal introspection data save $UUID
</pre>
<p>In the past I used to use a <a href="https://github.com/fultonj/derived-tht-poc/blob/4eb77d4bcf080959ff5c63019d7be1357ab7216b/ironic_download.sh">script</a> to do the above.</p>
<p>
For example, I quickly use it after using <a href="https://docs.openstack.org/developer/tripleo-quickstart/node-configuration.html">quickstart</a> to make sure that my ceph flavor got its extra disks. When I run quickstart with `-e @myconfigfile.yml` where myconfigfile.yml contains a control flavor and a ceph flavor like so:</p>
<pre>
overcloud_nodes:
- name: control_0
flavor: control
virtualbmc_port: 6230
- name: ceph_0
flavor: ceph
virtualbmc_port: 6231
</pre>
<p>
Then the
<a href="https://github.com/openstack/tripleo-quickstart/blob/ecb109d647b0cf9a5640abf5d44ff9993318ffdc/roles/common/defaults/main.yml#L73-L77">ceph flavor gets the extradisks boolean set to true</a>. So when I first SSH into my deployed undercloud and simply run the following commands, then I can verify that my introspection data does contain the extra disks.
</p>
<pre>
[stack@undercloud ~]$ openstack baremetal node list
+-----------------------+-----------+---------------+-------------+--------------------+
| UUID | Name | Instance UUID | Power State | Provisioning State |
+-----------------------+-----------+---------------+-------------+--------------------+
| 4bbe35d4-9c79-4b80 | control-0 | None | power off | available |
| -816c-fca8a9f8a895 | | | | |
| bd9123b8-01a2-48ea-a2 | ceph-0 | None | power off | available |
| 47-2d38cfaa1102 | | | | |
+-----------------------+-----------+---------------+-------------+--------------------+
[stack@undercloud ~]$
openstack baremetal introspection data save control-0 > control-0
openstack baremetal introspection data save ceph-0 > ceph-0
[stack@undercloud ironic]$ cat ceph-0 | jq "." | grep dev
"name": "/dev/vda",
"name": "/dev/vdb",
"name": "/dev/vdc",
"name": "/dev/vdd",
"name": "/dev/vda",
[stack@undercloud ironic]$
</pre>
<p>More details on `openstack baremetal introspection data save` in the document <a href="https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/introspection_data.html">Accessing Introspection Data</a>.</p>
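<p>If you have more than a couple of nodes, a small loop like the following (a sketch) dumps everything in one shot:</p>
<pre>
for NODE in $(openstack baremetal node list -f value -c Name); do
    openstack baremetal introspection data save $NODE > $NODE.json
done
</pre>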
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-36379046524405474892017-04-13T15:35:00.001-04:002017-04-13T15:35:47.306-04:00Ceph OSDs and Systemd Basics<p>
As of Infernalis and then into Jewel/RHCS2, Ceph uses
<a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a>
to start services when installed on a Red Hat based system. Prior to
that, e.g. Hammer/RHCS1.3, it used SysV init.
<p>
When <a href="https://github.com/openstack/puppet-ceph">puppet-ceph</a>
or <a href="https://github.com/ceph/ceph-ansible">ceph-ansible</a>
configure Ceph OSD services, they do not need to run commands like:
<pre>
systemctl enable ceph-osd@0
</pre>
because those tools call `ceph-disk` (implemented in Python) directly
to prepare and activate the OSDs and then `ceph-disk` enables the
service in systemd. Thus, after puppet-ceph runs on your system you
can see evidence of the service being systemd enabled even though you
won't see anything like `systemctl enable ceph-osd@$i` in the module
itself:
<pre>
$ journalctl | grep "Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd"
Apr 12 19:16:01 compute-1.localdomain os-collect-config[1921]: Notice:
/Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns:
Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service
to /usr/lib/systemd/system/ceph-osd@.service.
</pre>
Note that the symlink is in /run/systemd/ and not /etc/systemd/ as per a
somewhat recent
<a href="https://github.com/ceph/ceph/commit/539385b143feee3905dceaf7a8faaced42f2d3c6">
commit</a> which adds the `--runtime` option.
<p>
To see if your OSDs are running with the --runtime option use
something like the following. In the example below the --runtime
was not used:
<pre>
[stack@hci-director ~]$ ansible osds -b -m shell -a "systemctl list-unit-files | grep ceph | grep osd"
192.168.1.26 | SUCCESS | rc=0 >>
ceph-osd@.service enabled
ceph-osd.target enabled
192.168.1.28 | SUCCESS | rc=0 >>
ceph-osd@.service enabled
ceph-osd.target enabled
192.168.1.31 | SUCCESS | rc=0 >>
ceph-osd@.service enabled
ceph-osd.target enabled
[stack@hci-director ~]$
</pre>
<p>
In this example the --runtime option was used:
<pre>
[stack@hci-director ~]$ ansible osds -b -m shell -a "systemctl list-unit-files | grep ceph | grep osd"
192.168.1.25 | SUCCESS | rc=0 >>
ceph-osd@.service enabled-runtime
ceph-osd.target enabled
192.168.1.23 | SUCCESS | rc=0 >>
ceph-osd@.service enabled-runtime
ceph-osd.target enabled
192.168.1.27 | SUCCESS | rc=0 >>
ceph-osd@.service enabled-runtime
ceph-osd.target enabled
[stack@hci-director ~]$
</pre>
If you have directory-based OSDs, then I recommend they be enabled
without --runtime.
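<p>A minimal sketch of doing that, reusing the same OSD ID discovery used further down:</p>
<pre>
# enable each OSD unit persistently (no --runtime) so the symlink lands
# in /etc/systemd/ and survives a reboot
for OSD_ID in $(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }'); do
    systemctl enable ceph-osd@$OSD_ID
done
</pre>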
<p>
If you want to restart your OSDs by sequentially running the
<a href="https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service">
ceph-osd@.service</a> for each OSD ID, then you may do so like this:
<pre>
OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
for OSD_ID in $OSD_IDS; do
systemctl status ceph-osd@$OSD_ID
systemctl restart ceph-osd@$OSD_ID
systemctl status ceph-osd@$OSD_ID
done
</pre>
<p>
It is not necessary to start them sequentially however as
<a href="https://github.com/ceph/ceph/blob/master/systemd/ceph-osd.target">
ceph-osd.target</a> will start them all. You can verify this is working by
stopping your OSD directly and then restarting only the target. You
can then see that the individual OSD was started:
<pre>
ls /var/lib/ceph/osd/
i=1
systemctl stop ceph-osd@$i
systemctl status ceph-osd@$i
systemctl status ceph-osd.target
systemctl restart ceph-osd.target
systemctl status ceph-osd@$i
</pre>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-74344185198223438152017-04-06T10:56:00.000-04:002017-04-06T10:56:02.285-04:00TripleO Ceph Ansible Spec MergedThe <a href="https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-ceph-ansible-integration.html">TripleO Ceph Ansible Spec</a> merged today. It's going to be a busy cycle :) Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-59481543826073938342017-04-05T17:11:00.001-04:002017-04-05T17:11:23.690-04:00openstack server image create.. the hard way<p>If you need to rescue a nova instance when <a href="https://docs.openstack.org/developer/python-openstackclient/command-objects/server-image.html">openstack server image create</a> isn't working and its backend is ceph, then here's how I did it for a pet called demo3 (all commands run on compute node except those starting with "openstack", which was run on the undercloud).</p>
<pre>
openstack server show demo3
# instance is running on overcloud-osd-compute-3 as instance-0000002b
virsh dumpxml instance-0000002b | grep rbd
# I see it is in rbd:vms/e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk
openstack server suspend demo3
# quiesce your instance
qemu-img info rbd:vms/e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk
# verify you have an image qemu-img can read
qemu-img snapshot -c demo3-snap1 rbd:vms/e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk
# snapshot your image
qemu-img info rbd:vms/e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk
# verify snapshot exists on rbd
rbd -p vms ls -l
# observe the e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk@demo3-snap1
qemu-img convert rbd:vms/e6674b4d-40f4-4af3-b16d-c1ee37a3e1a6_disk@demo3-snap1 demo3-snap1.raw
# pickle your snapshot to a local image file (took 35 seconds)
qemu-img info demo3-snap1.raw
# verify the export is a readable by qemu-img
# now we have demo3-snap1.raw to import or even save offline
openstack server resume demo3
# resume your instance (confirmed it answered `ping 10.1.1.9`)
</pre>
We were then able to use `openstack image create demo3-image1 --disk-format=raw --container-format=bare < demo3-snap1.raw` to import that image of the instance into glance so that it may live again. Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-91246855261977659242017-04-05T06:56:00.002-04:002017-04-05T07:14:09.488-04:00TripleO backports and dealing with unclean cherry picks<p>
Sometimes using a <a href="http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html">git cherry-pick</a> to do a backport is easy because you simply use the "Cherry Pick" button in Gerrit's web UI. Other times you get a merge conflict that's resolvable on the CLI. The Contributor Guide's <a href="https://docs.openstack.org/contributor-guide/additional-git-workflow/cherry-pick.html">Cherry pick a change</a> is the thing to read in that case, but it assumes a clean cherry pick in the end. Sometimes it's practical to use a cherry pick to set up a change but then abort the cherry pick and manually submit an unclean cherry pick of a <em>clean change</em> for review. I recently had to do this and learned a few things from TripleO cores in IRC so I want to share what I learned here in case it helps others. I'm no expert at this but I have a process I can follow to do this again without any issues.</p>
<p>
The following changes were made in TripleO/Ocata to the following repositories:
<ul>
<li><a href="https://review.openstack.org/#/c/411987">THT gerrit 411987</a></li>
<li><a href="https://review.openstack.org/#/c/411984">puppet-tripleo gerrit 411984</a></li>
<li><a href="https://review.openstack.org/#/c/411983">puppet-nova gerrit 411983</a></li>
</ul>
<p>
These changes are important for running OpenStack on Ceph with more
than 700 block-device backed OSDs and should be backported for those
running TripleO/Newton.
</p>
<p>
Stable backport policy requires a bug in Launchpad. I opened
<a href="https://bugs.launchpad.net/tripleo/+bug/1673995">1673995</a>.
I then viewed the original changes in gerrit and was able to click
cherry-pick and enter "stable/newton". The ones
for <a href="https://review.openstack.org/#/c/442970">puppet-triple</a>
and <a href="https://review.openstack.org/#/c/442969">puppet-nova</a>
went cleanly and I got two new gerrit IDs. I changed their topic to
bug/1673995.
</p>
<p>
The attempted GUI backport
for <a href="https://review.openstack.org/#/c/411987">THT</a> had a
conflict, so I had to resolve it on the command line with the following
process.
</p>
<ol>
<li>Get a clean copy of the repo you need to backport to (I normally do this with a <a href="https://github.com/fultonj/oooq/blob/b313c8d340135fbc2a8312ca692c7400788006d7/setup-deploy-artifacts.sh">script</a>)
<pre>
git config --global gitreview.username fultonj
git clone https://git.openstack.org/openstack/tripleo-heat-templates.git
cd tripleo-heat-templates
git remote add gerrit ssh://fultonj@review.openstack.org:29418/openstack/tripleo-heat-templates.git
git review -s
git fetch origin
</pre>
</li>
<li>Create a topic branch for the bug from the stable branch
<pre>
git checkout -b bug/1673995 remotes/origin/stable/newton
</pre>
</li>
<li>
On <a href="https://review.openstack.org/#/c/41198"7>the review page</a>
for the THT change, click the Download pull-down menu and copy the
cherry pick command. Running the cherry pick command for me looked
like the following:
<pre>
[jfulton@skagra tripleo-heat-templates{bug/1673995}]$ git fetch \
https://review.openstack.org/openstack/openstack-manuals refs/changes/34/235734/1 \
&& git cherry-pick -x FETCH_HEAD
warning: no common commits
remote: Counting objects: 107419, done
remote: Finding sources: 100% (107419/107419)
remote: Total 107419 (delta 66297), reused 91954 (delta 66297)
Receiving objects: 100% (107419/107419), 372.15 MiB | 2.92 MiB/s, done.
Resolving deltas: 100% (66297/66297), done.
From https://review.openstack.org/openstack/openstack-manuals
* branch refs/changes/34/235734/1 -> FETCH_HEAD
error: could not apply c95d624... Spelling miss in Networking Guide
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
[jfulton@skagra tripleo-heat-templates{bug/1673995}]$
</pre>
</li>
<li>
I knew there would be a conflict but was advised to start with
`cherry-pick -x` anyway, even when manual changes will be required,
because it records a reference to the commit the changes came from.
From there it's normal to drop the conflicting hunks that won't be
committed and commit only the changes that need to be made. Also, note
in the commit message, or otherwise comment, that the proposed commit
wasn't a clean cherry pick so that reviewers know to pay extra
attention.
</li>
<li>
At this point I abort the cherry pick to get ready for manual clean up.
<pre>
[jfulton@skagra tripleo-heat-templates{bug/1673995}]$ git status
On branch bug/1673995
Your branch is up-to-date with 'origin/stable/newton'.
You are currently cherry-picking commit c95d624.
(fix conflicts and run "git cherry-pick --continue")
(use "git cherry-pick --abort" to cancel the cherry-pick operation)
Unmerged paths:
(use "git add/rm <file>..." as appropriate to mark resolution)
deleted by us: doc/networking-guide/source/adv_config_sriov.rst
no changes added to commit (use "git add" and/or "git commit -a")
[jfulton@skagra tripleo-heat-templates{bug/1673995}]$ git cherry-pick --abort
[jfulton@skagra tripleo-heat-templates{bug/1673995}]$
</pre>
</li>
<li>
I don't need to `git rm doc/networking-guide/source/adv_config_sriov.rst`
as it's not staged for commit.
</li>
<li>
I then manually edit `puppet/services/nova-libvirt.yaml` to add the
three required lines from the upstream change.
<pre>
nova::compute::libvirt::qemu::configure_qemu: true
nova::compute::libvirt::qemu::max_files: 32768
nova::compute::libvirt::qemu::max_processes: 131072
</pre>
</li>
<li>
From there I `git add puppet/services/nova-libvirt.yaml` and `git
commit` which brings me to writing the commit message for this type of
change.
</li>
<li>
In my case I copied/pasted the commit message from the original patch
but added "Unclean cherry-pick from I1e79675f6aac1b0fe6cc7269550fa6bc8586e1fb".
Be sure the commit message keeps the same Change-Id as the change being
cherry-picked, even if there were conflicts. When there are conflicts,
add a Conflicts: section to the commit message but leave the Change-Id
the same. (I had <a href="https://review.openstack.org/#/c/448122/2//COMMIT_MSG">originally
overlooked this</a> and let a new Change-Id be generated, but the commit
message can simply be edited to restore the original Change-Id; see the
sanity-check sketch after this list.)
</li>
<li>
Like the original patch, I set the dependencies so that all three of
them would be tested together in CI.
</li>
<li>After the
<a href="https://review.openstack.org/#/q/topic:bug/1673995">changes for the bug</a>
merged, the status for the launch pad bug could be set to "Fix
Committed".</li>
</ol>
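<p>As referenced in step 9, a quick sanity check before pushing an unclean cherry pick for review (assuming git-review is set up as in step 1):</p>
<pre>
# the Change-Id should match the original change being backported
git log -1 | grep Change-Id
# expected here: Change-Id: I1e79675f6aac1b0fe6cc7269550fa6bc8586e1fb
git review stable/newton
</pre>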
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-46307866784184591372017-04-03T11:42:00.000-04:002017-04-03T11:42:06.297-04:00Finding the right NUMA node for HCI with DPDK<p>
Q: Which NUMA node do I use so that a process I want to run doesn't have to jump NUMA boundaries?
</p>
<p>
A: Find the NUMA node (e.g. 0 or 1) using `lstopo-no-graphics`. In the example below, I see that em1 hangs off the same NUMA node as my ceph disks, and I deployed my overcloud using em1 to host my ceph storage networks. Thus, I'm going to use `numactl --preferred` to start my OSD processes on NUMA node 0 (e.g. see this <a href="https://github.com/RHsyseng/hci/blob/master/custom-templates/post-deploy-template.yaml#L23">post deploy template</a>).</p>
<p>
Similarly, I see that my p4p1 and p4p2 NICs are on the other NUMA node. I'm doing HCI and want to run DPDK so I am going to tell the DPDK processes to use the second NUMA node.
</p>
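<p>Before wading through the full lstopo output below, a quicker check of which node a given NIC sits on is its sysfs numa_node attribute (a sketch; device names are from this box):</p>
<pre>
cat /sys/class/net/em1/device/numa_node    # 0 on this box
cat /sys/class/net/p4p1/device/numa_node   # 1 on this box
</pre>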
<pre>
[stack@c10-h01-r730xd ~]$ lstopo-no-graphics
Machine (128GB total)
NUMANode L#0 (P#0 64GB)
Package L#0 + L3 L#0 (25MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#20)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#2)
PU L#3 (P#22)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#4)
PU L#5 (P#24)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
PU L#6 (P#6)
PU L#7 (P#26)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
PU L#8 (P#8)
PU L#9 (P#28)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
PU L#10 (P#10)
PU L#11 (P#30)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
PU L#12 (P#12)
PU L#13 (P#32)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
PU L#14 (P#14)
PU L#15 (P#34)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
PU L#16 (P#16)
PU L#17 (P#36)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
PU L#18 (P#18)
PU L#19 (P#38)
HostBridge L#0
PCIBridge
PCI 1000:005d
Block(Disk) L#0 "sda"
Block(Disk) L#1 "sdb"
Block(Disk) L#2 "sdc"
Block(Disk) L#3 "sdd"
Block(Disk) L#4 "sde"
Block(Disk) L#5 "sdf"
Block(Disk) L#6 "sdg"
Block(Disk) L#7 "sdh"
Block(Disk) L#8 "sdi"
Block(Disk) L#9 "sdj"
Block(Disk) L#10 "sdk"
Block(Disk) L#11 "sdl"
Block(Disk) L#12 "sdm"
Block(Disk) L#13 "sdn"
Block(Disk) L#14 "sdo"
Block(Disk) L#15 "sdp"
Block(Disk) L#16 "sdq"
PCIBridge
PCI 8086:1572
Net L#17 "em1"
PCI 8086:1572
Net L#18 "em2"
PCIBridge
PCIBridge
PCIBridge
PCI 144d:a820
PCIBridge
PCI 8086:1521
Net L#19 "em3"
PCI 8086:1521
Net L#20 "em4"
PCIBridge
PCIBridge
PCIBridge
PCIBridge
PCI 102b:0534
GPU L#21 "card0"
GPU L#22 "controlD64"
NUMANode L#1 (P#1 64GB)
Package L#1 + L3 L#1 (25MB)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
PU L#20 (P#1)
PU L#21 (P#21)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
PU L#22 (P#3)
PU L#23 (P#23)
L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
PU L#24 (P#5)
PU L#25 (P#25)
L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
PU L#26 (P#7)
PU L#27 (P#27)
L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
PU L#28 (P#9)
PU L#29 (P#29)
L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
PU L#30 (P#11)
PU L#31 (P#31)
L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
PU L#32 (P#13)
PU L#33 (P#33)
L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
PU L#34 (P#15)
PU L#35 (P#35)
L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
PU L#36 (P#17)
PU L#37 (P#37)
L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
PU L#38 (P#19)
PU L#39 (P#39)
HostBridge L#11
PCIBridge
PCI 8086:1572
Net L#23 "p4p1"
PCI 8086:1572
Net L#24 "p4p2"
[stack@c10-h01-r730xd ~]$
</pre>
<p>
Q: Now that I know I want numa node 1, how do I know which CPUs to pass to the HostCpusList in <a href="https://github.com/openstack/tripleo-heat-templates/blob/12fbad1345c34a7c6f7da7490dbb0601b8f90fe9/puppet/services/neutron-ovs-dpdk-agent.yaml#L21-L25">THT</a>?
</p>
<p>
A: Use `lscpu`
</p>
<pre>
[stack@c10-h01-r730xd ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2574.113
BogoMIPS: 4603.71
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
[stack@c10-h01-r730xd ~]$
</pre>
<p>
For example, I can see that NUMA node1's CPUs are the odd numbered ones. Thus, the settings look like this:
<pre>
# Add a list or range of physical CPU cores to be reserved for virtual machine processes:
NovaVcpuPinSet: ['9,11,13,15']
# Set a list or range of physical CPU cores to be tuned:
HostCpusList: '1,3,5,7,9,11,13,15'
</pre>
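<p>Rather than eyeballing the lscpu output, the CPU list for a given node can also be read straight from sysfs (a sketch):</p>
<pre>
# CPUs belonging to NUMA node 1
cat /sys/devices/system/node/node1/cpulist
</pre>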
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-5384046811194880144.post-69261335069800017932017-03-29T11:38:00.000-04:002017-03-29T13:34:29.415-04:00Ironic Metadata Disk Cleaning instead of a first-boot zap disk <p>I verified that Ironic Metadata Disk Cleaning works well with 10 Dell RX730s and OSPd/Tripleo on OSP10/Newton.</p>
<p>This was not via TripleO's "clean_nodes=True" param but purely a change in Ironic. I used the following steps to turn it on after deploying the undercloud.</p>
<p>Identify the neutron UUID of the TripleO ctlplane:</p>
<pre>
[stack@c10-h01-r730xd ~]$ neutron net-list
+--------------------------------------+----------+----------------------------------------+
| id | name | subnets |
+--------------------------------------+----------+----------------------------------------+
| 40a26da2-bcc6-47c9-b308-49c8d6911f8d | ctlplane | 5541d13e-3d44-442b-b2c3-1c99bc959861 |
| | | 192.0.2.0/24 |
+--------------------------------------+----------+----------------------------------------+
[stack@c10-h01-r730xd ~]$
</pre>
Modify ironic.conf:
<pre>
[conductor]
automated_clean = True
[deploy]
erase_devices_priority = 0
erase_devices_metadata_priority = 10
[neutron]
cleaning_network_uuid = $UUID
</pre>
For example:
<pre>
[root@c10-h01-r730xd ironic]# egrep "clean|erase" /etc/ironic/ironic.conf | egrep -v \#
automated_clean = True
erase_devices_priority = 0
erase_devices_metadata_priority = 10
cleaning_network_uuid = 40a26da2-bcc6-47c9-b308-49c8d6911f8d
[root@c10-h01-r730xd ironic]#
</pre>
Bounce the ironic conductor service.
<pre>
systemctl restart openstack-ironic-conductor.service
</pre>
<p>
Once that's done, merely trying to put the nodes into the ironic state "available" will put them in the "cleaning" state first and then clean the disks before they finally get set to the "available" state. I cycled my nodes out of the available state and back with the following:</p>
<pre>
for ironic_id in $(ironic node-list | awk {'print $2'} | grep -v UUID | egrep -v '^$'); do
ironic node-set-provision-state $ironic_id manage;
done
</pre>
<pre>
for ironic_id in $(ironic node-list | awk {'print $2'} | grep -v UUID | egrep -v '^$'); do
ironic node-set-provision-state $ironic_id provide;
done
</pre>
<p>Simply by running the above two bash loops, the nodes were booted on a ram disk and every disk, including the root disk (e.g. /dev/sda), had its metadata removed. After that `ceph-disk prepare` was able to make the server's disks into OSDs without any problems, even though I did not use my <a href="https://github.com/RHsyseng/hci/commit/9962912333d44ef43d6c67d7f3dae8771fc6523e#diff-90d12d5e31f7b7b34a42faeee2d32323">first-boot Heat template</a> which I used to use to wipe the disks.</p>
<p>After running `openstack stack delete overcloud --yes --wait` I see that the nodes are automatically turned back on with the following status while Ironic cleans the nodes. After that the nodes go back to "power off" and "available". The cleaning process takes about 3 minutes, so I'm pretty happy with it.</p>
<pre>
[stack@c10-h01-r730xd ~]$ ironic node-list
+--------------------------------------+-----------------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------------+---------------+-------------+--------------------+-------------+
| 014479db-e90a-4837-8834-7edea44a91fc | h03-control-mon | None | power on | clean wait | False |
| 64ba2e30-ba46-4ac7-93ce-126c6da0da65 | h07-control-mon | None | power on | clean wait | False |
| 6a1f895a-dd1f-42db-b0f8-b11303168561 | h09-control-mon | None | power on | clean wait | False |
| c1b93d79-a92a-49af-9b66-79aab861b395 | h11-compute-osd | None | power on | clean wait | False |
| 48c47e63-6bc2-4f0c-937f-c0f2397cd194 | h13-compute-osd | None | power on | clean wait | False |
| 8254d5fd-600a-4607-a0b0-38b4c95b22df | h15-compute-osd | None | power on | clean wait | False |
| b142b1b6-63c8-44af-9c0b-23788219e318 | h17-compute-osd | None | power on | clean wait | False |
| deef869f-519b-4b6f-9fad-999f376f5b98 | h19-compute-osd | None | power on | clean wait | False |
| 43fc22fa-49c4-4a99-b799-7aece8e359f6 | h21-compute-osd | None | power on | clean wait | False |
| 29a41539-5b12-4545-8680-595d6b4dceb2 | h23-compute-osd | None | power on | clean wait | False |
+--------------------------------------+-----------------+---------------+-------------+--------------------+-------------+
[stack@c10-h01-r730xd ~]$
</pre>
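<p>Since the cleaning happens after the stack delete returns, a crude sketch of waiting for every node to come back to "available" before redeploying:</p>
<pre>
# poll until no node is still in a cleaning state
while ironic node-list | grep -qE 'clean wait|cleaning'; do
    sleep 30
done
ironic node-list
</pre>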
Unknownnoreply@blogger.com