The steps to make a PXE-bootable node were the following:

  1. Create a virtual node and uninstall everything that is not needed, making it as small as possible
  2. Create an initrd image from the node (the kernel can be deleted from the image to save more space)
  3. Set up a DHCP and a TFTP-HPA server for PXE booting
  4. Place a script in the node to apply the configuration
  5. Create per-node directories at the root of the TFTP server to serve node-specific files

This is my infrastructure:

DHCP server

A shared-network configuration serves both the LAN (192.168.0.x) and the dedicated OpenStack network (10.0.0.x).

class "openstack" {
match substring (hardware, 1, 6); 
}
#G6 node01
subclass "openstack" xx:xx:xx:xx:x:x;
#G7 node02
subclass "openstack" xx:xx:xx:xx:x:x;
#HP2 node03
subclass "openstack" xx:xx:xx:xx:x:x;
...
max-lease-time 7200;
option domain-name-servers 208.67.222.222, 208.67.220.220;
shared-network LOCAL-NET {
  subnet 192.168.0.0 netmask 255.255.255.0 {
   option broadcast-address 192.168.0.255;
   option routers 192.168.0.1;
  }
  subnet 10.0.0.0 netmask 255.255.255.0 {
   option subnet-mask 255.255.255.0;
   option routers 10.0.0.1;
  }
  pool {
    deny members of "openstack";
    range 192.168.0.111 192.168.0.150;
  }
  pool {
    allow members of "openstack";
    range 10.0.0.200 10.0.0.200;
    deny unknown-clients;
  }
}
host node01-nova {
   hardware ethernet xx:xx:xx:xx:x:x;
   fixed-address  10.0.0.51;
   option routers 10.0.0.1;
   option host-name "node01";
   next-server 10.0.0.2;
   filename "pxelinux.0";
}
host node02-nova {
   hardware ethernet xx:xx:xx:xx:x:x;
   fixed-address  10.0.0.52;
   option routers 10.0.0.1;
   option host-name "node02";
   next-server 10.0.0.2;
   filename "pxelinux.0";
}
...
host desktop2 {
    hardware ethernet xx:xx:xx:xx:x:x;
    fixed-address 192.168.0.100;
}
log-facility local7;
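
Since every compute node gets an almost identical host block, the stanzas can be generated instead of typed. The following is only a sketch: the function name dhcp_host_stanza and the ROUTER / TFTP_SERVER variables are my own placeholders, with defaults taken from the configuration above. A matching subclass "openstack" line with the node's MAC is still needed separately.

```shell
#!/bin/bash
# Print a dhcpd.conf host stanza for one compute node.
# Arguments: node name, MAC address, fixed IP.
# ROUTER and TFTP_SERVER are hypothetical overrides; the defaults
# are the addresses used in the configuration above.
dhcp_host_stanza() {
    local name=$1 mac=$2 ip=$3
    local router=${ROUTER:-10.0.0.1}
    local tftp_server=${TFTP_SERVER:-10.0.0.2}
    cat <<EOF
host ${name}-nova {
   hardware ethernet ${mac};
   fixed-address ${ip};
   option routers ${router};
   option host-name "${name}";
   next-server ${tftp_server};
   filename "pxelinux.0";
}
EOF
}
```

For example, dhcp_host_stanza node03 00:11:22:33:44:55 10.0.0.53 prints a block that can be appended to dhcpd.conf (the MAC here is made up).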

TFTPD-HPA server

The TFTPD-HPA components (configuration files, node image, other OS images, etc.) are on ZFS storage, and TFTP_DIRECTORY has been changed from /srv/tftp to the ZFS dataset.
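
As a rough illustration of that change, the sketch below rewrites TFTP_DIRECTORY in the tftpd-hpa defaults file. The function name is mine, /tank/tftp is a made-up dataset mountpoint, and I am assuming the Debian/Ubuntu layout where the setting lives in /etc/default/tftpd-hpa.

```shell
#!/bin/bash
# Point tftpd-hpa at a new root directory by rewriting TFTP_DIRECTORY.
# $1: defaults file (assumed: /etc/default/tftpd-hpa on Debian/Ubuntu)
# $2: new TFTP root, e.g. the mountpoint of the ZFS dataset
# Note: the path must not contain "|", which is used as the sed delimiter.
set_tftp_directory() {
    local conf=$1 new_root=$2
    sed -i "s|^TFTP_DIRECTORY=.*|TFTP_DIRECTORY=\"${new_root}\"|" "$conf"
}

# Hypothetical usage (restart the service afterwards):
#   set_tftp_directory /etc/default/tftpd-hpa /tank/tftp
#   systemctl restart tftpd-hpa
```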

This is the PXELINUX configuration served by the TFTP server, in pxelinux.cfg/default:

#default menu.c32
default node
prompt 0
timeout 30

menu title OpenStack Nova PXE

label node
   kernel nova/vmlinuz
   append rw root=/dev/ram0 initrd=nova/initrd.img TFTP=10.0.0.2
#TFTP=xxx should always be the last parameter or we have to modify the nodeconfig.sh script

label TINY
   kernel tiny/linux
   append rw root=/dev/ram0 initrd=tiny/initrd.xz
...

The TFTP= parameter on the node entry passes the TFTP server address (and therefore the network) explicitly at boot; the node then parses it from /proc/cmdline.
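
The parsing step can be sketched in plain bash. This is not the awk one-liner used in nodeconfig.sh, just an equivalent illustration; cmdline_param is a name I made up, and it only handles single-word values (a multi-word value would need extra handling).

```shell
#!/bin/bash
# Extract the value of a key=value parameter from a kernel command line.
# $1: parameter name (e.g. TFTP)
# $2: command line to parse (defaults to the contents of /proc/cmdline)
# Returns 1 if the parameter is not present.
cmdline_param() {
    local key=$1 cmdline=${2-$(cat /proc/cmdline)}
    local -a words
    local word
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case $word in
            "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Usage on a booted node:
#   TFTP=$(cmdline_param TFTP)
```

Because the command line is split into words first, the position of TFTP= on the append line does not matter with this approach.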

Each node has a directory on the TFTP server to get some configuration files:

10-0-0-51, 10-0-0-52, etc.

At the moment, these contain the hosts file, hostname and a script that will be executed on the node at boot time.
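
The directory name is simply the node's IP with the dots replaced by dashes. A small sketch of how such a directory could be prepared on the TFTP server follows; the function names and the idea of seeding only the hostname file are my own simplification.

```shell
#!/bin/bash
# Map a node IP to its directory name on the TFTP server (dots become dashes).
node_dir_name() {
    printf '%s\n' "${1//./-}"
}

# Create the per-node directory under the TFTP root and seed its hostname file.
# $1: TFTP root, $2: node IP, $3: hostname to hand to the node
make_node_dir() {
    local root=$1 ip=$2 name=$3 dir
    dir="$root/$(node_dir_name "$ip")"
    mkdir -p "$dir"
    printf '%s\n' "$name" > "$dir/hostname"
}

# Hypothetical usage (the TFTP root path is a placeholder):
#   make_node_dir /tank/tftp 10.0.0.51 node01
```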

This script can contain any command, which will be executed as root. At the moment it only adds the controller and a client SSH public key, together with the NFS settings (one entry in the fstab plus the mount command).

This can be easily extended to address any future needs.
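
To give an idea of what such a per-node script might look like, here is a sketch built around a small helper that appends a line only once, so the script stays safe to run repeatedly. The helper name, the key, the NFS export and the mount point are all placeholders, not my actual settings.

```shell
#!/bin/bash
# Append a line to a file only if it is not already there.
# $1: the exact line, $2: the target file (created if missing)
append_once() {
    local line=$1 file=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical body of settings.sh (all values are placeholders):
#   append_once "ssh-ed25519 <public-key> controller" /root/.ssh/authorized_keys
#   append_once "10.0.0.2:/export/instances /var/lib/nova/instances nfs defaults 0 0" /etc/fstab
#   mount /var/lib/nova/instances
```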

Booting process

node-config.service 
[Unit]
Description=Configure Openstack nodes
After=network.service

[Service]
ExecStart=/usr/local/bin/nodeconfig.sh

[Install]
WantedBy=default.target
#!/bin/bash
#===============================================================================
#
#          FILE: nodeconfig.sh
#
#         USAGE: nodeconfig.sh
#
#   DESCRIPTION: This script will set the network interfaces and the nova config files
#
#       OPTIONS: ---
#  REQUIREMENTS: ---
#          BUGS: ---
#         NOTES: To use with pxe boot
#        AUTHOR: Massimo Bollati (), max@bollati.info
#  ORGANIZATION: 
#       CREATED: 22/03/21 15:48:38
#      REVISION:  ---
#===============================================================================
#---------------------------------------------------------------------------------
#  Stopping the node service and libvirtd until we have a hostname,
#  otherwise the node will be registered as localhost!
#---------------------------------------------------------------------------------
systemctl stop nova-compute.service
systemctl stop 'libvirtd*'; pkill libvirtd; rm -f /run/libvirtd.pid
#---------------------------------------------------------------------------------
#	Starting the DHCP client to get IP/GW
#---------------------------------------------------------------------------------
dhclient -v ; sleep 20
#---------------------------------------------------------------------------------
#	The pxelinux command line should have a custom parameter to get the subnet 
#---------------------------------------------------------------------------------
TFTP=$(cat /proc/cmdline|awk -F"TFTP=" '{split($2,a," ");printf a[1]}')
SUB=$(echo $TFTP|awk -F. '{print $1"."$2"."$3".*"}')
IFC=$(ip -f inet a |awk -v SUB="$SUB"  -F ": " '$0 ~ SUB {split(a,b); print b[2]}{a = $0}')
IFC2=$(ip  -f inet l |awk -F ": " '/state UP/  {print $2}' |grep -v "$IFC"|head -1)
IP=$(ip -f inet a |awk -v SUB="$SUB" '$0 ~ SUB {sub(/\/../,"");print $2}')
GW=$(ip r s default | awk '{print $3}')
SSH=$(cat /proc/cmdline|awk -F "sshpub=" '{print$2}'|awk -F " " '{print $1" "$2}')
#------------------------------------------------------------------------------------
mv /etc/systemd/network/KKK.network /etc/systemd/network/"$IFC2".network
sed -i s/KKK/"$IFC2"/ /etc/systemd/network/"$IFC2".network
sed -i s/KKK/"$IFC2"/ /etc/netplan/00-installer-config.yaml
sed -i s/XXX/"$IFC"/ /etc/netplan/00-installer-config.yaml
sed -i s/YYY/"$IP"/ /etc/netplan/00-installer-config.yaml
sed -i s/ZZZ/"$GW"/ /etc/netplan/00-installer-config.yaml
echo "$IP"| sed -e s/"\."/-/g |tee /etc/hostname  /etc/hostname-old
#------------------------------------------------------------------------------------
netplan apply
#-------------------------------------------------------------------------------------
#	Setting the ssh auth according to the boot parameter (../tftp/pxelinux.cfg/default)
#-------------------------------------------------------------------------------------
if [ -n "$SSH" ]; then
echo "$SSH" >> /root/.ssh/authorized_keys;fi
#-------------------------------------------------------------------------------
#	Openstack settings 
#-------------------------------------------------------------------------------
sed -i s/XXX/"$IFC2"/ /etc/neutron/plugins/ml2/linuxbridge_agent.ini
find /etc/neutron/plugins/ml2/linuxbridge_agent.ini /etc/nova/nova.conf -exec sed -i "s/YYY/"$IP"/g"  {} \; 
#-------------------------------------------------------------------------------
# 	Getting some config files from the host
#-------------------------------------------------------------------------------
DIR=$(cat /etc/hostname-old) 
tftp  "$TFTP" -l -c get "$DIR"/hosts /etc/hosts
tftp  "$TFTP" -l -c get "$DIR"/hostname /etc/hostname
tftp  "$TFTP" -l -c get "$DIR"/settings.sh /usr/local/bin/settings.sh 2>/dev/null
#------------------------------------------------------------------------------------
# The hostname should be set and we can start libvirt and nova
#------------------------------------------------------------------------------------
hostnamectl set-hostname $(cat /etc/hostname) && systemctl start libvirtd && systemctl start nova-compute.service
#------------------------------------------------------------------------------------
# If there is a script to be pulled from the host, get it and execute
#------------------------------------------------------------------------------------
chmod +x /usr/local/bin/settings.sh 2>/dev/null
if [ -x /usr/local/bin/settings.sh ]; then /usr/local/bin/settings.sh ; fi

Nodes have either 2 or 6 network cards; in both cases, using the first interface as main and second for the bridge works well.

As can be seen, the script can download any file from the node-dedicated directory on the TFTP server. This means the procedure can be made even simpler, as all configs can be passed through files.
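
One way this could look is a manifest-driven download: a text file in the node's directory lists what to fetch and where to put it. This is only a sketch of the idea, not what I run today. The manifest format, the function names and the TFTP_GET hook (which lets the transfer command be swapped out, for example for testing) are all assumptions; the default transfer uses the same tftp call as nodeconfig.sh.

```shell
#!/bin/bash
# Default transfer: the same tftp-hpa invocation used in nodeconfig.sh.
# $1: server, $2: remote path, $3: local path
tftp_get() { tftp "$1" -l -c get "$2" "$3"; }

# Fetch every file listed in a manifest from the node's TFTP directory.
# Each manifest line is "<remote name> <local path>"; empty lines are skipped.
# $1: TFTP server, $2: node directory (e.g. 10-0-0-51), $3: manifest file
# TFTP_GET is a hypothetical hook naming the transfer function to call.
fetch_manifest() {
    local server=$1 dir=$2 manifest=$3 remote target
    while read -r remote target; do
        [ -z "$remote" ] && continue
        "${TFTP_GET:-tftp_get}" "$server" "$dir/$remote" "$target"
    done < "$manifest"
}

# Hypothetical manifest contents:
#   hosts     /etc/hosts
#   nova.conf /etc/nova/nova.conf
```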

The subnet could also be derived from the IP assigned by the DHCP server, but passing it via the kernel command line seems to be more reliable.
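
For completeness, deriving the subnet from an assigned address could be sketched like this. It assumes a /24 network, as both networks in this setup are, and the function name is made up.

```shell
#!/bin/bash
# Derive the /24 subnet prefix from an IPv4 address.
# Accepts a bare address or one with a CIDR suffix (as printed by "ip a").
subnet_prefix() {
    local ip=${1%%/*}          # drop a trailing "/24" if present
    printf '%s\n' "${ip%.*}"   # drop the last octet
}

# Usage:
#   subnet_prefix 10.0.0.51/24    # prints 10.0.0
```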

OpenStack installation

To better explore OpenStack's complexity, I opted not to use Ansible or other deployment software, going for a manual installation + configuration. This helped me better understand (and troubleshoot) the infrastructure.

During the installation I had an issue with the systemd libvirtd.service: I had to comment out Type=notify to avoid libvirtd crashes, and tweak arp_protect.py, as described here:

https://ask.openstack.org/en/question/128445/linuxbridge-agent-failing-on-vm-shutdown/
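
The Type=notify change can be scripted when building the node image. A minimal sketch follows; the function name is mine and the location of the unit file depends on the distribution, so the path in the usage comment is an assumption.

```shell
#!/bin/bash
# Comment out the Type=notify line in a systemd unit file.
# $1: path to the unit file to edit in place
comment_out_type_notify() {
    local unit=$1
    sed -i 's/^Type=notify$/#&/' "$unit"
}

# Hypothetical usage (unit path is an assumption; reload systemd afterwards):
#   comment_out_type_notify /lib/systemd/system/libvirtd.service
#   systemctl daemon-reload
```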

At the moment, using only NFS storage, migration is working well.

Some pictures of the infrastructure

VXLAN