mirror of
https://pagure.io/fedora-infra/ansible.git
synced 2026-03-20 03:57:02 +08:00
HOW TO SETUP A NEW HYPERVISOR
=============================
First make sure you understand how resalloc configuration works
(pools.yaml contents and format), and that the referenced scripts (like
'libvirt-new' and 'vm-delete') can correctly identify the pool/hypervisor
(those scripts likely need to be updated when you are adding new
hypervisor). Especially check that the scripts can assign (and later
remove) appropriate IPv6 addresses to builders, work with swap, etc.
Before the groups/copr-hypervisor.yml is run against the new host,
manually prepare the disk layout. Typically there are many disks we might
be tempted to place everything into raid6, but this would be suboptimal!
We do something like this instead (on each of those disks):
sdd 8:48 0 279.4G 0 disk
├─sdd1 8:49 0 8M 0 part
├─sdd2 8:50 0 1G 0 part
│ └─md0 9:0 0 1022M 0 raid1 /boot
├─sdd3 8:51 0 40G 0 part [SWAP]
├─sdd4 8:52 0 1K 0 part
├─sdd5 8:53 0 10G 0 part
│ └─md127 9:127 0 60G 0 raid6
│ └─vg_server-root 253:0 0 32G 0 lvm /
└─sdd6 8:54 0 228.4G 0 part
└─md126 9:126 0 1.8T 0 raid0
└─vg_guests-images 253:1 0 1.8T 0 lvm /libvirt-images
The layout description:
1. I.e. /boot (or efi) partition is on raid1 spread across all (even 10)
the disks.
2. There's SWAP partition, namely because if there are multiple SWAP
partitions, with the same priority mount option, kernel spreads the
swap use among all the partitions uniformly.
3. There's / partition, this can be on raid6 because we don't write to
that partition that often, and some redundancy is needed to not loose
the machine upon a disk failure. We create `vg_server-root` LV on
the raid6 to ease potential future data movements to different disks.
4. There's the large 'raid0' which hosts `vg_guests-images` LV, which is
mounted on /libvirt-images for use by libvirt. This is a read-write
intensive place, so raid0 helps us to speed things up.
Manually create the br0 interface:
TODO: Not done by @praiskup but the infra team, we need a HOWTO!
General VM requirements for builder VMs
---------------------------------------
- 2xvCPU+
- 4GB+ RAM
- 16G for built results => /var/lib/copr-rpmbuild
- 4G for the root / partition
- SWAP+RAM
- 32GB tmpfs mountpoint for /var/cache/mock
- 140GB+ SWAP for tmpfs for mock root(s) => /var/lib/mock
AMD hypervisors
===============
vmhost-x86-copr0[1-4].rdu-cc.fedoraproject.org
- 256G RAM
- 64 threads (2x16 cores, AMD EPYC 7302 16-Core Processor)
This brings us 32+ (few devel) overcommitted builders on each builder:
- 32*16G = up to 512 GRAM (256 RAM + 300 G SWAP)
- 4G+16G = 20G*32 = up to 640G for root+results
- 172G*32 = up to 5.5T SWAP for tmpfs (huge overcommit, though only limitted
amount of build tasks require that much tmpfs)
Power8 hypervisors
==================
- 128G RAM
Power9 hypervisor
=================
vmhost-p09-copr01.rdu-cc.fedoraproject.org
- storage:
├─vg_guests-LogVol00 253:0 0 32G 0 lvm /
├─vg_guests-swap 253:1 0 300G 0 lvm [SWAP]
└─vg_guests-images 253:2 0 1002.6G 0 lvm /libvirt-images
- 256G RAM
- 160 threads
- Thread(s) per core: 4
Core(s) per socket: 20
Socket(s): 2
- Let's take the risk, and overload by 32 + 1 stage. So each of them has only
about 32G storage.
Overall allocation:
33*16G RAM => up to 528G used (256RAM + 300G SWAP can hold this)
33*140G disk => 4TB+ ... about 400% overcommit in the worst case
33*5 CPU = 165CPU used in the worst case