fedora-infra_ansible

mirror of https://pagure.io/fedora-infra/ansible.git synced 2026-05-01 22:11:01 +08:00

Author	SHA1	Message	Date
Adam Williamson	666196bbed	openqa/worker: don't start worker unless NFS mount is up There's this annoying pattern where the NFS mount fails on boot and then the worker services all start up and take jobs, but they instafail because the share isn't there. Ideally we could handle this very easily with Restart= directives but systemd has...opinions about this: https://github.com/systemd/systemd/issues/4468 https://github.com/systemd/systemd/issues/1312 so we have to do some fairly awkward hacks to just express: (1) retry the NFS mount if it fails; (2) don't start the workers unless the NFS mount is up; (3) retry the workers after a while if they were blocked. It's ugly, but in testing this same config on one worker it seems to work... Signed-off-by: Adam Williamson <awilliam@redhat.com>	2025-07-10 19:07:54 -07:00
Adam Williamson	9d931214ea	Revert "openQA: rename openvswitch bridge device to avoid conflict" This reverts commit `4dc01bc892` and a follow-up commit. I'm having trouble getting things to work and want to see if it works if we go back to having the openQA bridge be br0, and rename the bridge used for the system's bonded network connection to something else instead.	2025-07-02 17:25:18 -07:00
Adam Williamson	10b68ac01f	openqa/worker: remove old unused files Signed-off-by: Adam Williamson <awilliam@redhat.com>	2025-07-02 17:23:42 -07:00
Adam Williamson	b343d8de52	Try and fix openQA bridge config Signed-off-by: Adam Williamson <awilliam@redhat.com>	2025-07-02 16:20:39 -07:00
Adam Williamson	6606be7010	openqa tap: tell os-autoinst-openvswitch to use the right bridge Signed-off-by: Adam Williamson <awilliam@redhat.com>	2025-07-02 08:37:35 -07:00
Adam Williamson	4dc01bc892	openQA: rename openvswitch bridge device to avoid conflict On the new rdu3 worker hosts, br0 already exists and is the main system 'interface' (it's a bridge on two bonded physical interfaces connected to different switches, to make networking upgrades easier). So we can't call our openvswitch bridge 'br0' any more. Let's try calling it 'openqabr0' and see if anything explodes. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2025-06-30 16:14:01 -07:00
Adam Williamson	4d801444a9	openqa: set up a side repo for prod as well as lab Sometimes we want to deploy something to prod before it goes stable (or even to u-t). Signed-off-by: Adam Williamson <awilliam@redhat.com>	2024-11-25 17:06:34 -08:00
Adam Williamson	cb026b4120	openqa/worker: kill stuck qemu processes daily This is an awful hack to deal with https://github.com/os-autoinst/os-autoinst/issues/2549 while we try and fix it properly. This finds stuck qemu processes by parsing the journal messages of the workers, and kills them. Workers stuck in the broken state should then recover on the next checkin with the server. I tested this manually on all the worker hosts and it...seemed to work, mostly. I'll keep an eye on things after deploying it. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2024-10-15 13:13:42 -07:00
Adam Williamson	1cb0c0cdc6	Put openqa-lab-repo.repo in worker role as well as server role Signed-off-by: Adam Williamson <awilliam@redhat.com>	2023-10-27 18:09:24 -07:00
Adam Williamson	f122367c34	openqa/worker: change name on kernel override config file It really needs to be called exactly 60-block-scheduler.rules as it's overriding a file of the same name in `/usr`. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2022-08-17 14:28:18 -04:00
Adam Williamson	ca8e3db401	Add file missed from previous commit Signed-off-by: Adam Williamson <awilliam@redhat.com>	2022-08-17 13:46:16 -04:00
Adam Williamson	bf4f704096	openqa: improve how we do the git config thing The background to this is https://bugzilla.redhat.com/show_bug.cgi?id=2073414 , in response to which git was changed to die if a user runs git commands on a repo which it doesn't own. In openQA, the test directory is a git repo and openQA itself likes to run git commands on it, but this is often going to be as a different user than the owner of the directory. In fact on the worker hosts, the user that owns the directory (geekotest on the server box) doesn't even exist. This just sets the config by copying a file in place rather than running a git command (which is hard to get to be idempotent) and uses `/etc/gitconfig` so we don't wind up with a file in the _openqa-worker user's home directory, which is meant to be empty. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2022-05-27 10:24:34 -07:00
Adam Williamson	224e28131d	openQA: prepare for prod deployment of latest releases This unifies prod and stg onto the ways of doing things for the latest packages, and rejigs the swtpm stuff a bit to tear down more (we shouldn't need the custom SELinux policy any more). Signed-off-by: Adam Williamson <awilliam@redhat.com>	2021-12-06 10:40:33 -08:00
Adam Williamson	13f59ad0eb	openqa/worker: have swtpm service restart on success This is because swtpm is designed not to be persistent, it's sort of tied to a single "system" (VM in this case). We can't expect an instance will stick around after it's been "used", it doesn't do that, it exits successfully. So we need to restart it when that happens. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-09-08 10:56:12 -07:00
Adam Williamson	d9f5530046	openqa/worker: configure to use 172. IP range not 10. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-07-23 17:27:19 -07:00
Adam Williamson	6b196e70ab	openqa/worker: set up swtpm service on tap worker hosts swtpm is a TPM emulator we want to use for testing Clevis on IoT (and potentially other things in future). We're implementing this by having os-autoinst just add the qemu args but expect swtpm itself to be running already - that's counted as the sysadmin's responsibility. My approach to this is to have openQA tap worker hosts also be tpm worker hosts, meaning they run one instance of swtpm per worker instance (as a systemd service) and are added to a 'tpm' worker class which tests can use to ensure they run on a suitably-equipped worker. This sets up all of that. We need a custom SELinux policy module to allow systemd to run swtpm - this is blocked by default. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-06-24 16:59:11 -07:00
Adam Williamson	be6a4937ea	openqa/worker: revert br0 netmask os-autoinst really really wants it to be this. The helper service fails if it isn't. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-06-10 17:05:24 -07:00
Adam Williamson	44434ee9fa	openqa/worker: tighten netmask for br0 tap bridge It shouldn't need anything but 10.0.2.*, and hopefully this will stop it interfering with the rest of the infra network... Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-06-09 16:31:33 -07:00
Adam Williamson	26005bf805	openqa: correct scratch repo config filename Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-04-30 12:38:48 -07:00
Adam Williamson	bb1525bdef	openqa/{server,worker}: enhance package handling This provides a mechanism for deploying scratch builds, and also for controlling whether or not to install openQA and os-autoinst from updates-testing. I have been doing the scratch build thing for years already, just manually by ssh'ing into the boxes. This is getting tiring now we have like 15 worker hosts. The scratch build mechanism isn't properly idempotent, but fixing that would be hard and I really only intend to use it transiently when I'm updating the packages, so I don't think it's worth the effort. This also adds a notification for restarting openQA worker services when the packages or config are updated, and fixes the worker playbook to enable the last worker service. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2020-04-30 12:23:57 -07:00
Adam Williamson	3f284aed9e	openqa/worker: run the ppc64 script hourly The /dev/kvm permissions just seem to keep getting reset somehow and I'm sick of it. Let's see if this helps.	2017-12-21 16:55:35 -08:00
Adam Williamson	1e3d6deed8	openqa/worker: add boot script to fix KVM perms and disable SMT For some reason /dev/kvm has 0600 perms after boot on the ppc64 worker host. Also, qemu won't run unless SMT is turned off, on ppc64. I've just been doing this manually every time the box got restarted, but that's dumb, so let's make it happen on boot with a script and a service to run it. Signed-off-by: Adam Williamson <awilliam@redhat.com>	2017-09-15 17:52:15 -07:00
Adam Williamson	dbf0ac249c	Add the createhdds cronjob file to worker role	2017-08-17 14:37:37 -07:00
Adam Williamson	fa7d2529fa	openqa/worker: enable STP on bridge I think we'll need this to avoid routing loops with the tunnels.	2016-05-05 13:17:47 -07:00
Adam Williamson	9ce401e74d	use an ifup-pre-local for tap device creation holy crap, this is some ancient magic.	2016-04-27 15:46:13 -07:00
Adam Williamson	48291f1640	openqa/worker: initial attempt at openvswitch config this is highly experimental and for deployment only to stg at present...I have this stuff working on happyassassin, now trying to translate it to stg.	2016-04-27 13:32:56 -07:00

26 Commits