This has been disabled for some time due to a bug. Originally I
meant to turn it back on, but now I don't think I will: it makes
more sense to just keep letting the worker hosts handle disk
image building. It doesn't make any sense to have the server do
it for x86_64 while worker hosts do it for other arches. If the
server can't do it *all*, we may as well be consistent across
arches and always have the worker hosts do it.
This does mean that on initial deployment using these plays there
is a period when the server is up and running but any jobs
that need the base disk images will fail because the worker play
won't have built them yet. But I think that's not a big problem,
and it was already the case for non-x86_64 arches anyhow.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This provides a mechanism for deploying scratch builds, and also
for controlling whether or not to install openQA and os-autoinst
from updates-testing.
I have been doing the scratch build thing for years already, just
manually by ssh'ing into the boxes. This is getting tiring now
that we have around 15 worker hosts.
The scratch build mechanism isn't properly idempotent, but fixing
that would be hard and I really only intend to use it transiently
when I'm updating the packages, so I don't think it's worth the
effort.
This also adds a notification for restarting openQA worker
services when the packages or config are updated, and fixes the
worker playbook to enable the last worker service.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We just invented a new format for openQA templates. This makes
the openqa/server role handle loading templates in either format.
I'll remove old-format loading when we're done tweaking the new
setup and it's deployed to prod.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We're going to test Cloud images in openQA now that autocloud is
retiring. We need a cloud-init ISO to be able to boot them.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Also tweak how we do the plugin config a bit: I don't like the
whole 'do special stuff if deployment_mode is set' thing any
more.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
In ansible 2.8 the - character isn't supposed to be valid in group names.
While we could override this, we might as well just bite the bullet and change it.
So, just switch all group names to use _ instead of -.
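As a rough illustration of the rename rule only (not the actual change, which edits the inventory and playbooks directly), a minimal Python sketch might look like this. The group names used here are hypothetical examples, and the helper names are made up for this sketch.

```python
def normalize_group_name(name: str) -> str:
    """Return the group name with every '-' replaced by '_'."""
    return name.replace("-", "_")


def plan_renames(names):
    """Map each old group name that contains '-' to its new name.

    Names that are already valid (no '-') are left out of the mapping.
    """
    return {n: normalize_group_name(n) for n in names if "-" in n}


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    groups = ["openqa-workers", "openqa_stg", "fedmsg-hubs"]
    for old, new in plan_renames(groups).items():
        print(f"{old} -> {new}")
```

Any playbook or group_vars file that refers to a group by name would need the same substitution, otherwise it would silently stop matching hosts.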
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This is the config option added in the most recent openQA build
which prevents it cancelling an entire cluster of parallel jobs
if just one of the children fails. See:
https://progress.opensuse.org/issues/46295
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This includes some tweaks to the core fedmsg roles to allow a
'generic' way of indicating that a box should use fedmsg-hub-3
not fedmsg-hub, and make the restart notification work for that.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The UI makes this easier to investigate now, and it looks like
we could give x86_64 a bit more space, but other arches less.
So let's tweak things to do that. This should also reduce
overall usage on staging a bit, as it's up against its limits.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
1539330 is turning out to be pretty intransigent and the images
are getting very old, plus we need f28 branched base images. So
let's do image builds on the x86_64 workers for now. Also
re-enable image builds on workers - I never should've turned that
off in the first place, as they shouldn't be susceptible to the
bug.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Staging is running out of space... let's kick it back down to
300GB, and also create a separate setting for update group asset
size. We test lots of updates, and for each update we only need
to upload one disk image, so we really don't need 300GB of
asset space for update job groups, that just means we'll keep
like 300 update disk images lying around. If PPC starts getting
incompletions again I'll have to, uh, do something? Yeahhh.
Something.
Actually, the openQA playbook already includes the apache role, which
sets up the config file. The 404s are for some other reason. But
keep the comment change.
It does, sometimes, and that probably shouldn't stop the play.
We likely already have some older images, and even if this is
the *first* creation, we can go ahead with the rest of the
deployment safely enough, and debug the image creation problem
later.