In Ansible 2.8 the - character isn't supposed to be valid in group names.
While we could override this, we might as well just bite the bullet and change it.
So, just switch all group names to use _ instead of -.
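As a sketch, the rename looks like this in an INI-style inventory (the group names here are illustrative, not the actual ones changed):

```ini
# Before: dashes trigger deprecation warnings in Ansible 2.8+
[openqa-workers]
worker01.example.com

# After: underscores remain valid in all Ansible versions
[openqa_workers]
worker01.example.com
```

Note that any playbooks, group_vars directories, and hosts: patterns referring to the old names have to change in lockstep.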
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This is the config option added in the most recent openqa build
which prevents it from cancelling an entire cluster of parallel jobs
if just one of the children fails. See:
https://progress.opensuse.org/issues/46295
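If the option is set as a job variable (the name below is taken from the linked ticket and should be checked against the deployed openQA build), it would look something like:

```ini
PARALLEL_CANCEL_WHOLE_CLUSTER = 0
```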
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This includes some tweaks to the core fedmsg roles to allow a
'generic' way of indicating that a box should use fedmsg-hub-3
rather than fedmsg-hub, and to make the restart notification work
for that.
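A minimal sketch of what such a 'generic' switch could look like, assuming a hypothetical fedmsg_hub_service variable (the real role's variable and handler names may differ):

```yaml
# host_vars/somebox.example.com: opt this box into the new hub
fedmsg_hub_service: fedmsg-hub-3

# roles/fedmsg/hub/handlers/main.yml: restart whichever hub is in use
- name: restart fedmsg-hub
  service:
    name: "{{ fedmsg_hub_service | default('fedmsg-hub') }}"
    state: restarted
```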
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The UI makes this easier to investigate now, and it looks like
we could give x86_64 a bit more space, but other arches less.
So let's tweak things to do that. This should also reduce
overall usage on staging a bit, as it's up against its limits.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
1539330 is turning out to be pretty intransigent and the images
are getting very old, plus we need f28 branched base images. So
let's do image builds on the x86_64 workers for now. Also
re-enable image builds on workers - I never should've turned that
off in the first place, as they shouldn't be susceptible to the
bug.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Staging is running out of space...let's kick it back down to
300, and also create a separate setting for update group asset
size. We test lots of updates, and for each update we only need
to upload one disk image, so we really don't need 300GB of
asset space for update job groups, that just means we'll keep
like 300 update disk images lying around. If PPC starts getting
incompletions again I'll have to, uh, do something? Yeahhh.
Something.
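A sketch of how the split setting might look in group_vars (the variable names and the update-group size are illustrative):

```yaml
# group_vars/openqa_servers
openqa_assetsize: 300         # GiB, limit for normal job groups
openqa_assetsize_updates: 50  # GiB, smaller limit for update job groups
```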
Actually the openQA playbook already includes the apache role, which
sets up the config file. The 404s are for some other reason. But
keep the comment change.
It does, sometimes, and that probably shouldn't stop the play.
We likely already have some older images, and even if this is
the *first* creation, we can go ahead with the rest of the
deployment safely enough, and debug the image creation problem
later.
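The pattern for letting the task fail without killing the play is roughly this (the task name and command path are illustrative):

```yaml
- name: create base disk images with createhdds
  command: ./createhdds.py all
  ignore_errors: true
```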
The capability to handle a variance between prod and staging
here is just temporary while I'm testing the new fixed asset
handling stuff by deploying it on staging. Once it's tested
and merged we'll just have prod and staging do the same thing.
But for now we need to cleanly handle them having the static
disk images in different places.
I've enhanced `createhdds check` to exit 1 if all images are
present but some are old, and 2 if any images are missing. We
use this to only create images if any are missing here in the
play; we rely on the daily cron job to rebuild old images.
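A sketch of how the play can use those exit codes (paths and task names are illustrative):

```yaml
- name: check state of base disk images
  command: ./createhdds.py check
  register: imagecheck
  # rc 1 (some images old) and rc 2 (some missing) are expected outcomes
  failed_when: imagecheck.rc not in [0, 1, 2]

- name: create missing disk images
  command: ./createhdds.py all
  # only create when images are missing; the daily cron job rebuilds old ones
  when: imagecheck.rc == 2
```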
This is kind of a band-aid for a weird issue on openqa01 where
virt-install runs just don't seem to work properly after the
box has been running for a while, so createhdds doesn't actually
work and any playbook run gets hung up on it for a long time.
This doesn't fix that, but does at least mean we can run the
playbook without being bothered by it. To get createhdds to run
properly and actually regenerate the outdated images, we have
to reboot the system and run it right away; it seems to work
fine right after the system boots up.
we need to install some additional packages for the revised
createhdds (but we no longer need pexpect), and ensure libvirtd
is running before running createhdds.
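Roughly, as Ansible tasks (the package list here is illustrative, not the exact set added):

```yaml
- name: install packages needed by the revised createhdds
  package:
    name:
      - libguestfs-tools-c
      - python3-libvirt
    state: present

- name: ensure libvirtd is running before image creation
  service:
    name: libvirtd
    state: started
```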
it's not really fatal when it fails (except on first deployment)
and nothing else later depends on it, so we can go ahead and
continue the run even if it fails
instead of just relying on it getting run when we do an
ansible run, since that's intermittent and it's annoying
when you want to do an ansible run and it sits there for
hours creating disk images. This way we'll know they'll
get updated regularly and ansible runs should never get
blocked on image creation, though we still do it in the
ansible plays just in case (and for initial deployment).
This should now be safe, with the recent changes to make it
time out gracefully and run atomically. We also use withlock
to make sure we don't stack jobs.
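A sketch of the cron task, assuming withlock takes a lock file followed by the command to run (the schedule and paths are illustrative):

```yaml
- name: create cron job to keep openQA disk images up to date
  cron:
    name: openqa-createhdds
    minute: "0"
    hour: "4"
    job: "withlock /var/lock/createhdds.lock /root/createhdds/createhdds.py all"
```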
this will only work with the new openqa package builds I just
did, but won't break anything with older ones. With a new enough
openQA package, it'll prevent the web UI from showing download
links for ISOs and HDD files.
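With a new enough package, the relevant knob would go in the server config; the option name here is an assumption based on upstream openQA and should be verified against the build in question:

```ini
# /etc/openqa/openqa.ini
[global]
hide_asset_types = repo iso hdd
```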