I broke the trigger when I switched from one container to multiple
containers in the pod. Syntax-wise, I found this multi-line variant in
the CoreOS Cincinnati deployment config, and it seems like there's not a
way to say "all container images in the spec". Or there might be, but I
couldn't find an example or documentation.
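The multi-line variant looks roughly like this (a sketch, assuming a
DeploymentConfig-style ImageChange trigger; the container and
ImageStreamTag names are illustrative), with each container listed by
name since there's no wildcard:

    triggers:
      - type: ConfigChange
      - type: ImageChange
        imageChangeParams:
          automatic: true
          # every container whose image should be updated is listed explicitly
          containerNames:
            - azure-image-uploader
            - aws-image-uploader
          from:
            kind: ImageStreamTag
            name: fedora-image-uploader:latest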
Signed-off-by: Jeremy Cline <jeremycline@linux.microsoft.com>
In the beginning, this just handled Azure images. Now it does Azure,
AWS, GCP, and containers. Currently, it processes images serially, which
is mostly okay. However, it does mean that whatever service is handled
last has to wait for all the others to succeed before it starts, and it
also means that if the handler for any platform fails, it retries *all*
the images again. For most things this is a no-op (or a
few inexpensive calls), but it does have to re-download the image from
Koji to checksum it.
This adds an AMQP message queue for each content type we handle, and
produces a fedora-messaging config for each content type. The deployment
is now made up of 4 containers: azure-image-uploader,
aws-image-uploader, container-image-uploader, and
google-cloud-image-uploader. They only differ in the secrets injected
into them and the fedora-messaging config file they use. The end result
is that images should be available faster and it's more resilient to
remote services being down.
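A rough sketch of the resulting pod layout, trimmed to two of the four
containers (volume and secret names are illustrative):

    containers:
      - name: azure-image-uploader
        image: fedora-image-uploader:latest
        volumeMounts:
          - name: azure-messaging-config      # fedora-messaging config for Azure
            mountPath: /etc/fedora-messaging
          - name: azure-credentials           # secrets only this container needs
            mountPath: /etc/image-uploader-secrets
      - name: aws-image-uploader
        image: fedora-image-uploader:latest
        volumeMounts:
          - name: aws-messaging-config
            mountPath: /etc/fedora-messaging
          - name: aws-credentials
            mountPath: /etc/image-uploader-secrets
      # container-image-uploader and google-cloud-image-uploader follow the
      # same pattern with their own fedora-messaging config and secrets.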
Finally, it's worth noting that this bumps the warning threshold for
queue sizes. It can take some services (Azure and AWS) upwards of 30
minutes to replicate the images around the world, and since we subscribe
to _any_ compose status changes, it's not unreasonable for 5-10 messages
to stack up when we hit a compose change that is "FINISHED" with images.
Signed-off-by: Jeremy Cline <jeremycline@linux.microsoft.com>
We finally merged https://pagure.io/fedora-iot/pungi-iot/pull-request/102
which changes the properties of the container images built in the
IoT compose. This adjusts to that and should publish both the base
and IoT images, if we got it all right.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I used the wrong name for the file being mounted in the volume. This
will fix the image uploader boot-looping in staging.
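The corrected shape looks roughly like this (a sketch; the secret, key,
and file names are illustrative). The path has to match the filename the
application actually reads:

    volumes:
      - name: uploader-config
        secret:
          secretName: image-uploader-config
          items:
            - key: config.toml
              # must match the filename the application expects, otherwise
              # the container crash-loops on startup
              path: config.toml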
Signed-off-by: Jeremy Cline <jeremycline@linux.microsoft.com>
Now that the F41 freeze is over, switch container pushes over to
fedora-image-uploader for the remaining repositories.
This also renames the Onyx and Sericea repositories to make it clear
what they are.
Signed-off-by: Jeremy Cline <jeremycline@linux.microsoft.com>
We've finally ironed out the issues in stage with this, so this is the
configuration to enable it in production.
This should be rolled out in conjunction with disabling the bash script
that currently handles image pushes.
This reverts commit 15dbcbb7ac, which was
a revert of commit 5e0ad1134d (pr #2200).
Adam Williamson got rid of the need for buildah so hopefully we won't
run into lots of permission issues.
The AMI description setting wasn't actually being used.
More importantly, however, the AMI volume size setting is now also unused.
The reason for this is that when we import the image, the default is to
use the snapshot's size as the volume size, but you can optionally set
it to something else. AWS pre-allocates volumes of a couple different
sizes (currently 1G, 5G, 8G, and 10G).
Folks building the image set the size to be 5G, but this setting
(carried over from fedimg) overrode it. Dropping it lets them control
the AMI size by adjusting how big the images are. Probably not optimal
for upload speed, but less confusing since there are fewer configuration
layers people might not be aware of.
This reverts commit 5e0ad1134d (pr #2200).
Unfortunately, using buildah inside an unprivileged OpenShift container
turns out to not be very simple, even though we're not building any
containers, just importing and pushing them.
We can either figure out how to make it work with OpenShift (and while
it is definitely possible, I don't know if folks are okay with the
compromises that might be required) or deploy it in a VM for now.
In the meantime, the staging container is bootlooping so I'd like to
back this configuration out for the sake of my inbox.
Upload images to the stage registry. Rather than massaging the
credentials into the format written out by podman-login, just pass the
credentials in and have the app run podman-login with them. The
configuration includes the registry along with the prefix used for the
environment variables containing the credentials.
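Roughly, the credentials arrive as environment variables and the
configuration just names the registry plus the variable prefix. A sketch
(secret and variable names are illustrative):

    env:
      - name: REGISTRY_STAGE_USERNAME
        valueFrom:
          secretKeyRef:
            name: stage-registry-credentials
            key: username
      - name: REGISTRY_STAGE_PASSWORD
        valueFrom:
          secretKeyRef:
            name: stage-registry-credentials
            key: password

The app looks up the variables under the configured prefix
(REGISTRY_STAGE in this sketch) and runs podman-login with them.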
I had hoped to get some feedback on the message schema, but perhaps the
easiest way is to publish these and figure out if anything is missing or
wrong. We can always update the schema.
Now that AWS image uploads work in staging, enable them in production
via the config. A functional build is already deployed to OpenShift in
prod, but since there is no AWS config it won't do anything until we
merge and deploy this.
The service will soon emit messages when new images are uploaded. This
grants it access to publish under the fedora_image_uploader topic.
Specific topics under the org.fedoraproject.prod prefix look like:
fedora_image_uploader.azure.Fedora-Cloud-40.aarch64
fedora_image_uploader.container.fedora.f40
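In Ansible terms the grant is roughly the following (a sketch, assuming
the community.rabbitmq.rabbitmq_user module; the user name, vhost, and
exact regex are illustrative):

    - name: Let fedora-image-uploader publish under its own topic prefix
      community.rabbitmq.rabbitmq_user:
        user: fedora-image-uploader
        state: present
        topic_permissions:
          - vhost: /pubsub
            exchange: amq.topic
            write_priv: '^org\.fedoraproject\.prod\.fedora_image_uploader\..*'
            read_priv: '.*'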
The Python package was renamed[0] upstream. Because the container
contains a default CMD that was updated to reference the new callback
path, the only thing that broke in production was the logging.
[0] https://pagure.io/cloud-image-uploader/pull-request/15
The cloud-image-uploader uses Pungi compose messages starting with
v0.3.0. This switches the routing keys and also adds a one-off task to
remove the queue before re-adding it to flush out any queued up messages
and remove the old topic bindings.
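The one-off cleanup is roughly the following (a sketch, assuming the
community.rabbitmq.rabbitmq_queue module talking to the management API;
the queue name and connection details are illustrative):

    - name: Drop the old queue so stale messages and bindings go away
      community.rabbitmq.rabbitmq_queue:
        name: cloud-image-uploader
        vhost: /pubsub
        state: absent
        login_host: rabbitmq.example.com
        login_user: admin
        login_password: "{{ rabbitmq_admin_password }}"
      run_once: true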
The latest import failed due to "ERROR: Unable to create local
directories(/.ansible/tmp): [Errno 13] Permission denied: b'/.ansible'",
which implies the code is being executed from `/`, despite the WORKDIR
instruction being set in the container image. I suspect this is a quirk
of kube/openshift that was not expected.
This change sets the workingDir to /srv/cloud-uploader, as specified in
the Containerfile, which should resolve the execution error.
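Concretely, the container spec gains (container name illustrative):

    containers:
      - name: cloud-image-uploader
        workingDir: /srv/cloud-uploader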
Signed-off-by: Neil Hanlon <neil@shrug.pw>
The env variable renders to "production", which is not what messages are
published under ("prod"). Match what other apps are doing and just use a
wildcard
so it'll match anything. Since prod and stage are separate brokers this
is fine.
The image needs to be replicated to a region to be usable in that
region. It's likely we'll want to expand this list and potentially add
logic to the uploader to not replicate nightly images until they are
promoted to the latest image in the stream, so I've templated it
in the configuration.
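As a sketch, the region list is just a templated variable (the variable
name and regions are illustrative), so growing it later is a one-line
change:

    # group_vars sketch
    azure_image_replication_regions:
      - eastus
      - westeurope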
Storage account names need to be globally unique. It seems fedoraimages
was already taken, so I've adjusted it to one that's not taken. It's
only used to import the images so the name doesn't really matter.
I assumed gallery names were unique per resource group, but this is not
the case. They're unique per subscription, oddly, so we need to use a
different name in staging.
The client certificate contains "cloud-image-uploader.stg" for the CN,
so our RabbitMQ username needs to match. Additionally, the queue name needs
to start with the username, so we need to adjust that as well.