Using a memory database causes tests/test_build to intermittently
fail, because using the same pysqlite3 connection object from multiple
threads - as was done so that the threads shared the same memory database
- is not, in the end, thread safe. One thread will stomp on the transaction
state of other threads, resulting in errors from starting a new transaction
when another is already in progress, or trying to commit a transaction
that is not in progress.
To avoid a significant speed penalty, the session-scope fixture sets up
a database in the pytest temporary directory, which will typically be on
tmpfs. Time to complete all tests:
memory backend: 38 seconds
file on tmpfs: 40 seconds
file on nvme ssd with btrfs: 137 seconds
MBSSQLAlchemy, which attempted to make the memory backend work, is removed.
Session hooks are installed on the Session class rather than on the
scoped_session instance - this works better when we're changing from
one database to another at test setup time.
No implementations of MBS are using Greenwave, and there are no current plans
to do so. Koji Resolver will be sufficient for any usecase dependent on gating.
There is a need to rebuild the module builds done in CentOS 9 Stream
internally in MBS to include them in RHEL. This is currenly a hard task,
because the RPM components included in a module are usually
taken from HEAD of the branch defined by their `ref` value.
For the rebuild task, it means we would have to ensure that the HEAD
of all RPM components points to right commit hash right before we start
rebuilding CentOS 9 Stream module in internal MBS. This is very hard
and fragile thing to do, especially if there are two different modules
using the RPM component from the same branch. This is prone to race
condition and makes the rebuilds quite complex and in some cases
not possible to do without force pushes to RPM component repositories
which is not acceptable by internal dist-git policy.
This commit fixes it by allowing overriding the commit hash while
submitting the module build. This helps in the mentioned situation,
because we can keep internal RPM components branches in 1:1 sync with
CentOS 9 Stream branches and HEAD can always point to the same commit
in both internal and CentOS 9 Stream repositories.
When the module rebuild is submitted in internal MBS,
we can use this new feature to override the `ref` for each RPM component
so it points to particular commit and the requirement for HEAD to point
to this commit is no longer there.
The `ref` is overriden only internally in MBS (but it is recorded in logs
and in XMD section), so the input modulemd file is not altered. This is
the same logic as used for other overrides (`buildrequire_overrides` or
`side_tag`).
This does not bring any security problem, because it is already possible
to use commit hash in `ref`, so the package maintainer can already change
the commit hash to any particular commit by using this `ref` value.
Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
This works around a case where tagging messages are missed for a build
with high reuse. In such a case, we can start out in the final batch,
but have incorrect tag state for reused components from previous
batches.
Since ComponentBuildTrace(s) get created with db_session.commit() call,
is is not possible to commit more items in bulk if they already have been flushed.
Current unit-tests' setup can be significantly sped up if items can be quickly
flushed on the fly and bulk-commited only once at the end. Moreover in general it
seems more appropriate/safer to handle this in before_flush as any implicit
or accidental flush could cause new build traces not to be created at all. As flush
is implicitly called before every commit anyway, this change shouldn't pose any harm.
This function could get called multiple times if the init handler
runs more than once. This can happen if the build failed in the
init handler due to external infrastructure being down and the
user resumes their build.
Commit 98b1ac79 ensures the message is sent after data changes are
committed into database. Hence, it is doable to remove these two
workarounds.
Signed-off-by: Chenxiong Qi <cqi@redhat.com>
In MBS, there are two cases to send a message when a module build moves
to a new state. One is to create a new module build, with
ModuleBuild.create particularly, when user submit a module build.
Another one is to transition a module build to a new state with
ModuleBuild.transition. This commit handles these two cases in a little
different ways.
For the former, existing code is refactored by moving the publish call
outside ModuleBuild.create.
For the latter, message is sent in a hook of SQLAlchemy ORM event
after_commit rather than immediately inside the ModuleBuild.transition.
Both of these changes ensure the message is sent after the changes are
committed into database successfully. Then, the backend can have
confidence that the database has the module build data when receive a
message.
Signed-off-by: Chenxiong Qi <cqi@redhat.com>
This also includes `from __future__ import absolute_import`
in every file so that the imports are consistent in Python 2 and 3.
The Python 2 tests fail without this.
This moves the code used by the backend and API to common/submit.py,
the code used just by the API to web/submit.py, and the code used
just by the backend to scheduler/submit.py.
This puts backend specific code in either the builder or scheduler
subpackage. This puts API specific code in the new web subpackage.
Lastly, any code shared between the API and backend is placed in the
common subpackage.
The following handler arguments are not used at all:
1. `build_id` in handlers/components.py:build_task_finalize
2. `build_name` in handlers/tags.py:tagged
Add route_task function to route celery tasks to different queues.
If we can figure out what the module build is a task ran for by
checking the task arguments, then we route this task to a queue
named:
"mbs-{}".format(module_build_id % num_workers)
"num_workers" has default value of 1, and can be changed in
backend_config.py. If module build id can't be figured out, task will
be routed to the default queue which is named "mbs-default".
While setting up the workers, the number of workers should match with
"num_workers" in config, and each worker will listen on two queues:
1. mbs-default
2. mbs-{number} # for example, the first worker listens on "mbs-0"
By this design, all tasks for a particular module build will be routed
to the same queue and run on the same worker serially.
Poller methods within original class MBSProducer become module level
functions and are registered as Celery periodic tasks.
Code logging the size of fedmsg-hub queue are removed from log_summary.
process_open_component_builds is still kept there and not converted to a
periodic task.
There are some small refactor:
* do not format string in logging method call.
* reformat some lines of code doing SQLAlchemy database query to make
them more readable.
Signed-off-by: Chenxiong Qi <cqi@redhat.com>
We started using `events.scheduler` to plan fake events instead of internal
queue implemented by fedmsg-hub. Thanks to that, we can stop using fedmsg-hub
completely for local builds.
In this commit, the `scheduler.local.main` is rewritten to not use fedmsg-hub.
The original `scheduler.local.main` is still needed by test_build tests and
therefore it is moved there.
To support multiple backend, we need to get rid of `further_work` concept
which is used in multiple places in the MBS code. Before this commit, if
one handler wanted to execute another handler, it planned this work by
constructing fake message and returning it. MBSConsumer then planned
its execution by adding it into the event loop.
In this commit, the new `events.scheduler` instance of new Scheduler
class is used to acomplish this. If handler wants to execute another
handler, it simply schedules it using `events.scheduler.add` method.
In the end of each handler, the `events.scheduler.run` method is
executed which calls all the scheduled handlers.
The idea is that when Celery is enabled, we can change the
`Scheduler.run` method to execute the handlers using the Celery, while
during the local builds, we could execute them directly without Celery.
Use of Scheduler also fixes the issue with ordering of such calls. If
we would call the handlers directly, they could have been executed
in the middle of another handler leading to behavior incompatible
with the current `further_work` concept. Using the Scheduler, these
calls are executed always in the end of the handler no matter when
they have been scheduled.