This should help with query performance, as the recursive queries using
derivation_inputs and derivation_outputs are particularly sensitive to the
n_distinct values for these tables.
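If the change is along the lines of overriding the statistics directly (a
sketch only; the column names here are assumptions, not necessarily what
was changed), PostgreSQL allows setting n_distinct per column, with
negative values meaning a fraction of the row count:

    -- Sketch: tell the planner that derivation_id values in
    -- derivation_outputs are mostly distinct (-1 would mean unique)
    ALTER TABLE derivation_outputs
      ALTER COLUMN derivation_id SET (n_distinct = -0.9);

    -- The override only takes effect at the next ANALYZE
    ANALYZE derivation_outputs;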
This means that an output is treated as not blocking if it has a scheduled
build, just as if it has a succeeded build. Also, scheduling builds will
unblock blocked builds.
This is helpful as it reduces noise when looking at blocking builds.
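As a sketch of the test this implies (the table and column names, and the
builds.status column in particular, are assumptions), an output stops
blocking dependent builds once any build for it is scheduled or has
succeeded:

    -- Sketch: does output 123 still block dependent builds?
    SELECT NOT EXISTS (
      SELECT 1 FROM builds
      WHERE builds.derivation_output_id = 123
        AND builds.status IN ('scheduled', 'succeeded')
    ) AS blocking;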
This will hopefully provide a less expensive way of finding out if a scheduled
build is probably blocked by other builds failing or being canceled.
By working this out when the build events are received, it should be more
feasible to include information about whether builds are likely blocked or not
in various places (e.g. revision comparisons).
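One way this could look (purely a sketch; the probably_blocked column and
the exact linkage through derivation_inputs are assumptions): when a
failed or canceled build event arrives, flag the scheduled builds of
derivations that depend on that output.

    -- Sketch: a build for output 123 just failed or was canceled, so
    -- mark scheduled builds of derivations taking it as an input
    UPDATE builds
    SET probably_blocked = TRUE
    WHERE builds.status = 'scheduled'
      AND builds.derivation_id IN (
        SELECT derivation_inputs.derivation_id
        FROM derivation_inputs
        WHERE derivation_inputs.derivation_output_id = 123
      );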
The build event information can now contain the derivation outputs, as well as
the name of the derivation. This allows the Guix Data Service to join builds
up with derivations, even if it doesn't know about the derivation being built.
This means you can query for derivations where builds exist or don't exist on
a given build server.
I think this will come in useful when submitting builds from a Guix Data
Service instance.
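As a sketch of the kind of query this enables (assuming builds now record
the derivation file name taken from the event data):

    -- Sketch: derivations with no build on build server 1
    SELECT derivations.file_name
    FROM derivations
    WHERE NOT EXISTS (
      SELECT 1 FROM builds
      WHERE builds.derivation_file_name = derivations.file_name
        AND builds.build_server_id = 1
    );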
And create a proper git_branches table in the process.
I'm hoping this will help with slow deletions from the
package_derivations_by_guix_revision_range table in the case where there are
lots of branches, since it'll separate the data for one branch from another.
These migrations will remove the existing data, so
rebuild-package-derivations-table will currently need to be run manually to
regenerate it.
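For illustration, the new table might look something like this (a sketch
only; the actual migration may well differ):

    CREATE TABLE git_branches (
      id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
      name varchar NOT NULL,
      git_repository_id integer NOT NULL
        REFERENCES git_repositories (id),
      UNIQUE (name, git_repository_id)
    );

Rows in package_derivations_by_guix_revision_range can then reference a
branch by id, so deleting the data for one branch doesn't involve the data
for every other branch.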
To the end of the main revision processing transaction.
Currently, I think there are issues when this query updates some builds, as
those rows in the builds table remain locked until the end of the
transaction. This then causes build event submission to hang. Moving this part
of the revision loading process to the end of the transaction should help to
mitigate this.
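To illustrate the problem (with placeholder column names), row locks taken
by an UPDATE are only released at COMMIT:

    BEGIN;
    -- ... lengthy revision loading work ...
    UPDATE builds SET processed = TRUE WHERE id = 42;
    -- the row lock on builds id 42 is now held, so a concurrent
    -- build event submission touching that row will hang
    -- ... more lengthy work ...
    COMMIT;  -- lock finally released

Doing the UPDATE just before COMMIT keeps the window in which the rows are
locked as short as possible.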
Switch from using a recursive query to doing a breadth-first search through the
graph of derivations, as I think PostgreSQL wasn't doing a great job of
planning the recursive queries (it would overestimate the rows involved, and
prefer sequential scans for the derivation_outputs table).
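Roughly, each breadth-first step is then a plain query over the current
frontier of derivation ids, repeated from the application until no new ids
come back (the column names here are assumptions):

    -- Sketch: one step outwards from the current frontier of ids
    SELECT DISTINCT derivation_outputs.derivation_id
    FROM derivation_inputs
    INNER JOIN derivation_outputs
      ON derivation_outputs.id = derivation_inputs.derivation_output_id
    WHERE derivation_inputs.derivation_id = ANY(ARRAY[1, 2, 3]);

Because each step names its ids explicitly, the planner can estimate the
rows involved and use the indexes, rather than misestimating and falling
back to sequential scans as with the recursive query.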
Previously it would compute a long list of strings, potentially more than
100,000 elements long, then split this list up and insert it in chunks. Only
then could the memory be freed.
This new approach builds the strings in batches for the insertion query, then
moves on to the next batch. This should mean that more memory can be freed and
reused along the way.
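The effect on the generated SQL looks roughly like this (the schema names
are assumptions): one bounded INSERT per batch, so each query string can
be discarded as soon as its batch is done, instead of one enormous
statement being built up front.

    -- batch 1
    INSERT INTO derivation_inputs (derivation_id, derivation_output_id)
    VALUES (1, 101), (2, 102);

    -- batch 2
    INSERT INTO derivation_inputs (derivation_id, derivation_output_id)
    VALUES (3, 103), (4, 104);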