guix-data-service/scripts
Christopher Baines e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar, as it holds all
the derivations for the package origins. The remaining levels contain far
fewer derivations, since they're mostly just derivations involved in
bootstrapping.
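
As a rough illustration of how such a table could be computed, here is a
minimal sketch. It assumes a level is the depth at which a derivation is first
reached when walking inputs outward from the package derivations; that
definition, the function name and the in-memory `inputs` mapping are
assumptions made for the example, not taken from the guix-data-service code.

```python
from collections import Counter


def level_distribution(package_derivations, inputs):
    """Count the derivations first reached at each level.

    package_derivations: iterable of derivation ids (level 0).
    inputs: hypothetical mapping of derivation id -> iterable of input ids.
    """
    seen = set(package_derivations)
    frontier = set(package_derivations)
    counts = Counter()
    level = 0
    while frontier:
        counts[level] = len(frontier)
        next_frontier = set()
        for drv in frontier:
            for inp in inputs.get(drv, ()):
                # Only count a derivation at the first level it appears on.
                if inp not in seen:
                    seen.add(inp)
                    next_frontier.add(inp)
        frontier = next_frontier
        level += 1
    return dict(counts)
```

Shared inputs are counted once, at the shallowest level they appear on, so
each count is the number of new derivations at that level.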

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to tell PostgreSQL this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
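
One way the idea could be sketched is to unroll the traversal into one CTE per
level, each capped with a LIMIT taken from the stored distribution. This is
only an illustration under assumptions, not necessarily the shape of the query
guix-data-service uses: the table names `package_derivations` and
`derivation_inputs`, their columns, and the function name are hypothetical.

```python
def build_limited_query(level_counts):
    """Build SQL with one CTE per level, each capped by a LIMIT.

    level_counts: list where index i holds the expected number of new
    derivations at level i. Table and column names are hypothetical.
    """
    if not level_counts:
        raise ValueError("level_counts must contain at least level 0")

    ctes = [
        "level_0 AS (\n"
        "  SELECT derivation_id FROM package_derivations\n"
        f"  LIMIT {int(level_counts[0])}\n"
        ")"
    ]
    for level in range(1, len(level_counts)):
        # Exclude derivations already collected at earlier levels, so the
        # LIMIT matches the count of *new* derivations at this level.
        earlier = " UNION ALL ".join(
            f"SELECT derivation_id FROM level_{i}" for i in range(level)
        )
        ctes.append(
            f"level_{level} AS (\n"
            "  SELECT DISTINCT di.input_derivation_id AS derivation_id\n"
            f"  FROM level_{level - 1} AS prev\n"
            "  JOIN derivation_inputs AS di\n"
            "    ON di.derivation_id = prev.derivation_id\n"
            f"  WHERE di.input_derivation_id NOT IN ({earlier})\n"
            f"  LIMIT {int(level_counts[level])}\n"
            ")"
        )
    union = "\nUNION ALL\n".join(
        f"SELECT derivation_id FROM level_{i}" for i in range(len(level_counts))
    )
    return "WITH " + ",\n".join(ctes) + "\n" + union
```

The LIMIT values have to be at least as large as the real row counts, or rows
would be silently dropped; here they act as a hint about the row count rather
than as a filter.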
2023-03-09 08:29:39 +00:00
..
guix-data-service-backup-database Allow customising the pg_dump command used 2021-01-03 19:05:41 +00:00
guix-data-service-create-small-backup Fix more issues with the git_commits introduction 2022-05-23 22:49:51 +01:00
guix-data-service-manage-build-servers.in Add a lookup_builds field to the build_servers table 2020-05-24 17:02:53 +01:00
guix-data-service-process-branch-updated-email.in Switch to processing emails as bytevectors 2019-09-26 18:08:12 +01:00
guix-data-service-process-branch-updated-mbox.in Warn if process-branch-updated-mbox won't match any emails 2020-02-01 14:03:26 +01:00
guix-data-service-process-job.in Allow skipping processing system tests 2023-02-08 14:56:48 +00:00
guix-data-service-process-jobs.in Allow skipping processing system tests 2023-02-08 14:56:48 +00:00
guix-data-service-query-build-servers.in Support not querying pending builds 2020-11-01 22:52:53 +00:00
guix-data-service-query-substitute-servers.in Split out querying of build servers and substitute servers 2020-05-03 13:23:43 +01:00
guix-data-service.in Store the distribution of derivations related to packages 2023-03-09 08:29:39 +00:00