Commit graph

94 commits

Author SHA1 Message Date
961441cab0 Reliably perform the background processing for build events
Insert the build ids to a table and remove the ids once processed. If the data
service is restarted and the in memory queue is lost, process all the items
from the database queue.
2025-05-25 13:08:27 +01:00
62d6b59013 Make build_servers.id just generated by default
As to make it easier to set explicitly.
2024-12-15 20:54:11 +00:00
4ecd20b973 Add an index on builds.build_server_build_id
As this is used when rendering builds.
2024-09-07 18:45:26 +01:00
5f80856ad1 Allow prioritising jobs from specific repositories
To help the QA data service prioritise branches over patches.
2024-03-07 10:02:18 +00:00
10bad53ad5 Support polling git repositories for new branches/revisions
This is mostly a workaround for the occasional problems with the guix-commits
mailing list, as it can break and then the data service doesn't learn about
new revisions until the problem is fixed.

I think it's still a generally good feature though, and allows deploying the
data service without it consuming emails to learn about new revisions, and is
a step towards integrating some kind of way of notifying the data service to
poll.
2023-10-09 22:19:02 +01:00
1461aa037f Make database changes to help with deleting entries in the nars table 2023-08-01 14:12:48 +01:00
e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar as you have all
the derivations for the package origins. The remaining levels contain less
packages since it's mostly just derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that the each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per a revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL this by using LIMIT's at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
a6f239fb8a Add an extra index on blocked_builds
I think this might help with queries that don't use the build_server_id.
2023-03-05 10:21:14 +00:00
fc5f562731 Add index on derivation_outputs id and derivation_id fields
This might help with doing recursive queries on the derivations graph.
2022-11-12 10:42:04 +00:00
fb9d99a076 Add extended statistics on package_derivations
This helps row count estimates when filtering on system_id and target.
2022-11-12 10:40:43 +00:00
1fb291be40 Add support for incrementally tracking blocked builds
This will hopefully provide a less expensive way of finding out if a scheduled
build is probably blocked by other builds failing or being canceled.

By working this out when the build events are recieved, it should be more
feasible to include information about whether builds are likely blocked or not
in various places (e.g. revision comparisons).
2022-11-11 10:35:09 +00:00
e3062abf33 Speed up finding the locales for a revision 2022-09-14 10:26:03 +01:00
ad39492cf0 Support per repository configuration for fetching substitutes
This is so that data.qa.guix.gnu.org can be configured only to query the
branches from the main repository.
2022-09-09 12:39:38 +01:00
d19eb07138 Add an index on git_commits
And also remove the duplicates that have crept in.
2022-06-17 12:54:43 +01:00
3b5950024b Add an index on package_derivations_by_guix_revision_range
This helps render the package version range related pages.
2022-06-17 12:51:10 +01:00
64be52844e Partition the package_derivations_by_guix_revision_range table
And create a proper git_branches table in the process.

I'm hoping this will help with slow deletions from the
package_derivations_by_guix_revision_range table in the case where there are
lots of branches, since it'll separate the data for one branch from another.

These migrations will remove the existing data, so
rebuild-package-derivations-table will currently need manually running to
regenerate it.
2022-05-23 19:10:25 +01:00
5cefa250d8 Add an index on derivation_output_details.id
Where hash is not null, since this helps with the fixed output package
derivations query.
2021-07-11 15:25:12 +01:00
af209170f7 Track package replacements
Start at least looking for package replacements, and storing the
details (particularly the derivation). I'm looking at doing this so that build
servers using the Guix Data Service can build these derivations.
2021-07-11 11:57:05 +01:00
Canan Talayhan
9c1b989d20 Add index for package_metadata
This index speeds up inserting new entries to the package_metadata table.

Signed-off-by: Christopher Baines <mail@cbaines.net>
2021-05-27 21:26:56 +01:00
f4a87ec059 Add some indexes
This is from reading the slow query log on data.guix.gnu.org.
2021-05-17 18:40:45 +01:00
b430c632b7 Create a table for systems
And use it for the systems in the derivations and package derivations tables.

The primary motivation here is to allow quickly working out what systems the
database contains, and having a small table with just the right data seems a
good way to do that.
2021-04-23 11:14:51 +01:00
7fbcb3a3c2 Store channel instance derivations in a separate transaction
This means that these derivations are stored, even if a later part of the
process fails. Having the channel instance derivations stored might help work
out why the failure occurred, or better display information about it.
2021-02-02 23:36:56 +00:00
6e390c9b9d Increase statistics targets for the derivation_inputs table
It has a lot of rows, and this seems to help the query planning for
select-required-builds-that-failed, as I think it helps PostgreSQL better
estimate the row counts that different parts of the query will produce.
2020-12-27 10:37:06 +00:00
67f1ace9eb Add index to speed up deleting revisions
This is necessary to quickly delete entries from package_derivaitons.
2020-10-27 16:59:09 +00:00
6a04d474c0 Regenerate the latest_build_status table
As it doesn't handle wierdness for statuses, like builds being canceled with a
timestamp of 0.
2020-10-21 19:46:27 +01:00
83884ed2ea Add a latest_build_status table
This will avoid many queries trying to figure out what the latest build status
is, which will hopefuly simplify queries as well as improving performance.
2020-10-13 19:31:43 +01:00
9f9c23010e Support storing whether to authenticate commits by git repository
To allow using the Guix Data Service when commits aren't authenticated.
2020-10-07 18:50:13 +01:00
e2a7705d3d Add an index for derivation_sources.derivation_source_file_id
As this speeds up deleting derivation_source_files.
2020-10-02 20:15:23 +01:00
841f5fb186 Change a constraint to add ON DELETE CASCADE
I've not used these in many places, to try and avoid hiding deleting data, but
in this case, this will allow more easily deleting the derivation source file
nars, by just deleting the derivation_source_files table entry.
2020-10-02 20:15:10 +01:00
6e0e33addf Change the autovacuum config for some tables
Looking at data for the the patches deployment of the Guix Data Service, these
tables look like they might benefit from vacuuming/analyzing more often, so
adjust the configuration so this will hopefully happen.
2020-10-01 22:30:39 +01:00
53341c70fc Change the locale codeset representation
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.

Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
2020-09-26 11:45:57 +01:00
c7048abe31 Change a package_descriptions index to use MD5 of the description
To avoid the index data being too large.

This was first seen in guix commit 1bb4fd64b7bbe5a17eda6f0ca8105283c038f7c8:

  psql-query-error (fatal-error PGRES_FATAL_ERROR ERROR: index row size 2808
  exceeds maximum 2712 for index "package_descriptions_locale_description_key"
  HINT: Values larger than 1/3 of a buffer page cannot be indexed.
2020-08-23 14:42:32 +01:00
Danjela Lura
ee613cdb30 Add the package_metadata_tsvectors table
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-08-12 08:53:35 +01:00
5dc9ed65ff Add a index to derivation_output_details_sets
This helps with queries like:

  SELECT *
  FROM derivation_output_details_sets
  WHERE ARRAY[5563227] <@ derivation_output_details_ids;
2020-07-04 08:59:29 +01:00
0242bc6a2e Remove the constraint on having one derivation build per server
As the Guix Build Coordinator can build one derivation many times.
2020-07-01 09:35:40 +01:00
5ee8207565 Avoid duplicate build status entries
This effectively makes adding build status entries idempotent to some degree.
2020-07-01 09:35:29 +01:00
ddbe4c2192 Support storing the id's representing builds in other services
Like Cuirass, or the Guix Data Service. This isn't so necessary with Cuirass,
as the derivation/output are unique identifiers, but for the Guix Build
Coordinator, one derivation/output can be built many times, with potentially
differing results.
2020-06-27 14:05:51 +01:00
1d48885e6a Make build_status.timestamp nullable
The Guix Build Coordinator doesn't record timestamps for events
currently. This is something to fix, but allow for build statuses without
timestamps as they're not necessary.
2020-06-27 14:05:51 +01:00
Danjela Lura
8d9a4d105c Get the translated package synopsis and descriptions into the database
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-06-19 16:03:32 +01:00
Danjela Lura
f67bea719d Get the translated lint checker descriptions into the database
Signed-off-by: Christopher Baines <mail@cbaines.net>
2020-06-07 20:17:03 +01:00
916b594473 Make nar_urls file sizes optional 2020-06-03 06:56:59 +01:00
b6754c8a4c Add a lookup_builds field to the build_servers table
This is to allow for build servers where only the substitutes should be
queried, and it shouldn't be assumed that they're running Cuirass.
2020-05-24 17:02:53 +01:00
111576e3f9 Remove odd package derivations with mismatched systems
This follows on from 79993bfaef, disassociating
the relevant derivations from the revisions.
2020-04-24 21:47:06 +01:00
776d0462e9 Fix a missing ; in a Sqitch migration 2020-03-29 12:05:00 +01:00
b30904e1af Make jobs unique by commit
Given that a commit, regardless of what repository it comes from should
contain the same exactly the same data, just track jobs by commit.
2020-03-28 19:59:35 +00:00
0c4e6a2fe4 Create an index on narinfo_fetch_records
This greatly improves the performance of the derivation-outputs page.
2020-03-25 19:11:03 +00:00
ca0d3ee754 Stop using package_versions_by_guix_revision_range
It's been replaced by the package_derivations_by_guix_revision_range table.
2020-03-24 20:44:57 +00:00
c44297b615 Generate and store system test derivations for all supported systems
Rather than just the native system. I'm not quite sure of the value here, as I
guess system tests should behave the same regardless of the way the software
is compiled, but this seems like it could be useful, and being explicit about
the system the derivation is for is good.
2020-03-20 08:28:55 +00:00
e5a6680808 Make a constraint deferrable
To allow for deferring it in a transaction when deleting derivations.
2020-02-16 22:28:37 +00:00
dfccbf911c Update the build_servers_build_config values
Change target to '' if it matches system, as this matches the updated way of
storing derivations.
2020-02-13 23:11:36 +00:00