Commit graph

107 commits

Author SHA1 Message Date
f66ff9a3ff Rework how data is inserted
This is the big change needed to allow for parallel revision
processing. Previously, a lock was used to prevent this since the parallel
transactions could deadlock if each inserted data that the other then went to
insert.

By defining the order in which inserts happen, both in terms of the order of
tables, and the order of rows within the table, this change should guarantee
that there won't be deadlocks.

I'm also hoping this change will address whatever issue was causing some
derivation data to be missing from the database.
2026-02-05 12:23:10 +00:00
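
The ordering idea in f66ff9a3ff can be sketched as follows. This is a minimal illustration, not the project's code: the table names, their order, and the `ordered_inserts` helper are all hypothetical. The point is only that when every transaction requests its inserts in one global order (tables first, then rows within a table), two transactions cannot each hold a row the other is waiting for.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, ...]

# Hypothetical table names and order, chosen only for illustration.
TABLE_ORDER: List[str] = ["outputs", "derivations", "derivation_inputs"]


def ordered_inserts(pending: Dict[str, List[Row]]) -> Iterator[Tuple[str, Row]]:
    """Yield (table, row) pairs in one global order: tables first, then rows."""
    unknown = set(pending) - set(TABLE_ORDER)
    if unknown:
        raise ValueError(f"no insert order defined for: {sorted(unknown)}")
    for table in TABLE_ORDER:
        # Sorting (and de-duplicating) the rows means two transactions that
        # share rows always request them in the same sequence.
        for row in sorted(set(pending.get(table, []))):
            yield table, row
```

Two workers holding overlapping data in different orders then produce the same insert sequence for the rows they share, which is the property the commit message relies on.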
2073e446b7 Tweak the job script
Don't use with-fluids (I forget why), and enable core dumps.
2025-07-09 12:50:24 +01:00
4210699949 Don't use the after-gc-hook for monitoring garbage collection
The hook seems to run in a thread other than the one I expect, so avoid using
it.
2025-06-29 23:22:31 +02:00
fc6f78ca9a Move the gc watcher to start earlier
This means it doesn't use the fibers sleep. I don't know if this makes a
difference.
2025-06-29 21:30:29 +02:00
7c0779519b Allow specifying a limit to inferior memory usage
To help manage the inferiors that use gigabytes of memory while computing
derivations.
2025-06-28 09:29:10 +02:00
0dd14c0a67 Use drain? #t for fibers when loading revisions
To check that there are no leftover fibers.
2025-06-28 09:29:10 +02:00
0e09e5af2e Shift setup and add more logging for polling git repositories 2025-05-25 16:10:58 +01:00
5717ce82ce Move away from cgit to more flexible linking to repositories 2025-05-25 13:08:27 +01:00
2eb5714829 Use = when comparing numbers 2025-03-11 17:15:00 +00:00
f56cae63fc Fix the --repl option 2025-03-10 08:27:42 +00:00
e591346684 Use with-exception-handler in place of with-throw-handler 2025-02-25 10:38:10 +00:00
37b7c568ed Make the job timeout configurable 2025-02-10 11:05:07 +00:00
931b7bc593 Add a slightly crude method to ignore systems and targets
While processing a revision. It would be good to also record what systems and
targets are in the platforms so it's clear what data is missing, but that can
be added later.
2025-02-03 22:59:34 +00:00
acdedb075d Ensure COLUMNS is set 2025-02-03 13:06:45 +01:00
c58ee6726b Fix starting with an empty database 2024-11-08 12:52:18 +00:00
f6eadb0b16 Make the free space requirement configurable 2024-08-20 09:49:41 +01:00
31bd2156f7 Support setting environment variables in the inferior
When processing jobs, this is mostly to allow setting GUIX_DOWNLOAD_METHODS.
2024-06-24 23:02:14 +01:00
7f5f11048b Add error handling for startup failures 2024-04-02 12:16:27 +01:00
b5f59189e1 Move backfilling in to the server module and use the connection pool
To avoid using the old PostgreSQL connection per thread code.
2024-04-01 21:51:29 +01:00
ca69d3329d Add exception handling to the process-jobs script
As I'm seeing this exit on beid, but I'm not sure why.
2024-03-05 10:57:41 +00:00
a900a1c2ec Remove drain? #t from process job
As it now uses more fibers.
2024-01-18 22:41:02 +00:00
c1d2f3a1b7 Add meaningful parallelism to processing jobs
Make parallel use of inferiors when computing channel instance derivations,
and when extracting information about a revision. This should allow for some
horizontal scalability, reducing the impact of additional systems for which
derivations need computing.

This commit also fixes an apparent issue with package replacements, as
previously the wrong id was used, and this hid some issues around
deduplication.
2024-01-18 15:34:40 +00:00
a3ec1f326d Set %file-port-name-canonicalization when processing jobs
Just in case this helps with performance.
2023-12-04 11:06:27 +00:00
c3cb04cb80 Use fibers when processing new revisions
Just have one fiber at the moment, but this will enable using fibers for
parallelism in the future.

Fibers seemed to cause problems with the logging setup, which was a bit odd in
the first place. So move logging to the parent process which is better anyway.
2023-11-05 13:46:20 +00:00
10bad53ad5 Support polling git repositories for new branches/revisions
This is mostly a workaround for the occasional problems with the guix-commits
mailing list, as it can break and then the data service doesn't learn about
new revisions until the problem is fixed.

I think it's still a generally good feature though, and allows deploying the
data service without it consuming emails to learn about new revisions, and is
a step towards integrating some kind of way of notifying the data service to
poll.
2023-10-09 22:19:02 +01:00
7251c7d653 Stop using a pool of threads for database operations
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
2023-07-10 18:56:31 +01:00
29d49ba31a Detach the database setup from the main guix-data-service process
This will allow restarting them independently, leaving it up to the operator
to ensure that all processes are compatible.
2023-06-09 16:11:06 +01:00
5c9ec28cb5 Query for outputs when build events arrive
This will keep the substitute information more up to date.
2023-06-09 16:11:06 +01:00
688f4cd79d Set request timeouts for the thread pools
The request timeout should ensure that the operations don't back up if the
thread pool is overloaded.
2023-04-27 14:58:47 +02:00
9f080524bc Split the thread pool used for database connections
Into two thread pools: a default one, and one reserved for essential
functionality.

There are some pages that use slow queries, so this should help stop those
pages blocking other operations.
2023-04-27 10:31:09 +02:00
519f0c6f67 Defer backfilling derivation distribution counts until later
After the migrations have run.
2023-03-09 09:39:47 +00:00
e39c9da028 Store the distribution of derivations related to packages
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.

The data looks like this (for a specified system and target):

┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘

Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
packages since it's mostly just derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to tell PostgreSQL this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
2023-03-09 08:29:39 +00:00
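
A rough sketch of how a level/count distribution like the one in e39c9da028 could be computed from a derivation graph. It assumes that "level" means the number of input hops from a package derivation, and that each derivation is counted only at the first level where it is seen. The log suggests this reading ("new derivations ... at each level") but does not confirm it. The `level_counts` function and the sample graph in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List


def level_counts(roots: Iterable[str], inputs: Dict[str, List[str]]) -> Dict[int, int]:
    """Count the derivations first reached at each level of a breadth-first walk."""
    seen = set(roots)
    frontier = sorted(seen)  # level 0: the package derivations themselves
    counts: Dict[int, int] = {}
    level = 0
    while frontier:
        counts[level] = len(frontier)
        next_frontier: List[str] = []
        for drv in frontier:
            for dep in inputs.get(drv, []):
                if dep not in seen:  # only count derivations not seen before
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
        level += 1
    return counts
```

For example, with two packages that each have their own source derivation and share one compiler derivation, which in turn has one bootstrap input, the walk sees two derivations at level 0, three at level 1 and one at level 2. Counts of this kind are what could feed the per-level LIMITs the commit message describes.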
3ba8418656 Allow skipping processing system tests
Generating system test derivations is difficult, since you generally need to
do potentially expensive builds for the system you're generating the system
tests for. You might not want to disable grafts, for instance, because you
might be trying to test whatever the test is testing in the context of grafts
being enabled.

I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
2023-02-08 14:56:48 +00:00
7ae1c97b92 Drop the thread pool idle seconds
To hopefully bring down the memory usage from idle connections.
2022-11-24 12:37:45 +00:00
d06230fcf4 Close postgresql connections when the thread pool thread is idle
I think the idle connections associated with idle threads are still taking up
memory, so especially now that you can configure an arbitrary number of
threads (and thus connections), I think it's good to close them regularly.
2022-10-23 11:28:37 +01:00
ff77bbea7e Make it possible to increase the number of thread pool threads
And double the default to 16.
2022-10-02 15:08:18 +01:00
8e23d38660 Handle migrations and server startup better
The server part of the guix-data-service doesn't work great as a guix service,
since it often fails to start if the migrations take any time at all.

To address this, start the server before running the migrations, and serve the
pages that work without the database, plus a general 503 response. Once the
migrations have completed, switch to the normal behaviour.
2022-06-17 13:13:21 +01:00
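
The startup behaviour described in 8e23d38660 could look roughly like the sketch below: requests are answered immediately, database-free pages are served as usual, and everything else gets a 503 until the migrations finish. All names here (`STATIC_PATHS`, `make_handler`, the response tuples) are hypothetical and only illustrate the switch-over, not the service's actual code.

```python
import threading
from typing import Callable, Tuple

Response = Tuple[int, str]

# Hypothetical set of paths that can be served without the database.
STATIC_PATHS = {"/README", "/healthcheck"}


def make_handler(
    migrations_done: threading.Event,
    normal_handler: Callable[[str], Response],
) -> Callable[[str], Response]:
    """Wrap normal_handler so it is only used once migrations have completed."""

    def handler(path: str) -> Response:
        if migrations_done.is_set():
            return normal_handler(path)
        if path in STATIC_PATHS:
            return (200, "static page")
        # The database may not be usable yet, so refuse rather than fail.
        return (503, "Service unavailable: database migrations in progress")

    return handler
```

The migration runner would call `migrations_done.set()` when it finishes, after which every request goes to the normal handler.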
d4bb0ffaaa Fix more issues with the git_commits introduction 2022-05-23 22:49:51 +01:00
8beab2511c Query substitutes for latest processed revisions periodically
This is a step towards having up to date substitute availability data.
2021-11-16 19:08:46 +00:00
d1a2a7125c Fix a regression with running sqitch
Introduced in 0dc05982cd.
2021-07-11 12:40:48 +01:00
b4188bda9d Run sqitch in the change mode
Since this rolls back migrations less, which is good when the rollback bit
isn't always implemented.
2021-07-04 10:43:13 +01:00
0dc05982cd Try to adapt the PostgreSQL paramstring to use with sqitch 2021-06-16 13:44:00 +01:00
2a8a574f4a Allow customising the pg_dump command used
As this
2021-01-03 19:05:41 +00:00
375a6a37dc Support not querying pending builds
As this can take some time.
2020-11-01 22:52:53 +00:00
f485423d5a Allow only fetching builds for a specific system 2020-11-01 22:49:49 +00:00
6a7f6b5a0e Fix create small backup issue with latest_build_status 2020-10-23 20:01:43 +01:00
3225766207 Make it easier to get to a repl 2020-10-10 13:44:37 +01:00
18b6dd9e6d Stop opening a PostgreSQL connection per request
This was good in that it avoided having to deal with long running connections,
but it probably takes some time to open the connection, and these changes are
a step towards offloading the PostgreSQL queries to other threads, so they
don't block the threads for fibers.
2020-10-03 09:22:29 +01:00
39b5df04eb Remove development code from the process job script 2020-09-28 08:29:20 +01:00
033858410b Add a JSON page for repository branches 2020-09-27 16:32:56 +01:00