This is the big change needed to allow for parallel revision
processing. Previously, a lock was used to prevent this, since parallel
transactions could deadlock if each inserted rows that the other was
also about to insert.
By defining the order in which inserts happen, both in terms of the
order of the tables and the order of rows within each table, this change
should guarantee that there won't be deadlocks.
I'm also hoping this change will address whatever issue was causing some
derivation data to be missing from the database.
While processing a revision, it would be good to also record what
systems and targets are in the platforms, so it's clear what data is
missing, but that can be added later.
Make parallel use of inferiors when computing channel instance
derivations, and when extracting information about a revision. This
should allow for some horizontal scalability, reducing the impact of
supporting additional systems for which derivations need computing.
This commit also fixes an apparent issue with package replacements, as
previously the wrong id was used, and this hid some issues around
deduplication.
Just have one fiber at the moment, but this will enable using fibers for
parallelism in the future.
Fibers seemed to cause problems with the logging setup, which was a bit
odd in the first place, so move logging to the parent process, which is
better anyway.
This is mostly a workaround for the occasional problems with the
guix-commits mailing list, as it can break, and then the data service
doesn't learn about new revisions until the problem is fixed.
I think it's still a generally good feature though. It allows deploying
the data service without it consuming emails to learn about new
revisions, and it's a step towards integrating some way of notifying the
data service to poll.
Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
Into two thread pools: a default one, and one reserved for essential
functionality.
There are some pages that use slow queries, so this should help stop
those pages from blocking other operations.
This might be generally useful, but I've been looking at it as it offers
a way to try to improve query performance when you want to select all
the derivations related to the packages for a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│ 15 │ 2 │
│ 14 │ 3 │
│ 13 │ 3 │
│ 12 │ 3 │
│ 11 │ 14 │
│ 10 │ 25 │
│ 9 │ 44 │
│ 8 │ 91 │
│ 7 │ 1084 │
│ 6 │ 311 │
│ 5 │ 432 │
│ 4 │ 515 │
│ 3 │ 548 │
│ 2 │ 2201 │
│ 1 │ 21162 │
│ 0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar, as you have
all the derivations for the package origins. The remaining levels
contain fewer packages, since it's mostly just derivations involved in
bootstrapping.
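For reference, here's one way a per-level breakdown like the one above
could be computed. This is just a sketch: the table and column names are
simplified from the real schema, the filtering by system and target is
omitted, and each derivation is counted once at its maximum depth:

  WITH RECURSIVE all_derivations (id, level) AS (
      SELECT derivation_id, 0
      FROM package_derivations      -- hypothetical, simplified schema
      WHERE revision_id = $1        -- $1: the relevant revision
    UNION
      SELECT derivation_inputs.input_derivation_id,
             all_derivations.level + 1
      FROM derivation_inputs
      INNER JOIN all_derivations
        ON derivation_inputs.derivation_id = all_derivations.id
  )
  SELECT level, COUNT(*) AS count
  FROM (
    -- Count each derivation once, at the deepest level it appears at
    SELECT id, MAX(level) AS level
    FROM all_derivations
    GROUP BY id
  ) AS per_derivation
  GROUP BY level
  ORDER BY level DESC;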
When using a recursive CTE to collect all the derivations, PostgreSQL
assumes that each derivation has the same number of inputs, and this
leads to a large overestimation of the number of derivations per
revision. This in turn can lead to PostgreSQL picking a slower way of
running the query.
When it's known how many new derivations you should see at each level,
it's possible to inform PostgreSQL of this by using LIMITs at various
points in the query. This reassures the query planner that it's not
going to be handling lots of rows, and helps it make better decisions
about how to execute the query.
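As an illustration of the shape this can take (again with a simplified,
hypothetical schema, and with LIMIT values that would be picked based on
the per-level counts above), the recursion can be unrolled into explicit
levels, each carrying a LIMIT for the planner:

  WITH level_0 AS (
    SELECT derivation_id
    FROM package_derivations
    WHERE revision_id = $1
    LIMIT 23000    -- just above the expected level 0 count
  ), level_1 AS (
    SELECT DISTINCT input_derivation_id AS derivation_id
    FROM derivation_inputs
    INNER JOIN level_0 USING (derivation_id)
    LIMIT 22000    -- just above the expected level 1 count
  )
  -- ... and so on for the remaining levels ...
  SELECT derivation_id FROM level_0
  UNION
  SELECT derivation_id FROM level_1;

With the LIMITs in place, the planner assumes each level produces at
most the given number of rows, rather than compounding its per-level
estimate.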
Generating system test derivations is difficult, since you generally
need to do potentially expensive builds for the system you're generating
the system tests for. You might not want to disable grafts, for
instance, because you might be trying to test whatever the test is
testing in the context of grafts being enabled.
I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
I think the idle connections associated with idle threads still take up
memory, so especially now that you can configure an arbitrary number of
threads (and thus connections), it seems good to close them regularly.
The server part of the guix-data-service doesn't work great as a guix service,
since it often fails to start if the migrations take any time at all.
To address this, start the server before running the migrations, and serve the
pages that work without the database, plus a general 503 response. Once the
migrations have completed, switch to the normal behaviour.
This was good in that it avoided having to deal with long-running
connections, but it probably takes some time to open a connection, and
these changes are a step towards offloading the PostgreSQL queries to
other threads, so they don't block the threads used for fibers.