These changes were motivated by switching to a mechanism of loading data that
isn't dependent on the big advisory lock that prevents more than one revision
from being processed at a time.
Since INSERT ... RETURNING id; is used, this can block if another transaction
inserts the same data, and then cause an error when that transaction
commits. The solution is to use ON CONFLICT DO NOTHING, but you have to handle
the case when the INSERT doesn't return an id since the other transaction has
inserted it.
This commit rewrites insert-missing-data-and-return-all-ids to do as described
above, as well as being more efficient in how existing data is detected and to
use more vectors. Other utilities for inserting data are added as well.
This is currently happening because some linters try to evaluate parts of
packages for cross-building to aarch64-linux-gnu, but not all packages support
that and some crash in this case.
I'm not quite sure what the correct behaviour should be, but maybe the data
service needs to try and handle these crashes rather than not processing the
entire revision.
This took a while to find as process-job would just get stuck, and this wasn't
directly related to any particular change, just that more fibers increased the
chance of hitting it.
This commit includes lots of the things I changed while debugging.
To allow for calling derivation-file-names->derivation-ids in parallel across
multiple fibers, using the PostgreSQL connection fiber to perform atomic
operations.
So that the WAL can grow more when there's sufficient space. When the
inferiors are closed it takes time to restart them, so doing this less should
speed up processing revisions.
Since getting an inferior from the pool can take some time, it's not
sufficient to just check prior to attempting to fetch an inferior from the
pool. Instead set a timeout and check periodically.
Rather than for each system and target, as this should mean the
response (pausing and allowing inferiors and store connections to be closed)
is quicker.