Previously, the package_derivations table wasn't considered, so derivations
could still be referenced from it. This commit fixes that, and also deletes
unreferenced entries in some linter-related tables.
Some revisions have become disassociated from branches, probably because they
were associated with multiple branches in the first place. This should allow
deleting them.
Channels don't represent some channel on which messages travel, or at least not
a very long one, because a channel can't hold any messages. They simply represent
a direct exchange of the message between a sender and receiver. Because of
this, put-message blocks the fiber, and if all the threads on the other end
are waiting for replies to be received, then you have a deadlock.
To avoid this situation, spawn new fibers to send the messages. I think this
works at least, although I'm unsure how sensible it is.
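As a rough illustration of the problem and the workaround, here is a minimal
sketch in Python, with threads standing in for fibers. RendezvousChannel and
spawn_send are hypothetical names made up for this example, not part of Guile
Fibers. If the two requests were sent directly from the main thread, the
second put would block while the single worker is blocked putting its first
reply, and nothing would progress.

```python
import queue
import threading


class RendezvousChannel:
    """Unbuffered channel: put() blocks until a receiver takes the message."""

    def __init__(self):
        self._items = queue.Queue()

    def put(self, message):
        taken = threading.Event()
        self._items.put((message, taken))
        taken.wait()  # block the sender until get() has taken the message

    def get(self):
        message, taken = self._items.get()
        taken.set()  # release the blocked sender
        return message


def spawn_send(channel, message):
    """Send from a new thread (fiber in the original) so the caller isn't blocked."""
    thread = threading.Thread(target=channel.put, args=(message,), daemon=True)
    thread.start()
    return thread


requests = RendezvousChannel()
replies = RendezvousChannel()


def worker():
    # Handle two requests; each reply blocks until someone receives it.
    for _ in range(2):
        replies.put(requests.get() * 2)


threading.Thread(target=worker, daemon=True).start()

# Sending from spawned threads leaves the main thread free to receive replies.
for n in (1, 2):
    spawn_send(requests, n)

received = sorted(replies.get() for _ in range(2))
```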
Previously, a connection was passed through the code handling the
request. When queries were performed, though, this could block the thread,
potentially leaving the server unable to serve other requests.
Instead, this now runs queries in a pool of threads. This should remove the
possibility of blocking the threads used by the web server, and in doing so,
some of the queries have been parallelised.
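The general shape of this can be sketched in Python, as an illustration rather
than the actual implementation. run_query and handle_request are hypothetical
names, and the sleep stands in for a blocking PostgreSQL query. Note that in
this sketch result() still blocks the calling thread, whereas with fibers the
waiting fiber would be suspended instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name, delay=0.05):
    """Hypothetical stand-in for a blocking database query."""
    time.sleep(delay)
    return f"result of {name}"


# A dedicated pool for queries, separate from the threads serving requests.
query_pool = ThreadPoolExecutor(max_workers=4)


def handle_request():
    # Submit independent queries together so they run in parallel in the pool.
    futures = [
        query_pool.submit(run_query, name)
        for name in ("packages", "derivations")
    ]
    # Collect the results in submission order.
    return [future.result() for future in futures]


results = handle_request()
query_pool.shutdown()
```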
I'm still not sure about the naming and syntax, but I think the functionality
is a sort of step forward.
Into a generic thing more like (ice-9 futures). This includes copying some bits
from the (ice-9 threads) module and adapting them to work with this fibers
approach, rather than futures. The advantage is that using fibers channels
doesn't block the threads being used by fibers, whereas futures would.
This was good in that it avoided having to deal with long running connections,
but it probably takes some time to open the connection, and these changes are
a step towards offloading the PostgreSQL queries to other threads, so they
don't block the threads for fibers.
I've not used these in many places, to try to avoid hiding the deletion of
data, but in this case, this will allow more easily deleting the derivation
source file nars, by just deleting the derivation_source_files table entry.
Looking at data for the patches deployment of the Guix Data Service, these
tables look like they might benefit from vacuuming/analyzing more often, so
adjust the configuration so this will hopefully happen.
Stop querying for the file-name, as it's unused. Rather than fetching all ids,
then looking at each to see if it can be deleted, do some imperfect but not
too slow checks in the initial query.
Previously, the name wasn't taken into account when filtering results, so a
search like "git-annex" wouldn't find the git-annex package, since its
synopsis or description doesn't include the name.
Filtering on the name made the queries much slower, so to address that, the
filtering by revision is moved to a separate part of the CTE, which means
PostgreSQL filters down the rows by quite a lot before it begins filtering by
name.
Also, add in a variant of the query without dashes (-) because that helps with
searches like ruby-engine.
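The order of filtering can be sketched in Python over in-memory rows, as a
loose analogy to the SQL rather than a copy of it. The sample rows are made up,
query_variants and search are hypothetical names, and replacing dashes with
spaces is an assumption about what "without dashes" means here.

```python
# Made-up sample rows, for illustration only.
PACKAGES = [
    {"revision": "r1", "name": "git-annex",
     "synopsis": "Manage files with Git, without checking in their contents"},
    {"revision": "r1", "name": "example-gem",
     "synopsis": "Detect the ruby engine in use"},
    {"revision": "r2", "name": "git-annex",
     "synopsis": "Manage files with Git, without checking in their contents"},
]


def query_variants(query):
    """Return the query plus a variant with dashes replaced by spaces (assumed)."""
    variants = [query]
    without_dashes = query.replace("-", " ")
    if without_dashes != query:
        variants.append(without_dashes)
    return variants


def search(revision, query):
    # Narrow down to one revision first, so the more expensive matching
    # only looks at a fraction of the rows.
    in_revision = [p for p in PACKAGES if p["revision"] == revision]
    variants = [v.lower() for v in query_variants(query)]
    return [
        p["name"]
        for p in in_revision
        if any(v in p["name"].lower() or v in p["synopsis"].lower()
               for v in variants)
    ]
```

With these rows, search("r1", "git-annex") matches on the name alone, and
search("r1", "ruby-engine") only matches through the variant without dashes.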
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.
Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
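For illustration, the conversion from the normalized codeset to the .UTF-8
form might look something like this in Python. to_glibc_locale is a
hypothetical name, and only the utf8 codeset is handled, which is an assumption
made to keep the sketch short.

```python
def to_glibc_locale(locale):
    """Rewrite a normalized codeset (e.g. en_US.utf8) as en_US.UTF-8."""
    # Keep any modifier (the part after "@") unchanged.
    base, at, modifier = locale.partition("@")
    name, dot, codeset = base.partition(".")
    if dot and codeset.lower() == "utf8":
        base = f"{name}.UTF-8"
    return base + at + modifier
```

Values already ending in .UTF-8, and values with no codeset at all, pass
through unchanged.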