Looking at data for the patches deployment of the Guix Data Service, these
tables look like they might benefit from vacuuming/analyzing more often, so
adjust the configuration so this will hopefully happen.
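
As a rough illustration of this kind of adjustment, the sketch below builds
an ALTER TABLE statement using PostgreSQL's per-table autovacuum storage
parameters. The table name and the 0.01 scale factors are hypothetical and
not taken from this change.

```python
def autovacuum_settings_sql(table, vacuum_scale_factor=0.01,
                            analyze_scale_factor=0.01):
    """Build an ALTER TABLE statement with per-table autovacuum settings.

    The table name and default values are illustrative assumptions. The
    table name is interpolated directly, so it must come from trusted
    code, never from user input.
    """
    return (
        f"ALTER TABLE {table} SET ("
        f"autovacuum_vacuum_scale_factor = {vacuum_scale_factor}, "
        f"autovacuum_analyze_scale_factor = {analyze_scale_factor});"
    )


# "example_table" is a placeholder, not a real Guix Data Service table.
statement = autovacuum_settings_sql("example_table")
```

Lower scale factors make autovacuum trigger after a smaller fraction of the
table has changed, which suits large tables that change steadily.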
Stop querying for the file-name, as it's unused. Rather than fetching all ids,
then looking at each to see if it can be deleted, do some imperfect but not
too slow checks in the initial query.
Previously, the name wasn't taken into account when filtering results, so a
search like "git-annex" wouldn't find the git-annex package, since its
synopsis or description doesn't include the name.
Filtering on the name made the queries much slower, so to address that, the
filtering by revision is moved to a separate part of the CTE, which means
PostgreSQL filters down the rows by quite a lot before it begins filtering by
name.
Also, add in a variant of the query without dashes (-) because that helps with
searches like ruby-engine.
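
A minimal sketch of generating the dash-free variant follows. It assumes
dashes are replaced with spaces; the actual change may strip them or build
the variant differently, and the function name is hypothetical.

```python
def search_query_variants(query):
    """Return the original query plus a variant without dashes.

    Assumption: dashes become spaces, so "ruby-engine" also searches for
    "ruby engine". Duplicate variants are dropped.
    """
    variants = [query]
    without_dashes = " ".join(query.replace("-", " ").split())
    if without_dashes and without_dashes not in variants:
        variants.append(without_dashes)
    return variants


variants = search_query_variants("ruby-engine")
```

A query with no dashes yields a single variant, so no extra work is done
for the common case.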
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.
Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
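
The mapping from the normalized codeset to the .UTF-8 form can be sketched
as below. This is an illustration only: the function name is hypothetical,
and it only handles the UTF-8 codeset, leaving every other locale value
unchanged.

```python
import re


def denormalize_utf8_codeset(locale):
    """Map a normalized locale such as "en_US.utf8" to "en_US.UTF-8".

    Locales without a codeset, or with a codeset other than UTF-8, are
    returned unchanged. A modifier such as "@latin" is preserved.
    """
    name, dot, rest = locale.partition(".")
    if not dot:
        return locale
    codeset, at, modifier = rest.partition("@")
    # Normalization lowercases the codeset and drops non-alphanumerics,
    # so "UTF-8", "utf-8" and "utf8" all compare equal here.
    if re.sub(r"[^0-9a-z]", "", codeset.lower()) == "utf8":
        codeset = "UTF-8"
    return f"{name}.{codeset}{at}{modifier}"


converted = denormalize_utf8_codeset("en_US.utf8")
```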
To avoid the index data being too large.
This was first seen in guix commit 1bb4fd64b7bbe5a17eda6f0ca8105283c038f7c8:
psql-query-error (fatal-error PGRES_FATAL_ERROR ERROR: index row size 2808
exceeds maximum 2712 for index "package_descriptions_locale_description_key"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
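
The message above does not say how the index was changed. One common way to
keep index entries small is to index a fixed-size hash of the text rather
than the text itself, as sketched here. The use of MD5 and the function
name are assumptions for illustration.

```python
import hashlib

# Limit quoted in the PostgreSQL error message above.
MAX_INDEX_ROW_BYTES = 2712


def description_index_key(description):
    """Return a fixed-size key for a description of any length.

    Assumption: an MD5 hex digest stands in for the full text in the
    unique index. The digest is always 32 characters long.
    """
    return hashlib.md5(description.encode("utf-8")).hexdigest()


# A description far larger than the index row limit.
long_description = "x" * 10000
key = description_index_key(long_description)
```

The trade-off is that a hash index supports equality checks only, which is
enough for enforcing uniqueness but not for prefix or range lookups.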
Use the deduplicated list of packages when fetching lint warnings, to avoid
duplicate warnings. This was first seen in Guix commit
843344273c6a587b8e6c84d8406500fd64d8908a.
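
A minimal sketch of the deduplication step, assuming a package is
identified by its name and version. The record layout and function name
are hypothetical.

```python
def deduplicate_packages(packages):
    """Drop repeated packages, keeping the first occurrence of each.

    Assumption: two entries are the same package when their name and
    version match.
    """
    seen = set()
    unique = []
    for package in packages:
        key = (package["name"], package["version"])
        if key not in seen:
            seen.add(key)
            unique.append(package)
    return unique


packages = [
    {"name": "hello", "version": "2.10"},
    {"name": "hello", "version": "2.10"},
    {"name": "guile", "version": "3.0.2"},
]
unique_packages = deduplicate_packages(packages)
```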
Only include a failed build if that build server hasn't had any success
building that output. The Guix Build Coordinator can build one output with
many different builds, so this helps avoid showing lots of spurious failures.
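
The filtering rule can be sketched as follows. The record fields
(build_server, output, status) and the status values are assumptions made
for this example, not the actual schema.

```python
def failed_builds_to_show(builds):
    """Keep failed builds only where the same build server has never
    succeeded in building the same output."""
    succeeded = {
        (build["build_server"], build["output"])
        for build in builds
        if build["status"] == "succeeded"
    }
    return [
        build
        for build in builds
        if build["status"] == "failed"
        and (build["build_server"], build["output"]) not in succeeded
    ]


builds = [
    {"build_server": 1, "output": "/gnu/store/aaa-hello", "status": "failed"},
    {"build_server": 1, "output": "/gnu/store/aaa-hello", "status": "succeeded"},
    {"build_server": 2, "output": "/gnu/store/aaa-hello", "status": "failed"},
]
shown = failed_builds_to_show(builds)
```

Here the failure on build server 1 is hidden because that server later
built the output successfully, while the failure on build server 2 is kept.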