These changes were motivated by switching to a mechanism of loading data that
isn't dependent on the big advisory lock that prevents more than one revision
from being processed at a time.
Since INSERT ... RETURNING id; is used, this can block if another transaction
inserts the same data, and then cause an error when that transaction
commits. The solution is to use ON CONFLICT DO NOTHING, but you have to handle
the case when the INSERT doesn't return an id since the other transaction has
inserted it.
This commit rewrites insert-missing-data-and-return-all-ids to do as described
above, as well as being more efficient in how existing data is detected and to
use more vectors. Other utilities for inserting data are added as well.
From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.
Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
squee, returns all data as strings, and expects strings as inputs to
queries. So, keeping the ids as strings was easy initially, but it means that
you can't tell from the type whether it should be quoted, or not...
Therefore, handle ids as strings, converting them to numbers when they're
fetched from the database, and back to strings as part of the queries.
The licenses table, along with the package_metadata table had duplicate
values. This could happen as the unique constraints on those tables didn't
properly account for the nullable fields.
The duplicates in those tables also affected the license_sets, packages,
package_derivations tables in a similar way. Finally, the
guix_revision_package_derivations table was also affected.
This commit adds a migration to fix the data, as well as the constraints. THe
code to populate the licenses and package_metadata tables is also updated.
On one code path, they were handled as numbers, whereas elsewhere they were
handled as strings. This led to the package-metadata code trying to insert
duplicate entries.
Instead, just handle them as strings everywhere.
Store the location a package can be found at, and display this on the package
page.
If available, link off to the git repository containing the package.
I'm thinking about adding more fields to this table, and the sha1_hash values
will make this tricker.
Therefore, remove the value, and adjust the existing code to cope. This commit
also adds a new test which coveres some of the changed functionality.