This runs our zuul-registry container image builds on noble because we
rely on skopeo being able to talk to docker's daemon but jammy skopeo is
too old to talk to modern docker. Running on noble should get us a newer
skopeo that can speak to modern docker as well. Note we explicitly add
python3-setuptools to the extra packages list because docker-compose
needs it and doesn't explicitly list it as a dependency.
While we are at it we add python3.12 unittesting on noble too.
Change-Id: I3e84b0a5fc9cd29ab8cf510a59ef6bfed8b5abbd
An upload can be finished with a PUT with no body. If that happens,
we send an empty chunk to our backend, which can end up using
extra storage and potentially using multipart objects on object
storage servers when not strictly necessary.
Simplify this by checking to see if we are getting any data from the
client before we start our push to the backend.
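The check can be sketched roughly like this (function and parameter
names here are illustrative, not the registry's actual API):

```python
def finish_upload(client_stream, backend, chunk_size=64 * 1024):
    """Push remaining client data to the backend, skipping empty bodies.

    Peek at the first chunk before opening a backend push: a final PUT
    with no body yields no data, so we never store an empty chunk.
    Returns the number of bytes written to the backend.
    """
    first = client_stream.read(chunk_size)
    if not first:
        # Empty final PUT: nothing to push, no extra chunk stored.
        return 0
    written = backend.write(first)
    while True:
        chunk = client_stream.read(chunk_size)
        if not chunk:
            break
        written += backend.write(chunk)
    return written
```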
Change-Id: Ia2069364ba2a5e03331f9eabca692b5aa065ce6e
This is a cython implementation of sha256 where we can get and set
the state for resumable hashing. The rehash library uses openssl but
does not support openssl 3.x and is therefore becoming more difficult
to use.
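For in-process use, hashlib already shows the incremental property the
registry relies on; what a resumable implementation adds is the ability
to serialize and restore that state (hashlib's copy() cannot leave the
process). A small illustration of the incremental property:

```python
import hashlib

# Incremental hashing: feed data as it arrives and snapshot midway.
h = hashlib.sha256()
h.update(b"first chunk of an upload")
snapshot = h.copy()  # in-process snapshot of the hash state

# Resuming from the snapshot gives the same digest as hashing all at once.
snapshot.update(b"second chunk")
resumed = snapshot.hexdigest()
full = hashlib.sha256(b"first chunk of an uploadsecond chunk").hexdigest()
assert resumed == full
```

The cython implementation goes further by exposing get/set of the raw
hash state, so hashing can resume after the state has been persisted.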
Change-Id: I5ece8a7532421d76f5d9507131abd5f86c806158
This matches what Zuul and Nodepool are doing. Getting off the older
images will also allow OpenDev to stop building them. We update the
unittest job to python 3.11 in the process to better cover what people
will deploy.
Note we wanted to update the base OS from bullseye to bookworm, but this
update involves upgrading openssl from 1.1.x to 3.x and the rehash
library which the registry depends on is not compatible with openssl
3.x. Getting to python3.11 on bullseye would at least allow us to stop
building python3.9 and python3.10 bullseye images.
Change-Id: Ie609a2f5f3c488cea8d98e78c9fa6353ebb78f62
This reverts commit 53371d7e867a6c8703f50885548433d850952894.
OpenDev is moving back to docker.io in order to continue doing
speculative testing of container images.
Depends-On: https://review.opendev.org/c/opendev/system-config/+/884256
Change-Id: I8cc8537f03e0a799db9123b2a12e542b911564b8
OpenDev is moving images to quay.io. This change updates zuul-registry's
use of those images to match. The Depends-On will ensure we don't merge
this before the move has occurred.
Depends-On: https://review.opendev.org/c/opendev/system-config/+/881932
Change-Id: I4e7b2003d5641d66dd283e5827cda87521e93d69
Tox v4 released and has introduced a number of compatibility problems.
Nox is an alternative to tox that is much simpler and uses standard
tools and zuul repos are switching to it. This change updates
zuul-registry.
Change-Id: I588842a0c47d0591b6dacbe28e96915e1b15819e
We do this to take advantage of python 3.10's speed improvements as
illustrated by Zuul proper. But also OpenDev would like to drop python
3.8 image builds to make room for python 3.11.
We add python3.10 unittesting to prevent regressing the docker images on
that version of python.
Change-Id: Ib2e39e0c5d09b4fdce40faa072fe51b55c8d7407
This patch resolves the issue that causes Docker to fail when
it runs the "docker manifest create" command since the registry
does not mention that it supports the 2.0 API inside its
headers.
Change-Id: I777bd4218c822fa665f0d29f573a6ae120194284
Due to a bug in Docker[1], it fails to create the manifest
manually due to a missing header when Docker runs a ping to
the API. This has been fixed in the development branch but it
will likely take time for it to trickle down.
This is the first patch of two which will demonstrate the actual
failure, with another follow-up patch which will demonstrate
the fix.
[1]: https://github.com/docker/cli/issues/3161
Change-Id: I1f5c3121c58f93503be92c5ea3c4ae446cc44938
I'm not sure why I62f6c238de5c55b7673ae8b38a6ededdd77f4d4b added these
with Fedora; maybe it was the only 3.8 environment at the time. I
don't think we want to run the tests there now.
Change-Id: Iee0e2b60e49e8149b045eaedccff034e5f5a1cfd
These matches look like they should be "or" (if any of these images is
already there, fail), not "and". This caused some confusion when the
image was being leaked (I722ba599fbc690d6cb967070c05215b98a73dcaf) as
this wasn't triggering.
Change-Id: Ic6dad7bbf9013e994eb80a54bc5191a96bd4be94
Currently we pull test/image here but do not clean it out. This
interferes with the buildset-registry podman tests that happen
afterwards when using podman 3.4. It's unclear if this is a feature
or a bug, see [1].
[1] https://github.com/containers/podman/issues/13383
Change-Id: I722ba599fbc690d6cb967070c05215b98a73dcaf
We use a non-trivial image now (the registry's image in fact) and then
push with 4 processes at a time every time we push to the test registry.
The idea here is we're generating concurrency in the system that should
trip over the bugs that we are finding if they still exist.
Change-Id: I4f03132fa0fad6f40806030b14d429d29036a42f
The way the registry was previously written if two concurrent uploads
of the same blob were happening one would fail to grab the lock and then
return early. The uploading client would then immediately HEAD the blob
and if it did so quickly enough would get a short size or 404. To avoid
this we need the upload to continue until all concurrent uploads are
complete.
To make this happen we treat upload chunks separately per upload so that
separate uploads cannot overwrite the chunks once they are moved to the
blob directory. We end up moving the chunks to the blob directory in
upload specific locations to facilitate this. Once that is done we can
atomically update the actual blob data from the chunks. In the
filesystem driver we concatenate the chunks into the blob then
atomically rename the result into its final blob/data location. This
ensures that we only ever return valid HEAD info for a blob, and that it
is only requested by the client once it exists.
This should be safe because the objects are addressed by their hashsum, which
means their contents should be identical. If we end up moving one copy
into place then another atomically they will always have the same data
and size.
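The flow in the filesystem driver can be sketched roughly like this
(paths and names are illustrative, not the registry's actual layout):

```python
import os
import tempfile

def finalize_blob(upload_id, chunk_paths, blob_dir):
    """Concatenate an upload's private chunks, then atomically publish.

    Chunks live under upload-specific paths, so concurrent uploads of
    the same digest never overwrite each other. os.rename() makes the
    final blob appear all at once, so a HEAD never sees a partial file.
    """
    os.makedirs(blob_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=blob_dir, prefix=upload_id + ".")
    with os.fdopen(fd, "wb") as out:
        for path in chunk_paths:
            with open(path, "rb") as chunk:
                out.write(chunk.read())
    # Atomic on POSIX when tmp and target share a filesystem; a second
    # upload renaming identical content over it is harmless.
    os.rename(tmp, os.path.join(blob_dir, "data"))
```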
These logs from an OpenDev test job seem to capture this in action:
# First upload is completing and grabs the lock
2022-02-25 21:28:14,514 INFO registry.api: [u: 935f8eddbb9a4dab8dd8cc45ce7f9384] Upload final chunk _local opendevorg/gerrit digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
2022-02-25 21:28:14,576 DEBUG registry.storage: [u: 935f8eddbb9a4dab8dd8cc45ce7f9384] Locked digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
# Second upload attempts to complete but ends early without the lock
2022-02-25 21:28:15,517 INFO registry.api: [u: e817d8fd6c464f80bf405581e580cbab] Upload final chunk _local opendevorg/gerrit digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
2022-02-25 21:28:15,578 WARNING registry.storage: [u: e817d8fd6c464f80bf405581e580cbab] Failed to obtain lock(1) on digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
2022-02-25 21:28:15,588 INFO registry.api: [u: e817d8fd6c464f80bf405581e580cbab] Upload complete _local opendevorg/gerrit digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
2022-02-25 21:28:15,589 INFO cherrypy.access.140551593545056: ::ffff:172.17.0.1 - - [25/Feb/2022:21:28:15] "PUT /v2/opendevorg/gerrit/blobs/uploads/e817d8fd6c464f80bf405581e580cbab?digest=sha256%3A0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 HTTP/1.1" 201 - "" "docker/20.10.12 go/go1.16.12 git-commit/459d0df kernel/5.4.0-100-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.12 \(linux\))"
# Second upload completion triggers the HEAD request that gets either a
# 404 or a short read. This causes the second upload client to error.
2022-02-25 21:28:15,605 INFO registry.api: Head blob _local opendevorg/gerrit sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 not found
2022-02-25 21:28:15,607 INFO cherrypy.access.140551593545056: ::ffff:172.17.0.1 - - [25/Feb/2022:21:28:15] "HEAD /v2/opendevorg/gerrit/blobs/sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 HTTP/1.1" 404 735 "" "docker/20.10.12 go/go1.16.12 git-commit/459d0df kernel/5.4.0-100-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.12 \(linux\))"
# Now first upload has completed and the HEAD request by the first
# upload client is successful
2022-02-25 21:28:18,898 INFO registry.api: [u: 935f8eddbb9a4dab8dd8cc45ce7f9384] Upload complete _local opendevorg/gerrit digest sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343
2022-02-25 21:28:18,898 INFO cherrypy.access.140551593545056: ::ffff:172.17.0.1 - - [25/Feb/2022:21:28:18] "PUT /v2/opendevorg/gerrit/blobs/uploads/935f8eddbb9a4dab8dd8cc45ce7f9384?digest=sha256%3A0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 HTTP/1.1" 201 - "" "docker/20.10.12 go/go1.16.12 git-commit/459d0df kernel/5.4.0-100-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.12 \(linux\))"
2022-02-25 21:28:18,915 INFO registry.api: Head blob _local opendevorg/gerrit sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 size 54917164
2022-02-25 21:28:18,916 INFO cherrypy.access.140551593545056: ::ffff:172.17.0.1 - - [25/Feb/2022:21:28:18] "HEAD /v2/opendevorg/gerrit/blobs/sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 HTTP/1.1" 200 54917164 "" "docker/20.10.12 go/go1.16.12 git-commit/459d0df kernel/5.4.0-100-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.12 \(linux\))"
Change-Id: Ibdf1ca554756af61247d705b2ea3cf85c39c2b83
The PPA this is using is deprecated. The ensure-podman role will
set up the repositories correctly for us from upstream and install
podman.
Change-Id: I4d04431499230a03ae685ebb21d5d88ffec7d679
This reverts commit 0acf9951759ea44b93b3b5cbc9c9dc77bb06e1d8.
This changed the errors from short reads leading to manifest mismatches
to unknown blob errors. The unknown blob errors are due to not
responding to the HEAD requests until after all the data is written.
Either we get the HEAD request early and return a short size or we
return 404 and either way the end result is failure.
I'm not sure what the solution is at this point, but I want to have
this revert ready in case it is desirable.
Change-Id: I6ba420e53e2091456db23c1c1cf4c76be04a4d03
We are seeing problems where manifests are invalid because the size in
the manifest for blobs is smaller than the actual size of a blob. The
reason for this appears to be that the docker client will HEAD blobs to
get their size for use in the manifest, and zuul-registry will return a
HEAD value if the blob is on disk. With larger blobs concatenating the
chunks into the final blob location may be slow and we'll return a short
HEAD response while the blob is being concatenated.
Avoid this problem by concatenating into a tmpfile and then atomically
moving that file into its final position. The file length should be
complete at that point and HEAD will only see the path as existing once
it is complete.
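A minimal sketch of the tmpfile-then-rename pattern, assuming a
stat()-backed HEAD handler (names are illustrative):

```python
import os
import tempfile

def concatenate(chunks, target):
    """Build the blob in a temp file, then atomically move into place.

    Until the rename, os.stat(target) raises FileNotFoundError, so a
    HEAD handler backed by stat() reports 404 rather than a short
    length while the blob is still being assembled.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target))
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    os.replace(tmp, target)  # atomic within one filesystem
```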
Change-Id: I0c85fa5657106175140fbb8b8193659bcfe2d6c8
This updates the base image to bullseye. Buster is approaching the end
of its shelf life and we should shift to bullseye.
Change-Id: Ia136695f00a9ba71fc8ffe984e9563e1061ff099
Prior change Iaf6b83b61a470e8c5a61d18b7c241f9ca002c08a contained a bug
and was returning a blank Content-Length for manifest HEAD
queries. For this case, we have already read() the blob so we can
just return its length for the HEAD query.
(the extant code was wrong for this case and self.storage.blob_size
was returning None, which got turned into 0 silently. This leads to
the client reporting:
failed commit on ref "manifest-sha256:ad38c...": unexpected commit digest
sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,
expected sha256:ad38c... failed precondition
I put this in for anyone googling who is seeing that e3b0c4... hash;
it's a clue that somehow you're trying to hash something empty. This
slipped through because I tested pulling images, which worked, but it
fails when pulled in by docker build).
Change-Id: I7218843879ecd7939730060b6cee18af8190b888
Manifest HEAD requests should return:
Content-Length: <length of manifest>
Docker-Content-Digest: <digest>
Currently we are not returning Content-Length. Add this, and similar
to I089fa28bf16d4161507821452d6cf91d86818dc6 make sure it is actually
returned by disabling fiddling with encoding.
I think that Buildkit getting confused
(I5416694d861c5391fa0aebcebfd83d7c1b0a4e4d) is probably related to us
not returning the correct Content-Length here. Thus I have reverted
these checks. The standard definitely says these requests should
return both headers and a blank body; so if this is still causing
problems we should investigate further.
Change-Id: Iaf6b83b61a470e8c5a61d18b7c241f9ca002c08a
The reason for missing sizes was traced back to our bad response to
HEAD requests for blobs (I089fa28bf16d4161507821452d6cf91d86818dc6).
Now that is fixed, we should not accept missing sizes from clients.
Change-Id: I2514ea01789a53f26ace345ec86177e1634dfa98
As noted inline, a blank body falls into CherryPy code that will
delete the Content-Length.
The end result is that we are sending responses to blob HEAD requests
with an invalid Content-Length.
One consequence of this is that the "config" struct in manifests are
missing their "size" attribute.
[1] a7983fe61f/cherrypy/lib/encoding.py (L137)
Change-Id: I089fa28bf16d4161507821452d6cf91d86818dc6
This is a partial revert of I70a7bb5f73d1dddc540e96529784bb8c9bb0b9e3.
The off-by-one error turned out to come from our range response
(I1fb1abf3c76ea8db7820caa90c97ddbf92997842). There are no clients we
know of that send an incorrect size in the manifest; we should error
if we see that.
Revert the adding of layer sizes done with
Id5b1c5726fbe046b2f9f2994bf34f5fd7ecd90de and replace it with a hard
error. I have tested both docker and podman pushes against this
change and both correctly set a size for each layer in the manifest
per [1].
Although I cannot replicate it, the missing sizes might have already
been fixed upstream, or might have been related to the aforementioned
off-by-one response we were giving.
[1] https://docs.docker.com/registry/spec/manifest-v2-2/#manifest-list-field-descriptions
Change-Id: Ibe061171bfd8ab6043b491bbab933bf277f8e12b
The range starts from zero, so we are returning one byte too many.
Docker echoes this back to us in the manifest it uploads which is where
the extra byte is coming from.
This is actually the root cause of the off-by-one error worked around
in 134c9428353247c22b69f497b904593faf93292d. A follow-on will clean
this up (Ibe061171bfd8ab6043b491bbab933bf277f8e12b).
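The arithmetic, as an illustrative helper (not the registry's actual
code):

```python
def upload_range(total_size):
    """Build the inclusive range string for a blob of total_size bytes.

    HTTP ranges are inclusive on both ends: a blob of N bytes spans
    bytes 0 through N-1, so using N as the end offset overstates the
    size by one byte. Assumes total_size > 0.
    """
    return "0-%d" % (total_size - 1)
```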
Change-Id: I1fb1abf3c76ea8db7820caa90c97ddbf92997842
The "does the size on disk match the manifest" size checks in
I70a7bb5f73d1dddc540e96529784bb8c9bb0b9e3 appear to have a
false-positive on some changes.
One example job I found failed with
Manifest has invalid size for layer
sha256:9815a275e5d0f93566302aeb58a49bf71b121debe1a291cf0f64278fe97ec9b5
(size:203129434 actual:185016320)
but when I went to look at the file on disk (quite some time later),
it was correct
$ ls -l ./_local/blobs/sha256:9815a275e5d0f93566302aeb58a49bf71b121debe1a291cf0f64278fe97ec9b5
-rw-r--r-- 1 root root 203129433 Sep 7 01:53 data
(well, off by one byte, but that's the problem worked-around by the
original change).
I think what happened here is that when we're getting into
multi-hundred megabyte layers there's plenty of chance for the writes
that join the upload chunks to have not flushed by the time we check
the size when the manifest is uploaded.
Add some flush and sync points after upload and concatenation writes
to avoid this.
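The pattern, sketched with an illustrative helper:

```python
import os

def append_chunk(path, data):
    """Append uploaded data and force it out to disk.

    flush() pushes Python's userspace buffer to the OS; os.fsync()
    asks the OS to commit it to storage, so a later size check sees
    the full length rather than a partially buffered file.
    """
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```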
Change-Id: I46606287d41fa6745de7c8bfb31c6b0d28e32957
I70a7bb5f73d1dddc540e96529784bb8c9bb0b9e3 put in logging messages that
assume the size coming back is an integer. To keep this compatible
with the stat() returns from the file-system version, cast the
content-length to int.
Change-Id: Ic79e212865129abbc4011ad8ddf2c804c4374766
The image manifest specification specifies a "size" for each layer
that " ... exists so that a client will have an expected size for the
content before validating. If the length of the retrieved content does
not match the specified length, the content should not be trusted."
This validates that a pushed manifest has the correct size for the
layers, and if it does not, returns an error. A function is added to
clear a blob+metadata which essentially rolls-back all the layers from
the manifest (otherwise you get errors when trying again about
existing objects).
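The validation can be sketched like this (function and argument names
are illustrative, not the registry's actual API):

```python
def validate_layer_sizes(manifest, blob_sizes):
    """Check each layer's declared size against the stored blob.

    blob_sizes maps digest -> actual on-disk size. Returns a list of
    error strings; an empty list means the manifest is valid.
    """
    errors = []
    for layer in manifest.get("layers", []):
        digest = layer["digest"]
        actual = blob_sizes.get(digest)
        if actual is None:
            errors.append("unknown blob %s" % digest)
        elif layer.get("size") != actual:
            errors.append(
                "Manifest has invalid size for layer %s "
                "(size:%s actual:%s)" % (digest, layer.get("size"), actual))
    return errors
```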
This is a change to the status quo, although I believe a correct one.
A flag is added if the old behaviour is required for whatever obscure
reason.
The logic was getting a bit tangled, so I've refactored slightly to
hopefully make it more understandable.
Current docker seems to have an issue where it misreports the size; we
put in a workaround for this since it has been identified.
[1] https://docs.docker.com/registry/spec/manifest-v2-2/#image-manifest
Change-Id: I70a7bb5f73d1dddc540e96529784bb8c9bb0b9e3
Docker build can in some cases omit size attributes in the layer
entries in its manifests. This isn't a critical error for podman,
but it does cause it not to draw the little progress arrows. So
let's add them if they're missing too.
Change-Id: Id5b1c5726fbe046b2f9f2994bf34f5fd7ecd90de
This is related to an incompatibility between docker and podman.
"docker build" produces image manifests that look like this:
{'config': {'digest': 'sha256:abc',
            'mediaType': 'application/vnd.docker.container.image.v1+json'},
 'layers': [{'digest': 'sha256:def',
             'mediaType': 'application/vnd.docker.image.rootfs.diff.tar.gzip',
             'size': 5678},
            ...
"podman build" manifests look like this:
{'config': {'digest': 'sha256:abc',
            'mediaType': 'application/vnd.docker.container.image.v1+json',
            'size': 1234},
 'layers': [{'digest': 'sha256:def',
             'mediaType': 'application/vnd.docker.image.rootfs.diff.tar.gzip',
             'size': 5678},
            ...
Podman includes the size attribute for the image config, but Docker
does not. If you "docker build" and "docker push" to a zuul-registry,
and then "podman pull" from it, it will fail because of the missing
size.
One solution to this would be to simply "podman build" instead of
"docker build". That's probably best.
But somehow, when the result of "docker build" is pushed to Docker Hub,
and then that is used with "podman pull", it works. We can surmise that
Docker Hub is correcting the manifest with the missing data in that case.
To maintain compatibility, let's do the same.
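The fix-up can be sketched like this (an illustrative helper, not the
registry's actual function):

```python
def add_missing_config_size(manifest, blob_sizes):
    """Fill in a missing config 'size' the way Docker Hub appears to.

    blob_sizes maps digest -> stored blob length. The manifest is
    modified in place only when the attribute is absent, so manifests
    from podman (which already set it) pass through untouched.
    """
    config = manifest.get("config", {})
    if "size" not in config and config.get("digest") in blob_sizes:
        config["size"] = blob_sizes[config["digest"]]
    return manifest
```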
Also, add a missing content-type header on blob GETs. This doesn't seem
to be important, but it's more correct.
Change-Id: I0db0dbf9775b02438880624bcf98d2b8f4d2575c
According to the docs, these content-length headers should be
present, so include them. We should also return a 202 on the PUT
rather than a 204.
None of that seems to matter, but as long as that's the case, why
not match the spec?
Also, adjust some log lines to match others and add a few more
useful lines.
Change-Id: Ib06d282891984918f369a562d075ad2b8c7349e2
So that the registry may be run behind a TLS terminating proxy,
make the TLS configuration optional.
Change-Id: I234ab825d809fc2d5cd31a192d235902a2fff065
Enable running the registry in a mode where authentication is required
for pulling images. This could be useful in an environment where even
an intermediate or buildset registry should require authentication to
pull images. Or it could make this more useful as a general registry
(that's not a priority use case for this project, but this doesn't add
much complexity).
If a "read" level user is specified, then we assume that anonymous
read access should not be allowed.
Change-Id: I1455a1031590ff0206a4b6da0d8c08093cf0e3cd
This is a follow-on from the previous change and updates the get_blob
method to also handle an http GET ns parameter. The buildx client
sends this parameter when getting blobs as well:
cherrypy.access.139663382723600: ::ffff:172.17.0.1 - - [02/Feb/2021:00:04:09] "GET /v2/upstream/image/blobs/sha256:358792b44b713f905789d54d9733e75c702792522c07f812bf873b5ffc5efe77?ns=docker.io HTTP/1.1" 404 735 "" "containerd/1.4.0+unknown"
As with the manifests we simply accept the parameter and ignore it.
Ianw found more info in containerd's git log:
adeba792f1
Change-Id: Ibcf08c26844bf514a003b9809743540e6d320170
Some image clients (like the one in buildx) send a namespace parameter
when getting manifests. Cherrypy responds with a 404 if it doesn't know
about parameters so we add an ns parameter to get_manifest.
Note that it is possible we need to add this to blob gets as well, but
we'll take this one step at a time.
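The shape of the fix, as a minimal sketch (the store argument stands
in for the real CherryPy handler's storage lookup):

```python
def get_manifest(store, repository, ref, ns=None):
    """Return manifest data, accepting the 'ns' query parameter.

    Routing layers like CherryPy map query parameters to keyword
    arguments and respond 404 on unexpected ones, so the handler must
    name 'ns' even though its value (e.g. 'docker.io') is deliberately
    ignored.
    """
    return store[(repository, ref)]
```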
Change-Id: I1538e07242133a26021ba38eff886b5f8f7402e3