Troubleshooting¶
This section provides instructions on troubleshooting common issues that may arise during the deployment of workload applications in a Kubernetes cluster, protected with Intel TDX and verified using attestation.
If the guide below does not resolve your issue, refer to the Confidential Containers Troubleshooting Guide for more information.
Pods Failed to Start¶
This section provides guidance on how to resolve the issue of pods failing to start because of a missing parent snapshot. Such a problem might occur when containerd's plugin (Nydus Snapshotter) fails to clean up images correctly.
To see if your pod is affected by this issue, run the following command:
kubectl describe pod <pod name>
Such issues with containerd's plugin (Nydus Snapshotter) are indicated by error messages similar to the following:
failed to create containerd container: create snapshot: missing parent \"k8s.io/2/sha256:961e...\" bucket: not found
failed to create containerd container: error unpacking image: failed to extract layer sha256:<hash1>: failed to get reader from content store: content digest sha256:<hash2>: not found
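If you want to confirm that containerd is configured to use the Nydus Snapshotter before cleaning anything up, a minimal optional check (assuming the default containerd configuration path used later in this procedure) is:

# Show the Nydus Snapshotter proxy plugin entry, if configured
sudo grep -i nydus /etc/containerd/config.toml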
To resolve the issue, try the following procedure:
- Remove your pod:

  kubectl delete pod <pod name>

- Clear the Kubernetes images cache:

  # Remove the cache for the image causing problems
  sudo crictl rmi <image name with tag>
  # Remove all unused cached images
  sudo crictl rmi --prune

- Remove all data collected by containerd's plugin (Nydus Snapshotter):

  sudo ctr -n k8s.io images rm $(sudo ctr -n k8s.io images ls -q)
  sudo ctr -n k8s.io content rm $(sudo ctr -n k8s.io content ls -q)
  sudo ctr -n k8s.io snapshots rm $(sudo ctr -n k8s.io snapshots --snapshotter nydus ls | awk 'NR>1 {print $1}')

- Disable optimized disk usage in containerd:

  sudo sed -i 's/discard_unpacked_layers = true/discard_unpacked_layers = false/' /etc/containerd/config.toml
  sudo grep discard_unpacked_layers /etc/containerd/config.toml
  sudo systemctl restart containerd

- Re-deploy the Confidential Containers runtime classes, using the commands from the Install Confidential Containers Operator instructions:

  kubectl delete -k "github.com/confidential-containers/operator/config/samples/ccruntime/default?ref=$OPERATOR_RELEASE_VERSION"
  kubectl apply -k "github.com/confidential-containers/operator/config/samples/ccruntime/default?ref=$OPERATOR_RELEASE_VERSION"

- Re-deploy your pod:

  kubectl apply -f <pod yaml>
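After re-deploying, you can verify that the problem is gone; a minimal check using the commands already shown above:

# The pod should eventually reach the Running state
kubectl get pod <pod name>
# The events should no longer report a missing parent snapshot
kubectl describe pod <pod name>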
Attestation Failure¶
This section pinpoints the most common reasons for attestation failure and provides guidance on how to resolve them.
An attestation failure is indicated by the pod being in the Init:Error state and an ATTESTATION FAILED message being present in the pod's logs.
Note
The example outputs presented below might differ from your output because of different pod/deployment names or different IP addresses.
To check whether you have encountered an attestation failure, follow the steps below:
- Retrieve the status of the nginx-td-attestation pod:

  kubectl get pods

  Sample output with the nginx-td-attestation pod in the Init:Error state:

  NAME                   READY   STATUS       RESTARTS   AGE
  nginx-td-attestation   0/1     Init:Error   0          1m

- Get the logs of the init-attestation container in the nginx-td-attestation pod:

  kubectl logs pod/nginx-td-attestation -c init-attestation

  Sample output indicating the ATTESTATION FAILED message:

  starting
  (...)
  ATTESTATION FAILED
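Besides the init container logs, the pod events may also hint at the failure reason; an optional check using the same pod name as in the sample output above:

kubectl describe pod nginx-td-attestation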
In case of attestation failure, follow the steps below to troubleshoot the issue:
- If you have configured KBS with Intel® Trust Authority, check that the API key is correct and that KBS was deployed with this value:

  kubectl describe configmap kbs-config -n coco-tenant | grep -i api_key

  Expected output:

  api_key = "<YOUR_ITA_API_KEY>"
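  If the value is wrong, one way to correct it is to update the ConfigMap and restart KBS so that it picks up the new value; a sketch, assuming the kbs-config ConfigMap and the kbs Deployment used elsewhere in this section:

  # Edit the api_key value in place
  kubectl edit configmap kbs-config -n coco-tenant
  # Restart KBS so that it reloads the configuration
  kubectl rollout restart deployment/kbs -n coco-tenant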
- Check that the KBS pod is running and accessible. The following command prints the address on which KBS is exposed:

  echo $(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}'):$(kubectl get svc kbs -n coco-tenant -o jsonpath='{.spec.ports[0].nodePort}')

  Expected output:

  <address>:<port>
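  To confirm that KBS actually answers on that address, a simple reachability probe can be used (a sketch; the exact response depends on your KBS configuration and protocol):

  # Replace <address>:<port> with the value printed by the previous command
  curl -v http://<address>:<port>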
- Check KBS logs for any errors:

  kubectl logs deploy/kbs -n coco-tenant

  An HTTP 400 Bad Request error might suggest that the platform is not registered correctly. Refer to the platform registration section of the Intel TDX Enabling Guide for details.
- Check for errors in the Intel PCCS service:

  systemctl status pccs

  Use the following command to get more logs:

  sudo journalctl -u pccs
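  If the service is running but quote generation still fails, you can probe PCCS directly; a sketch assuming the default PCCS HTTPS port (8081) and a self-signed certificate:

  # A successful response indicates that PCCS is reachable and can serve collateral
  curl -k https://localhost:8081/sgx/certification/v4/rootcacrl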
- Check for errors in the Intel TDX Quote Generation Service:

  systemctl status qgsd

  Use the following command to get more logs:

  sudo journalctl -u qgsd

  The following error occurs if the platform is not registered correctly:

  [QPL] No certificate data for this platform.

  Refer to the platform registration section of the Intel TDX Enabling Guide for details.
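After completing platform registration, the quote generation stack may need to be restarted before retrying attestation; a minimal sketch using the services named above:

sudo systemctl restart pccs
sudo systemctl restart qgsd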