Deployment Issues

Issues related to the container runtime and infrastructure — pods not starting, images not pulling, or containers failing to run.

Prerequisites

Troubleshooting commands require administrative (sudo) privileges and kubectl access to the cluster (for Kubernetes deployments) or Docker CLI access (for Docker deployments).

Kubernetes

These entries cover Kubernetes-specific deployment failures, including pod scheduling, image pulls, and PVC provisioning.

Pods Not Reaching Running State

Symptom

kubectl get pods -n <namespace> shows one or more pods in Pending, CrashLoopBackOff, Error, or Init:0/1 state.

Cause

Pods can fail to start for several reasons:

Insufficient CPU, memory, or storage on cluster nodes
Misconfigured values in .values.yaml (e.g., invalid passwords, missing required fields)
Persistent Volume Claims (PVCs) cannot be fulfilled by the configured storage class
Database initialization failure due to incorrect credentials

Diagnostic steps

List all pods and identify the failing one:
```
kubectl get pods -n <namespace>
```

Inspect pod events for scheduling or resource errors:

```
kubectl describe pod <pod-name> -n <namespace>
```

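Check container logs for application-level errors. For a pod in CrashLoopBackOff, append --previous to read the logs of the last terminated container: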
```
kubectl logs <pod-name> -n <namespace>
```

For pods with multiple containers, specify the container name:

```
kubectl logs <pod-name> -c <container-name> -n <namespace>
```

Review PVC status:
```
kubectl get pvc -n <namespace>
```
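
When a namespace holds many pods, the first diagnostic step can be narrowed with a small filter. The sketch below assumes the default kubectl get pods table layout (NAME, READY, and STATUS as the first three columns). failing_pods is a hypothetical helper name, not part of kubectl.

```shell
# failing_pods: read `kubectl get pods` table output on stdin and print
# "NAME STATUS" for every pod that is not fully up.
# Hypothetical helper; assumes the default column order NAME READY STATUS.
failing_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY looks like "1/1"
    healthy = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
    if (!healthy) print $1, $3
  }'
}

# Usage, with kubectl configured for the cluster:
#   kubectl get pods -n <namespace> | failing_pods
```

A pod that reports Running but has fewer ready containers than expected (for example 1/2) is listed as well, since it usually needs the same describe and logs follow-up.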

Resolution

Pending pods due to insufficient resources — scale the cluster or reduce resource requests in .values.yaml.
Pending PVCs — verify that storageClasses.standardClass and storageClasses.databaseClass match available storage classes in your cluster (kubectl get sc).
CrashLoopBackOff pods — check logs for the specific error. Common fixes include correcting database passwords or admin credentials in .values.yaml, then reapplying:

```
helm upgrade excalibur-v4 xclbr/excalibur -f .values.yaml --namespace <namespace>
```

After applying the fix, verify all pods are running:

```
kubectl get pods -n <namespace>
```

All pods should show Running status with all containers ready (e.g., 1/1).
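
For the Pending PVC case, the comparison against kubectl get sc can be scripted. This is a minimal sketch, assuming the -o name output format (one storageclass.storage.k8s.io/<name> entry per line). sc_exists is a hypothetical helper, and the class name in the usage comment is a placeholder.

```shell
# sc_exists NAME: succeed if storage class NAME is listed on stdin.
# Expects `kubectl get sc -o name` output, one
# "storageclass.storage.k8s.io/<name>" entry per line.
sc_exists() {
  grep -qxF "storageclass.storage.k8s.io/$1"
}

# Usage, substituting each value set under storageClasses in .values.yaml:
#   kubectl get sc -o name | sc_exists <class-name> || echo "storage class missing"
```

The -x and -F flags make grep match the whole line literally, so a partial name such as "stand" does not pass for "standard".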

Image Pull Errors

Symptom

Pods are stuck in ImagePullBackOff or ErrImagePull state. Events show Failed to pull image messages.

Cause

Image pull failures occur when the registry credentials are invalid or the registry is unreachable:

Invalid or expired GitHub Personal Access Token (PAT)
Image pull secret not created or not associated with the service account
Network restrictions blocking access to ghcr.io
Incorrect image tag or version specified

Diagnostic steps

Check pod events for image pull details:

```
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events"
```

Verify the image pull secret exists:

```
kubectl get secrets -n <namespace> | grep regcred
```

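Test registry connectivity from a cluster node. Run the command on the node itself, because a workstation may be subject to different firewall or proxy rules: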
```
curl -v https://ghcr.io
```
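
The causes listed above tend to leave different traces in the Failed to pull image event. The sketch below maps a few commonly seen message fragments to those causes. The fragments are assumptions (the exact wording varies by container runtime and registry), and classify_pull_error is a hypothetical helper.

```shell
# classify_pull_error MESSAGE: print a rough cause category for an
# image pull error message. The matched fragments are illustrative
# assumptions, not an exhaustive or authoritative list.
classify_pull_error() {
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *unauthorized*|*denied*|*"authentication required"*) echo "credentials" ;;
    *"manifest unknown"*|*"not found"*)                  echo "image-tag" ;;
    *timeout*|*"no such host"*|*"connection refused"*)   echo "network" ;;
    *)                                                   echo "unknown" ;;
  esac
}

# Usage: paste the message text from the pod events, for example
#   classify_pull_error "<message from kubectl describe pod>"
```

A result of "credentials" points at the PAT or pull secret, "image-tag" at the configured image version, and "network" at connectivity to ghcr.io.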

Resolution

Expired or invalid PAT — verify your GitHub PAT has the read:packages scope and has not expired. Update the token in .values.yaml and reapply:

```
helm upgrade excalibur-v4 xclbr/excalibur -f .values.yaml --namespace <namespace>
```

Network restrictions — allow outbound HTTPS traffic to ghcr.io:443 in your firewall or proxy configuration.

After applying the fix, verify pods transition to Running:

```
kubectl get pods -n <namespace>
```
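
If pods still fail to pull after the upgrade, it can help to confirm that the pull secret actually references ghcr.io. This is a minimal sketch, assuming the secret is named regcred (as in the diagnostic step above) and is of type kubernetes.io/dockerconfigjson. secret_has_registry is a hypothetical helper.

```shell
# secret_has_registry HOST: read a base64-encoded .dockerconfigjson payload
# on stdin and succeed if HOST appears as a quoted string in the decoded JSON.
secret_has_registry() {
  base64 -d | grep -qF "\"$1\""
}

# Usage (assumption: the secret is named regcred and is a dockerconfigjson secret):
#   kubectl get secret regcred -n <namespace> \
#     -o jsonpath='{.data.\.dockerconfigjson}' | secret_has_registry ghcr.io \
#     || echo "regcred does not reference ghcr.io"
```

This only checks that the registry host is present. It does not validate the token itself.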

Docker

These entries cover Docker-specific deployment failures, including container startup, registry authentication, and host resource issues.

Docker Containers Not Starting

Symptom

After running docker compose up --detach, docker ps -a shows containers in the Exited or Restarting state, or expected containers are missing from the list entirely.

Cause

Container startup failures are typically caused by invalid configuration or host-level resource constraints:

Invalid credentials in the .env file
Docker login to ghcr.io failed or the session expired
Port conflicts with services already running on the host
Insufficient disk space or memory on the host

Diagnostic steps

Check the status of all containers (including stopped ones):
```
docker ps -a
```

View container logs for the failing service:
```
docker logs <container-name>
```

Verify Docker registry authentication:

```
docker login --username excalibur-enterprise --password <provided-token> ghcr.io
```

Check available disk space and memory:
```
df -h && free -h
```
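
The disk check can be turned into a pass/fail test. The sketch below reads POSIX-format df -P output and reports mount points at or above a usage threshold. The 90 percent threshold is an arbitrary illustration, and disk_over_threshold is a hypothetical helper.

```shell
# disk_over_threshold LIMIT: read `df -P` output on stdin and print
# "MOUNTPOINT CAPACITY" for each filesystem at or above LIMIT percent.
# Assumes mount points without spaces (awk splits on whitespace).
disk_over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5; sub(/%/, "", use)   # Capacity column, e.g. "95%"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage:
#   df -P | disk_over_threshold 90
```

Empty output means no filesystem has reached the threshold.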

Resolution

Expired authentication — re-run docker login with a valid token.
Invalid .env values — fix the values and re-create the containers:

```
docker compose --env-file .env --file <filename>.yml up --detach
```

Port conflicts — stop the conflicting service or remap ports in the compose file.
Insufficient disk space — free up space or mount additional storage.

Warning

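Running docker compose down removes the containers and their networks, and any data written inside a container's own filesystem is lost with them. Named volumes survive unless you add --volumes. Use this command only if you intend to re-create the deployment from scratch, or first confirm that volumes are configured for data persistence.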

After applying the fix, verify all containers are running:

```
docker ps
```

All containers should show an Up status, and any container that defines a health check should report (healthy).
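
This final check can also be scripted. The sketch below assumes tab-separated name and status pairs, as produced by docker ps -a with a custom --format template. unhealthy_containers is a hypothetical helper.

```shell
# unhealthy_containers: read "NAME<TAB>STATUS" lines on stdin and print
# every container whose status does not start with "Up" or that reports
# "(unhealthy)".
unhealthy_containers() {
  awk -F '\t' '$2 !~ /^Up/ || $2 ~ /\(unhealthy\)/ { print $1 ": " $2 }'
}

# Usage (assumes Docker expands \t in the format string to a tab):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | unhealthy_containers
```

Empty output means every container is up and none reports an unhealthy status. Containers still in the "health: starting" phase are not flagged, so re-run the check after the start period has passed.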