Deployment Issues
Issues related to the container runtime and infrastructure — pods not starting, images not pulling, or containers failing to run.
Prerequisites
Troubleshooting commands require administrative (sudo) privileges and kubectl access to the cluster (for Kubernetes deployments) or Docker CLI access (for Docker deployments).
Kubernetes
These entries cover Kubernetes-specific deployment failures, including pod scheduling, image pulls, and PVC provisioning.
Pods Not Reaching Running State
Symptom
kubectl get pods -n <namespace> shows one or more pods in Pending, CrashLoopBackOff, Error, or Init:0/1 state.
Cause
Pods can fail to start for several reasons:
- Insufficient CPU, memory, or storage on cluster nodes
- Misconfigured values in .values.yaml (e.g., invalid passwords, missing required fields)
- Persistent Volume Claims (PVCs) that the configured storage class cannot fulfill
- Database initialization failure due to incorrect credentials
Diagnostic steps
- List all pods and identify the failing one:
  kubectl get pods -n <namespace>
- Inspect pod events for scheduling or resource errors:
  kubectl describe pod <pod-name> -n <namespace>
- Check container logs for application-level errors:
  kubectl logs <pod-name> -n <namespace>
- For pods with multiple containers, specify the container name:
  kubectl logs <pod-name> -c <container-name> -n <namespace>
- Review PVC status:
  kubectl get pvc -n <namespace>
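When several pods are failing at once, the diagnostic steps above can be chained into a small helper. This is a sketch under a few assumptions: a bash-compatible shell, and the default column layout of `kubectl get pods --no-headers` (NAME READY STATUS RESTARTS AGE). The function names `unready_pods` and `diagnose_unready` are hypothetical and not part of the product. Pods in the Completed state, such as finished Jobs, are deliberately skipped.

```shell
# Hypothetical helpers; paste into a bash-compatible shell or source from a file.

# Reads `kubectl get pods --no-headers` output on stdin and prints the name of
# every pod that is not Running with all of its containers ready.
unready_pods() {
  awk '{
    if ($3 == "Completed") next       # finished Job pods are expected to be 0/1
    split($2, ready, "/")              # READY column, e.g. "1/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Runs the describe and logs steps for every unready pod in a namespace.
# Usage: diagnose_unready <namespace>
diagnose_unready() {
  local ns="$1"
  kubectl get pods -n "$ns" --no-headers | unready_pods | while read -r pod; do
    echo "=== $pod ==="
    # Print only the Events section, where scheduling, image pull,
    # and volume errors are reported
    kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'
    # Prefer logs from the previous (crashed) instance, which matter for
    # CrashLoopBackOff; fall back to current logs if there is no previous one
    kubectl logs "$pod" -n "$ns" --previous --all-containers=true 2>/dev/null \
      || kubectl logs "$pod" -n "$ns" --all-containers=true
  done
}
```

For example, `diagnose_unready <namespace>` prints the events and logs for each pod that `kubectl get pods` reports as not ready.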
Resolution
- Pending pods due to insufficient resources: scale the cluster or reduce the resource requests in .values.yaml.
- Pending PVCs: verify that storageClasses.standardClass and storageClasses.databaseClass match storage classes available in your cluster (kubectl get sc).
- CrashLoopBackOff pods: check the logs for the specific error. Common fixes include correcting the database passwords or admin credentials in .values.yaml.
After changing .values.yaml, reapply the chart:
helm upgrade excalibur-v4 xclbr/excalibur -f .values.yaml --namespace <namespace>
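For the Pending PVC case, you can compare the storage class names configured in .values.yaml with the classes the cluster offers before reapplying. The sketch below assumes you pass the values of storageClasses.standardClass and storageClasses.databaseClass as arguments by hand. `missing_storage_classes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get sc -o name` output on stdin
# (lines such as "storageclass.storage.k8s.io/standard") and prints every
# requested class name that the cluster does not offer.
# Returns 1 if any requested class is missing, 0 otherwise.
missing_storage_classes() {
  local available class status=0
  # Strip the resource prefix so only bare class names remain
  available=$(sed 's|^storageclass\.storage\.k8s\.io/||')
  for class in "$@"; do
    if ! printf '%s\n' "$available" | grep -qxF "$class"; then
      echo "missing storage class: $class"
      status=1
    fi
  done
  return "$status"
}
```

Example invocation: `kubectl get sc -o name | missing_storage_classes <standardClass> <databaseClass>`, where the two arguments are the values copied from .values.yaml.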
After applying the fix, verify all pods are running:
kubectl get pods -n <namespace>
All pods should show Running status with all containers ready (e.g., 1/1).
Image Pull Errors
Symptom
Pods are stuck in ImagePullBackOff or ErrImagePull state. Events show Failed to pull image messages.
Cause
Image pull failures occur when the registry credentials are invalid or the registry is unreachable:
- Invalid or expired GitHub Personal Access Token (PAT)
- Image pull secret not created or not associated with the service account
- Network restrictions blocking access to ghcr.io
- Incorrect image tag or version specified
Diagnostic steps
- Check pod events for image pull details:
  kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events"
- Verify that the image pull secret exists:
  kubectl get secrets -n <namespace> | grep regcred
- Test registry connectivity from a cluster node. Each node's container runtime performs the image pulls, so run this test on a node rather than on your workstation:
  curl -v https://ghcr.io
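If regcred exists but pulls still fail, the secret may hold credentials for a different registry. The sketch below decodes the secret's .dockerconfigjson payload and looks for a ghcr.io entry. It assumes the secret is named regcred (as in the grep above) and has type kubernetes.io/dockerconfigjson. `secret_targets_registry` and `check_regcred` are hypothetical helper names. The check is a plain substring match rather than a JSON parse, which is safe for the registry host because base64-encoded auth values never contain a dot.

```shell
# Hypothetical helper: reads a decoded .dockerconfigjson document on stdin and
# succeeds (exit 0) if the given registry host appears in it.
# Plain substring match, not a full JSON parse.
secret_targets_registry() {
  grep -qF "$1"
}

# Decodes the regcred secret in a namespace and reports whether it has
# credentials for ghcr.io.
# Usage: check_regcred <namespace>
check_regcred() {
  kubectl get secret regcred -n "$1" -o jsonpath='{.data.\.dockerconfigjson}' \
    | base64 --decode \
    | secret_targets_registry ghcr.io \
    && echo "regcred has credentials for ghcr.io" \
    || echo "regcred has no ghcr.io entry (or could not be read)"
}
```

To check whether the secret is associated with a service account, `kubectl get serviceaccount default -n <namespace> -o jsonpath='{.imagePullSecrets[*].name}'` lists the pull secrets attached to the default service account. Substitute the service account your pods actually use.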
Resolution
- Expired or invalid PAT: verify that your GitHub PAT has the read:packages scope and has not expired. Update the token in .values.yaml and reapply the chart:
  helm upgrade excalibur-v4 xclbr/excalibur -f .values.yaml --namespace <namespace>
- Network restrictions: allow outbound HTTPS traffic to ghcr.io:443 in your firewall or proxy configuration.
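You can verify a classic GitHub PAT against the GitHub REST API before putting it back into .values.yaml. The API lists a classic token's scopes in the X-OAuth-Scopes response header and returns 401 for an expired or revoked token. This is a minimal sketch: `has_scope`, `check_pat`, and the GITHUB_PAT environment variable are hypothetical names, and fine-grained tokens do not return this header.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds if the
# X-OAuth-Scopes header lists the given scope exactly (e.g. read:packages).
has_scope() {
  tr -d '\r' \
    | grep -i '^x-oauth-scopes:' \
    | sed 's/^[^:]*://' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | grep -qxF "$1"
}

# Queries the GitHub API with the token in $GITHUB_PAT (an assumed variable
# name) and reports whether it is usable for pulling packages.
check_pat() {
  curl -sS -I -H "Authorization: Bearer $GITHUB_PAT" https://api.github.com/user \
    | has_scope read:packages \
    && echo "token is valid and has read:packages" \
    || echo "token is invalid, expired, or missing read:packages"
}
```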
After applying the fix, verify pods transition to Running:
kubectl get pods -n <namespace>
Docker
These entries cover Docker-specific deployment failures, including container startup, registry authentication, and host resource issues.
Docker Containers Not Starting
Symptom
After running docker compose up --detach, docker ps -a shows containers in the Exited or Restarting state, or expected containers are missing entirely.
Cause
Container startup failures are typically caused by invalid configuration or host-level resource constraints:
- Invalid credentials in the .env file
- Docker login to ghcr.io failed or the session expired
- Port conflicts with services already running on the host
- Insufficient disk space or memory on the host
Diagnostic steps
- Check the status of all containers, including stopped ones:
  docker ps -a
- View the logs of the failing service:
  docker logs <container-name>
- Verify Docker registry authentication:
  docker login --username excalibur-enterprise --password <provided-token> ghcr.io
- Check available disk space and memory:
  df -h && free -h
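Two of the host-level causes, port conflicts and a full disk, can be checked with small filters over standard tool output. This sketch assumes a Linux host with iproute2 (ss) and POSIX df. `port_in_use` and `full_filesystems` are hypothetical helper names, and the port and threshold in the usage note are examples, not values taken from this deployment.

```shell
# Hypothetical helper: reads `ss -ltnH` output (listening TCP sockets, no header)
# on stdin and succeeds if any socket is bound to the given port.
port_in_use() {
  # Column 4 is Local Address:Port, e.g. 0.0.0.0:443, [::]:443, *:443
  awk -v port="$1" '{ n = split($4, a, ":"); if (a[n] == port) found = 1 }
                    END { exit !found }'
}

# Hypothetical helper: reads `df -P` output on stdin and prints the mount point
# of every filesystem whose usage is at or above the given percentage.
full_filesystems() {
  awk -v limit="$1" 'NR > 1 {
    used = $5; sub(/%/, "", used)     # Capacity column, e.g. "93%"
    if (used + 0 >= limit + 0) print $6
  }'
}
```

Example usage: `ss -ltnH | port_in_use 8080 && echo "port 8080 is already in use"` checks one published port from the compose file. `df -P | full_filesystems 90` lists filesystems that are at least 90% full. Running `sudo ss -ltnp` additionally shows which process owns a conflicting port.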
Resolution
- Expired authentication: re-run docker login with a valid token.
- Invalid .env values: fix the values and re-create the containers:
  docker compose --env-file .env --file <filename>.yml up --detach
- Port conflicts: stop the conflicting service or remap the ports in the compose file.
- Insufficient disk space: free up space or mount additional storage.
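When you re-run docker login, you can pass the token on standard input with --password-stdin instead of --password. The Docker CLI warns that --password is insecure because the token ends up in shell history and the process list. This is a minimal sketch that assumes the token is stored in an environment variable named GHCR_TOKEN. Both that variable and `ghcr_login` are hypothetical names.

```shell
# Hypothetical helper: logs in to ghcr.io with the token from $GHCR_TOKEN.
# printf is a shell builtin, so the token is never passed as a command-line
# argument to a separate process.
ghcr_login() {
  if [ -z "${GHCR_TOKEN:-}" ]; then
    echo "GHCR_TOKEN is not set" >&2
    return 1
  fi
  printf '%s\n' "$GHCR_TOKEN" \
    | docker login ghcr.io --username excalibur-enterprise --password-stdin
}
```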
Warning
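Running docker compose down stops and removes the containers and networks created by docker compose up. Any data written inside a container rather than to a volume is lost, and adding --volumes (-v) also deletes the named volumes. Use this command only if you intend to re-create the deployment from scratch, and make sure volumes are configured for data persistence.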
After applying the fix, verify all containers are running:
docker ps
All containers should show an Up status, marked (healthy) for services that define a health check.
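This final check can be scripted by listing every container's status and flagging anything that is not Up, or that is Up but unhealthy or still starting. The sketch assumes Docker's usual status strings, such as "Up 5 minutes (healthy)" or "Exited (1) 2 minutes ago". `not_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper: reads tab-separated "name<TAB>status" lines, as printed by
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# and prints the name of each container that is not running and healthy.
# Containers without a health check count as healthy when they are Up.
not_healthy() {
  awk -F '\t' '$2 !~ /^Up/ || $2 ~ /\((unhealthy|health: starting)\)/ { print $1 }'
}
```

Example invocation: `docker ps -a --format '{{.Names}}\t{{.Status}}' | not_healthy`. An empty result means every container is up and, where a health check exists, healthy. Docker's built-in `docker ps --filter health=unhealthy` covers only the unhealthy case.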