Kubernetes is the de facto runtime for containerised workloads at scale, but its surface area is enormous. These notes focus on the operational slice: the commands and manifest patterns you reach for when deploying, debugging, and maintaining services day-to-day. Theory is kept to a minimum; working YAML and shell commands are kept to a maximum.

kubectl Essentials

# List resources (the most-used command in k8s)
kubectl get pods                              # current namespace
kubectl get pods -n kube-system               # specific namespace
kubectl get pods -A                           # all namespaces
kubectl get pods -o wide                      # extra columns (node, IP)
kubectl get pods -w                           # watch for changes

# Get the common workload resources at once ("all" omits ConfigMaps, Secrets, and Ingresses)
kubectl get all -n myapp

# Output as YAML (great for diffing live state vs. your repo)
kubectl get deployment myapp -o yaml

# JSONPath query, e.g. get the image of the first container
kubectl get pod myapp-abc123 \
  -o jsonpath='{.spec.containers[0].image}'

# Describe gives events + full spec (essential for debugging)
kubectl describe pod myapp-abc123
kubectl describe node worker-1
kubectl describe service myapp-svc

Pod Troubleshooting Workflow

When a pod isn’t behaving, follow this sequence:
1. Check pod status

kubectl get pod myapp-abc123 -o wide
Look at STATUS and RESTARTS. Common problem states:
  • CrashLoopBackOff — the container keeps crashing; check logs
  • OOMKilled — container exceeded its memory limit; check describe
  • Pending — scheduler can’t place the pod; check events in describe
  • ImagePullBackOff — registry credentials or image name issue
2. Describe the pod for events

kubectl describe pod myapp-abc123
Scroll to the Events section at the bottom. This is the fastest way to diagnose scheduling failures, image pull errors, and liveness probe failures.
3. Read the logs

# Current run
kubectl logs myapp-abc123

# If it crashed, read the previous run's logs
kubectl logs myapp-abc123 --previous
4. Exec in if the container is running

kubectl exec -it myapp-abc123 -- sh

# Inside: check environment, DNS, connectivity
env | grep -i db
nslookup postgres-svc
wget -qO- http://localhost:3000/healthz
5. Run a debug pod if exec isn't possible

For distroless or minimal images where sh doesn’t exist:
# Ephemeral debug container (Kubernetes ≥ 1.23)
kubectl debug -it myapp-abc123 \
  --image=busybox:latest \
  --target=myapp

# Or spin up a one-off pod with network access in the same namespace
kubectl run debug --rm -it \
  --image=nicolaka/netshoot \
  --restart=Never \
  -- bash
6. Check node-level issues

kubectl describe node $(kubectl get pod myapp-abc123 -o jsonpath='{.spec.nodeName}')

# Check if the node is under memory/CPU pressure
kubectl top node
kubectl top pod myapp-abc123 --containers
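The first steps of the workflow amount to a lookup from pod status to the next command worth running. A minimal sketch of that lookup follows; the function name `next_step` is hypothetical, and it only prints a suggestion rather than touching the cluster.

```shell
#!/bin/sh
# Hypothetical helper: map a pod STATUS to the next command in the workflow above.
# It prints a suggestion only; nothing is executed against the cluster.
next_step() {
  status=$1
  pod=${2:-POD_NAME}
  case "$status" in
    CrashLoopBackOff)
      # The container keeps crashing, so the previous run's logs are the best lead.
      echo "kubectl logs $pod --previous" ;;
    Running)
      # The container is up, so inspect it from the inside.
      echo "kubectl exec -it $pod -- sh" ;;
    *)
      # OOMKilled, Pending, ImagePullBackOff, and anything unrecognised:
      # the Events section of describe is the fastest starting point.
      echo "kubectl describe pod $pod" ;;
  esac
}
```

For example, `next_step CrashLoopBackOff myapp-abc123` prints the `kubectl logs myapp-abc123 --previous` command from step 3.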

ConfigMaps and Secrets

ConfigMaps store non-sensitive configuration. They can be consumed as environment variables or mounted as files.
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
data:
  APP_ENV: production
  LOG_LEVEL: info
  config.yaml: |
    server:
      port: 3000
      timeout: 30s
    feature_flags:
      new_ui: true
# Consume as env vars in a Deployment
spec:
  containers:
    - name: myapp
      image: myapp:1.0.0
      envFrom:
        - configMapRef:
            name: myapp-config          # all keys become env vars

      # Or select individual keys
      env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: myapp-config
              key: LOG_LEVEL

      # Mount as a file
      volumeMounts:
        - name: config-vol
          mountPath: /app/config
          readOnly: true
  volumes:
    - name: config-vol
      configMap:
        name: myapp-config
        items:
          - key: config.yaml
            path: config.yaml
# Imperative creation (useful for quick tests)
kubectl create configmap myapp-config \
  --from-literal=APP_ENV=production \
  --from-file=config.yaml=./config.yaml

Resource Limits & Requests

Setting resource requests and limits is one of the most impactful things you can do for cluster stability. Without them, a noisy neighbour pod can starve everything else on the same node.
spec:
  containers:
    - name: myapp
      image: myapp:1.0.0
      resources:
        requests:          # guaranteed allocation — used for scheduling decisions
          cpu: "250m"      # 250 millicores = 0.25 vCPU
          memory: "256Mi"
        limits:            # hard cap — container is killed if it exceeds memory limit
          cpu: "1000m"     # 1 vCPU
          memory: "512Mi"
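Requests are what the scheduler reserves on the node; limits are the hard ceiling enforced at runtime. A container that exceeds its CPU limit is throttled, while one that exceeds its memory limit is OOM-killed. Set requests close to real usage to get good bin-packing, and set limits so that one runaway container cannot exhaust the node and trigger OOM kills in neighbouring pods.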
| Scenario | Symptom | Fix |
|---|---|---|
| Memory limit too low | Pod shows OOMKilled in kubectl describe | Increase memory limit or fix a memory leak |
| CPU limit too low | Pod is throttled: slow but not killed | Increase CPU limit or profile the hot path |
| No requests set | Pods scheduled on already-full nodes | Always set requests for production workloads |
| requests > limits | Invalid: Kubernetes rejects the manifest | Ensure limits >= requests |
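The quantity notation is easy to misread when comparing requests against limits by eye. The sketch below normalises CPU to millicores and memory to bytes, then checks that requests do not exceed limits. The function names are hypothetical, and only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled; other valid suffixes are out of scope here.

```shell
#!/bin/sh
# Convert a CPU quantity ("250m", "1", "0.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Convert a memory quantity with a binary suffix to bytes.
# Assumption: anything without Ki/Mi/Gi is already a plain byte count.
mem_to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# Exit 0 when request <= limit for both CPU and memory.
# Arguments: cpu_request cpu_limit mem_request mem_limit
requests_within_limits() {
  [ "$(cpu_to_millicores "$1")" -le "$(cpu_to_millicores "$2")" ] &&
  [ "$(mem_to_bytes "$3")" -le "$(mem_to_bytes "$4")" ]
}
```

With the values from the manifest above, `requests_within_limits 250m 1000m 256Mi 512Mi` succeeds, while swapping in a CPU request of `2` fails because 2000 millicores exceeds the 1000m limit.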

LimitRange (Namespace Defaults)

Avoid the “forgot to set resources” footgun with a LimitRange:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: myapp
spec:
  limits:
    - type: Container
      default:          # applied if no limit is specified
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:   # applied if no request is specified
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "4Gi"

Readiness & Liveness Probes

Probes are the mechanism by which Kubernetes knows whether your pod is healthy and ready to receive traffic.
spec:
  containers:
    - name: myapp
      image: myapp:1.0.0
      ports:
        - containerPort: 3000

      # Readiness: pod receives traffic only when this passes
      readinessProbe:
        httpGet:
          path: /healthz/ready
          port: 3000
        initialDelaySeconds: 5    # wait before first probe
        periodSeconds: 10         # probe every 10 seconds
        failureThreshold: 3       # 3 consecutive failures → not ready

      # Liveness: pod is restarted when this fails repeatedly
      livenessProbe:
        httpGet:
          path: /healthz/live
          port: 3000
        initialDelaySeconds: 15   # give the app time to start
        periodSeconds: 20
        failureThreshold: 3

      # Startup: holds off liveness and readiness checks until it succeeds (slow-start apps)
      startupProbe:
        httpGet:
          path: /healthz/live
          port: 3000
        failureThreshold: 30      # 30 × 10s = 5 minutes to start
        periodSeconds: 10
Use separate endpoints: /healthz/ready, /healthz/live, and optionally /healthz/startup. The readiness endpoint should check downstream dependencies (DB connectivity, cache warmup). The liveness endpoint should check only internal process health and never external dependencies; otherwise a dependency outage will cause a cascade of unnecessary pod restarts.
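The timing comments in the manifest follow one piece of arithmetic: a probe tolerates roughly failureThreshold consecutive failures, one every periodSeconds, before Kubernetes acts. A small sketch of that calculation is below; `probe_budget` is a hypothetical name, and the result is an approximation that ignores initialDelaySeconds and timeoutSeconds.

```shell
#!/bin/sh
# Approximate seconds of consecutive failures a probe tolerates.
# Arguments: failureThreshold periodSeconds
probe_budget() {
  failure_threshold=$1
  period_seconds=$2
  echo $(( failure_threshold * period_seconds ))
}

probe_budget 30 10   # startupProbe above: 300 s, i.e. 5 minutes to start
probe_budget 3 20    # livenessProbe above: about 60 s of failures before a restart
probe_budget 3 10    # readinessProbe above: about 30 s before the pod is marked not ready
```

Running the numbers this way makes it easier to see when a liveness budget is shorter than a realistic pause in the application, which is a common cause of restart loops.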

Rolling Updates

Kubernetes Deployments perform rolling updates by default. These fields control the rollout behaviour:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most 1 extra pod above desired count during rollout
      maxUnavailable: 0     # never go below desired count (zero-downtime)
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Graceful shutdown: give the pod time to finish in-flight requests
      terminationGracePeriodSeconds: 30
      containers:
        - name: myapp
          image: myapp:1.1.0
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # drain before SIGTERM
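To see what the strategy fields in this manifest allow, it helps to compute the pod-count bounds they imply. The sketch below assumes absolute values for maxSurge and maxUnavailable (percentages are not handled), and `rollout_bounds` is a hypothetical name.

```shell
#!/bin/sh
# Pod-count bounds during a rolling update, for absolute maxSurge/maxUnavailable.
# Arguments: replicas maxSurge maxUnavailable
rollout_bounds() {
  replicas=$1
  max_surge=$2
  max_unavailable=$3
  echo "min_available=$(( replicas - max_unavailable )) max_total=$(( replicas + max_surge ))"
}

rollout_bounds 4 1 0   # manifest above: min_available=4 max_total=5
```

With maxUnavailable set to 0 the rollout can only progress by surging, so the cluster needs spare capacity for at least one extra pod.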
# Trigger a rollout by updating the image
kubectl set image deployment/myapp myapp=myapp:1.2.0

# Watch the rollout progress
kubectl rollout status deployment/myapp --timeout=5m

# Pause a rollout mid-way (canary-style)
kubectl rollout pause deployment/myapp

# Resume
kubectl rollout resume deployment/myapp

# Undo the last rollout
kubectl rollout undo deployment/myapp

# Undo to a specific revision
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp --to-revision=2
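The commands above can be chained so that a failed rollout is undone automatically. This is a sketch rather than a complete deployment script: `deploy_or_undo` is a hypothetical name, and it assumes the container is named after the Deployment, as in the manifest above.

```shell
#!/bin/sh
# Roll out a new image; undo if the rollout does not complete in time.
# Arguments: deployment-name image
# Assumption: the container name equals the deployment name.
deploy_or_undo() {
  name=$1
  image=$2
  kubectl set image "deployment/$name" "$name=$image" || return 1
  # rollout status exits non-zero if the rollout fails or the timeout expires.
  if ! kubectl rollout status "deployment/$name" --timeout=5m; then
    kubectl rollout undo "deployment/$name"
    return 1
  fi
}
```

A call such as `deploy_or_undo myapp myapp:1.2.0` returns non-zero after an undo, so a CI job wrapping it still fails visibly.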

Useful One-Liners

# Get all pods whose phase is not Running (also lists completed Job pods)
kubectl get pods -A --field-selector='status.phase!=Running'

# Force-delete a stuck terminating pod
kubectl delete pod myapp-abc123 --force --grace-period=0

# Scale a deployment
kubectl scale deployment myapp --replicas=6

# Restart all pods in a deployment (zero-downtime rolling restart)
kubectl rollout restart deployment/myapp

# Copy a file from a pod to localhost
kubectl cp myapp-abc123:/app/logs/app.log ./app.log

# Get resource usage for all pods, sorted by memory
kubectl top pod -A --sort-by=memory

# List all images running in the cluster (tr splits multi-container pods onto separate lines)
kubectl get pods -A \
  -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' \
  | tr ' ' '\n' | sort -u

# Find pods with a specific label
kubectl get pods -l app=myapp,env=production

# Add a label to a pod (temporary — use manifests for permanent changes)
kubectl label pod myapp-abc123 debug=true

# Taint a node to prevent new scheduling
kubectl taint nodes worker-3 maintenance=true:NoSchedule

# Remove the taint
kubectl taint nodes worker-3 maintenance=true:NoSchedule-
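The field-selector one-liner for non-running pods filters on the pod phase, which typically stays Running for a pod in CrashLoopBackOff. A complementary sketch filters on the STATUS column instead. `unhealthy_pods` is a hypothetical name, and it assumes the column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status, ...).

```shell
#!/bin/sh
# Print "namespace/name status" for pods whose STATUS column is neither
# Running nor Completed. Expects `kubectl get pods -A --no-headers` on stdin.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage (requires cluster access):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

This does not catch pods that are Running but not ready; for those, compare the READY column or check the readiness probe events.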

Docker Essentials

Build the container images that Kubernetes runs — Dockerfiles, multi-stage builds, and Compose.

GitLab CI/CD

Automate kubectl apply and Helm deployments from a GitLab pipeline.

Cloud & Terraform

Provision EKS, GKE, or AKS clusters and the supporting infrastructure with Terraform.

Linux Troubleshooting

Many pod-level issues trace back to OS-level networking, DNS, or filesystem problems.
Last modified on June 9, 2026