There are tasks that need to run in your cluster on an ad hoc basis. These tasks tend to be short-lived; an example might be updating the schema in a database.
Kubernetes offers two pathways to achieve this, very similar to one another but differing in subtle ways:
Jobs, and init containers.
Both pathways can execute the same task and, as will be seen further down, they have very similar definitions in their manifests.
Jobs
A Kubernetes Job will execute a task as soon as it is loaded into the cluster.
This is the manifest for a container that I have defined elsewhere (basically, a container with Goose and netcat in it, whose image is pushed to my container repository), and this Job ‘runs’ it with the commands given below.
The key things to note are that it’s a Job, it has a name, a container image that it will run, some environment variables to be set on that container, and a set of commands that will be run in that container.
apiVersion: batch/v1
kind: Job
metadata:
  name: accounts-schema-init
  namespace: "$NAMESPACE"
spec:
  template:
    metadata:
      name: accounts-schema
    spec:
      containers:
      - name: accounts-schema
        image: "onepage/accounts-schema:$TAG"
        imagePullPolicy: "$IMAGEPULLPOLICY"
        env:
        - name: PGUSER
          valueFrom:
            secretKeyRef:
              name: account-secret
              key: username
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: account-secret
              key: password
        - name: PGDBNAME
          value: "$DBNAME"
        - name: PGHOST
          value: "$HOST"
        command:
        - /bin/sh
        - -c
        - |
          while ! nc -z "$PGHOST" "5432"; do
            echo "Waiting for $PGHOST:5432 to become available..."
            sleep 5
          done
          goose -v -dir=migrations postgres "user=$PGUSER dbname=$PGDBNAME password=$PGPASSWORD host=$PGHOST sslmode=disable" up;
      restartPolicy: Never
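Loading the Job is then a single kubectl apply. Here is a minimal sketch of that, assuming the placeholders ($NAMESPACE, $TAG, and so on) are filled in with envsubst and the manifest is saved as accounts-schema-job.yaml (the file name and the use of envsubst are assumptions for illustration, not part of my actual setup):

# Fill in the placeholder variables and load the Job into the cluster
# (envsubst and the file name are assumed here for illustration).
envsubst < accounts-schema-job.yaml | kubectl apply -f -

# Watch the Job until it completes, then read the migration output.
kubectl -n "$NAMESPACE" get job accounts-schema-init --watch
kubectl -n "$NAMESPACE" logs job/accounts-schema-init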
Init containers
A bit of background first. A Pod has one or more containers that run in it. However, before those containers are loaded and run, an init container can run on that Pod (and it will exit before the main container(s) run).
This means that when the Pod is deployed, the init container runs in each Pod before the main container(s) do, and we can run something very similar to the Job above in that init container.
Note the manifest below. The key things are: it’s a Deployment, with a name, a container image that it will work with for the init container as well as for the main container, and a sidecar (the linkerd proxy injected via the annotation; I know that it muddies things a bit to have it in the manifest, but it demonstrates that the containers in the pod each have different purposes).
The initContainers specification is almost identical to the Job above.
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: "$NAMESPACE"
  name: accountserver
  labels:
    app: account
spec: # this is the specification of the deployment
  replicas: $REPLICAS
  selector:
    matchLabels:
      app: account
  template:
    metadata:
      annotations: # inject linkerd into this pod.
        linkerd.io/inject: enabled
      labels:
        app: account
    spec: # this is the specification of the pods.
      initContainers: # container that will run when this pod is initialised.
      - name: accounts-schema
        image: "onepage/accounts-schema:$TAG"
        imagePullPolicy: "$IMAGEPULLPOLICY"
        env:
        - name: PGUSER
          valueFrom:
            secretKeyRef:
              name: account-secret
              key: username
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: account-secret
              key: password
        - name: PGDBNAME
          value: "$DBNAME"
        - name: PGHOST
          value: "$HOST"
        command:
        - /bin/sh
        - -c
        - |
          while ! nc -z "$PGHOST" "5432"; do
            echo "Waiting for $PGHOST:5432 to become available..."
            sleep 5
          done
          goose -v -dir=migrations postgres "user=$PGUSER dbname=$PGDBNAME password=$PGPASSWORD host=$PGHOST sslmode=disable" up;
      containers:
      - name: account
        image: onepage/accounts
        imagePullPolicy: "$IMAGEPULLPOLICY"
        ports:
        - containerPort: $TARGETPORT
        env:
        - name: PORT_NUM
          value: "$TARGETPORT"
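To watch the init container do its work during a rollout, kubectl exposes it directly. A small sketch, assuming the Deployment above has already been applied (the app=account label comes from the manifest):

# Pods report an Init:0/1 status while the accounts-schema init container runs.
kubectl -n "$NAMESPACE" get pods -l app=account --watch

# Logs from the init container itself; -c selects the container by name.
kubectl -n "$NAMESPACE" logs deploy/accountserver -c accounts-schema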
When to use which.
If they’re doing the same thing, why use one or the other? Or: when should one approach be used over the other?
In this case, migrating the schema of a database, we would use initContainers, because it makes the coupling of the schema to a version of the application container explicit, and it ensures that when a new version of the container is rolled out, the schema will be in a prescribed state before that new container begins its work.
However, there are times when out-of-band updates are required (e.g. “Oh god, I rolled out a change to the cluster that needs to be undone without rolling out a new deployment” - nobody has ever encountered that… have they? :). At that point a Job is appropriate: pushing it into the cluster has an immediate effect.
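One practical note if you go down the Job route: a completed Job can’t simply be re-run in place, so repeating the same out-of-band migration means removing the old Job first. A minimal sketch, reusing the assumed accounts-schema-job.yaml from earlier:

# Delete the finished Job (if any), then push it into the cluster again.
kubectl -n "$NAMESPACE" delete job accounts-schema-init --ignore-not-found
envsubst < accounts-schema-job.yaml | kubectl apply -f -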