What is a stateful application? Any application that stores data to keep track of its state. A database, for example.
Stateless applications, on the other hand, do not store their state. Each request they handle is completely independent of every request they have handled before.
An example is a business-logic service that passes requests on to a storage layer, which creates, reads, updates or deletes (CRUDs) data depending on the existing state in the datastore.
Assuming the two layers are separate applications/microservices working together, they get deployed differently.
Stateless applications are deployed using the `Deployment` component.
Stateful applications, on the other hand, are deployed using the `StatefulSet` component.
Both of them manage pods that are created from a pod template (the container specification).
Storage can be configured for both in the same way.
So, why do we use different components?
Horizontal scaling: creating replicas of a Deployment is far easier, because there is no shared data whose consistency, availability and partition-tolerance trade-offs (the CAP theorem) need managing.
Scaling up and down is a lot easier.
Each replica in a stateful app has its own Pod Identity, and that identity is sticky.
The identity is maintained so that if the pod is replaced, the replacement takes on the same identity. With a Deployment, by contrast, the identity of a pod dies when that pod dies.
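The difference can be illustrated with a small Python simulation (plain Python, not Kubernetes code) of what happens to a pod's name when the pod is replaced. The helper functions are hypothetical, and the random hex suffix only approximates the hashes Kubernetes appends to Deployment pod names.

```python
import secrets


def deployment_pod_name(app: str) -> str:
    # Deployment-style: app name plus a freshly generated random suffix.
    return f"{app}-{secrets.token_hex(6)}"


def statefulset_pod_name(app: str, ordinal: int) -> str:
    # StatefulSet-style: app name plus a stable, zero-based ordinal.
    return f"{app}-{ordinal}"


def replace_deployment_pod(app: str, old_name: str) -> str:
    # The replacement is a brand-new pod; old_name is ignored, so the identity is lost.
    return deployment_pod_name(app)


def replace_statefulset_pod(app: str, old_name: str) -> str:
    # The replacement takes over the old pod's ordinal, and therefore its name.
    ordinal = int(old_name.rsplit("-", 1)[1])
    return statefulset_pod_name(app, ordinal)


if __name__ == "__main__":
    old = deployment_pod_name("mongo")
    print(old, "->", replace_deployment_pod("mongo", old))  # a different name
    old = statefulset_pod_name("mongo", 1)
    print(old, "->", replace_statefulset_pod("mongo", old))  # mongo-1 -> mongo-1
```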
Why is the identity important?
When a stateful application runs as a single pod, that pod handles both read and write operations. Once you add a second instance, you need to decide which one holds the authoritative copy of the data. So one pod becomes the leader and handles all writes for that part of the application.
Each pod in the stateful application uses its own physical storage, and the other pods replicate the data from the leader.
When a new pod joins the stateful application, it first clones the data held by the previous pod and then listens for continuous updates from the leader. For example, when “mongo-3” joins the cluster, it clones the data held by “mongo-2”.
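The leader/follower arrangement and the join procedure can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the `Replica` and `ReplicatedStore` classes are hypothetical, the pod with ordinal 0 is assumed to be the leader, and replication is synchronous. Real databases such as MongoDB handle leader election and replication themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    data: Dict[str, str] = field(default_factory=dict)  # each pod's own storage


class ReplicatedStore:
    """Toy leader/follower model (hypothetical, not a real database client)."""

    def __init__(self, app: str) -> None:
        self.app = app
        self.replicas: List[Replica] = [Replica(f"{app}-0")]

    @property
    def leader(self) -> Replica:
        # Assumption for this sketch: the pod with ordinal 0 is always the leader.
        return self.replicas[0]

    def write(self, key: str, value: str) -> None:
        # Every write goes to the leader first...
        self.leader.data[key] = value
        # ...and is then pushed to each follower (synchronously, to keep it simple).
        for follower in self.replicas[1:]:
            follower.data[key] = value

    def read(self, ordinal: int, key: str) -> Optional[str]:
        # In this model, reads are served from a replica's own copy of the data.
        return self.replicas[ordinal].data.get(key)

    def add_replica(self) -> Replica:
        previous = self.replicas[-1]
        new = Replica(f"{self.app}-{len(self.replicas)}")
        # Step 1: clone the data held by the previous pod (mongo-3 copies mongo-2).
        new.data = dict(previous.data)
        # Step 2: join the follower list so later writes from the leader reach it.
        self.replicas.append(new)
        return new


if __name__ == "__main__":
    store = ReplicatedStore("mongo")
    store.write("user:1", "alice")
    store.add_replica()             # mongo-1 clones mongo-0
    store.add_replica()             # mongo-2 clones mongo-1
    store.write("user:2", "bob")    # leader propagates to mongo-1 and mongo-2
    print(store.read(2, "user:1"), store.read(2, "user:2"))  # alice bob
```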
If persistent volumes are used, the data survives even if all the pods in the stateful application die, because a persistent volume's lifecycle isn't tied to the lifecycle of any other component.
(In theory, volatile storage could be used for stateful applications, but that runs the risk of complete data loss if all the pods die.)
When a pod dies, its persistent volume is reattached to the pod that replaces it, because the replacement reuses the identifier of the pod it replaces.
Note: Remote storage is the key to all of this, because when a pod is recreated, there’s no guarantee that it will be placed on the same node. Therefore, the storage needs to be accessible from all nodes in the cluster.
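In Kubernetes, the pairing between a pod and its volume is made through PersistentVolumeClaims generated from the StatefulSet's `volumeClaimTemplates`. Each claim is named `<template name>-<StatefulSet name>-<ordinal>`, so a replacement pod that reuses an ordinal finds the same claim. The Python model below only simulates that lookup; the `ClaimRegistry` class and the template name `data` are hypothetical.

```python
from typing import Dict


class ClaimRegistry:
    """Simulated claims that outlive pods (hypothetical, not a Kubernetes API)."""

    def __init__(self, template: str, statefulset: str) -> None:
        self.template = template        # e.g. the volumeClaimTemplates entry name
        self.statefulset = statefulset  # e.g. "mongo"
        self._claims: Dict[str, Dict[str, str]] = {}  # claim name -> stored data

    def claim_name(self, ordinal: int) -> str:
        # Mirrors the <template>-<statefulset>-<ordinal> naming of real claims.
        return f"{self.template}-{self.statefulset}-{ordinal}"

    def attach(self, ordinal: int) -> Dict[str, str]:
        # Reuse the existing claim for this ordinal, or create it on first use.
        # Pod deletion never touches self._claims, just as a claim outlives its pod.
        return self._claims.setdefault(self.claim_name(ordinal), {})


if __name__ == "__main__":
    registry = ClaimRegistry("data", "mongo")
    volume = registry.attach(0)          # mongo-0 starts and writes some data
    volume["user:1"] = "alice"
    # mongo-0 dies; its replacement reuses ordinal 0, so it gets the same claim.
    replacement_volume = registry.attach(0)
    print(registry.claim_name(0), replacement_volume)  # data-mongo-0 {'user:1': 'alice'}
```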
One last point about Pod Identifiers.
Deployment pods are assigned a name combined with a random hash, e.g. mongo-c9792345e8763. StatefulSet pods get fixed, ordered names, e.g. mongo-0, mongo-1, mongo-2 (these are zero-indexed).
The StatefulSet will not create the next pod until the previous pod is up and running.
Deletion works the same way in reverse. It starts with the most recently created pod, and each subsequent removal waits until the previously removed pod no longer exists.
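The ordering rules can be modelled as two loops that will not move on until the previous pod has reached the expected state. This mirrors the default `OrderedReady` pod management policy (a StatefulSet can opt into `Parallel` instead); it is a simulation, not the controller's code. The callbacks `start_pod`, `is_running`, `delete_pod` and `exists` are hypothetical hooks, backed here by an in-memory set in which pods start and stop instantly.

```python
import time
from typing import Callable, List


def wait_until(check: Callable[[], bool], timeout: float = 5.0) -> None:
    """Poll until check() is true, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() > deadline:
            raise TimeoutError("pod did not reach the expected state in time")
        time.sleep(0.05)


def create_in_order(app: str, replicas: int,
                    start_pod: Callable[[str], None],
                    is_running: Callable[[str], bool]) -> List[str]:
    order = []
    for ordinal in range(replicas):  # mongo-0, mongo-1, ...
        name = f"{app}-{ordinal}"
        start_pod(name)
        wait_until(lambda: is_running(name))  # block before creating the next pod
        order.append(name)
    return order


def delete_in_order(app: str, replicas: int,
                    delete_pod: Callable[[str], None],
                    exists: Callable[[str], bool]) -> List[str]:
    order = []
    for ordinal in reversed(range(replicas)):  # the last created pod goes first
        name = f"{app}-{ordinal}"
        delete_pod(name)
        wait_until(lambda: not exists(name))  # the next removal waits for this one
        order.append(name)
    return order


if __name__ == "__main__":
    live = set()  # simulated cluster: pods start and stop instantly
    print(create_in_order("mongo", 3, live.add, live.__contains__))
    # ['mongo-0', 'mongo-1', 'mongo-2']
    print(delete_in_order("mongo", 3, live.discard, live.__contains__))
    # ['mongo-2', 'mongo-1', 'mongo-0']
```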
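Note that each pod in a StatefulSet can be reached through two DNS names.
One comes from the load-balancing service that spreads traffic across all the replicas (the same way Deployment pods are reached), and the other is an individual name for that specific pod.
The individual name is composed of the pod name and the domain of the governing service (the headless service referenced by the StatefulSet's `serviceName` field).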
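For example, with a governing service named mongo-service in the default namespace (both assumed here for illustration):
mongo-0.mongo-service.default.svc.cluster.local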
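The name pattern can be captured in two small helper functions (hypothetical, for illustration only). A Service's DNS name has the form `<service>.<namespace>.svc.<cluster domain>`, and a StatefulSet pod's individual name prefixes the governing service's name with the pod name. The service names `mongo-lb` and `mongo-service`, the `default` namespace, and the `cluster.local` cluster domain (the common default) are assumptions.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    # A regular (non-headless) Service name resolves to a load-balanced address.
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_dns_name(pod: str, governing_service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    # Stable per-pod name: <pod name>.<governing service domain>.
    return f"{pod}.{service_dns_name(governing_service, namespace, cluster_domain)}"


if __name__ == "__main__":
    print(service_dns_name("mongo-lb"))  # hypothetical load-balancing service
    # mongo-lb.default.svc.cluster.local
    print(pod_dns_name("mongo-0", "mongo-service"))
    # mongo-0.mongo-service.default.svc.cluster.local
```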
When a pod restarts, its IP address may change, but its name and DNS endpoint stay the same.
When replicating stateful applications, it’s up to the developers/administrators to:
- Configure the cloning and data synchronisation
- Make the remote storage available
- Manage and back up the data