How to K8s: Persistent and Ephemeral Volumes

Learn about the two major classes of data storage solutions available in Kubernetes in this post.

Kubernetes ephemeral and persistent volumes

Kubernetes, by default, is designed to be a high-availability system – one that won’t break easily. This is possible because user-created elements of the larger system – such as pods, ReplicaSets, and Deployments – are designed so that they can break, be deleted, or become unhealthy in some way without causing the system as a whole to go down.

Rather, the system maintains an awareness of its current state and continuously compares that state with the user-supplied state definition stored in etcd. If all is well, the actual state and the user-defined, desired state will be aligned. If they are somehow misaligned, the system will work to regain the user’s desired state.

A simple example of this behavior is Kubernetes' ability to create and delete replicas of a given pod on the fly, such that there are always the same number of replicas available as are called for in the defined desired state of the system.
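As a concrete sketch of this behavior (all names here are illustrative), a Deployment like the following asks Kubernetes to keep three replicas of a pod running; if one is deleted, the control loop creates a replacement to restore the desired count.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment      # illustrative name
spec:
  replicas: 3                # the desired state: always three pods
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo-container
        image: alpine:latest
        command: ['sh', '-c', 'sleep 3600']
```

Deleting any one of the three pods by hand simply causes the Deployment’s controller to spin up a fresh replacement.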

In order to facilitate this high-availability requirement, each user-defined API object must be self-contained so that it can be ephemeral. That is, it has to be able to be created or deleted as needed without relying on any information not contained in the user-supplied definition. This means that any data living in a given pod cannot be trusted to be accurate and complete, because at any point the pod could have been torn down and replaced from scratch.

However, there are use cases in which you will want data to persist and be available to the various resources in your cluster – either for the duration of a given resource’s lifespan, such as that of a given pod, or indefinitely, so that no matter how many times a new pod is created, the same data will be available to each in turn.

For this, we’ll turn to Kubernetes volumes. Volumes in K8s are datastores that can be separated into two fundamental classes – persistent and ephemeral. A persistent datastore is designed to outlive a given pod that is reading from it. Ephemeral datastores, on the other hand, only exist for the duration that their associated pods live.

Ephemeral Volumes

Common Ephemeral Volume Types

  • emptyDir: an empty directory that is mounted at a user-specified path at pod creation.
  • configMap: provides a means of injecting configuration data into a given pod at runtime.
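As a sketch of the configMap type (the names below are illustrative), a ConfigMap’s keys can be mounted into a pod as files, which is a common way of injecting configuration at runtime:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config            # illustrative name
data:
  app.properties: |
    mode=debug
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-config-pod
spec:
  containers:
  - name: demo-container
    image: alpine:latest
    command: ['sh', '-c', 'cat /etc/config/app.properties ; sleep 3600']
    volumeMounts:
    - mountPath: /etc/config   # each ConfigMap key appears as a file here
      name: config-volume
  volumes:
  - name: config-volume
    configMap:
      name: demo-config
```

Inside the container, the key app.properties shows up as the file /etc/config/app.properties.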

Use Cases for Ephemeral Volumes

  • Caching
  • Sharing non-essential data between containers in a pod
  • Injecting config data into a given pod

Example YAML

The emptyDir volume type can be created within a pod’s definition, as shown below.

apiVersion: v1
kind: Pod
metadata:
  name: myvolumes-pod
spec:
  containers:
  - image: alpine:latest
    imagePullPolicy: IfNotPresent
    name: myvolumes-container
    command: ['sh', '-c', 'echo Container is Running ; sleep 3600']
    volumeMounts:
    - mountPath: /tmp
      name: example-volume
  volumes:
  - name: example-volume
    emptyDir: {}

Here, we are spinning up a single pod with an Alpine Linux container image. We pass it a command at startup that will keep the container alive for an hour so we can inspect our work. Without the sleep, the container would finish its echo command – the only process it was given – and immediately exit.
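The “sharing non-essential data between containers” use case follows the same pattern. In this hypothetical sketch, a writer container and a reader container in the same pod mount one shared emptyDir volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-pod      # illustrative name
spec:
  containers:
  - name: writer
    image: alpine:latest
    command: ['sh', '-c', 'echo hello > /shared/data.txt ; sleep 3600']
    volumeMounts:
    - mountPath: /shared       # both containers see the same directory
      name: scratch
  - name: reader
    image: alpine:latest
    command: ['sh', '-c', 'sleep 5 ; cat /shared/data.txt ; sleep 3600']
    volumeMounts:
    - mountPath: /shared
      name: scratch
  volumes:
  - name: scratch
    emptyDir: {}
```

Because the emptyDir is tied to the pod, the shared data disappears as soon as the pod is deleted.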

Persistent Volumes

NOTE: If you require a persistent volume in your Orka cluster, you’ll want to contact MacStadium support to request that one be created according to your specific needs. We have included an example PV definition below for illustration’s sake. However, when working with your actual Orka cluster, you will only need to create the persistent volume claim, and to mount that claim onto your targeted pod(s), as shown in the example YAML below.

Common Persistent Volume Types

  • NFS: Network File System; an NFS server reachable from your cluster’s network.
  • Cloud provider-specific volume types, such as AWS EBS, GCE Persistent Disk, and Azure Disk.

Use Cases for Persistent Volumes

  • Databases
  • Storing essential data that must be available for the application to run

Example YAML

Unlike emptyDir volumes, persistent volumes require additional API objects to correctly associate themselves with a given, targeted pod. You’ll need to define the volume itself and a persistent volume claim; the claim will look for an existing volume with adequate resources and will in turn be associated with your targeted pod.

---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: my-persistent-volume
  labels:
    type: nfs
spec:
  storageClassName: pv-demo
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp"

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-persistent-volumeclaim
spec:
  storageClassName: pv-demo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi

---
kind: Pod
apiVersion: v1
metadata:
  name: myvolumes-pod
spec:
  containers:
  - image: alpine
    imagePullPolicy: IfNotPresent
    name: myvolumes-container
    command: ['sh', '-c', 'echo Container 1 is Running ; sleep 3600']
    volumeMounts:
    - mountPath: "/my-pv-path"
      name: my-persistent-volumeclaim-name
  volumes:
  - name: my-persistent-volumeclaim-name
    persistentVolumeClaim:
      claimName: my-persistent-volumeclaim

Above, we have created three resources in order to associate a persistent volume with a given pod – the volume itself, the persistent volume claim that “hooks onto” the volume, and the pod on which we mounted our persistent volume claim, much like you would mount an ephemeral volume.

TL;DR

Kubernetes, by default, will not reliably store data that is collected or generated during an application’s runtime, because certain API objects, such as pods, need to be ephemeral – that is, they need to be able to be torn down and recreated at any time in order to facilitate Kubernetes' high availability. However, there are multiple means by which you can store such data, and you can choose among them according to the specific behavior your application requires.

Specifically, there are two classes of volumes we’ve discussed above that you can use to store data – ephemeral volumes, or those that live for only as long as the pod that is reading from them, and persistent volumes, or those that live on indefinitely, and that can thus be read from by any replica of a given pod that is spun up over the course of your application’s runtime.