Video Thumbnail for Lesson
4.13: PersistentVolume & PersistentVolumeClaim

PersistentVolume & PersistentVolumeClaim

Transcript:

The next topic to cover is how we can store data such that it will persist across container restarts. Containers by themselves have an ephemeral file system: every time we create a new container, we get a fresh file system. If we want data to persist across those restarts, we need to store that data in what is known as a persistent volume.

There are many different implementations of persistent volumes. When you think of cloud providers, these are often going to be the block storage devices that they offer. In Amazon, it would be EBS. In GCP, it's going to be a Compute Engine persistent disk.

You can also use things like network file shares as a persistent volume. But essentially, the persistent volume and persistent volume claims are Kubernetes' interface for creating, managing, and consuming storage that is going to outlive any particular pod.

One of the most important attributes of the storage is the access mode. One of the most common ones you'll see is read-write-once (ReadWriteOnce). That is saying that you can mount this storage device with a single pod in read-write mode. If it's already mounted to a pod and you try to mount it from another, that second mount would fail. This is how most of those block storage devices behave.

There's a little nuance here in that read-write-once actually allows multiple pods to mount the volume in a read-write fashion if they are on the same node, because the mount is associated with the node the container is running on. There's a newer access mode, read-write-once-pod (ReadWriteOncePod), which limits it to a single pod regardless of whether the other pods are on the same node.

There's also read-only-many (ReadOnlyMany), which allows multiple pods to mount the volume, but only in a read-only fashion. You would use this if you needed multiple pods to pull data from a single source, but the underlying storage didn't have a way to de-conflict multiple simultaneous writes.

And then read-write-many (ReadWriteMany) allows multiple pods across multiple nodes to mount the volume and read and write to it simultaneously. The underlying implementation needs to support handling those concurrent writes in a consistent fashion.

These will often be things like NFS systems, which are already designed to handle this type of use case but generally have limitations in other areas, such as the speed at which you can access data.
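To make the names concrete, here is a minimal sketch of a persistent volume claim showing where the access mode is declared (the name and size are placeholders, not taken from the lesson):

# Minimal PVC sketch illustrating the accessModes field.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce    # other options: ReadWriteOncePod, ReadOnlyMany, ReadWriteMany
  resources:
    requests:
      storage: 1Gi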

Another very important configuration spec here is the reclaim policy for the storage class. This determines what should happen to the underlying persistent volume when the persistent volume claim is deleted.

In the Retain case, the persistent volume, and therefore the disk or volume it provisioned in the cloud, remains when you delete the persistent volume claim. If it's set to Delete, the underlying persistent volume also goes away when that claim is deleted. Retain can be safer, but it may leave unattached persistent volumes that have no consumer in your cluster, which you'll have to clean up manually if the data are no longer needed.
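As a rough sketch of where that setting lives, a storage class might look like this (the name is a placeholder, and the provisioner shown is the local-path one used by kind, so substitute your provider's driver as appropriate):

# StorageClass sketch showing the reclaim policy setting.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-retain
provisioner: rancher.io/local-path
reclaimPolicy: Retain                     # Delete is the default for most dynamically provisioned classes
volumeBindingMode: WaitForFirstConsumer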

You can provision persistent volumes and persistent volume claims directly, or within a stateful set, you can provide a template that will provision them dynamically.

In the specifications here on the slide, I've shown a storage class. These storage classes are what map the persistent volume to an underlying implementation. I'm showing here the one for the Civo implementation of a persistent volume that maps to their volume object. And then I'm showing a persistent volume claim, which uses that storage class to provision a persistent volume behind the scenes.

There you can see the access mode as well as the size of the volume that will be provisioned.

In our diagram, you can see that a pod is going to specify one of two things: either a volume claim template, in the case of a stateful set, that will enable dynamic provisioning of the persistent volume, or in other cases, a persistent volume claim, which is provisioned directly.

That persistent volume claim is going to consume a persistent volume. And so persistent volume maps to the underlying storage, and a persistent volume claim is the tie between that storage and the pods consuming it.

The pod consuming it is either going to specify a storage class, which can dynamically provision the underlying storage, or, for example, if you're operating in an on-premises environment, you could pre-provision the underlying storage and then define a persistent volume that manually maps to that persistent storage. This is a case where the underlying implementation across clusters is going to be different, and so I'm going to demonstrate this on all three of our different clusters: the kind cluster, the Civo cluster, and the GKE cluster, so we can get a feel for how these behave.

The first thing that we're going to do is manually provision a persistent volume and a PVC, as well as a pod to consume it, using our kind cluster. So I'll switch to our kind cluster. I'll then create the namespace. And now let's look at the definition of those.

In the kind subdirectory, I've got this persistent volume definition. I'm naming it manual kind, specifying how large it should be, the access mode, and which storage class. I can look at the storage classes in my cluster by doing k get storageclasses. As you can see, I have one storage class that was created by default by my kind cluster using this rancher local path provisioner, so that's going to be a local path within my host system.

For this type of storage class, I need to specify a path. Here, this could be anything; I'm using a placeholder path inside the container as an example. I'm also specifying that it needs to live on a specific worker node in my cluster.
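Here's a sketch of what such a manually defined persistent volume could look like. I'm assuming a local volume type pinned to a node named kind-worker; the label, path, and names are placeholders, so the lesson's actual manifest may differ slightly:

# Sketch of a manually created PersistentVolume on a kind cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-kind
  labels:
    type: manual                          # used by the claim's selector below
spec:
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteOnce
  storageClassName: standard              # kind's default local-path class
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /some/path/in/container         # path inside the kind node container
  nodeAffinity:                           # pin the volume to a specific worker node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kind-worker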

Now my persistent volume claim is going to find that persistent volume using the match labels selector. So as we saw, I have this label here on my persistent volume, and then the persistent volume claim uses that same label to find it. I'm saying that I want to use my entire 100 megabytes that I provisioned, again specifying my access mode and the storage class name.
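A matching claim might look something like this sketch (names mirror the persistent volume sketch above):

# Sketch of a PVC that binds to the manual PV via a label selector.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-kind-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  selector:
    matchLabels:
      type: manual          # matches the label on the PV above
  resources:
    requests:
      storage: 100Mi        # request the full capacity of the PV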

Finally, I have a pod here that I'm going to create, which is going to use the persistent volume claim as a volume and mount it in at a path within my container.
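The consuming pod is then just an ordinary pod that references the claim by name, roughly like this (image and paths are placeholders):

# Sketch of a pod mounting the PVC defined above.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /some/path/in/container   # where the volume appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: manual-kind-pvc             # the claim from the previous sketch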

So let me go ahead and deploy all that with T02. As you can see, we created our persistent volume, we created our persistent volume claim, and our pod to consume it. We can look at our persistent volume. We can see that it has been created and is in a bound status because we're consuming it from that pod. We can look at our PVC, and you can see that it maps to our manual volume. And then finally, our pod, and it's running.

If we do -o wide, we see it's running on the specific worker that I mentioned, which is where we provisioned that manual persistent volume. Let's exec into that container and navigate to the mount path for the volume. If I cat that file, hello-from-host, we see its contents: "hi from number one."

When we set up our kind cluster, one of the things I did in the kind config was to add a host path mount: an absolute path on my host file system that gets mounted into the node container at the same path I used in my persistent volume.
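For reference, a kind config with such a mount looks roughly like this; the host path is a placeholder for whatever absolute path you chose on your machine:

# Sketch of a kind cluster config mounting a host directory into a worker node.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraMounts:
      - hostPath: /absolute/path/on/my/host      # directory on the host machine
        containerPath: /some/path/in/container   # same path referenced by the PersistentVolume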

Because of this, the file that exists on my host system is now showing up inside the container. I can now create a new file here, then go to that directory on my host system, and I see the file was just created there.

So this is just showcasing how that host path volume connects my host file system and the container. If you're working with a Kubernetes cluster on-premises, you may be using something like this, where you create volumes that map to paths on your physical hosts.

So that shows the manual creation of persistent volumes and persistent volume claims. Let's now look at how you can dynamically provision these.

In this case, we don't actually need a persistent volume. We can specify a persistent volume claim directly and use a storage class name such that Kubernetes can go ahead and provision that for us. You'll notice that we don't have a selector here because there is no persistent volume already in existence with a set of labels we want to match to.

I'm also defining a deployment here, which is going to consume that persistent volume claim. Under volumes, I reference the persistent volume claim by the name I gave it, and again mount it at a path within the container. Because this volume type is read-write-once, these pods can only run simultaneously on the same node. If they were scheduled onto separate nodes, one of them wouldn't be able to mount the persistent volume.
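Sketched out, the dynamically provisioned claim and the deployment sharing it could look like this (names, image, and sizes are placeholders; note there is no selector on the claim):

# Sketch: a PVC provisioned dynamically by the default storage class,
# plus a deployment whose replicas all share that single claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard      # kind's default local-path class
  resources:
    requests:
      storage: 100Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pvc-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pvc-deployment
  template:
    metadata:
      labels:
        app: pvc-deployment
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: dynamic-pvc   # both replicas mount this same claim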

Let me go ahead and deploy that. You can see they were both scheduled onto the same worker, which they had to be in order to be able to mount that same persistent volume.

This is an interesting thing about how deployments versus stateful sets work with persistent volumes. In the case of a deployment, if you specify a persistent volume claim, all of the replicas are going to mount the same one. In the case of a stateful set, they will each get an independent persistent volume claim.

Now, like I said, stateful sets consume these slightly differently. Let's look at how a stateful set would define its persistent volumes. Rather than creating a persistent volume claim separate from the workload, we have a section of the specification called a volume claim template. You give it a name and a spec, and it is used for each replica: each one provisions its own persistent volume claim from the information provided.
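A sketch of a stateful set using a volume claim template, along with the headless service it references (more on that in a moment); all names and the image are placeholders:

# Sketch: a StatefulSet whose volumeClaimTemplates give each replica its own PVC.
apiVersion: v1
kind: Service
metadata:
  name: demo-ss
spec:
  clusterIP: None              # headless service for the StatefulSet
  selector:
    app: demo-ss
  ports:
    - port: 80                 # placeholder port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-ss
spec:
  serviceName: demo-ss
  replicas: 2
  selector:
    matchLabels:
      app: demo-ss
  template:
    metadata:
      labels:
        app: demo-ss
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:          # one PVC per replica is created from this template
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: standard
        resources:
          requests:
            storage: 100Mi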

In this case, I'll have two replicas, and I'm creating my headless service because every stateful set is supposed to have one. With the deployment, both of those replicas came up right away. With the stateful set, they come up in series. Okay, the first one is running, the second one is running. If we look at the persistent volume claims now, you can see that for the dynamic PVCs from the deployment, there's only one, which is shared across the two pods.

For the stateful set, each replica gets its own individual PVC. Each of these has a corresponding persistent volume, and they're all bound to the four pods that are currently running. Hopefully that gives you an idea of the different paths by which you can provision these: you can specify them manually or dynamically, and dynamic provisioning behaves differently for deployments versus stateful sets.

Let's also go ahead and deploy a version of the stateful set onto Civo and onto GKE. To do this, I'll switch to my Civo cluster and deploy T05. If I navigate to the Civo subdirectory, you can see I've got my stateful set defined here. It looks identical to the one I deployed into my kind cluster, with the only difference being the storage class that I'm going to be using.

If I get the storage classes within this cluster, you can see it has that civo-volume class that was created by default when the cluster was provisioned. It's using the Civo CSI provisioner. Remember we talked about the container storage interface in the previous lesson? This is where that comes into play: a different driver can be used to interact with each underlying implementation of that storage provider. If I look at the pods now, we see one is in a ContainerCreating state. We can see the underlying PVC here. Now if I go to the UI, I can see that volume created within the Civo cloud, outside of my cluster. Let's see if it has fully spun up with the pod.

It's still in the ContainerCreating state. Let's describe it. It's saying the volume is not yet available; presumably that will resolve over time. Let's give it a little bit of time. Let me check again. Okay, so now my first pod is running and my second pod is creating. That should have created the PVC behind the scenes, so let me refresh here and see if I can see that. There it is. It should be spinning up. Because these go off behind the scenes and provision another resource within the cloud provider, it can take a little longer.

On my kind cluster, I was just mapping to a path on the host file system, so it was very quick. In these cases, an API call has to be made to the Civo cloud, the volume needs to be provisioned, it needs to be attached to the cluster, and so on, so there can be a bit more of a delay when provisioning these resources. However, they enable you to scale much further, because you're no longer tied to the specific storage devices on your system.

You're provisioning and attaching as much additional storage as you want via these volumes. All right, now both of our pods have come up, and they dynamically provisioned the underlying PVCs. Let's do the same on Google. It's going to look almost identical, but it's worth showcasing how it works. I'm going to delete this stateful set and clean those resources up, and we can move over to the GKE cluster. We'll create our namespace. If we look at the storage classes here, there are three different storage classes provided.

The standard class uses the kubernetes.io Compute Engine persistent disk provisioner, which is included in the Kubernetes upstream. Before CSI existed, the implementations for these storage providers lived in the Kubernetes project itself; afterwards, the individual CSI storage drivers were created. So of the three classes, this one uses the older in-tree implementation, and the other two use the newer CSI implementation. As you can see, the standard read-write-once class is the default, and so in my storage class name here, I'm using that standard-rwo option. Otherwise, it should be pretty much identical to my Civo one. Let's go ahead and deploy it.
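For reference, whether in a volume claim template or a standalone claim, you just reference that class by name; a sketch (the class names are the ones GKE creates by default, everything else is a placeholder):

# Sketch: a PVC on GKE using the CSI-backed standard-rwo class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gke-example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo   # CSI-based; 'standard' uses the legacy in-tree provisioner
  resources:
    requests:
      storage: 10Gi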

Okay, as you can see, the first pod is already up, so the volume creation on GKE was quite fast. It's creating that second one now. If we navigate to the cloud console and go to Compute Engine, here are the two nodes in our cluster. If we go to disks, these two are the disks attached to my two nodes, and these two are the disks that I just provisioned for my StatefulSet. Awesome. Let me go ahead and delete my StatefulSet. Those pods are now gone. There is a configuration on the stateful set itself, the persistent volume claim retention policy, that defines what happens to these PVCs when the stateful set consuming them is deleted or scaled. Historically, they would just remain. Now you can specify what should happen both when deleted and when scaled. The when-scaled case makes sense because a StatefulSet may be scaled up or down.

Let's say you had five replicas and you scale to four: what should happen to that fifth persistent volume claim? And when-deleted covers deleting the StatefulSet entirely. In this case, the default behavior is to retain. If you wanted the PVCs to be cleaned up automatically, you could specify Delete. As you can see, I did not include this in my specification here, so it defaulted to Retain and the PVCs stayed in place. So, to recap: the reclaim policy, set via the storage class and applied to the persistent volume, specifies what happens when we delete a PVC: should the underlying persistent volume, and therefore the disk in the cloud provider, remain, or should it be deleted as well? Whereas the persistent volume claim retention policy on the stateful set specifies what should happen to its PVCs when that consumer is scaled or deleted. Let me go ahead and delete these two PVCs to delete the underlying volumes.
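For reference, here's a sketch of where that retention policy lives on the stateful set spec; this field is only honored on reasonably recent Kubernetes versions, and everything besides the field names is a placeholder:

# Sketch: a StatefulSet whose PVC retention policy cleans up PVCs on deletion
# but keeps them when scaling down.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-ss
spec:
  serviceName: demo-ss
  replicas: 2
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete    # remove PVCs when the StatefulSet is deleted (default is Retain)
    whenScaled: Retain     # keep PVCs for replicas removed by scaling down (the default)
  selector:
    matchLabels:
      app: demo-ss
  template:
    metadata:
      labels:
        app: demo-ss
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Mi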

Hopefully that gives you a sense of how to work with persistent volumes in their various configurations and how you would use them to store and persist data outside the lifecycle of any particular pod. As you host stateful applications on Kubernetes, understanding these behaviors is very important to ensure that your data are safe.