9.1: CloudNativePG

CloudNativePG

Transcript:

Now that we've learned about the idea of extending the Kubernetes API and how to build systems on top of Kubernetes, let's take a few examples of that type of project and deploy them into the cluster. These are going to be applications that enhance our usage of the cluster in one way or another. There are many companies and open-source projects building this type of tooling, so you'll want to look across the landscape to see whether there are tools that would make your application platform on top of Kubernetes even better.

If we think about how we would operate a database for an application that we're running in Kubernetes, there are four main options. The first is to keep the database outside of the cluster entirely. You could deploy it on your own, or you could use a database as a service: if you're operating in AWS, that could be RDS; if you're in Google Cloud, that could be Cloud SQL. If your cloud provider offers a managed database like this, it can be a great way to shift the operational overhead of managing the database, handling backups, and testing that everything works onto the cloud provider, reducing the amount of time and effort that you and your team need to spend on the database itself.

However, if you do plan on hosting it within the cluster, there are three primary options. You could write your own StatefulSet. You could use a Helm chart like we did previously, where someone else has written a StatefulSet that you just configure. Or there's also this project, CloudNativePG, which has built an operator that you deploy on Kubernetes so that you can declaratively manage your Postgres clusters via Kubernetes custom resources.

What does this actually look like? You deploy the operator into your cluster; this green box is the operator pod. Along with it come a number of custom resources. For example, there's a Cluster resource up here at the top where you define how many instances you want and how much storage you want on the back end, along with a number of other configuration options. When you create that Cluster custom resource, the operator sees it and, based on the configuration provided, creates one or more pods: a primary pod that handles reads and writes, and one or more secondary or replica pods that are read-only within your cluster. It also has custom resources for backups, where you can specify that you want to back your cluster up to an object store of your choosing. On Google Cloud, that's going to be Google Cloud Storage; Civo has an object store as well; Amazon has S3. So you can define your backups, or a scheduled backup, for your database in a declarative fashion.

Let's go ahead and jump over to our code editor, deploy CloudNativePG, and see how we can set this up. I have a number of tasks defined within my Taskfile for the module 9 subdirectory. The first thing I'm going to do is create a namespace; that is the 09-scnpg namespace. Next up, I'm going to install the operator. To install it, I'm using a Helm chart: I use helm repo add to add the repo to my Helm configuration, and then I call the helm upgrade --install command, giving it a name for the release, putting it in the CloudNativePG system namespace, creating that namespace, and specifying which chart to use. I'm using all the default values here.
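As a rough sketch, assuming the upstream CloudNativePG chart repository, those install commands look something like this; the repo alias, release name, and operator namespace here are just the ones I'd pick, so adjust them to match your own setup:

    helm repo add cnpg https://cloudnative-pg.github.io/charts
    helm repo update
    # install (or upgrade) the operator with default values
    helm upgrade --install cnpg \
      --namespace cnpg-system \
      --create-namespace \
      cnpg/cloudnative-pg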
If you do need to do some customization, take a look at the Helm chart values to figure out what you might need to change. The install then outputs a sample cluster that we can use. If we look at the pods in the CloudNativePG system namespace, we have one pod, and it's up and running. Great. We can also look at the custom resources this has deployed into our cluster by doing k api-resources and grepping for postgres. You can see we have a cluster resource, a backup resource, a scheduled backup resource, as well as image catalogs and cluster image catalogs. I believe those are how you define the container images you want to use for each particular version of a cluster you want to deploy. And then there's this poolers resource, which I believe uses PgBouncer to set up connection pooling for your cluster.

With that installed, we can now deploy a Cluster custom resource. I'm using the absolute minimum configuration here. The goal of this section is not to teach you how to deploy the optimal Postgres database with CloudNativePG; it's to show you how you can use CloudNativePG and interact with these custom resources. So if you're using this, you should look into the options that the Cluster CRD provides. I believe CloudNativePG even has a Helm chart with some of the best practices for deploying a cluster encoded into it, so that could be a good place to start.

Let's go ahead and deploy this minimal cluster into our Kubernetes cluster. If we now look at the cluster, we can see that it's setting up the primary. If we get the pods in our namespace, there's this initdb job, which is executed first, before it spins up our primary instance. It looks like the pod associated with that job has completed, and now here's our primary instance for the database coming up. If we do k get pvc, we can see that it has created two PVCs. The first one, for our primary, is bound to the pod we just saw; the second one, because that second pod hasn't come up yet, is still in a Pending status. Our primary is now running, and we have a join job, which is going to allow that additional read replica to join our cluster. That is now done, and our read replica is coming up. We can see the two instances, but only one of them is ready so far. And now both of those instances are healthy, and our cluster is in a healthy state. We can also look at the services it stood up and see that we have a read-write endpoint that targets our primary and a read-only endpoint that targets our read replica.

Interestingly, if we do k get statefulsets, there are none. The CloudNativePG project has decided to skip StatefulSets entirely and manage its pods directly. Presumably this was due to some of the limitations of how StatefulSets work within Kubernetes, and they found those limitations were easier to overcome by controlling the pods themselves. Specifically, most fields within a StatefulSet, including things like the size of the volumes it provisions, are immutable, even though volumes can now be dynamically expanded in Kubernetes. So just an interesting callout there.
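For reference, the minimal Cluster resource we just worked with amounts to something roughly like this; the name, instance count, and storage size are just illustrative values:

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres            # illustrative name
    spec:
      instances: 2                 # one primary plus one read-only replica
      storage:
        size: 1Gi                  # deliberately tiny volume for a demo

Applying a manifest like this is what triggers the initdb job, the PVCs, and the primary and replica pods we watched come up above.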
Now let's take a look at what we need to do to set up backups for our Postgres cluster. Here is a configuration for a cluster that looks quite similar to before; we're going to have two instances and a very small volume size, but I've added two things. One, I've added a backup section pointing at a Google Cloud Storage bucket that I'm about to create, specifying that it can use the Google credentials it finds in my GKE environment, and telling it to retain backups for 30 days. Two, I'm setting up a service account template. This is necessary to use a feature within GKE called Workload Identity, which lets me link a Kubernetes service account with a Google Identity and Access Management (IAM) service account such that my Kubernetes service account can leverage the same permissions and roles that the IAM service account has. This allows me to access the bucket to store my backups without needing to store any sort of static credential in a Secret in the cluster. The way that works is you put an annotation on the Kubernetes service account, like the one shown here, containing the IAM service account name and the GCP project it is associated with.

Let's go ahead and create those resources. First we can create the bucket: here I'm running gcloud storage buckets create and passing it that name. We can then go over to the Google Cloud console, click refresh, and see that the bucket was just created with nothing in it.

Now let's add the necessary permissions on the Google Cloud side to enable us to store objects in that bucket. This is going to do a few things. First, we create a service account named cnpg-backups. Then we attach two roles to it, the Storage Object Admin role and the Legacy Bucket Reader role, specifically for the bucket I just created, using the gsutil command-line tool. It's important to call out that the bucket name needs to be globally unique, so you cannot share the same bucket name as me. If you're following along, you'll need to modify the name of this bucket, maybe add a suffix or put your own name in there somewhere, so that you have your own globally unique Google Cloud Storage bucket.

With those two roles associated with the bucket, I can then add the Workload Identity User role to the IAM service account. This is what allows the Kubernetes identity to assume the IAM identity. Specifically, you give it a member that says the Kubernetes service account within this Google Cloud project, in the namespace I'm working in, with this service account name, is allowed to use that IAM service account. Looks like that was successful. We can see all of this set up on the Google Cloud side: if I click into Permissions and scroll down, here is the service account that I created and the two roles I attached to it for this bucket. If we go over to the IAM page and click on Service Accounts, we can see the service account created here; if I click into it and go to Permissions, here is that Workload Identity User binding, which references our namespace and the name of the service account that we're going to create in the Kubernetes cluster. And the final piece needed to make this happen, which I set up quite a while ago in module 3 when we created our cluster, is the workload pool option: that value is our Google project ID followed by .svc.id.goog, and it enables the GKE cluster to utilize identities in this workload pool.
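To recap the Kubernetes side of this in one place, the backup and service account template pieces of the Cluster spec look roughly like the following; the cluster name, bucket, IAM service account, and project are placeholders, so double-check the exact field names against the CloudNativePG documentation:

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres-backup
    spec:
      instances: 2
      storage:
        size: 1Gi
      backup:
        retentionPolicy: "30d"                          # keep backups for 30 days
        barmanObjectStore:
          destinationPath: "gs://my-cnpg-backup-bucket" # placeholder bucket name
          googleCredentials:
            gkeEnvironment: true                        # use Workload Identity, no static key
      serviceAccountTemplate:
        metadata:
          annotations:
            # placeholder IAM service account email (name@project)
            iam.gke.io/gcp-service-account: cnpg-backups@my-project.iam.gserviceaccount.com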
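And on the Google Cloud side, the setup just described amounts to commands along these lines; the bucket name, project ID, and namespace are placeholders, so adjust them (and verify the role names) for your own environment:

    # create the bucket -- the name must be globally unique
    gcloud storage buckets create gs://my-cnpg-backup-bucket

    # create the IAM service account the backups will run as
    gcloud iam service-accounts create cnpg-backups

    # grant it access to the bucket
    gsutil iam ch \
      serviceAccount:cnpg-backups@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
      gs://my-cnpg-backup-bucket
    gsutil iam ch \
      serviceAccount:cnpg-backups@my-project.iam.gserviceaccount.com:roles/storage.legacyBucketReader \
      gs://my-cnpg-backup-bucket

    # let the Kubernetes service account (named after the cluster) impersonate it
    gcloud iam service-accounts add-iam-policy-binding \
      cnpg-backups@my-project.iam.gserviceaccount.com \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:my-project.svc.id.goog[my-namespace/my-postgres-backup]"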
That workload pool can be seen in the cluster configuration, where workload identity is enabled and the workload pool is the one associated with this GCP project. With that all set up, we can now apply this cluster config with the backup, and this did three things. One, it applied this file; earlier I didn't call out what Barman stood for, and you'll see Barman a few times here in the backup configurations: it stands for Backup and Recovery Manager. So I applied this cluster-with-backup configuration. Two, I applied a scheduled backup file. This is essentially a cron job that's going to use the backup configuration in my cluster on this schedule, so it would run at midnight UTC daily. And three, just so we'd have something to look at and didn't have to wait until midnight, I also added a backup resource, which will run right now in this namespace, pointing at my cluster.

First let's do k get clusters; it looks like this cluster is still coming up. Let's do k get backups; it's in a running phase. Our cluster is now healthy, and our backup is completed. Let's go look in the bucket and see what it has created. If we click into our bucket, there's a folder with the name of our Postgres cluster. We've got our base data backup with a timestamp, and then any information that was still in the write-ahead log and hadn't yet been fully synced into the primary database storage lives in the wals subdirectory.

Great, so just like that, we were able to deploy the CloudNativePG operator into our cluster, deploy a couple of Postgres clusters, set up backups using Workload Identity to store that data in a Google Cloud Storage bucket, and leverage the effort and testing that has gone into the CloudNativePG project without needing to define our own StatefulSet or rely on a Helm chart for this logic.

It's also possible to set up backups in a cloud like Civo using its object store; you can use any S3-compatible object store as the destination for these backups. In that case, you do need to give it an endpoint URL, because it is not using the default AWS S3 endpoints, and there's no workload identity feature on Civo, so as you create your bucket you would create a set of credentials, with an access key ID and a secret access key, and store those in a Kubernetes Secret. But other than that, it works in the same way.
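As a sketch of what that looks like in the Cluster spec, the barmanObjectStore section swaps the Google credentials for an endpoint URL and a reference to that Secret; the endpoint, bucket, Secret name, and key names below are placeholders:

    backup:
      barmanObjectStore:
        destinationPath: "s3://my-cnpg-backup-bucket"
        endpointURL: "https://objectstore.example-region.civo.com"  # placeholder endpoint
        s3Credentials:
          accessKeyId:
            name: s3-creds              # Kubernetes Secret holding the static credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY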
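Finally, going back to the GCS-backed cluster, the scheduled and on-demand backups applied earlier correspond to resources roughly like these; the names are illustrative, and note that CloudNativePG's schedule field takes a six-field cron expression that includes seconds:

    apiVersion: postgresql.cnpg.io/v1
    kind: ScheduledBackup
    metadata:
      name: daily-backup
    spec:
      schedule: "0 0 0 * * *"       # seconds minutes hours day month weekday -> midnight daily
      cluster:
        name: my-postgres-backup
    ---
    apiVersion: postgresql.cnpg.io/v1
    kind: Backup
    metadata:
      name: backup-now              # runs immediately, without waiting for the schedule
    spec:
      cluster:
        name: my-postgres-backup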