TL;DR - We open-sourced https://github.com/benchsci/rules_kustomize

Building a SaaS product such as our AI-Assisted Reagent Selection application presents interesting operational challenges, and over the past couple of years at BenchSci, we have wrestled with solutions for configuration management and reproducible builds. Not only are there many tool choices in each area, but there are also organization-specific considerations about how any choice integrates with the existing ecosystem of tools, processes, and culture.

In a Google Cloud Blog post, we previously described how BenchSci leverages a number of Google Cloud Platform (GCP) services to build and serve our AI-Assisted Reagent Selection application, used to exponentially increase the speed and quality of life-saving research. Of those services and tools, Kubernetes (K8s being the popular numeronym) plays a central role in how we operate in the cloud. All BenchSci engineering teams deploy both customer-facing and internal applications on Google Kubernetes Engine (GKE). Internal data and ML teams often rely on GKE to manage parts of their pipelines. The infrastructure team manages deployment and network configurations within GKE, as well as across GCP at large.

Given that multiple teams use this Kubernetes-native ecosystem, how do we manage our configuration data effectively? How do we ensure our CI/CD pipelines don’t deviate from one common set of dependencies, and that builds and deployments are always reproducible?

As with any decision, it’s helpful to have overarching guiding principles to narrow down possible solutions. At BenchSci, we strive for simplicity and re-usability, so that an improvement to one tool or process has a multiplier effect across the ecosystem, rather than adopting or creating ad-hoc tools and pipelines for every new problem we face. The next few sections describe how we applied this principle to:

  • How we do Cloud infrastructure-as-code without adopting a new set of tools
  • The challenge of configuration management where everything is Kubernetes-first
  • The challenge of having reproducible builds across the board
  • How the solutions to these challenges intersect and led us to rules_kustomize

What about infrastructure-as-code? Introducing Config Connector

Our Core Infrastructure team wanted a way to manage GCP resources without adopting yet another tool with its own set of idiosyncrasies (leading by example with our philosophy of unified tooling). So we chose Kubernetes Config Connector (KCC) to manage GCP resources -- including Cloud SQL, load balancers, IAM, BigQuery, Cloud Build, and (in turtles-all-the-way-down fashion) other GKE clusters. Unlike tools such as Terraform, which come with their own domain-specific language (DSL) and therefore demand both a learning curve and extra care to maintain, KCC is a Kubernetes add-on (built on custom resource definitions) that uses the same system and syntax as Kubernetes itself.

This unifies our tooling, pipelines, and configuration language used for cloud infrastructure with those of our container environments. It also reduces context switching and allows the same skills and experience gained in development or debugging to be re-used in both areas.

Additionally, our cloud resources are now managed by the same eventually consistent reconciliation loop as native Kubernetes resources; for example, any drift in Cloud SQL configuration will self-correct, providing more resilience against system and human errors. This is a feature that the more obvious choice, GCP Deployment Manager, lacks. The remaining question is how to re-use and compose both native and KCC resource configurations, and how to integrate our cloud resource configuration with our build tool.
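To make this concrete, here is a minimal sketch of a KCC manifest for a hypothetical “hello-world” application that needs a Cloud SQL instance (the names and values below are illustrative, not taken from our actual configuration):

    # Illustrative KCC manifest: applying this to a cluster running Config
    # Connector makes KCC create and continuously reconcile the GCP resource.
    apiVersion: sql.cnrm.cloud.google.com/v1beta1
    kind: SQLInstance
    metadata:
      name: hello-world-db   # hypothetical resource name
    spec:
      databaseVersion: POSTGRES_13
      region: us-east1
      settings:
        tier: db-custom-1-3840

Because this is just another Kubernetes object, it can flow through the same review, build, and deployment pipeline as any Deployment or Service.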

What about configuration “templating”? Introducing kustomize

Unified tooling for GKE and GCP configuration is great and reduces overhead, but it also raises the stakes for keeping Kubernetes configuration data well organized, for Kubernetes-native and KCC namespaces alike. To generate modular, re-usable configuration manifests to feed to our Kubernetes clusters, we use kustomize for template-free configuration management. The kustomize paradigm revolves around defining application-namespaced “bases” that logically group configuration, which can then vary across environments by stacking patches in “overlays.”

We preferred kustomize over other templating tools (e.g., Helm) because it’s natively supported by `kubectl`, it has a supportive Kubernetes-centric community, and it helps us avoid a world where templating input becomes unintelligible as the project scales in any number of dimensions. The documentation section on avoiding unstructured edits provides further explanation and examples of this common pitfall.

Since kustomize draws on the inheritance model of object-oriented programming (OOP), where bases act essentially as parent classes, infrastructure engineers get to think about Kubernetes namespaces with a familiar OOP mindset.

For us, a typical directory structure looks like this:

gke/base/<NAMESPACE>/
    ├── kustomization.yaml
    ├── <NAMESPACE>.namespace.yaml
    ├── ...
    └── resourceN.<TYPE>.yaml
gke/overlays/<ENVIRONMENT>/<NAMESPACE>/
    ├── kustomization.yaml
    ├── ...
    ├── patchN.<TYPE>.yaml
    └── resourceN.<TYPE>.yaml

 

NAMESPACE can be either the GKE application name like “hello-world” or something like “_kcc_hello-world,” which defines the corresponding GCP resources needed to support serving the application. This kind of organization clarifies exactly what infrastructure pieces each application needs (and doesn’t need when it comes time for garbage collection).
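As an illustration, the kustomization.yaml files for a base and its production overlay might look roughly like the sketch below (file names are hypothetical, and whether you use patchesStrategicMerge or the newer patches field depends on your kustomize version):

    # gke/base/hello-world/kustomization.yaml (illustrative sketch)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - hello-world.namespace.yaml
      - server.deployment.yaml
      - server.service.yaml

    # gke/overlays/production/hello-world/kustomization.yaml (illustrative sketch)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../../../base/hello-world   # inherit everything from the base
    patchesStrategicMerge:
      - replicas.deployment.yaml    # production-only patch, e.g. a higher replica count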

What about the build tool? Introducing Bazel 

It’s desirable to have reproducible, hermetic builds for the BenchSci engineering codebase, and it’s especially important for a monorepo because of the many contributors and applications it has to support. The build system must satisfy the needs of both development teams writing code and infrastructure teams writing configuration data.

Bazel, an open-source version of Google’s internal Blaze, is our build tool of choice. It can be used for defining module dependencies, building Docker images, defining binaries used for testing and running command toolchains, and much more. Its extensibility is limited only by the availability of third-party modules, typically named “rules_foo”; one example is rules_docker, which defines rules for building and handling Docker images with Bazel.

Using Bazel, we can easily pin dependency versions to be used both locally and in continuous integration (CI) pipelines. It forces teams to understand and be explicit about build and test dependencies, and to develop directory structures that make sense. Despite a steep learning curve (its configuration language, Starlark, is a dialect of Python), Bazel’s extensible nature allows it to be a single “entry point” for understanding everything that various teams build.

So, where is the intersection? Introducing “rules_kustomize”

Kustomize is great on its own for Kubernetes configuration management (both native and GCP via KCC), and Bazel is great on its own for creating reproducible build/test toolchains. They both solve problems in the domain for which they were designed. Thinking back to the guiding principle of simplicity and re-usability, and because both tools are core parts of our stack, it makes sense to link them up instead of having separate workflows/pipelines for each.

As mentioned, Bazel is limitlessly extensible with “rules_foo,” and so we’ve built “rules_kustomize.” In a nutshell, it’s a collection of Bazel macros for working with kustomize. These rules are intentionally lightweight and intended to be composable with other Kubernetes-related rules. They are used for compiling the full kubectl-ready YAML manifest for a chosen Kubernetes namespace. The “build” targets produce the output of `kustomize build`; the “test” targets perform a golden test against the expected build output; and the “run” targets feed the compiled manifest into kubectl to apply it against the desired GKE cluster(s).
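To give a feel for the shape this takes, a BUILD file in an overlay directory might look something like the sketch below. The load path, macro name, and attributes here are hypothetical stand-ins meant only to illustrate the build/test/run pattern; the actual API is documented in the repository’s README.

    # gke/overlays/production/hello-world/BUILD.bazel
    # Illustrative sketch only: the load path, macro name, and attributes are
    # hypothetical; see https://github.com/benchsci/rules_kustomize for the real API.
    load("@com_benchsci_rules_kustomize//:defs.bzl", "kustomization")

    kustomization(
        name = "hello-world",
        # The kustomization.yaml plus every resource and patch it references.
        srcs = glob(
            ["*.yaml"],
            exclude = ["hello-world.golden.yaml"],
        ),
        # The base this overlay inherits from.
        deps = ["//gke/base/hello-world"],
        # Checked-in expected output, consumed by the generated golden-test target.
        golden = "hello-world.golden.yaml",
    )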

Here is a concrete operational example tying it together: 

  • Previously, to generate the manifest for a namespace, one would run:
    kustomize build gke/overlays/production/hello-world
  • Now with Bazel: bazel build //gke/overlays/production/hello-world

Now, because resources both inside and outside of GKE are defined in the same configuration language thanks to KCC, we can use the same kustomize base/overlay concepts for GCP infrastructure. For infrastructure resources inside a GCP project, the same commands apply; they just look slightly different because they refer to the “KCC” bases:

  • Previously, to generate the manifest for a KCC namespace, one would run:
    kustomize build gke/overlays/production/_kcc_hello-world
  • Now we’d run: bazel build //gke/overlays/production/_kcc_hello-world

A major benefit this provides is that we can stop worrying about what versions of certain binaries (e.g., kustomize) are installed on a developer’s local machine and whether that matches the version used in CI or the version used by someone else. It’s all defined in the Bazel WORKSPACE file.
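Pinning happens in one place. A WORKSPACE entry along the following lines (the commit and checksum are placeholders) fetches rules_kustomize, and with it the exact toolchain, for every developer and for CI:

    # WORKSPACE (illustrative sketch; the commit and sha256 are placeholders)
    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    # Pin rules_kustomize (and, alongside it, the exact kustomize and kubectl
    # binaries) so local machines and CI all use identical versions.
    http_archive(
        name = "com_benchsci_rules_kustomize",
        urls = ["https://github.com/benchsci/rules_kustomize/archive/<commit>.tar.gz"],
        strip_prefix = "rules_kustomize-<commit>",
        sha256 = "<sha256 of that archive>",
    )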

But what about actuation?

Everyone in the Kubernetes space knows that kubectl is the CLI tool to talk to the cluster and make things happen. This step is often called “actuation”: much as, in the physical world, an actuator is the device that actually moves something (e.g., a motor or solenoid). The details of how `kubectl` calls the cluster’s API server depend so much on an individual organization’s needs and opinions that we’ve decided to keep actuation out of `rules_kustomize`.

You might guess (correctly) that we also use Bazel here, staying true to the theme of tool unification and re-use. We take advantage of Bazel run targets and build toolchains that wrap the various `kubectl` commands. Again, local and CI systems are guaranteed to use the same binary versions (of both kustomize and kubectl) in all cases.
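As a sketch of what such a run target can hide, an “:apply” target might be a thin shell wrapper that fetches cluster credentials before piping the compiled manifest into kubectl. The layout below is illustrative rather than our exact setup:

    # gke/overlays/production/hello-world/BUILD.bazel (continued; illustrative sketch)
    # "bazel run ...:apply" executes apply.sh, which could, for example, run
    # `gcloud container clusters get-credentials ...` to set up the kubeconfig and
    # then `kubectl apply -f` on the compiled manifest.
    sh_binary(
        name = "apply",
        srcs = ["apply.sh"],
        # Make the compiled manifest available to the script at runtime.
        data = [":hello-world"],
    )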

Continuing the above concrete operational example: 

  • Previously, to apply a native Kubernetes namespace against a cluster: kustomize build gke/overlays/production/hello-world | kubectl apply -f -
  • Now with Bazel: bazel run //gke/overlays/production/hello-world:apply, which also lets us hide the gcloud commands that connect to particular clusters via their kubeconfigs.

Lastly, with KCC, we use Bazel to create our cloud resources directly:

  • Previously, to apply a KCC namespace against a cluster:
    kustomize build gke/overlays/production/_kcc_hello-world | kubectl apply -f -
  • Now with Bazel: bazel run //gke/overlays/production/_kcc_hello-world:apply

These commands are then used in both our CI/CD pipelines and any local development scripts, which guarantees as few surprises as possible across different environments.
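For instance, a CI step (sketched here as a Cloud Build step; the builder image is a placeholder) can invoke exactly the same Bazel target a developer runs locally:

    # cloudbuild.yaml (illustrative sketch; the builder image is a placeholder)
    steps:
      # Run the same Bazel target developers use locally, so CI deploys with the
      # identical pinned kustomize and kubectl binaries.
      - name: "gcr.io/cloud-builders/bazel"
        args: ["run", "//gke/overlays/production/hello-world:apply"]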

Spreading the word

So far, the only announcement of this open sourcing has been in a GitHub thread where community members noted the lack of such a tool under the bazelbuild organization. We hope this article will garner more interest and, potentially, community contributions.

We encourage you to fork, star, or watch the repository to see how it might help your organization or personal project, and to share this article with your networks. If you have any ideas or suggestions, feel free to start a discussion at https://github.com/benchsci/rules_kustomize/issues

Written By:
Tony Zhang
