Kubernetes and environment variables

This week I had to work more than usual with Kubernetes resources that (ab)use environment variables, and I thought I would write about the topic because enough of it is confusing or surprising.

Starting from the official Kubernetes doc, we’re greeted right away with a relatively complex example of what is possible, defining an environment variable that combines other environment variables defined in the same resource. Let’s pull the example out of the doc for convenience:

apiVersion: v1
kind: Pod
metadata:
  name: print-greeting
spec:
  containers:
  - name: env-print-demo
    image: bash
    env:
    - name: GREETING
      value: "Warm greetings to"
    - name: HONORIFIC
      value: "The Most Honorable"
    - name: NAME
      value: "Kubernetes"
    - name: MESSAGE
      value: "$(GREETING) $(HONORIFIC) $(NAME)"
    command: ["echo"]
    args: ["$(MESSAGE)"]

As a starter example, I find this to be quite a lot: it starts right off with environment variables reusing other environment variables, without mentioning downsides of the approach. It looks like the Kubernetes docs really like dependent vars because they wrote one more document that goes even deeper on the topic. The document is supposed to help with using dependent environment variables and it contains warnings like:

Note that order matters in the env list. An environment variable is not considered “defined” if it is specified further down the list.

It’s great to have a document that explains what’s possible and warns about possible mistakes. In general though, dependent environment variables for which the order matters seem like a big risk: the example works with three variables, but as the list grows it becomes easy to declare a variable above the ones it depends on, or to break the ordering while reshuffling entries. I understand that this feature may save someone the need to template the file, but I’d personally take the opportunity to change the application so that the composite value is built inside the application. I also recognize that sometimes that’s not possible.
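To make the ordering rule concrete, here is a minimal Python sketch of how I understand the expansion to behave: entries are processed top to bottom, a $(VAR) reference only sees variables defined earlier in the list, and a reference that cannot be resolved is left in the value as-is. The function name expand_in_order is made up for this post, and the model is simplified on purpose (it ignores the $$(VAR) escape syntax and anything that does not come from the env list itself).

```python
import re

# Matches references of the form $(VAR_NAME).
_REF = re.compile(r"\$\(([^()]+)\)")


def expand_in_order(env):
    """Resolve $(VAR) references top to bottom (simplified model).

    `env` is a list of {"name": ..., "value": ...} dicts, shaped like the
    `env` list of a container. Only variables defined earlier in the list
    are visible; unresolved references are left untouched.
    """
    resolved = {}
    for entry in env:
        value = _REF.sub(
            # Keep the original "$(VAR)" text when VAR is not defined yet.
            lambda m: resolved.get(m.group(1), m.group(0)),
            entry["value"],
        )
        resolved[entry["name"]] = value
    return resolved


good = [
    {"name": "GREETING", "value": "Warm greetings to"},
    {"name": "NAME", "value": "Kubernetes"},
    {"name": "MESSAGE", "value": "$(GREETING) $(NAME)"},
]
# Same entries, but MESSAGE is declared before the variables it uses.
bad = [good[2], good[0], good[1]]

print(expand_in_order(good)["MESSAGE"])  # Warm greetings to Kubernetes
print(expand_in_order(bad)["MESSAGE"])   # $(GREETING) $(NAME)
```

The second result is the dangerous one: nothing fails, the application simply receives the literal string.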

Things get slightly more complicated when we deal with a combination of envFrom and env. Kubernetes has a special rule for that case: first all the envFrom entries are defined and then all the env entries, which overwrite any values from envFrom that share the same name, with the proof being this comment from 8 years ago. I think the idea behind this logic is to have a default global configuration (for example in a ConfigMap) and then override some of the variables defined in it.

Applications that make extensive use of both fields together with dependent environment variables run a serious risk: at the time of authoring a resource, nobody knows for sure what the actual set of environment variables will be. The resource looks correct to Kubernetes itself even when variables override other variables, so unwanted overrides only show up at runtime. In general, no matter where you stand on templating of configuration or similar topics, it is a good idea to prevent misconfigurations that can only be found at runtime, given that they can impact production systems and be hard to debug.
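The precedence rule can be sketched in a few lines of Python. This is only an illustration: the helper effective_env and the sample data are hypothetical, and plain dicts stand in for the data of whatever ConfigMap or Secret the envFrom entries point to.

```python
def effective_env(env_from_sources, env):
    """Merge envFrom sources and env entries: envFrom first, env last.

    `env_from_sources` is a list of plain dicts standing in for the data of
    each source referenced under `envFrom`, in the order they are listed.
    `env` is a list of {"name": ..., "value": ...} dicts.
    Returns the merged mapping and the names that `env` overrode.
    """
    merged = {}
    for source in env_from_sources:
        merged.update(source)  # a later envFrom source wins over an earlier one
    inherited = set(merged)

    overridden = []
    for entry in env:
        name = entry["name"]
        if name in inherited and name not in overridden:
            overridden.append(name)
        merged[name] = entry["value"]  # env wins over envFrom
    return merged, overridden


# Hypothetical ConfigMap data and env list.
defaults = {"LOG_LEVEL": "info", "REGION": "eu-west-1"}
env = [{"name": "LOG_LEVEL", "value": "debug"}]

merged, overridden = effective_env([defaults], env)
print(merged)      # {'LOG_LEVEL': 'debug', 'REGION': 'eu-west-1'}
print(overridden)  # ['LOG_LEVEL']
```

Note that the manifest alone does not reveal the override: you only see it once you know the content of the referenced source.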

If you really need to use lots of environment variables, my recommendations are the following:

  1. Avoid dependent environment variables.
  2. Avoid a mix of envFrom and env.
  3. Lint aggressively. If you mix env and envFrom, write a smart linter that validates the end result.
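As a starting point for the third recommendation, here is a rough sketch of what such a linter could check. It works on an already parsed container spec (a plain dict), and the function name lint_container_env, the config_maps argument and the sample manifest are all assumptions made for this post. It only looks at names coming from the same container spec and from configMapRef sources, and it assumes that names provided by envFrom are visible to $(VAR) references in env.

```python
import re

_REF = re.compile(r"\$\(([^()]+)\)")


def lint_container_env(container, config_maps):
    """Return a list of warnings for one container spec (a parsed dict).

    `config_maps` maps ConfigMap names to their data; it is a stand-in for
    however the real sources would be loaded.
    """
    warnings = []

    # Names provided by envFrom, including the optional key prefix.
    inherited = set()
    for source in container.get("envFrom", []):
        ref = source.get("configMapRef")
        if ref:
            prefix = source.get("prefix", "")
            data = config_maps.get(ref["name"], {})
            inherited.update(prefix + key for key in data)

    # Assumption: envFrom names can be referenced from env values.
    defined = set(inherited)
    for entry in container.get("env", []):
        name = entry["name"]
        if name in inherited:
            warnings.append(f"{name}: env overrides a value from envFrom")
        for ref_name in _REF.findall(entry.get("value") or ""):
            if ref_name not in defined:
                warnings.append(f"{name}: $({ref_name}) is not defined above it")
        defined.add(name)
    return warnings


# Hypothetical container spec and ConfigMap content.
container = {
    "name": "app",
    "envFrom": [{"configMapRef": {"name": "app-defaults"}}],
    "env": [
        {"name": "MESSAGE", "value": "$(GREETING) $(NAME)"},
        {"name": "GREETING", "value": "Hello"},
        {"name": "LOG_LEVEL", "value": "debug"},
    ],
}
config_maps = {"app-defaults": {"LOG_LEVEL": "info"}}

for warning in lint_container_env(container, config_maps):
    print(warning)
```

A real linter would also need to handle Secrets, valueFrom entries and the $$(VAR) escape, but even this much catches the two problems described above before they reach a cluster.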

And if you can avoid environment variables, even better!

Hope this was interesting… see you in the next one.