Kubernetes and environment variables

This week I worked more than usual with Kubernetes resources that (ab)use environment variables, and I ran into enough confusing or surprising behavior that I thought the topic was worth writing about.

Starting from the official Kubernetes doc, we’re greeted right away with a relatively complex example of what is possible: an environment variable whose value combines other environment variables defined in the same resource. Let’s pull the example out of the doc for convenience:

apiVersion: v1
kind: Pod
metadata:
  name: print-greeting
spec:
  containers:
  - name: env-print-demo
    image: bash
    env:
    - name: GREETING
      value: "Warm greetings to"
    - name: HONORIFIC
      value: "The Most Honorable"
    - name: NAME
      value: "Kubernetes"
    - name: MESSAGE
      value: "$(GREETING) $(HONORIFIC) $(NAME)"
    command: ["echo"]
    args: ["$(MESSAGE)"]

As a starter example, I find this to be quite a lot: it jumps right into environment variables that reuse other environment variables, without mentioning the downsides of the approach. It looks like the Kubernetes docs really like dependent variables, because there is a second document that goes even deeper on the topic. That document is supposed to help with using dependent environment variables, and it contains warnings like:

Note that order matters in the env list. An environment variable is not considered “defined” if it is specified further down the list.

It’s great to have a document that explains what’s possible and warns about possible mistakes. In general though, dependent environment variables for which the order matters are a misconfiguration risk: the ordering is easy to get right in an example with only three variables, but as the list grows, so does the chance of getting it wrong. I understand that the feature may save you from templating the file, but I’d personally take the opportunity to change the application so that the composite value is built inside the application itself, while recognizing that sometimes that’s not possible.
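To make the ordering pitfall concrete, here is a minimal hypothetical sketch (the names are made up). Because MESSAGE appears in the list before GREETING, the reference is not expanded and the container sees the literal string instead:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ordering-pitfall
spec:
  containers:
  - name: demo
    image: bash
    env:
    # MESSAGE references GREETING, which is defined *below* it in the
    # list: the reference is not expanded, and MESSAGE ends up holding
    # the literal string "$(GREETING) world".
    - name: MESSAGE
      value: "$(GREETING) world"
    - name: GREETING
      value: "Hello"
    command: ["echo"]
    args: ["$(MESSAGE)"]
```

Swapping the two entries is all it takes to fix it, which is exactly why this is so easy to break in a long list.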

Things get slightly more complicated when we combine envFrom and env. Kubernetes has a special rule for this combination: first all the envFrom entries are defined, and then all the env entries, which overwrite the values from envFrom, as confirmed by this comment from 8 years ago. I think the idea behind this logic is to have a default global configuration (for example in a ConfigMap) and then override some of the variables defined in it. Applications that make extensive use of both keywords together with dependent environment variables carry a serious risk: it is hard to know, at the time of authoring a resource, what the actual set of environment variables will end up being. The resource will look correct to Kubernetes itself even with variables overriding other variables, and unwanted overrides will only show up at runtime. No matter where you stand regarding templating of configuration or similar topics, it is a good idea to prevent misconfigurations that can only be found at runtime, given that they can impact production systems and be hard to debug.
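As an illustration of the precedence rule (all names here are hypothetical), the following Pod pulls its defaults from a ConfigMap via envFrom and then overrides one of them with env:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-defaults
data:
  LOG_LEVEL: "info"
  TIMEOUT: "30"
---
apiVersion: v1
kind: Pod
metadata:
  name: envfrom-demo
spec:
  containers:
  - name: app
    image: bash
    envFrom:
    - configMapRef:
        name: app-defaults
    env:
    # env entries are applied after envFrom, so this silently
    # overrides the LOG_LEVEL coming from the ConfigMap.
    - name: LOG_LEVEL
      value: "debug"
```

The container ends up with LOG_LEVEL=debug and TIMEOUT=30, and nothing in the resource itself flags the override.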

If you really need to use lots of environment variables, my recommendation is the following:

  1. Avoid dependent environment variables.
  2. Avoid a mix of envFrom and env.
  3. Lint aggressively. If you mix env and envFrom, write a smart linter that validates the end result.
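As a starting point for such a linter, here is a rough Python sketch (entirely hypothetical, not an existing tool): it takes a container’s env list plus the key set contributed by envFrom (which the caller must resolve, since that requires access to the referenced ConfigMaps/Secrets) and flags both silent overrides and forward references:

```python
import re

# Matches $(VAR) references inside an env value.
REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")

def lint_env(env, env_from_keys=()):
    """Return a list of warnings for a container's `env` list.

    `env` is the list of {"name": ..., "value": ...} entries;
    `env_from_keys` is the set of keys contributed by envFrom.
    """
    warnings = []
    defined = set(env_from_keys)  # envFrom entries are applied first
    for entry in env:
        name = entry["name"]
        if name in env_from_keys:
            warnings.append(f"'{name}' overrides a key coming from envFrom")
        for ref in REF.findall(entry.get("value", "")):
            if ref not in defined:
                # Referenced before (or without) a definition: the
                # literal "$(ref)" would end up in the value.
                warnings.append(f"'{name}' references '{ref}' before it is defined")
        defined.add(name)
    return warnings
```

Running it against the broken ordering from earlier reports the forward reference; a correctly ordered list comes back clean.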

And if you can avoid environment variables, even better!

Hope this was interesting… see you in the next one.

How DaemonSet’s Status field works

This week I embarked on an unexpected journey to figure out how the Status on DaemonSets works. I was trying to debug a problem in a piece of software that I maintain at work, which involves computing the Status of DaemonSet objects in Kubernetes. I naively assumed that DaemonSets were just a “regular resource” in Kubernetes, but I should have remembered that there are only guidelines for resources to implement Statuses, and that controllers are pretty much free to do whatever they want. So, with a bug in front of me, I took my dear friend kubectl and started to build a mental model of how things work.

The DaemonSet Status Section

If we take a look at the DaemonSetStatus spec, we see there are a lot of fields and some are more interesting than others:

  • CurrentNumberScheduled
  • DesiredNumberScheduled
  • NumberAvailable
  • NumberUnavailable
  • NumberMisscheduled
  • NumberReady
  • ObservedGeneration
  • UpdatedNumberScheduled

And more. You can refer to the linked spec documentation for what they mean. I’m a bit stubborn and not a quick learner, so even with the descriptions at hand it took me a while to understand how those values actually change.

Note that the DaemonSetStatus specification also mentions Conditions that should define the status. The field is in fact there, but it is not populated… So how does it work? Well, it’s complicated.

How the DaemonSet controller updates the status

Thankfully we have the code for the DaemonSet controller [1] to read, and it’s quite well written and easy to follow, albeit long. The code really helped me understand what happens to the various fields over time and how they change in the different situations that may arise. To explain how things work, let’s consider two cases: first a newly created DaemonSet, then an update of an existing one. The two cases aren’t very different, but the details of the behavior matter for how the Status is computed and what it means from the point of view of the system.

When a new DaemonSet is created, the controller will immediately create the Pod objects based on the Nodes. Once those are created, the ObservedGeneration field will go to 1, while NumberReady and NumberAvailable will start at 0 and slowly grow. Reasonable, as Generation 0 is never a thing, but it also means that there’s never a “generation 0” to compare with at the beginning…

In the case in which the DaemonSet already existed and was updated, we will see the following:

  • The spec is updated, but the status isn’t. At this point in time, the Generation in the object’s metadata is updated (let’s say to 165) but the ObservedGeneration is still the old one, 164.
  • The Pod objects are created and the ObservedGeneration is bumped to the new value 165. The pods aren’t necessarily running or ready.
  • As the pods start and become ready, the UpdatedNumberScheduled will be increased accordingly.
  • The NumberReady, NumberAvailable and NumberUnavailable will change during the rollout.
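To make that concrete, here is an illustrative, made-up snapshot of a status mid-rollout, with 5 desired pods of which 3 already belong to the new generation (field values are invented for the example):

```yaml
metadata:
  generation: 165            # bumped when the spec was updated
status:
  observedGeneration: 165    # the controller has seen the new spec
  desiredNumberScheduled: 5
  currentNumberScheduled: 5
  updatedNumberScheduled: 3  # pods from the *new* generation
  numberReady: 4             # counts pods across generations
  numberAvailable: 4
  numberUnavailable: 1
```

Note how numberReady can be higher than updatedNumberScheduled: old-generation pods that are still running count towards it.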

Note an important thing: since the process is incremental, NumberReady and NumberAvailable will fluctuate, being the result of a sum across the generations of the pods. Those values will also depend on whether the PodSpec part of the DaemonSet contains a ReadinessProbe or not. I found this slightly confusing when dealing with such a complicated state, as it mixes global cross-generation counters (i.e. NumberReady) with generation-specific ones (all the ones starting with Updated).

The simplified process can be represented with the following picture:

[Figure: timeline for a DaemonSet rollout]

Conclusions

The way the status is computed for DaemonSets is significantly different from other “core” Kubernetes resources. Another good reference for how the Status is actually computed is the kstatus library, which has a specific piece of code exactly to deal with the quirks of DaemonSets.
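As a rough sketch of the kind of check kstatus performs (simplified and paraphrased in Python rather than the library’s actual Go code, so treat the details as an approximation), a DaemonSet rollout can be considered done when the controller has observed the latest generation and all desired pods are both updated and available:

```python
def daemonset_rolled_out(generation, status):
    """Simplified, kstatus-style DaemonSet readiness check.

    `generation` is metadata.generation; `status` holds the
    DaemonSetStatus fields. Returns (done, reason).
    """
    if status["observedGeneration"] != generation:
        return False, "controller has not observed the latest spec yet"
    desired = status["desiredNumberScheduled"]
    if status["updatedNumberScheduled"] < desired:
        return False, "not all pods are from the updated generation"
    if status["numberAvailable"] < desired:
        return False, "not all pods are available"
    return True, "rollout complete"
```

The generation check comes first for a reason: right after an update, the stale counters can still describe the previous generation, so without it the check could report success for a rollout that has not even started.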

I hope that was an interesting read and maybe I will be writing more about all the weird things about Kubernetes Statuses that I discovered over the years. Until next time, have a wonderful day.

  [1] Note that it is called “Daemon Controller”.

ExternalDNS maintainer meeting 2023-08-10

This post is just a quick summary of what we have discussed as part of a maintainers’ meeting that we held on 2023-08-10.

Next priorities

We’re going to focus on a few things:

  1. Webhook provider, which has this PR as its single prerequisite. We plan to test it with new external providers (i.e. IONOS) and hope to get some feedback on its implementation. This will allow us to plan the move of some of the alpha providers out of tree and enable new providers to be created.
  2. TXT registry. There are some fixes to be made to the v2 version of the record format. We will then start work on a v3 version.
  3. We will prioritize merging fixes to the DynamoDB registry in this PR.
  4. IPv6 fixes are coming.
  5. Various improvements on CI with prow presubmit.

We have also discussed that we’ll cut the next release as soon as #3724 and #3726 are merged. We’ll be investigating improvements to the release process by looking into some of the automation that the kops project has.

Towards the end of the meeting, we also discussed improvements to the DNSEndpoint CRD.

Practical PR authoring tips

I have been working at GitHub for a few years and, as you surely know, GitHub is a big fan of pull requests. The teams I have worked with have been distributed anywhere between UTC+2 and UTC-7, which meant very few opportunities for synchronous work. At the same time, during my tenure at the company, I did quite a bit of open source work on ExternalDNS, which meant working with different sets of people, different cultures, different skills and, at the end of the day, different ways of collaborating. Those asynchronous ways of working gave me a new perspective on the importance of well-structured pull requests. I want to talk about some of this today.

Pull requests as a unit of work

The main reason we write pull requests is to modify or add something to an existing project. Pull requests are a way to contain a unit of work. At GitHub, that work is done when the pull request has been discussed, reviewed, approved and deployed. GitHub still uses the “GitHub flow”, a model in which pull requests are deployed before merging them, so that a piece of work is done only once it has been proven in front of customers. It’s a powerful way of working, with pros and cons, especially at scale, which I’m not going to describe now. Funnily enough, while this approach should help with the definition of done and thus with work estimation, it really doesn’t, but that’s a more complicated topic for another time. Anyway, enough context: for the rest of this post, I want to focus on who we write pull requests for.

Who do we write pull requests for

Pull requests are not a way to limit contributions to a codebase. I’d argue that the main value they bring is not even “code quality”. The main value is communication, sharing context and establishing relationships with the people we collaborate with, including our future selves.

Given that, my approach throughout the years has always been: write for the next person who doesn’t know anything about the work, for whoever is interested in the reasons behind it, and for whoever is interested in all of the little implementation details. Even more, write for yourself a year from now, or for your colleagues once you are working somewhere else, so that anyone can go back and understand why something was made a certain way. I’ve used this mindset countless times and it has been a superpower during debugging, refactoring or simply when explaining to colleagues why things are the way they are.

How to actually do that

My practical PR authoring tips are:

  • Write a great description: say what the PR does, why the work is done in the first place, what alternatives have been considered, link related work. Note that this doesn’t always apply: sometimes you just need a quick revert and need to go fast, some other times you want to open a PR early to show an idea and don’t have a lot of things to write yet. You will likely have time to go back and edit the description for most of those cases.
  • Remember that you are not coding for yourself: you are most likely contributing code to a project that is not your private project, or to a codebase owned by your employer. In those cases the code is not yours, so you shouldn’t put yourself in the position of wanting to win an argument with a specific approach, but rather do what is right for the codebase you are contributing to.
  • Do a self review: after opening the PR, I always go through it and review it myself. I try to anticipate the questions I think I will receive and add an inline comment to anything that could be controversial or a major decision point of the implementation. This helps kick off the discussions we may want to have, especially around decisions that may influence the project’s future.

I don’t do all of those things right all the time. Sometimes I go too quickly, sometimes I’m not descriptive enough. Keeping those things in mind has helped me through the years to be a better contributor, both at work and in open source.

There's no such thing as a stable system

This post is the short story of an incident that I experienced while operating services at a previous job.

At the time, we were running a microservices architecture and we had a small service that was responsible, without going too much into the details, for employee authentication. The service was a real microservice: it did exactly one thing and did it well. By the time my team inherited it, it was stable and no longer being developed. It “just worked”.

It’s also important to mention that, since the service handled employee authentication and the number of employees was relatively small and growing slowly but steadily, it was never subject to unpredictable load or massive growth. In that sense, it was easy to operate. And it even had a dashboard!

And still… one day it started failing

Surprise surprise, the stable service one day started throwing 500s. We started getting reports from employees that they couldn’t do their work, and we could clearly see that some instances of the service were throwing 500s, but we struggled to understand what was wrong. The process was up, the error was cryptic, and we hadn’t changed nor deployed the software in months.

After quite some digging on the few machines that were running the service, we learned something: the service, among the things that it was doing, was creating temporary files in temporary nested directories in a specific location of the filesystem. While it did implement logic to delete the temporary file after use, it wasn’t deleting the associated nested directories. That meant that, on the machines we were looking at, we had many “leaked” directories.

And over the months in which we didn’t change nor deploy the service, those leaked directories accumulated until they used up all the available inodes, making it impossible to create new temporary files.
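The bug pattern is easy to reproduce in any language. Here is a hypothetical Python sketch of both the leaky version and the fix (this is not the actual code of the service, which I don’t have; the function names are made up):

```python
import os
import tempfile

def process_leaky(data: bytes, base: str) -> None:
    # Creates a nested scratch directory and deletes the file after use...
    workdir = tempfile.mkdtemp(dir=base)
    path = os.path.join(workdir, "payload")
    with open(path, "wb") as f:
        f.write(data)
    os.remove(path)
    # ...but never removes `workdir`: every call leaks a directory
    # (and its inode), until the filesystem runs out of them.

def process_fixed(data: bytes, base: str) -> None:
    # TemporaryDirectory removes the whole tree when the block exits.
    with tempfile.TemporaryDirectory(dir=base) as workdir:
        with open(os.path.join(workdir, "payload"), "wb") as f:
            f.write(data)
```

Each call to the leaky version leaves one empty directory behind; the fixed version leaves nothing. At a few thousand requests a day, months of uptime are more than enough to drain a filesystem’s inodes.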

But the service is stable!

Now that I spoiled a bunch of stuff, what I remember vividly was the reaction of some of my colleagues: management was very surprised to see the service fail because “it has been stable for so long” and “it was written by one of our most experienced engineers”.

The point is that none of this matters: there is no such thing as a forever-stable service. We keep thinking of services as “just software” or “pieces of code”, but the reality is that they exist as static code in an extremely dynamic world: they are subject to user input, the machines they run on change over time, the time of day changes, and a million other things can and will change.

Thinking that things are done or stable is simply meaningless, and it is similarly useless to believe we can freeze a system in its state or to fear change, because change is a constant in the system that we will never control.

And complex systems will fail sooner or later.