How DaemonSet’s Status field works

This week I embarked on an unexpected journey to figure out how the Status of DaemonSets works. I was trying to debug a problem in a piece of software that I maintain at work, which involves computing the Status for DaemonSet objects in Kubernetes. I naively assumed that DaemonSets were just a “regular resource” in Kubernetes, but I should have remembered that there are only guidelines for how resources implement Status, and that controllers are pretty much free to do whatever they want. So, with a bug in front of me, I took my dear friend kubectl and started to build a mental model of how things work.

The DaemonSet Status Section

If we take a look at the DaemonSetStatus spec, we see there are a lot of fields and some are more interesting than others:

  • CurrentNumberScheduled
  • DesiredNumberScheduled
  • NumberAvailable
  • NumberUnavailable
  • NumberMisscheduled
  • NumberReady
  • ObservedGeneration
  • UpdatedNumberScheduled

And more. You can refer to the linked spec documentation for what they mean. I’m a bit stubborn and not really a quick learner, so even with a description at hand it took me a while to understand how those values actually change.
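
To make the fields above a bit more tangible, here is a minimal sketch that reads them with client-go. It assumes a kubeconfig in the default location and uses the kube-proxy DaemonSet in kube-system purely as an example:

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client from the local kubeconfig (assumption: ~/.kube/config exists).
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// kube-proxy in kube-system is just an example; any DaemonSet will do.
	ds, err := clientset.AppsV1().DaemonSets("kube-system").
		Get(context.TODO(), "kube-proxy", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	s := ds.Status
	fmt.Println("ObservedGeneration:    ", s.ObservedGeneration)
	fmt.Println("DesiredNumberScheduled:", s.DesiredNumberScheduled)
	fmt.Println("CurrentNumberScheduled:", s.CurrentNumberScheduled)
	fmt.Println("UpdatedNumberScheduled:", s.UpdatedNumberScheduled)
	fmt.Println("NumberReady:           ", s.NumberReady)
	fmt.Println("NumberAvailable:       ", s.NumberAvailable)
	fmt.Println("NumberUnavailable:     ", s.NumberUnavailable)
	fmt.Println("NumberMisscheduled:    ", s.NumberMisscheduled)
}
```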

Note that the DaemonSetStatus specification also talks about Conditions that should define the status. The field is in fact there, but it is actually not populated… So how does it work? Well, it’s complicated.

How the DaemonSet controller updates the status

Thankfully we have the code for the DaemonSet controller1 to read, and, albeit long, it’s quite well written and easy to follow. The code really helped me understand what happens to the various fields over time and how they change in the different situations that may arise. To explain how things work, let’s consider two cases: the first in which we’re analyzing a newly created DaemonSet, and the second in which an existing DaemonSet is updated. The two cases aren’t very different, but the details of the behavior matter for how the Status is computed and for what it means from the point of view of the system.

When a new DaemonSet is created, the controller will immediately create the Pod objects based on the Nodes. Once those are created, the ObservedGeneration field will go to 1, while NumberReady and NumberAvailable will start at 0 and slowly grow. Reasonable, as Generation: 0 is never a thing, but it also means that there’s never a “generation 0” to compare with at the beginning…

When the DaemonSet already exists and is updated, we will see the following:

  • The spec is updated, but the status isn’t. At this point in time, the object’s Generation is updated (let’s say to 165) but the ObservedGeneration in the status is still the old one, 164.
  • The Pod objects are created and the ObservedGeneration is bumped to the new value 165. The pods aren’t necessarily running or ready.
  • As the pods start and become ready, the UpdatedNumberScheduled will be increased accordingly.
  • The NumberReady, NumberAvailable and NumberUnavailable will change during the rollout.

Note an important thing: as the process is incremental, NumberReady and NumberAvailable will fluctuate and will be the result of a sum across the generations of the pods. Those values will also depend on whether the PodSpec of the DaemonSet contains a ReadinessProbe or not. I found this slightly confusing when dealing with such a complicated state, as it mixes global cross-generation counters (e.g. NumberReady) with generation-specific ones (all the ones starting with Updated).
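
To make the interplay of those counters concrete, here is a sketch of how a “rollout is done” check can be derived from them, similar in spirit to what kubectl rollout status does for DaemonSets. The function name is mine, and it assumes the default RollingUpdate strategy:

```go
package daemonsetstatus

import appsv1 "k8s.io/api/apps/v1"

// rolloutComplete is an illustrative helper: it reports whether the status
// refers to the latest generation of the spec and whether every node that
// should run a daemon pod is running an updated, available one.
func rolloutComplete(ds *appsv1.DaemonSet) bool {
	// The status is only meaningful once the controller has observed the
	// latest generation of the spec.
	if ds.Status.ObservedGeneration < ds.Generation {
		return false
	}
	// All nodes that should run a daemon pod run one from the updated template...
	if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
		return false
	}
	// ...and the pods are available. Note that NumberAvailable is one of the
	// global, cross-generation counters mentioned above.
	return ds.Status.NumberAvailable >= ds.Status.DesiredNumberScheduled
}
```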

The simplified process can be represented with the following picture:

[Image: timeline for DaemonSet rollout]

Conclusions

The way the status is computed for DaemonSets is significantly different from other “core” Kubernetes resources. Another good reference for how the Status is actually computed is the kstatus library, which has a specific piece of code exactly to deal with the quirks of DaemonSets.
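
If you don’t want to reimplement that logic yourself, kstatus will compute an aggregated status for you. A minimal sketch, assuming you have already fetched the DaemonSet as an unstructured object:

```go
package kstatusdemo

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// printStatus computes the aggregated kstatus status (InProgress, Current,
// Failed, ...) for an object fetched from the API server, DaemonSets included.
func printStatus(obj *unstructured.Unstructured) error {
	res, err := status.Compute(obj)
	if err != nil {
		return err
	}
	fmt.Printf("%s: %s (%s)\n", obj.GetName(), res.Status, res.Message)
	return nil
}
```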

I hope this was an interesting read; maybe I will write more about all the weird things about Kubernetes Statuses that I have discovered over the years. Until next time, have a wonderful day.

  1. Note that it is called “Daemon Controller”. 

ExternalDNS maintainer meeting 2023-08-10

This post is just a quick summary of what we have discussed as part of a maintainers’ meeting that we held on 2023-08-10.

Next priorities

We’re going to focus on a few things:

  1. Webhook provider, which has this PR as its single prerequisite. We plan to test it with new external providers (i.e. IONOS) and hope to get some feedback on its implementation. This will allow us to plan moving some of the alpha providers out of tree and to enable new providers to be created.
  2. TXT registry. There are some fixes to be made to the v2 version of the records. We will then start work on a v3 version.
  3. We will prioritize merging fixes to the DynamoDB registry in this PR.
  4. IPv6 fixes are coming.
  5. Various improvements to CI with Prow presubmits.

We have also discussed that we’ll cut the next release as soon as #3724 and #3726 are merged. We’ll be investigating improvements to the release process by looking into some of the automation that the kops project has.

Towards the end of the meeting, we also discussed improvements to the DNSEndpoint CRD.

Practical PR authoring tips

I have been working for GitHub for a few years and, as you surely know, GitHub is a big fan of pull requests. The teams I have been working with have been mostly distributed anywhere between UTC+2 and UTC-7, which meant very few opportunities for synchronous work. At the same time, during my tenure at the company, I did quite a bit of open source work on ExternalDNS, which meant working with different sets of people, different cultures, different skills and, at the end of the day, different ways of collaborating. Those asynchronous ways of working gave me a new perspective on the importance of well-structured pull requests. I want to talk about some of this today.

Pull requests as a unit of work

The main reason we write pull requests is to modify or add something to an existing project. Pull requests are a way to contain a unit of work. At GitHub, that work is done when the pull request has been discussed, reviewed, approved and deployed. GitHub still uses the “GitHub flow”, a model of work in which pull requests are deployed before merging them, which means a piece of work is done when it has been proven in front of customers. It’s a powerful way of working, with pros and cons, especially at scale, which I’m not going to describe now. Funnily enough, while this approach should help with the definition of done and thus with work estimation, it really doesn’t, but that’s also a more complicated topic for another time. Anyway, enough with the context: for the rest of this post, I want to focus on who we write pull requests for.

Who do we write pull requests for

Pull requests are not a way to limit contributions to a codebase. I’d argue that the main value they bring is not even “code quality”. The main value is communication, sharing context and establishing relationships with the people we collaborate with, including our future selves.

Given that, my approach throughout the years has always been: write for the next person who doesn’t know anything about the work, for whoever is interested in the reasons behind the work, and for whoever is interested in all of the little implementation details. But even more, write for yourself a year from now, or for your colleagues once you are working somewhere else, so that it is possible to go back and understand why something was made in a certain way. I’ve used this mindset countless times and it has been a superpower during debugging, refactoring or simply when explaining to colleagues why things are the way they are.

How to actually do that

My practical PR authoring tips are:

  • Write a great description: say what the PR does, why the work is being done in the first place, what alternatives have been considered, and link related work. Note that this doesn’t always apply: sometimes you just need a quick revert and need to go fast, and other times you want to open a PR early to show an idea and don’t have much to write yet. In most of those cases you will likely have time to go back and edit the description later.
  • Remember that you are not coding for yourself: you are most likely contributing code to a project that is not your private project, or to a codebase owned by your employer. In those cases the code is not yours, so you shouldn’t put yourself in the position of wanting to win an argument with a specific approach, but rather aim to do the right thing for the codebase you are contributing to.
  • Do a self review: after opening the PR, I always go through it and review it myself. I try to anticipate the questions I think I will receive and add an inline comment to anything that could be controversial or a major decision point of the implementation I am contributing. This approach helps surface the discussions we may want to have, especially around major decisions that may influence the project’s future.

I don’t do all of those things right all the time. Sometimes I go too quickly, sometimes I’m not descriptive enough. Keeping those things in mind has helped me through the years to be a better contributor, both at work and in open source.

There's no such thing as a stable system

This post is the short story of an incident that I experienced while operating services at a previous job.

At the time, we were running a microservices architecture and we had a small service that was responsible, without going too much into the details, for employee authentication. The service was a real microservice: it was doing exactly one thing and doing it well. By the time my team inherited that service, it was stable and not being developed anymore. It “just worked”.

It’s also important to mention that, since the service was responsible for employee authentication and the number of employees was relatively small and growing slowly but steadily, the service was never subject to unpredictable load or massive growth. In that sense, it was easy to operate. And it even had a dashboard!

And still… one day it started failing

Surprise surprise, the stable service one day started throwing 500s. We started getting reports from employees that they couldn’t do their work and we could clearly see that some instances of the service were throwing 500s, but we struggled to understand what was wrong. The process was up, the error was cryptic, and we hadn’t changed or deployed the software in months.

After quite some digging on the few machines that were running the service, we learned something: the service, among the other things it was doing, was creating temporary files inside nested temporary directories in a specific location of the filesystem. While it did implement logic to delete the temporary files after use, it wasn’t deleting the associated nested directories. That meant that, on the machines we were looking at, we had many “leaked” directories.

And over the months in which we neither changed nor deployed the service, those leaked directories kept accumulating, eventually using up all the available inodes and making it impossible to create new temporary files.
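
I obviously can’t share the original code, but the pattern was roughly the following. This is a hypothetical reconstruction in Go (the original service wasn’t written like this), just to show how easy it is to leak the directory while dutifully cleaning up the file:

```go
package tmpleak

import (
	"os"
	"path/filepath"
)

// processRequest sketches the leaky pattern: a fresh nested directory and a
// temporary file are created per request, but only the file is cleaned up.
func processRequest(baseDir string, payload []byte) error {
	// Create a new nested temporary directory for this request.
	dir, err := os.MkdirTemp(baseDir, "req-*")
	if err != nil {
		return err
	}

	tmpFile := filepath.Join(dir, "payload.tmp")
	if err := os.WriteFile(tmpFile, payload, 0o600); err != nil {
		return err
	}

	// ... do the actual work with tmpFile ...

	// Bug: only the file is removed; "dir" is leaked. Every leaked directory
	// consumes an inode, and after months of traffic the filesystem runs out.
	return os.Remove(tmpFile)
	// Fix: return os.RemoveAll(dir)
}
```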

But the service is stable!

Now that I spoiled a bunch of stuff, what I remember vividly was the reaction of some of my colleagues: management was very surprised to see the service fail because “it has been stable for so long” and “it was written by one of our most experienced engineers”.

The point is that none of this matters: there is no such thing as a forever stable service. We keep thinking of services as “just software” or “pieces of code”, but the reality is that they exist as static code in an extremely dynamic world: they are subject to user input, the machines they run on change over time, the time of day changes, and a million other things can and will change.

Thinking that things are done or stable is simply meaningless, and it is similarly useless to think that we can freeze a system in its state, or to be worried about change, because change is a constant that we will never control.

And complex systems will fail sooner or later.

The desired state

When talking about Kubernetes, a key concept is that of desired state vs. actual state, and the functionality of reconciling the actual state to match the desired one. While that is easy to understand, the idea of the desired state needs to be discussed a bit further. What is the desired state from a user’s point of view?

A growing system

In the early days of Kubernetes, there were only a few resources: ReplicationControllers, Services, ConfigMaps and a few other things. As the system matured, more and more resources were introduced and, with Third Party Resources first and then Custom Resource Definitions, there was an explosion of new resources. At the same time, more controllers/operators were created, which also started to manage resources themselves. The webhook system of Kubernetes was extended to support mutating webhooks, which inject things into some of the resources, and more components were created to deal with user-submitted resources, for example the Horizontal Pod Autoscaler or the Vertical Pod Autoscaler.

Due to all of those new controllers and resources, in the modern Kubernetes world, when the user applies a resource, there is so much going on that it is likely that the resource will be modified. The classics are:

  • Replica count being changed
  • Memory/CPU requests and limits being adjusted
  • Sidecars being injected (sketched below)

And more depending on the use case.
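
As an illustration of the last point in the list, this is roughly the shape of what a mutating admission webhook returns to inject a sidecar. The admission/v1 types and the JSON Patch format are real; the sidecar name and image are made up for the example:

```go
package main

import (
	"encoding/json"
	"fmt"

	admissionv1 "k8s.io/api/admission/v1"
)

func main() {
	// An RFC 6902 JSON Patch that appends a (hypothetical) logging sidecar to
	// the pod spec of whatever object is being admitted.
	patch := []map[string]interface{}{
		{
			"op":   "add",
			"path": "/spec/containers/-",
			"value": map[string]interface{}{
				"name":  "log-shipper",                      // made-up sidecar
				"image": "registry.example.com/log-shipper", // made-up image
			},
		},
	}
	patchBytes, err := json.Marshal(patch)
	if err != nil {
		panic(err)
	}

	// This is what the webhook sends back to the API server: the applied
	// resource and the stored resource will differ by exactly this patch.
	pt := admissionv1.PatchTypeJSONPatch
	resp := admissionv1.AdmissionResponse{
		Allowed:   true,
		Patch:     patchBytes,
		PatchType: &pt,
	}
	out, _ := json.MarshalIndent(resp, "", "  ")
	fmt.Println(string(out))
}
```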

Your desired state is not the actual state… ever.

The reason for the existence of all the aforementioned modifications of user-provided resources is that Kubernetes is a really complex system, and we want to “simplify” things for the users and create automation, to avoid having the users in the loop for every little thing.

For that reason, the system does a lot of things behind the scenes… which makes the resources that the user interacts with in the live system different from the resources that were applied.

It’s all about abstractions

To better understand what is explained above, let me give an example. Let’s say we are dealing with a Deployment and a Vertical Pod Autoscaler resource. The Vertical Pod Autoscaler doesn’t have any value per se, but it exists to tell Kubernetes that it can adjust limits and requests for the pods associated with the Deployment. This is a simple, modular approach, but what we are really dealing with is a single logical resource: a Deployment that has automatic limit and request adjustment. We could call it an AutomaticallyLimitRequestAdjustedDeployment. But not even Kubernetes is Java enough to do that :-) And obviously that wouldn’t work anyway, for reasons of modularity.
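
To make the pairing concrete, here is a minimal sketch of the VPA side, built as an unstructured object so we don’t need the VPA Go types. The autoscaling.k8s.io/v1 API is the real VPA CRD; the my-app names are made up:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	// The VPA has no value on its own: it only points at the Deployment whose
	// pods Kubernetes is allowed to resize.
	vpa := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "autoscaling.k8s.io/v1",
		"kind":       "VerticalPodAutoscaler",
		"metadata":   map[string]interface{}{"name": "my-app", "namespace": "default"},
		"spec": map[string]interface{}{
			"targetRef": map[string]interface{}{
				"apiVersion": "apps/v1",
				"kind":       "Deployment",
				"name":       "my-app",
			},
			// Let the recommendations be applied automatically.
			"updatePolicy": map[string]interface{}{"updateMode": "Auto"},
		},
	}}
	fmt.Println(vpa.GetKind(), vpa.GetName())
}
```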

Conclusion

You don’t always get your state exactly as you specified it to Kubernetes. That is not necessarily bad, as this is mostly done to simplify developers’ lives. It’s important for platform teams that manage Kubernetes in the context of a company to clearly explain to their users the concepts behind the abstractions, to make it easier to work with Kubernetes and to limit the surprises that arise from the inevitable difference between the live resources and the desired state in Git.