One reason why YAML is bad for configuration

YAML is a data serialization language that is widely used for application configuration. It is relatively readable and flexible and, unlike JSON, it allows comments.

I don’t think that YAML is generally terrible for configuration, but the abuse of YAML when dealing with complex systems like Kubernetes makes all of its problems more evident: indentation mistakes, the fact that you can cut a YAML file in two and each half is likely still valid YAML, the Norway problem and so on.

But today I’d like to talk about a more specific example that may seem surprising, one that I found in a codebase I’m working on these days.

A simple Kubernetes deployment

I found myself facing a file that, for the sake of this blog post, is equivalent to the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
  annotations:
    foo: "bar"
    foo: "bar"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Is this valid YAML? If you are unsure, you can validate it with whatever tool you prefer. I will use Ruby’s irb:

% irb
2.6.0 :001 > require 'yaml'
 => true
2.6.0 :002 > a = YAML.load_file("deployment.yaml")
 => {"apiVersion"=>"apps/v1", "kind"=>"Deployment", "metadata"=>{"name"=>"nginx-deployment", "labels"=>{"app"=>"nginx"}, "annotations"=>{"foo"=>"bar"}}, "spec"=>{"replicas"=>3, "selector"=>{"matchLabels"=>{"app"=>"nginx"}}, "template"=>{"metadata"=>{"labels"=>{"app"=>"nginx"}}, "spec"=>{"containers"=>[{"name"=>"nginx", "image"=>"nginx", "ports"=>[{"containerPort"=>80}]}]}}}}

Valid, cool. Now look at the annotations.

2.6.0 :003 > a["metadata"]["annotations"]
 => {"foo"=>"bar"}
2.6.0 :004 >

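The original deployment.yaml file had a duplicate annotation, and the parser accepted it without complaint. (Strictly speaking, the YAML specification requires the keys of a mapping to be unique, but many parsers, Ruby’s included, do not enforce this.) You are basically saying “the key foo has value bar” twice. Not too bad, except that the duplicate silently disappears once parsed.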

Things can be a little bit more fun though:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
  annotations:
    foo: "bar"
    foo: "baz"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Now we have the same key with two different values. Let’s apply this to Kubernetes with kubectl and see what we have in the cluster:

% kubectl get deployments nginx-deployment -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    foo: baz

[CUT]

Fun, there’s no sign of the value “bar”. This means that if, for any reason, you end up with duplicate keys, nothing tells you that something is wrong: the last value silently overwrites the earlier ones.
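To see where the first value goes, here is a minimal Ruby sketch that compares the syntax tree with the loaded result. It assumes the Psych library that ships with Ruby; the inline snippet is only a stand-in for the annotations block of deployment.yaml, not the real file.

```ruby
require 'yaml'

# Inline stand-in for the annotations block of deployment.yaml.
source = <<~YAML
  annotations:
    foo: "bar"
    foo: "baz"
YAML

# Psych.parse returns the syntax tree without building Ruby objects,
# so both `foo` entries are still visible there.
document = Psych.parse(source)
annotations_node = document.root.children[1] # value node of `annotations`

# A mapping node stores its entries flat: [key, value, key, value, ...].
pairs_in_source = annotations_node.children.each_slice(2).map do |key, value|
  [key.value, value.value]
end

# YAML.load builds a Hash, where the second assignment replaces the first.
loaded = YAML.load(source)

puts "pairs in source: #{pairs_in_source.inspect}"
puts "loaded hash:     #{loaded['annotations'].inspect}"
```

The first line printed lists both pairs, the second one only foo with the value baz. The information is there at the syntax level and gets lost when the mapping is turned into a Hash.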

This case seems rare and, again, not too terrible, but there are other cases like it:

% cat deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
  annotations:
    foo: "bar"
    foo: "baz"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        env:
          - name: foo
            value: bar
        env:
          - name: foo
            value: baz

The YAML file above has a duplicate env. Let’s kubectl apply it and look at the env:

    spec:
      containers:
      - env:
        - name: foo
          value: baz

Fun, isn’t it? Now imagine this “problem” over thousands of templated lines…
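Spotting this by eye in thousands of lines is not realistic, but a small script can do it. What follows is a rough sketch, not a complete validator: it walks the Psych syntax tree and reports every key that appears twice in the same mapping. The function name duplicate_keys and the sample document are made up for illustration, and only plain scalar keys are compared.

```ruby
require 'yaml'

# Walk a Psych syntax tree and collect every mapping key that appears more
# than once within the same mapping. `path` records where we are in the tree.
def duplicate_keys(node, path = [], found = [])
  case node
  when Psych::Nodes::Mapping
    seen = {}
    # A mapping node stores its entries flat: [key, value, key, value, ...].
    node.children.each_slice(2) do |key, value|
      # Only scalar keys are compared; complex keys are not checked.
      if key.is_a?(Psych::Nodes::Scalar)
        name = key.value
        line = key.start_line + 1 # Psych line numbers are zero-based
        if seen.key?(name)
          found << { path: (path + [name]).join('.'), line: line, first_line: seen[name] }
        else
          seen[name] = line
        end
        duplicate_keys(value, path + [name], found)
      else
        duplicate_keys(value, path + ['?'], found)
      end
    end
  when Psych::Nodes::Sequence
    node.children.each_with_index do |child, index|
      duplicate_keys(child, path + [index.to_s], found)
    end
  when Psych::Nodes::Stream, Psych::Nodes::Document
    node.children.each { |child| duplicate_keys(child, path, found) }
  end
  found
end

# Hypothetical, trimmed-down version of the deployment above.
sample = <<~YAML
  metadata:
    annotations:
      foo: "bar"
      foo: "baz"
  spec:
    containers:
    - name: nginx
      env:
      - name: foo
        value: bar
      env:
      - name: foo
        value: baz
YAML

# parse_stream also copes with several documents separated by `---`.
duplicates = duplicate_keys(Psych.parse_stream(sample))
duplicates.each do |entry|
  puts "duplicate key #{entry[:path]} at line #{entry[:line]} " \
       "(first seen at line #{entry[:first_line]})"
end
```

On the sample this reports both the repeated foo annotation and the repeated env key of the first container. Something like it could run against rendered templates before they are applied, exiting with a non-zero status whenever the list is not empty.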

Please validate your YAML a lot

What YAML parsers let through is not always what you meant. Config changes are still a common cause of outages, production issues and unexpected behavior in general. I’m not going to say that YAML was a bad idea for Kubernetes resources, because that would require a much more complicated and detailed discussion. But if you want to use YAML files to configure your applications and infrastructure, there is a lot that you should be doing.

Validate your files. If you render them with a tool, validate the rendered files. If YAML is your source of truth, take care of it. Don’t generate and apply on the fly. And maybe try not to abuse YAML too much… I’m liking CUE these days, but that’s a topic for another time.