Deep network security on GKE just got a whole lot easier.

At Vamp our trial clusters present a harsh security challenge. We want trial users to bring their own containers but we don’t want them to have access to the Vamp infrastructure.

One of the most powerful and yet often ignored security features in Kubernetes is network policies. Whilst they lack the advanced features of modern firewalls, network policies are a powerful tool for building deeply secure environments because they let you secure traffic within a cluster.

Policies are often ignored as a security feature because:

  1. There is a misconception that cluster-edge security is sufficient;
  2. They require an understanding of how Kubernetes’ networking works; and
  3. Debugging them can be difficult.

In our environments, we don’t take any aspect of security for granted, at any point. So, in addition to using best practices such as role-based access control (RBAC) and mutual TLS, and tools like Hashicorp Vault to manage cross-cluster secrets and the Public Key Infrastructure (PKI), we also make extensive use of network policies to extend network access control to the traffic within and between nodes.

We use Google Cloud’s Kubernetes Engine (GKE) as our reference platform because if it doesn’t work on GKE, it’s probably not going to work on Amazon’s EKS or Azure’s AKS.

Project Calico

GKE implements network policies using Tigera’s Project Calico. In our experience Calico offers the best behaved network policy implementation. This is important because network policies can be hard to implement and debug which is partly why they are often overlooked. So, you need an implementation that faithfully follows the spec and does what you expect it to.

We admit we’re a little biased at Vamp: we’ve had good experiences with Calico starting a few years ago, when the majority of our customers were on DC/OS. Our experience of Calico on Kubernetes has been just as positive.

Containers are Vulnerable

Kubernetes, the operating systems running on the nodes, Docker, the operating systems running in the containers, the software frameworks and the third party libraries used to implement your services all have vulnerabilities. Lots of vulnerabilities.

One of the most important value adds of using a managed Kubernetes service like GKE is that Google actively patches Kubernetes and the OS used to run the nodes, as well as actively defending against known vulnerabilities. This is not foolproof, and in any case the vulnerabilities in your Containers are beyond the scope of what Google can do.

For example, if you use one of the  official Node.js Docker images as your base image, there is a big difference in the number of known vulnerabilities depending on which image you choose.

Vulnerabilities in the latest (10.6.2) official Node.js LTS Docker images

The lts image uses the official Debian 9.x “Stretch” Docker image as its base and has 43 components with known vulnerabilities, many of them critical. The lts-buster image uses the official Debian 10.x “Buster” Docker image and has 25 components with known vulnerabilities, 14 of them critical. Whereas the lts-alpine image uses the much slimmer, security-focused Alpine Linux as its base and has no components with known vulnerabilities.

The rule of thumb is that smaller images generally have fewer vulnerable components because they have fewer components in total. But ultimately it’s not about the numbers; you need defence in depth.

Defence in Depth: The First Step

The first step to leveraging network policies to secure internal cluster traffic is to enable the policy enforcement feature when creating a cluster.

To enable network policy enforcement when creating a cluster using the gcloud CLI, simply add --enable-network-policy:

gcloud command to create a cluster
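As a sketch (the cluster name my-cluster and zone europe-west4-a are placeholders, not from the original):

```shell
# Hypothetical cluster name and zone; substitute your own.
# For a new cluster, enable enforcement at creation time:
gcloud container clusters create my-cluster \
    --zone europe-west4-a \
    --enable-network-policy

# For an existing cluster, first enable the Calico add-on, then enable
# enforcement (the second command recreates the node pools):
gcloud container clusters update my-cluster --update-addons=NetworkPolicy=ENABLED
gcloud container clusters update my-cluster --enable-network-policy
```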

Note: enabling network policy enforcement on an existing cluster will trigger GKE to recreate all the node pools in your cluster to ensure they are correctly configured.

Tip: the easiest way to verify that network policies are enabled is either to use kubectl to describe one of your existing Pods, or to use the GKE console to view the Pod’s YAML. Kubernetes doesn’t warn you when network policies are not enabled; it accepts the configuration and silently ignores it.

If network policies are enabled, you will see a cni.projectcalico.org/podIP annotation.

Use kubectl to verify network polices are enabled
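For example (my-pod is a placeholder for any existing Pod in your cluster):

```shell
# With network policies enabled, Calico annotates each Pod with its IP:
kubectl describe pod my-pod | grep cni.projectcalico.org/podIP

# Or read the annotation directly from the Pod's YAML:
kubectl get pod my-pod \
    -o jsonpath='{.metadata.annotations.cni\.projectcalico\.org/podIP}'
```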

Still Access All Areas

The Pods in a Kubernetes cluster are not isolated by default, which means that any Pod in any namespace is free to access any other Pod, in any other namespace. Pods are also free to establish connections to practically anywhere on the Internet.

Relying on stopping threats at the cluster edge is like checking festival-goers’ tickets when they arrive and then relying on their “good character” not to storm the backstage areas. It’s going to end in a mess.

Block Everything but DNS

The first thing you want to do is set a default deny-(almost)-all egress policy on your Namespaces. It is harsh but effective. The policy should allow DNS; without it, the network will not function properly.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress-except-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  # allow DNS resolution
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP

Your Pods will be able to respond to requests but won’t be able to initiate connections. So, if a bad actor does hijack your containers, it will be that much harder for them to harvest your data or to use your cluster to do things like anonymously buy Facebook advertising.

kubectl commands to test the connection to google.com before the egress policy is applied
kubectl commands to create and describe the network egress policy
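A sketch of these steps, assuming the policy above is saved as deny-all-egress-except-dns.yaml and applied in the customer1 Namespace used later in this article:

```shell
# Before the policy is applied, a throwaway BusyBox Pod can reach the Internet:
kubectl run busybox --rm -it --restart=Never --image=busybox \
    --namespace=customer1 -- wget --spider -T 2 google.com

# Create the policy and confirm it is configured as intended:
kubectl apply -f deny-all-egress-except-dns.yaml --namespace=customer1
kubectl describe networkpolicy deny-all-egress-except-dns --namespace=customer1
```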

Tip: BusyBox is a huge help when testing policies.

kubectl commands to verify the connection to google.com is blocked by the egress policy
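A sketch of the verification, using the same throwaway BusyBox Pod:

```shell
# The probe now times out, because all egress except DNS is blocked:
kubectl run busybox --rm -it --restart=Never --image=busybox \
    --namespace=customer1 -- wget --spider -T 2 google.com

# DNS resolution (port 53) still works:
kubectl run busybox --rm -it --restart=Never --image=busybox \
    --namespace=customer1 -- nslookup google.com
```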

Make a Hole

Just like with a firewall, you can then allow egress to specific destinations on a case by case basis. In this case, allowing the Vamp Release Agent in the customer1 Namespace to connect to Elasticsearch in the vamp Namespace.

The egress policy allows any Pod with an io.vamp=release-agent label to connect to any Namespace with a role=core label, using TCP port 9200 only.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: release-agent-egress-es
spec:
  podSelector:
    matchLabels:
      io.vamp: release-agent
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          role: core
    ports:
    - protocol: TCP
      port: 9200
kubectl commands to create and describe the network egress policy
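Assuming the policy above is saved as release-agent-egress-es.yaml, the steps look like this:

```shell
# Create the policy in the customer1 Namespace and verify its rules:
kubectl apply -f release-agent-egress-es.yaml --namespace=customer1
kubectl describe networkpolicy release-agent-egress-es --namespace=customer1
```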

Using BusyBox we can now successfully connect to Elasticsearch in the vamp Namespace.

kubectl commands to verify the connection to Elasticsearch is allowed by the egress policy
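A sketch of the test; the BusyBox Pod needs the io.vamp=release-agent label to match the policy, and the Service DNS name elasticsearch.vamp is an assumption:

```shell
# The label makes the Pod match the egress policy's podSelector:
kubectl run busybox --rm -it --restart=Never --image=busybox \
    --namespace=customer1 --labels="io.vamp=release-agent" \
    -- wget --spider -T 2 http://elasticsearch.vamp:9200
```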

Check Both Ends

Next, for each of the Services running in your cluster, set individual policies to control which Namespaces can access the Pods behind each of those services, using a namespaceSelector. A common use case for this is when your application has dependencies on services like Elasticsearch or Redis that run in the same cluster as your Services.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: elasticsearch-ingress
  namespace: vamp
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: tenant
    ports:
    - protocol: TCP
      port: 9200

Ingress network policies are written from the perspective of the Pods that are being protected. In this case, the policy restricts which Pods can connect to the Elasticsearch Pods in the vamp Namespace.

The Elasticsearch Pods are identified as being any Pod with an app=elasticsearch label, and they are only allowed to accept requests from Pods in namespaces with a role=tenant label. And only on TCP port 9200.

We can test the effect of this policy by creating a new Namespace without the role=tenant label and running our BusyBox spider:

kubectl commands to verify the connection to Elasticsearch is restricted by the ingress policy
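For example (the Namespace name no-tenant is a placeholder):

```shell
# A Namespace without the role=tenant label:
kubectl create namespace no-tenant

# The probe times out: the ingress policy drops the request,
# even though this Pod has no egress restrictions of its own.
kubectl run busybox --rm -it --restart=Never --image=busybox \
    --namespace=no-tenant -- wget --spider -T 2 http://elasticsearch.vamp:9200
```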

Double Up

It’s a good idea to guard against the corresponding egress policy being accidentally removed or misconfigured by enforcing similar restrictions at both the egress and ingress ends of a connection.

You can further lock down access by defining which Pods within a Namespace can have access. This is done by pairing a podSelector with the namespaceSelector. It is also useful when you have multiple Services in a Namespace, some of which need access and some of which don’t.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: elasticsearch-ingress
  namespace: vamp
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: tenant
      podSelector:
        matchLabels:
          io.vamp: release-agent
    ports:
    - protocol: TCP
      port: 9200

The updated policy now restricts the Elasticsearch Pods to only accepting requests on TCP port 9200 from Pods that have an io.vamp=release-agent label and that are running in namespaces that have a role=tenant label.

kubectl commands to update and describe the improved network ingress policy
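Assuming the updated policy is saved as elasticsearch-ingress.yaml:

```shell
# Re-applying the manifest updates the existing policy in place:
kubectl apply -f elasticsearch-ingress.yaml --namespace=vamp
kubectl describe networkpolicy elasticsearch-ingress --namespace=vamp
```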

Troubleshooting

One of the reasons people shy away from implementing network policies is that troubleshooting network configuration issues is a daunting challenge to most developers.

Fortunately, troubleshooting network policies became a whole lot easier when GKE introduced the option to enable intranode visibility for a cluster. It sounds dull, but don’t be fooled: exposing your intranode Pod-to-Pod traffic to the GCP networking fabric means you can see VPC flow logs of the network traffic between Pods.

Adding intranode visibility to a new cluster using the GKE console

You can enable Intranode Visibility for an existing cluster using the gcloud CLI:

gcloud command to update an existing cluster
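As a sketch (my-cluster is a placeholder cluster name):

```shell
# Restarts the control plane and recreates the nodes:
gcloud container clusters update my-cluster --enable-intra-node-visibility
```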

Note: this command will trigger GKE to restart all the masters and all the nodes in your cluster, so it makes sense to enable intranode visibility and network policies in the same command.

Tip: by default the logs are aggregated every 5 seconds. This is great when you are troubleshooting, but for a small cluster it can easily result in 20 GB of logs per day at a cost of 0.50 USD/GB. So, when you’re not actively troubleshooting, it makes sense to increase the aggregation period to 30 seconds or 1 minute. This will reduce your costs by 80–90%.

Filtering the Logs

To simplify checking the logs, you need to know either the source or destination IP address and preferably the destination port number. The port number is useful when you want to filter out DNS (port 53) requests, etc.

In this example, we are interested in requests from BusyBox (pod2, 10.4.1.11) to Elasticsearch (10.4.2.12) on port 9200.

Note: the IP address shown in the examples above (10.0.25.32) is the IP address of the elasticsearch Service, not the Pod. Only the Pod overlay network is exposed, the requests to the Services are not logged.

kubectl command to get the Pod IP addresses
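For example:

```shell
# The wide output includes each Pod's IP address and the node it runs on:
kubectl get pods -o wide --all-namespaces
```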

You can view the logs using the GKE Stackdriver logs viewer console page or using the gcloud CLI. The console page is the easier option when diagnosing issues:

  1. Start by selecting the GCE Subnetwork for your cluster using the leftmost dropdown menu and then select just the vpc_flows log.
  2. You can then filter by jsonPayload.connection.src_ip or jsonPayload.connection.dest_ip plus the jsonPayload.connection.dest_port if you know it.

Tip: it is important to type in the filter box and use the auto completion. Pasting into the filter box often results in a text query, like text:jsonPayload.connection.src_ip:10.4.1.11 which won’t return any results.

VPC flow log for pod2 connections to Elasticsearch
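The same query can be run with the gcloud CLI; a sketch using the example IP addresses above:

```shell
# Filter the VPC flow logs down to pod2 -> Elasticsearch traffic on port 9200:
gcloud logging read '
  resource.type="gce_subnetwork"
  logName:"vpc_flows"
  jsonPayload.connection.src_ip="10.4.1.11"
  jsonPayload.connection.dest_ip="10.4.2.12"
  jsonPayload.connection.dest_port=9200' \
  --limit 10 --format json
```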

Our Experience

We have deep experience of Project Calico and of Kubernetes network policies on self-managed clusters, but we were wary of using them to enforce layer-3 segmentation on managed services like GKE. The main reason for this was the lack of root access to the nodes: in our experience, you needed to be able to tap into the various real and overlay networks to diagnose issues. The GKE VPC flow logs changed that.

The most frequent mistake we see occurs when a Service is exposed on a different port from the Container’s target port. Policies operate at the Pod level only. So, if your Service is exposed on port 80 but the corresponding Container uses port 8080, you must use port 8080 in the network policy.

Another common mistake we see occurs when a network policy uses a podSelector that is subtly different from the selector used for the corresponding Service. This can lead to ugly situations when releasing a new version of a microservice. For example, when the new Deployment is labelled slightly differently, the Pods for the new version may match the Service’s selector and be added to its load balancer, only to fail because the requests routed to those Pods are blocked by the network policy.

You can find out more about Vamp and our policy-based, data-driven approach to Service Reliability Engineering at vamp.io.