This was the fourth and last day of the KubeCon + CloudNativeCon Europe 2020 virtual event. An amazing conference: congratulations to the CNCF team, and thanks to our SoKube team for following the talks and writing these blog posts!
By Leonardo Di Donato, Open Source Software Engineer, Sysdig
In Falco's early days, CI was done with Travis CI.
The pain points came mostly from the non-interactive workflow between a classic CI and GitHub (the CI does not act on the statuses of the GitHub repo):
no clear ownership
PRs were merged even when their GitHub status was KO
Some policies existed, but they were neither easily discoverable nor auditable
No enforcement for approvals
In the Falco context, the team did not want to spend time on:
building a custom CI/CD
creating an automatic policy enforcer
The Falco team wants to focus solely on developing its product.
Since Kubernetes uses Prow, Falco chose to follow the same path. Prow brings:
Manage and enforce policies
Auto-merge bot that takes GitHub statuses into account
Prow is open source, so you can add plugins and extensions if needed
Built for Kubernetes, on Kubernetes
=> With these capabilities (the last one in particular), Prow is scalable by nature.
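To make the auto-merge idea concrete, here is a minimal Python sketch of the kind of check a merge bot runs before merging a PR. It is not Prow's code: Prow's merge automation, Tide, is written in Go and driven by configuration. The `PullRequest` shape, the `lgtm`/`approved` and `do-not-merge/*` label names, and the required status contexts are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a GitHub pull request."""
    number: int
    labels: set[str] = field(default_factory=set)
    # Maps a status context to its state: "success", "pending" or "failure".
    statuses: dict[str, str] = field(default_factory=dict)


# Assumed policy: labels every PR needs, labels that block a merge,
# and the CI contexts that must be green.
REQUIRED_LABELS = {"lgtm", "approved"}
BLOCKING_LABELS = {"do-not-merge/hold", "do-not-merge/work-in-progress"}
REQUIRED_CONTEXTS = {"build", "unit-tests"}


def merge_blockers(pr: PullRequest) -> list[str]:
    """Return the reasons why the PR cannot be merged (empty list = mergeable)."""
    reasons = []
    for label in sorted(REQUIRED_LABELS - pr.labels):
        reasons.append(f"missing label {label}")
    for label in sorted(BLOCKING_LABELS & pr.labels):
        reasons.append(f"blocking label {label}")
    for ctx in sorted(REQUIRED_CONTEXTS):
        state = pr.statuses.get(ctx, "missing")
        if state != "success":
            reasons.append(f"status {ctx} is {state}")
    return reasons


def merge_pool(prs: list[PullRequest]) -> list[int]:
    """Numbers of the PRs the bot would merge: only those without blockers."""
    return [pr.number for pr in prs if not merge_blockers(pr)]
```

A PR that carries both labels but has a failing `build` status stays out of the merge pool. That is exactly the "merged even though the status is KO" situation a plain CI could not prevent.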
Falco now uses Prow as its CI/CD solution, and it seems to fit their needs perfectly.
We find it very interesting to have a Kubernetes-native CI/CD solution such as Prow, but for now it is limited to GitHub repositories, and that is a pain point.
A large share of organizations already use another SCM (Bitbucket, GitLab, SourceForge, …) and don't want to migrate to GitHub. GitLab seems to be considering helping the Prow project by providing an integration with its platform, but for now this is only at the ideation stage.
By Daniel Bryant, Product Architect, Datawire
The boundaries between apps and users have evolved over the last 30 years. Following the speaker's vocabulary, we refer to these boundaries of an application (networking, etc.) as the “edge”.
1990: Hardware load balancers
2000: software load balancers appear (NGINX, HAProxy, …)
2010: APIs appear, and with them the first API gateways
2015: microservices => independent services, and therefore different protocols, languages, locations, authentication systems, …
The API gateway needs to handle all of this: authentication, load balancing, discovery of new services.
Since the advent of microservices, the workflow has changed: app teams are now fully responsible for delivering their services.
The two biggest challenges:
Scaling edge management (who does what), because the API gateway holds more and more resources such as routes (retries, authentication, caching, tracing and rate limiting are the main features of an API gateway solution)
Supporting all these requirements in different ways, since each service will choose the solution that best fits its own needs.
Deploy an additional Kubernetes API gateway:
dev teams are responsible for it
or existing ops teams manage it
Extend existing API gateway:
Augmenting an existing API gateway solution
Custom ingress controller or load balancer
Sync the API endpoints with the location of the k8s services
Hard to maintain (custom scripts must avoid conflicts between routes inside the cluster)
Deploy an in-cluster edge stack:
Deploy Kubernetes-native API gateway
Install in each of your kube clusters
The ops team owns it and provides defaults
Dev teams are responsible for configuring the network boundaries of their services as part of their normal workflow
Simple to maintain, but learning new proxy technologies can be hard at first for the ops team.
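Whichever strategy is chosen, several teams end up declaring routes on a shared edge, so route conflicts (mentioned above for the "extend an existing API gateway" strategy) have to be caught. Below is a minimal Python sketch, assuming a hypothetical `Route` record per team. It merges every team's declarations into one table, rejects two owners for the same host and prefix, and resolves requests by longest prefix. Real gateways express all of this through their own configuration or CRDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical route declaration owned by one team."""
    host: str
    prefix: str
    service: str  # upstream address, e.g. "orders:8080"
    owner: str


class RouteConflict(Exception):
    """Raised when two teams claim the same host and path prefix."""


def build_route_table(routes: list[Route]) -> dict[tuple[str, str], Route]:
    """Merge every team's routes into one table, refusing ownership conflicts."""
    table: dict[tuple[str, str], Route] = {}
    for route in routes:
        key = (route.host, route.prefix)
        existing = table.get(key)
        if existing is not None and existing.owner != route.owner:
            raise RouteConflict(
                f"{route.host}{route.prefix} claimed by {existing.owner} and {route.owner}"
            )
        table[key] = route  # the same owner may redefine its own route
    return table


def resolve(table: dict[tuple[str, str], Route], host: str, path: str) -> Optional[str]:
    """Longest-prefix match on the path, a common rule for HTTP routing."""
    matches = [r for (h, p), r in table.items() if h == host and path.startswith(p)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.prefix)).service
```

Refusing conflicts when the table is built, rather than at request time, is what lets each team own its routes without stepping on another team's.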
A nice session on how API gateways have evolved over the last decades. The key points to keep in mind:
Edge and API gateways have gone through several evolutions, driven by architecture (hardware vs software, networking from L4 to L7) and by changes in workflows and responsibilities since the emergence of microservices
Adopting microservices, and the workflow changes that come with it, means choosing a strategy for implementing an API gateway, and you will need to pick the one that best fits your requirements.
By Josh Bottum, Vice President, Arrikto
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
Jupyter notebooks: source code
Training operators: Machine Learning layer
Workflow building: tools that simplify building Kubeflow pipelines
Pipelines: a way to schedule, run and monitor the workflows that run your ML models
Data management: versioning, sharing and reproducibility of your models
Tools: TensorBoard, Prometheus, etc., plus dashboards for visualization across Kubeflow
Metadata: metadata of your models
Serving: tools to deploy and serve your models efficiently
Interesting points from surveys presented by the speaker:
Kubeflow is mainly used by software engineers and data scientists
Only 16% of Kubeflow users currently run it in production (~25% use it just for learning)
Users develop machine learning models faster with Kubeflow
Some tools, like CUJ, help organizations.
The demonstration, using MiniKF (a lightweight Kubeflow, available on GCP), showed how well everything is integrated and how visual it is:
deployments (with colors that help us to see what is used or not)
before a pipeline runs, a snapshot is taken
you can see the pipeline status in real time while it is running
taking a snapshot of a pipeline stage also captures its context, so you can reproduce the exact issue you had (you can rerun a single step thanks to the serving components).
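The snapshot-and-rerun behaviour from the demo can be pictured with a toy Python sketch. The input context of each step is snapshotted before the step runs, so any single step can later be replayed from exactly the state it saw. This only illustrates the idea; it is not how MiniKF or Kubeflow implement snapshots, and the step functions are hypothetical.

```python
import copy
from typing import Callable

Step = Callable[[dict], dict]


class SnapshottingPipeline:
    """Toy pipeline: snapshots each step's input so that step can be replayed alone."""

    def __init__(self, steps: list[tuple[str, Step]]):
        self.steps = steps
        self.snapshots: dict[str, dict] = {}

    def run(self, context: dict) -> dict:
        for name, step in self.steps:
            # Record exactly what this step received before running it.
            self.snapshots[name] = copy.deepcopy(context)
            context = step(context)
        return context

    def rerun_step(self, name: str) -> dict:
        """Re-execute a single step from its snapshot, skipping earlier steps."""
        step = dict(self.steps)[name]
        return step(copy.deepcopy(self.snapshots[name]))


# Hypothetical steps standing in for data loading and training.
def load_data(ctx: dict) -> dict:
    return {**ctx, "rows": [1.0, 2.0, 3.0, 4.0]}


def train(ctx: dict) -> dict:
    return {**ctx, "model": sum(ctx["rows"]) / len(ctx["rows"])}
```

Deep copies keep a snapshot from being mutated by later steps, which is what makes the replay faithful.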
The demonstration was the highlight: it showed Kubeflow as a great platform for ML development. It brings together all the tools needed to develop a model and is very user-friendly (such a platform should help people who are curious to learn and experiment with ML get started).
By Wenbo Zhu, Senior Staff Engineer, Google
A presentation of gRPC-Web, a protocol for supporting gRPC at the “web” level.
This new protocol (gRPC-Web):
must be compatible with gRPC
introduces minimal changes to the original protocol (only web-specific aspects, such as CORS)
limits streaming support:
avoid the complexity of supporting protocols that require fallbacks, such as WebSockets
don’t invent anything we may regret in the future
don’t make an underlying streaming technology more reliable than it is
must work anywhere: old platforms like IE10 as well as new ones, and both browser and non-browser clients
developer joy: prioritize features that improve the development experience (code generation and build, TypeScript, Node, …)
is compatible with REST, keeping JSON support; protocol-agnostic features such as security are not reimplemented, just integrated
Security features (XSRF, XSS, CSP)
Gateways in more languages (very limited for the moment)
Protobuf improvements and performance
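To show how small the wire-level difference with gRPC is, here is a Python sketch of gRPC-Web message framing, based on our reading of the gRPC-Web protocol description in the gRPC repository. As in gRPC, each message is prefixed with a 1-byte flag and a 4-byte big-endian length. gRPC-Web additionally sends the trailers in the body, as a frame whose flag has its most significant bit set (0x80), because browsers do not expose HTTP trailers. The payloads are placeholder bytes, not real protobuf messages.

```python
import struct

DATA_FLAG = 0x00     # uncompressed message frame
TRAILER_FLAG = 0x80  # most significant bit set: the frame carries trailers


def encode_frame(flag: int, payload: bytes) -> bytes:
    """1-byte flag, 4-byte big-endian length, then the payload."""
    return struct.pack(">BI", flag, len(payload)) + payload


def encode_trailers(trailers: dict[str, str]) -> bytes:
    """Trailers are written like an HTTP/1 header block, one CRLF-terminated line each."""
    block = "".join(f"{name.lower()}: {value}\r\n" for name, value in trailers.items())
    return encode_frame(TRAILER_FLAG, block.encode("ascii"))


def decode_frames(body: bytes) -> tuple[list[bytes], dict[str, str]]:
    """Split a gRPC-Web response body into message payloads and trailers."""
    messages: list[bytes] = []
    trailers: dict[str, str] = {}
    offset = 0
    while offset < len(body):
        flag, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        payload = body[offset:offset + length]
        offset += length
        if flag & TRAILER_FLAG:
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    name, _, value = line.partition(":")
                    trailers[name.strip()] = value.strip()
        else:
            messages.append(payload)
    return messages, trailers
```

Because the framing is identical to gRPC apart from the trailer frame, a proxy can translate between the two without touching the messages themselves. This is in line with the "minimal changes" principle.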
Running kubectl create secret raises several problems:
who created the secret, when, and why?
is it tested? can we roll it back? is it the source of truth?
The talk focuses on what the source of truth for Kubernetes secrets is, and how to protect it.
What about Git and GitOps? They provide history, rollback, reviews and a source of truth.
Use Git, but without plaintext secrets (the pattern also works if you don't use Git):
Use asymmetric cryptography: JSON Web Encryption (JWE)
Using an envelope is recommended because most KMSs cannot encrypt secrets larger than 64 KB; the envelope gives us the flexibility to encrypt larger payloads.
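The envelope is visible in the JWE compact serialization itself (RFC 7516), which is made of five base64url parts separated by dots:

- the protected header
- the encrypted content-encryption key (the small key wrapped with the KMS public key)
- the IV
- the ciphertext of the possibly large payload
- the authentication tag

The Python sketch below only splits and inspects such a token; it does not decrypt anything. The header values (`RSA-OAEP-256`, `A256GCM`) and the dummy bytes used in the usage example are assumptions for illustration.

```python
import base64
import json


def b64url_decode(part: str) -> bytes:
    """Base64url decoding of an unpadded JOSE segment."""
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def inspect_jwe(token: str) -> dict:
    """Split a compact JWE into its five parts and decode the protected header."""
    parts = token.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 parts in a compact JWE, got {len(parts)}")
    header, wrapped_key, iv, ciphertext, tag = (b64url_decode(p) for p in parts)
    return {
        "header": json.loads(header),
        # The content-encryption key, encrypted with the public key: the envelope.
        "wrapped_key_bytes": len(wrapped_key),
        "iv_bytes": len(iv),
        # The payload is encrypted symmetrically with that key, so its size
        # is not bound by the KMS limit.
        "ciphertext_bytes": len(ciphertext),
        "tag_bytes": len(tag),
    }
```

Only the small wrapped key ever needs a KMS operation. The ciphertext can be far larger than 64 KB.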
Workflow with personas:
Key admin (management of KMS)
Secret admin (manage sensitive data)
Cluster admin (deploy, manage and configure the kube cluster)
The key admin creates a key in the key management system and pushes the public key to the Git repo
The secret admin uses this key to create the JWE and the secret manifest in Git
The cluster admin retrieves the secret file from the repo and pushes it to the Kubernetes cluster.
Storing the secret in etcd is the problem, since it can be retrieved from there afterwards.
Demonstration using Google Cloud KMS and GCE (it works with other solutions too, of course):
generate the asymmetric key in KMS and export the public key to the Git repository
grant decrypt privileges to the dedicated service account in Kubernetes
encrypt the credentials with the same crypto algorithm as the key admin used
create the JWE and the secret manifest, and push them to the Git repo
use the dedicated service account to retrieve the secret from KMS
configure the webhook to access the KMS
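The point of the pattern is that Git, and later etcd, only ever hold ciphertext. Here is a minimal Python sketch of the Secret manifest the secret admin could commit. It assumes the JWE token is stored as-is under one `data` key, and the names are hypothetical. How the webhook then finds and decrypts the value depends on the webhook used in the demo, which we do not reproduce here.

```python
import base64


def jwe_secret_manifest(name: str, namespace: str, key: str, jwe_token: str) -> dict:
    """Build a v1 Secret whose only content is an already-encrypted JWE token.

    Values under `data` must be base64-encoded. Base64 is an encoding, not
    encryption, which is why the value has to be the JWE ciphertext and never
    the plaintext credential.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {key: base64.b64encode(jwe_token.encode("ascii")).decode("ascii")},
    }
```

Serialized with `json.dumps`, such a dict can be applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.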
A great session on how to really secure your secrets. Instead of storing the secret directly in Kubernetes, you can use a webhook, which only the dedicated service account can trigger, to retrieve the secret from the KMS. The KMS secret can only be read by that service account, which keeps it secure.
And since what is stored in the cluster is only the encrypted payload the webhook needs, we don't care if it can be seen in etcd: Google KMS will only decrypt it when called by the appropriate service account.
Whatever solution you choose, the key is to add a security layer on top of Kubernetes and never store your secrets directly as “plaintext” in etcd.
What is threat modeling?
Threat modeling keeps you from finding out about security issues when it is too late. It should start as early as possible, once a shared understanding of the system is established, and be repeated whenever features are designed for each subsequent release. Everybody can bring their own unique perspective: architects know how things should work, DevOps engineers know how things actually work, and other roles such as product owners, business analysts or internal users can bring useful information to the model. In practice, answer four questions: What are you building? What can go wrong once it's built? What should you do about what can go wrong? Did you do a decent job of analysis?
What does threat modeling look like for Kubernetes?
Kubernetes cluster threat models:
Provisioning and scaling
Runtime and cluster configuration
CI/CD and application deployment
What can go wrong after you deploy your pod or run your CI/CD pipeline?
You can have the most secure system in the world at runtime, but if it is exploited because you neglected supply-chain security and secure deployment, all that effort is wasted.
The first step is to define an end-to-end pipeline diagram.
Diagrams are really important for breaking down what is built into flow processes, trust boundaries and stores within the system.
After the diagram is established we can use different techniques to find what is wrong in the system. The most common is STRIDE.
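STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege): a way to characterize and identify the kinds of threats that affect the processes, data flows and stores within the system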
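A common way to apply STRIDE to such a diagram is "STRIDE per element": each element type of the data-flow diagram is only checked against the threat categories that usually apply to it. The mapping below is the widely used table from Microsoft's threat modeling practice. We are not claiming it is the exact method shown in the talk, and the pipeline elements in the usage example are hypothetical.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


# Commonly used STRIDE-per-element mapping (data stores holding audit logs
# are sometimes also checked for Repudiation).
APPLICABLE = {
    "external_entity": {Stride.SPOOFING, Stride.REPUDIATION},
    "process": set(Stride),
    "data_flow": {Stride.TAMPERING, Stride.INFORMATION_DISCLOSURE, Stride.DENIAL_OF_SERVICE},
    "data_store": {Stride.TAMPERING, Stride.INFORMATION_DISCLOSURE, Stride.DENIAL_OF_SERVICE},
}


def enumerate_threats(elements: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (element, threat letter) pairs to review, in S-T-R-I-D-E order."""
    order = "STRIDE"
    pairs = []
    for name, kind in elements.items():
        for threat in sorted(APPLICABLE[kind], key=lambda t: order.index(t.value)):
            pairs.append((name, threat.value))
    return pairs
```

For example, `enumerate_threats({"developer": "external_entity", "ci-runner": "process", "image-registry": "data_store"})` yields the checklist to walk through for a small CI/CD diagram.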
Existing runtime models: CNCF attack trees
Here is a GitHub repository with the threat model for a Kubernetes system: https://github.com/cncf/financial-user-group/tree/master/projects/k8s-threat-model
Attack trees:
“Attack trees provide a formal, methodical way of describing the security of systems, based on varying attacks. Basically, you represent attacks against a system in a tree structure, with the goal as the root node and different ways of achieving that goal as leaf nodes.” - Bruce Schneier (1999)
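Following Schneier's description, an attack tree can be evaluated bottom-up. One classic use is to find the cheapest attack: an OR node costs as much as its cheapest child, and an AND node costs the sum of all its children. The tree and costs in the usage example are invented for illustration; they are not taken from the CNCF attack trees.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class Node:
    goal: str
    kind: Literal["leaf", "and", "or"] = "leaf"
    cost: float = 0.0  # only meaningful for leaves
    children: list["Node"] = field(default_factory=list)


def cheapest_attack(node: Node) -> tuple[float, list[str]]:
    """Return the minimal cost to reach node.goal and the leaf steps involved."""
    if node.kind == "leaf":
        return node.cost, [node.goal]
    results = [cheapest_attack(child) for child in node.children]
    if node.kind == "or":
        # The attacker picks the cheapest alternative.
        return min(results, key=lambda r: r[0])
    # "and": every sub-goal must be achieved.
    total = sum(cost for cost, _ in results)
    steps = [step for _, leaf_steps in results for step in leaf_steps]
    return total, steps
```

In a tree where "Run a malicious container" can be reached either by stealing registry credentials (50) AND bypassing image scanning (30), or by exploiting an exposed kubelet API (100), the cheapest path costs 80. Controls are best spent on raising the cost of that path first.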
What do we do if one of the threats materializes?
Put security controls in place. A few examples:
Use dedicated devices and networks for management
Harden EC2 instances
Restrict EC2 instance IAM roles
Container-based IDS/IPS
Encryption control and etcd mTLS
You can reinforce your controls with complementary measures around:
Networking (VPC, ACL, Security Group, Subnet…)
Runtime (security contexts for pods and containers: run as a non-root user, run unprivileged, drop all Linux capabilities…)
RBAC and policy (Kubernetes RBAC, admission controllers, Open Policy Agent…)
Supply Chain Security
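The runtime controls listed above map to fields of the Kubernetes pod and container `securityContext`. Here is a small Python sketch that audits a pod spec, given as a parsed manifest dict, against those fields. The rule set is our own simplification. In a cluster this kind of check would rather be enforced by an admission controller or Open Policy Agent, as listed above.

```python
def audit_pod(pod: dict) -> list[str]:
    """Return findings for containers that break the runtime hardening rules."""
    findings = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level setting.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not run_as_non_root:
            findings.append(f"{name}: runAsNonRoot is not enabled")
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        # Unset allowPrivilegeEscalation is treated as allowed.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not set to false")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: does not drop ALL capabilities")
    return findings
```

A pod with `runAsNonRoot: true` at pod level and, per container, `allowPrivilegeEscalation: false` and `capabilities.drop: ["ALL"]` produces no findings.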
Determining control sets:
You can start simple, but more complex control sets require automation and testing → risk is the determining factor
Defense in depth with attack trees:
Integrate Kubernetes with a global SOC
Reproduce the attack against a test cluster repeatedly
Gather the signals generated
Work with System and Organization Controls (SOC)
Re-run the test cases
Make sure Docker starts correctly
This session dealt with a very important aspect of Kubernetes: security. It was very informative and educational, explaining what a threat model is and how to build one, using diagrams and focusing on the right aspects such as runtime, networking, supply chain and more.
Autoscaling project review:
The Horizontal Pod Autoscaler
The core logic lives in the kube-controller-manager, which compares the current state of metrics against the desired state and adjusts as necessary
Three metric types can be used:
Resource metrics:
Resource metrics are the simplest of the three: CPU- and memory-based autoscaling.
Served under the metrics.k8s.io API (the same metrics you see when running kubectl top)
Now usually provided by the Metrics Server, which scrapes the resource metrics from the kubelet APIs and serves them via API aggregation
Currently based on the usage of the entire pod - this can be an issue if only one container in your pod is the bottleneck
Custom metrics:
Served under the custom.metrics.k8s.io API
No “official” implementation, though the most widely adopted is the Prometheus Adapter
Say you have a service where you know how many requests a given pod can handle at any time, but memory or CPU usage isn't a good indicator of this, e.g. a fixed number of uWSGI processes
Scaling on CPU or memory is either going to waste money or result in decreased performance
External metrics:
Served under the external.metrics.k8s.io API path
A number of implementations exist: Azure, GCP and AWS provide adapters for their metrics systems so that you can scale your k8s services based on those metrics, and some of the custom metrics implementations mentioned above support them too
Intended for metrics entirely external to Kubernetes objects (e.g. Kafka queue length, Azure Service Bus queue length, AWS ALB active requests)
The HPA’s Algorithm
What if I want to scale on multiple metrics?
As of k8s 1.15 the HPA handles this well, you can scale on multiple metrics and the HPA will make the safest (i.e. highest) choice, even if one or more of the metrics is unavailable
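The Kubernetes documentation describes the core HPA computation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No change is made when the ratio is within a tolerance (0.1 by default), and with several metrics the highest proposal wins. The Python sketch below reproduces only that arithmetic. It ignores the readiness handling, missing-metric logic and stabilization windows that the real controller applies, and the min/max replica defaults are placeholders.

```python
import math


def replicas_for_metric(current_replicas: int, current_value: float,
                        target_value: float, tolerance: float = 0.1) -> int:
    """Replica count proposed by a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, metrics: list[tuple[float, float]],
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """metrics: (current value, target value) pairs; the highest proposal wins."""
    proposals = [replicas_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))
```

Take 4 replicas as an example. CPU at 90% against a 60% target proposes 6 replicas, while a requests-per-pod metric at 80 against a target of 100 proposes 4. The HPA keeps the safer value, 6.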
What about scaling down to zero?
It's possible, but you have to set up your HPA the right way: it requires both enabling the HPAScaleToZero alpha feature gate and configuring the HPA with at least one object or external metric
Vertical Pod Autoscaling
Applications change over time, and the initial request settings may no longer be suitable later on:
Daily/Weekly traffic patterns
User base growing over time
App lifecycle phases with different resource needs
The Vertical Pod Autoscaler (VPA) aims to solve these problems - scaling the resource requests and limits for monitored pods up and down to match demand and reduce waste.
It has three components:
Recommender: calculates recommendations based on historical data
Updater: evicts the pods whose resources are to be modified
Admission plugin: a mutating admission webhook that parses all pod creation requests and modifies those with a matching VPA to apply the recommendations
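The recommender's job can be illustrated with a deliberately simplified Python sketch: take a high percentile of recent usage samples and add a safety margin. The real VPA recommender works on decaying histograms with its own percentiles and margins. The 90th percentile and 15% margin used here are assumptions for illustration only.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p between 0 and 100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(usage_samples: list[float], p: float = 90.0,
                      margin: float = 0.15) -> float:
    """Hypothetical recommendation: p-th percentile of observed usage plus a safety margin."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    return percentile(usage_samples, p) * (1 + margin)
```

Consider CPU samples of around 120m with a single 400m spike. Sizing on the maximum would reserve more than twice what the percentile-based recommendation (about 150m) asks for, which is the kind of waste the VPA aims to avoid.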
It currently provides four update modes: Auto, Recreate, Initial and Off
Useful for singletons
Services used by internal teams
No point giving them peak resources and burning money during quiet periods
Shouldn’t use it in conjunction with resource based HPAs as the two will conflict
Modifying the resource requests requires recreating the pod - meaning a pod restart
Can be tricky to use with JVM based workloads on the memory side
The Cluster Autoscaler (CA)
Scale ups are triggered by pending pods. CA then performs an evaluation of which node groups it monitors would be able to fit the pending pods if they were scaled up. Scale down is evaluated for nodes using resources below a certain threshold.
Cluster Autoscaler Expanders
The different methods supported by the Cluster Autoscaler for deciding which node group to scale up when needed
Random (the default): picks a random candidate node group that can fit the pending pods
Priority (available from 1.14 onwards): can be used in conjunction with custom logic
Price (currently GKE/GCP only): automatically picks the cheapest candidate node group for you
Least waste: picks the candidate node group with the least wasted CPU after scale-up
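Two of these expanders are easy to picture in code. The Python sketch below is not the Cluster Autoscaler's code, which is written in Go and also simulates scheduling. It assumes each candidate node group has already been evaluated, so we know how many nodes it would add for the pending pods. It then applies a least-waste choice on CPU, and a priority choice in which higher numbers win and node group names are matched against regular expressions, as in the priority expander's configuration. Node group names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """A node group that could fit the pending pods after scaling up (hypothetical)."""
    name: str
    cpu_per_node: float  # allocatable cores per node
    nodes_to_add: int    # nodes needed to fit the pending pods


def least_waste(candidates: list[Candidate], pending_cpu: float) -> Candidate:
    """Pick the group leaving the least unused CPU after scale-up."""
    return min(candidates, key=lambda c: c.cpu_per_node * c.nodes_to_add - pending_cpu)


def by_priority(candidates: list[Candidate], priorities: dict[int, list[str]]) -> Candidate:
    """The highest priority with a matching candidate wins."""
    for prio in sorted(priorities, reverse=True):
        matching = [c for c in candidates
                    if any(re.search(pattern, c.name) for pattern in priorities[prio])]
        if matching:
            # Ties may be broken differently in the real expander; we take the first.
            return matching[0]
    raise LookupError("no candidate matches any priority")
```

Suppose 5 cores are pending. Three 2-core spot nodes waste 1 core, while two 4-core on-demand nodes waste 3, so least-waste picks the spot group. With the spot pattern at a higher priority, the on-demand group is only chosen when no spot group can be scaled.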
There are a number of things to consider when enabling Cluster Autoscaling like which pods can tolerate interruptions, whether pods being scaled down need to do any clean up, pod priorities and more …
Cost optimisation with the Cluster Autoscaler
If you have batch jobs or jobs that don't need to run immediately, you can use the --expendable-pods-priority-cutoff flag to avoid the CA scaling up purely for ultra-low-priority jobs.
If you want to fall back to on-demand instances when Spot/Preemptible instances are out of capacity, you can create on-demand node groups with a lower expansion priority and spot node groups with a higher priority.
You can also tune the --max-node-provision-time flag: with multiple spot node groups, each fallback takes 15 minutes by default, and this flag lets you reduce that time.
The best practice with the CA is to map each node group to a single ASG, because accurate simulation requires all instances in a group to have the same resources.
Gotchas with the Cluster Autoscaler
How do I protect my critical workloads and ensure they don't get interrupted by the CA?
Pods with the annotation “cluster-autoscaler.kubernetes.io/safe-to-evict=false” prevent the CA from terminating the node running your critical job, even if the node's utilization is below the scale-down threshold
How do I overprovision a Kubernetes cluster with the Cluster Autoscaler?
The overprovisioning approach runs low-priority dummy pods to reserve space. The Kubernetes scheduler preempts them to make room for unschedulable pods with a higher priority, so critical pods don't have to wait for new nodes to be provisioned. These pods don't even have to be dummy pods if you have a suitable non-critical workload that can tolerate interruption.
What if all of my services start scaling and don't stop?
ResourceQuotas are invaluable here: figure out the maximum resources a given namespace should use at peak load, allow for failovers, and set the ResourceQuota for that namespace accordingly to guard against runaway scaling
In addition, set the maximum size of the node groups to limit cluster growth on the Cluster Autoscaler's side
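The ResourceQuota sizing advice boils down to simple arithmetic. In this hypothetical Python sketch, we sum, for each workload in the namespace, its maximum replica count times its per-pod requests. We then add headroom for failovers (the 1.5 factor is an arbitrary assumption) and emit a ResourceQuota manifest using the `requests.cpu` and `requests.memory` quota keys. The input shape and quota name are made up for illustration.

```python
import math


def namespace_quota(workloads: list[dict], failover_factor: float = 1.5) -> dict:
    """workloads: [{"max_replicas": int, "cpu_m": int, "memory_mi": int}, ...] (hypothetical shape)."""
    cpu_m = sum(w["max_replicas"] * w["cpu_m"] for w in workloads)
    mem_mi = sum(w["max_replicas"] * w["memory_mi"] for w in workloads)
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "peak-guard"},
        "spec": {"hard": {
            # Peak requests of every workload, plus failover headroom.
            "requests.cpu": f"{math.ceil(cpu_m * failover_factor)}m",
            "requests.memory": f"{math.ceil(mem_mi * failover_factor)}Mi",
        }},
    }
```

Take a hypothetical `api` deployment (up to 10 replicas of 250m CPU / 512Mi) and a `worker` deployment (up to 4 replicas of 500m / 1024Mi). The quota comes out at 6750m of CPU and 13824Mi of memory requests.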
The Cluster Autoscaler doesn't support all cloud providers yet, but most of the big ones are covered. Next step: decouple the cloud provider code and support pluggable cloud providers over gRPC.
This session was very interesting, showing how autoscaling is possible at different levels. As with anything, cost saving in Kubernetes is about analyzing the trade-offs you can make: which pods can afford to be interrupted, how quickly you need services to scale up and down, and what scaling behaviour you want in the cluster. Finally, the best cost-saving strategies vary with your workloads, environment and cloud provider.
By Fabio Oliveira (Research Scientist, IBM Research)
The goals of this presentation are:
Raise awareness of a fundamental yet largely ignored problem at the core of cloud native canary releases, performance tests, and A/B & A/B/n testing
Offer an open solution to that problem and engage the community
It is about agility:
But it is also about learning:
What if you could safely:
learn how your code behaves in production or test?
What if you could continuously and safely:
learn what resonates with your users ?
find ways to increase your company’s revenue ?
maximize your company’s revenue as you learn ?
Analytics is crucial: continuous experimentation is an analytics problem, and more precisely a comparative analytics problem! Enter iter8.
Overview of iter8
Version assessment:
Traffic control strategies
Traffic control safety filters
Cutoff on failure
Experiment traffic percentage
iter8 experiment type
Assess version against criteria
Typically done in a test/dev environment
Can be done in production
2 versions: baseline and candidate
make sure no SLOs are violated
relative criteria make sense
Apply the traffic control strategy
If the canary passes → roll forward
If the canary fails → roll back
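The canary logic above (absolute SLO checks, relative criteria against the baseline, then roll forward or back) can be sketched in Python. This is neither iter8's API nor its statistical assessment, which is more sophisticated. The metric names, the thresholds and the 10% relative tolerance are assumptions.

```python
def assess_canary(baseline: dict[str, float], candidate: dict[str, float],
                  slos: dict[str, float], max_relative_increase: float = 0.10) -> str:
    """Return "roll forward" or "roll back".

    slos: upper bounds for lower-is-better metrics (e.g. latency, error rate).
    Relative criterion: the candidate may not be more than 10% worse than the baseline.
    """
    for metric, limit in slos.items():
        if candidate[metric] > limit:
            return "roll back"  # absolute SLO violated
        if candidate[metric] > baseline[metric] * (1 + max_relative_increase):
            return "roll back"  # notably worse than the baseline
    return "roll forward"
```

The relative check matters because a candidate can stay within every SLO and still be a clear regression compared to the version it replaces.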
A/B and A/B/n testing
“n” versions:
1 or more candidates
Compare versions to declare a winner
maximize a reward metric
make sure no SLOs are violated
Apply traffic control strategy
Traffic will go towards winner
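For A/B/n testing, the decision becomes: among the versions that meet the SLOs, pick the one with the best reward, then shift traffic towards it. Here is a minimal Python sketch with hypothetical metric names. iter8 itself performs a statistical assessment rather than comparing raw point values like this, and the fixed 20-point traffic step is an assumption.

```python
from typing import Optional


def pick_winner(versions: dict[str, dict[str, float]], slos: dict[str, float],
                reward: str) -> Optional[str]:
    """Version with the highest reward among those meeting every SLO (upper bounds)."""
    eligible = {name: m for name, m in versions.items()
                if all(m[k] <= limit for k, limit in slos.items())}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][reward])


def shift_traffic(weights: dict[str, int], winner: str, step: int = 20) -> dict[str, int]:
    """Move up to `step` percentage points to the winner.

    Points are taken from the other versions in name order (a simplification).
    """
    new = dict(weights)
    remaining = step
    for name in sorted(new):
        if name == winner or remaining == 0:
            continue
        moved = min(new[name], remaining)
        new[name] -= moved
        remaining -= moved
    new[winner] += step - remaining
    return new
```

A version with the best revenue but a latency above the SLO is excluded before the reward is even compared. This reflects the "maximize a reward metric, make sure no SLOs are violated" rule above.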
This session was really informative, showing how traffic-management analytics can drive a CI/CD pipeline. The iter8 tool was introduced for this purpose, bringing continuous experimentation based on machine learning. It is particularly interesting to see how different domains can now work together to drastically improve business outcomes: here we see a mix of DevOps and machine learning.
For more information about iter8:
By Steven McDonald (Site Reliability & Infra Engineer, Usabilla)
Steven McDonald presented several iterations of his Fluentd logging stack:
First iteration:
Two fluentd aggregators, with fluent-bit on every host configured to forward local logs to fluentd.
Both fluentd and fluent-bit were configured for disk-backed buffering for reliability.
Fluentd then forwarded logs on to CloudWatch and Elasticsearch.
He explains that there were cascading failures, initially caused by the Elasticsearch cluster (data duplication, exploding volumes, performance problems, etc.).
He highlights the lessons learned from each iteration.
Second iteration:
In this iteration, he brings in Kafka as a “logging buffer” to get a stateless architecture.
Despite many contributions to the ruby-kafka plugin, the team still ran into a lot of problems (poor handling of large batches, performance problems, …)
Third iteration:
It uses a KStream to filter one or more Kafka topics, then adds two Kafka Connect sink connectors, one for S3 storage (https://www.confluent.io/hub/confluentinc/kafka-connect-s3) and the other for Elasticsearch (https://www.confluent.io/hub/confluentinc/kafka-connect-elasticsearch).
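The data flow of this third iteration can be mimicked in plain Python: a filter over the raw log topic, then two sinks that each consume the whole filtered topic. In the real setup the filter is a Kafka Streams application and the sinks are the S3 and Elasticsearch connectors. Here topics are in-memory lists, and the filtering rule (dropping debug logs and records without a service name) is a hypothetical example.

```python
import json


def filter_logs(raw_topic: list[bytes]) -> list[dict]:
    """Stand-in for the stream processor: parse records and drop noise."""
    filtered = []
    for record in raw_topic:
        try:
            event = json.loads(record)
        except json.JSONDecodeError:
            continue  # malformed record dropped (a real pipeline might dead-letter it)
        if not isinstance(event, dict) or "service" not in event:
            continue
        if event.get("level") == "debug":
            continue
        filtered.append(event)
    return filtered


def fan_out(filtered_topic: list[dict], sinks: dict[str, list]) -> None:
    """Every sink consumes the whole filtered topic independently, like two connectors."""
    for sink in sinks.values():
        sink.extend(filtered_topic)
```

Keeping the filtering in a separate stage means both sinks receive exactly the same cleaned records, and a sink can be added or replaced without touching the producers.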
Given the "Stateless" approach, it is very likely that only data on S3 and Elastic will be persisted.
In this type of configuration, it would be interesting to add a Schema-Registry (https://docs.confluent.io/current/schema-registry/index.html) and a ksqlDB (https://ksqldb.io/).
A very interesting talk, especially the third iteration, which we have implemented at SoKube: this validates our architecture choice. It remains to be seen whether the Fluent Bit connector (planned for v1.6) will remove the need for a dedicated KStream for processing.
The session was mostly about the new version of Helm (v3):
Goodbye Tiller
Easy upgrade to Helm v3 (helm2to3)
CNCF graduated project
Helm 2 deprecation (30 Nov 2020)
Chart submission process for the community
Schema validation (JSON file)
Tests (with OCI registries)
CLI enhancements (Go SDK)
Post-render (--post-renderer | https://helm.sh/docs/topics/advanced/)
Helm tools (helm-diff, helmfile, helm-controller, helm-conftest)
Security: Helm inherits your RBAC permissions (goodbye Tiller and its RBAC setup), and chart provenance
The end of the conference was interesting, but Helm v3 has been available since November 13, 2019. Focusing so much on the features introduced since v2 was a bit disappointing, as we were expecting more advanced concepts.