KubeCon Europe 2020 - Day 3
Third day of the KubeCon CloudNativeCon Europe 2020 virtual event.
Again, lots of interesting sessions!
Check out other days:

Keynotes

Vicky Cheung presented the release notes of Kubernetes 1.18.
38 enhancements
Release team: 34 people
40,000 individual contributors to date!
Key features:
Storage Enhancement:
Raw block device support graduates to Stable
alpha version of CSI Proxy for Windows (to perform privileged storage operations in Windows)
Scheduling Enhancements:
Run multiple scheduling profiles
Taint-based Eviction graduates to Stable
PodTopologySpread graduates to Beta
HPA (Horizontal Pod Autoscaler)
Feature in Alpha
Finer-grained control over autoscaling rates (see the example below)
Avoid flapping of replicas
Adjust scaling behavior based on application profile
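A minimal sketch of the new behavior field (assuming the autoscaling/v2beta2 API and a hypothetical Deployment named demo):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo                          # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down (avoids flapping)
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most 1 pod per minute
    scaleUp:
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60               # at most double the replicas per minute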
Kubectl alpha debug
Ephemeral containers were added in 1.16
When “kubectl exec” isn’t enough
kubectl alpha debug -it demo --image=debian --target=demo
Priority and Fairness for API Server Requests
Protect API servers from being overloaded while ensuring critical requests go through
Prevent loss of user access should anything run amok
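As an illustration, a FlowSchema routes matching requests to a priority level (a sketch assuming the flowcontrol.apiserver.k8s.io/v1alpha1 API shipped in 1.18; field details may differ):
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: FlowSchema
metadata:
  name: health-checks                  # hypothetical name
spec:
  priorityLevelConfiguration:
    name: exempt                       # map these requests to the "exempt" priority level
  matchingPrecedence: 1000
  rules:
  - subjects:
    - kind: Group
      group:
        name: system:unauthenticated
    nonResourceRules:
    - verbs: ["get"]
      nonResourceURLs: ["/healthz", "/readyz", "/livez"]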
Node Topology Manager graduates to Beta:
Useful for high performance computing and machine learning workloads
Other Notable features:
IPv6 Beta
Certificate API
APIServer DryRun and “kubectl diff”
In the next Kubernetes 1.19:
Generic ephemeral volumes
Kube-proxy IPV6DualStack support on Windows
Initial support for cgroup v2 (YES !!!)

Briana Franck explained how to transform your IT projects into the Cloud Native world.
New projects allow teams to re-evaluate the current transformation path using Design Thinking workshops. Take advantage of the start of new projects to catalyze innovation and reinforce a culture of automation to gain efficiency. People and communication are key; ChatOps is one example of how to operate and improve communication.
Automate the deployment of thousands of clusters with ease: example with Razee.io

This talk could have been renamed "Zombie Land"!
Excellent keynote by Holly Cummins explaining the impact of wasteful resource usage on the planet.
The resource consumption of data centers is not that far from aviation's! (1 to 2% vs. 2.5%)
The "Kubesprawl" trend (a pattern of "many clusters" rather than "one big shared" cluster) is not uncommon when multiple teams work in the same organization. Clusters are less elastic than applications and come with overhead.
There are several reasons to get your own cluster: isolation, security, performance, name collisions, …
Consolidation with multi-tenancy could be a good approach.
In all cases, don't forget your resources: zombie workloads! You know, the things you forget in your cluster that stay "alive" (but not really) and keep consuming resources for a long time!
Manual solutions exist: meetings, tags on manifest objects, …
The solution is to do the right thing.
GitOps helps a lot with infrastructure-as-code: "disposable infrastructure" you can deploy, redeploy and delete in a single operation.
To avoid zombie workloads, see the talk "Sharing Clusters: Learnings From Building a Namespace On-Demand Platform", especially the part "Monitor cost and identify idle namespaces".

In this presentation, Hannah Foxwell explains the path of least resistance to a successful Cloud Native project:
Start by identifying early adopters
Build a Minimum Viable Product
Don't build the platform as a silo; treat it as a service
Scale the success
Communication and relationships are key
Small steps are easier
Adopt the KISS principle: Keep It Short and Simple
And remember, we are human, not superhuman!
Containerd Deep Dive
By Akihiro Suda (Software Engineer, NTT) and Wei Fu (Software Engineer, Alibaba)
This talk was about containerd, a container runtime that implements the CRI runtime specification.
containerd is not only used by Docker but also by many distributions: K3s, Kubespray, MicroK8s, Charmed Kubernetes, kind, minikube, Alibaba ACK, Amazon EKS, Azure AKS, Google GKE, … and also by libraries/frameworks like BuildKit, LinuxKit, faasd, VMware Fusion (Project Nautilus).
Lots of nice features were introduced with version 1.4:
Lazy pulling of images: run containers before the image download completes, to improve startup speed. It is based on stargz & eStargz: with a plain tar format you cannot seek to a specific offset until the whole file has been downloaded.
Support for cgroup v2 and improved support for rootless mode (resource limitation support)
Windows CRI support
and much more (see release notes: https://github.com/containerd/containerd/releases/tag/v1.4.0)
containerd is highly customizable through its Runtime v2:
Runtime v2 provides a shim API that allows, for instance, integration with low-level runtimes like gVisor, Kata Containers or Firecracker.
Shim binaries follow a specific naming convention (e.g. containerd-shim-runsc-v1 for gVisor)
Support pluggable logging via STDIO URIs
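For example, once a shim such as gVisor's containerd-shim-runsc-v1 is installed and declared as a "runsc" handler in the containerd CRI configuration, Kubernetes can target it through a RuntimeClass (a minimal sketch; the handler name and image are assumptions):
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc               # must match the runtime handler declared in containerd
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-demo
spec:
  runtimeClassName: gvisor   # run this pod with the gVisor shim instead of runc
  containers:
  - name: app
    image: nginx:1.19        # hypothetical image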
Containerd is definitely a good CRI runtime with lots of nice features in v1.4. We especially appreciate the lazy pulling of images and expect a lot from cgroup V2.
What You Didn’t Know About Ingress Controllers’ Performance
By Mikko Ylinen (Senior Software Engineer, Intel) and Ismo Puustinen (Cloud Software Engineer, Intel)
Basically, Ingress exposes HTTP(S) routes to services within the cluster. The Ingress controller watches the Kubernetes objects and creates the route configuration. The Ingress proxy reads the configuration and handles the actual traffic routing.
There are several implementations like Nginx, HAProxy, Envoy…
Performance bottlenecks are mostly bandwidth and latency.
Tuning areas concern:

A focus on the TLS handshake, because most of a request's time is spent there.
How to improve it:
TLS 1.3 or a faster cipher…
Sync vs async TLS
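On ingress-nginx, for instance, preferring TLS 1.3 and modern ciphers is a ConfigMap change (a sketch assuming the ingress-nginx controller and its ssl-* ConfigMap keys):
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller       # name depends on your installation
  namespace: ingress-nginx
data:
  ssl-protocols: "TLSv1.2 TLSv1.3"     # prefer TLS 1.3, keep 1.2 for older clients
  ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
  ssl-session-cache: "true"            # reuse sessions to avoid full handshakes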

Async TLS is currently being added in Ingress
Async TLS Offloading

When to offload depends on whether the controller is CPU-bound, whether there are lots of new HTTPS connections, …
Example with HAProxy hardware acceleration and HAProxy RSA multibuffer
Call to action:
Check where your bottlenecks are
Ingress controllers: check the configuration of crypto offload, non-native resources, node affinity labels
Ingress proxy devs: switch to async TLS and allow custom TLS handshakes
A good and very technical presentation on how to tune your Ingress if you need the best performance.
Managing Multi-Cluster/Multi-Tenant Kubernetes with GitOps
By Chris Carty (Customer Engineer, Independent)
The speaker presented what GitOps is (you will also find a presentation of GitOps on the SoKube blog).
2 main GitOps projects:
The project structure contains your deployment YAML files.
Some useful tools in the GitOps context :
OPA (Open Policy Agent)
Conftest (helps you write tests against structured configuration data)
Kubeval (tool for validating a Kubernetes YAML or JSON configuration file)
Kind (tool for running local Kubernetes clusters using Docker container "nodes", similar to k3d/k3s)
Git Single vs Multi Repo in case of multi-tenancy
Single Git repo:
Contains the CI, cluster-admins (monitoring, networking, security), team-1 (app resources), team-2 (app resources), …

Each cluster (Dev, QA, Prod, …) syncs on a specific branch or tag
Multi-repo:
Several Git repos can target the same cluster but in different namespaces
Branch/Tag = grouping/environment of clusters


see more info here: fluxcd/multi-tenancy
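With Flux v1, for instance, the multi-repo approach boils down to running one Flux daemon per tenant, each pointing at its own repository/path and restricted to its namespace (a sketch; the repository, version and flag values are hypothetical, check them against your Flux version):
# Excerpt of a per-team Flux (v1) Deployment, one daemon per tenant
containers:
- name: flux
  image: docker.io/fluxcd/flux:1.21.0                   # hypothetical version
  args:
  - --git-url=git@github.com:example/team-1-config      # hypothetical tenant repo
  - --git-branch=dev                                     # one branch per environment/cluster
  - --git-path=namespaces/team-1                         # only sync this folder
  - --k8s-allow-namespace=team-1                         # restrict Flux to the tenant namespace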
A nice presentation on this very important topic. The speaker did not have time to go into detail and present the pros and cons of each approach, but it gives important pointers and a way to achieve multi-tenancy using Flux in Kubernetes.
Sharing Clusters: Learnings From Building a Namespace On-Demand Platform
By Lukas Gentele (CEO, DevSpace Technologies Inc.)
2 Approaches
Single-Tenant k8s: 1 Team/App per Cluster => too expensive
Multi-tenant k8s: Sharing large Cluster => less expensive but more complex
Several learnings:
Centralize user management and authentication
SSO for k8s via Dex, for instance
Restrict users but use smart defaults (UX matters)
Pod Security Policies
Resource Quotas: set defaults via LimitRange (a mutating admission controller)
Network Policies: default to deny all, allowing traffic from inside the namespace (see the sketch below)
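A minimal sketch of such defaults for a hypothetical team-1 namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-1            # hypothetical tenant namespace
spec:
  limits:
  - type: Container
    default:                   # default limits injected into containers that define none
      cpu: 500m
      memory: 512Mi
    defaultRequest:            # default requests
      cpu: 100m
      memory: 128Mi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-other-namespaces
  namespace: team-1
spec:
  podSelector: {}              # applies to every pod in the namespace
  ingress:
  - from:
    - podSelector: {}          # only allow traffic coming from the same namespace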
Automate as much as possible
Templates for RBAC, Quotas, Network Policies
OPA for dynamic admission control (e.g. hostname validation for Ingress or Certificate resources, or blocking certain storage and network configurations…)
Store everything in Kubernetes + Git
Use annotations, labels, secrets and config maps to store info about owners, tenants, …
GitOps for history, audit, rollback and an approval process via PRs and code owners
CRDs for even more control & automation

Do not hide Kubernetes, but make it easier to use
Engineers need direct access to Kubernetes (to verify, debug, …)
kubectl is an API, not a dev tool
Simplifying Local Development in Kubernetes with Telepresence
File-Sync-based Dev Experience: Skaffold, DevSpace, Tilt
Monitor cost and identify idle namespaces
Automate the shutdown of idle namespaces with:
Cluster-turndown by Kubecost
Sometimes users need more than just namespaces
Namespace-based multi-tenancy has limitations: users may need CRDs or specific versions of k8s
Virtual Cluster can solve this problem: vCluster
A very interesting talk on how to address multi-tenancy, providing best practices and tools like Telepresence, Cluster-turndown, Sleep Mode, vCluster and Kiosk…
Multi-Tenant Clusters with Hierarchical Namespaces
By Adrian Ludwin (Senior Software Engineer, GKE, Google)
The concept of hierarchical namespaces is new in Kubernetes.
Multi-tenancy is about taking care of cost and velocity.
One tenant per cluster was fine for small teams.
Kubesprawl: one cluster per tenant or team does not scale very well.
A Kubernetes multi-tenancy working group exists to address those issues.
Namespaces are the primary unit of tenancy in Kubernetes, and most security features require namespaces:
RBAC works best at the namespace level
This also applies to most other policies: ResourceQuota, NetworkPolicy, …
Policies across namespaces:
Need a tool and a source of truth outside k8s: Flux, ArgoCD, …
Alternatively, some in-cluster solutions add accounts or tenants: Kiosk or the Tenant CRD
Hierarchical Namespace Controller (HNC):
Hierarchical namespaces make it easier to share your cluster by making namespaces more powerful. For example, you can create additional namespaces under your team's namespace, even if you don't have cluster-level permission to create namespaces, and easily apply policies like RBAC and Network Policies across all namespaces in your team (e.g. a set of related microservices).
Entirely Kube native
Builds on regular kube namespaces
Delegate subnamespace creation without cluster privileges! (see the sketch below)
Policy propagation
Subnamespace hierarchy
A subnamespace cannot be moved
Trusted labels with a "tree" suffix on the child namespace
Easy to extend
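A minimal sketch of delegated subnamespace creation through an anchor object (assuming the SubnamespaceAnchor kind of the hnc.x-k8s.io API; the exact group/version depends on the HNC release):
apiVersion: hnc.x-k8s.io/v1alpha1
kind: SubnamespaceAnchor       # creating this in the parent namespace triggers creation of the child
metadata:
  name: team-a-dev             # hypothetical child namespace
  namespace: team-a            # parent namespace owned by the team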
Other features:
Authorization checks before modifying the hierarchy
Cascading deletion of subnamespaces
Monitoring options (Metrics via OpenCensus)
Uninstallation support (to avoid data deletion)
Hierarchical namespaces are in alpha but can simply be added as an add-on on Kubernetes 1.15+ (well done!)
It is a really cool feature that will simplify the multi-tenancy approach in a K8S cluster (especially with a GitOps approach) and give teams control over their own namespace hierarchy.
Automating Load Balancing and Fault Tolerance via Predictive Analysis - Steven Rosenberg, Red Hat
By Steven Rosenberg (Software Engineer, Red Hat)
Load Balancing → type of solution
Fault Tolerance → Live Migration
Scheduling → Predictive Analysis
Load Balancing

Priority based upon urgency
Even Distribution within categories :
Urgent priority - Mission Critical - Real Time Processing
High Priority - High Importance - near Real Time Processing
Neutral Priority - Medium importance - Normal Processing
Low Priority - Low importance - Not Time Critical Processing
No Priority - Unimportant - Unimportant processes
Fault Tolerance Redundancy Example

Scheduling
Ability to launch processes based upon needed resources
Monitor the amount of resource each process utilizes
Type of Launching/Migration Scenarios :
Initial Launch
Migration for maintenance
Re-balancing - Migration to another host
Fault recovery - Migrating to mitigate system/process failure

Policy units - Attributes of scheduling Migrations
Filters
Weights/Scoring
Balancers :
Even distribution
Power saving
Prioritizing
Affinity
CPU/Non-Uniform Memory Access (NUMA) pinning for optimal performance
Live migration process to migrate from source host toward destination host :
Network connectivity
Remote disk availability
Migration data on local disk(s)
Copying memory state in phases
All of the current memory contents
Current differences before VM pausing
Minimal difference during VM pausing
Copy CPU State
The goal is to limit pausing of the VM
Restarting the VM on the destination host
Clean up on the source host
Predictive Analysis Topics
Predicting future occurrences via analysis of past performance
Techniques for predictive analysis
Process for developing a prediction model
Predictive analytics methodology :

Get historical data → Create a training Set → Create an algorithm and a model → Get result → Restart process
Process for developing a prediction model:

Applying Predictive Analytics to Schedulers
Criteria for Data
Processing time / Iteration - Adjusted for resource capacity and priority
Percentage of resources used - Adjusted for capacity and priority
Adjust for anomalies when calculating averages
Ideas - selective techniques applied from other scheduling applications:
Combining regression-like modeling and functional approximation using a sum of exponential functions to produce probability estimates
Machine Learning and advanced mathematical models
Predictive analysis architecture :

Tracking historical data :
The time each process starts and terminates
The resources used by each process
The time each process uses to migrate
The time/iteration that memory/disk transfer occurs per size
Consideration based upon analysis
Whether early migration can proceed
When early migration should start
Error correction/anomaly detection for accurate results
This topic was interesting but complex. It demonstrates how to be proactive about load balancing and fault tolerance, based on collected historical data and mathematical models, to prevent future faults. There are interesting models here to improve infrastructure reliability.
Simplify Your Cloud Native Application Packaging and Deployments - Chris Crone, Docker
By Chris Crone (Engineering Manager, Docker)
What is a cloud native application?
"A program or piece of software designed to fulfill a particular purpose" - Oxford English Dictionary
A cloud native application is made of:
Compute :
Containers
Function (AWS Lambda, Azure Functions…)
Virtual Machines
Storage :
Databases
Object storage
Volumes
Networking
The CNCF Cloud Native Landscape map represents a view of many applications, tools, runtimes and more recommended by the CNCF. On this interactive map you can directly visit a product's website by clicking on its logo.
Deploying applications
Are you encountering the following errors? Probably yes!

Often need more than one tool to deploy an application
Is the ReadMe up to date ?
Which version of the tools ?
What if I'm using Windows or Mac and not Linux ?
Difficult coordination problem between team members, CI, users …
Ideal deployment tooling ?
Defined as code: tools, versions, options → what's the solution?
Same deployment environment everywhere → what's the solution?
Packaging applications
Different parts, different places :

Ideal application packaging ?
Immutable application artifact → what's the solution?
Store the whole application in a registry → what's the solution?
Ability to store application artifacts online → what's the solution?
There are many questions without answers!
Cloud Native Application Bundles
→ This is where Cloud Native Application Bundles (CNAB) enter the picture.
CNAB is a package format specification that describes a technology for building, installing and managing distributed applications that are, by design, cloud agnostic.
CNAB specification
The target audience is tooling developers
Packaging specification (bundle)
Bundle runtime (action)
Install, upgrade, uninstall
Optionally
Lifecycle tracking
Registry storage
Security
Dependency
Bundle structure :

CNAB runtime
Standard actions : install, upgrade, uninstall
Custom actions :
status, logs …
Stateful, stateless
Application lifecycle tracked by claims
Keep track of state of installation
Keep a record of parameters, outputs…
The only data structure defined in the specification
Finally, CNAB can answer the previous questions:
→ Ideal deployment tooling?
Defined as code: tools, versions, options → solution:
The Porter tool: package everything you need to do a deployment (command-line tools, configuration files, secrets and bash scripts to glue it together) into a versioned bundle distributed over standard Docker registries or tgz files.
porter.yaml (see the sketch below)
Stored in the CNAB invocation image
Same deployment environment everywhere → solution :
Containers
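A minimal porter.yaml sketch (the bundle name and helper script are hypothetical; the exact schema depends on the Porter version, see porter.sh for the reference):
name: hello-kubecon            # hypothetical bundle name
version: 0.1.0
description: "A minimal CNAB bundle built with Porter"
mixins:
  - exec                       # run commands/scripts packaged in the invocation image
install:
  - exec:
      description: "Install the application"
      command: ./helpers.sh    # hypothetical script included in the bundle
      arguments:
        - install
uninstall:
  - exec:
      description: "Remove the application"
      command: ./helpers.sh
      arguments:
        - uninstall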
→ Ideal application packaging ?
Immutable application artifact → solution :
Hashes for components
Leverage OCI image specification
Store the whole application in a registry → solution :
Any OCI compliant container registry
Ability to store application artifact online → solution :
OCI image layout
CNAB in registries :

With CNAB : Different parts, same place :

CNAB Security
Leverage same mechanisms as containers
TUF
In-toto
Notary
Signy → a tool that implements the CNAB security specification. It implements signing and verifying for CNAB bundles in the canonical formats.
CNAB website: cnab.io
Demo code: chris-crone/kubecon-eu-20
Porter: porter.sh
This topic is again interesting. It enables true multi-platform applications and makes application deployment very easy. Cloud Native Application Bundles can be very useful and accelerate the deployment of new features. It will be very interesting to test this new standardized bundle format in toolchain integrations. Something to keep an eye on...
How to Work in Cloud Native Security: Demystifying the Security Role
By Justin Cormack (Engineer, Docker)
Resource : https://static.sched.com/hosted_files/kccnceu20/06/How to work in cloud native security.pdf
What to take away from this conference (high level of abstraction) :
Security must be transverse and present everywhere (development and operations = DevSecOps)
There's no secret, you have to: play, understand, break, fix.
Security is unimportant most of the time, and it is often hard to hire people (it is estimated that there will be 3.5 million unfilled security jobs in 2021)
You don't need :
Formal qualifications in security
To have hacked high profile sites
To be a great developer
Understand the threat model
Security is quality :
Handle errors and the unexpected.
Understand the issues in the domain.
Write security tests.
A security engineer is a human like any other :
You cannot tell anyone about what you do most of the time.
Not enough people, often overloaded.
Live away from the happy path.

An interesting talk: the speaker depicts the routine of security jobs through his own experience, the difficulties he has encountered, the complexity of recruiting this type of profile, and the negative effects of a job market under stress (burnout, etc.).
Prometheus Deep Dive
By Goutham Veeramachaneni (Software Engineer, Grafana Labs) and Bartlomiej Plotka (Principal Software Engineer, Red Hat)
Favour the federation approach (architecture example) :
A server Prometheus (high retention) scraping client Prometheus instances (low retention)
The federation endpoint (/federate) is scraped regularly (~every 15s)
The sources of your client Prometheus instances should be specific (e.g. infrastructure metrics, application metrics, middleware metrics, etc.)
Avoid the Russian doll effect (Never go beyond 4 subsets)
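On the server side, the federation scrape job typically looks like this (a sketch; the target address and the match[] selector are hypothetical):
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true                  # keep the labels exposed by the federated Prometheus
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-nodes"}'    # only pull the series you actually need
    static_configs:
      - targets:
        - 'prometheus-cluster-a:9090'   # hypothetical client Prometheus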

Don't believe everything you're told :
Prometheus can handle long retention times (2 years, easily) if you meet a few prerequisites:
Older data uses marginal resources when not queried.
Get a large SSD and plan data size ahead of time.
Caveats (Be careful!) :
Hard to plan disk space for future: Uncontrollable cardinality.
Persistent disk backups are not always easy
No downsampling
Remote Write (so, what if I want to send the data to a different DB?):

Remote Endpoints and Storage :

Documentation link : https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
Remote write with Cortex :
Cortex is a CNCF sandbox project used in several production systems including Weave Cloud and Grafana Cloud. Cortex is primarily used as a remote write destination for Prometheus, exposing a Prometheus-compatible query API. – Official documentation (Cortex)
Cortex is :
Horizontally scalable
Highly available
Multi-tenant
Long term storage.
Documentation link : https://cortexmetrics.io/docs/
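Sending data to Cortex then boils down to a remote_write entry in the Prometheus configuration (a sketch; the endpoint URL and tenant credentials are hypothetical and depend on your Cortex setup):
remote_write:
  - url: "http://cortex.example.com/api/prom/push"   # hypothetical Cortex push endpoint
    basic_auth:
      username: tenant-1                             # hypothetical tenant credentials
      password_file: /etc/prometheus/cortex-password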
Metadata & Label naming :
Good label naming makes it easier to construct your queries
Documentation link (Best practice) : https://prometheus.io/docs/practices/naming/
Metadata API : https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
Soon :
Metrics metadata persistence
Remote write metadata to other systems
Backfilling (how do I import my metrics from a different system into Prometheus?):
Remote read :

Manually (CLI import) :

This talk was interesting, as it addressed many of Prometheus's often overlooked functions and aspects.
We would have appreciated more detail on the Cortex part, and the demo (using the CLI) was streamed at too low a quality (1000 Kbps) to be readable.
Deep Dive: Harbor - Enterprise Cloud Native Artifact Registry - Steven Zou & Daniel Jiang, VMware
By Steven Zou (Staff Engineer, VMware) and Daniel Jiang (Software Engineer, VMware)
Harbor is a cloud native registry with a solid community, which graduated within the CNCF this year! This presentation was mainly about the new functionality delivered in v2.0 and v2.1, so here is a little summary:
V2.0:
v2 API to manage artifacts other than containers (Helm charts, OPA bundles, CNAB)
aggregate view of artifacts by project/repository
consistent management workflow
TLS communication across internal components
Trivy as the default scanner
v2.1:
Improvement of GC, now it’s non-blocking
Possibility to run GC in dry run mode
Proxy cache, proxied artifacts are considered as local
Management policies can be applied on proxied artifacts
P2P: distribute the content being deployed to the P2P network
Extended artifact processor to handle extra annotations
And here is the roadmap:

Why We Are Choosing Cloud Native Buildpacks at GitLab - Abubakar Siddiq, GitLab
What is a buildpack? Buildpacks are responsible for transforming code into something executable. They are composed of a set of scripts and depend on the language; they are able to retrieve the necessary dependencies for a given type of language.
Heroku supports some of the most popular languages like Java, Ruby, Node.js, PHP…
An open source utility (Herokuish) emulates Heroku build and runtime tasks in containers. It depends on Heroku-supported buildpacks.
Cloud Native Buildpacks are modular tools that transform source code into an OCI image (https://buildpacks.io). CNB is opt-in for Auto Build jobs in Auto DevOps. Here is an example of using it:
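A rough sketch of what that opt-in can look like in .gitlab-ci.yml (assuming the Auto DevOps CI template and the AUTO_DEVOPS_BUILD_IMAGE_CNB_ENABLED variable):
# .gitlab-ci.yml
include:
  - template: Auto-DevOps.gitlab-ci.yml           # reuse the standard Auto DevOps pipeline
variables:
  AUTO_DEVOPS_BUILD_IMAGE_CNB_ENABLED: "true"     # build with Cloud Native Buildpacks instead of herokuish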

We think it’s a very interesting way to create and share a builder across projects and easily maintain it. While it’s not fully mature yet (for example there is no equivalent to a herokuish buildpack test) we will watch its evolution very closely.
Optimized Resource Allocation in Kubernetes? Topology Manager is Here - Conor Nolan, Intel & Victor Pickard, Red Hat
By Victor Pickard (Principal Software Engineer, Red Hat) and Conor Nolan (Software Engineer, Intel)
For a better understanding of this topic, it's nice to remember what NUMA is (http://www.techplayon.com/what-is-numa-non-uniform-memory-access/). In Kubernetes we have the CPU Manager, which allows allocating exclusive CPUs to pods, and the Device Manager, which allocates hardware resources.
This presentation introduces a new manager: Topology Manager, a beta feature in Kubernetes 1.18. Topology Manager provides an interface to coordinate resource assignment, because the CPU Manager and Device Manager assign resources independently, which can cause suboptimal allocations. Topology Manager now offers the possibility of assigning resources from the same NUMA node, optimizing allocation.
This concept introduces 4 manager policies:
None: by default.
Best-effort: attempts to align resources on NUMA nodes
Restricted: attempts to align resources on NUMA nodes, pod fails if not possible to align resources
Single-numa-node: attempts to align resources on a single NUMA node, pod fails if not possible to align resources
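The policy is configured on each node's kubelet, for instance via the kubelet configuration file (a minimal sketch; the static CPU Manager policy is assumed so that exclusive CPUs can be aligned):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static                 # exclusive CPUs are required for CPU alignment
topologyManagerPolicy: single-numa-node  # or none, best-effort, restricted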
Here is an example of allocation:

And some numbers for performance improvement:

This talk was informative. It shows how to schedule workloads for better performance and introduces Topology Manager, a new beta feature in Kubernetes 1.18. This new feature enables NUMA alignment of CPUs and peripheral devices, allowing your workload to run in an environment optimized for low latency. Today, performance is one of the most important keys for everything.
Advanced Logging with Fluent Bit
By Eduardo Silva (Principal Engineer, Arm Treasure Data) and Wesley Pettit (Amazon)
Github : fluent/fluent-bit-kubernetes-logging

Lightweight and more efficient than Fluentd.
Focused on low CPU/memory usage (~650 KB). (It's written in C, NOT Ruby)
Zero dependencies, unless some special plugin requires them.
Perfect for :
Embedded Linux.
Containers.
Servers.
Already more than 70 plugins.
Easily scraped by Prometheus for monitoring.
Coming soon: OpenTelemetry and a Kafka connector, on the v1.6 roadmap.
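To give an idea of the configuration, here is a minimal sketch of a Fluent Bit setup for Kubernetes (the plugin names are standard; the Elasticsearch host is hypothetical):
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/log/containers/*.log
        Parser  docker
        Tag     kube.*
    [FILTER]
        Name    kubernetes                  # enrich records with pod metadata
        Match   kube.*
    [OUTPUT]
        Name    es
        Match   *
        Host    elasticsearch.logging.svc   # hypothetical backend
        Port    9200
Deployed as a DaemonSet with this ConfigMap mounted, that is enough to tail container logs, enrich them with pod metadata and ship them to a backend.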