If you are like me, you find conferences inspiring. People take different approaches, though: some go to Paris just to enjoy the city while the company pays for the trip. That's not my gig, so I'm happy to share my thoughts and hopefully continue the conversation. This isn't a full conference digest, just a few snippets I found interesting, decorated with additional words for your amusement.
Inference
…reminds me of observability: conceptually, a new word trying to do its job, and I need to learn what it means. Some time ago, Charity Majors presented observability as an umbrella for monitoring, metrics, traces and more, all of which at the time were at best referred to as monitoring. Observability did its job; it improved the engineering vocabulary.
The Oxford dictionary defines inference as “something that you can find out indirectly from what you already know”. Stating that alone doesn't mean much, but in the right context it makes more sense. At the conference, it was always mentioned in the context of generative AI.
In the context of AI, inference refers to the process by which an already trained machine learning model applies its learned knowledge to make predictions or decisions based on new, unseen data. Once a model has been trained on a dataset to recognize patterns or make associations, it can infer outputs when presented with new inputs.
In layman's terms, in the AI world, it's the process of generating output based on your prompt.
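To make the distinction concrete, here's a toy sketch in Go (not tied to any real model; the numbers are invented): training produces parameters, and inference applies those parameters to input the model has never seen before.

```go
package main

import "fmt"

// Toy illustration: a "trained" model is just learned parameters (a slope and
// an intercept here); inference applies them to data the model has never seen.
func main() {
	weight, bias := 2.0, 0.5 // parameters produced by training

	newInput := 3.0                      // unseen input
	prediction := weight*newInput + bias // inference: apply learned knowledge
	fmt.Printf("prediction for %.1f: %.1f\n", newInput, prediction)
}
```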
NVIDIA
NVIDIA is big in the AI world. This also means AMD is lagging. Congratulations to those who got in on the stock early enough. You folks know who you are if you are reading this.
There’s so much stuff to talk about here, but let’s keep it brief. You can have breakout sessions for things you find most appealing.
NVIDIA's keynote was just a few days before KubeCon. They are dedicated and proving to be winning the generative AI hardware race. More importantly, it's not just hardware: they are investing significantly (along with partners) in simplifying the way customers can use GPU resources within Kubernetes clusters. I believe this is the key to success!
Integrating hardware with platforms like Kubernetes that are used for training and serving models lowers the barrier to entry. The CNCF ecosystem of products helps facilitate and accelerate the end-to-end lifecycle of AI projects. Extending easy-to-use Kubernetes primitives with the ability to consume GPU resources is a killer feature. Shout-out to the C-suites that supported engineering efforts in the surrounding ecosystems, not just GPU development alone. Core technology is often just a piece of the puzzle; building ecosystems around it is what we should always strive to do. Good job, NVIDIA.
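As a minimal sketch of what consuming a GPU through standard Kubernetes primitives looks like (the image tag and names are placeholders, and the NVIDIA device plugin is assumed to be installed in the cluster), the snippet below builds a Pod spec requesting one GPU via the nvidia.com/gpu extended resource and simply prints the manifest rather than applying it.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A Pod that asks the scheduler for one NVIDIA GPU through the extended
	// resource name exposed by the NVIDIA device plugin.
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-inference"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "worker",
				Image: "nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04", // placeholder image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}

	// Print the manifest; applying it is left to kubectl or a client.
	out, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```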
The new Blackwell GPU architecture is positioned as the core enabler for future scale. Measured performance factors are an order of magnitude better than before. Fifth-generation NVLink networking components connect everything together. It's fascinating how much innovation across different areas is required to bring one chip to market and put it in the hands of YAML.
It's obvious that (these days) proper scaling is done through distributed systems. To achieve that, it's mandatory to have high-performance networking components and core technology built to thrive in a distributed environment.
The general consensus is that we lack observability at the GPU level. No doubt we'll catch up over time. Streamlining GPU resource consumption within Kubernetes is fairly new, so let's give it some time to mature. With all the GPU sharing mechanisms, I suspect operators would like better insight into what's happening at the lower levels so they can utilize GPU resources better.
Some trivia about AMD, NVIDIA's competitor: George Hotz tried to move the needle in a positive direction with AMD's CEO Lisa Su, unfortunately without success. My take is that AMD needs a mindset update: it has to accept the need to make its hardware more accessible to developers by investing in the ecosystems that consume GPUs.
Hugging Face
Where were you guys? What am I missing here? Your company is partially based in France and deeply engaged in AI. Is there a backstory here? But fine, Ollama was there and maybe it makes more sense on second thought.
Ollama
An awesome project that brings us closer to streamlining AI model serving.
I started following them on GitHub when they said we could run AI models locally, on our MacBooks. At the time I didn't fancy trying it, nor do I today (yet), but when I saw it deployed in Kubernetes, exposing an interface for consuming the deployed model, it became clear that it's more than just a playground.
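To give a feel for that interface, here's a small sketch of calling Ollama's HTTP generate endpoint from Go. The address, model name and prompt are assumptions for illustration; inside a cluster the host would be the Service DNS name rather than localhost.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request body for Ollama's /api/generate endpoint; "llama3" and the
	// prompt are placeholders for whatever model you actually serve.
	body, err := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Summarize KubeCon in one sentence.",
		"stream": false,
	})
	if err != nil {
		panic(err)
	}

	// Assumes Ollama is reachable on its default port 11434.
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With stream=false the answer comes back in a single JSON object.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```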
Keep an eye on Ollama as your runtime for AI models.
Kubernetes and CNCF
It's all about Kubernetes. CNCF products are built for Kubernetes. It's mature, with no signs of stepping down from the throne as the workload orchestration king. This is at least true in this world of mortals, where Borg (Google's orchestration system) doesn't exist. I'm so happy we've moved on from conversations about how complex it is to get Kubernetes running and operated.
This year we are hearing “Kubernetes as OS”. That's great. It means we have a platform to build on top of.
CNCF even published the “Cloud Native Artificial Intelligence” whitepaper, written by the CNCF AI Working Group. I view this as a public declaration of their strong commitment to advancing the AI movement. Worth reading for inspiration on tools from the CNCF landscape that might be used in your own AI journey.
GitHub: agent-based Kubernetes cluster lifecycle management and hydrating templates
GitHub engineers built a custom, agent-based Kubernetes cluster lifecycle management system, with a node agent and a controller.
One of the takeaways is that all configuration for the expected cluster state is committed to (some) repository! PRs are part of the configuration change lifecycle. Nothing new, I just want to double down on the importance of this. It's GitOps.
There's a number of places where the configuration bundle gets hydrated. I know, more colloquial speech; this one is easier to understand than inference, though. Hydrating templates means the configuration gets filled in with context-specific values throughout the deployment pipeline.
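As a tiny illustration of the term (the field names and values are invented for the example), here's template hydration with Go's text/template: the same configuration skeleton gets rendered with environment-specific context at deploy time.

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// A configuration skeleton with placeholders to be hydrated.
	skeleton := `cluster: {{ .Cluster }}
region: {{ .Region }}
replicas: {{ .Replicas }}
`
	tmpl := template.Must(template.New("config").Parse(skeleton))

	// Context gathered during the deployment pipeline (made up here).
	ctx := struct {
		Cluster  string
		Region   string
		Replicas int
	}{Cluster: "prod-eu-1", Region: "eu-west-1", Replicas: 3}

	// Rendering ("hydrating") produces the concrete configuration for that environment.
	if err := tmpl.Execute(os.Stdout, ctx); err != nil {
		panic(err)
	}
}
```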
Folks use the Hubot chatbot to automate a lot of their operations. A CLI tool also exists for interacting with the controller and node agent. RBAC is managed through central LDAP.
HwameiStor
Think of HwameiStor (created by DaoCloud) as a low-level storage management system configurable from Kubernetes. It's DirectPV (created by MinIO) on steroids. Look into it if you are dealing with cloud native storage solutions. They claim a light footprint in terms of system resources.
Lunch #1
I met an engineer who works at ING Bank. We had a nice chat about ways to approach different challenges, but what stuck with me was his mention of (sometimes practicing) “Anger Driven Development”. I had a laugh when I heard it; it reminded me of TDD. I know I'd be frustrated in his position: there's so much regulation that it's almost impossible to bring in new technology without a serious audit. Having an audit is fine, but it takes too long.
Out of interest, I checked whether “Anger Driven Development” had been mentioned before, and it seems there are references at RexBlog (2016) and Insane Ramblings of an Egomaniac (2014). Some forks exist in the form of “[Hate, Rage, Emotion] Driven Development”.
If you're a bank, think about ways to accelerate bringing in new vendors and software developed outside the company.
If you are a service provider, look to meet the compliance requirements of banks. It'll be easier for highly regulated industries to accept your solution.
If you are a security vendor, look at how to speed up validating software for regulatory compliance. Software supply chain security is becoming a thing.
Elastic
Elastic went big on building Kubernetes controllers. They are committed to migrating lifecycle management of their managed services, like Elasticsearch in Elastic Cloud, to controllers. Good stuff.
In general, full-fledged Kubernetes operators are proving to be a good choice for solving at least two things: streamlining product lifecycle at scale and providing a simple interface for setting desired state with CRDs.
For building controllers, Elastic relies heavily on controller-runtime. From their perspective, it bootstraps a lot of the logic they would otherwise have to write, so there's no need to reinvent the wheel.
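To show how little boilerplate controller-runtime leaves you with, here's a minimal reconciler sketch. It watches plain Deployments purely to stay self-contained; a real operator like Elastic's reconciles its own CRDs, and the type name below is invented.

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// DeploymentReconciler is a stand-in; real operators reconcile their own CRDs.
type DeploymentReconciler struct {
	client.Client
}

// Reconcile is called by controller-runtime whenever a watched object changes.
func (r *DeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	var deploy appsv1.Deployment
	if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
		// The object may have been deleted in the meantime; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Compare observed state with desired state and issue updates here.
	logger.Info("reconciling", "name", req.Name, "readyReplicas", deploy.Status.ReadyReplicas)
	return ctrl.Result{}, nil
}

func main() {
	// controller-runtime wires up caches, clients, metrics and signal handling.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}).
		Complete(&DeploymentReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```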
They trust Argo Workflows to help them in the process.
Mercedes
I think the EQE should have carpet padding in the door pockets where bottles go (come on, it's an E-Class) and should drop the capacitive buttons on the steering wheel.
Back to infrastructure: Mercedes has a platform team. To me, it's interesting that they only provide Kubernetes and some managed applications through a UI. Each product team is responsible for taking it from there, as far as I could gather from the conversation around their booth. I worry that this cuts the UX off too early in application lifecycle management; I'd expect the platform team to take it a step further and orchestrate at least application deployment and observability. My opinion is based on a walk-by conversation, so maybe there's more to it.
The takeaway here, as always, is that Kubernetes isn't a silver bullet. It's a tool in the toolbox that needs, at a minimum, to be integrated into the organization, orchestrated with surrounding automation tooling and instrumented with supporting services. The point is, handing vanilla Kubernetes over to product teams doesn't magically make them better.
It’s great to see an automaker embracing Kubernetes so seriously. Gut gemacht!
eBPF
…will catch you, you can’t run from it. Cilium (by Isovalent) will probably be the first to get you.
Look into eBPF if you haven't already. A nice documentary on the history of eBPF tells the story of the people who brought it to life. eBPF is finding applications in observability, networking and security, and eventually no kernel event will be left untouched and unwrapped into a product.
Tetragon is a Cilium addition that lets you apply policies to security-related kernel events. This eBPF and kernel stuff is still low level. For example, which kernel events should you track? Even experienced cluster operators go to their dedicated kernel teams with questions.
I'm excited to see eBPF optimize network performance and provide greater observability and security.
UX (using Kubernetes)
A team presented findings from research they conducted on user expectations of Kubernetes.
Actively promote infrastructure services to developers. Developers may not know how to solve their challenges, even when solutions are already trivially available. Ideally, have ambassadors.
Observability at each stage of the application lifecycle is important to users, from building container images to the state of the application in production. The more correlation between events is presented, the better. Vulnerability scanning is preferably done in CI.
Policies that validate deployment definitions are welcome. Passing validation builds developers' confidence, and policies can enforce cluster operation best practices.
In the battle of GUI vs. CLI, the GUI is for less experienced teams.
Health indicators are best when automated or easy to implement. Give product teams the ability to easily enable basic service availability metrics and alerts. Look into Ingress Monitor Controller.
Lunch #2
Met folks from CROZ and Adcubum. The overall impression: many of the challenges cloud native operations teams face are common, and the solutions are also known, but this is where company organization, not technology, plays the big role. The organization needs to listen and sign off on investment where needed.
WebAssembly
I got the idea that there's a wave of WebAssembly (WASM) excitement and adoption in certain scenarios. It claims to be fast, portable and lightweight. The example mentioned was a 400 MB Node.js binary compressed down to 2 MB. WASM requires a special runtime though, not a container runtime. You might find SpinKube interesting if you want to try it.
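To get a feel for what "special runtime" means without reaching for SpinKube, here's a minimal sketch, assuming Go 1.21+ and a WASI runtime such as wasmtime installed: compile a plain Go program to a WASM module and execute it with that runtime instead of a container runtime.

```go
// Build the module:  GOOS=wasip1 GOARCH=wasm go build -o hello.wasm .
// Run it with a WASI runtime instead of a container runtime, for example:
//   wasmtime hello.wasm
package main

import "fmt"

func main() {
	fmt.Println("hello from a WebAssembly (WASI) module")
}
```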
Datadog
Since gRPC tends to establish a connection and keep it open, the distribution of connections across serving Pods is often unbalanced: there can be significantly more connections on one Pod than on another. The takeaway was to implement load balancing across server-side (Pod) IPs on the client side. The client receives the list of IPs because the Service resources receiving gRPC traffic were created as headless. Now, since Datadog invested significantly in managing DNS in their clusters (they own the DNS component code), I was wondering why they didn't implement the logic that serves the right IP from the list to the gRPC client inside the DNS component. That way none of the clients would need their code reworked, and very advanced IP load balancing mechanisms could be implemented in the DNS component. I might be missing something here, though.
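As a sketch of the client-side approach described (the Service name and port are placeholders): with a headless Service, the DNS name resolves to every ready Pod IP, and gRPC's built-in DNS resolver combined with the round_robin policy spreads RPCs across those IPs instead of pinning everything to a single Pod.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "orders" is a hypothetical headless Service; its cluster DNS name
	// resolves to the individual Pod IPs behind it.
	target := "dns:///orders.default.svc.cluster.local:50051"

	conn, err := grpc.Dial(
		target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// Pick the round_robin load balancing policy so RPCs are spread
		// across all resolved Pod IPs rather than one long-lived connection.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Use conn with a generated client stub as usual.
}
```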
Nonetheless, interesting to see how Kubernetes opens the door for interventions and offers a way for companies to bend it to their needs.
They consider a service mesh harder to debug. Resource consumption in the sidecar model was also challenging; the problem was the range of resources sidecars consumed. In one example, CPU consumption ranged from 100m to 1500m. That's a problem because CPU consumption isn't predictable across sidecars, and over-provisioning to cover peak load leads to wasted resources. I think in-place Pod resource updates could come in handy here in the future. They solved it by dropping sidecars and using Cilium.
When comparing Linkerd with Istio (in sidecar mode), Linkerd had lower latency. Worth mentioning that Istio's ambient mode doesn't use the sidecar model, but a ztunnel agent per node; additional instrumentation with waypoint proxies is needed to manage L7 traffic policies.
Overall, we need to be mindful of the sidecar model in any service mesh, since it can cause financial, cognitive and performance issues.
It’s recommended to look into Cilium when thinking about (multi-cluster) mesh performance and resource utilization.
Sustainability
We are slooowly getting to a point where people are starting to actively think about environmental impact. Firstly, new hardware being developed is designed to consume less power. Secondly, from what I hear, regulation requiring cloud providers to surface more power-related data is coming.
Currently it's hard to quantify cloud power consumption, hence the lack of observability tooling. Kepler offers CPU-centric power-level metrics.
Cluster API
Multi-cluster, in my opinion, is still not straightforward. There are a lot of approaches and solutions, but we haven't yet settled on the best way to manage multi-cluster or multi-provider workloads and networking.
The Cluster API SIG presented the challenges of naming CRDs and how hard it is to find common ground that allows different implementations of multi-cluster management. In my opinion, the talk boiled down to managing cluster metadata with CRDs. That could prove powerful in the future, but there are no significant leaps forward right now. An interesting presentation, though.
Lunch #3
Fun bit: while heading to lunch, I overheard a guy walking towards the coffee say to his colleague: “I need spice that extends consciousness”, referring to his need for coffee in Dune terms. I found it funny at the time.
WebSockets
Similar to gRPC, WebSockets tend to keep one connection open. An engineering team managing EV charging stations mentioned how UX degrades when connections have to be re-established multiple times in a short period, which is exactly what was happening during rolling deployments. They fixed it by switching to blue-green deployments using Argo Rollouts, so clients only needed to reconnect once.
The team also noticed how an event-driven architecture gave them a natural way of protecting downstream workloads from being overwhelmed. Worst case, back pressure causes the queue to fill up until messages are processed.
Notebooks
People generally use MacBooks (mainly the Pro model) and ThinkPads (mainly the X1 model). Other notebook brands were visible too, of course.
Altinity, Percona, Google
A panel consisting of the Altinity CEO (ClickHouse operator), Percona evangelist Edith Puclla and Clayton Coleman, a distinguished engineer from Google (who worked on StatefulSet, among other things), discussed databases in Kubernetes. The unanimous answer: yes to databases in Kubernetes. I agree. Many still don't.
Explore more
KubeVirt (VMs for impractical-to-containerize applications), Kaito (AI toolkit), Karmada (advanced scheduling), KServe (model inference platform).
To conclude, this is how you order two crèmes brûlées: “Deux crèmes brûlées, s'il vous plaît”. You’re welcome.