34 changes: 34 additions & 0 deletions content/posts/2025/how-to-improve-stabity-low-cost-aks/index.md
@@ -0,0 +1,34 @@
---
url: "/posts/how-to-improve-stabity-low-cost-aks/"
title: How I helped my company improve the stability of low-cost AKS clusters
draft: true
date: 2025-12-07
tags: [Kubernetes, AKS, Stability, DevOps]

---

## Introduction

At my company, we run our product on Azure Kubernetes Service (AKS) clusters. For non-production environments, we use spot machines as nodes. This means those nodes can be evicted at any time when Azure needs the capacity back. The trade-off is that we pay much less for them, sometimes up to 80% less than the regular price.

Over the past few months of running our product on those low-cost AKS clusters, we started to experience stability issues. They kept getting worse, and we had to solve the problem.

I was given a task: improve the stability of the clusters while keeping the cost low.

Sounds challenging, right?

Let's dive in.

## Chapter 1: Identifying the stability problem

## Chapter 2: Finding a solution

## Chapter 3: Taints and tolerations
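
As a rough sketch of the mechanism: AKS adds the taint `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` to spot node pools, so a pod can only land on a spot node if it carries a matching toleration. The Python snippet below models that rule with plain dictionaries. The helper names (`spot_toleration`, `tolerates`, `can_schedule`) and the simplified matching logic are assumptions made for illustration; they are not our actual configuration and not the scheduler's real implementation.

```python
# Taint that AKS places on spot node pools.
SPOT_TAINT = {
    "key": "kubernetes.azure.com/scalesetpriority",
    "value": "spot",
    "effect": "NoSchedule",
}


def spot_toleration():
    """Return a toleration matching the AKS spot taint (hypothetical helper)."""
    return {
        "key": SPOT_TAINT["key"],
        "operator": "Equal",
        "value": SPOT_TAINT["value"],
        "effect": SPOT_TAINT["effect"],
    }


def tolerates(toleration, taint):
    """Simplified version of the Kubernetes matching rule (illustration only)."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key combined with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value") == taint.get("value")
    )


def can_schedule(pod_tolerations, node_taints):
    """A pod fits a node only if every hard taint on the node is tolerated."""
    hard_effects = ("NoSchedule", "NoExecute")
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in hard_effects
    )


if __name__ == "__main__":
    print(can_schedule([spot_toleration()], [SPOT_TAINT]))
    print(can_schedule([], [SPOT_TAINT]))
```

In a real manifest, the dictionary returned by `spot_toleration()` corresponds to one entry under `spec.tolerations` of a pod. A pod without that entry stays off the spot pool, which is one way to keep critical workloads on regular nodes.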

## Chapter 4: Applying the changes and monitoring results

The most important part, just as in developing software, is to test and monitor new changes.

The k8s world is no different. What looks good in theory can fail in practice. For that reason, before actually applying the changes to the clusters, I ran several series of tests to make sure they would work.

## Conclusion