Kubernetes Patterns for High Availability
When designing applications for Kubernetes, high availability is often a critical requirement. In this post, I'll share some battle-tested patterns that we've implemented at Meituan to keep our services available during node failures, zone outages, and maintenance operations.
Pod Anti-Affinity
One of the most basic patterns for high availability is ensuring your pods don't all run on the same node. This can be achieved using pod anti-affinity:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - my-app
      topologyKey: "kubernetes.io/hostname"
```
This configuration ensures that no two pods with the label app=my-app are scheduled on the same node.
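Because the rule above is a hard requirement, a replica stays Pending when there are fewer eligible nodes than replicas. Where that trade-off is unacceptable, a soft variant may be a better fit. The following is a minimal sketch, not a configuration taken from our clusters; the weight of 100 is an illustrative assumption:

```yaml
affinity:
  podAntiAffinity:
    # Soft rule: the scheduler prefers separate nodes but will
    # co-locate pods if no other node is available.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100  # assumed value; valid range is 1-100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app
        topologyKey: "kubernetes.io/hostname"
```

The preferred form trades a strict guarantee for schedulability, which tends to matter most in small clusters or during scale-up.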
Topology Spread Constraints
For more advanced distribution of pods across failure domains, you can use topology spread constraints:
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-app
```
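With maxSkew: 1, the number of matching pods in any two zones can differ by at most one, so your pods are spread evenly across availability zones. Because whenUnsatisfiable is set to DoNotSchedule, a pod that would break this balance stays Pending instead of being placed.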
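Constraints can also be combined. As a sketch of one possible setup (the second constraint and its ScheduleAnyway setting are assumptions for illustration, not part of the configuration above), you could keep the hard zone constraint and add a soft per-node constraint:

```yaml
topologySpreadConstraints:
# Hard constraint: keep zones balanced to within one pod.
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-app
# Soft constraint (assumed): prefer spreading across nodes,
# but still schedule the pod if that is not possible.
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: my-app
```

When several constraints are listed, the scheduler requires every DoNotSchedule constraint to hold and treats ScheduleAnyway constraints as scoring preferences.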
Pod Disruption Budgets
To protect your application during voluntary disruptions (like node drains during maintenance), use Pod Disruption Budgets (PDBs):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # or use maxUnavailable
  selector:
    matchLabels:
      app: my-app
```
This ensures that at least 2 pods of your application remain available during voluntary disruptions.
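One caveat: if the workload runs exactly 2 replicas, minAvailable: 2 leaves no room for any eviction and a node drain will block. A budget expressed with maxUnavailable scales with the replica count instead. The following is a sketch of that alternative; the value 1 is an assumption, and the two fields are mutually exclusive within a single PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # Allow at most one matching pod to be evicted at a time,
  # regardless of how many replicas exist (assumed value).
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```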
Readiness Probes
Properly configured readiness probes ensure that traffic is only sent to pods that are ready to handle requests:
```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
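The probe above relies on the Kubernetes defaults for timeout and thresholds. If you want to control how quickly a pod is removed from and returned to service, those fields can be set explicitly. This is a sketch with assumed values that you would need to tune for your own service:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2     # assumed: fail a check that takes longer than 2s
  failureThreshold: 3   # assumed: mark unready after 3 consecutive failures
  successThreshold: 1   # assumed: mark ready again after 1 success
```

With these values, a pod that stops responding is taken out of the Service endpoints after roughly 30 seconds (3 failed checks at a 10 second period).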
Circuit Breaking
At Meituan, we've implemented circuit breaking at multiple levels:
- Service Mesh Level: Using Istio to implement circuit breaking between services
- Application Level: Using libraries like resilience4j to implement circuit breaking in the application code
- Database Level: Implementing connection pooling with circuit breaking to protect databases from overload
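For the service mesh level, circuit breaking in Istio is typically configured through a DestinationRule. The sketch below is illustrative only: the resource name, host, and every threshold are assumptions, not the settings we run in production.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app-circuit-breaker  # hypothetical name
spec:
  host: my-app.default.svc.cluster.local  # hypothetical service host
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # cap on TCP connections to the service
      http:
        http1MaxPendingRequests: 50  # cap on queued HTTP/1.1 requests
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5  # eject a host after 5 consecutive 5xx responses
      interval: 10s            # how often hosts are evaluated
      baseEjectionTime: 30s    # minimum time an ejected host stays out
      maxEjectionPercent: 50   # never eject more than half of the hosts
```

The connectionPool limits reject excess load early, while outlierDetection temporarily removes unhealthy endpoints from the load-balancing pool.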
Conclusion
High availability in Kubernetes requires a multi-layered approach. By combining these patterns, you can build resilient applications that can withstand various types of failures.
In my next post, I'll dive deeper into disaster recovery strategies for stateful applications on Kubernetes.