
Kubernetes StatefulSets vs Pods for Persistent Storage

By Jan on 02/05/2025

Learn why Kubernetes StatefulSets are essential for applications needing persistent state, even though stateless pods can access persistent volumes.


Introduction

This guide provides a comprehensive walkthrough of deploying Apache Solr on Kubernetes, with notes on Google Kubernetes Engine (GKE) where a managed offering is relevant. Solr serves as a worked example of a stateful application: it shows why persistent volumes alone are not enough once you run multiple replicas, and why a StatefulSet (rather than plain pods managed by a Deployment) is the natural fit for per-replica persistent state. We'll cover the essential steps, from setting up your Kubernetes cluster to deploying and managing your Solr application.

Step-by-Step Guide

  1. Install Kubernetes: Choose an installation method from the Kubernetes documentation (https://kubernetes.io/docs/setup/) that suits your environment (local, cloud, datacenter).
  2. Plan for etcd: etcd is the key-value store that holds all Kubernetes cluster data, so ensure a robust backup and recovery plan; on managed offerings such as GKE the control plane, including etcd, is operated for you. Refer to the official documentation for in-depth etcd management (https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/).
  3. Define Solr Deployment: Create a Kubernetes Deployment YAML file to define your Solr deployment. This file specifies the number of Solr replicas, resource limits, and other configurations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solr-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      containers:
      - name: solr
        image: solr:latest
        # Add any necessary configuration and volume mounts here
  4. Create a Service: Define a Kubernetes Service to expose your Solr deployment. This allows other applications to access Solr.
apiVersion: v1
kind: Service
metadata:
  name: solr-service
spec:
  selector:
    app: solr
  ports:
  - protocol: TCP
    port: 8983
    targetPort: 8983
  5. Consider Sidecar Containers: If your Solr setup requires additional functionality like logging or monitoring, explore using sidecar containers. Ensure proper configuration and monitoring of sidecars to avoid impacting application deployment.
  6. Persistent Storage: For stateful applications like Solr, utilize Kubernetes Volumes to persist data beyond the lifecycle of pods. This ensures data durability and consistency.
  7. Autoscaling (Optional): If you anticipate variable workloads, use a HorizontalPodAutoscaler for pod-level scaling (a minimal sketch follows this list); on GKE, custom compute classes additionally let you control the attributes of autoscaled nodes.
  8. Deploy to Kubernetes: Use kubectl, the Kubernetes command-line tool, to deploy your Solr application.
kubectl apply -f solr-deployment.yaml
kubectl apply -f solr-service.yaml
  9. Monitor and Manage: Regularly monitor your Solr deployment for health, performance, and resource utilization. Use Kubernetes tools and dashboards for efficient management.
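
To complement step 7, here is a pod-level autoscaling sketch using a standard HorizontalPodAutoscaler. The target name solr-deployment matches the Deployment defined below; the CPU utilization target is an illustrative value to tune for your query and indexing load, and the HPA only works if the Solr container declares CPU requests. GKE custom compute classes, by contrast, shape how autoscaled nodes are provisioned and are configured separately.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: solr-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: solr-deployment
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # illustrative target; tune for your workload

For a stateful SolrCloud setup you would typically point the autoscaler at a StatefulSet instead, or scale manually, since adding and removing index-bearing nodes has data-placement implications.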

Code Example

This code defines a Kubernetes setup for deploying a Solr search engine. It includes a Deployment that creates three Solr instances with persistent storage and a busybox sidecar container that tails Solr's log file from a shared emptyDir volume. A Service exposes Solr on port 8983, and a PersistentVolumeClaim requests 10Gi of storage for data persistence. Before deployment, a compatible PersistentVolume (or a StorageClass capable of dynamic provisioning) needs to be available in the Kubernetes cluster.

This example demonstrates deploying Solr on Kubernetes with a sidecar container for logging and persistent storage for data.

1. Solr Deployment (solr-deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: solr-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      containers:
      - name: solr
        image: solr:latest
        ports:
        - containerPort: 8983
        volumeMounts:
        - name: solr-data
          mountPath: /var/solr/data
        # Share the log directory with the sidecar (/var/solr/logs is the default
        # log location in the official Solr image)
        - name: solr-logs
          mountPath: /var/solr/logs
      - name: log-sidecar
        image: busybox
        # Touch the file first so tail does not exit before Solr writes its first log line
        command: ["sh", "-c", "touch /var/solr/logs/solr.log && tail -f /var/solr/logs/solr.log"]
        volumeMounts:
        - name: solr-logs
          mountPath: /var/solr/logs
      volumes:
      - name: solr-data
        persistentVolumeClaim:
          claimName: solr-pvc
      - name: solr-logs
        emptyDir: {}

2. Solr Service (solr-service.yaml):

apiVersion: v1
kind: Service
metadata:
  name: solr-service
spec:
  selector:
    app: solr
  ports:
  - protocol: TCP
    port: 8983
    targetPort: 8983

3. Persistent Volume Claim (solr-pvc.yaml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Explanation:

  • Deployment:
    • Deploys 3 replicas of Solr.
    • Mounts a PersistentVolumeClaim solr-pvc to /var/solr/data for data persistence.
    • Includes a sidecar container log-sidecar using the busybox image to tail Solr's log file.
    • Shares an emptyDir volume solr-logs (mounted at /var/solr/logs in both containers) so the sidecar can read the logs Solr writes.
  • Service:
    • Exposes the Solr deployment on port 8983.
  • PersistentVolumeClaim:
    • Requests 10Gi of storage with ReadWriteOnce access mode. Note that this single ReadWriteOnce claim is shared by all three Deployment replicas, which generally forces them onto one node and onto the same data directory; for per-replica storage, see the StatefulSet sketch after the Notes section.

Deployment Steps:

  1. Create the PersistentVolume: Before deploying, ensure you have a suitable PersistentVolume provisioned in your cluster that matches the requirements of the solr-pvc, or a StorageClass that can dynamically provision one.
  2. Apply the YAML files:
    kubectl apply -f solr-pvc.yaml
    kubectl apply -f solr-deployment.yaml
    kubectl apply -f solr-service.yaml

Notes:

  • This is a basic example and can be further customized based on your specific needs; in particular, a StatefulSet variant with per-pod storage is sketched after these notes.
  • Consider using a more robust logging solution like Fluentd or Logstash instead of a simple sidecar.
  • Configure appropriate resource limits and requests for the Solr container based on your workload.
  • Implement a backup and recovery strategy for your persistent volume to prevent data loss.
  • Monitor your deployment using Kubernetes tools and dashboards for optimal performance and stability.
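
StatefulSet Alternative:

Because the Deployment above binds all three replicas to the single ReadWriteOnce claim solr-pvc, the replicas generally end up on one node sharing one data directory, which Solr does not handle well. For multi-replica Solr with per-pod persistent storage, a StatefulSet with volumeClaimTemplates is the usual fit: each pod gets a stable identity and its own PersistentVolumeClaim. The sketch below is a minimal illustration; the headless Service name solr-headless and the fsGroup value (the solr user in the official image) are assumptions to verify for your setup.

apiVersion: v1
kind: Service
metadata:
  name: solr-headless        # headless Service required by the StatefulSet (assumed name)
spec:
  clusterIP: None
  selector:
    app: solr
  ports:
  - port: 8983
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: solr
spec:
  serviceName: solr-headless
  replicas: 3
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      securityContext:
        fsGroup: 8983        # group of the solr user in the official image, so it can write to the volume
      containers:
      - name: solr
        image: solr:latest
        ports:
        - containerPort: 8983
        volumeMounts:
        - name: solr-data
          mountPath: /var/solr/data
  volumeClaimTemplates:      # one PVC per pod instead of one shared claim
  - metadata:
      name: solr-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Each pod then gets a stable name (solr-0, solr-1, solr-2) and its own claim (solr-data-solr-0, and so on), which is the StatefulSets-versus-pods distinction this article's title refers to.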

Additional Notes

General:

  • Resource Planning: Before deploying, carefully plan your resource requirements (CPU, memory, storage) for both Solr and any sidecar containers. Consider factors like expected query volume, indexing frequency, and data size.
  • Security: Implement appropriate security measures, such as network policies, role-based access control (RBAC), and secure communication channels (TLS/SSL) to protect your Solr deployment (a minimal NetworkPolicy sketch follows this list).
  • Configuration Management: Use a configuration management tool like Helm or Kustomize to streamline deployments, manage configuration files, and simplify updates.
  • Testing: Thoroughly test your Solr deployment in a staging environment before deploying to production. This includes performance testing, load testing, and functional testing.
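
As a concrete example of the security point above, a NetworkPolicy can restrict which pods may reach Solr on port 8983. The client label app: search-client is purely illustrative, and policies are only enforced if your cluster's network plugin supports them (GKE Dataplane V2 does, for example).

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: solr-allow-clients
spec:
  podSelector:
    matchLabels:
      app: solr
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: search-client   # illustrative label for permitted client pods
    ports:
    - protocol: TCP
      port: 8983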

Solr Specific:

  • Solr Configuration: The provided YAML files offer a basic setup. You'll likely need to customize Solr configuration files (e.g., solrconfig.xml, schema.xml) based on your specific indexing and search requirements (a ConfigMap sketch follows this list).
  • SolrCloud: For larger deployments or high availability, consider using SolrCloud, which allows you to distribute your Solr index across multiple nodes for scalability and fault tolerance.
  • Monitoring and Logging: Implement robust monitoring and logging solutions to track Solr's health, performance, and errors. This will help you identify and troubleshoot issues proactively.
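
To make the configuration point above concrete, one common pattern is to ship a custom configset in a ConfigMap. This is only a sketch: the ConfigMap name is hypothetical, and for SolrCloud you would normally upload the configset to ZooKeeper rather than mount files into the container.

apiVersion: v1
kind: ConfigMap
metadata:
  name: solr-configset            # hypothetical name
data:
  solrconfig.xml: |
    <!-- your solrconfig.xml contents -->
  managed-schema: |
    <!-- your schema contents -->

You would then mount this ConfigMap into the Solr container with a configMap volume (for example under the image's configsets directory) and reference it when creating the core or collection; verify the exact path for your image and Solr version.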

Kubernetes Specific:

  • Liveness and Readiness Probes: Configure liveness and readiness probes for your Solr containers so that Kubernetes can detect and handle unhealthy instances (a combined probe and PodDisruptionBudget sketch follows this list).
  • Resource Limits and Requests: Set appropriate resource limits and requests for your Solr containers to prevent resource contention and ensure predictable performance.
  • Pod Disruption Budgets (PDBs): Use PDBs to define how many replicas of your Solr deployment can be down simultaneously during maintenance operations like upgrades or node draining. This helps maintain availability during planned disruptions.
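
To illustrate the probe and disruption-budget points above, here is a hedged sketch. The probe path /solr/admin/info/system is a commonly used Solr admin endpoint, but verify the path and timings against your Solr version, and adjust minAvailable to your own availability target.

# Probe fragment, added under the solr container in the Deployment or StatefulSet:
        readinessProbe:
          httpGet:
            path: /solr/admin/info/system   # verify against your Solr version
            port: 8983
          initialDelaySeconds: 20
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /solr/admin/info/system
            port: 8983
          initialDelaySeconds: 60
          periodSeconds: 15

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: solr-pdb
spec:
  minAvailable: 2               # keep at least 2 of the 3 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: solr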

Sidecar Containers:

  • Resource Optimization: Choose lightweight sidecar images and configure them to minimize resource consumption, especially if you have multiple sidecars running alongside your Solr containers.
  • Sidecar Communication: If your sidecar needs to communicate with the main Solr container, establish a secure and efficient communication mechanism (e.g., shared volumes, localhost networking). A native-sidecar declaration sketch follows this list.
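
On Kubernetes 1.29 and newer, sidecars can also be declared natively as init containers with restartPolicy: Always, which starts them before the main container and terminates them with the pod. A sketch of the logging sidecar from the example declared this way, under the same shared-log-volume assumptions:

# Pod spec fragment (Kubernetes 1.29+ native sidecar support):
      initContainers:
      - name: log-sidecar
        image: busybox
        restartPolicy: Always    # marks this init container as a long-running sidecar
        command: ["sh", "-c", "touch /var/solr/logs/solr.log && tail -f /var/solr/logs/solr.log"]
        volumeMounts:
        - name: solr-logs
          mountPath: /var/solr/logs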

Persistent Storage:

  • Storage Class: Choose a storage class that meets your performance and availability requirements. Consider factors like IOPS, throughput, and replication.
  • Backup and Recovery: Implement a robust backup and recovery strategy for your persistent volumes to protect against data loss. Regularly back up your Solr data and test your recovery procedures (a VolumeSnapshot sketch follows this list).
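
As a starting point for the backup bullet above, clusters whose CSI driver supports snapshots can snapshot the Solr claim. The VolumeSnapshotClass name csi-snapclass is an assumption to replace with a class available in your cluster, and for application-consistent backups Solr's own backup API is generally preferable to raw volume snapshots.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: solr-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed; use a snapshot class defined in your cluster
  source:
    persistentVolumeClaimName: solr-pvc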

Summary

This document outlines the steps to deploy a Solr application on a Kubernetes cluster.

Prerequisites:

  • A running Kubernetes cluster.
  • kubectl command-line tool installed and configured.

Steps:

  1. Install Kubernetes: Choose an appropriate installation method for your environment from the official Kubernetes documentation.
  2. Plan for etcd: Ensure a robust backup and recovery plan for etcd, the key-value store for Kubernetes cluster data (managed offerings such as GKE operate etcd for you).
  3. Define Solr Deployment: Create a solr-deployment.yaml file to define your Solr deployment, specifying replicas, resource limits, and other configurations.
  4. Create a Service: Define a solr-service.yaml file to expose your Solr deployment, allowing other applications to access it.
  5. Consider Sidecar Containers: Utilize sidecar containers for additional functionalities like logging or monitoring, ensuring proper configuration and monitoring.
  6. Persistent Storage: Implement Kubernetes Volumes to persist Solr data beyond the lifecycle of pods, ensuring data durability and consistency.
  7. Autoscaling (Optional): Leverage GKE's custom compute classes for fine-tuned autoscaling based on your application's workload.
  8. Deploy to Kubernetes: Use kubectl to deploy your Solr application:
    kubectl apply -f solr-deployment.yaml
    kubectl apply -f solr-service.yaml
  9. Monitor and Manage: Regularly monitor your Solr deployment for health, performance, and resource utilization using Kubernetes tools and dashboards.

Key Considerations:

  • etcd Management: Proper etcd management is crucial for cluster stability.
  • Sidecar Configuration: Carefully configure and monitor sidecar containers to avoid impacting application deployment.
  • Persistent Storage: Choose appropriate storage solutions based on your data persistence needs.
  • Monitoring and Management: Implement robust monitoring and management practices for optimal performance and stability.

Conclusion

Deploying Apache Solr on Kubernetes offers a robust and scalable solution for search infrastructure. By leveraging Kubernetes features like persistent storage, sidecar containers, and autoscaling, you can create a highly available and performant Solr deployment. Remember to carefully consider your specific requirements, configure Solr and Kubernetes components appropriately, and implement robust monitoring and management practices for optimal performance and stability.

References

  • Operating etcd clusters for Kubernetes | Kubernetes — etcd is a consistent and highly-available key value store used as Kubernetes' backing store for all cluster data; make sure you have a backup plan for it.
  • About custom compute classes | Google Kubernetes Engine (GKE) — Understand how to control autoscaled node attributes in GKE clusters.
  • Getting started | Kubernetes — The different ways to set up and run Kubernetes: on a local machine, in the cloud, or in your own datacenter.
  • Kubernetes Sidecar Containers: Use Cases and Best Practices — Essential steps for configuring sidecars and best practices for monitoring so sidecar containers don't hinder application deployment.
  • Kubernetes Tutorial for Beginners FULL COURSE in 4 Hours (Nov 6, 2020) — Persisting data in Kubernetes with volumes; deployment of stateful and stateless applications.
