Mastering Distributed Locking & Consensus: Redis, ZooKeeper, and Etcd Explained

Introduction

In the realm of modern software architecture, distributed systems have become the norm. Microservices, cloud-native applications, and horizontally scalable databases all rely on multiple independent processes or nodes working together to achieve a common goal. While this distributed nature offers immense benefits in terms of scalability, fault tolerance, and resilience, it introduces a unique set of challenges, particularly around coordinating shared resources and maintaining data consistency across disparate components.

Two critical concepts emerge when tackling these challenges: Distributed Locking and Consensus. Distributed locking ensures that only one process can access a shared resource at any given time, preventing race conditions and data corruption. Consensus, on the other hand, is about getting multiple nodes to agree on a single value or state, even in the face of failures. Mastering these concepts and the tools that implement them is paramount for building robust, reliable, and performant distributed applications.

This comprehensive guide will delve deep into the mechanics of distributed locking and consensus, exploring three prominent technologies: Redis, Apache ZooKeeper, and Etcd. We'll uncover their underlying principles, examine their strengths and weaknesses, provide practical code examples, and discuss best practices for leveraging them in real-world scenarios.

Prerequisites

To get the most out of this guide, a basic understanding of the following concepts will be beneficial:

  • Distributed Systems Fundamentals: Concepts like network latency, partial failures, and concurrency.
  • Basic Programming: Familiarity with Python, Java, or Go for understanding code examples.
  • Key-Value Stores: General knowledge of how key-value databases work.

Understanding Distributed Locking

Distributed locking is a mechanism used to control access to a shared resource in a distributed environment. Just as a mutex or semaphore protects shared memory in a single process, a distributed lock protects shared resources (like a database record, a file, or a section of code) across multiple processes running on different machines.

Why is Distributed Locking Challenging?

Implementing a robust distributed lock is significantly harder than its single-process counterpart due to the inherent complexities of distributed systems:

  1. Network Partitions: What happens if the network connection between a client holding a lock and the lock service is severed? The client might assume it still holds the lock, while the lock service might have timed it out and granted it to another client.
  2. Process Crashes: If a client holding a lock crashes before releasing it, the lock can be held indefinitely, leading to a deadlock.
  3. Clock Skew: Different machines can have slightly different clocks, making time-based lock expiration unreliable.
  4. Performance: The overhead of acquiring and releasing locks across a network can impact application performance.
  5. Fault Tolerance: The lock service itself must be highly available and fault-tolerant; otherwise, it becomes a single point of failure.

These challenges highlight the need for sophisticated algorithms and robust infrastructure to implement distributed locking effectively.

The Role of Consensus

While distributed locking focuses on mutual exclusion, consensus is a broader and more fundamental problem in distributed computing. It's about ensuring that all non-faulty processes in a distributed system agree on a single value or decision. This agreement must be reached even if some processes fail or messages are lost or delayed.

Consensus protocols are the bedrock for achieving strong consistency, leader election, and atomic commits in distributed databases and systems. Famous consensus algorithms include Paxos and Raft. Raft, in particular, was designed to be more understandable than Paxos while offering similar fault tolerance guarantees. These protocols typically involve a series of message exchanges, proposals, and votes among a set of nodes to reach a definitive agreement.

Redis for Distributed Locking

Redis, primarily an in-memory data structure store, is often pressed into service for distributed locking due to its speed and atomic operations. While it's excellent for simple, best-effort locks, it's crucial to understand its limitations regarding strong consistency guarantees.

How SET NX PX Works

The most common way to implement a distributed lock with Redis is using the SET command with the NX (Not eXist) and PX (expire in milliseconds) options. This command allows you to set a key only if it doesn't already exist and simultaneously set an expiration time (TTL).

  • SET mylock_key my_unique_value NX PX 10000:
    • NX: Ensures the key is only set if it does not already exist. This is the atomic "acquire lock" operation.
    • PX 10000: Sets an expiration of 10,000 milliseconds (10 seconds) for the lock. This is crucial for preventing deadlocks if a client crashes.

To release the lock, the client simply DELetes the key. However, to prevent one client from deleting another client's lock (e.g., if the original client's lock expired and another client acquired it), the my_unique_value (often a UUID) is used. The client should only delete the key if its value matches the one it set.

import redis
import uuid
import time

class RedisLock:
    def __init__(self, host='localhost', port=6379, db=0):
        self.client = redis.Redis(host=host, port=port, db=db)
        self.lock_key_prefix = "distributed_lock:"

    def acquire_lock(self, resource_name, expiry_ms=5000, max_retries=5, retry_delay_s=0.1):
        lock_key = self.lock_key_prefix + resource_name
        lock_value = str(uuid.uuid4()) # Unique value for this lock attempt

        for _ in range(max_retries):
            # SET NX PX: Set if Not eXist, with Px (milliseconds) expiry
            if self.client.set(lock_key, lock_value, nx=True, px=expiry_ms):
                print(f"Acquired lock '{resource_name}' with value '{lock_value}'")
                return lock_value # Return the unique value to identify the lock
            print(f"Failed to acquire lock '{resource_name}', retrying...")
            time.sleep(retry_delay_s)
        print(f"Max retries reached, could not acquire lock '{resource_name}'")
        return None

    def release_lock(self, resource_name, lock_value):
        lock_key = self.lock_key_prefix + resource_name
        # Use a Lua script for atomic check-and-delete
        # This prevents releasing a lock that has been acquired by another client
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        
        # Execute the Lua script atomically
        if self.client.eval(lua_script, 1, lock_key, lock_value):
            print(f"Released lock '{resource_name}' with value '{lock_value}'")
            return True
        else:
            print(f"Failed to release lock '{resource_name}' (value mismatch or lock already expired)")
            return False

# Example Usage:
if __name__ == "__main__":
    lock_manager = RedisLock()
    resource = "my_critical_resource"

    # Attempt to acquire lock
    my_lock_value = lock_manager.acquire_lock(resource, expiry_ms=2000)

    if my_lock_value:
        try:
            print("Performing critical operation...")
            time.sleep(1) # Simulate work
        finally:
            # Ensure lock is released even if an error occurs
            lock_manager.release_lock(resource, my_lock_value)
    else:
        print("Could not perform critical operation, lock not acquired.")

    # Demonstrate another process trying to acquire the lock
    print("\n--- Another process trying to acquire ---")
    another_lock_value = lock_manager.acquire_lock(resource, expiry_ms=2000)
    if another_lock_value:
        lock_manager.release_lock(resource, another_lock_value)

The Redlock Algorithm

While SET NX PX works for a single Redis instance, it suffers from a single point of failure. If the Redis master crashes and a replica is promoted, any locks acquired on the old master might be lost or become invalid if the replica hasn't synchronized them. This led to the proposal of Redlock by Salvatore Sanfilippo (creator of Redis).

Redlock attempts to achieve a more robust distributed lock by requiring clients to acquire locks on a majority of independent Redis master instances (e.g., 3 out of 5). The algorithm involves:

  1. Getting the current time in milliseconds.
  2. Attempting to acquire the lock on all N Redis instances using SET NX PX with a small, uniform timeout.
  3. Calculating the time elapsed since step 1.
  4. If the lock was acquired on a majority of instances (N/2 + 1) AND the elapsed time is less than the lock's validity time (expiry_ms minus network latency), the lock is considered successfully acquired.
  5. If not, the client releases the lock on all instances it managed to acquire it from.
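
A hedged sketch of these five steps in Python (the function names and the clock-drift margin are my own; production Redlock implementations add per-instance timeouts and retry logic on top of this):

```python
import time
import uuid

def quorum(n_instances):
    """Majority of instances needed for a Redlock acquisition (N/2 + 1)."""
    return n_instances // 2 + 1

def remaining_validity_ms(expiry_ms, elapsed_ms, clock_drift_ms=2):
    """Lock validity left after the acquisition round (step 4).

    The drift term is a small safety margin for clock differences
    between the Redis instances.
    """
    return expiry_ms - elapsed_ms - clock_drift_ms

def acquire_redlock(redis_clients, resource, expiry_ms=10000):
    """Steps 1-5 of the algorithm against N independent instances."""
    value = str(uuid.uuid4())
    start = time.monotonic()                        # step 1
    acquired = []
    for client in redis_clients:                    # step 2
        try:
            if client.set(resource, value, nx=True, px=expiry_ms):
                acquired.append(client)
        except Exception:
            pass                                    # unreachable instance = failed vote
    elapsed_ms = (time.monotonic() - start) * 1000  # step 3
    validity = remaining_validity_ms(expiry_ms, elapsed_ms)
    if len(acquired) >= quorum(len(redis_clients)) and validity > 0:  # step 4
        return value, validity
    for client in acquired:                         # step 5: undo partial acquisition
        try:
            client.delete(resource)
        except Exception:
            pass
    return None, 0
```

Note that a failed round must release the instances it did acquire; otherwise partial acquisitions block every other client until their TTLs expire.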

Redis Limitations and Use Cases

Redlock itself has faced significant criticism from distributed systems experts (e.g., Martin Kleppmann), mainly concerning its reliance on synchronized clocks and its behavior during network partitions, which can lead to multiple clients believing they hold the same lock.

Limitations:

  • No Strong Consistency: Redis replication is asynchronous, so a lock held on a single instance can be lost if the master fails over before a replica has seen the SET. While SET NX is atomic, this makes single-instance locks unsafe across failovers. Redlock attempts to mitigate this but introduces its own complexities.
  • Clock Skew Sensitivity: Redlock's correctness heavily relies on clocks being synchronized, which is hard to guarantee in practice.
  • Complexity: Implementing Redlock correctly, including client-side retry logic and error handling, is non-trivial.

Use Cases:

  • Best-Effort Locks: When occasional lock contention or false positives are acceptable (e.g., idempotent tasks that can be run multiple times safely).
  • Rate Limiting: Simple throttling mechanisms.
  • Task Queues: Ensuring only one worker processes a specific message.

For critical operations requiring strong consistency and strict mutual exclusion, Redis locks, even Redlock, might not be the most suitable choice.

Apache ZooKeeper for Consensus

Apache ZooKeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. It's designed to be highly available and to present a consistent view of its data to all clients, making it an excellent foundation for building distributed coordination services.

Architecture and ZAB Protocol

ZooKeeper operates as an ensemble of servers (typically an odd number like 3, 5, or 7) that use the ZAB (ZooKeeper Atomic Broadcast) protocol to maintain a consistent, replicated state. ZAB ensures that all updates to the ZooKeeper state are applied in the same order on all servers. It achieves consensus through:

  • Leader Election: One server is elected as the leader, responsible for processing all write requests. Followers replicate the leader's state.
  • Atomic Broadcast: All updates (transactions) are proposed by the leader and committed only after a majority of followers acknowledge them.

Clients connect to any server in the ensemble. Read requests can be served by any server, while write requests are forwarded to the leader. If the leader fails, a new leader is elected, ensuring high availability.

Znodes, Watches, Ephemeral and Sequential Nodes

ZooKeeper's data model is a hierarchical namespace similar to a file system, composed of Znodes. Each Znode can store data and have children.

  • Persistent Znodes: Exist until explicitly deleted.
  • Ephemeral Znodes: Automatically deleted when the client that created them disconnects from ZooKeeper. This is crucial for detecting client failures and implementing leader election.
  • Sequential Znodes: When created, ZooKeeper appends a monotonically increasing counter to the Znode's name. This is useful for creating ordered queues or ensuring fairness in locking.
  • Watches: Clients can set watches on Znodes to be notified of changes (data changes, child changes, Znode deletion). This allows clients to react to state changes without constant polling.

Distributed Locking with ZooKeeper

ZooKeeper is ideal for distributed locking due to its strong consistency guarantees, ephemeral nodes, and sequential nodes. A common pattern for distributed locks using ZooKeeper is:

  1. Create a Lock Directory: A persistent Znode (e.g., /locks/mylock) is created for the specific resource.
  2. Acquire Lock: A client creates an ephemeral, sequential Znode under the lock directory (e.g., /locks/mylock/lock-0000000001).
  3. Check Position: The client gets all children of the lock directory and checks if its created Znode has the smallest sequence number. If it does, the client holds the lock.
  4. Wait for Lock: If the client's Znode is not the smallest, it sets a watch on the Znode immediately preceding its own (e.g., /locks/mylock/lock-0000000000). When that preceding Znode is deleted (meaning the previous lock holder released it or failed), the client is notified and re-checks its position.
  5. Release Lock: The client simply deletes its ephemeral Znode.

This pattern inherently handles client crashes (ephemeral nodes disappear) and ensures fairness (sequential nodes guarantee FIFO access).
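
The position check in steps 3 and 4 reduces to a small pure helper. A minimal Python sketch (the function name is my own; real client libraries such as Apache Curator or kazoo bundle this logic together with the watch handling):

```python
def lock_status(children, my_node):
    """Steps 3-4 of the recipe: given the children of the lock directory and
    this client's own Znode name, return (holds_lock, node_to_watch).

    Sequential Znodes end in a zero-padded counter, so sorting the names
    gives the FIFO acquisition order.
    """
    ordered = sorted(children)
    position = ordered.index(my_node)
    if position == 0:
        return True, None                  # smallest sequence number: lock is ours
    return False, ordered[position - 1]    # watch only the node just ahead of us
```

Watching only the immediate predecessor, rather than the whole directory, is what avoids a thundering herd of notifications when a lock is released.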

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.concurrent.TimeUnit;

public class ZookeeperDistributedLock {

    private static final String ZK_CONNECTION_STRING = "127.0.0.1:2181";
    private static final String LOCK_PATH = "/mylock";

    public static void main(String[] args) {
        CuratorFramework client = null;
        try {
            // 1. Create a Curator client instance
            client = CuratorFrameworkFactory.newClient(
                    ZK_CONNECTION_STRING,
                    new ExponentialBackoffRetry(1000, 3)
            );
            client.start();
            client.blockUntilConnected(5, TimeUnit.SECONDS);
            System.out.println("Connected to ZooKeeper!");

            // 2. Create an InterProcessMutex (distributed lock) for the given path
            InterProcessMutex lock = new InterProcessMutex(client, LOCK_PATH);

            System.out.println("Attempting to acquire lock...");
            // 3. Acquire the lock with a timeout
            if (lock.acquire(10, TimeUnit.SECONDS)) {
                try {
                    System.out.println("Lock acquired by process " + Thread.currentThread().getId());
                    // Simulate critical section work
                    Thread.sleep(5000);
                    System.out.println("Critical operation complete.");
                } finally {
                    // 4. Release the lock
                    lock.release();
                    System.out.println("Lock released by process " + Thread.currentThread().getId());
                }
            } else {
                System.out.println("Failed to acquire lock by process " + Thread.currentThread().getId());
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (client != null) {
                client.close();
                System.out.println("ZooKeeper client closed.");
            }
        }
    }
}

ZooKeeper Use Cases

  • Leader Election: Electing a primary node in a cluster.
  • Distributed Locking: Providing strong mutual exclusion.
  • Configuration Management: Centralized storage and dynamic updates of application configurations.
  • Service Discovery: Registering and discovering services in a distributed environment.
  • Distributed Queues/Barriers: Implementing complex coordination primitives.

Etcd for Distributed Consensus

Etcd is a distributed, reliable key-value store for the most critical data of a distributed system. It's designed to be simple, secure, and fast, and it uses the Raft consensus algorithm to ensure strong consistency and high availability. Etcd is a core component of Kubernetes, used for storing cluster state and configuration.

Raft Algorithm Implementation

Etcd's primary strength lies in its robust implementation of the Raft consensus algorithm. Raft ensures that all committed operations are replicated across a majority of nodes in an Etcd cluster, guaranteeing strong consistency (linearizability).

Key aspects of Etcd's Raft implementation:

  • Leader Election: Nodes elect a leader, which is responsible for handling all client requests and replicating logs to followers.
  • Log Replication: The leader appends new entries (client requests) to its log and replicates them to followers. An entry is committed once a majority of nodes have stored it.
  • Safety: Raft guarantees that committed entries are durable and that only one leader can commit entries for a given term.

Key-Value Store Features and Watch Mechanism

Etcd provides a simple API for storing and retrieving key-value pairs. Beyond basic CRUD operations, it offers powerful features for distributed coordination:

  • Leases (TTL): Keys can be associated with leases that have a time-to-live (TTL). If the lease expires, all keys attached to it are automatically deleted. This is fundamental for implementing ephemeral locks and heartbeating.
  • Revisions: Every modification to the Etcd store increments a global revision number. Clients can retrieve historical values based on revisions, enabling transactional semantics.
  • Watch API: Clients can watch for changes on specific keys or ranges of keys. This allows for real-time notification of updates, crucial for service discovery and configuration management.
  • Transactions: Multiple operations can be grouped into an atomic transaction, ensuring all or none are applied.
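
Lease semantics are worth internalizing before building locks on them. A toy Python model (the class and method names are my own, not the etcd client API) of how a lease deadline interacts with KeepAlive refreshes:

```python
import time

class Lease:
    """Toy lease: keys attached to it are considered gone once it expires,
    unless the holder keeps refreshing (KeepAlive) it before the deadline."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.deadline = clock() + ttl_s
        self.keys = set()

    def attach(self, key):
        self.keys.add(key)

    def refresh(self):
        # KeepAlive: push the deadline out by one full TTL from now
        self.deadline = self.clock() + self.ttl_s

    def expired(self):
        return self.clock() >= self.deadline
```

The key property: a crashed client stops refreshing, the lease expires, and every key attached to it (including any lock key) vanishes automatically.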

Distributed Locking with Etcd

Etcd's leases and transactional capabilities make it well-suited for distributed locking. The general approach:

  1. Create a Lease: The client creates a lease with a specific TTL.
  2. Acquire Lock: The client attempts to create a unique, ephemeral key (e.g., /locks/mylock) associated with its lease, using a transactional compare-and-swap operation. This ensures atomicity: the key is only created if it doesn't already exist.
  3. Hold Lock: While holding the lock, the client periodically sends heartbeats to refresh its lease, preventing it from expiring prematurely.
  4. Release Lock: The client explicitly deletes the key or lets the lease expire.
  5. Wait for Lock: If the lock acquisition fails, the client can use the Watch API to monitor the lock key. When it's deleted, the client retries the acquisition.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	etcd "go.etcd.io/etcd/client/v3"
	etcd_concurrency "go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Connect to etcd cluster
	client, err := etcd.New(etcd.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Create a session with a 5-second TTL
	session, err := etcd_concurrency.NewSession(client, etcd_concurrency.WithTTL(5))
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Create a new mutex (distributed lock)
	mutex := etcd_concurrency.NewMutex(session, "/mylock_key")

	// Acquire the lock
	fmt.Println("Attempting to acquire lock...")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) // Overall timeout for acquisition
	defer cancel()

	if err := mutex.Lock(ctx); err != nil {
		log.Printf("Failed to acquire lock: %v\n", err)
		return
	}

	fmt.Println("Lock acquired!")
	// Simulate critical section work
	time.Sleep(3 * time.Second)
	fmt.Println("Critical operation complete.")

	// Release the lock
	if err := mutex.Unlock(context.TODO()); err != nil {
		log.Printf("Failed to release lock: %v\n", err)
	}
	fmt.Println("Lock released.")

	// Demonstrate another process trying to acquire the lock
	fmt.Println("\n--- Another process trying to acquire ---")
	anotherMutex := etcd_concurrency.NewMutex(session, "/mylock_key")
	// Since the first lock was released above, this second acquisition
	// normally succeeds and execution takes the else branch.
	if err := anotherMutex.Lock(context.Background()); err != nil {
		log.Printf("Another process failed to acquire lock: %v\n", err)
	} else {
		fmt.Println("Another process acquired lock.")
		// In a real scenario, this would be a separate process.
		anotherMutex.Unlock(context.TODO())
		fmt.Println("Another process released lock.")
	}
}

Etcd Use Cases

  • Kubernetes Backend: Stores all cluster data, state, and configuration.
  • Service Discovery: Registering and discovering microservices.
  • Distributed Configuration: Centralized storage and dynamic updates of application settings.
  • Leader Election: Electing a primary node in a distributed system.
  • Distributed Locks and Barriers: Ensuring mutual exclusion and coordinating workflows.

Comparing the Solutions

| Feature | Redis (SET NX PX) | ZooKeeper | Etcd |
| --- | --- | --- | --- |
| Primary use | Cache, message broker, simple locks | Distributed coordination, configuration, leader election | Distributed key-value store, configuration, service discovery |
| Consistency | Eventual (single instance); best-effort (Redlock) | Strong (ZAB protocol; writes linearizable, reads may lag unless sync() is called) | Strong (Raft protocol; linearizable reads and writes) |
| Availability | High (Redis Sentinel/Cluster), but lock consistency issues during failover | High (ensemble, ZAB protocol) | High (cluster, Raft protocol) |
| Performance | Extremely fast (in-memory) | Moderate (disk-backed, network overhead) | Fast (in-memory caching, disk-backed, network overhead) |
| Complexity | Simple for basic locks; complex for Redlock | Moderate to high (requires understanding Znodes, watches) | Moderate (simpler API than ZooKeeper; Raft is abstracted away) |
| Data model | Key-value store with various data structures | Hierarchical Znodes (file-system-like) | Flat key-value store with ranges, revisions, leases |
| Lock mechanism | SET NX PX (single instance); Redlock (multiple) | Ephemeral sequential Znodes, watches | Leases, transactions, watch API, concurrency package |
| Fault tolerance | Limited for single-instance locks; Redlock attempts to improve this | Excellent (ZAB protocol, majority quorum) | Excellent (Raft protocol, majority quorum) |
| Ecosystem | Very broad; many clients and languages | Mature, robust (Apache Curator is the de facto standard for Java) | Growing; popular in cloud-native stacks (native Go client) |

When to Use Which?

Choosing the right tool depends heavily on your specific requirements:

  • Choose Redis when:

    • You need high performance and low latency for simple, best-effort distributed locks.
    • Your application can tolerate occasional false positives (two clients briefly thinking they have the lock) or lost locks during extreme network partitions or Redis failovers.
    • You're already using Redis for caching or other purposes, and want to leverage existing infrastructure.
    • Examples: Rate limiting, idempotent task processing, simple resource serialization where strong consistency isn't absolutely critical.
  • Choose Apache ZooKeeper when:

    • You require strong consistency and strict mutual exclusion for critical operations.
    • You need robust leader election, distributed configuration management, or service discovery with strong guarantees.
    • Your application is built in Java or environments where the ZooKeeper ecosystem (like Apache Curator) is well-supported and mature.
    • Examples: Building distributed databases, message queues, highly available distributed services, Hadoop ecosystem coordination.
  • Choose Etcd when:

    • You need a strongly consistent, highly available key-value store for critical metadata.
    • You're building cloud-native applications, especially with Kubernetes, and need a reliable backend for cluster state.
    • You require robust service discovery, dynamic configuration, or strong distributed locking with good performance.
    • You prefer a simpler API and the benefits of the Raft algorithm over ZAB.
    • Examples: Kubernetes, service mesh configuration, custom orchestration systems, distributed queue backends.

Best Practices for Distributed Coordination

Regardless of the tool, certain best practices apply to distributed locking and consensus:

  1. Always Set Timeouts/Leases: Never acquire a lock without an expiration. This prevents deadlocks if a client crashes or network partitions occur. Renew leases/timeouts as needed.
  2. Use Unique Lock Values: When possible, associate a unique identifier (e.g., UUID) with the lock. This ensures that only the original lock holder can release it, preventing accidental release by an expired, re-acquired lock.
  3. Atomic Operations: Ensure lock acquisition and release operations are atomic. Use Lua scripts for Redis or built-in transactional capabilities for ZooKeeper/Etcd.
  4. Fencing: Implement fencing tokens or monotonically increasing version numbers for resources. When a client acquires a lock, it also acquires a new, higher fencing token. When interacting with the shared resource, it must present this token. The resource should reject operations with older tokens, preventing stale operations from an old, "fenced-off" lock holder.
  5. Graceful Release: Always release locks in a finally block or deferred function to ensure they are freed even if errors occur during the critical section.
  6. Avoid Over-Reliance: Distributed locks are complex and add overhead. Use them only when absolutely necessary. Explore alternative patterns like idempotent operations, queues, or eventual consistency if appropriate.
  7. Monitoring: Monitor lock contention, acquisition/release times, and system health. High contention can indicate a bottleneck.
  8. Client Libraries: Whenever possible, use well-tested, robust client libraries (e.g., redis-py, Apache Curator, go.etcd.io/etcd/client/v3/concurrency) that abstract away much of the complexity and handle retries, watches, and session management correctly.
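
Practice 4 (fencing) is the one most often skipped, so it deserves a concrete illustration. A minimal Python sketch of a resource that guards itself with fencing tokens (all names are illustrative; in practice the token would be a ZooKeeper zxid, an etcd revision, or similar):

```python
class StaleTokenError(Exception):
    pass

class FencedResource:
    """A shared resource that rejects writes carrying a stale fencing token.

    The lock service must hand out a strictly increasing token with each
    successful acquisition; the resource remembers the highest one seen.
    """

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, fencing_token, value):
        # A token older than one we have seen means the writer is a paused
        # or partitioned client whose lock has since expired: fence it off.
        if fencing_token < self.highest_token:
            raise StaleTokenError(
                f"token {fencing_token} is older than {self.highest_token}")
        self.highest_token = fencing_token
        self.value = value
```

This is what protects you when a client is paused (say, by a long GC), its lock expires, and it wakes up still believing it holds the lock.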

Common Pitfalls

Distributed systems are notorious for their failure modes. Be aware of these common pitfalls:

  1. Clock Skew: Relying solely on client-side timestamps for lock validity is dangerous due to potential clock differences between machines. Server-side timeouts are more reliable.
  2. Network Partitions: The most insidious problem. A client might lose connection to the lock service but still believe it holds the lock. Strong consensus systems are designed to handle this, but best-effort systems like single-node Redis can suffer.
  3. Ignoring Renewal Failures: If a client's lease renewal fails (e.g., due to network issues), its lock might expire. The client needs to detect this and stop operating on the shared resource, even if it hasn't explicitly released the lock.
  4. Misinterpreting Lock Status: A client should never assume it holds a lock just because it tried to acquire it. Always verify the acquisition success based on the lock service's response.
  5. Deadlocks: Forgetting to release a lock, or incorrect logic leading to multiple resources being locked in conflicting orders, can halt your system. Timeouts are the primary defense.
  6. Thundering Herd: When a lock is released, all waiting clients might try to acquire it simultaneously, leading to a "thundering herd" problem. Backoff and jitter are crucial for retries.
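
For pitfall 6, a common remedy is "full jitter" exponential backoff. A small sketch (the parameter values are arbitrary defaults, not a recommendation):

```python
import random

def backoff_with_jitter(attempt, base_s=0.05, cap_s=5.0):
    """'Full jitter' delay: uniform in [0, min(cap, base * 2**attempt)].

    Randomizing the wait spreads retries out, so waiting clients do not
    all hit the lock service at the same instant after a release.
    """
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```

A retry loop would sleep for `backoff_with_jitter(attempt)` between acquisition attempts, resetting `attempt` to zero after a success.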

Conclusion

Distributed locking and consensus are foundational concepts for building resilient and scalable distributed systems. While Redis offers a fast, best-effort solution for less critical scenarios, Apache ZooKeeper and Etcd provide robust, strongly consistent mechanisms backed by sophisticated consensus algorithms like ZAB and Raft.

Understanding the trade-offs between performance, consistency, and complexity is key to selecting the right tool for your specific needs. By adhering to best practices and anticipating common pitfalls, developers can leverage these powerful technologies to build highly reliable distributed applications that can withstand the unpredictable nature of networked environments.

As distributed systems continue to evolve, the demand for reliable coordination services will only grow. Mastering Redis, ZooKeeper, and Etcd empowers you to architect and implement the next generation of robust, fault-tolerant applications.

Written by Younes Hamdane

Full-Stack Software Engineer with 5+ years of experience in Java, Spring Boot, and cloud architecture across AWS, Azure, and GCP. Writing production-grade engineering patterns for developers who ship real software.