Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Implementing State Machines in Go for Cleaner Call Chains

Tech 1

Background

Many projects require sequential operations based on runtime state. Common scenarios include:

  • Parsing configuration formats or programming languages
  • Executing operations on systems, routers, or clusters
  • ETL pipelines for data extraction, transformation, and loading

Rob Pike's classic talk on lexical scanning in Go introduced a powerful state machine pattern that has influenced many implementations. The key insight is leveraging Go's ability to treat functions as first-class values—each state returns the next state to execute, breaking the traditional if/else chain of function calls.

This approach decouples call chains into testable units, making it significantly easier to verify individual components without mocking entire hierarchies.

The Problem with Traditional Call Chains

Conventional approaches to sequential operations typically look like this:

func Execute(args Config) {
    stepOne(args)
    stepTwo(args)
}

Or with distributed responsibilities:

func Execute(args Config) {
    stepOne(args)
}

func stepOne(args Config) {
    stepTwo(args)
}

func stepTwo(args Config) {
    // completion
}

Both patterns work, but they create challenges when testing. If Execute makes remote calls to external systems, you must mock or stub every downstream function to write isolated tests. With 50 call levels, this becomes unmanageable—testing a single function requires mocking everything underneath it.

Pike's pattern solves this by making each function return the next function to execute.

State Machine Pattern

Define a state as a function type:

type State[T any] func(ctx context.Context, input T) (T, State[T], error)

Each state receives input of type T and returns the modified input, the next state to execute, or an error. Returning nil for the next state terminates the machine. Returning an error also terminates the machine.

This design differs from Pike's original by including generics and returning the input parameter, enabling pure functional state machines where each state can transform data and pass it to the next.

The state driver is straightforward:

func Run[T any](ctx context.Context, input T, initial State[T]) (T, error) {
    current := initial
    for {
        select {
        case <-ctx.Done():
            return input, ctx.Err()
        default:
        }
        
        input, current, err := current(ctx, input)
        if err != nil {
            return input, err
        }
        if current == nil {
            return input, nil
        }
    }
}

This minimal implementation handles context cancellation, error propagation, and state transitions.

Practical Example: Cluster Service Cleanup

Consider a scenario where you need to remove a service from a cluster while handling associated storage:

package cleanup

// storageProvider defines operations for managing backup storage.
type storageProvider interface {
    DeleteOldBackups(ctx context.Context, serviceID string, retainCount int) error
    PurgeContainer(ctx context.Context, serviceID string) error
}

// clusterOps defines operations for cluster service management.
type clusterOps interface {
    Drain(ctx context.Context, serviceID string) error
    Unregister(ctx context.Context, serviceID string) error
    Enumerate(ctx context.Context) ([]string, error)
    HasAttachedStorage(ctx context.Context, serviceID string) (bool, error)
}

The private interfaces define the exact operations needed, allowing clients to implement only what the state machine requires.

Configuration for the operation:

// Request holds parameters for service cleanup.
type Request struct {
    ServiceID string
    Storage   storageProvider
    Cluster   clusterOps
}

func (r Request) validate() error {
    if r.ServiceID == "" {
        return errors.New("service ID is required")
    }
    if r.Storage == nil {
        return errors.New("storage provider is required")
    }
    if r.Cluster == nil {
        return errors.New("cluster operations are required")
    }
    return nil
}

Note that Request is a value type, not a pointer. Since we modify and pass Request through each state, avoiding pointer indirection saves garbage collector overhead—negligible for simple operations but meaningful in high-throughput ETL pipelines.

The public API wraps the state machine:

// Cleanup removes a service from the cluster and cleans up storage.
// The three most recent backups are preserved.
func Cleanup(ctx context.Context, req Request) error {
    if err := req.validate(); err != nil {
        return err
    }
    
    _, err := Run(ctx, req, drainPhase)
    if err != nil {
        return fmt.Errorf("failed to cleanup service %q: %w", req.ServiceID, err)
    }
    return nil
}

Users interact with Cleanup, not the underlying state machine. The function validates input, selects the initial state (drainPhase), and executes the machine.

The first state drains the service:

func drainPhase(ctx context.Context, req Request) (Request, State[Request], error) {
    services, err := req.Cluster.Enumerate(ctx)
    if err != nil {
        return req, nil, err
    }

    found := false
    for _, id := range services {
        if id == req.ServiceID {
            found = true
            break
        }
    }
    if !found {
        return req, nil, fmt.Errorf("service %q not found in cluster", req.ServiceID)
    }

    if err := req.Cluster.Drain(ctx, req.ServiceID); err != nil {
        return req, nil, fmt.Errorf("failed to drain service: %w", err)
    }

    return req, unregisterPhase, nil
}

This state verifies the service exists, drains it, and returns unregisterPhase as the next state.

The second state unregisters the service:

func unregisterPhase(ctx context.Context, req Request) (Request, State[Request], error) {
    if err := req.Cluster.Unregister(ctx, req.ServiceID); err != nil {
        return req, nil, fmt.Errorf("failed to unregister service: %w", err)
    }

    hasStorage, err := req.Cluster.HasAttachedStorage(ctx, req.ServiceID)
    if err != nil {
        return req, nil, fmt.Errorf("storage check failed: %w", err)
    }
    
    if hasStorage {
        return req, cleanupPhase, nil
    }

    return req, nil, nil
}

This state removes the service from the cluster, checks for attached storage, and either proceeds to cleanup or terminates the machine by returning nil, nil.

The conditional branching demonstrates how state machines handle different execution paths based on runtime conditions.

Testing Benefits

This pattern naturally encourages small, testable units. Modules split easily when they grow too large—just create new states to isolate functionality.

The primary advantage emerges when testing. Traditional approaches require either sequential calls or functional chains, both leading to end-to-end tests that shouldn't exist:

// Sequential approach - testing Execute requires mocking everything
func Execute(ctx context.Context, req Request) error {
    if err := drainPhase(ctx, req); err != nil {
        return err
    }
    if err := unregisterPhase(ctx, req); err != nil {
        return err
    }
    // ... more phases
    return nil
}

With a state machine, each state is independently testable. Test drainPhase in isolation by verifying it returns the correct next state and input modifications. The top-level Cleanup function doesn't need testing—it simply validates input and delegates to Run, which is also testable separately.

A table-driven test for drainPhase:

func TestDrainPhase(t *testing.T) {
    t.Parallel()

    cases := []struct {
        desc       string
        request    Request
        expectErr  bool
        expectNext State[Request]
    }{
        {
            desc: "Enumerate returns error",
            request: Request{
                Cluster: &mockCluster{
                    enumerateErr: errors.New("network failure"),
                },
            },
            expectErr: true,
        },
        {
            desc: "Service not in cluster",
            request: Request{
                ServiceID: "api-gateway",
                Cluster: &mockCluster{
                    services: []string{"auth-service", "db-primary"},
                },
            },
            expectErr: true,
        },
        {
            desc: "Drain operation fails",
            request: Request{
                ServiceID: "api-gateway",
                Cluster: &mockCluster{
                    services: []string{"api-gateway", "auth-service"},
                    drainErr: errors.New("timeout"),
                },
            },
            expectErr: true,
        },
        {
            desc: "Successful drain",
            request: Request{
                ServiceID: "api-gateway",
                Cluster: &mockCluster{
                    services: []string{"api-gateway", "auth-service"},
                    drainErr: nil,
                },
            },
            expectNext: unregisterPhase,
        },
    }

    for _, tc := range cases {
        t.Run(tc.desc, func(t *testing.T) {
            _, next, err := drainPhase(context.Background(), tc.request)
            
            if tc.expectErr && err == nil {
                t.Errorf("expected error, got nil")
                return
            }
            if !tc.expectErr && err != nil {
                t.Errorf("expected no error, got %v", err)
                return
            }
            if err != nil {
                return
            }
            
            if tc.expectNext != nil && reflect.ValueOf(next).Pointer() != reflect.ValueOf(tc.expectNext).Pointer() {
                t.Errorf("state mismatch: got %v, want %v", funcName(next), funcName(tc.expectNext))
            }
        })
    }
}

Each test verifies that a state produces the correct output and transitions to the expected next state. No mocking of entire call chains required.

Alternative Implementations

A variant pattern stores the next state in the input parameter itself, useful for dynamic state selection:

type State[T any] func(ctx context.Context, input T) (T, State[T], error)

type Transition[T any] struct {
    Data T
    Next State[T]
}

func Run[T any](ctx context.Context, trans Transition[T], initial State[T]) (T, error) {
    current := initial
    for {
        if ctx.Err() != nil {
            return trans.Data, ctx.Err()
        }
        
        trans.Data, current, err := current(ctx, trans.Data)
        if err != nil {
            return trans.Data, err
        }
        
        current = trans.Next
        trans.Next = nil

        if current == nil {
            return trans.Data, nil
        }
    }
}

This approach simplifies integration with distributed tracing systems or logging frameworks, as state transitions are explicit in the input structure.

For high-throughput scenarios requiring concurrency, the stagedpipe library provides advanced features including parallel processing and backpressure management.

Conclusion

State machines in Go provide a clean mechanism for managing sequential operations with testable components. By representing each step as a function that returns the next step, call chains become explicit, configurable, and straightforward to verify. This pattern scales from simple parsers to complex multi-stage workflows, making it a valuable addition to any Go developer's toolkit.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.