Hystrix Thread Pool Configuration: Mechanics, Behavior, and Safe Settings
Hystrix 1.5.18 is assumed for all code and behavior in this document.
Command/thread-pool assignment and isolation
With execution.isolation.strategy = THREAD, HystrixCommand.run executes on a worker thread instead of the caller’s thread. Concurrency is bounded by the thread pool selected for that command; with SEMAPHORE isolation, concurrency is bounded by a semaphore count and the command runs on the caller thread.
Multiple commands can share a pool or use separate pools. Pool selection follows this precedence:
- HystrixThreadPoolKey (if explicitly set)
- HystrixCommandGroupKey (fallback when no pool key is provided)
That decision is made when initializing the pool key:
private static HystrixThreadPoolKey initThreadPoolKey(
        HystrixThreadPoolKey explicitPoolKey,
        HystrixCommandGroupKey groupKey,
        String runtimeOverrideKey) {
    if (runtimeOverrideKey != null) {
        return HystrixThreadPoolKey.Factory.asKey(runtimeOverrideKey);
    }
    if (explicitPoolKey != null) {
        return explicitPoolKey;
    }
    return HystrixThreadPoolKey.Factory.asKey(groupKey.name());
}
A single HystrixThreadPool instance is created per unique key (name), guarded by a ConcurrentHashMap:
/* package */ final class HystrixThreadPoolFactory {

    private static final ConcurrentHashMap<String, HystrixThreadPool> POOLS =
            new ConcurrentHashMap<>();

    /* package */ static HystrixThreadPool get(HystrixThreadPoolKey key,
                                               HystrixThreadPoolProperties.Setter props) {
        // computeIfAbsent is atomic, so no additional locking is required:
        // at most one pool is created per key name.
        return POOLS.computeIfAbsent(key.name(),
                name -> new HystrixThreadPoolDefault(key, props));
    }
}
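The per-key caching pattern the factory relies on can be sketched in isolation. This is an illustrative helper (the class name `PerKeyCache` is mine, not Hystrix's), showing why `computeIfAbsent` alone guarantees one instance per key:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of per-key memoization: ConcurrentHashMap.computeIfAbsent
// is atomic, so the factory function runs at most once per key and every
// caller observes the same cached instance.
public class PerKeyCache<T> {

    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    public PerKeyCache(Function<String, T> factory) {
        this.factory = factory;
    }

    public T get(String key) {
        // Atomic check-then-create; no external synchronization needed.
        return cache.computeIfAbsent(key, factory);
    }
}
```

For example, `new PerKeyCache<>(name -> new Object())` returns the identical object for repeated lookups of the same name, and distinct objects for distinct names.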
Thread-pool knobs in Hystrix
- coreSize: target number of worker threads kept alive
- maximumSize: upper bound on threads (effective only if allowMaximumSizeToDivergeFromCoreSize is true)
- allowMaximumSizeToDivergeFromCoreSize: enables maximumSize > coreSize
- keepAliveTimeMinutes: idle timeout for threads in excess of coreSize
- maxQueueSize: capacity of the backing queue; a non-positive value disables queuing (a SynchronousQueue is used instead)
- queueSizeRejectionThreshold: effective rejection watermark; tasks are rejected when the current queue size reaches this threshold (allows dynamic tuning even when maxQueueSize is fixed)
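These knobs can also be set through Archaius dynamic properties rather than code. A sketch of the global defaults (replace `default` with a pool key name to target a single pool; the values shown are illustrative):

```properties
hystrix.threadpool.default.coreSize=2
hystrix.threadpool.default.maximumSize=5
hystrix.threadpool.default.allowMaximumSizeToDivergeFromCoreSize=true
hystrix.threadpool.default.keepAliveTimeMinutes=1
hystrix.threadpool.default.maxQueueSize=10
hystrix.threadpool.default.queueSizeRejectionThreshold=10
```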
Reasoning about behavior from parameters
Consider the following configurations and an intuitive expectation for each:

- coreSize=2; maxQueueSize=10
  - Expect two workers; when both are busy, tasks enqueue up to 10; after that, tasks are rejected.
- coreSize=2; maximumSize=5; maxQueueSize=-1; allowMaximumSizeToDivergeFromCoreSize=true
  - Expect elastic growth up to five threads with no queue; when all five are busy, tasks are rejected.
- coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true
  - Two plausible interpretations:
    - "JDK-like": queue first; when the queue fills, grow threads up to five; then reject.
    - "Latency-first": grow threads up to five before using the queue; then enqueue; finally reject when the queue fills.

The Hystrix implementation matches neither interpretation in the third case, as shown next.
Observed behavior under load
Configuration under test:
- coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true; queueSizeRejectionThreshold=10
A simple load generator keeps each command execution blocked on a latch to simulate saturation and prints pool/queue metrics while submitting tasks:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolMetrics;
import com.netflix.hystrix.HystrixThreadPoolProperties;
import com.netflix.hystrix.exception.HystrixRuntimeException;

public class PoolBehaviorProbe {

    public static void main(String[] args) throws Exception {
        final int CORE = 2;
        final int MAX = 5;
        final int Q = 10;

        HystrixCommand.Setter cfg = HystrixCommand.Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("TpExpGroup"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("TpExpCmd"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionTimeoutEnabled(false))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(CORE)
                        .withMaximumSize(MAX)
                        .withAllowMaximumSizeToDivergeFromCoreSize(true)
                        .withMaxQueueSize(Q)
                        .withQueueSizeRejectionThreshold(Q));

        // Warm up metrics so printPool has an instance to report on.
        new HystrixCommand<Void>(cfg) {
            @Override protected Void run() { return null; }
        }.execute();

        final CountDownLatch block = new CountDownLatch(1);
        final AtomicInteger rejected = new AtomicInteger();
        int attempts = CORE + MAX + Q + 3; // attempt beyond all limits

        for (int i = 1; i <= attempts; i++) {
            Thread t = new Thread(() -> {
                try {
                    new HystrixCommand<Void>(cfg) {
                        @Override protected Void run() throws Exception {
                            block.await(); // hold the worker to simulate saturation
                            return null;
                        }
                    }.execute();
                } catch (HystrixRuntimeException e) {
                    rejected.incrementAndGet(); // thread-pool rejection (no fallback defined)
                }
            }, "submit-" + i);
            t.start();
            Thread.sleep(150); // stagger submissions for visibility
            printPool("after submit " + i);
        }

        System.out.println("rejected=" + rejected.get());
        block.countDown(); // let blocked commands finish
    }

    static void printPool(String label) {
        for (HystrixThreadPoolMetrics m : HystrixThreadPoolMetrics.getInstances()) {
            System.out.println(label + " -> key=" + m.getThreadPoolKey().name()
                    + ", size=" + m.getCurrentPoolSize()
                    + ", queue=" + m.getCurrentQueueSize());
        }
    }
}
What consistently occurs:
- Pool size remains at coreSize (2) even though maximumSize is 5.
- Tasks are queued up to queueSizeRejectionThreshold/maxQueueSize (10).
- Once the queue reaches the threshold, subsequent submissions are rejected; no additional threads are created to relieve the queue.
In other words, when both maximumSize > coreSize and maxQueueSize > 0, Hystrix never grows past coreSize and relies entirely on the queue, then rejects.
JDK ThreadPoolExecutor comparison
Relevant ThreadPoolExecutor inputs:
- corePoolSize, maximumPoolSize
- workQueue: any BlockingQueue implementation and capacity
- handler: RejectedExecutionHandler invoked when both pool and queue cannot accept work
Given the configuration analogous to the Hystrix case:
- corePoolSize=2
- maximumPoolSize=5
- workQueue=new ArrayBlockingQueue<>(10)
- handler=new ThreadPoolExecutor.DiscardPolicy()
The JDK executor enqueues until the queue refuses new tasks; once offer fails, it attempts to add more workers up to maximumPoolSize; only then does it call the rejection handler. The key decision path in execute is:
public void execute(Runnable command) {
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true)) return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (!isRunning(recheck) && remove(command)) reject(command);
        else if (workerCountOf(recheck) == 0) addWorker(null, false);
    } else if (!addWorker(command, false)) {
        reject(command);
    }
}
This is the "queue first, then grow, then reject" behavior many engineers expect.
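The JDK ordering can be verified with a short, Hystrix-free probe using the same numbers as the experiment above (core=2, max=5, queue=10); class and method names here are mine:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK demonstration of ThreadPoolExecutor's "queue first, then grow,
// then reject" policy.
public class JdkPoolDemo {

    public static int[] probe() throws Exception {
        AtomicInteger rejected = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES,
                new ArrayBlockingQueue<>(10),
                (r, e) -> rejected.incrementAndGet()); // count rejections

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        for (int i = 0; i < 12; i++) pool.execute(task); // 2 run, 10 queue
        int sizeAfterQueueFull = pool.getPoolSize();     // still core size (2)

        for (int i = 0; i < 3; i++) pool.execute(task);  // full queue -> grow
        int sizeAfterGrowth = pool.getPoolSize();        // now at maximum (5)

        pool.execute(task);                              // pool + queue full -> reject
        int queued = pool.getQueue().size();

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{sizeAfterQueueFull, sizeAfterGrowth, queued, rejected.get()};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolAfterQueueFull=" + r[0] + " poolAfterGrowth=" + r[1]
                + " queued=" + r[2] + " rejected=" + r[3]);
    }
}
```

The counts are deterministic because each blocked task pins its worker: the pool sits at 2 threads until the queue of 10 is full, then grows to 5, and only the 16th submission is rejected.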
Why Hystrix behaves differently
Hystrix wraps a JDK ThreadPoolExecutor but introduces a pre-submission gate on queue occupancy. The pool is built roughly as follows:
public ThreadPoolExecutor buildPool(HystrixThreadPoolKey key,
                                    HystrixThreadPoolProperties props) {
    ThreadFactory tf = getThreadFactory(key);
    int core = props.coreSize().get();
    int keepAlive = props.keepAliveTimeMinutes().get();
    int max = props.maximumSize().get();
    boolean diverge = props.getAllowMaximumSizeToDivergeFromCoreSize().get();
    BlockingQueue<Runnable> q = props.maxQueueSize().get() <= 0
            ? new SynchronousQueue<>()
            : new LinkedBlockingQueue<>(props.maxQueueSize().get());
    if (!diverge) {
        return new ThreadPoolExecutor(core, core, keepAlive, TimeUnit.MINUTES, q, tf);
    }
    if (max < core) max = core;
    return new ThreadPoolExecutor(core, max, keepAlive, TimeUnit.MINUTES, q, tf);
}
Before handing a task to the scheduler, Hystrix checks the queue length against queueSizeRejectionThreshold and rejects immediately if the threshold is hit, without giving the underlying executor a chance to either queue or grow:
public boolean isQueueSpaceAvailable() {
    if (queueSize <= 0) return true; // no queue: defer to the pool itself
    return threadPool.getQueue().size() < properties.queueSizeRejectionThreshold().get();
}

public Subscription schedule(Action0 action, long delay, TimeUnit unit) {
    if (threadPool != null && !threadPool.isQueueSpaceAvailable()) {
        throw new RejectedExecutionException("queueSize at rejection threshold");
    }
    return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action), delay, unit);
}
Effects:
- When maxQueueSize > 0, Hystrix’s pre-check prevents ThreadPoolExecutor from seeing a full workQueue and thus from creating additional threads up to maximumSize.
- When maxQueueSize <= 0 (SynchronousQueue), ThreadPoolExecutor growth to maximumSize is possible if allowMaximumSizeToDivergeFromCoreSize is true.
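The gate's effect can be reproduced with plain JDK types. This illustrative stand-in (not Hystrix code; names are mine) checks queue occupancy before every submission, and the executor consequently never sees a full queue and never grows past core size:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A pre-submission gate in the style of isQueueSpaceAvailable(): reject
// when queue occupancy hits the threshold, before the executor can react.
public class GatedSubmitDemo {

    public static int[] probe() throws Exception {
        int threshold = 10;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES, new LinkedBlockingQueue<>(10));

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        int rejected = 0;
        for (int i = 0; i < 20; i++) {
            // The gate: check queue size BEFORE handing work to the executor.
            if (pool.getQueue().size() >= threshold) {
                rejected++;                  // mirrors Hystrix's early rejection
            } else {
                pool.execute(task);
            }
        }
        int poolSize = pool.getPoolSize();   // stays at core (2)
        int queued = pool.getQueue().size(); // capped at the threshold (10)

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{poolSize, queued, rejected};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolSize=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

Of 20 submissions, 2 run, 10 queue, and the remaining 8 are turned away by the gate, even though the underlying executor could have grown to 5 threads.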
Practical configuration guidance
Avoid combining a positive queue with an expanded maximumSize if you expect thread growth:

- Problematic for intended growth
  - coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true
  - Actual: pool fixed at 2, queue up to 10, then reject

Prefer one of the following, depending on goals:

- Minimize latency by growing threads and avoiding queuing
  - coreSize=2; maximumSize=5; maxQueueSize=-1; allowMaximumSizeToDivergeFromCoreSize=true
  - Behavior: no queue; grows up to 5 threads; rejects after reaching maximumSize
- Conserve resources by keeping a fixed pool and allowing bounded queuing
  - coreSize=2; maximumSize=2; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=false
  - Behavior: two threads; queue up to 10; rejects beyond that
- Tune queueSizeRejectionThreshold to a value ≤ maxQueueSize when you need runtime control over acceptance without redeploying (e.g., temporarily lower the threshold to shed load).
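The first option's grow-then-reject behavior can be checked against a plain JDK executor, since a non-positive maxQueueSize makes Hystrix hand the executor a SynchronousQueue; this sketch (names are mine) uses the same core/maximum values:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK check of the "no queue" configuration: with a SynchronousQueue,
// the executor grows from core (2) to maximum (5) under load and rejects
// only after all five workers are busy.
public class NoQueueGrowthDemo {

    public static int[] probe() throws Exception {
        AtomicInteger rejected = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES,
                new SynchronousQueue<>(),
                (r, e) -> rejected.incrementAndGet()); // count rejections

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        for (int i = 0; i < 6; i++) pool.execute(task); // 5 run, 1 rejected
        int size = pool.getPoolSize();                  // grew to maximum (5)

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{size, rejected.get()};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolSize=" + r[0] + " rejected=" + r[1]);
    }
}
```

Because a SynchronousQueue only accepts a handoff when a worker is already waiting, every submission beyond the busy workers forces thread creation until maximumPoolSize, after which the rejection handler fires.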
Operational tips:
- Keep commands that hit the same downstream dependency in the same pool; isolate independent dependencies in separate pools by setting HystrixThreadPoolKey.
- Verify behavior under load in a staging environment; inspect HystrixThreadPoolMetrics for pool size and queue depth.
- Remember Hystrix is in maintenance mode; plan long-term migrations (e.g., to resilience4j) if you need different semantics.
References
- Hystrix configuration wiki: https://github.com/Netflix/Hystrix/wiki/Configuration
- Discussion on queue rejection semantics: https://github.com/Netflix/Hystrix/issues/1589
- Related change discussion: https://github.com/Netflix/Hystrix/pull/1670