Hystrix Thread Pool Configuration: Mechanics, Behavior, and Safe Settings
Hystrix 1.5.18 is assumed for all code and behavior in this document.
Command/thread-pool assignment and isolation
With execution.isolation.strategy = THREAD, HystrixCommand.run executes on a worker thread instead of the caller’s thread. Concurrency is bounded by the thread pool selected for that command; with SEMAPHORE isolation, concurrency is bounded by a semaphore count and the command runs on the caller thread.
Multiple commands can share a pool or use separate pools. Pool selection follows this precedence:
- HystrixThreadPoolKey (if explicitly set)
- HystrixCommandGroupKey (fallback when no pool key is provided)
That decision is made when initializing the pool key:
private static HystrixThreadPoolKey initThreadPoolKey(
        HystrixThreadPoolKey explicitPoolKey,
        HystrixCommandGroupKey groupKey,
        String runtimeOverrideKey) {
    if (runtimeOverrideKey != null) {
        return HystrixThreadPoolKey.Factory.asKey(runtimeOverrideKey);
    }
    if (explicitPoolKey != null) {
        return explicitPoolKey;
    }
    return HystrixThreadPoolKey.Factory.asKey(groupKey.name());
}
A single HystrixThreadPool instance is created per unique key (name), guarded by a ConcurrentHashMap:
/* package */ final class HystrixThreadPoolFactory {

    private static final ConcurrentHashMap<String, HystrixThreadPool> POOLS =
            new ConcurrentHashMap<>();

    /* package */ static HystrixThreadPool get(HystrixThreadPoolKey key,
                                               HystrixThreadPoolProperties.Setter props) {
        // computeIfAbsent is atomic, so no additional locking is required:
        // at most one pool is created per key name.
        return POOLS.computeIfAbsent(key.name(),
                name -> new HystrixThreadPoolDefault(key, props));
    }
}
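The per-key caching pattern the factory relies on can be sketched in isolation. This is an illustrative helper (the class name `PerKeyCache` is mine, not Hystrix's), showing why `computeIfAbsent` alone guarantees one instance per key:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of per-key memoization: ConcurrentHashMap.computeIfAbsent
// is atomic, so the factory function runs at most once per key and every
// caller observes the same cached instance.
public class PerKeyCache<T> {

    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    public PerKeyCache(Function<String, T> factory) {
        this.factory = factory;
    }

    public T get(String key) {
        // Atomic check-then-create; no external synchronization needed.
        return cache.computeIfAbsent(key, factory);
    }
}
```

For example, `new PerKeyCache<>(name -> new Object())` returns the identical object for repeated lookups of the same name, and distinct objects for distinct names.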
Thread-pool knobs in Hystrix
- coreSize: target number of worker threads kept alive
- maximumSize: upper bound on threads (effective only if allowMaximumSizeToDivergeFromCoreSize is true)
- allowMaximumSizeToDivergeFromCoreSize: enables maximumSize > coreSize
- keepAliveTimeMinutes: idle timeout for threads in excess of coreSize
- maxQueueSize: capacity of the backing queue; a non-positive value disables queuing (a SynchronousQueue is used instead)
- queueSizeRejectionThreshold: effective rejection watermark; tasks are rejected when the current queue size reaches this threshold (allows dynamic tuning even when maxQueueSize is fixed)
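These knobs can also be set through Archaius dynamic properties rather than code. A sketch of the global defaults (replace `default` with a pool key name to target a single pool; the values shown are illustrative):

```properties
hystrix.threadpool.default.coreSize=2
hystrix.threadpool.default.maximumSize=5
hystrix.threadpool.default.allowMaximumSizeToDivergeFromCoreSize=true
hystrix.threadpool.default.keepAliveTimeMinutes=1
hystrix.threadpool.default.maxQueueSize=10
hystrix.threadpool.default.queueSizeRejectionThreshold=10
```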
Reasoning about behavior from parameters
Consider the following configurations and an intuitive expectation for each:

- coreSize=2; maxQueueSize=10
  - Expect two workers; when both are busy, tasks enqueue up to 10; after that, tasks are rejected.
- coreSize=2; maximumSize=5; maxQueueSize=-1; allowMaximumSizeToDivergeFromCoreSize=true
  - Expect elastic growth up to five threads with no queue; when all five are busy, tasks are rejected.
- coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true
  - Two plausible interpretations:
    - "JDK-like": queue first; when the queue fills, grow threads up to five; then reject.
    - "Latency-first": grow threads up to five before using the queue; then enqueue; finally reject when the queue fills.

The Hystrix implementation matches neither interpretation in the third case, as shown next.
Observed behavior under load
Configuration under test:
- coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true; queueSizeRejectionThreshold=10
A simple load generator keeps each command execution blocked on a latch to simulate saturation and prints pool/queue metrics while submitting tasks:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolMetrics;
import com.netflix.hystrix.HystrixThreadPoolProperties;
import com.netflix.hystrix.exception.HystrixRuntimeException;

public class PoolBehaviorProbe {

    public static void main(String[] args) throws Exception {
        final int CORE = 2;
        final int MAX = 5;
        final int Q = 10;

        HystrixCommand.Setter cfg = HystrixCommand.Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("TpExpGroup"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("TpExpCmd"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionTimeoutEnabled(false))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(CORE)
                        .withMaximumSize(MAX)
                        .withAllowMaximumSizeToDivergeFromCoreSize(true)
                        .withMaxQueueSize(Q)
                        .withQueueSizeRejectionThreshold(Q));

        // Warm up metrics so printPool has an instance to report on.
        new HystrixCommand<Void>(cfg) {
            @Override protected Void run() { return null; }
        }.execute();

        final CountDownLatch block = new CountDownLatch(1);
        final AtomicInteger rejected = new AtomicInteger();
        int attempts = CORE + MAX + Q + 3; // attempt beyond all limits

        for (int i = 1; i <= attempts; i++) {
            Thread t = new Thread(() -> {
                try {
                    new HystrixCommand<Void>(cfg) {
                        @Override protected Void run() throws Exception {
                            block.await(); // hold the worker to simulate saturation
                            return null;
                        }
                    }.execute();
                } catch (HystrixRuntimeException e) {
                    rejected.incrementAndGet(); // thread-pool rejection (no fallback defined)
                }
            }, "submit-" + i);
            t.start();
            Thread.sleep(150); // stagger submissions for visibility
            printPool("after submit " + i);
        }

        System.out.println("rejected=" + rejected.get());
        block.countDown(); // let blocked commands finish
    }

    static void printPool(String label) {
        for (HystrixThreadPoolMetrics m : HystrixThreadPoolMetrics.getInstances()) {
            System.out.println(label + " -> key=" + m.getThreadPoolKey().name()
                    + ", size=" + m.getCurrentPoolSize()
                    + ", queue=" + m.getCurrentQueueSize());
        }
    }
}
What consistently occurs:
- Pool size remains at coreSize (2) even though maximumSize is 5.
- Tasks are queued up to queueSizeRejectionThreshold/maxQueueSize (10).
- Once the queue reaches the threshold, subsequent submissions are rejected; no additional threads are created to relieve the queue.
In other words, when both maximumSize > coreSize and maxQueueSize > 0, Hystrix never grows past coreSize and relies entirely on the queue, then rejects.
JDK ThreadPoolExecutor comparison
Relevant ThreadPoolExecutor inputs:
- corePoolSize, maximumPoolSize
- workQueue: any BlockingQueue implementation and capacity
- handler: RejectedExecutionHandler invoked when both pool and queue cannot accept work
Given the configuration analogous to the Hystrix case:
- corePoolSize=2
- maximumPoolSize=5
- workQueue=new ArrayBlockingQueue<>(10)
- handler=new ThreadPoolExecutor.DiscardPolicy()
The JDK executor enqueues until the queue refuses new tasks; once offer fails, it attempts to add more workers up to maximumPoolSize; only then does it call the rejection handler. The key decision path in execute is:
public void execute(Runnable command) {
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true)) return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (!isRunning(recheck) && remove(command)) reject(command);
        else if (workerCountOf(recheck) == 0) addWorker(null, false);
    } else if (!addWorker(command, false)) {
        reject(command);
    }
}
This is the "queue first, then grow, then reject" behavior many engineers expect.
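The JDK ordering can be verified with a short, Hystrix-free probe using the same numbers as the experiment above (core=2, max=5, queue=10); class and method names here are mine:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK demonstration of ThreadPoolExecutor's "queue first, then grow,
// then reject" policy.
public class JdkPoolDemo {

    public static int[] probe() throws Exception {
        AtomicInteger rejected = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES,
                new ArrayBlockingQueue<>(10),
                (r, e) -> rejected.incrementAndGet()); // count rejections

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        for (int i = 0; i < 12; i++) pool.execute(task); // 2 run, 10 queue
        int sizeAfterQueueFull = pool.getPoolSize();     // still core size (2)

        for (int i = 0; i < 3; i++) pool.execute(task);  // full queue -> grow
        int sizeAfterGrowth = pool.getPoolSize();        // now at maximum (5)

        pool.execute(task);                              // pool + queue full -> reject
        int queued = pool.getQueue().size();

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{sizeAfterQueueFull, sizeAfterGrowth, queued, rejected.get()};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolAfterQueueFull=" + r[0] + " poolAfterGrowth=" + r[1]
                + " queued=" + r[2] + " rejected=" + r[3]);
    }
}
```

The counts are deterministic because each blocked task pins its worker: the pool sits at 2 threads until the queue of 10 is full, then grows to 5, and only the 16th submission is rejected.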
Why Hystrix behaves differently
Hystrix wraps a JDK ThreadPoolExecutor but introduces a pre-submission gate on queue occupancy. The pool is built roughly as follows:
public ThreadPoolExecutor buildPool(HystrixThreadPoolKey key,
                                    HystrixThreadPoolProperties props) {
    ThreadFactory tf = getThreadFactory(key);
    int core = props.coreSize().get();
    int keepAlive = props.keepAliveTimeMinutes().get();
    int max = props.maximumSize().get();
    boolean diverge = props.getAllowMaximumSizeToDivergeFromCoreSize().get();
    BlockingQueue<Runnable> q = props.maxQueueSize().get() <= 0
            ? new SynchronousQueue<>()
            : new LinkedBlockingQueue<>(props.maxQueueSize().get());
    if (!diverge) {
        return new ThreadPoolExecutor(core, core, keepAlive, TimeUnit.MINUTES, q, tf);
    }
    if (max < core) max = core;
    return new ThreadPoolExecutor(core, max, keepAlive, TimeUnit.MINUTES, q, tf);
}
Before handing a task to the scheduler, Hystrix checks the queue length against queueSizeRejectionThreshold and rejects immediately if the threshold is hit, without giving the underlying executor a chance to either queue or grow:
public boolean isQueueSpaceAvailable() {
    if (queueSize <= 0) return true; // no queue: defer to the pool itself
    return threadPool.getQueue().size() < properties.queueSizeRejectionThreshold().get();
}

public Subscription schedule(Action0 action, long delay, TimeUnit unit) {
    if (threadPool != null && !threadPool.isQueueSpaceAvailable()) {
        throw new RejectedExecutionException("queueSize at rejection threshold");
    }
    return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action), delay, unit);
}
Effects:
- When maxQueueSize > 0, Hystrix’s pre-check prevents ThreadPoolExecutor from seeing a full workQueue and thus from creating additional threads up to maximumSize.
- When maxQueueSize <= 0 (SynchronousQueue), ThreadPoolExecutor growth to maximumSize is possible if allowMaximumSizeToDivergeFromCoreSize is true.
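The gate's effect can be reproduced with plain JDK types. This illustrative stand-in (not Hystrix code; names are mine) checks queue occupancy before every submission, and the executor consequently never sees a full queue and never grows past core size:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A pre-submission gate in the style of isQueueSpaceAvailable(): reject
// when queue occupancy hits the threshold, before the executor can react.
public class GatedSubmitDemo {

    public static int[] probe() throws Exception {
        int threshold = 10;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES, new LinkedBlockingQueue<>(10));

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        int rejected = 0;
        for (int i = 0; i < 20; i++) {
            // The gate: check queue size BEFORE handing work to the executor.
            if (pool.getQueue().size() >= threshold) {
                rejected++;                  // mirrors Hystrix's early rejection
            } else {
                pool.execute(task);
            }
        }
        int poolSize = pool.getPoolSize();   // stays at core (2)
        int queued = pool.getQueue().size(); // capped at the threshold (10)

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{poolSize, queued, rejected};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolSize=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

Of 20 submissions, 2 run, 10 queue, and the remaining 8 are turned away by the gate, even though the underlying executor could have grown to 5 threads.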
Practical configuration guidance
Avoid combining a positive queue with an expanded maximumSize if you expect thread growth:

- Problematic for intended growth
  - coreSize=2; maximumSize=5; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=true
  - Actual: pool fixed at 2, queue up to 10, then reject

Prefer one of the following, depending on goals:

- Minimize latency by growing threads and avoiding queuing
  - coreSize=2; maximumSize=5; maxQueueSize=-1; allowMaximumSizeToDivergeFromCoreSize=true
  - Behavior: no queue; grows up to 5 threads; rejects after reaching maximumSize
- Conserve resources by keeping a fixed pool and allowing bounded queuing
  - coreSize=2; maximumSize=2; maxQueueSize=10; allowMaximumSizeToDivergeFromCoreSize=false
  - Behavior: two threads; queue up to 10; rejects beyond that
- Tune queueSizeRejectionThreshold to a value ≤ maxQueueSize when you need runtime control over acceptance without redeploying (e.g., temporarily lower the threshold to shed load).
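The first option's grow-then-reject behavior can be checked against a plain JDK executor, since a non-positive maxQueueSize makes Hystrix hand the executor a SynchronousQueue; this sketch (names are mine) uses the same core/maximum values:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK check of the "no queue" configuration: with a SynchronousQueue,
// the executor grows from core (2) to maximum (5) under load and rejects
// only after all five workers are busy.
public class NoQueueGrowthDemo {

    public static int[] probe() throws Exception {
        AtomicInteger rejected = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 5, 1, TimeUnit.MINUTES,
                new SynchronousQueue<>(),
                (r, e) -> rejected.incrementAndGet()); // count rejections

        CountDownLatch block = new CountDownLatch(1);
        Runnable task = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };

        for (int i = 0; i < 6; i++) pool.execute(task); // 5 run, 1 rejected
        int size = pool.getPoolSize();                  // grew to maximum (5)

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[]{size, rejected.get()};
    }

    public static void main(String[] args) throws Exception {
        int[] r = probe();
        System.out.println("poolSize=" + r[0] + " rejected=" + r[1]);
    }
}
```

Because a SynchronousQueue only accepts a handoff when a worker is already waiting, every submission beyond the busy workers forces thread creation until maximumPoolSize, after which the rejection handler fires.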
Operational tips:
- Keep commands that hit the same downstream dependency in the same pool; isolate independent dependencies in separate pools by setting HystrixThreadPoolKey.
- Verify behavior under load in a staging environment; inspect HystrixThreadPoolMetrics for pool size and queue depth.
- Remember Hystrix is in maintenance mode; plan long-term migrations (e.g., to resilience4j) if you need different semantics.
References
- Hystrix configuration wiki: https://github.com/Netflix/Hystrix/wiki/Configuration
- Discussion on queue rejection semantics: https://github.com/Netflix/Hystrix/issues/1589
- Related change discussion: https://github.com/Netflix/Hystrix/pull/1670