Implementing a Task Scheduling System: A Comprehensive Technical Guide

Tech May 8 3

Task scheduling is a fundamental requirement in enterprise software development. While many tutorials focus on "how to use tools," this article explores "how to build tools" by examining the core logic behind task scheduling systems.

Quartz Framework

Quartz is an open-source task scheduling framework for Java and serves as the starting point for many Java engineers learning about task scheduling.

Core Architecture

Quartz consists of three essential components:

  • Job: Represents the task to be executed
  • Trigger: Defines the scheduling rules, that is, when and how often a job should execute. A single job can be associated with multiple triggers, but each trigger maps to only one job
  • Scheduler: The coordinator, created by a SchedulerFactory, that registers jobs and triggers and runs tasks according to the trigger rules

The default JobStore implementation is RAMJobStore, where triggers and jobs are stored in memory. The core execution class is QuartzSchedulerThread.

Execution Flow

The scheduler thread retrieves triggers that need execution from JobStore and modifies their status. When firing a trigger, the system updates trigger information including the next fire time and current status, then persists these changes. Finally, the system creates concrete task execution objects and processes them through a worker thread pool.
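The acquire, fire, and execute steps described above can be sketched with plain JDK classes. This is a simplified in-memory model, not Quartz code: MiniScheduler, its Trigger class, the state strings, and runOnce are hypothetical names chosen for illustration, and the list stands in for RAMJobStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executor;

/** Simplified in-memory model of the acquire -> fire -> execute cycle (not Quartz code). */
final class MiniScheduler {

    /** Hypothetical trigger: fires one named job at a fixed interval. */
    static final class Trigger {
        final String jobName;
        final long intervalMillis;
        long nextFireTime;
        String state = "WAITING";

        Trigger(String jobName, long firstFireTime, long intervalMillis) {
            this.jobName = jobName;
            this.nextFireTime = firstFireTime;
            this.intervalMillis = intervalMillis;
        }
    }

    private final List<Trigger> store = new ArrayList<>(); // stands in for RAMJobStore

    void schedule(Trigger trigger) {
        store.add(trigger);
    }

    /** Step 1: find triggers that are due and mark them as acquired. */
    List<Trigger> acquireDueTriggers(long now) {
        List<Trigger> due = new ArrayList<>();
        for (Trigger t : store) {
            if ("WAITING".equals(t.state) && t.nextFireTime <= now) {
                t.state = "ACQUIRED";
                due.add(t);
            }
        }
        return due;
    }

    /** Step 2: record the next fire time and reset the state; returns the job to run. */
    String fire(Trigger t) {
        t.nextFireTime += t.intervalMillis;
        t.state = "WAITING";
        return t.jobName;
    }

    /** One pass of the scheduler thread; step 3 hands each job to the worker pool. */
    int runOnce(long now, Map<String, Runnable> jobs, Executor workers) {
        List<Trigger> due = acquireDueTriggers(now);
        for (Trigger t : due) {
            Runnable job = jobs.get(fire(t));
            if (job != null) {
                workers.execute(job);
            }
        }
        return due.size();
    }
}
```

A real scheduler thread would call something like runOnce in a loop and sleep until the earliest nextFireTime. Passing an Executor keeps trigger bookkeeping on one thread while job bodies run on workers.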

Cluster Deployment

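Quartz's cluster deployment requires creating Quartz-specific tables in the database, using the script that matches the database type (MySQL, Oracle, and so on). In cluster mode the JobStore is a JDBC-backed implementation, typically JobStoreTX, which extends the abstract JobStoreSupport class.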

This distributed approach has no centralized management node and relies on database row-level locking for concurrency control across the cluster. A scheduler instance in cluster mode first acquires a row lock from the {0}LOCKS table, where {0} is replaced with the configured table prefix (default: QRTZ_). The sched_name column holds the scheduler name shared by all instances of the application cluster, and lock_name identifies the row-level lock. Quartz uses two primary row-level locks: TRIGGER_ACCESS and STATE_ACCESS.
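To make the {0} placeholder concrete, here is a small sketch of how a lock statement could be expanded before it is sent to the database. The template text is modeled on the row-lock query style Quartz uses and should be read as an assumption, not a copy of its source. QuartzLockSql, expand, and the scheduler name "orderCluster" are hypothetical.

```java
import java.text.MessageFormat;

/** Expands a lock-query template: {0} = table prefix, {1} = quoted scheduler name. */
final class QuartzLockSql {

    // Assumed template, written in the style of Quartz's row-lock query.
    static final String SELECT_FOR_LOCK =
        "SELECT * FROM {0}LOCKS WHERE SCHED_NAME = {1} AND LOCK_NAME = ? FOR UPDATE";

    static String expand(String tablePrefix, String schedName) {
        // The lock name stays a JDBC parameter and is bound to
        // TRIGGER_ACCESS or STATE_ACCESS at execution time.
        return MessageFormat.format(SELECT_FOR_LOCK, tablePrefix, "'" + schedName + "'");
    }
}
```

Calling QuartzLockSql.expand("QRTZ_", "orderCluster") produces a statement against the QRTZ_LOCKS table. Because the statement ends in FOR UPDATE, a second node running the same query blocks until the first node's transaction commits or rolls back.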

This architecture solves the core problem of distributed scheduling: a given trigger fires on only one node at a time. However, when the cluster handles many short tasks, nodes compete frequently for the database locks, and performance degrades as the cluster grows.

Distributed Lock Pattern

While Quartz's cluster mode provides horizontal scalability, it requires dedicated database tables, which couples the scheduler tightly to the database. An alternative approach uses distributed locks.

Business Scenario

Consider an e-commerce system where unpaid orders should be cancelled after a timeout period. A typical implementation uses a scheduled task checking orders from the past 30 minutes every 2 minutes, releasing inventory for unpaid orders and marking them as invalid.

@Scheduled(cron = "0 */2 * * * ?")
public void processPendingOrders() {
    log.info("Scheduled task started");
    orderService.cancelExpiredOrders();
    log.info("Scheduled task completed");
}

In single-server deployments, this works correctly. However, when scaling to a cluster for high availability, multiple servers executing the same task simultaneously can cause business logic errors.

Redis-Based Solution

The solution involves using Redis distributed locks during task execution:

@Scheduled(cron = "0 */2 * * * ?")
public void processPendingOrders() {
    log.info("Scheduled task started");
    String lockKey = "cancelExpiredOrdersLock";
    RedisLock distributedLock = redisClient.getLock(lockKey);
    // Assumed semantics: wait up to 3 seconds for the lock; the lease expires after 300 seconds
    boolean acquired = distributedLock.tryLock(3, 300, TimeUnit.SECONDS);
    if (!acquired) {
        // Another node holds the lock and is handling this round
        log.info("Failed to acquire distributed lock: {}", lockKey);
        return;
    }
    try {
        orderService.cancelExpiredOrders();
    } finally {
        // Always release the lock, even if the task throws
        distributedLock.unlock();
    }
    log.info("Scheduled task completed");
}

Redis offers excellent read/write performance, and distributed locks are more lightweight than database row-level locks. Alternatively, Zookeeper-based locks can provide similar functionality.
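Two details of a lease-based lock are easy to miss: the lease can expire while the task is still running, and a node should only release a lock it still owns. The sketch below models both with an in-memory map standing in for Redis (roughly the behavior of SET key token NX PX ttl plus a compare-and-delete on unlock). LeaseLock and its methods are hypothetical and are not the API of any Redis client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a Redis lease lock with owner tokens (illustration only). */
final class LeaseLock {

    private static final class Entry {
        final String token;
        final long expiresAt;

        Entry(String token, long expiresAt) {
            this.token = token;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    /** Succeeds only if the key is absent or the previous lease has expired. */
    boolean tryLock(String key, String token, long now, long leaseMillis) {
        Entry fresh = new Entry(token, now + leaseMillis);
        Entry result = store.compute(key, (k, cur) ->
            (cur == null || cur.expiresAt <= now) ? fresh : cur);
        return result == fresh;
    }

    /** Removes the key only when the caller's token still owns it. */
    boolean unlock(String key, String token) {
        Entry cur = store.get(key);
        return cur != null && cur.token.equals(token) && store.remove(key, cur);
    }
}
```

If node A's lease runs out and node B takes the lock, a later unlock by A returns false instead of deleting B's lock. This is also why the task body should be idempotent: an expired lease means two nodes may briefly work on the same data.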

This combination works well for smaller projects but has two limitations: the trigger still fires on every node, so nodes that fail to acquire the lock perform idle runs, and triggering a task manually requires additional code.

ElasticJob-Lite Framework

ElasticJob-Lite provides a lightweight, decentralized solution distributed as a JAR file for distributed task coordination.

Tasks are defined by implementing the SimpleJob interface:

public class MyElasticJob implements SimpleJob {
    @Override
    public void execute(ShardingContext context) {
        switch (context.getShardingItem()) {
            case 0:
                // process segment 0
                break;
            case 1:
                // process segment 1
                break;
            case 2:
                // process segment 2
                break;
        }
    }
}

For example, consider an application with five tasks (A, B, C, D, E), where task E requires four shards, deployed across two servers. After startup, the five tasks are coordinated through Zookeeper and distributed across both machines, and each machine runs its share of the tasks via its own Quartz Scheduler.
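How shards end up on servers can be sketched as an even split, modeled on ElasticJob's average allocation strategy: each instance receives a contiguous block of shard items, and any leftover items go one each to the first instances. AverageSharding and allocate are hypothetical names, and instance names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Even shard allocation across instances (sketch of an average-allocation strategy). */
final class AverageSharding {

    static Map<String, List<Integer>> allocate(List<String> instances, int shardTotal) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (instances.isEmpty()) {
            return result;
        }
        int perInstance = shardTotal / instances.size();
        // Contiguous block of perInstance items for each instance.
        for (int index = 0; index < instances.size(); index++) {
            List<Integer> items = new ArrayList<>();
            for (int i = index * perInstance; i < (index + 1) * perInstance; i++) {
                items.add(i);
            }
            result.put(instances.get(index), items);
        }
        // Leftover items are appended one by one to the first instances.
        int leftoverStart = perInstance * instances.size();
        for (int i = leftoverStart; i < shardTotal; i++) {
            result.get(instances.get(i - leftoverStart)).add(i);
        }
        return result;
    }
}
```

For task E in the example (four shards, two servers), the first server receives items 0 and 1 and the second receives items 2 and 3. Each server's job instance then sees those values through context.getShardingItem().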

ElasticJob's underlying scheduling still relies on Quartz. Compared to Redis locks or distributed Quartz, its advantage lies in leveraging Zookeeper for load balancing across Quartz Scheduler containers within applications.

From a usage perspective, it's straightforward. However, architecturally, schedulers and executors reside in the same application JVM, and containers require load balancing after startup. Frequent application restarts lead to continuous leader election and shard rebalancing—relatively heavyweight operations.

Additionally, ElasticJob's console is basic: it reads registry data to display job status and updates registry data to modify global task configuration.

Centralized Approaches

Centralized architectures separate scheduling and execution into distinct components: a scheduling center and execution agents. The scheduling center handles scheduling attributes and triggers commands, while execution agents receive commands and execute business logic. Both components can scale independently.

Message Queue Pattern

The first centralized architecture uses message queues for decoupling. The scheduling center relies on Quartz cluster mode and sends messages to RabbitMQ when triggering tasks. Business applications consume these messages as execution agents.
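The division of labor in this pattern can be sketched without a broker. In the model below a JDK queue stands in for the RabbitMQ queue, the scheduling center only publishes a task name when a trigger fires, and the business application maps that name to local code. MqDispatchSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** In-memory model of "scheduler publishes, agent consumes" (the queue stands in for a broker). */
final class MqDispatchSketch {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Scheduling center side: called when a trigger fires. */
    void publishTrigger(String taskName) {
        queue.offer(taskName);
    }

    /** Execution agent side: drains pending messages and runs the matching local handler. */
    List<String> consumePending(Map<String, Runnable> handlers) {
        List<String> executed = new ArrayList<>();
        String taskName;
        while ((taskName = queue.poll()) != null) {
            Runnable handler = handlers.get(taskName);
            if (handler != null) { // unknown task names are skipped in this sketch
                handler.run();
                executed.add(taskName);
            }
        }
        return executed;
    }
}
```

The scheduling center never calls business code directly, which is the decoupling this pattern buys. The cost is that delivery guarantees, retries, and backlog behavior are now those of the message queue.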

This model leverages the decoupling that a message queue provides, but it depends heavily on the message queue. Scalability, functionality, and system load are closely tied to the message queue, requiring architects to have deep expertise in messaging systems.

XXL-JOB

XXL-JOB is a distributed task scheduling platform designed for rapid development, simple learning, and easy extension. It has been adopted by multiple companies in production.

Network Communication Model

The scheduling center and executors communicate using a server-worker model. The scheduling center is a Spring Boot application listening on port 8080. Each executor starts an embedded server (EmbedServer) on its own port (9994 in this setup; the sample executor configuration uses 9999), so either side can initiate requests to the other.

Executors periodically send registration commands, enabling the scheduling center to maintain a list of available executors. The routing strategy determines which node executes the task:

  • Random Execution: Selects any available node. Suitable for offline order settlement
  • Broadcast Execution: Dispatches tasks to all nodes. Suitable for batch cache updates
  • Sharded Execution: Splits tasks according to custom logic for parallel execution across nodes. Suitable for massive log statistics
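The three strategies differ only in which registered executors receive the trigger and what shard parameters they are given. A minimal sketch, assuming a non-empty list of registered executor addresses; RoutingSketch and Dispatch are hypothetical types and not XXL-JOB classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Picks target executors for one trigger under three routing strategies (sketch). */
final class RoutingSketch {

    /** One dispatch: the target executor plus the shard parameters it receives. */
    static final class Dispatch {
        final String address;
        final int shardIndex;
        final int shardTotal;

        Dispatch(String address, int shardIndex, int shardTotal) {
            this.address = address;
            this.shardIndex = shardIndex;
            this.shardTotal = shardTotal;
        }
    }

    /** Random: exactly one node runs the whole task. */
    static List<Dispatch> random(List<String> executors, Random rnd) {
        String picked = executors.get(rnd.nextInt(executors.size()));
        return Collections.singletonList(new Dispatch(picked, 0, 1));
    }

    /** Broadcast: every node runs the same task. */
    static List<Dispatch> broadcast(List<String> executors) {
        List<Dispatch> out = new ArrayList<>();
        for (String address : executors) {
            out.add(new Dispatch(address, 0, 1));
        }
        return out;
    }

    /** Sharded: every node runs the task with its own index out of the total. */
    static List<Dispatch> sharded(List<String> executors) {
        List<Dispatch> out = new ArrayList<>();
        for (int i = 0; i < executors.size(); i++) {
            out.add(new Dispatch(executors.get(i), i, executors.size()));
        }
        return out;
    }
}
```

With sharded routing, the business code typically selects its slice of the data with a rule such as id % shardTotal == shardIndex, so adding executors spreads the same workload over more nodes.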

Scheduler Implementation

Early XXL-JOB versions relied on Quartz. Version 2.1.0 removed Quartz dependency, replacing Quartz tables with custom tables.

The core scheduler class is JobScheduleHelper (JobTriggerPoolHelper is the thread pool that carries out the actual triggering). After start() is called, two threads begin: scheduleThread and ringThread.

The scheduleThread periodically loads tasks from the database, using database row locks to ensure only one scheduling node triggers tasks:

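Connection conn = XxlJobAdminConfig.getAdminConfig()
                .getDataSource().getConnection();
connAutoCommit = conn.getAutoCommit();
// Disable auto-commit so the row lock is held for the whole transaction
conn.setAutoCommit(false);
// SELECT ... FOR UPDATE blocks other scheduling nodes until this transaction ends
preparedStatement = conn.prepareStatement(
    "select * from xxl_job_lock where lock_name = 'schedule_lock' for update");
preparedStatement.execute();
// Trigger task execution (pseudocode)
for (XxlJobInfo jobInfo : scheduleList) {
    // scheduling logic
}
// Committing ends the transaction and releases the row lock
conn.commit();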

The scheduleThread handles tasks based on their next fire time: overdue tasks are immediately queued for execution, while tasks due within five seconds are placed in a ringData structure. The ringThread periodically retrieves tasks from ringData and submits them to the thread pool.
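The ring structure can be pictured as a map from second-of-minute to the job ids due in that second. The sketch below is a simplified single-threaded model under that assumption. TimeRing, push, and pollDue are hypothetical names, and checking the previous slot as well is a safeguard assumed here in case a tick is delayed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified time ring: slot = second of the minute, value = job ids due in that second. */
final class TimeRing {

    // Not thread-safe; a real implementation shared by two threads needs a concurrent map.
    private final Map<Integer, List<Integer>> ringData = new HashMap<>();

    static int slotFor(long timeMillis) {
        return (int) ((timeMillis / 1000) % 60);
    }

    /** Schedule thread: park a job that is due within the next few seconds. */
    void push(long fireTimeMillis, int jobId) {
        ringData.computeIfAbsent(slotFor(fireTimeMillis), k -> new ArrayList<>()).add(jobId);
    }

    /** Ring thread: remove and return the jobs in the current slot and the one before it. */
    List<Integer> pollDue(long nowMillis) {
        List<Integer> due = new ArrayList<>();
        int nowSlot = slotFor(nowMillis);
        for (int i = 0; i < 2; i++) {
            List<Integer> jobs = ringData.remove((nowSlot + 60 - i) % 60);
            if (jobs != null) {
                due.addAll(jobs);
            }
        }
        return due;
    }
}
```

Because only jobs due within a few seconds are pushed, 60 slots are enough and no slot holds jobs from two different minutes. The ring thread can wake once per second and hand whatever pollDue returns to the trigger thread pool.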

Custom Implementation

In 2018, I led a project to build a custom task scheduling system with a specific requirement: supporting the team's proprietary RPC framework without code modifications, so that RPC-annotated methods could be hosted in the scheduling system as native tasks.

During development, I studied XXL-JOB source code and drew inspiration from Alibaba Cloud's SchedulerX:

  • Schedulerx-console: The scheduling console for creating and managing tasks
  • Schedulerx-server: The core scheduling service responsible for triggering client tasks and monitoring execution status
  • Schedulerx-client: The client component where each application process acts as a Worker, communicating with the server for discovery and registration

Architecture Design

I adopted RocketMQ's remoting module for network communication for two reasons: familiarity with the remoting component from previous projects, and discovering that SchedulerX's communication framework closely resembled RocketMQ Remoting.

In RocketMQ's remoting, the server uses a Processor pattern. The scheduling center registers two processors: CallBackProcessor for callback results and HeartBeatProcessor for heartbeats. Executors register TriggerTaskProcessor for task triggering.

public void registerProcessor(
             int requestCode,
             NettyRequestProcessor processor,
             ExecutorService executor);

public interface NettyRequestProcessor {
    RemotingCommand processRequest(
            ChannelHandlerContext ctx,
            RemotingCommand request) throws Exception;
    boolean rejectRequest();
}

For the communication framework, implementation only requires processing logic without concerning network details.
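The Processor pattern boils down to a table from request code to handler, with the framework doing the lookup. A stripped-down sketch using only JDK types follows. ProcessorRegistry, the request codes, and the string payloads are hypothetical, and the real remoting module works with RemotingCommand objects and a Netty channel context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal request-code dispatch table in the spirit of the Processor pattern. */
final class ProcessorRegistry {

    // Hypothetical request codes for the two processors on the scheduling center.
    static final int HEARTBEAT = 1;
    static final int CALLBACK = 2;

    private final Map<Integer, Function<String, String>> processors = new HashMap<>();

    void registerProcessor(int requestCode, Function<String, String> processor) {
        processors.put(requestCode, processor);
    }

    /** The framework side: find the processor for the code and let it build the response. */
    String dispatch(int requestCode, String requestBody) {
        Function<String, String> processor = processors.get(requestCode);
        if (processor == null) {
            return "REQUEST_CODE_NOT_SUPPORTED";
        }
        return processor.apply(requestBody);
    }
}
```

Business code only supplies the functions passed to registerProcessor. Connection handling, encoding, and thread dispatch stay inside the communication layer.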

Scheduler Selection

I ultimately chose Quartz cluster mode for the scheduler due to:

  1. It was stable enough for moderate scheduling loads and compatible with existing XXL-JOB tasks
  2. I lacked practical experience with TimeWheel, and coordinating triggers across multiple scheduling servers would have required Zookeeper, a new component
  3. The project timeline required rapid delivery

The custom scheduler was completed and went live within six weeks, then ran stably, handling approximately 40-50 million scheduling executions over four months.

The bottleneck with Quartz's row-level locking became apparent. To address this, I created a prototype:

  1. Removing the external registry, so that scheduling servers manage sessions directly
  2. Introducing Zookeeper for coordination with a simple HA mechanism (primary-standby)
  3. Replacing Quartz with TimeWheel (based on Dubbo's implementation)

This prototype ran in development but required significant optimization and never reached production.

Recent Alibaba Cloud documentation describes SchedulerX 2.0's high-availability architecture using three-way replication with Zookeeper lock competition for leader election.

SchedulerX 2.0 is built on Akka, which it uses for its high-performance workflow engine and for optimized inter-process communication. Among open-source options, PowerJob is also built on Akka and supports workflow and MapReduce execution modes.

Technical Selection Guide

Comparing open-source task scheduling products with commercial offerings like SchedulerX:

Feature            Quartz          ElasticJob  XXL-JOB         SchedulerX   PowerJob
Architecture       Framework       Framework   Centralized     Centralized  Centralized
High Availability  Database locks  Zookeeper   Database locks  Zookeeper    Zookeeper
Task Sharding      Manual          Automatic   Automatic       Automatic    Automatic
Workflow           No              No          Basic           Advanced     Advanced
MapReduce          No              Yes         No              Yes          Yes
Console            No              Yes         Yes             Yes          Yes

Quartz and ElasticJob are essentially framework-level solutions. Centralized products offer clearer architecture with more flexible scheduling, supporting complex scenarios like MapReduce dynamic sharding and workflows.

XXL-JOB provides minimal setup with out-of-box functionality, meeting most teams' scheduling needs. Its simplicity and effectiveness explain its popularity.

Technical selection depends on team expertise and specific scenarios. Regardless of the chosen technology, two principles remain crucial:

  • Idempotency: Ensure results stay correct when a task executes more than once, for example after a retry or when a distributed lock fails
  • Troubleshooting: When tasks fail, check the scheduling logs, use jstack to analyze JVM threads, and make sure network calls have proper timeouts
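For the order-cancellation task, idempotency can come from a conditional state transition: cancel only if the order is still unpaid, and release inventory only when that transition actually happened. The sketch below models this in memory. IdempotentCancel and its members are hypothetical, and against a database the same idea is usually an UPDATE whose WHERE clause includes the expected status.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Idempotent cancellation via compare-and-set on the order status (in-memory sketch). */
final class IdempotentCancel {

    enum Status { UNPAID, PAID, CANCELLED }

    private final Map<Long, Status> orders = new ConcurrentHashMap<>();
    private final AtomicInteger inventoryReleases = new AtomicInteger();

    void putOrder(long orderId, Status status) {
        orders.put(orderId, status);
    }

    /** Returns true only for the call that actually moved the order from UNPAID to CANCELLED. */
    boolean cancelIfUnpaid(long orderId) {
        boolean changed = orders.replace(orderId, Status.UNPAID, Status.CANCELLED);
        if (changed) {
            inventoryReleases.incrementAndGet(); // side effect happens once per order
        }
        return changed;
    }

    int inventoryReleases() {
        return inventoryReleases.get();
    }
}
```

If two nodes run the task at the same time, only one of them wins the transition for a given order, so inventory is released once. Paid orders and already cancelled orders are left untouched.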

Conclusion

2015 was a significant year for task scheduling: ElasticJob and XXL-JOB, representing different architectural approaches, were both open-sourced. Choosing between frameworks ultimately depends on understanding the underlying principles rather than only learning surface-level APIs.
