Fading Coder

One Final Commit for the Last Sprint

Understanding Apache Flink's Semi and Anti Join Implementation

Introduction to Semi and Anti Joins in Apache Flink: Apache Flink provides robust support for semi joins and anti joins in its query processing framework. These specialized join operations are essential for implementing SQL features like IN/NOT IN subqueries and EXISTS/NOT EXISTS predicates. Flink's...

Optimizing Apache Flink for Large-Scale Stream Processing at Kuaishou

Evolution of Stream Processing Systems: Stream processing focuses on handling unbounded data streams in real time, delivering results with minimal latency. To understand Flink's role, we trace the evolution of data systems: Early Batch Era: Google's MapReduce (2003) and Apache Hadoop popularized larg...

Apache Flink vs Spark Streaming: Core Concepts, Deployment, and Word Count Examples

Apache Spark and Apache Flink are both widely used big data processing frameworks, but they differ significantly in their stream processing architectures. Spark uses Spark Streaming for micro-batch stream processing, while Flink is designed as a true stream processing engine with stream-batch unification...

Flink Java Development Environment Setup

Flink is considered one of the top tools in the big data field. It is a project of the Apache Software Foundation. This article covers development environment setup only; the setup described is not intended for production use. I. Flink Overview. Note: The following content was generated by Edge's Copilot and slightly...

Flink Task Execution Pipeline: From Transformations to JobGraph

Job Transformation Pipeline: When a user submits a Flink job, the system collects operators through a chain of method calls: transform() → doTransform() → addOperator(). This process accumulates operators like map, flatMap, filter, and process into a List<Transformation<?>> collection. Up...

Understanding Flink SQL for Dynamic Stream Processing

Flink SQL Overview: The Table API and SQL represent the highest-level APIs within Flink. These two APIs are tightly integrated, with SQL operations being executed against Flink's Table abstraction. Consequently, they are often considered a unified layer. Flink provides a unified batch and stream processi...

Comprehensive Guide to Apache Flink Performance Optimization

Resource configuration is the foundational step in Flink performance tuning. The throughput a job can sustain depends directly on the resources allocated to it. When submitting applications via YARN in per-job mode, resources are defined through command-line arguments or configuration files. Since Flink...