Resolving Jar Hell in Java and Scala Projects: Dependency Clashes and Effective Mitigation
Common Failure Indicators
Unexpected runtime behaviour in distributed processing frameworks like Apache Flink or Hadoop often stems from misaligned library versions packed inside the application jar. Typical symptoms fall into two categories.
Explicit Errors
The JVM throws linkage or reflection errors pointing to Flink or Hadoop classes:
java.lang.AbstractMethodError
java.lang.ClassNotFoundException
java.lang.IllegalAccessError
java.lang.IllegalAccessException
java.lang.InstantiationError
java.lang.InstantiationException
java.lang.InvocationTargetException
java.lang.NoClassDefFoundError
java.lang.NoSuchFieldError
java.lang.NoSuchFieldException
java.lang.NoSuchMethodError
java.lang.NoSuchMethodException
Silent Misbehaviours
Logging frameworks may stop writing output or ignore custom log4j configurations. This occurs when the fat jar bundles its own log4j.properties or a conflicting log4j bridge. When different log4j versions are strictly required, use the Maven Shade Plugin to relocate the conflicting packages instead of simply excluding them.
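When the application must not override the cluster's logging configuration, one option is to drop the bundled file at package time rather than relocate anything. A minimal sketch of a Maven Shade Plugin resource filter (the wildcard artifact pattern and the file names are assumptions about a typical log4j setup, not taken from a specific project):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <!-- Strip logging configuration files from every bundled dependency -->
        <artifact>*:*</artifact>
        <excludes>
          <exclude>log4j.properties</exclude>
          <exclude>log4j2.xml</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```

With the filter in place, the fat jar no longer ships its own logging configuration, so the file provided by the cluster takes effect.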
Root Causes
- The application jar carries transitive dependencies that already exist in the target runtime (e.g., flink-java, hadoop-common, spring-core, or logging libraries).
- Required connector or format dependencies are either missing from the jar or clash with platform-provided jars.
Diagnostic Techniques
Inspect the Packaged Jar Directly
Run jar tf your-application.jar in the build output directory to list every entry. Look for classes or configuration files that belong to libraries the platform already provides (use findstr instead of grep on Windows).
cd target
jar tf data-pipeline-2.1.0.jar | grep "hadoop"
Trace the Full Dependency Tree
Execute mvn dependency:tree from the project root or an IDE terminal. This command prints the complete hierarchy, including transitive dependencies. Use it to identify which declared dependencies pull in unwanted or conflicting versions.
mvn dependency:tree -Dincludes=org.apache.flink,org.apache.hadoop
Filtering with -Dincludes narrows the output to suspect groups.
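When the tree alone does not settle which copy of a class wins at runtime, you can ask the JVM directly. The sketch below (the Hadoop class in the comment is only an illustration) prints the location a class file was resolved from, which immediately shows whether it came from the application jar, a platform jar, or the JDK itself:

```java
// WhichJar.java - print where the JVM actually resolves a class file from.
public class WhichJar {
    public static void main(String[] args) {
        // Platform classes resolve to the JDK image (jrt:/ on Java 9+),
        // application classes to a jar or classes directory on the classpath.
        java.net.URL location = ClassLoader.getSystemClassLoader()
                .getResource("java/lang/String.class");
        System.out.println("java.lang.String loaded from: " + location);

        // For a suspect dependency, query its class file the same way, e.g.:
        // ClassLoader.getSystemClassLoader()
        //         .getResource("org/apache/hadoop/conf/Configuration.class");
    }
}
```

Running this inside the deployed environment (for example from a Flink user-function) reveals exactly which jar supplied a conflicting class.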
Remediation Approaches
Mark Platform Libraries as Provided
Every dependency that the cluster already supplies must have <scope>provided</scope> so it is excluded from the assembled jar:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
Strip Transitive Dependencies with Exclusions
When a needed third-party library pulls in a conflicting JAR, carve it out with <exclusions>:
<dependency>
<groupId>com.data.vendor</groupId>
<artifactId>connector-bundle</artifactId>
<version>3.4.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
</exclusions>
</dependency>
Relocate Conflicting Packages via Shading
When identical classes must coexist, use maven-shade-plugin to move them into a new namespace, avoiding linkage errors. This is particularly useful for embedded log4j configurations or older client libraries.
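A minimal relocation sketch follows; the com.google.common pattern and the com.example shaded prefix are illustrative choices, not taken from a specific project:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Move the bundled copy into a private namespace -->
            <pattern>com.google.common</pattern>
            <shadedPattern>com.example.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The plugin rewrites both the relocated class files and every bytecode reference to them inside the fat jar, so the application uses its private copy while the version supplied by the platform remains untouched.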