Python Job Failure: java.lang.NoSuchFieldError: chunkSize – Lucidworks

Issue

When running Python Spark jobs after upgrading to Fusion 5.9.16, the job fails during execution with the following error in the serve-Arrow thread:

Exception in thread "serve-Arrow" java.lang.NoSuchFieldError: chunkSize
    at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:153)
    at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
    at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)

Diagnosis

This error occurs when PySpark attempts to use Apache Arrow for data serialization (e.g., during .toPandas() calls or when using Pandas UDFs). The stack trace indicates that arrow-memory-netty is attempting to access the chunkSize field within the Netty PoolChunk class.

In Fusion 5.9.16, the bundled Netty version was upgraded to a version where the chunkSize field has been renamed or refactored. Because the Apache Arrow libraries included in the Spark image were compiled against an older version of Netty, a binary incompatibility is triggered at runtime.

Environment

Fusion Version: 5.9.16
Spark Version: 3.x
Cloud Platform: AKS / Self-Hosted
Kubernetes Version: 1.30

Cause

The root cause is a classpath version mismatch within the fusion-spark Docker image. Specifically:

The arrow-memory-netty JAR is compiled against Netty 4.1.72 or older.
The Netty JARs present on the Fusion 5.9.16 Spark classpath are version 4.1.76 or newer.

This mismatch prevents the Arrow memory allocator from initializing correctly, leading to the NoSuchFieldError.
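To confirm which Netty and Arrow versions are actually on the classpath, the JAR file names can be inspected. The following is a minimal sketch, not a Lucidworks tool: the function name jar_versions and the directory passed to it are hypothetical, and the location of the Spark JAR directory inside the fusion-spark image is an assumption you need to verify for your deployment.

```python
import re
from pathlib import Path

# Matches file names shaped like "<artifact>-<version>.jar", where the version
# starts with a digit (for example "netty-buffer-4.1.76.Final.jar").
JAR_PATTERN = re.compile(r"^(?P<artifact>[A-Za-z0-9_.\-]+?)-(?P<version>\d[\w.\-]*)\.jar$")


def jar_versions(jar_dir, prefixes=("netty-", "arrow-")):
    """Return {artifact: version} for JARs in jar_dir whose name starts with a prefix.

    jar_dir is an assumption: pass whatever directory holds the Spark JARs
    in your image.
    """
    found = {}
    for path in sorted(Path(jar_dir).glob("*.jar")):
        match = JAR_PATTERN.match(path.name)
        if match and match.group("artifact").startswith(tuple(prefixes)):
            found[match.group("artifact")] = match.group("version")
    return found
```

Calling jar_versions() on the JAR directory and comparing the arrow-memory-netty entry with the netty-* entries shows whether the two libraries come from the incompatible version ranges described above.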

Resolution

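To resolve this issue, disable the Apache Arrow optimization for PySpark. Spark then falls back to its non-Arrow path, which transfers rows between the JVM and the Python process using pickle-based serialization instead of Arrow record batches. Note that these settings control Arrow use in DataFrame.toPandas() and in createDataFrame() from a pandas DataFrame. Pandas UDFs rely on Arrow regardless of these settings, so a job that uses them may additionally need to be rewritten with regular Python UDFs.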

Step 1: Update Spark Job Configuration

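Add the following properties to your Spark job configuration. The second property is the older name of the same setting, deprecated since Spark 3.0, and is included for compatibility: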

spark.sql.execution.arrow.pyspark.enabled=false
spark.sql.execution.arrow.enabled=false
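If it is more convenient to set these properties from code, the sketch below keeps them in one place and applies them in two ways. The helper names to_submit_args and apply_to_session are hypothetical. apply_to_session assumes the job script already has a SparkSession object, and it only uses the standard spark.conf.set(key, value) call.

```python
# The two properties from Step 1, defined once so they can be reused.
ARROW_DISABLED_CONF = {
    "spark.sql.execution.arrow.pyspark.enabled": "false",
    "spark.sql.execution.arrow.enabled": "false",
}


def to_submit_args(conf):
    """Render properties as spark-submit style '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def apply_to_session(spark, conf=ARROW_DISABLED_CONF):
    """Set each property on an existing SparkSession through spark.conf.set()."""
    for key, value in conf.items():
        spark.conf.set(key, value)
```

In a job script, apply_to_session(spark) would need to run before the first .toPandas() call so that the Arrow path is never entered.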

Step 2: Verification

Rerun the job and monitor the driver logs. The serve-Arrow thread should no longer start, and the job should complete successfully.
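The log check can also be scripted. This is a minimal sketch that assumes the driver log has already been saved as text; the function name arrow_failure_lines is hypothetical, and the markers it searches for are taken from the stack trace shown in the Issue section.

```python
def arrow_failure_lines(log_text):
    """Return log lines that mention the serve-Arrow exception or the chunkSize error."""
    markers = ('Exception in thread "serve-Arrow"', "NoSuchFieldError: chunkSize")
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]
```

An empty result for the rerun job's driver log suggests the Arrow code path was not reached. It does not prove the job succeeded, so the job status should still be checked.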

Technical Note on Performance

Disabling Arrow serialization slows data transfer between the Spark JVM and the Python processes, and the impact is most noticeable when processing large DataFrames. However, this is the only supported workaround, as Lucidworks does not currently provide a native patch for Arrow-Netty library alignment in the 5.9.16 Spark image.

Avoid "Unsafe" Allocator Workaround

Do not attempt to switch the Arrow allocator to Unsafe using -Darrow.allocation.manager.type=Unsafe. This will result in a ClassNotFoundException because the arrow-memory-unsafe JAR is not bundled in the standard Fusion Spark distribution.