[SPARK-56803][SQL] Add bulk read+narrow path for INT64 DECIMAL to 32-bit Decimal Parquet vector updater#55853

Draft
LuciferYang wants to merge 2 commits into apache:master from LuciferYang:SPARK-56803-downcast-long

@LuciferYang
Contributor

What changes were proposed in this pull request?

Extend the bulk read+widen pattern introduced in SPARK-56791 to DowncastLongUpdater (Parquet INT64 + DECIMAL(p<=9) read into a Spark 32-bit DecimalType).

A new default method, readLongsAsInts, on VectorizedValuesReader provides the per-row fallback. VectorizedPlainValuesReader overrides it to fetch the source bytes once via getBuffer(total * 8) and run a tight in-method conversion loop, so DowncastLongUpdater.readValues becomes a one-line delegation. The narrowing is Java's primitive long-to-int cast ((int) buffer.getLong()), which discards the high 32 bits. This is lossless in practice because the unscaled value of a Parquet DECIMAL(p<=9) is bounded to [-999_999_999, 999_999_999], which fits in a 32-bit int.
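As a rough illustration of the narrowing step described above, the following standalone sketch reads little-endian INT64 values from a single buffer and casts each one to int. The class `BulkNarrowSketch` and its helpers `encode` and `narrow` are hypothetical names for this example; it does not reproduce the actual Spark reader or its method signatures.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone sketch; not the Spark VectorizedPlainValuesReader code.
final class BulkNarrowSketch {

  // Encode longs as little-endian 8-byte values, the layout of Parquet PLAIN INT64.
  static ByteBuffer encode(long[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * 8).order(ByteOrder.LITTLE_ENDIAN);
    for (long v : values) {
      buf.putLong(v);
    }
    buf.flip();
    return buf;
  }

  // Read `total` longs from one buffer and narrow each with a primitive cast.
  // The cast keeps only the low 32 bits of every value.
  static int[] narrow(ByteBuffer buffer, int total) {
    int[] out = new int[total];
    for (int i = 0; i < total; i++) {
      out[i] = (int) buffer.getLong();
    }
    return out;
  }

  public static void main(String[] args) {
    // Values inside the DECIMAL(9) unscaled range survive the cast unchanged.
    long[] inRange = {999_999_999L, -999_999_999L, 0L, 123L};
    int[] narrowed = narrow(encode(inRange), inRange.length);
    for (int i = 0; i < inRange.length; i++) {
      System.out.println(inRange[i] + " -> " + narrowed[i]);
    }
    // A value outside the int range loses its high 32 bits.
    long outOfRange = (1L << 32) + 5L;
    System.out.println(outOfRange + " -> " + narrow(encode(new long[] {outOfRange}), 1)[0]);
  }
}
```

The last line of `main` shows why the range bound matters: the cast itself never fails, so correctness rests on the input staying within what a 32-bit int can hold.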

Why are the changes needed?

DowncastLongUpdater.readValues allocates a fresh ByteBuffer slice inside getBuffer(8) for every element on the legacy path, and that allocation dominates the loop. Collapsing N allocations into one is the same win SPARK-56791 delivered for the INT32 -> Long sibling.
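The allocation difference can be sketched with a toy reader that counts how many slices it hands out. Everything here is an assumption made for illustration: `SliceCountSketch`, its `getBuffer` method, and the `slices` counter only model the behavior described above and are not the Spark implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical toy model of a reader whose getBuffer allocates one slice per call.
final class SliceCountSketch {
  private final ByteBuffer data;
  int slices = 0; // number of slice objects created so far

  SliceCountSketch(long[] values) {
    data = ByteBuffer.allocate(values.length * 8).order(ByteOrder.LITTLE_ENDIAN);
    for (long v : values) {
      data.putLong(v);
    }
    data.flip();
  }

  // Return a fresh slice covering the next `length` bytes and advance the source.
  ByteBuffer getBuffer(int length) {
    ByteBuffer slice = data.slice().order(ByteOrder.LITTLE_ENDIAN);
    slice.limit(length);
    data.position(data.position() + length);
    slices++;
    return slice;
  }

  // Per-element shape: one getBuffer(8) call, and so one slice, per value.
  int[] readPerElement(int total) {
    int[] out = new int[total];
    for (int i = 0; i < total; i++) {
      out[i] = (int) getBuffer(8).getLong();
    }
    return out;
  }

  // Bulk shape: a single getBuffer(total * 8) call, then a tight loop.
  int[] readBulk(int total) {
    int[] out = new int[total];
    ByteBuffer buffer = getBuffer(total * 8);
    for (int i = 0; i < total; i++) {
      out[i] = (int) buffer.getLong();
    }
    return out;
  }

  public static void main(String[] args) {
    long[] values = {1L, -2L, 999_999_999L, -999_999_999L};
    SliceCountSketch perElement = new SliceCountSketch(values);
    perElement.readPerElement(values.length);
    SliceCountSketch bulk = new SliceCountSketch(values);
    bulk.readBulk(values.length);
    System.out.println("per-element slices: " + perElement.slices);
    System.out.println("bulk slices: " + bulk.slices);
  }
}
```

Both methods produce the same int array; only the number of slice objects differs, N for the per-element shape and one for the bulk shape.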

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(To be updated after the GHA benchmark and test runs complete.)

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

@LuciferYang LuciferYang marked this pull request as draft May 13, 2026 12:23
@LuciferYang LuciferYang force-pushed the SPARK-56803-downcast-long branch from 800ed85 to 39b038d Compare May 14, 2026 02:56
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)