[SPARK-56803][SQL] Add bulk read+narrow path for INT64 DECIMAL to 32-bit Decimal Parquet vector updater by LuciferYang · Pull Request #55853 · apache/spark

LuciferYang · 2026-05-13T12:21:16Z

What changes were proposed in this pull request?

Extend the bulk read+widen pattern introduced in SPARK-56791 to DowncastLongUpdater (Parquet INT64 + DECIMAL(p<=9) read into a Spark 32-bit DecimalType).

A new default method, readLongsAsInts, on VectorizedValuesReader provides the per-row fallback. VectorizedPlainValuesReader overrides it to fetch the source bytes once via getBuffer(total * 8) and run a tight in-method conversion loop, so DowncastLongUpdater.readValues becomes a one-line delegation. The narrowing is Java's primitive long-to-int cast, (int) buffer.getLong(), which discards the high 32 bits. This is lossless in practice because Parquet's DECIMAL(p<=9) encoding bounds the unscaled value to [-999_999_999, 999_999_999], which fits in a 32-bit int.
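
To illustrate the read+narrow idea, here is a minimal standalone sketch. It is not the code in this PR: the class name BulkNarrowSketch and the array-returning signature are made up for illustration, and a plain java.nio.ByteBuffer stands in for the reader's page buffer. The only assumptions taken from the description are that the values are PLAIN-encoded little-endian INT64 and that each one is narrowed with a primitive cast.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BulkNarrowSketch {
    // Hypothetical helper (not Spark's API): decode `total` little-endian
    // INT64 values from `source` and narrow each one to a 32-bit int.
    public static int[] readLongsAsInts(ByteBuffer source, int total) {
        // Take one view over the whole batch instead of one per element.
        ByteBuffer buffer = source.slice().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            // The cast keeps the low 32 bits; safe only when the value
            // already fits in an int, as with DECIMAL(p<=9) unscaled values.
            out[i] = (int) buffer.getLong();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] unscaled = {999_999_999L, -999_999_999L, 0L, 123_456_789L};
        ByteBuffer page = ByteBuffer.allocate(unscaled.length * 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (long v : unscaled) {
            page.putLong(v);
        }
        page.flip();
        for (int v : readLongsAsInts(page, unscaled.length)) {
            System.out.println(v);
        }
    }
}
```

Both ends of the DECIMAL(9) range round-trip through the cast unchanged, which is the property the description relies on.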

Why are the changes needed?

DowncastLongUpdater.readValues allocates a fresh ByteBuffer slice inside getBuffer(8) for every element on the legacy path, and that allocation dominates the loop. Collapsing N allocations into one is the same win SPARK-56791 delivered for the INT32 -> Long sibling.
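
The allocation argument can be sketched by counting slices. In the example below, getBuffer is a simplified stand-in written for this illustration (it is not Spark's implementation), and the slices counter exists only to make the difference visible. The sketch assumes each getBuffer call creates one new ByteBuffer view, as the description states for the legacy path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SliceCountSketch {
    // Counts how many ByteBuffer views were created (illustration only).
    public static int slices = 0;

    // Simplified stand-in for a reader's getBuffer(length): returns a new
    // little-endian view of the next `length` bytes and advances `in`.
    public static ByteBuffer getBuffer(ByteBuffer in, int length) {
        slices++;
        ByteBuffer view = in.slice().order(ByteOrder.LITTLE_ENDIAN);
        view.limit(length);
        in.position(in.position() + length);
        return view;
    }

    // Per-element path: one view per value, so `total` allocations.
    public static int[] perElement(ByteBuffer in, int total) {
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            out[i] = (int) getBuffer(in, 8).getLong();
        }
        return out;
    }

    // Bulk path: one view for the whole batch, then a tight loop.
    public static int[] bulk(ByteBuffer in, int total) {
        ByteBuffer buffer = getBuffer(in, total * 8);
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            out[i] = (int) buffer.getLong();
        }
        return out;
    }

    // Builds a little-endian page holding 0, 1, ..., n-1 as INT64 values.
    public static ByteBuffer page(int n) {
        ByteBuffer b = ByteBuffer.allocate(n * 8).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            b.putLong(i);
        }
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        int n = 1024;
        slices = 0;
        perElement(page(n), n);
        System.out.println("per-element views: " + slices);
        slices = 0;
        bulk(page(n), n);
        System.out.println("bulk views: " + slices);
    }
}
```

Both paths decode the same values; only the number of intermediate views differs, which is the cost the bulk path removes.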

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(To be updated after the GHA benchmark and test runs complete.)

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

LuciferYang marked this pull request as draft May 13, 2026 12:23

[SPARK-56803][SQL] Add bulk read+narrow path for INT64 DECIMAL to 32-…

39b038d

…bit Decimal Parquet vector updater

LuciferYang force-pushed the SPARK-56803-downcast-long branch from 800ed85 to 39b038d Compare May 14, 2026 02:56

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

bde2af0

…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)

LuciferYang wants to merge 2 commits into apache:master from LuciferYang:SPARK-56803-downcast-long
