
[spark] Add numRowsRead custom metric and improve Spark UI operator name for FlussScan#3196

Merged
luoyuxia merged 4 commits into apache:main from Yohahaha:spark-ui on Apr 27, 2026

Conversation

@Yohahaha (Contributor) commented on Apr 24, 2026

Purpose

Linked issue: close #3160

Brief change log

This PR adds custom-metrics support to the Spark connector for Fluss table scans, enabling users to monitor
row read counts in the Spark UI. It also fixes AbstractSparkTable.name() to return the correct table path.

  • Add numRowsRead custom metric: Introduces FlussMetrics with custom metric classes (FlussNumRowsReadMetric, FlussNumRowsReadTaskMetric) for driver-side aggregation and executor-side reporting
  • Fix Spark UI operator name: AbstractSparkTable.name() now returns tableInfo.getTablePath.toString (e.g. db.table) instead of the verbose TableInfo.toString.
  • Refactor next() → next0(): Moves scan logic to next0() in the concrete readers so the
    base class can centrally maintain the row counter.
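The two metric classes named above can be sketched against Spark's DataSource V2 custom-metric API (available since Spark 3.2). The class and metric names follow the PR description, but the bodies are an illustrative reconstruction, not the merged code:

```scala
import org.apache.spark.sql.connector.metric.{CustomSumMetric, CustomTaskMetric}

object FlussMetrics {
  // Shared metric name used on both the driver and executor sides.
  val NUM_ROWS_READ = "numRowsRead"
}

// Driver side: declares the metric and how per-task values are combined.
// CustomSumMetric sums the reported task values for display in the Spark UI.
class FlussNumRowsReadMetric extends CustomSumMetric {
  override def name(): String = FlussMetrics.NUM_ROWS_READ
  override def description(): String = "number of rows read"
}

// Executor side: carries the current counter value for one partition reader.
case class FlussNumRowsReadTaskMetric(value: Long) extends CustomTaskMetric {
  override def name(): String = FlussMetrics.NUM_ROWS_READ
}
```

Spark matches task metrics to driver metrics by name(), which is why a shared constant is used for both sides.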

Tests

  • SparkLogTableReadTest: adds test case "Spark Read: log table scan metrics"
  • SparkPrimaryKeyTableReadTest: adds test case "Spark Read: primary key table scan metrics"

API and Format

Documentation

@Yohahaha Yohahaha marked this pull request as ready for review April 24, 2026 08:41
@Yohahaha (Contributor, Author)

Before: (screenshot)

After: (screenshot)

Copilot AI left a comment


Pull request overview

This PR enhances the Fluss Spark connector’s observability and Spark UI readability by adding a custom numRowsRead scan metric and improving scan/table naming shown in Spark’s operator UI.

Changes:

  • Add a numRowsRead custom metric for Fluss scans via Spark DataSource V2 custom metrics.
  • Improve Spark UI scan description (FlussScan.description) to include table path and scan type.
  • Refactor partition readers to move scan iteration logic into next0() so the base reader can centrally maintain the row counter and report task metrics.
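The reader refactor described above can be sketched as follows. This is an assumed shape of the base class (names like BaseFlussPartitionReader are hypothetical); the point is that next() is implemented once, concrete readers only supply next0(), and the counter is reported through currentMetricsValues():

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.metric.CustomTaskMetric
import org.apache.spark.sql.connector.read.PartitionReader

abstract class BaseFlussPartitionReader extends PartitionReader[InternalRow] {
  private var numRowsRead: Long = 0L

  // Concrete readers (append, upsert, lake variants) implement only next0().
  protected def next0(): Boolean

  // next() stays final so the row counter is maintained in exactly one place.
  final override def next(): Boolean = {
    val hasNext = next0()
    if (hasNext) numRowsRead += 1
    hasNext
  }

  // Executor-side reporting: Spark polls this and forwards the values to the
  // matching driver-side custom metric (matched by name).
  override def currentMetricsValues(): Array[CustomTaskMetric] = {
    val rows = numRowsRead
    Array(new CustomTaskMetric {
      override def name(): String = "numRowsRead"
      override def value(): Long = rows
    })
  }
}
```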

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Summary per file:

  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussMetrics.scala: Introduces Fluss custom metric/task-metric implementations and a shared metric name constant.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussPartitionReader.scala: Adds centralized row counting plus executor-side currentMetricsValues() reporting and the next0() hook.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScan.scala: Improves the scan description and advertises the supported custom metrics.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussAppendPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussUpsertPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakeAppendPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakeUpsertPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/catalog/AbstractSparkTable.scala: Makes the Spark table name() return the concise table path string for better UI readability.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkLogTableReadTest.scala: Adds UT coverage validating the scan description and numRowsRead metric accumulation for append/log scans.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkPrimaryKeyTableReadTest.scala: Adds UT coverage validating the scan description and numRowsRead metric accumulation for upsert/PK scans.
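The FlussScan-side changes summarized above can be sketched as below. This is a hedged reconstruction: the constructor parameters (tablePath, scanType) and the description format are illustrative assumptions, but the two overridden hooks are the standard DSv2 extension points:

```scala
import org.apache.spark.sql.connector.metric.{CustomMetric, CustomSumMetric}
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.types.StructType

// Sketch only; the real FlussScan also builds batches, pushes down filters, etc.
class FlussScanSketch(tablePath: String, scanType: String, schema: StructType)
    extends Scan {
  override def readSchema(): StructType = schema

  // Shown as the operator description in the Spark UI SQL tab.
  override def description(): String = s"FlussScan[$scanType] $tablePath"

  // Advertises the driver-side metric; Spark aggregates the task values into it.
  override def supportedCustomMetrics(): Array[CustomMetric] =
    Array(new CustomSumMetric {
      override def name(): String = "numRowsRead"
      override def description(): String = "number of rows read"
    })
}
```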


@Yohahaha (Contributor, Author)

@luoyuxia please help take a look, thank you!

@luoyuxia (Contributor) left a comment


+1

@luoyuxia merged commit 5aec064 into apache:main on Apr 27, 2026
7 checks passed
@Yohahaha deleted the spark-ui branch on April 27, 2026 at 13:09

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[spark] Improve spark ui more readable when reading fluss

3 participants