
[spark] Add numRowsRead custom metric and improve Spark UI operator name for FlussScan#3196

Merged
luoyuxia merged 4 commits into apache:main from Yohahaha:spark-ui on Apr 27, 2026

Conversation

@Yohahaha (Contributor) commented on Apr 24, 2026

Purpose

Linked issue: close #3160

Brief change log

This PR adds custom-metrics support to the Spark connector for Fluss table scans, enabling users to monitor
row read counts in the Spark UI. It also fixes AbstractSparkTable.name() to return the correct table path.

  • Add numRowsRead custom metric: Introduces FlussMetrics with custom metric classes (FlussNumRowsReadMetric, FlussNumRowsReadTaskMetric) for driver-side aggregation and executor-side reporting
  • Fix Spark UI operator name: AbstractSparkTable.name() now returns tableInfo.getTablePath.toString (e.g. db.table) instead of the verbose TableInfo.toString.
  • Refactor next() → next0(): Moves scan logic to next0() in the concrete readers so the
    base class can centrally maintain the row counter.
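The two metric classes named above can be sketched against Spark's DataSource V2 custom-metric API (available since Spark 3.2). The class and metric names follow the PR description, but the bodies are an illustrative reconstruction, not the merged code:

```scala
import org.apache.spark.sql.connector.metric.{CustomSumMetric, CustomTaskMetric}

object FlussMetrics {
  // Shared metric name used on both the driver and executor sides.
  val NUM_ROWS_READ = "numRowsRead"
}

// Driver side: declares the metric and how per-task values are combined.
// CustomSumMetric sums the reported task values for display in the Spark UI.
class FlussNumRowsReadMetric extends CustomSumMetric {
  override def name(): String = FlussMetrics.NUM_ROWS_READ
  override def description(): String = "number of rows read"
}

// Executor side: carries the current counter value for one partition reader.
case class FlussNumRowsReadTaskMetric(value: Long) extends CustomTaskMetric {
  override def name(): String = FlussMetrics.NUM_ROWS_READ
}
```

Spark matches task metrics to driver metrics by name(), which is why a shared constant is used for both sides.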

Tests

  • SparkLogTableReadTest: adds test case "Spark Read: log table scan metrics"
  • SparkPrimaryKeyTableReadTest: adds test case "Spark Read: primary key table scan metrics"

API and Format

Documentation

@Yohahaha Yohahaha marked this pull request as ready for review April 24, 2026 08:41
@Yohahaha (Contributor, Author)

Before: (screenshot)

After: (screenshot)

Copilot AI left a comment


Pull request overview

This PR enhances the Fluss Spark connector’s observability and Spark UI readability by adding a custom numRowsRead scan metric and improving scan/table naming shown in Spark’s operator UI.

Changes:

  • Add a numRowsRead custom metric for Fluss scans via Spark DataSource V2 custom metrics.
  • Improve Spark UI scan description (FlussScan.description) to include table path and scan type.
  • Refactor partition readers to move scan iteration logic into next0() so the base reader can centrally maintain the row counter and report task metrics.
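The reader refactor described above can be sketched as follows. This is an assumed shape of the base class (names like BaseFlussPartitionReader are hypothetical); the point is that next() is implemented once, concrete readers only supply next0(), and the counter is reported through currentMetricsValues():

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.metric.CustomTaskMetric
import org.apache.spark.sql.connector.read.PartitionReader

abstract class BaseFlussPartitionReader extends PartitionReader[InternalRow] {
  private var numRowsRead: Long = 0L

  // Concrete readers (append, upsert, lake variants) implement only next0().
  protected def next0(): Boolean

  // next() stays final so the row counter is maintained in exactly one place.
  final override def next(): Boolean = {
    val hasNext = next0()
    if (hasNext) numRowsRead += 1
    hasNext
  }

  // Executor-side reporting: Spark polls this and forwards the values to the
  // matching driver-side custom metric (matched by name).
  override def currentMetricsValues(): Array[CustomTaskMetric] = {
    val rows = numRowsRead
    Array(new CustomTaskMetric {
      override def name(): String = "numRowsRead"
      override def value(): Long = rows
    })
  }
}
```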

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Summary per file:

  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussMetrics.scala: Introduces Fluss custom metric/task-metric implementations and a shared metric name constant.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussPartitionReader.scala: Adds centralized row counting plus executor-side currentMetricsValues() reporting and the next0() hook.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScan.scala: Improves the scan description and advertises the supported custom metrics.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussAppendPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussUpsertPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakeAppendPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakeUpsertPartitionReader.scala: Refactors the next() implementation to next0() for centralized metric accounting.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/catalog/AbstractSparkTable.scala: Makes the Spark table name() return the concise table path string for better UI readability.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkLogTableReadTest.scala: Adds UT coverage validating the scan description and numRowsRead metric accumulation for append/log scans.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkPrimaryKeyTableReadTest.scala: Adds UT coverage validating the scan description and numRowsRead metric accumulation for upsert/PK scans.
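The FlussScan-side changes summarized above can be sketched as below. This is a hedged reconstruction: the constructor parameters (tablePath, scanType) and the description format are illustrative assumptions, but the two overridden hooks are the standard DSv2 extension points:

```scala
import org.apache.spark.sql.connector.metric.{CustomMetric, CustomSumMetric}
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.types.StructType

// Sketch only; the real FlussScan also builds batches, pushes down filters, etc.
class FlussScanSketch(tablePath: String, scanType: String, schema: StructType)
    extends Scan {
  override def readSchema(): StructType = schema

  // Shown as the operator description in the Spark UI SQL tab.
  override def description(): String = s"FlussScan[$scanType] $tablePath"

  // Advertises the driver-side metric; Spark aggregates the task values into it.
  override def supportedCustomMetrics(): Array[CustomMetric] =
    Array(new CustomSumMetric {
      override def name(): String = "numRowsRead"
      override def description(): String = "number of rows read"
    })
}
```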


@Yohahaha (Contributor, Author)

@luoyuxia please help take a look, thank you!

@luoyuxia (Contributor) left a comment


+1

@luoyuxia merged commit 5aec064 into apache:main on Apr 27, 2026
7 checks passed
@Yohahaha deleted the spark-ui branch on April 27, 2026 at 13:09

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[spark] Improve spark ui more readable when reading fluss

3 participants