
#409 Migrate BHR collection to Glean data #413

Open

skylarkning wants to merge 4 commits into mozilla:main from skylarkning:glean-bhr-migration

Conversation

@skylarkning

Summary

This PR addresses Issue #409.

It migrates bhr_collection.py to read BHR hang report data from the Glean table:

moz-fx-data-shared-prod.firefox_desktop_stable.hang_report_v1

instead of the legacy telemetry table:

moz-fx-data-shared-prod.telemetry_stable.bhr_v4.

Changes

  • Set mozdata as the BigQuery billing project
  • Read client_info, ping_info, and metrics from the Glean hang report table
  • Map Glean fields into the existing processing shape
  • Parse Glean hang report/module object metrics before processing
  • Support Glean stack frames shaped as {frame, module}
  • Remove the legacy payload/time_since_last_ping normalization path
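The mapping and parsing steps listed above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the metric names (`hang_report`, `hang_modules`), the column layout, and the `[frame, module_index]` legacy frame shape are all assumptions about the schemas involved.

```python
import json

def glean_row_to_legacy(row):
    """Map a Glean hang_report_v1 row into the shape the legacy
    processing code expects. Field names here are assumptions,
    not the real schema."""
    client_info = row["client_info"]
    metrics = row["metrics"]

    # Glean object metrics arrive as JSON strings, so parse them first.
    hangs = json.loads(metrics["hang_report"])
    modules = json.loads(metrics["hang_modules"])

    # Glean stack frames are shaped {frame, module}; assume the legacy
    # code expects [frame, module_index] pairs instead.
    for hang in hangs:
        hang["stack"] = [[f["frame"], f["module"]] for f in hang["stack"]]

    return {
        "client_id": client_info["client_id"],
        "os": client_info["os"],
        "modules": modules,
        "hangs": hangs,
    }
```

The key detail is the parse-before-process step: object metrics in stable tables are JSON-encoded, so they must be decoded before the existing hang-processing logic can consume them.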

Local Validation

Ran locally with:

JAVA_HOME=$(/usr/libexec/java_home -v 17) python3 ./mozetl/bhr_collection/bhr_collection.py \
  --date 2025-04-25 \
  --bq-connector-jar=spark-bigquery-latest_2.12.jar \
  --sample-size 0.0002

Results and Comparison

The job completed successfully and wrote:

  • output/hangs_main_20250421.json
  • output/hangs_main_current.json

The output files were compared against the legacy output:

  • Top-level keys match
  • Thread keys match
  • sampleTable keys match
  • Date keys match

The output is non-empty for the same thread/date combinations.

Note: Exact numeric similarity is no longer expected, since the data source has changed to Glean and the new output no longer normalizes by usage hours.
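A structural comparison like the one described above can be done with a small helper that recursively diffs the key sets of the two parsed JSON outputs. This is a sketch for illustration, not code from the PR:

```python
import json

def compare_keys(legacy, new, path="root"):
    """Recursively compare dict keys of two parsed JSON outputs.
    Returns a list of (path, differing_keys) tuples; an empty list
    means the structures match."""
    diffs = []
    if set(legacy) != set(new):
        diffs.append((path, set(legacy) ^ set(new)))
    for key in set(legacy) & set(new):
        if isinstance(legacy[key], dict) and isinstance(new[key], dict):
            diffs.extend(compare_keys(legacy[key], new[key], f"{path}.{key}"))
    return diffs

# Usage against the two output files, e.g.:
# with open("output/hangs_main_current.json") as f_new, \
#      open("legacy/hangs_main_current.json") as f_old:
#     print(compare_keys(json.load(f_old), json.load(f_new)))
```

Comparing key sets rather than values matches the validation goal here: the structure should be identical even though the numbers are expected to differ.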

@skylarkning force-pushed the glean-bhr-migration branch 2 times, most recently from 5b89902 to 76fb64c on May 5, 2026 at 15:11
Update bhr_collection.py to read from firefox_desktop_stable.hang_report_v1 instead of telemetry_stable.bhr_v4.

Map the Glean client_info and metrics fields into the existing processing shape, and parse the Glean hang report and module object metrics before processing.

Refs mozilla#409.
@skylarkning force-pushed the glean-bhr-migration branch from 76fb64c to 914dde2 on May 5, 2026 at 15:18
@codecov-commenter

codecov-commenter commented May 5, 2026

Codecov Report

❌ Patch coverage is 0% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.58%. Comparing base (6af5435) to head (174d2d1).

Files with missing lines Patch % Lines
mozetl/bhr_collection/bhr_collection.py 0.00% 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #413      +/-   ##
==========================================
- Coverage   32.63%   32.58%   -0.06%     
==========================================
  Files          36       36              
  Lines        3858     3864       +6     
==========================================
  Hits         1259     1259              
- Misses       2599     2605       +6     


@squarewave
Contributor

@BenWu would you be able to take a look at reviewing this? Sky is a student worker who just started, and he's going to be making some improvements to BHR, so we had him take a look at migrating the bhr_collection job to use Glean data as a first task. Regarding the coverage, I'd like to get some tests in, but from what we have so far it looks like it might be a little involved to get a basic test of bhr_collection up and running. Does that sound right? In any case, do you have any thoughts on the best route forward, or salient examples, for getting some BHR tests up and running either in this PR or in a follow-up?

Contributor

@BenWu BenWu left a comment


For testing, I think the simplest thing to do would be to mock the Spark load() that gets data from BigQuery, replacing it with a dataframe containing a few sample pings. Then just run that through the rest of the job and verify the output. That might be feasible, but I'm not sure whether Spark complicates it.
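A minimal sketch of that mocking approach, using a MagicMock in place of the SparkSession. The `load_hangs` wrapper and the `format("bigquery").option(...).load()` chain are assumptions about how the job reads from BigQuery, and a real test would use a small Spark dataframe rather than a plain list:

```python
from unittest import mock

def load_hangs(spark, table="firefox_desktop_stable.hang_report_v1"):
    """Illustrative stand-in for the job's BigQuery read (assumed shape)."""
    return spark.read.format("bigquery").option("table", table).load()

# In a test, a MagicMock SparkSession lets us swap in sample pings
# without touching BigQuery or even starting Spark.
sample_pings = [{"client_info": {"client_id": "abc"}, "metrics": {}}]
spark = mock.MagicMock()
(spark.read.format.return_value
      .option.return_value
      .load.return_value) = sample_pings

assert load_hangs(spark) == sample_pings
```

This only verifies the read path; feeding the mocked rows through the rest of the job, as suggested above, is where Spark's dataframe types may complicate things.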

Another note: this job currently uses Spark 2.4.8 via Dataproc image 1.5, which is being discontinued in August this year, so it will need to be updated soon. This is out of scope for this PR, but I recommend doing a bit of research to get an idea of what would be required. I'm bringing this up because I'm not sure we (on the data platform side) will be able to get to it before then. The upgrade itself might be an easy AI job, but the testing and validation is something I don't want to commit to right now.

@BenWu
Contributor

BenWu commented May 6, 2026

GitHub is having issues right now so I can't comment in the code

Can you make the billing project an input arg for the job? We'll still want to run in certain projects for the scheduled production jobs.

Add a --billing-project CLI option and pass it through the job config to the BigQuery connector.

Default to mozdata so local runs keep the current behavior, while scheduled jobs can override the billing project.
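A hedged sketch of what that option could look like with argparse (the actual job may use a different CLI library, and the connector option name in the comment is an assumption):

```python
import argparse

def build_parser():
    """Sketch of the job's CLI with a billing-project option; the real
    job's parser and argument set may differ."""
    parser = argparse.ArgumentParser(description="BHR collection job")
    parser.add_argument("--date", required=True)
    parser.add_argument(
        "--billing-project",
        default="mozdata",
        help="GCP project billed for BigQuery reads; scheduled "
             "production jobs can override this.",
    )
    return parser

args = build_parser().parse_args(["--date", "2025-04-25"])
# The value would then be threaded into the BigQuery read config,
# e.g. as the spark-bigquery "parentProject" option (assumption).
assert args.billing_project == "mozdata"
```

Defaulting to mozdata keeps local runs working unchanged while letting the scheduler pass an explicit project.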
@skylarkning
Author

skylarkning commented May 6, 2026

> GitHub is having issues right now so I can't comment in the code
>
> Can you make the billing project an input arg for the job? We'll still want to run in certain projects for the scheduled production jobs.

Hi @BenWu, thanks for the feedback and review! I added a --billing-project CLI option and passed it through the job config to the BigQuery connector. The default value is mozdata, and scheduled production jobs can override it with a different project.

Next I will look into the tests. Thank you!
