
#409 Migrate BHR collection to Glean data #413

Open

skylarkning wants to merge 4 commits into mozilla:main from skylarkning:glean-bhr-migration

Conversation

@skylarkning

Summary

This PR addresses Issue #409.

It migrates bhr_collection.py to read BHR hang report data from the Glean table:

moz-fx-data-shared-prod.firefox_desktop_stable.hang_report_v1

instead of the legacy telemetry table:

moz-fx-data-shared-prod.telemetry_stable.bhr_v4.

Changes

  • Set mozdata as the BigQuery billing project
  • Read client_info, ping_info, and metrics from the Glean hang report table
  • Map Glean fields into the existing processing shape
  • Parse Glean hang report/module object metrics before processing
  • Support Glean stack frames shaped as {frame, module}
  • Remove the legacy payload/time_since_last_ping normalization path
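The mapping and parsing steps listed above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the metric names (`hang_report`, `hang_modules`), the column layout, and the `[frame, module_index]` legacy frame shape are all assumptions about the schemas involved.

```python
import json

def glean_row_to_legacy(row):
    """Map a Glean hang_report_v1 row into the shape the legacy
    processing code expects. Field names here are assumptions,
    not the real schema."""
    client_info = row["client_info"]
    metrics = row["metrics"]

    # Glean object metrics arrive as JSON strings, so parse them first.
    hangs = json.loads(metrics["hang_report"])
    modules = json.loads(metrics["hang_modules"])

    # Glean stack frames are shaped {frame, module}; assume the legacy
    # code expects [frame, module_index] pairs instead.
    for hang in hangs:
        hang["stack"] = [[f["frame"], f["module"]] for f in hang["stack"]]

    return {
        "client_id": client_info["client_id"],
        "os": client_info["os"],
        "modules": modules,
        "hangs": hangs,
    }
```

The key detail is the parse-before-process step: object metrics in stable tables are JSON-encoded, so they must be decoded before the existing hang-processing logic can consume them.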

Local Validation

Ran locally with:

JAVA_HOME=$(/usr/libexec/java_home -v 17) python3 ./mozetl/bhr_collection/bhr_collection.py \
  --date 2025-04-25 \
  --bq-connector-jar=spark-bigquery-latest_2.12.jar \
  --sample-size 0.0002

Results and Comparison

The job completed successfully and wrote:

  • output/hangs_main_20250421.json
  • output/hangs_main_current.json

The output files were compared against the legacy output:

  • Top-level keys match
  • Thread keys match
  • sampleTable keys match
  • Date keys match

The output is non-empty for the same thread/date combinations.

Note: Exact numeric similarity is no longer expected, since the data source has changed to Glean and the new output no longer normalizes by usage hours.
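A structural comparison like the one described above can be done with a small helper that recursively diffs the key sets of the two parsed JSON outputs. This is a sketch for illustration, not code from the PR:

```python
import json

def compare_keys(legacy, new, path="root"):
    """Recursively compare dict keys of two parsed JSON outputs.
    Returns a list of (path, differing_keys) tuples; an empty list
    means the structures match."""
    diffs = []
    if set(legacy) != set(new):
        diffs.append((path, set(legacy) ^ set(new)))
    for key in set(legacy) & set(new):
        if isinstance(legacy[key], dict) and isinstance(new[key], dict):
            diffs.extend(compare_keys(legacy[key], new[key], f"{path}.{key}"))
    return diffs

# Usage against the two output files, e.g.:
# with open("output/hangs_main_current.json") as f_new, \
#      open("legacy/hangs_main_current.json") as f_old:
#     print(compare_keys(json.load(f_old), json.load(f_new)))
```

Comparing key sets rather than values matches the validation goal here: the structure should be identical even though the numbers are expected to differ.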

@skylarkning force-pushed the glean-bhr-migration branch 2 times, most recently from 5b89902 to 76fb64c on May 5, 2026 at 15:11
Update bhr_collection.py to read from firefox_desktop_stable.hang_report_v1 instead of telemetry_stable.bhr_v4.

Map the Glean client_info and metrics fields into the existing processing shape, and parse the Glean hang report and module object metrics before processing.

Refs mozilla#409.
@skylarkning force-pushed the glean-bhr-migration branch from 76fb64c to 914dde2 on May 5, 2026 at 15:18
@codecov-commenter

codecov-commenter commented May 5, 2026

Codecov Report

❌ Patch coverage is 0% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.58%. Comparing base (6af5435) to head (174d2d1).

Files with missing lines Patch % Lines
mozetl/bhr_collection/bhr_collection.py 0.00% 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #413      +/-   ##
==========================================
- Coverage   32.63%   32.58%   -0.06%     
==========================================
  Files          36       36              
  Lines        3858     3864       +6     
==========================================
  Hits         1259     1259              
- Misses       2599     2605       +6     


@squarewave
Contributor

@BenWu would you be able to take a look at reviewing this? Sky is a student worker who just started, and he's going to be making some improvements to BHR, so we had him take a look at migrating the bhr_collection job to use Glean data as a first task. Regarding the coverage, I'd like to get some tests in, but from what we have so far it looks like it might be a little involved to get a basic test of bhr_collection up and running. Does that sound right? In any case, do you have any thoughts on the best route forward, or salient examples, for getting some BHR tests up and running either in this PR or in a follow-up?

Contributor

@BenWu BenWu left a comment


For testing, I think the simplest thing to do would be to mock the Spark load() that gets data from BigQuery, replacing it with a dataframe containing a few sample pings. Then just run that through the rest of the job and verify the output. That might be feasible, but I'm not sure whether Spark complicates it.
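A minimal sketch of that mocking approach, using a MagicMock in place of the SparkSession. The `load_hangs` wrapper and the `format("bigquery").option(...).load()` chain are assumptions about how the job reads from BigQuery, and a real test would use a small Spark dataframe rather than a plain list:

```python
from unittest import mock

def load_hangs(spark, table="firefox_desktop_stable.hang_report_v1"):
    """Illustrative stand-in for the job's BigQuery read (assumed shape)."""
    return spark.read.format("bigquery").option("table", table).load()

# In a test, a MagicMock SparkSession lets us swap in sample pings
# without touching BigQuery or even starting Spark.
sample_pings = [{"client_info": {"client_id": "abc"}, "metrics": {}}]
spark = mock.MagicMock()
(spark.read.format.return_value
      .option.return_value
      .load.return_value) = sample_pings

assert load_hangs(spark) == sample_pings
```

This only verifies the read path; feeding the mocked rows through the rest of the job, as suggested above, is where Spark's dataframe types may complicate things.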

Another note: this job currently uses Spark 2.4.8 via Dataproc image 1.5, which is being discontinued in August this year, so it will need to be updated soon. This is out of scope for this PR, but I recommend doing a bit of research to get an idea of what would be required. I'm bringing this up because I'm not sure we (on the data platform side) will be able to get to it before then. The upgrade itself might be an easy AI job, but the testing and validation is something I don't want to commit to right now.

@BenWu
Contributor

BenWu commented May 6, 2026

GitHub is having issues right now so I can't comment in the code

Can you make the billing project an input arg for the job? We'll still want to run in certain projects for the scheduled production jobs.

Add a --billing-project CLI option and pass it through the job config to the BigQuery connector.

Default to mozdata so local runs keep the current behavior, while scheduled jobs can override the billing project.
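A hedged sketch of what that option could look like with argparse (the actual job may use a different CLI library, and the connector option name in the comment is an assumption):

```python
import argparse

def build_parser():
    """Sketch of the job's CLI with a billing-project option; the real
    job's parser and argument set may differ."""
    parser = argparse.ArgumentParser(description="BHR collection job")
    parser.add_argument("--date", required=True)
    parser.add_argument(
        "--billing-project",
        default="mozdata",
        help="GCP project billed for BigQuery reads; scheduled "
             "production jobs can override this.",
    )
    return parser

args = build_parser().parse_args(["--date", "2025-04-25"])
# The value would then be threaded into the BigQuery read config,
# e.g. as the spark-bigquery "parentProject" option (assumption).
assert args.billing_project == "mozdata"
```

Defaulting to mozdata keeps local runs working unchanged while letting the scheduler pass an explicit project.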
@skylarkning
Author

skylarkning commented May 6, 2026

> GitHub is having issues right now so I can't comment in the code
>
> Can you make the billing project an input arg for the job? We'll still want to run in certain projects for the scheduled production jobs.

Hi @BenWu, thanks for the feedback and review! I added a --billing-project CLI option and passed it through the job config to the BigQuery connector. The default value is mozdata, and scheduled production jobs can override it with a different project.

Next I will look into the tests. Thank you!
