Skip to content

Handle gapless MP3 Xing/Info durations#3198

Open
Tolriq wants to merge 1 commit into
androidx:mainfrom
Tolriq:mp3_gapless
Open

Handle gapless MP3 Xing/Info durations#3198
Tolriq wants to merge 1 commit into
androidx:mainfrom
Tolriq:mp3_gapless

Conversation

@Tolriq
Copy link
Copy Markdown
Contributor

@Tolriq Tolriq commented May 5, 2026

LAME Xing/Info headers can include encoder delay and padding. Mp3Extractor already propagates those values into Format so decoded audio is rendered gaplessly, but the SeekMap duration was still computed from the untrimmed MPEG frame count. For CBR Info files this can expose a source duration that is longer than the samples that will actually be played, which can break gapless transitions.

Split the Xing/Info duration into two concepts: raw duration from the frame count, used for bitrate and byte-position calculations, and gapless duration with encoder delay and padding removed, exposed from SeekMap.

Keep CBR average-bitrate derivation on the raw duration to avoid changing byte-position seeking. When a CBR seeker is given an explicit gapless duration, map seeks at the advertised end of the stream to the raw data end so the SeekMap endpoint contract is preserved.

Add focused tests for raw vs gapless duration calculation, CBR seek endpoint handling, and Info-frame duration/bitrate behavior, and update affected extractor dumps.

Fixes #3183.

@Tolriq Tolriq mentioned this pull request May 5, 2026
1 task
@icbaker icbaker self-assigned this May 6, 2026
@icbaker icbaker self-requested a review May 6, 2026 15:56
@icbaker
Copy link
Copy Markdown
Collaborator

icbaker commented May 6, 2026

GitHub isn't letting me add review comments for some reason, so here's a quick bit of initial feedback


The part of this PR related to the LAME_TO_DECODED_PCM_TRIM_OFFSET_SAMPLES constant & its usages seem unrelated to the duration vs raw duration split. Please can you send this as a separate PR?


In the bear-vbr-xing-header-no-toc.mp3.cbr-seeking-always.0.dump dump file, I'd expect getPosition(DURATION) to return timeUs=DURATION (concretely: getPosition(2783979) = [[timeUs=2783979, position=38396]]). But it seems timeUs is still returning the 'raw' duration (same for many other dump files).

@Tolriq
Copy link
Copy Markdown
Contributor Author

Tolriq commented May 6, 2026

Yeah sorry I did not fully verify that part, I completely underestimated the complete mess that is mp3 seeking and gapless before going into this.

I've moved the lame part in another PR, but used AI to rebase it on main to not depend on this one. It's probably best to not really look at the second one deeply until this is one is done or not.

* Returns the raw duration before subtracting encoder delay/padding, or {@link C#TIME_UNSET} if
* not known.
*/
default long getRawDurationUs() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering if we can avoid introducing this method (and therefore avoid having to distinguish between these concepts of 'duration' and 'raw duration').

It looks like it's used in two ways:

  1. To calculate a bitrate in Mp3Extractor as part of the 'enhanced CBR' logic
  2. In the implementation of ConstantBitrateSeeker.getSeekPoints to ensure the final byte position is reported correctly.

For (1), I see that Seeker already has a getAverageBitrate() method, which should fit for the use-case (based on its documentation of Returns the average bitrate (usually derived from the duration and length of the file)) - except it seems that XingSeeker and VbriSeeker implement this incorrectly by directly returning the bitrate of the MPEG frame that contains the seek metadata.

For (2) I think you can use getDataEndPosition() instead?

If this sounds right, would you be up for:

  1. Sending a PR to fix the bug in the getAverageBitrate() impls in XingSeeker and VbriSeeker
  2. Rebasing this PR on top once that's merged, and then removing getRawDurationUs and using getAverageBitrate() and getDataEndPosition() instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use Codex to split things and move things yes if that's not a problem for you. Way less time consuming.

Also Codex: Caveat: getDataEndPosition() is documented as immediately after audio data, so using it directly may be one byte/frame past the desired seek point. The implementation should likely use the last valid seek position derived from data end, for example dataEndPosition - normalizedFrameSize, while returning timeUs = durationUs. we start to be a little outside of my confort zone so I don't really have value on the decisions, just did tests and debug to try to reach solutions. I now mostly rely on your expertise.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that's a good point about getDataEndPosition() - but ConstantBitrateSeeker already has a frameSize field so this calculation should be easy.

I think the time of this seek point should also be adjusted to be durationUs - frameDurationUs - since if you seek there, it will play the last frame, which has non-zero duration.

@Tolriq
Copy link
Copy Markdown
Contributor Author

Tolriq commented May 29, 2026

Rebased after the merge of the other one.

this.dataStartPosition = dataStartPosition;
this.dataEndPosition = dataEndPosition;
this.averageBitrate = computeAverageBitrate(dataEndPosition - dataStartPosition, durationUs);
this.averageBitrate =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this diff related to the gapless fix or should it be in a separate PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This come up during the rebase because of 48836cb that was changed in the middle.

This is Codex answer when I asked confirmation about that change (Due to previous tests no more passing after that commit) :

• You’re right to question it. The failures were not from the new tests; they were existing Mp3ExtractorTest dump comparisons for FLAG_ENABLE_INDEX_SEEKING.

  What happened:

  - Our gapless change makes XingSeeker.getDurationUs() return the gapless duration.
  - We separately made Xing/Info average bitrate use encoded duration, so normal Xing paths keep the old bitrate.
  - But with FLAG_ENABLE_INDEX_SEEKING, Mp3Extractor wraps the unseekable XingSeeker in an IndexSeeker.
  - IndexSeeker was recomputing averageBitrate from its constructor durationUs, which is now the gapless duration.
  - That changed dump output from 108719 to 109656.

  So my IndexSeeker change was: when wrapping an existing seeker, pass through the existing average bitrate instead of recomputing it from the gapless duration. The added IndexSeekerTest only documents that behavior.

  I would not fix this by updating dumps. Updating dumps would accept a real behavior change in Format.averageBitrate for index-seeking MP3s, which contradicts the goal of “gapless durations while preserving average bitrate calculations.”

  That said, if we want the upstream diff to avoid touching IndexSeeker, a narrower alternative is possible: keep IndexSeeker unchanged and have Mp3Extractor remember the pre-wrap average bitrate for Format.Builder. But that leaves IndexSeeker.getAverageBitrate() internally
  inconsistent, so I think the current small IndexSeeker constructor overload is the cleaner fix.

It's answer convinced me it was better than updating the average, but it's maybe wrong.

} else {
resultSeeker =
getConstantBitrateSeeker(
input, (flags & FLAG_ENABLE_CONSTANT_BITRATE_SEEKING_ALWAYS) != 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unfortunate to add another branch to this logic (which is already very branch-y), especially since this is a duplicate of L564.

Please can you refactor this to avoid this duplication?

RoundingMode.HALF_UP));
// See also https://github.com/androidx/media/issues/1376.
int averageBitrate = computeAverageBitrate(audioLength, infoFrame.computeEncodedDurationUs());
if (averageBitrate == C.RATE_UNSET_INT) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can this happen? durationUs == C.TIME_UNSET already results in return null above and audioLength is assigned to a non-unset value on either L669 or L672 above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an habit of defensive coding as the function can return it, did not check all the possible cases.
Can remove if wanted and safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants