Handle gapless MP3 Xing/Info durations#3198
Conversation
|
GitHub isn't letting me add review comments for some reason, so here's a quick bit of initial feedback The part of this PR related to the In the |
|
Yeah sorry I did not fully verify that part, I completely underestimated the complete mess that is mp3 seeking and gapless before going into this. I've moved the lame part in another PR, but used AI to rebase it on main to not depend on this one. It's probably best to not really look at the second one deeply until this is one is done or not. |
| * Returns the raw duration before subtracting encoder delay/padding, or {@link C#TIME_UNSET} if | ||
| * not known. | ||
| */ | ||
| default long getRawDurationUs() { |
There was a problem hiding this comment.
i'm wondering if we can avoid introducing this method (and therefore avoid having to distinguish between these concepts of 'duration' and 'raw duration').
It looks like it's used in two ways:
- To calculate a bitrate in
Mp3Extractoras part of the 'enhanced CBR' logic - In the implementation of
ConstantBitrateSeeker.getSeekPointsto ensure the final byte position is reported correctly.
For (1), I see that Seeker already has a getAverageBitrate() method, which should fit for the use-case (based on its documentation of Returns the average bitrate (usually derived from the duration and length of the file)) - except it seems that XingSeeker and VbriSeeker implement this incorrectly by directly returning the bitrate of the MPEG frame that contains the seek metadata.
For (2) I think you can use getDataEndPosition() instead?
If this sounds right, would you be up for:
- Sending a PR to fix the bug in the
getAverageBitrate()impls inXingSeekerandVbriSeeker - Rebasing this PR on top once that's merged, and then removing
getRawDurationUsand usinggetAverageBitrate()andgetDataEndPosition()instead.
There was a problem hiding this comment.
I can use Codex to split things and move things yes if that's not a problem for you. Way less time consuming.
Also Codex: Caveat: getDataEndPosition() is documented as immediately after audio data, so using it directly may be one byte/frame past the desired seek point. The implementation should likely use the last valid seek position derived from data end, for example dataEndPosition - normalizedFrameSize, while returning timeUs = durationUs. we start to be a little outside of my confort zone so I don't really have value on the decisions, just did tests and debug to try to reach solutions. I now mostly rely on your expertise.
There was a problem hiding this comment.
Ah yeah, that's a good point about getDataEndPosition() - but ConstantBitrateSeeker already has a frameSize field so this calculation should be easy.
I think the time of this seek point should also be adjusted to be durationUs - frameDurationUs - since if you seek there, it will play the last frame, which has non-zero duration.
|
Rebased after the merge of the other one. |
| this.dataStartPosition = dataStartPosition; | ||
| this.dataEndPosition = dataEndPosition; | ||
| this.averageBitrate = computeAverageBitrate(dataEndPosition - dataStartPosition, durationUs); | ||
| this.averageBitrate = |
There was a problem hiding this comment.
is this diff related to the gapless fix or should it be in a separate PR?
There was a problem hiding this comment.
This come up during the rebase because of 48836cb that was changed in the middle.
This is Codex answer when I asked confirmation about that change (Due to previous tests no more passing after that commit) :
• You’re right to question it. The failures were not from the new tests; they were existing Mp3ExtractorTest dump comparisons for FLAG_ENABLE_INDEX_SEEKING.
What happened:
- Our gapless change makes XingSeeker.getDurationUs() return the gapless duration.
- We separately made Xing/Info average bitrate use encoded duration, so normal Xing paths keep the old bitrate.
- But with FLAG_ENABLE_INDEX_SEEKING, Mp3Extractor wraps the unseekable XingSeeker in an IndexSeeker.
- IndexSeeker was recomputing averageBitrate from its constructor durationUs, which is now the gapless duration.
- That changed dump output from 108719 to 109656.
So my IndexSeeker change was: when wrapping an existing seeker, pass through the existing average bitrate instead of recomputing it from the gapless duration. The added IndexSeekerTest only documents that behavior.
I would not fix this by updating dumps. Updating dumps would accept a real behavior change in Format.averageBitrate for index-seeking MP3s, which contradicts the goal of “gapless durations while preserving average bitrate calculations.”
That said, if we want the upstream diff to avoid touching IndexSeeker, a narrower alternative is possible: keep IndexSeeker unchanged and have Mp3Extractor remember the pre-wrap average bitrate for Format.Builder. But that leaves IndexSeeker.getAverageBitrate() internally
inconsistent, so I think the current small IndexSeeker constructor overload is the cleaner fix.
It's answer convinced me it was better than updating the average, but it's maybe wrong.
| } else { | ||
| resultSeeker = | ||
| getConstantBitrateSeeker( | ||
| input, (flags & FLAG_ENABLE_CONSTANT_BITRATE_SEEKING_ALWAYS) != 0); |
There was a problem hiding this comment.
it is unfortunate to add another branch to this logic (which is already very branch-y), especially since this is a duplicate of L564.
Please can you refactor this to avoid this duplication?
| RoundingMode.HALF_UP)); | ||
| // See also https://github.com/androidx/media/issues/1376. | ||
| int averageBitrate = computeAverageBitrate(audioLength, infoFrame.computeEncodedDurationUs()); | ||
| if (averageBitrate == C.RATE_UNSET_INT) { |
There was a problem hiding this comment.
how can this happen? durationUs == C.TIME_UNSET already results in return null above and audioLength is assigned to a non-unset value on either L669 or L672 above.
There was a problem hiding this comment.
Just an habit of defensive coding as the function can return it, did not check all the possible cases.
Can remove if wanted and safe.
LAME Xing/Info headers can include encoder delay and padding. Mp3Extractor already propagates those values into Format so decoded audio is rendered gaplessly, but the SeekMap duration was still computed from the untrimmed MPEG frame count. For CBR Info files this can expose a source duration that is longer than the samples that will actually be played, which can break gapless transitions.
Split the Xing/Info duration into two concepts: raw duration from the frame count, used for bitrate and byte-position calculations, and gapless duration with encoder delay and padding removed, exposed from SeekMap.
Keep CBR average-bitrate derivation on the raw duration to avoid changing byte-position seeking. When a CBR seeker is given an explicit gapless duration, map seeks at the advertised end of the stream to the raw data end so the SeekMap endpoint contract is preserved.
Add focused tests for raw vs gapless duration calculation, CBR seek endpoint handling, and Info-frame duration/bitrate behavior, and update affected extractor dumps.
Fixes #3183.