Add audio translation task type and provider #335

Merged: julien-nc merged 9 commits into main from enh/noid/audio-translation on Jun 16, 2026
Conversation

@julien-nc julien-nc commented Feb 4, 2026

Member
  • New audio2audio:translate task type (should be created in server?)
  • Audio translation provider
  • Factorize translation logic in a service
  • Use correct user language for IL10N text translations happening in the task
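
Reading the description together with the review comments in this thread, the provider appears to chain three steps: transcribe the input audio, translate the transcript, then synthesize speech from the translation. A minimal sketch of that flow, under assumptions: the three interfaces and the `AudioTranslationPipeline` class are hypothetical names made up for illustration, not the classes added in this PR, and watermarking is left out.

```php
<?php
declare(strict_types=1);

// Hypothetical interfaces, for illustration only (not the app's real services).
interface SpeechToText {
    public function transcribe(string $audio): string;
}
interface TextTranslator {
    public function translate(string $text, string $from, string $to): string;
}
interface TextToSpeech {
    public function synthesize(string $text): string;
}

final class AudioTranslationPipeline {
    public function __construct(
        private SpeechToText $stt,
        private TextTranslator $translator,
        private TextToSpeech $tts,
    ) {
    }

    /**
     * @return array{transcript: string, translation: string, audio: string}
     */
    public function run(string $inputAudio, string $from, string $to): array {
        // Step 1: speech to text.
        $transcript = $this->stt->transcribe($inputAudio);
        if ($transcript === '') {
            throw new RuntimeException('Transcription failed: no text returned');
        }
        // Step 2: text translation.
        $translation = $this->translator->translate($transcript, $from, $to);
        // Step 3: text to speech; fail loudly on an empty result.
        $audio = $this->tts->synthesize($translation);
        if ($audio === '') {
            throw new RuntimeException('Text to speech generation failed: no speech returned');
        }
        return [
            'transcript' => $transcript,
            'translation' => $translation,
            'audio' => $audio,
        ];
    }
}
```

Keeping each step behind its own interface is only one possible design; it makes the chain easy to exercise with stubs, which may be relevant to the later request for a provider test.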

@julien-nc julien-nc added the "enhancement" and "3. to review" labels Feb 4, 2026
@julien-nc julien-nc changed the title from "Add audio translation task type and provider…" to "Add audio translation task type and provider" Feb 4, 2026
@julien-nc julien-nc force-pushed the enh/noid/audio-translation branch 4 times, most recently from 0c8d7be to 2e45e17, February 4, 2026 16:40
Comment thread lib/TaskProcessing/AudioToAudioTranslateTaskType.php
Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php
$this->logger->warning('Text to speech generation failed: no speech returned');
throw new ProcessingException('Text to speech generation failed: no speech returned');
}
$translatedAudio = $includeWatermark ? $this->watermarkingService->markAudio($apiResponse['body']) : $apiResponse['body'];

Contributor

Maybe it would be better to watermark the transcript of the input audio, so that the translated text and the translated audio both carry the watermark in the target language.

Member

The audio should always be marked directly as there is specific metadata that is added to the audio.

Comment on lines +157 to +166
if ($includeWatermark) {
    if ($userId !== null) {
        $user = $this->userManager->getExistingUser($userId);
        $lang = $this->l10nFactory->getUserLanguage($user);
        $l = $this->l10nFactory->get(Application::APP_ID, $lang);
        $ttsPrompt .= "\n\n" . $l->t('This was generated using Artificial Intelligence.');
    } else {
        $ttsPrompt .= "\n\n" . $this->l->t('This was generated using Artificial Intelligence.');
    }
}

Contributor

This can also work, but it would add the text/audio in the user's language, which may or may not be the target language.

Member Author

The watermark is now appended to the input transcript in the language of the user. So it ends up being in the audio and text outputs in the target language.
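
The idea in this reply can be sketched as follows: the notice is appended to the transcript before translation, so it is translated along with the rest of the text. This is only an illustration under assumptions: `translateWithWatermark` and the `$translate` callable are hypothetical names, not the PR's actual `TranslateService` API.

```php
<?php
declare(strict_types=1);

/**
 * Append the AI notice to the transcript, then translate the whole text.
 *
 * $notice is the notice already localized in the user's language.
 * $translate is a hypothetical callable: (string $text, string $targetLang): string.
 */
function translateWithWatermark(
    string $transcript,
    string $notice,
    string $targetLang,
    callable $translate,
    bool $includeWatermark
): string {
    // The notice becomes part of the text to translate, so the translated
    // text (and any speech synthesized from it) carries it in the target language.
    $text = $includeWatermark ? $transcript . "\n\n" . $notice : $transcript;
    return $translate($text, $targetLang);
}
```

Because the translated text is what gets passed to text-to-speech, the spoken notice follows the target language as well. Marking the resulting audio file with metadata is a separate step that this sketch does not cover.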

Comment thread lib/AppInfo/Application.php
@julien-nc julien-nc marked this pull request as draft February 24, 2026 09:41
@julien-nc julien-nc force-pushed the enh/noid/audio-translation branch 3 times, most recently from 67bfb51 to 1849a43, June 15, 2026 09:41
@julien-nc julien-nc requested a review from kyteinsky June 15, 2026 10:19
@julien-nc julien-nc marked this pull request as ready for review June 15, 2026 10:19
@julien-nc

Member Author

I had to update the composer dependencies to fix CI issues.

Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php Outdated
Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php
Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php Outdated

@lukasdotcom lukasdotcom left a comment

Member

Didn't look that deeply yet, but found a few things. Also it might be worth it to use the new methods in TranslateService for the old task processing provider too.

Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php Outdated
@julien-nc julien-nc force-pushed the enh/noid/audio-translation branch from 9181c87 to d173c96, June 15, 2026 14:28
@marcelklehr

Member

A test for the AudioToAudioTranslate provider would be nice, but I'm fine if we go without :)

use OCP\TaskProcessing\ShapeDescriptor;

class AudioToAudioTranslateTaskType implements ITaskType {
    public const ID = Application::APP_ID . ':audio2audio:translate';

Member

Any reason to have this not in server?

Member

(Except that server PRs are annoying :D )

Member Author

I'll add it later and adjust here.

Comment thread lib/TaskProcessing/AudioToAudioTranslateProvider.php Outdated
@julien-nc julien-nc requested review from lukasdotcom June 15, 2026 14:59
…, factorize translation logic in the translate service

Signed-off-by: Julien Veyssier <julien-nc@posteo.net>
…rmarked in the target language

Signed-off-by: Julien Veyssier <julien-nc@posteo.net>
@julien-nc julien-nc force-pushed the enh/noid/audio-translation branch from 3d348ab to 17df89a, June 16, 2026 10:47
  • use TranslateService::getStaticLanguages() to get the input shape enum values
  • watermark the audio directly
  • keep the watermark suffix in the language of the user who scheduled
  • register audio translation provider if tts is enabled
  • fix reporting translation output
  • fix cache being created inside a loop
  • cleanup

Signed-off-by: Julien Veyssier <julien-nc@posteo.net>
@julien-nc julien-nc force-pushed the enh/noid/audio-translation branch from a631631 to d01c43f, June 16, 2026 11:01
@julien-nc julien-nc merged commit 28f8da9 into main Jun 16, 2026
26 checks passed
@julien-nc julien-nc deleted the enh/noid/audio-translation branch June 16, 2026 11:38
Labels

3. to review enhancement New feature or request

4 participants