feat: s2s client #34

Open
junkin wants to merge 4 commits into main from sdj_s2s_client

Conversation

@junkin (Contributor) commented Jan 12, 2023

Adding a basic speech-to-speech CLI.

@rmittal-github rmittal-github changed the base branch from main to release/2.9.0 January 16, 2023 13:59
@rmittal-github (Contributor) commented:

@PeganovAnton could you please review this? We need this merged ASAP to enable QA to test S2S.

Comment thread riva/client/nmt.py Outdated
Comment on lines +38 to +71
Generates speech recognition responses for fragments of speech audio in :param:`audio_chunks`.
The purpose of the method is to perform speech recognition "online", i.e. on small chunks of
audio as soon as the audio is acquired.

All available audio chunks are sent to the server on the first ``next()`` call.

Args:
    audio_chunks (:obj:`Iterable[bytes]`): an iterable object which contains raw audio fragments
        of speech. For example, such raw audio can be obtained with

        .. code-block:: python

            import wave
            with wave.open(file_name, 'rb') as wav_f:
                raw_audio = wav_f.readframes(n_frames)

    streaming_config (:obj:`riva.client.proto.riva_asr_pb2.StreamingRecognitionConfig`): a config for streaming.
        You may find a description of the config fields in the ``StreamingRecognitionConfig`` message in the
        `common repo
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
        An example of creating a streaming config:

        .. code-block:: python

            from riva.client import RecognitionConfig, StreamingRecognitionConfig
            config = RecognitionConfig(enable_automatic_punctuation=True)
            streaming_config = StreamingRecognitionConfig(config=config, interim_results=True)

Yields:
    :obj:`riva.client.proto.riva_asr_pb2.StreamingRecognizeResponse`: responses for audio chunks in
        :param:`audio_chunks`. You may find a description of the response fields in the declaration of the
        ``StreamingRecognizeResponse`` message `here
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
Contributor:

The docstring needs to be updated.

Contributor:

resolved in #43
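
For context, here is roughly how the generator is driven; a minimal sketch assembled from the s2s_mic.py call site later in this PR (`handle_response` is a hypothetical stand-in for a consumer such as `play_responses()`):

```python
# Minimal sketch, assuming nmt_service and s2s_config are built as in
# scripts/nmt/s2s_mic.py. Each yielded response carries a chunk of
# synthesized speech; exact field names depend on the S2S proto.
responses = nmt_service.streaming_s2s_response_generator(
    audio_chunks=audio_chunk_iterator,
    streaming_config=s2s_config,
)
for response in responses:
    handle_response(response)  # hypothetical consumer, e.g. play_responses()
```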

Comment thread scripts/nmt/s2s_mic.py
    nchannels = 1
    if args.list_input_devices:
        riva.client.audio_io.list_input_devices()
        return
Contributor:

Suggested change
-        return
+        return
+    if args.list_output_devices:
+        riva.client.audio_io.list_output_devices()
+        return

Comment thread scripts/nmt/s2s_mic.py
        sound_stream = riva.client.audio_io.SoundCallBack(
            args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
        )
        print(sound_stream)
Contributor:

Why do we need this print?

Comment thread scripts/nmt/s2s_mic.py
    if args.output_device is not None or args.play_audio:
        print("playing audio")
        sound_stream = riva.client.audio_io.SoundCallBack(
            args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
Contributor:

Maybe we should make framerate a parameter of the script, like --sample-rate-hz in the script tts/talk.py?
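
A minimal sketch of what that could look like, with the flag name borrowed from tts/talk.py and the default matching the value currently hardcoded here:

```python
# Hypothetical: expose the playback frame rate as a CLI flag.
parser.add_argument(
    "--sample-rate-hz",
    type=int,
    default=44100,
    help="Number of audio frames per second in output audio.",
)

# ...then thread it through to the sound stream:
sound_stream = riva.client.audio_io.SoundCallBack(
    args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=args.sample_rate_hz
)
```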

Comment thread scripts/nmt/s2s_mic.py
Comment on lines +68 to +70
    sampwidth = 2
    nchannels = 1
Contributor:

sampwidth and nchannels are set in two places: here and in the play_responses() function. Could you make them global variables?
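
Something like this, as a sketch (the names are my suggestion):

```python
# Module-level constants so main() and play_responses() cannot drift apart.
SAMPWIDTH = 2   # bytes per sample (16-bit PCM)
NCHANNELS = 1   # mono
```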

Comment thread scripts/nmt/s2s_mic.py
"then the default output audio device will be used.",
)

parser = add_asr_config_argparse_parameters(parser, profanity_filter=True)
Contributor:

You'll probably need to set max_alternatives=False and word_time_offsets=False, because these parameters are pointless for this script. Do you think we also need to add a speaker_diarization=False flag?
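
Assuming add_asr_config_argparse_parameters() accepts these as keyword toggles (worth checking its signature in argparse_utils.py), the call might become:

```python
# The keyword arguments below are an assumption based on this comment:
parser = add_asr_config_argparse_parameters(
    parser,
    profanity_filter=True,
    max_alternatives=False,
    word_time_offsets=False,
    speaker_diarization=False,  # only if we decide to add this flag
)
```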

Comment thread scripts/nmt/s2s_mic.py
parser.add_argument("--output-device", type=int, help="Output device to use.")
parser.add_argument("--target-language-code", default="en-US", help="Language code of the output language.")
parser.add_argument(
"--play-audio",
Contributor:

If --play-audio is not set, the script doesn't produce any output. We should probably add an --output parameter, as in tts/talk.py, so that the script can still produce some output.
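
A sketch of the idea, modeled on tts/talk.py; the helper and flag below are hypothetical:

```python
import wave
from pathlib import Path

parser.add_argument("--output", type=Path, help="Write synthesized audio to this .wav file.")

# Hypothetical helper: open a WAV file matching the stream parameters so that
# received audio chunks can be appended with writeframesraw().
def open_output_wav(path: Path, nchannels: int, sampwidth: int, framerate: int) -> wave.Wave_write:
    out_f = wave.open(str(path), 'wb')
    out_f.setnchannels(nchannels)
    out_f.setsampwidth(sampwidth)
    out_f.setframerate(framerate)
    return out_f
```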

Comment thread scripts/nmt/s2s_mic.py
Comment on lines +107 to +116
    play_responses(responses=nmt_service.streaming_s2s_response_generator(
        audio_chunks=audio_chunk_iterator,
        streaming_config=s2s_config), sound_stream=sound_stream)
Contributor:

Suggested change
-    play_responses(responses=nmt_service.streaming_s2s_response_generator(
-        audio_chunks=audio_chunk_iterator,
-        streaming_config=s2s_config), sound_stream=sound_stream)
+    play_responses(
+        responses=nmt_service.streaming_s2s_response_generator(
+            audio_chunks=audio_chunk_iterator,
+            streaming_config=s2s_config,
+        ),
+        sound_stream=sound_stream
+    )

Comment thread scripts/nmt/s2s_mic.py
            interim_results=True,
        ),
        translation_config=riva.client.TranslationConfig(
            target_language_code=args.target_language_code,
Contributor:

There should be a source_language_code here and, probably, a model_name, as in the config.
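
Roughly like this, assuming TranslationConfig exposes these fields as the proto's config does, and that matching CLI flags get added:

```python
# Hypothetical sketch: field and flag names are assumptions based on the proto.
translation_config = riva.client.TranslationConfig(
    source_language_code=args.source_language_code,
    target_language_code=args.target_language_code,
    model_name=args.model_name,
)
```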

Comment thread scripts/nmt/s2s_mic.py
    first = True  # first tts output chunk received
    auth = riva.client.Auth(args.ssl_cert, args.use_ssl, args.server)
    nmt_service = riva.client.NeuralMachineTranslationClient(auth)
    s2s_config = riva.client.StreamingTranslateSpeechToSpeechConfig(
Contributor:

Do we need a tts_config, as in the proto? If so, we could add an add_tts_config_argparse_parameters() function to argparse_utils.py and refactor tts/talk.py to use it.
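
A sketch of the proposed helper; the option names are assumptions, loosely modeled on tts/talk.py:

```python
import argparse

# Hypothetical helper for argparse_utils.py.
def add_tts_config_argparse_parameters(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument("--voice", help="Voice name to use for speech synthesis.")
    parser.add_argument(
        "--sample-rate-hz", type=int, default=44100, help="Sample rate of synthesized audio."
    )
    return parser
```

tts/talk.py could then call this instead of declaring the same options inline.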

@rmittal-github rmittal-github changed the base branch from release/2.9.0 to main January 30, 2023 04:56
@rmittal-github rmittal-github changed the base branch from main to release/2.11.0 April 19, 2023 12:15
Comment thread scripts/nmt/s2s_mic.py Outdated
@rmittal-github rmittal-github changed the base branch from release/2.11.0 to main May 19, 2023 11:05