feat: s2s client #34

Open
junkin wants to merge 4 commits into main from sdj_s2s_client

Conversation

@junkin (Contributor) commented Jan 12, 2023

Adding a basic speech-to-speech CLI.

@rmittal-github rmittal-github changed the base branch from main to release/2.9.0 January 16, 2023 13:59
@rmittal-github (Contributor) commented:

@PeganovAnton could you please review this? We need this merged ASAP to enable QA to test S2S.

Comment thread riva/client/nmt.py Outdated
Comment on lines +38 to +71
Generates speech recognition responses for fragments of speech audio in :param:`audio_chunks`.
The purpose of the method is to perform speech recognition "online", i.e. on small chunks of
audio as soon as the audio is acquired.

All available audio chunks are sent to the server on the first ``next()`` call.

Args:
    audio_chunks (:obj:`Iterable[bytes]`): an iterable object which contains raw audio fragments
        of speech. For example, such raw audio can be obtained with

        .. code-block:: python

            import wave
            with wave.open(file_name, 'rb') as wav_f:
                raw_audio = wav_f.readframes(n_frames)

    streaming_config (:obj:`riva.client.proto.riva_asr_pb2.StreamingRecognitionConfig`): a config for streaming.
        You may find a description of the config fields in the ``StreamingRecognitionConfig`` message in the
        `common repo
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
        An example of creating a streaming config:

        .. code-block:: python

            from riva.client import RecognitionConfig, StreamingRecognitionConfig
            config = RecognitionConfig(enable_automatic_punctuation=True)
            streaming_config = StreamingRecognitionConfig(config=config, interim_results=True)

Yields:
    :obj:`riva.client.proto.riva_asr_pb2.StreamingRecognizeResponse`: responses for audio chunks in
        :param:`audio_chunks`. You may find a description of the response fields in the declaration of the
        ``StreamingRecognizeResponse`` message `here
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
Contributor:

The docstring needs to be updated.

Contributor:

resolved in #43
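
For context, here is roughly how the generator is driven; a minimal sketch assembled from the s2s_mic.py call site later in this PR (`handle_response` is a hypothetical stand-in for a consumer such as `play_responses()`):

```python
# Minimal sketch, assuming nmt_service and s2s_config are built as in
# scripts/nmt/s2s_mic.py. Each yielded response carries a chunk of
# synthesized speech; exact field names depend on the S2S proto.
responses = nmt_service.streaming_s2s_response_generator(
    audio_chunks=audio_chunk_iterator,
    streaming_config=s2s_config,
)
for response in responses:
    handle_response(response)  # hypothetical consumer, e.g. play_responses()
```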

Comment thread scripts/nmt/s2s_mic.py
    nchannels = 1
    if args.list_input_devices:
        riva.client.audio_io.list_input_devices()
        return
Contributor:

Suggested change
-        return
+        return
+    if args.list_output_devices:
+        riva.client.audio_io.list_output_devices()
+        return

Comment thread scripts/nmt/s2s_mic.py
        sound_stream = riva.client.audio_io.SoundCallBack(
            args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
        )
        print(sound_stream)
Contributor:

Why do we need this print?

Comment thread scripts/nmt/s2s_mic.py
    if args.output_device is not None or args.play_audio:
        print("playing audio")
        sound_stream = riva.client.audio_io.SoundCallBack(
            args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
Contributor:

Maybe we should make framerate a parameter of the script, like --sample-rate-hz in the script tts/talk.py?
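
A minimal sketch of what that could look like, with the flag name borrowed from tts/talk.py and the default matching the value currently hardcoded here:

```python
# Hypothetical: expose the playback frame rate as a CLI flag.
parser.add_argument(
    "--sample-rate-hz",
    type=int,
    default=44100,
    help="Number of audio frames per second in output audio.",
)

# ...then thread it through to the sound stream:
sound_stream = riva.client.audio_io.SoundCallBack(
    args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=args.sample_rate_hz
)
```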

Comment thread scripts/nmt/s2s_mic.py
Comment on lines +68 to +70
    sampwidth = 2
    nchannels = 1
Contributor:

sampwidth and nchannels are set in two places: here and in the play_responses() function. Could you make them global variables?
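
Something like this, as a sketch (the names are my suggestion):

```python
# Module-level constants so main() and play_responses() cannot drift apart.
SAMPWIDTH = 2   # bytes per sample (16-bit PCM)
NCHANNELS = 1   # mono
```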

Comment thread scripts/nmt/s2s_mic.py
"then the default output audio device will be used.",
)

parser = add_asr_config_argparse_parameters(parser, profanity_filter=True)
Contributor:

You'll probably need to set max_alternatives=False and word_time_offsets=False, because these parameters are pointless for this script. Do you think we also need to add a speaker_diarization=False flag?
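
Assuming add_asr_config_argparse_parameters() accepts these as keyword toggles (worth checking its signature in argparse_utils.py), the call might become:

```python
# The keyword arguments below are an assumption based on this comment:
parser = add_asr_config_argparse_parameters(
    parser,
    profanity_filter=True,
    max_alternatives=False,
    word_time_offsets=False,
    speaker_diarization=False,  # only if we decide to add this flag
)
```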

Comment thread scripts/nmt/s2s_mic.py
parser.add_argument("--output-device", type=int, help="Output device to use.")
parser.add_argument("--target-language-code", default="en-US", help="Language code of the output language.")
parser.add_argument(
"--play-audio",
Contributor:

If --play-audio is not set, the script doesn't produce any output. We should probably add an --output parameter, as in tts/talk.py, so that the script can still produce some output.
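
A sketch of the idea, modeled on tts/talk.py; the helper and flag below are hypothetical:

```python
import wave
from pathlib import Path

parser.add_argument("--output", type=Path, help="Write synthesized audio to this .wav file.")

# Hypothetical helper: open a WAV file matching the stream parameters so that
# received audio chunks can be appended with writeframesraw().
def open_output_wav(path: Path, nchannels: int, sampwidth: int, framerate: int) -> wave.Wave_write:
    out_f = wave.open(str(path), 'wb')
    out_f.setnchannels(nchannels)
    out_f.setsampwidth(sampwidth)
    out_f.setframerate(framerate)
    return out_f
```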

Comment thread scripts/nmt/s2s_mic.py
Comment on lines +107 to +116
    play_responses(responses=nmt_service.streaming_s2s_response_generator(
        audio_chunks=audio_chunk_iterator,
        streaming_config=s2s_config), sound_stream=sound_stream)
Contributor:

Suggested change
-    play_responses(responses=nmt_service.streaming_s2s_response_generator(
-        audio_chunks=audio_chunk_iterator,
-        streaming_config=s2s_config), sound_stream=sound_stream)
+    play_responses(
+        responses=nmt_service.streaming_s2s_response_generator(
+            audio_chunks=audio_chunk_iterator,
+            streaming_config=s2s_config,
+        ),
+        sound_stream=sound_stream
+    )

Comment thread scripts/nmt/s2s_mic.py
            interim_results=True,
        ),
        translation_config=riva.client.TranslationConfig(
            target_language_code=args.target_language_code,
Contributor:

There should be a source_language_code here and, probably, a model_name, as in the config.
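
Roughly like this, assuming TranslationConfig exposes these fields as the proto's config does, and that matching CLI flags get added:

```python
# Hypothetical sketch: field and flag names are assumptions based on the proto.
translation_config = riva.client.TranslationConfig(
    source_language_code=args.source_language_code,
    target_language_code=args.target_language_code,
    model_name=args.model_name,
)
```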

Comment thread scripts/nmt/s2s_mic.py
    first = True  # first tts output chunk received
    auth = riva.client.Auth(args.ssl_cert, args.use_ssl, args.server)
    nmt_service = riva.client.NeuralMachineTranslationClient(auth)
    s2s_config = riva.client.StreamingTranslateSpeechToSpeechConfig(
Contributor:

Do we need a tts_config, as in the proto? If so, we could add an add_tts_config_argparse_parameters() function to argparse_utils.py and refactor tts/talk.py to use it.
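
A sketch of the proposed helper; the option names are assumptions, loosely modeled on tts/talk.py:

```python
import argparse

# Hypothetical helper for argparse_utils.py.
def add_tts_config_argparse_parameters(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument("--voice", help="Voice name to use for speech synthesis.")
    parser.add_argument(
        "--sample-rate-hz", type=int, default=44100, help="Sample rate of synthesized audio."
    )
    return parser
```

tts/talk.py could then call this instead of declaring the same options inline.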

@rmittal-github rmittal-github changed the base branch from release/2.9.0 to main January 30, 2023 04:56
@rmittal-github rmittal-github changed the base branch from main to release/2.11.0 April 19, 2023 12:15
Comment thread scripts/nmt/s2s_mic.py Outdated
@rmittal-github rmittal-github changed the base branch from release/2.11.0 to main May 19, 2023 11:05