[DigiKey "Smart Manufacturing, Non-stop Happiness" Creative Competition] 7. TTS function implementation
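The TTS function is implemented with the edge-tts library, which works in a crawler-like way: it calls the online text-to-speech service used by Microsoft Edge instead of an official API, so a network connection is required.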
First, install the library:
```
pip3 install edge-tts
```
To list all supported voices, use the following command:
```
edge-tts --list-voices
```
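The same list can also be fetched from Python, which is handy when picking a voice for one language. The following is a minimal sketch, assuming that the edge_tts.list_voices() coroutine returns a list of dicts with Locale and ShortName keys (the fields printed by the command above). filter_voices and fetch_voice_names are hypothetical helper names, not part of the library.

```python
def filter_voices(voices, locale_prefix):
    """Return the sorted ShortName of every voice whose Locale starts with locale_prefix."""
    return sorted(
        v["ShortName"]
        for v in voices
        if v.get("Locale", "").startswith(locale_prefix)
    )


async def fetch_voice_names(locale_prefix="zh-CN"):
    # Imported lazily so that filter_voices() also works without edge-tts installed.
    import edge_tts

    # Assumption: list_voices() needs network access and returns a list of dicts.
    voices = await edge_tts.list_voices()
    return filter_voices(voices, locale_prefix)
```

Running asyncio.run(fetch_voice_names("zh-CN")) should then give the short names that can be used as the voice, for example zh-CN-XiaoyiNeural.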
The test code is as follows:
```
#!/usr/bin/env python3
import io

import edge_tts
import pydub


async def tts(text, actor="zh-CN-XiaoyiNeural", fmt="mp3"):
    # Look up the full voice name from its short name
    _voices = await edge_tts.VoicesManager.create()
    _voices = _voices.find(ShortName=actor)
    _communicate = edge_tts.Communicate(text, _voices[0]["Name"])
    _out = bytes()
    # Collect the audio chunks streamed back by the service
    async for _chunk in _communicate.stream():
        if _chunk["type"] == "audio":
            # print(_chunk["data"])
            _out += _chunk["data"]
        elif _chunk["type"] == "WordBoundary":
            # print(f"WordBoundary: {_chunk}")
            pass
    if fmt == "mp3":
        return _out
    if fmt == "wav":
        # Decode the received audio and re-encode it as 16 kHz WAV
        _raw = pydub.AudioSegment.from_file(io.BytesIO(_out))
        _raw = _raw.set_frame_rate(16000)
        _wav = io.BytesIO()
        _raw.export(_wav, format="wav")
        # for i in range(len(_wav.getvalue())-1,-1,-1):
        #     if _wav.getvalue()[i] != 0x00:
        #         break
        return _wav.getvalue()#[:i+1]


if __name__ == "__main__":
    import asyncio
    import pydub.playback
    while True:
        text_in = input("> Say something: ")
        raw_wav = asyncio.run(tts(text_in, actor="zh-CN-XiaoyiNeural", fmt="wav"))
        wav = pydub.AudioSegment.from_file(io.BytesIO(raw_wav))
        pydub.playback._play_with_pyaudio(wav)
```
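Before sending the result to the sound card, it can help to confirm that the bytes returned with fmt = "wav" really are 16 kHz audio. The following is a small sketch that only uses the standard library wave module. wav_info and make_silence are hypothetical helper names, and make_silence exists only so the check can be tried offline without calling the TTS service.

```python
import io
import wave


def wav_info(wav_bytes):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds) of WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), rate, frames / rate


def make_silence(seconds=0.5, rate=16000):
    """Build a mono 16-bit silent WAV in memory, used only to try wav_info() offline."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

Passing the output of tts(..., fmt = "wav") to wav_info should report a frame rate of 16000 if the conversion worked as intended.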
Here the pyaudio playback method is used because it allows the playback device to be specified: the audio should go to the I2S HAT instead of the default device. First, use aplay -l to query the device number. Then modify the source code of the library by adding output_device_index=1 at line 26 of site-packages/pydub/playback.py. The complete function is as follows:
```
def _play_with_pyaudio(seg):
    import pyaudio
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(seg.sample_width),
                    channels=seg.channels,
                    rate=seg.frame_rate,
                    output_device_index=1,
                    output=True)
    # Just in case there were any exceptions/interrupts, we release the resource
    # So as not to raise OSError: Device Unavailable should play() be used again
    try:
        # break audio into half-second chunks (to allows keyboard interrupts)
        for chunk in make_chunks(seg, 500):
            stream.write(chunk._data)
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
```
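Editing a file in site-packages is lost whenever pydub is reinstalled, so an alternative is to keep an equivalent function in the project itself. The following is a sketch under some assumptions: the index that PyAudio expects may not be the same as the card number shown by aplay -l, so the device is looked up by name among PyAudio's own device entries. find_output_device, split_pcm, list_devices and play_on_device are hypothetical helper names, and the name fragment to search for has to be taken from the actual device listing.

```python
def find_output_device(devices, name_part):
    """Return the index of the first output-capable device whose name contains name_part."""
    for info in devices:
        has_output = info.get("maxOutputChannels", 0) > 0
        if has_output and name_part.lower() in info["name"].lower():
            return info["index"]
    return None


def split_pcm(raw, bytes_per_frame, frame_rate, chunk_ms=500):
    """Split raw PCM bytes into chunk_ms-long pieces aligned to whole frames."""
    step = max(1, int(frame_rate * chunk_ms / 1000)) * bytes_per_frame
    return [raw[i:i + step] for i in range(0, len(raw), step)]


def list_devices():
    """Collect PyAudio's device info dicts (keys include index, name, maxOutputChannels)."""
    import pyaudio  # lazy import: the pure helpers above work without PyAudio

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()


def play_on_device(seg, device_index):
    """Play a pydub AudioSegment on the chosen PyAudio output device."""
    import pyaudio

    p = pyaudio.PyAudio()
    try:
        stream = p.open(format=p.get_format_from_width(seg.sample_width),
                        channels=seg.channels,
                        rate=seg.frame_rate,
                        output_device_index=device_index,
                        output=True)
        try:
            # Half-second chunks keep the loop responsive to Ctrl+C
            for chunk in split_pcm(seg.raw_data, seg.frame_width, seg.frame_rate):
                stream.write(chunk)
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        p.terminate()
```

In the test code, the call to pydub.playback._play_with_pyaudio(wav) could then be replaced by play_on_device(wav, index), where index comes from find_output_device(list_devices(), ...) with a fragment of the I2S HAT's name.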