Integrating Tencent Cloud Text-to-Speech Using Python
Tencent Cloud offers a Text-to-Speech (TTS) service that converts written text into spoken audio. This capability lets applications provide voice output for scenarios such as news reading in mobile apps, alerts from smart devices, custom voice content created from minimal source material, and personalized navigation instructions in automotive systems.
Before using the Python SDK, you must obtain security credentials (SecretID and SecretKey) from the Tencent Cloud Console. The SecretID authenticates the API caller, while the SecretKey is used for signing requests and must be kept confidential.
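Because the SecretKey must stay confidential, one common approach is to read both values from environment variables instead of writing them into source files. The following is a minimal sketch of that idea, not part of the SDK: the helper load_credentials and the variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY are assumptions made for illustration.

```python
import os

# Assumed environment variable names for this sketch; adjust to your setup.
ID_VAR = "TENCENTCLOUD_SECRET_ID"
KEY_VAR = "TENCENTCLOUD_SECRET_KEY"


def load_credentials(env=None):
    """Return (secret_id, secret_key) read from an environment mapping.

    Raises RuntimeError if either value is missing or empty, so a
    misconfigured environment fails before any API call is made.
    """
    env = os.environ if env is None else env
    missing = [name for name in (ID_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[ID_VAR], env[KEY_VAR]
```

The returned pair can then be passed to credential.Credential(secret_id, secret_key) in place of hard-coded strings.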
You can install the SDK using pip. Execute the following command in your terminal:
pip install tencentcloud-sdk-python
For environments with both Python 2 and 3, use pip3 for Python 3.
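To confirm that the package landed in the interpreter you intend to use, a quick check like the one below may help. It is a small sketch using only the standard library; is_installed is a hypothetical helper name, and tencentcloud is the top-level package name used by the SDK's import statements.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether the SDK's top-level package is importable here.
    print("tencentcloud SDK importable:", is_installed("tencentcloud"))
```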
The following script demonstrates basic usage of the TTS API to generate an audio file.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from base64 import b64decode
from uuid import uuid4
from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.aai.v20180522.models import TextToVoiceRequest
from tencentcloud.aai.v20180522.aai_client import AaiClient
try:
    # Initialize credential object with your SecretID and SecretKey
    auth = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
    # Create a client instance for the TTS service, specifying the region
    tts_client = AaiClient(auth, 'ap-shanghai')
    # Instantiate the request object
    tts_request = TextToVoiceRequest()
    # Configure the request parameters
    tts_request.Text = 'The quick brown fox jumps over the lazy dog.'
    tts_request.SessionId = str(uuid4())
    tts_request.ModelType = 1
    tts_request.Volume = 5.0
    tts_request.Speed = 0.6
    tts_request.ProjectId = 10086
    tts_request.VoiceType = 0
    tts_request.PrimaryLanguage = 1
    tts_request.SampleRate = 16000
    # Send the request and receive the response
    api_response = tts_client.TextToVoice(tts_request)
    # The response contains the audio data in base64 encoding and metadata
    # Example structure:
    # {
    #     "Audio": "UklGRl...",
    #     "RequestId": "unique-request-id",
    #     "SessionId": "session-identifier"
    # }
    # Decode the base64 audio data to binary
    audio_data = b64decode(api_response.Audio)
    # Write the binary data to a WAV file
    output_filename = 'synthesized_speech.wav'
    with open(output_filename, 'wb') as audio_file:
        audio_file.write(audio_data)
except TencentCloudSDKException as error:
    print(f"API call failed: {error}")
This example outlines the core process: setting up authentication, configuring the synthesis request with parameters such as text, voice type, speed, and volume, executing the call, and saving the returned audio to a file.
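The decode-and-save step can also be factored into a small function that checks the decoded bytes before writing them. The sketch below assumes the response audio is a RIFF/WAVE stream, as the example's "UklGRl..." prefix suggests (that prefix is the base64 form of "RIFF"). The name save_wav_from_base64 is hypothetical and not part of the SDK.

```python
import base64


def save_wav_from_base64(audio_b64, path):
    """Decode base64 audio, verify the RIFF/WAVE header, and write it to path.

    Returns the number of bytes written. Raises ValueError if the decoded
    data does not start with a RIFF/WAVE header.
    """
    raw = base64.b64decode(audio_b64)
    # A WAV file starts with b"RIFF", a 4-byte size field, then b"WAVE".
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("decoded data is not a RIFF/WAVE stream")
    with open(path, "wb") as audio_file:
        audio_file.write(raw)
    return len(raw)
```

With the script's response object, a call would look like save_wav_from_base64(api_response.Audio, 'synthesized_speech.wav'). If a different output format is requested, the header check would need to change accordingly.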