Fading Coder

Integrating Tencent Cloud Text-to-Speech Using Python

Tech May 12 2

Tencent Cloud offers a Text-to-Speech (TTS) service that converts written text into spoken audio. This capability enables applications to provide voice output for scenarios such as news readers in mobile apps, alerts from smart devices, custom voice content created from minimal source material, and personalized navigation instructions in automotive systems.

Before using the Python SDK, you must obtain security credentials (SecretID and SecretKey) from the Tencent Cloud Console. The SecretID authenticates the API caller, while the SecretKey is used for signing requests and must be kept confidential.
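Because the SecretKey must stay confidential, it helps to keep both values out of the source file. The sketch below reads them from environment variables using only the standard library. The variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY and the helper load_credentials are assumptions made for this illustration, not something the original script defines.

```python
import os

# Assumed variable names for this sketch; adjust them to your deployment.
SECRET_ID_VAR = "TENCENTCLOUD_SECRET_ID"
SECRET_KEY_VAR = "TENCENTCLOUD_SECRET_KEY"


def load_credentials(env=None):
    """Return (secret_id, secret_key) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    secret_id = env.get(SECRET_ID_VAR, "").strip()
    secret_key = env.get(SECRET_KEY_VAR, "").strip()

    # Report every missing variable at once instead of failing on the first.
    pairs = ((SECRET_ID_VAR, secret_id), (SECRET_KEY_VAR, secret_key))
    missing = [name for name, value in pairs if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return secret_id, secret_key
```

The returned pair can then be passed to credential.Credential(secret_id, secret_key) in place of hard-coded string literals.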

You can install the SDK using pip. Execute the following command in your terminal:

pip install tencentcloud-sdk-python

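If both Python 2 and Python 3 are installed, use pip3 so that the SDK is installed for Python 3. The script below uses an f-string, so it requires Python 3.6 or later.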

The following script demonstrates basic usage of the TTS API to generate an audio file.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from base64 import b64decode
from uuid import uuid4
from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.aai.v20180522.models import TextToVoiceRequest
from tencentcloud.aai.v20180522.aai_client import AaiClient

try:
    # Initialize credential object with your SecretID and SecretKey
    auth = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
    
    # Create a client instance for the TTS service, specifying the region
    tts_client = AaiClient(auth, 'ap-shanghai')
    
    # Instantiate the request object
    tts_request = TextToVoiceRequest()
    
    # Configure the request parameters
    tts_request.Text = 'The quick brown fox jumps over the lazy dog.'
    tts_request.SessionId = str(uuid4())
    tts_request.ModelType = 1
    tts_request.Volume = 5.0
    tts_request.Speed = 0.6
    tts_request.ProjectId = 10086
    tts_request.VoiceType = 0
    tts_request.PrimaryLanguage = 1
    tts_request.SampleRate = 16000
    
    # Send the request and receive the response
    api_response = tts_client.TextToVoice(tts_request)
    
    # The response contains the audio data in base64 encoding and metadata
    # Example structure:
    # {
    #   "Audio": "UklGRl...",
    #   "RequestId": "unique-request-id",
    #   "SessionId": "session-identifier"
    # }
    
    # Decode the base64 audio data to binary
    audio_data = b64decode(api_response.Audio)
    
    # Write the binary data to a WAV file
    output_filename = 'synthesized_speech.wav'
    with open(output_filename, 'wb') as audio_file:
        audio_file.write(audio_data)
        
except TencentCloudSDKException as error:
    print(f"API call failed: {error}")

This example outlines the core process: setting up authentication, configuring the synthesis request with parameters such as text, voice type, speed, and volume, executing the call, and saving the returned audio to a file.
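The final step, decoding the base64 Audio field and writing it to disk, can be made a little more defensive. The sketch below is one possible approach rather than part of the SDK. It validates the base64 payload and checks for a RIFF/WAVE header before writing. The helper name save_base64_audio is hypothetical, and the header check assumes the service returned WAV data, as the output filename in the script suggests.

```python
import base64
import binascii


def save_base64_audio(audio_b64, path):
    """Decode a base64 audio payload, verify it looks like WAV, and save it.

    Returns the number of bytes written. Raises ValueError if the payload
    is not valid base64 or does not start with a RIFF/WAVE header.
    """
    try:
        audio_bytes = base64.b64decode(audio_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("Audio field is not valid base64") from exc

    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("Decoded data does not look like a WAV file")

    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return len(audio_bytes)
```

In the script above, this would replace the b64decode and open/write lines with a single call such as save_base64_audio(api_response.Audio, 'synthesized_speech.wav'). If a different output codec is requested, the header check would need to be dropped or adapted.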
