OpenAI Text-to-Speech Guardrail Translation Handler
Handler for processing OpenAI's text-to-speech endpoint (/v1/audio/speech) with guardrails.
Overview
This handler processes text-to-speech requests by:
- Extracting the input text from the request
- Applying guardrails to the input text
- Updating the request with the guardrailed text
- Returning the output unchanged (audio is binary, not text)
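The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the handler's actual code: `apply_input_guardrail`, `passthrough_output`, and `redact_fox` are hypothetical names, and the guardrail is modeled as a simple string-to-string callable.

```python
from typing import Callable


def apply_input_guardrail(request: dict, guardrail: Callable[[str], str]) -> dict:
    """Return a copy of the request with the guardrail applied to "input"."""
    text = request.get("input")
    if not isinstance(text, str) or not text:
        # Nothing to guardrail; hand the request back untouched.
        return request
    updated = dict(request)  # shallow copy so the caller's dict is not mutated
    updated["input"] = guardrail(text)
    return updated


def passthrough_output(audio: bytes) -> bytes:
    """Output step: audio is binary, so it is returned unchanged."""
    return audio


def redact_fox(text: str) -> str:
    """Hypothetical guardrail used only for this illustration."""
    return text.replace("fox", "[REDACTED]")


request = {
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
}
guarded = apply_input_guardrail(request, redact_fox)
print(guarded["input"])
```

Only the `input` field changes; `model`, `voice`, and any other request fields pass through as they were.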
Data Format
Input Format
{
"model": "tts-1",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy",
"response_format": "mp3",
"speed": 1.0
}
Output Format
The output is binary audio data (MP3, WAV, etc.), not text, so it cannot be guardrailed.
Usage
The handler is automatically discovered and applied when guardrails are used with the text-to-speech endpoint.
Example: Using Guardrails with Text-to-Speech
curl -X POST 'http://localhost:4000/v1/audio/speech' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
"model": "tts-1",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy",
"guardrails": ["content_moderation"]
}' \
--output speech.mp3
The guardrail will be applied to the input text before the text-to-speech conversion.
Example: PII Masking in TTS Input
curl -X POST 'http://localhost:4000/v1/audio/speech' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
"model": "tts-1",
"input": "Please call John Doe at john@example.com",
"voice": "nova",
"guardrails": ["mask_pii"]
}' \
--output speech.mp3
With a masking guardrail configured to use these placeholders, the audio will say something like: "Please call [NAME_REDACTED] at [EMAIL_REDACTED]"
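To make the transformation concrete, the sketch below masks email addresses with a regular expression before the text would reach the TTS model. It is an assumption-laden illustration, not how any particular guardrail is implemented: `mask_emails` and `EMAIL_PATTERN` are hypothetical, and name detection (the `[NAME_REDACTED]` part) is left out because it typically needs an entity-recognition step rather than a regex.

```python
import re

# Deliberately simple email pattern; real PII guardrails are more thorough.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace every email address in the text with a placeholder token."""
    return EMAIL_PATTERN.sub("[EMAIL_REDACTED]", text)


masked = mask_emails("Please call John Doe at john@example.com")
print(masked)  # Please call John Doe at [EMAIL_REDACTED]
```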
Example: Content Filtering Before TTS
curl -X POST 'http://localhost:4000/v1/audio/speech' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-api-key' \
-d '{
"model": "tts-1-hd",
"input": "This is the text that will be spoken",
"voice": "shimmer",
"guardrails": ["content_filter"]
}' \
--output speech.mp3
Implementation Details
Input Processing
- Field: input (string)
- Processing: Applies guardrail to input text
- Result: Updated input text in request
Output Processing
- Processing: Not applicable (audio is binary data)
- Result: Response returned unchanged
Use Cases
- PII Protection: Remove personally identifiable information before converting to speech
- Content Filtering: Remove inappropriate content before TTS conversion
- Compliance: Ensure text meets requirements before voice synthesis
- Text Sanitization: Clean up text before audio generation
Extension
Override these methods to customize behavior:
- process_input_messages(): Customize how input text is processed
- process_output_response(): Currently a no-op, but can be overridden if needed
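As a rough sketch of what such an override might look like, the example below subclasses a stand-in handler and normalizes whitespace before the guardrail runs. Everything except the two method names is an assumption: `SpeechHandlerSketch` is a hypothetical stand-in rather than the library's class, and the real methods may take different parameters, so check the actual base class before adapting this.

```python
import asyncio
from typing import Any, Awaitable, Callable

Guardrail = Callable[[str], Awaitable[str]]


class SpeechHandlerSketch:
    """Hypothetical stand-in for the TTS handler (signatures are assumed)."""

    async def process_input_messages(self, data: dict, guardrail: Guardrail) -> dict:
        text = data.get("input")
        if isinstance(text, str):
            data["input"] = await guardrail(text)
        return data

    async def process_output_response(self, response: Any) -> Any:
        # No-op: the response is binary audio.
        return response


class TrimmingSpeechHandler(SpeechHandlerSketch):
    """Collapses runs of whitespace before delegating to the default logic."""

    async def process_input_messages(self, data: dict, guardrail: Guardrail) -> dict:
        text = data.get("input")
        if isinstance(text, str):
            data["input"] = " ".join(text.split())
        return await super().process_input_messages(data, guardrail)


async def upper_guardrail(text: str) -> str:
    """Hypothetical guardrail for the demo: uppercases the text."""
    return text.upper()


result = asyncio.run(
    TrimmingSpeechHandler().process_input_messages(
        {"input": "  hello   world "}, upper_guardrail
    )
)
print(result["input"])  # HELLO WORLD
```

Calling `super()` keeps the default guardrail step intact, so the override only adds preprocessing instead of replacing the guardrail behavior.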
Supported Call Types
- CallTypes.speech - Synchronous text-to-speech
- CallTypes.aspeech - Asynchronous text-to-speech
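One way to picture how both call types end up at the same handler is a lookup table keyed by call type. The sketch below is illustrative only: the `CallTypes` enum here is a local stand-in with just the two members named above, and `HANDLERS`, `get_handler`, and `SpeechHandlerSketch` are hypothetical names, not the library's discovery mechanism.

```python
from enum import Enum
from typing import Optional


class CallTypes(str, Enum):
    """Local stand-in listing only the two call types named above."""

    speech = "speech"
    aspeech = "aspeech"


class SpeechHandlerSketch:
    """Hypothetical placeholder for the TTS guardrail handler."""


# Sync and async call types map to the same handler class.
HANDLERS = {
    CallTypes.speech: SpeechHandlerSketch,
    CallTypes.aspeech: SpeechHandlerSketch,
}


def get_handler(call_type: str) -> Optional[SpeechHandlerSketch]:
    """Return a handler instance for the call type, or None if unsupported."""
    try:
        return HANDLERS[CallTypes(call_type)]()
    except ValueError:
        # Raised by CallTypes(...) for values outside the enum.
        return None


print(type(get_handler("speech")).__name__)
print(get_handler("completion"))
```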
Notes
- Only the input text is processed by guardrails
- Output processing is a no-op since audio cannot be text-guardrailed
- Both sync and async call types use the same handler
- Works with all TTS models (tts-1, tts-1-hd, etc.)
- Works with all voice options
Common Patterns
Remove PII Before TTS
import litellm
from pathlib import Path
speech_file_path = Path(__file__).parent / "speech.mp3"
response = litellm.speech(
    model="tts-1",
    voice="alloy",
    input="Hi, this is John Doe calling from john@company.com",
    guardrails=["mask_pii"],
)
response.stream_to_file(speech_file_path)
# Audio will have PII masked
Content Moderation Before TTS
import litellm
from pathlib import Path
speech_file_path = Path(__file__).parent / "speech.mp3"
response = litellm.speech(
    model="tts-1-hd",
    voice="nova",
    input="Your text here",
    guardrails=["content_moderation"],
)
response.stream_to_file(speech_file_path)
Async TTS with Guardrails
import asyncio
from pathlib import Path
import litellm
async def generate_speech():
    # Write the audio next to this script
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = await litellm.aspeech(
        model="tts-1",
        voice="echo",
        input="Text to convert to speech",
        # Guardrail name matches the PII examples above
        guardrails=["mask_pii"],
    )
    response.stream_to_file(speech_file_path)
asyncio.run(generate_speech())