Setup5 min read
Getting Started with Voxtral
Complete setup guide for Voxtral speech understanding model. Learn how to install, configure, and run your first speech processing tasks.
Prerequisites
Before installing Voxtral, ensure your system meets these requirements:
- Python 3.8 or higher
- GPU with CUDA support (recommended)
- 10GB+ available storage
- Internet connection for initial model download
Installation
The easiest way to get started with Voxtral is using the UV package manager:
# Install UV package manager curl -LsSf https://astral.sh/uv/install.sh | sh # Install vLLM with UV uv pip install vllm
First Run
Start the Voxtral model server locally:
# Serve Voxtral 3B model (for local/edge deployment) vllm serve mistralai/voxtral-3b # Or serve Voxtral 24B model (for production) vllm serve mistralai/voxtral-24b
Basic Usage
Once the server is running, you can interact with Voxtral through the API:
from mistral_common import ChatCompletionRequest
import requests
# Initialize client
client_url = "http://localhost:8000"
# Process audio file
audio_file = "path/to/your/audio.wav"
with open(audio_file, "rb") as f:
response = requests.post(f"{client_url}/transcribe", files={"file": f})
print(response.json())
Understanding Output
Voxtral provides comprehensive speech understanding capabilities:
- Transcription: Accurate speech-to-text conversion
- Question Answering: Ask questions about audio content
- Summarization: Automatic content summaries
- Language Detection: Automatic multilingual processing
Performance Tips
- Use GPU acceleration for faster processing
- Choose the 3B model for local deployment and lower resource usage
- Use the 24B model for production environments requiring maximum accuracy
- Process audio files up to 40 minutes for optimal results
Next Steps
Now that you have Voxtral running, explore these advanced features:
- Function calling capabilities
- Multilingual speech processing
- Custom deployment configurations
- Integration with existing applications
Continue reading our Features & Capabilities guide to learn more about Voxtral's advanced functionality.