Setup5 min read

Getting Started with Voxtral

Complete setup guide for Voxtral speech understanding model. Learn how to install, configure, and run your first speech processing tasks.

Prerequisites

Before installing Voxtral, ensure your system meets these requirements:

  • Python 3.8 or higher
  • GPU with CUDA support (recommended)
  • 10GB+ available storage
  • Internet connection for initial model download

Installation

The easiest way to get started with Voxtral is using the UV package manager:

# Install UV package manager curl -LsSf https://astral.sh/uv/install.sh | sh # Install vLLM with UV uv pip install vllm

First Run

Start the Voxtral model server locally:

# Serve Voxtral 3B model (for local/edge deployment) vllm serve mistralai/voxtral-3b # Or serve Voxtral 24B model (for production) vllm serve mistralai/voxtral-24b

Basic Usage

Once the server is running, you can interact with Voxtral through the API:

from mistral_common import ChatCompletionRequest
import requests

# Initialize client
client_url = "http://localhost:8000"

# Process audio file
audio_file = "path/to/your/audio.wav"
with open(audio_file, "rb") as f:
    response = requests.post(f"{client_url}/transcribe", files={"file": f})

print(response.json())

Understanding Output

Voxtral provides comprehensive speech understanding capabilities:

  • Transcription: Accurate speech-to-text conversion
  • Question Answering: Ask questions about audio content
  • Summarization: Automatic content summaries
  • Language Detection: Automatic multilingual processing

Performance Tips

  • Use GPU acceleration for faster processing
  • Choose the 3B model for local deployment and lower resource usage
  • Use the 24B model for production environments requiring maximum accuracy
  • Process audio files up to 40 minutes for optimal results

Next Steps

Now that you have Voxtral running, explore these advanced features:

  • Function calling capabilities
  • Multilingual speech processing
  • Custom deployment configurations
  • Integration with existing applications

Continue reading our Features & Capabilities guide to learn more about Voxtral's advanced functionality.