KWIKmotion AI Live Captions v1.1.0
Real-Time Transcription & Translation API Reference
White Peaks Solutions SAS
Overview
KWIKmotion AI Live Captions is an enterprise WebSocket API providing real-time speech-to-text transcription and multi-language translation. Designed for broadcast media, live streaming, and professional applications.
Key Features
- Real-Time Transcription: Sub-second latency speech-to-text
- Multi-Language Translation: Simultaneous translation to multiple languages
- High Accuracy: Advanced AI processing for superior results
- 50+ Languages: Comprehensive global language support
- Flexible Audio: Dynamic chunk sizes supported
- Production-Ready: Enterprise reliability and error handling
WebSocket Connection
Endpoint
wss://your-server-address:PORT/
Protocol
- Protocol: Secure WebSocket (WSS) with SSL/TLS
- Message Format: JSON for control, Binary for audio
- Keepalive: 20-second ping interval, 30-second timeout
Connection Flow
1. Connect to the secure WebSocket endpoint (wss://) with the Authorization header
2. Send the StartRecognition message
3. Wait for the RecognitionStarted confirmation
4. Send audio data (one chunk at a time)
5. Receive transcripts and translations
6. Send EndOfStream when done
7. Receive EndOfTranscript
8. Close the connection
Authentication
All connections to the KWIKmotion AI Live Captions API require a valid authentication token. You must subscribe to the service to obtain your authentication token.
Overview
The KWIKmotion AI Live Captions API uses bearer token authentication to secure access to the service. All API connections must include a valid authentication token in the HTTP headers during the WebSocket handshake.
Authentication Method
Include the Authorization header with your bearer token when establishing the WebSocket connection:
| Header | Format | Example |
|---|---|---|
| `Authorization` | `Bearer <token>` | `Bearer eyJhbGciOiJSU0EtT0FFU...` |
- The token must be prefixed with `Bearer ` (note the space after "Bearer")
- The header name is `Authorization` (case-sensitive in some libraries)
- The header must be included in the initial WebSocket handshake request, as the sketch below shows
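For example, a minimal Python sketch of an authenticated handshake using the websockets library (server address, port, and token are placeholders):

import asyncio
import websockets

async def connect():
    # The token is validated during this handshake, before any messages are exchanged
    headers = {"Authorization": "Bearer YOUR_TOKEN_HERE"}
    async with websockets.connect("wss://your-server-address:PORT/", extra_headers=headers) as ws:
        print("Connected - authentication succeeded")

asyncio.run(connect())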
Token Management
Your authentication token:
- ✅ Is validated during the initial WebSocket handshake (before session starts)
- ✅ Is validated once per connection (not for each message)
- ✅ Can be used for multiple simultaneous connections (up to your subscription limits)
- ✅ Remains valid for the duration specified in your subscription
If your token expires during an active session, the current session will continue until you disconnect. You will need a valid token to establish a new connection.
Authentication Error Responses
If authentication fails, you will receive an error message immediately after connection:
Missing Authorization Header
{
"message": "Error",
"type": "authentication_error",
"reason": "Authentication required: Missing Authorization header",
"code": 4001,
"timestamp": 1730406000.123
}
Invalid Token Format
{
"message": "Error",
"type": "authentication_error",
"reason": "Authentication required: Invalid Authorization header format",
"code": 4001,
"timestamp": 1730406000.123
}
Authentication Failed (Invalid/Expired Token)
{
"message": "Error",
"type": "authentication_error",
"reason": "token_expired",
"code": 401,
"timestamp": 1730406000.123,
"details": {
"error": true,
"message": "token_expired"
}
}
Insufficient Permissions
{
"message": "Error",
"type": "authentication_error",
"reason": "Insufficient permissions for ailivecaptioning service",
"code": 403,
"timestamp": 1730406000.123,
"details": {
"error": true,
"message": "subscription_required"
}
}
Obtaining an Authentication Token
To use the KWIKmotion AI Live Captions service, you must first subscribe and obtain an authentication token:
- New Subscriptions: Contact sales@whitepeaks.fr to purchase access to the service
- Existing Customers: Contact your White Peaks Solutions account manager
- Technical Support: For technical assistance, email support@whitepeaks.fr
After subscribing, you will receive your unique authentication token via email.
Token Security Best Practices
- Store tokens securely: Use environment variables or secrets management systems (see the sketch after this list)
- Never commit tokens: Do not include tokens in your source code or version control
- Rotate tokens periodically: Contact your account manager for token rotation
- Use separate tokens: Request different tokens for dev/staging/production environments
- Monitor authentication: Log authentication failures in your application for security auditing
- Secure transmission: Always use secure connections (wss://) in production
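As a minimal sketch of the first two practices (the variable name KWIK_CAPTIONS_TOKEN is illustrative, not a required name):

import os

# Hypothetical environment variable - use whatever your secrets management provides
token = os.environ["KWIK_CAPTIONS_TOKEN"]
headers = {"Authorization": f"Bearer {token}"}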
Audio Format
Required Specifications
| Parameter | Value | Description |
|---|---|---|
| Sample Rate | 16,000 Hz | 16 kHz (recommended) |
| Channels | 1 (mono) | Mono audio required |
| Bit Depth | 16-bit | Standard PCM |
| Encoding | PCM S16LE | Signed 16-bit little-endian |
Audio Chunk Sizes
The system supports flexible, dynamic chunk sizes. Send one audio chunk at a time:
| Duration | Bytes (16kHz mono) | Use Case |
|---|---|---|
| 2 seconds | 64,000 | Low latency |
| 4 seconds | 128,000 | Balanced |
| 6 seconds | 192,000 | Recommended |
| 10 seconds | 320,000 | Longer context |
Bytes = sample_rate × duration × 2

At 16 kHz: Bytes = 16,000 × duration × 2

Example: 6 seconds = 16,000 × 6 × 2 = 192,000 bytes
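As a sanity check, the byte counts in the table can be reproduced with a small helper (a sketch, not part of the API):

def chunk_bytes(duration_s, sample_rate=16000, bytes_per_sample=2):
    """Bytes per chunk of mono PCM audio."""
    return sample_rate * duration_s * bytes_per_sample

assert chunk_bytes(6) == 192_000  # the recommended 6-second chunk
assert chunk_bytes(2) == 64_000   # low-latency 2-second chunk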
Supported Encodings
- `pcm_s16le` - 16-bit signed little-endian (recommended; see the WAV extraction sketch below)
- `pcm_f32le` - 32-bit float little-endian
- `mulaw` - 8-bit μ-law (telephony)
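If your source audio is a WAV file, Python's standard wave module can verify the format and extract the raw pcm_s16le payload (a sketch assuming a 16 kHz, mono, 16-bit file; file names are placeholders):

import wave

with wave.open("input.wav", "rb") as wav:
    assert wav.getframerate() == 16000, "resample to 16 kHz first"
    assert wav.getnchannels() == 1, "convert to mono first"
    assert wav.getsampwidth() == 2, "convert to 16-bit PCM first"
    pcm_data = wav.readframes(wav.getnframes())  # raw pcm_s16le bytes

with open("audio.raw", "wb") as f:
    f.write(pcm_data)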
Message Protocol
Client → Server Messages
1. StartRecognition (Required, JSON)
Initialize a new transcription session.
{
"message": "StartRecognition",
"audio_format": {
"type": "raw",
"encoding": "pcm_s16le",
"sample_rate": 16000
},
"transcription_config": {
"language": "en"
},
"translation_config": {
"target_languages": ["fr", "es", "de"]
}
}
Note: The authentication token is sent in the Authorization header during the WebSocket handshake, not in the message body. See the Authentication section for details.
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | ✅ | Must be "StartRecognition" |
| audio_format.type | string | ✅ | "raw" or "file" |
| audio_format.encoding | string | ✅ | "pcm_s16le", "pcm_f32le", or "mulaw" |
| audio_format.sample_rate | integer | ✅ | Sample rate in Hz (16000 recommended) |
| transcription_config.language | string | ✅ | Source language ISO 639-1 code |
| translation_config.target_languages | string[] | ❌ | Target language codes (optional) |
2. Binary Audio Data (Required, Binary)
Send binary audio chunks sequentially, one at a time:
- Each chunk: your chosen duration (2-10 seconds recommended)
- Format: raw PCM audio bytes
- Processing: each chunk is processed independently, with automatic word boundary handling
Example: 6-second chunks
# At 16kHz mono 16-bit:
Chunk duration: 6 seconds = 192,000 bytes

# Send each chunk sequentially:
Chunk 0: Send 192,000 bytes
Chunk 1: Send 192,000 bytes
Chunk 2: Send 192,000 bytes
...
Example: Variable durations
# You can vary chunk sizes:
Chunk 0: 4 seconds = 128,000 bytes
Chunk 1: 6 seconds = 192,000 bytes
Chunk 2: 2 seconds = 64,000 bytes
...
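A simple way to produce such chunks from a raw PCM buffer is a slicing generator, sketched below (names and defaults are illustrative):

def iter_chunks(audio_data, duration_s=6, sample_rate=16000):
    """Yield raw PCM chunks of roughly the given duration."""
    chunk_size = sample_rate * duration_s * 2  # 16-bit mono
    for i in range(0, len(audio_data), chunk_size):
        yield audio_data[i:i + chunk_size]  # final chunk may be shorter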
3. EndOfStream (Required, JSON)
Signal end of audio stream.
{
"message": "EndOfStream",
"last_seq_no": 10
}
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | ✅ | Must be "EndOfStream" |
| last_seq_no | integer | ✅ | Number of audio chunks sent |
Server → Client Messages
1. RecognitionStarted (JSON)
{
"message": "RecognitionStarted",
"session_id": "session_1761130389621"
}
2. AudioAdded (JSON)
{
"message": "AudioAdded",
"seq_no": 0
}
Sent after each audio chunk is received and queued for processing.
3. AddTranscript (JSON)
{
"message": "AddTranscript",
"metadata": {
"transcript": "This is the transcribed text",
"start_time": 0.0,
"end_time": 4.0,
"language": "en",
"chunk_index": 0,
"word_count": 10
}
}
4. AddTranslation (JSON)
{
"message": "AddTranslation",
"metadata": {
"translation": "C'est le texte traduit",
"target_language": "fr",
"start_time": 0.0,
"end_time": 4.0
}
}
You'll receive one per target language per transcript.
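Because one AddTranslation message arrives per target language per transcript segment, a client may want to group results by language. A minimal sketch (names are illustrative, not part of the API):

from collections import defaultdict

# target_language -> ordered list of translated segments
translations = defaultdict(list)

def handle_translation(msg):
    meta = msg["metadata"]
    translations[meta["target_language"]].append(meta["translation"])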
5. EndOfTranscript (JSON)
{
"message": "EndOfTranscript",
"reason": "no_more_audio"
}
6. Error (JSON)
{
"message": "Error",
"type": "invalid_model",
"reason": "Unsupported language: xyz",
"code": 4004,
"timestamp": 1729728000.123
}
Error Codes
| Code | Type | Description |
|---|---|---|
| 4001 | invalid_message | Malformed message or invalid input |
| 4004 | invalid_model | Unsupported language code |
| 1008 | policy_violation | Server at capacity (session limit) |
| 1011 | internal_error | Server processing error |
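These codes can drive client-side recovery logic. The sketch below treats 1008 and 1011 as retryable with exponential backoff; the retry policy itself is an assumption, not something the API mandates:

import asyncio

RETRYABLE = {1008, 1011}  # capacity / internal errors from the table above

async def handle_error(msg, attempt):
    code = msg.get("code")
    if code in RETRYABLE:
        delay = min(2 ** attempt, 30)  # hypothetical backoff, capped at 30s
        print(f"Retryable error {code}: {msg.get('reason')}; retrying in {delay}s")
        await asyncio.sleep(delay)
        return True   # caller should reconnect
    print(f"Fatal error {code}: {msg.get('reason')}")
    return False      # fix the request instead of retrying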
Supported Languages
The API supports 50+ languages for both transcription and translation using ISO 639-1 codes.
- Arabic: Industry-leading WER < 6% - The most advanced Arabic ASR available
- English: Exceptional accuracy with broadcast-quality recognition
- French: Superior performance for European French and Canadian French
- Dutch: Excellent accuracy for Netherlands and Belgian Dutch
- Custom Languages: We can train custom ASR models for your specific language needs - Contact us
| Language | ISO Code | Native Name |
|---|---|---|
| Afrikaans | af | Afrikaans |
| Arabic | ar | العربية |
| Armenian | hy | Հայերեն |
| Azerbaijani | az | Azərbaycan |
| Belarusian | be | Беларуская |
| Bosnian | bs | Bosanski |
| Bulgarian | bg | Български |
| Catalan | ca | Català |
| Chinese (Simplified) | zh | 中文 |
| Croatian | hr | Hrvatski |
| Czech | cs | Čeština |
| Danish | da | Dansk |
| Dutch | nl | Nederlands |
| English | en | English |
| Estonian | et | Eesti |
| Finnish | fi | Suomi |
| French | fr | Français |
| Galician | gl | Galego |
| German | de | Deutsch |
| Greek | el | Ελληνικά |
| Hebrew | he | עברית |
| Hindi | hi | हिन्दी |
| Hungarian | hu | Magyar |
| Icelandic | is | Íslenska |
| Indonesian | id | Bahasa Indonesia |
| Italian | it | Italiano |
| Japanese | ja | 日本語 |
| Kannada | kn | ಕನ್ನಡ |
| Kazakh | kk | Қазақ |
| Korean | ko | 한국어 |
| Kurdish (Kurmanji) | kmr | Kurdî |
| Kurdish (Sorani) | ckb | کوردی |
| Latvian | lv | Latviešu |
| Lithuanian | lt | Lietuvių |
| Macedonian | mk | Македонски |
| Malay | ms | Bahasa Melayu |
| Maori | mi | Māori |
| Marathi | mr | मराठी |
| Nepali | ne | नेपाली |
| Norwegian | no | Norsk |
| Persian (Farsi) | fa | فارسی |
| Polish | pl | Polski |
| Portuguese | pt | Português |
| Romanian | ro | Română |
| Russian | ru | Русский |
| Serbian | sr | Српски |
| Slovak | sk | Slovenčina |
| Slovenian | sl | Slovenščina |
| Spanish | es | Español |
| Swahili | sw | Kiswahili |
| Swedish | sv | Svenska |
| Tagalog | tl | Tagalog |
| Tamil | ta | தமிழ் |
| Thai | th | ไทย |
| Turkish | tr | Türkçe |
| Ukrainian | uk | Українська |
| Urdu | ur | اردو |
| Vietnamese | vi | Tiếng Việt |
| Welsh | cy | Cymraeg |
All languages support both transcription and translation to/from any other supported language.
Implementation Examples
Python Example (Step-by-Step)
#!/usr/bin/env python3
import asyncio
import websockets
import json
async def transcribe_and_translate():
    uri = "wss://your-server-address:PORT"
    token = "YOUR_TOKEN_HERE"  # Your authentication token

    # Authentication is required - add Authorization header
    headers = {
        'Authorization': f'Bearer {token}'
    }

    async with websockets.connect(uri, extra_headers=headers) as ws:
        # Step 1: Start recognition
        await ws.send(json.dumps({
            "message": "StartRecognition",
            "audio_format": {
                "type": "raw",
                "encoding": "pcm_s16le",
                "sample_rate": 16000
            },
            "transcription_config": {
                "language": "en"  # English
            },
            "translation_config": {
                "target_languages": ["fr", "es", "de"]
            }
        }))

        # Step 2: Wait for confirmation
        response = json.loads(await ws.recv())
        print(f"Session ID: {response['session_id']}")

        # Step 3: Read audio file (16kHz mono PCM)
        with open("audio.raw", "rb") as f:
            audio_data = f.read()

        # Step 4: Configure chunk size (example: 6 seconds)
        chunk_duration = 6  # Choose any duration (2-10 seconds)
        chunk_size = 16000 * chunk_duration * 2  # 192,000 bytes for 6s

        # Step 5: Send audio chunks sequentially
        chunk_num = 0
        for i in range(0, len(audio_data), chunk_size):
            chunk = audio_data[i:i + chunk_size]
            await ws.send(chunk)
            print(f"Sent chunk {chunk_num}: {len(chunk)} bytes")
            chunk_num += 1

        # Step 6: Signal end of stream (the server sends EndOfTranscript in response)
        await ws.send(json.dumps({
            "message": "EndOfStream",
            "last_seq_no": chunk_num
        }))

        # Step 7: Listen for results until EndOfTranscript
        # (AudioAdded acknowledgements fall through the if/elif and are ignored)
        while True:
            msg = json.loads(await ws.recv())
            if msg["message"] == "AddTranscript":
                print(f"Transcript: {msg['metadata']['transcript']}")
            elif msg["message"] == "AddTranslation":
                lang = msg['metadata']['target_language']
                text = msg['metadata']['translation']
                print(f"Translation ({lang}): {text}")
            elif msg["message"] == "EndOfTranscript":
                break

asyncio.run(transcribe_and_translate())
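This example depends on the third-party websockets package (pip install websockets). Note that websockets 14+ renamed the extra_headers argument to additional_headers, so adjust the connect call to match your installed version. The audio.raw input must already be 16 kHz mono pcm_s16le, as described in the Audio Format section.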
JavaScript Example (Browser)
// Browsers cannot set custom headers on a WebSocket connection,
// so pass the token as a query parameter instead
const token = 'YOUR_TOKEN_HERE';
const ws = new WebSocket(`wss://your-server-address:PORT?token=${token}`);

ws.onopen = () => {
  // Start recognition
  ws.send(JSON.stringify({
    message: 'StartRecognition',
    audio_format: {
      type: 'raw',
      encoding: 'pcm_s16le',
      sample_rate: 16000
    },
    transcription_config: {
      language: 'en'
    },
    translation_config: {
      target_languages: ['fr', 'es']
    }
  }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.message === 'RecognitionStarted') {
    console.log('Session:', msg.session_id);
    // Send audio chunks (6s each); audioBuffer is an ArrayBuffer
    // of raw 16 kHz mono PCM obtained elsewhere
    sendAudioChunks(audioBuffer);
  } else if (msg.message === 'AddTranscript') {
    console.log('Transcript:', msg.metadata.transcript);
  } else if (msg.message === 'AddTranslation') {
    console.log(`Translation (${msg.metadata.target_language}):`, msg.metadata.translation);
  }
};

function sendAudioChunks(audioBuffer) {
  const chunkSize = 16000 * 6 * 2; // 6 seconds = 192,000 bytes
  let chunkNum = 0;
  for (let i = 0; i < audioBuffer.byteLength; i += chunkSize) {
    const chunk = audioBuffer.slice(i, i + chunkSize);
    ws.send(chunk);
    console.log(`Sent chunk ${chunkNum}: ${chunk.byteLength} bytes`);
    chunkNum++;
  }
  // End session
  ws.send(JSON.stringify({
    message: 'EndOfStream',
    last_seq_no: chunkNum
  }));
}
Error Handling
Error Types
| Type | Code | Description | Solution |
|---|---|---|---|
| invalid_message | 4001 | Malformed JSON or invalid format | Fix message structure |
| invalid_model | 4004 | Unsupported language code | Use valid ISO 639-1 code |
| internal_error | 1011 | Server processing error | Retry or contact support |
Common Issues
No transcripts received
- Verify audio format matches configuration (16kHz, mono, PCM S16LE)
- Ensure you're sending binary audio data (not base64 or JSON)
- Check audio contains speech (not silence)
- Verify chunk size is reasonable (2-10 seconds recommended); a quick size/duration check is sketched below
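As a quick diagnostic, check that the raw file's size implies a plausible duration (assuming 16 kHz mono 16-bit audio; the file name is a placeholder):

import os

size = os.path.getsize("audio.raw")
duration_s = size / (16000 * 2)  # bytes / (sample_rate × bytes_per_sample)
print(f"{size} bytes -> {duration_s:.1f} seconds of 16 kHz mono PCM")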
Translation not working
- Ensure the `target_languages` array is not empty
- Use valid ISO 639-1 codes (lowercase, 2-letter)
- Verify transcript is not empty
Performance
Latency
| Operation | Typical Latency |
|---|---|
| Transcription | < 1.5 seconds |
| Translation (per language) | < 0.5 seconds |
| Total (with 3 translations) | < 2.0 seconds |
Limits
- Connection Keepalive: Ping every 20 seconds, 30-second timeout if no response
- Network Bandwidth: ~340 kbps upload recommended (for 6-second chunks; see the calculation below)
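As a point of reference, raw 16 kHz mono 16-bit PCM amounts to 16,000 samples/s × 2 bytes = 32,000 bytes/s ≈ 256 kbps, so the ~340 kbps recommendation leaves margin for WebSocket/TLS framing and control messages.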
FAQ
Q: What chunk duration should I use?
A: 6 seconds is recommended for optimal balance. Shorter (2-4s) for lower latency, longer (8-10s) for better context.
Q: Can I vary chunk sizes during a session?
A: Yes, the system adapts to different chunk sizes dynamically.
Q: How does word boundary handling work?
A: The system automatically handles word boundaries between chunks using 1-second audio buffering and smart deduplication.
Q: How many languages can I translate to simultaneously?
A: No hard limit, but 3-5 languages recommended for optimal performance.
Q: What's the transcription accuracy?
A: For broadcast-quality audio, expect >95% word accuracy. For Arabic specifically, we achieve a Word Error Rate (WER) of less than 6%, making it the most advanced Arabic transcription system available. We also excel in English, French, and Dutch.
Q: Can you support additional languages not listed?
A: Yes! We can train custom ASR (Automatic Speech Recognition) models for specific languages tailored to your needs. Contact our technical team at support@whitepeaks.fr to discuss custom language model training.
Q: Can I use this for live streaming?
A: Yes! Designed for real-time applications with sub-2-second total latency.
Privacy & Compliance
GDPR Compliance
KWIKmotion AI Live Captions is fully compliant with the General Data Protection Regulation (GDPR). We prioritize your data privacy and security:
- No Audio Storage: We do not store, record, or keep copies of audio data sent to our service. Audio is processed in real-time and immediately discarded after transcription.
- No Text Storage: Generated transcripts and translations are not stored on our servers. All text processing occurs in memory and is delivered directly to you via WebSocket.
- No Logging of Content: We do not log or retain the actual content of your transcripts or translations for any purpose.
- Session Data Only: We only retain minimal session metadata (connection timestamps, session IDs) necessary for service operation, which is automatically purged after session termination.
- Data Processing Location: Audio and text are processed in real time on our servers, and the data is immediately discarded after transmission to your client.
- AI-Powered Processing: Transcription and translation are performed by artificial intelligence (AI) models. We do not guarantee the authenticity, accuracy, or completeness of the generated content and accept no responsibility for it; AI-generated content may contain errors or inaccuracies.
- Delicate Content Notice: If you are processing sensitive, legal, medical, or other delicate content, we strongly recommend that you inform your audience that transcription and translation services are provided via AI technology and may not be 100% accurate. It is your responsibility to review and verify any AI-generated content before use.
ISO 27001 Information Security Management
Our service adheres to ISO 27001 standards for information security management:
- Session Separation for Audio: Each audio session is completely isolated from other sessions. Audio data from one session cannot access or interfere with audio data from another session, ensuring complete data isolation and privacy.
- Access Controls: Authentication and authorization mechanisms protect your sessions.
- Real-Time Processing: Audio and text are processed in-memory only, with no persistent storage.
- Regular Security Audits: Our infrastructure undergoes regular security assessments and compliance reviews.
All audio and text data flows directly through our system without retention. You maintain full control over your data at all times. For questions about our privacy practices, contact: support@whitepeaks.fr
Contact & Support
White Peaks Solutions SAS
- Technical Support: support@whitepeaks.fr
- Sales & Licensing: sales@whitepeaks.fr
- White Peaks Website: https://www.whitepeaks.fr
- KWIKmotion Website: https://www.kwikmotion.com
WebSocket API Reference
View the complete API specification with all message types, parameters, and examples.