Voice AI Technology · February 14, 2025 · 20 min read

LiveKit Setup for Voice Analysis: Building Real-Time Voice Applications

Complete guide to setting up LiveKit for voice analysis applications. Learn WebRTC fundamentals, audio track configuration, server deployment, Egress recording, and production best practices for building real-time voice platforms.

David Chen
WebRTC Engineer & Real-Time Systems Architect

LiveKit Setup for Voice Analysis: Your Complete Implementation Guide

Building a voice analysis application? You need real-time audio capture, low-latency streaming, reliable recording, and scalable infrastructure—all while maintaining audio quality that preserves the acoustic features your ML models depend on.

That's where LiveKit excels. LiveKit is an open-source WebRTC infrastructure platform that handles the complexity of real-time audio/video communication, letting you focus on voice analysis algorithms rather than networking protocols. It provides server-side selective forwarding (efficient multi-party routing), Egress recording (server-side capture with guaranteed quality), flexible SDKs (JavaScript, Swift, Kotlin, React), and self-hosting or cloud options—all specifically designed for production voice/video applications.

For voice analysis, LiveKit offers critical advantages: 48 kHz uncompressed audio (preserves acoustic features), server-side recording (no client-side reliability issues), automatic reconnection (handles network disruptions), track-level control (mute/unmute, quality selection), and webhook events (trigger analysis pipeline when recording completes). Thousands of applications use LiveKit for telehealth, education, social audio, and AI voice agents—making it battle-tested infrastructure for voice-first products.

This guide walks through complete LiveKit setup for voice analysis: server deployment, client integration, audio configuration, recording with Egress, storage integration, and production deployment. Whether you're building a voice biometrics platform, speech analysis tool, or conversational AI application, you'll learn exactly how to implement real-time voice infrastructure that scales.

What Is LiveKit? Architecture Overview

LiveKit is a WebRTC server platform that manages real-time audio/video communication between clients. Unlike peer-to-peer WebRTC (where clients connect directly), LiveKit uses Selective Forwarding Unit (SFU) architecture—clients send media to the server once, and the server forwards to other participants. This architecture is essential for voice analysis applications where you need server-side access to audio streams for recording and processing.

Core Components

1. LiveKit Server:

  • WebRTC media router handling audio/video tracks
  • Written in Go for performance (handles 1,000+ concurrent participants per instance)
  • Manages rooms, participants, tracks, permissions
  • Provides signaling (WebSocket) and media transport (DTLS/SRTP)
  • Self-hosted (Docker/Kubernetes) or managed (LiveKit Cloud)

2. Client SDKs:

  • JavaScript: Browser-based voice recording (Chrome, Safari, Firefox, Edge)
  • React Components: Pre-built UI components for faster development
  • Swift (iOS): Native iOS apps with CoreAudio integration
  • Kotlin (Android): Native Android apps with AudioRecord/AudioTrack
  • Python/Go agents: Server-side voice processing (speech-to-text, voice agents)

3. Egress Service:

  • Server-side recording of rooms, tracks, or participants
  • Runs headless Chrome for web rendering or direct audio mixing
  • Uploads recordings to S3, Azure, GCP, or local storage
  • Guarantees recording quality (not affected by client disconnections)
  • Supports audio-only (OGG, MP3, M4A) or video (MP4, WebM)

4. Optional Services:

  • Ingress: Stream external sources (RTMP, WHIP) into LiveKit rooms
  • SIP: Connect traditional phone systems to LiveKit
  • Agents: Server-side participants for AI voice assistants

Why LiveKit for Voice Analysis?

Audio Quality:

  • Supports 48 kHz sample rate (vs 8 kHz in telephony, 16 kHz in some WebRTC configs)
  • Opus codec with configurable bitrate (32-510 kbps)
  • Preserves acoustic features needed for ML (pitch, formants, harmonics)
  • Automatic gain control, noise suppression configurable per-track

Recording Reliability:

  • Server-side Egress eliminates client recording failures (browser crashes, page refreshes, network drops)
  • Guaranteed audio capture even if client disconnects mid-session
  • Automatic upload to durable storage (S3-compatible)
  • Webhook notifications when recording completes (trigger analysis pipeline)

Developer Experience:

  • Well-documented APIs with TypeScript types
  • Pre-built React components for rapid prototyping
  • Local Docker Compose setup for development
  • Open-source (Apache 2.0 license) with active community

Production-Ready:

  • Used by 1,000+ production applications
  • Horizontal scaling (add server instances behind load balancer)
  • Comprehensive monitoring (Prometheus metrics)
  • Built-in security (E2E encryption, token-based auth)

Core Concepts for Voice Analysis

1. Rooms

A room is a logical container for a real-time session. For voice analysis:

  • One room per voice session: Each user voice recording gets unique room (e.g., `voice-analysis-session-{userId}-{timestamp}`)
  • Room lifecycle: Create room → participants join → record audio → participants leave → room closes
  • Room options:
    - `emptyTimeout`: Auto-close room after last participant leaves (default 5 min)
    - `maxParticipants`: Limit to 1 user + 1 AI agent (for voice analysis interviews)
    - `metadata`: Store session info (userId, analysisType, startTime)
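These options can be set when creating the room server-side. A minimal sketch, assuming `livekit-server-sdk`'s `RoomServiceClient`; the naming scheme and metadata fields are illustrative:

```typescript
// Builds CreateRoom options for one voice-analysis session.
// The name scheme and metadata fields follow the conventions above; adjust to taste.
export function sessionRoomOptions(userId: string, now: number = Date.now()) {
  return {
    name: `voice-analysis-session-${userId}-${now}`,
    emptyTimeout: 300,   // seconds: close 5 min after last participant leaves
    maxParticipants: 2,  // one user + one AI agent
    metadata: JSON.stringify({ userId, analysisType: 'interview', startTime: now }),
  };
}

// Usage with a RoomServiceClient from livekit-server-sdk; typed structurally
// so this sketch stays self-contained:
export async function createSessionRoom(
  svc: { createRoom(opts: object): Promise<unknown> },
  userId: string
) {
  return svc.createRoom(sessionRoomOptions(userId));
}
```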

2. Participants

Each connection to a room is a participant. For voice analysis:

  • User participant: The person being analyzed (publishes audio track from microphone)
  • Agent participant (optional): AI voice interviewer (publishes synthesized speech, subscribes to user audio)
  • Participant metadata: Store user ID, name, profile info for linking recordings to users
  • Participant permissions: Control who can publish/subscribe (prevent unauthorized recording)
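Since both the user and an optional agent join as participants, downstream code often needs to tell them apart. One lightweight approach is tagging the agent in its participant metadata at token-generation time; the `sessionType` field below is an assumption, not a LiveKit convention:

```typescript
// Distinguish the AI agent from the user via a metadata flag.
// Works on any object exposing LiveKit's participant shape (identity + optional metadata).
export function isAgentParticipant(p: { identity: string; metadata?: string }): boolean {
  try {
    return JSON.parse(p.metadata ?? '{}').sessionType === 'agent';
  } catch {
    return false; // malformed metadata: treat as a regular user
  }
}
```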

3. Tracks

A track is a single media stream (audio or video). For voice analysis:

  • Audio tracks only: Disable video to reduce bandwidth and processing
  • Track types:
    - `Track.Source.Microphone`: User's microphone input (the audio you're analyzing)
    - `Track.Source.ScreenShareAudio`: System audio (rarely used for voice analysis)
  • Track configuration:
    - Sample rate: 48 kHz (preserves acoustic detail)
    - Bitrate: 64-96 kbps (balance quality and bandwidth)
    - DTX (discontinuous transmission): OFF for voice analysis (preserve silence patterns)
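In `livekit-client`, these settings map onto capture options (how the browser records) and publish options (how the track is encoded for the SFU). Option names here match the current SDK to the best of my knowledge; verify them against your version:

```typescript
// Capture options: how the browser records the microphone.
export const captureOptions = {
  sampleRate: 48000,        // 48 kHz preserves acoustic detail
  channelCount: 1,          // mono is sufficient for voice
  echoCancellation: true,
  noiseSuppression: false,  // keep natural voice characteristics for analysis
  autoGainControl: false,   // keep natural volume variations
};

// Publish options: how the track is encoded and sent to the server.
export const publishOptions = {
  dtx: false,               // keep transmitting during silence (preserve pause patterns)
  audioBitrate: 96_000,     // 96 kbps Opus
};

// Usage (livekit-client):
//   await room.localParticipant.setMicrophoneEnabled(true, captureOptions, publishOptions);
```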

4. Tokens

Access tokens authorize participants to join rooms. For voice analysis:

  • Server-generated: Backend creates token with API key + secret (never expose secret to client)
  • Token claims:
    - `roomName`: Which room participant can join
    - `identity`: Unique participant identifier (user ID)
    - `ttl`: Token expiration (e.g., 2 hours for voice session)
    - `canPublish`: Permission to publish audio
    - `canSubscribe`: Permission to receive audio (for AI agents)
    - `metadata`: Custom data (user profile, session type)
  • Security: Tokens prevent unauthorized access—only users with valid token can join specific room
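A LiveKit access token is a standard JWT (the grants live under a `video` claim), so while debugging you can inspect its claims by decoding the payload segment. This illustrative helper does no signature verification, so use it for debugging only:

```typescript
// Decode the middle (payload) segment of a JWT without verifying the signature.
// Useful for checking room/identity/expiry claims while debugging token issues.
export function decodeJwtPayload(jwt: string): Record<string, unknown> {
  const payload = jwt.split('.')[1];
  if (!payload) throw new Error('not a JWT');
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}
```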

Implementation: Complete Setup Guide

Step 1: Server Deployment (Self-Hosted)

Local Development with Docker:

# docker-compose.yml
version: '3.8'

services:
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    restart: unless-stopped
    ports:
      - "7880:7880"      # HTTP API + WebSocket signaling
      - "7881:7881"      # WebRTC over TCP
      - "7882:7882/udp"  # WebRTC over UDP
    volumes:
      - ./livekit.yaml:/etc/livekit.yaml
    environment:
      - "LIVEKIT_KEYS=devkey: devsecret"  # quoted because the value contains ": "

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "6379:6379"

  egress:
    image: livekit/egress:latest
    environment:
      - EGRESS_CONFIG_FILE=/etc/egress.yaml
      - REDIS_HOST=redis:6379
    volumes:
      - ./egress.yaml:/etc/egress.yaml
      - ./recordings:/out
    depends_on:
      - livekit
      - redis

LiveKit Server Configuration (`livekit.yaml`):

port: 7880
bind_addresses:
  - "0.0.0.0"

rtc:
  port_range_start: 50000
  port_range_end: 51000
  use_external_ip: true
  # For local dev, use local IP; for production, use public IP
  external_ip: "127.0.0.1"

redis:
  address: redis:6379

keys:
  devkey: devsecret  # Development only - use secure keys in production

room:
  empty_timeout: 300  # Close room 5 min after last participant leaves
  max_participants: 10

audio:
  # Optimize for voice analysis
  active_speaker_update: 1s
  update_interval: 1s

logging:
  level: info
  sample: false

Egress Configuration (`egress.yaml`):

api_key: devkey
api_secret: devsecret
ws_url: ws://livekit:7880

redis:
  address: redis:6379

# Local file storage for development
file_output:
  local: true
  output_directory: /out

# S3 for production (uncomment and configure)
# s3:
#   access_key: YOUR_ACCESS_KEY
#   secret: YOUR_SECRET_KEY
#   region: us-west-2
#   bucket: your-voice-recordings
#   endpoint: https://s3.amazonaws.com  # Or Supabase/MinIO/R2

logging:
  level: info

Start Services:

# Start all services
docker-compose up -d

# Check logs
docker-compose logs -f livekit

# Verify server is running
curl http://localhost:7880
# A 200 response confirms the server is running

Step 2: Client SDK Integration (React/TypeScript)

Install Dependencies:

npm install livekit-client @livekit/components-react

Backend: Token Generation API (`/api/voice/token`):

// app/api/voice/token/route.ts
import { AccessToken } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { roomName, userId, userName, sessionDuration = 7200 } = await request.json();

  // Validate user is authenticated
  // ... your auth logic here ...

  const apiKey = process.env.LIVEKIT_API_KEY!;
  const apiSecret = process.env.LIVEKIT_API_SECRET!;

  // Create access token
  const token = new AccessToken(apiKey, apiSecret, {
    identity: userId,
    ttl: sessionDuration, // Token expires in 2 hours
    metadata: JSON.stringify({
      userId,
      userName,
      sessionType: 'voice-analysis',
      timestamp: new Date().toISOString()
    })
  });

  // Grant permissions
  token.addGrant({
    roomJoin: true,
    room: roomName,
    canPublish: true,
    canPublishData: true,
    canSubscribe: true
  });

  const jwt = await token.toJwt();

  return NextResponse.json({
    token: jwt,
    serverUrl: process.env.LIVEKIT_URL, // wss://your-server.com
    roomName,
    expiresAt: new Date(Date.now() + sessionDuration * 1000).toISOString()
  });
}

Frontend: Voice Recording Component:

// components/VoiceRecorder.tsx
'use client';

import { useState } from 'react';
import { Track } from 'livekit-client';
import { LiveKitRoom, useRoomContext, useTracks } from '@livekit/components-react';
import '@livekit/components-styles';

interface VoiceRecorderProps {
  userId: string;
  userName: string;
  onSessionComplete: (recordingUrl: string, duration: number) => void;
}

export function VoiceRecorder({ userId, userName, onSessionComplete }: VoiceRecorderProps) {
  const [token, setToken] = useState('');
  const [serverUrl, setServerUrl] = useState('');
  const [roomName, setRoomName] = useState('');
  const [isConnecting, setIsConnecting] = useState(false);

  // Initialize session: get token from backend
  const startSession = async () => {
    setIsConnecting(true);

    try {
      const sessionId = `voice-${userId}-${Date.now()}`;
      const response = await fetch('/api/voice/token', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          roomName: sessionId,
          userId,
          userName
        })
      });

      const data = await response.json();

      setToken(data.token);
      setServerUrl(data.serverUrl);
      setRoomName(data.roomName);
    } catch (error) {
      console.error('Failed to start session:', error);
      setIsConnecting(false);
    }
  };

  if (!token) {
    return (
      <button onClick={startSession} disabled={isConnecting}>
        {isConnecting ? 'Connecting...' : 'Start Voice Session'}
      </button>
    );
  }

  return (
    <LiveKitRoom
      token={token}
      serverUrl={serverUrl}
      connect={true}
      audio={true}
      video={false}
      onConnected={() => {
        console.log('Connected to room:', roomName);
        // Start Egress recording here (see Step 4)
      }}
      onDisconnected={() => {
        console.log('Disconnected from room');
        // Handle session completion
      }}
    >
      <VoiceRecordingUI onComplete={onSessionComplete} />
    </LiveKitRoom>
  );
}

// Inner component has access to LiveKit room context
function VoiceRecordingUI({ onComplete }: { onComplete: (url: string, duration: number) => void }) {
  const room = useRoomContext();
  const [startTime] = useState(Date.now());
  const [isRecording, setIsRecording] = useState(true);

  // Track microphone audio (include local, unsubscribed tracks)
  const tracks = useTracks([Track.Source.Microphone], { onlySubscribed: false });

  const stopRecording = async () => {
    setIsRecording(false);
    const duration = Math.floor((Date.now() - startTime) / 1000);

    // Stop Egress recording via API
    await fetch('/api/voice/stop-recording', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ roomName: room.name })
    });

    // Disconnect from room
    await room.disconnect();

    // Notify parent (backend will deliver the recording URL via webhook)
    onComplete('pending', duration);
  };

  return (
    <div>
      <p>{isRecording ? 'Recording in progress...' : 'Recording stopped'}</p>
      {/* Display active tracks */}
      {tracks.map((track) => (
        <p key={track.publication.trackSid}>
          Microphone: {track.publication.isMuted ? 'Muted' : 'Active'}
        </p>
      ))}
      <button onClick={stopRecording} disabled={!isRecording}>
        End Session
      </button>
    </div>
  );
}

Step 3: Audio Track Configuration

Optimize audio settings for voice analysis:

// Audio constraints for high-quality voice capture
const audioConstraints = {
  audio: {
    echoCancellation: true,   // Remove echo from speakers
    noiseSuppression: false,  // Preserve natural voice characteristics for analysis
    autoGainControl: false,   // Preserve natural volume variations
    sampleRate: 48000,        // 48 kHz for acoustic detail
    channelCount: 1,          // Mono (sufficient for voice)
    // Advanced constraints (browser support varies)
    sampleSize: 16,           // 16-bit depth
    latency: 0.01,            // 10ms latency (real-time feel)
  },
  video: false
};

// Apply when connecting to the room (livekit-client)
const room = new Room();
await room.connect(serverUrl, token);
await room.localParticipant.setMicrophoneEnabled(true, audioConstraints.audio);

Configure Opus codec for quality:

# In livekit.yaml (server config)
audio:
  opus:
    max_playback_rate: 48000  # Full 48 kHz
    max_average_bitrate: 96000  # 96 kbps (excellent quality)
    ptime: 20  # 20ms packet time
    dtx: false  # Disable discontinuous transmission (preserve silence patterns)

Step 4: Recording with Egress

Start Recording API (`/api/voice/start-recording`):

// app/api/voice/start-recording/route.ts
import { EgressClient } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { roomName, userId } = await request.json();

  const egressClient = new EgressClient(
    process.env.LIVEKIT_URL!,
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!
  );

  // Audio-only recording (no video rendering needed)
  const output = {
    fileType: 'OGG',  // or 'MP3', 'M4A'
    filepath: `recordings/${userId}/${roomName}.ogg`,
    // S3 upload (if configured in egress.yaml)
    s3: {
      accessKey: process.env.S3_ACCESS_KEY,
      secret: process.env.S3_SECRET_KEY,
      bucket: process.env.S3_BUCKET,
      region: process.env.S3_REGION
    }
  };

  try {
    // startRoomCompositeEgress resolves to an EgressInfo object; keep its egressId
    const info = await egressClient.startRoomCompositeEgress(roomName, {
      file: output,
      audioOnly: true,  // Skip video processing
      // Optional: custom layout for multi-party (not needed for single speaker)
    });
    const egressId = info.egressId;

    // Store egress ID for later reference (storeEgressInfo is your own DB helper)
    await storeEgressInfo(roomName, egressId, userId);

    return NextResponse.json({
      success: true,
      egressId,
      message: 'Recording started'
    });
  } catch (error) {
    console.error('Failed to start recording:', error);
    return NextResponse.json(
      { error: 'Failed to start recording' },
      { status: 500 }
    );
  }
}

Stop Recording (optional - auto-stops when room closes):

// app/api/voice/stop-recording/route.ts
export async function POST(request: Request) {
  const { roomName } = await request.json();

  const egressClient = new EgressClient(
    process.env.LIVEKIT_URL!,
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!
  );

  // Retrieve egress ID from database
  const egressId = await getEgressIdForRoom(roomName);

  try {
    await egressClient.stopEgress(egressId);
    return NextResponse.json({ success: true });
  } catch (error) {
    console.error('Failed to stop recording:', error);
    return NextResponse.json(
      { error: 'Failed to stop recording' },
      { status: 500 }
    );
  }
}

Step 5: Webhook Integration for Recording Completion

Configure webhook in `livekit.yaml`:

webhook:
  api_key: devkey  # name of a key from the server's `keys` section (used to sign webhook requests)
  urls:
    - https://yourapp.com/api/webhooks/livekit

Webhook Handler (`/api/webhooks/livekit`):

// app/api/webhooks/livekit/route.ts
import { WebhookReceiver } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

const receiver = new WebhookReceiver(
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!
);

export async function POST(request: Request) {
  const body = await request.text();
  const authHeader = request.headers.get('Authorization') || '';

  try {
    // Verify webhook signature (receive() is async in livekit-server-sdk v2)
    const event = await receiver.receive(body, authHeader);

    // Handle different event types
    switch (event.event) {
      case 'egress_ended': {
        // Recording completed: details live on event.egressInfo
        // (field names per livekit-server-sdk's EgressInfo; verify for your SDK version)
        const info = event.egressInfo;
        const roomName = info?.roomName ?? '';
        const fileUrl = info?.fileResults?.[0]?.location ?? '';
        const durationNs = info?.fileResults?.[0]?.duration ?? 0;
        const duration = Math.round(Number(durationNs) / 1e9); // nanoseconds → seconds

        console.log(`Recording completed for room ${roomName}`);
        console.log(`File URL: ${fileUrl}`);
        console.log(`Duration: ${duration}s`);

        // Trigger voice analysis pipeline
        await triggerVoiceAnalysis({
          roomName,
          fileUrl,
          duration
        });
        break;
      }

      case 'room_finished':
        console.log(`Room ${event.room.name} closed`);
        break;

      case 'participant_joined':
        console.log(`Participant ${event.participant.identity} joined`);
        break;

      case 'participant_left':
        console.log(`Participant ${event.participant.identity} left`);
        break;
    }

    return NextResponse.json({ received: true });
  } catch (error) {
    console.error('Webhook verification failed:', error);
    return NextResponse.json(
      { error: 'Invalid webhook' },
      { status: 401 }
    );
  }
}

async function triggerVoiceAnalysis(
  { roomName, fileUrl, duration }: { roomName: string; fileUrl: string; duration: number }
) {
  // Call your voice analysis service
  await fetch(process.env.VOICE_ANALYSIS_SERVICE_URL + '/analyze', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      sessionId: roomName,
      audioUrl: fileUrl,
      duration
    })
  });
}

Step 6: Storage Integration (S3-Compatible)

Supabase Storage Example:

# egress.yaml
s3:
  access_key: YOUR_SUPABASE_ACCESS_KEY
  secret: YOUR_SUPABASE_SECRET_KEY
  region: us-east-1
  bucket: voice-recordings
  endpoint: https://PROJECT_ID.supabase.co/storage/v1/s3
  force_path_style: true  # Required for Supabase

Cloudflare R2 Example:

# egress.yaml
s3:
  access_key: YOUR_R2_ACCESS_KEY
  secret: YOUR_R2_SECRET_KEY
  bucket: voice-recordings
  endpoint: https://ACCOUNT_ID.r2.cloudflarestorage.com
  region: auto

Download Recording from Storage:

// Retrieve recording URL for analysis (supabase client from @supabase/supabase-js)
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

async function getRecordingUrl(roomName: string): Promise<string> {
  const { data, error } = await supabase
    .storage
    .from('voice-recordings')
    .createSignedUrl(`recordings/${roomName}.ogg`, 3600); // 1-hour signed URL

  if (error) throw error;
  return data.signedUrl;
}

Step 7: Production Deployment

Option 1: Self-Hosted on Fly.io:

# fly.toml
app = "your-livekit-server"
primary_region = "sjc"

[build]
  image = "livekit/livekit-server:latest"

[[services]]
  internal_port = 7880
  protocol = "tcp"

  [[services.ports]]
    port = 443
    handlers = ["tls"]

  [[services.ports]]
    port = 80
    handlers = ["http"]

[env]
  LIVEKIT_CONFIG = "/etc/livekit.yaml"

[[mounts]]
  source = "livekit_config"
  destination = "/etc/livekit.yaml"

Deploy:

# Create Fly app
fly apps create your-livekit-server

# Set secrets
fly secrets set LIVEKIT_API_KEY=your-key
fly secrets set LIVEKIT_API_SECRET=your-secret

# Deploy
fly deploy --no-cache

# Check status
fly status

Option 2: LiveKit Cloud (Managed):

# Sign up at https://cloud.livekit.io
# Get API credentials from dashboard
# Set environment variables in your app

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=API...
LIVEKIT_API_SECRET=secret...

Troubleshooting Common Issues

1. No Audio in Recording

Symptoms: Egress completes but file has no audio or very low volume

Causes & Solutions:

  • Microphone permission denied: Check browser console for permission errors, ensure user grants access
  • Track not published: Verify track.isMuted = false, check `room.localParticipant.audioTracks`
  • Egress started before participant joined: Start Egress after participant publishes audio track (listen for `TrackPublished` event)
  • Audio constraints too restrictive: Remove `advanced` constraints that browser may not support

Debug:

room.on(RoomEvent.TrackPublished, (publication, participant) => {
  console.log('Track published:', publication.kind, publication.source);
  if (publication.kind === 'audio' && publication.source === 'microphone') {
    // NOW start Egress
    startEgressRecording(room.name);
  }
});

2. Connection Failures

Symptoms: Client can't connect to LiveKit server, stuck on "Connecting..."

Causes & Solutions:

  • Incorrect server URL: Ensure `wss://` protocol (not `ws://` for production), correct domain/port
  • Token validation error: Check token not expired, API key/secret match server config
  • Firewall blocking WebRTC ports: LiveKit uses ports 50000-51000 for media (UDP), ensure not blocked
  • CORS issues: For browser clients, ensure LiveKit server allows origin (configure in `livekit.yaml`)

Debug:

room.on(RoomEvent.ConnectionStateChanged, (state) => {
  console.log('Connection state:', state);
});

room.on(RoomEvent.Disconnected, (reason) => {
  console.log('Disconnected, reason:', reason);
});

3. Poor Audio Quality

Symptoms: Recording sounds muffled, robotic, or distorted

Causes & Solutions:

  • Low bitrate: Increase Opus bitrate to 64-96 kbps in server config
  • Aggressive noise suppression: Disable `noiseSuppression` in audio constraints for voice analysis
  • Network packet loss: Check client network quality, LiveKit auto-adjusts bitrate but high loss (>5%) degrades quality
  • Sample rate mismatch: Ensure `sampleRate: 48000` in client audio constraints

Monitor quality:

room.on(RoomEvent.ConnectionQualityChanged, (quality, participant) => {
  console.log(`Quality for ${participant.identity}: ${quality}`);
  // 'excellent', 'good', 'poor'

  if (quality === 'poor') {
    // Alert user or adjust recording
  }
});

4. Egress Recording Never Completes

Symptoms: Egress starts but never sends completion webhook

Causes & Solutions:

  • Room still active: Egress waits until room closes, ensure all participants disconnect
  • Egress service not running: Check `docker-compose ps egress`, verify egress container healthy
  • S3 upload failure: Check egress logs (`docker-compose logs egress`) for permission errors
  • Insufficient disk space: Egress buffers to disk before upload, ensure adequate space

Manual check:

# List active egress sessions (needs livekit-cli, configured with your API key/secret)
livekit-cli list-egress --url http://localhost:7880 --api-key devkey --api-secret devsecret

# Or inspect the egress worker directly
docker-compose logs egress | grep -i egress

5. High Server CPU Usage

Symptoms: LiveKit server CPU >80%, impacting performance

Causes & Solutions:

  • Too many video tracks: For voice analysis, disable video entirely (`video: false`)
  • Simulcast enabled unnecessarily: Disable simulcast for audio-only (`simulcast: false` in track options)
  • Excessive participants per room: For 1-on-1 voice analysis, limit `maxParticipants: 2`
  • Old server version: Update to latest LiveKit server (performance improvements)

Optimize:

# livekit.yaml
room:
  max_participants: 2  # Voice analysis typically 1 user + 1 agent

video:
  enabled: false  # Disable video processing entirely

Performance Optimization

1. Reduce Latency

Client-side:

  • Set `latency: 0.01` in audio constraints (10ms buffer)
  • Use `adaptiveStream: true` in LiveKitRoom props (auto-adjust for network)
  • Place server geographically close to users (multi-region deployment)

Server-side:

  • Enable `use_external_ip` to avoid TURN relay (direct peer-to-server connection)
  • Increase `update_interval` only if needed (default 200ms is good)
  • Deploy behind CDN/load balancer with WebSocket support (Cloudflare, AWS ALB)

2. Scale for High Traffic

Horizontal Scaling:

  • Run multiple LiveKit server instances
  • Share state via Redis (configure `redis.address` in all instances)
  • Use load balancer with sticky sessions (route same room to same server)
# livekit.yaml (each instance)
redis:
  address: redis.internal:6379  # Shared Redis

node_selector:
  region: us-west  # For geo-routing

Vertical Scaling:

  • Increase server CPU/RAM (LiveKit can handle 1,000+ participants per 8-core server)
  • Use SSD storage for Egress buffering (faster writes)
  • Dedicate separate instances for Egress (CPU-intensive video encoding)

3. Optimize Recording Storage Costs

Compression:

  • Use OGG format (Opus codec) for smallest file size with good quality (~30-50 MB/hour)
  • Compare to MP3 (~60 MB/hour) or uncompressed WAV (~600 MB/hour)
  • For long-term storage, transcode to lower bitrate after analysis completes
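The ~30-50 MB/hour figure follows directly from the Opus bitrate; a quick estimator (pure arithmetic, no LiveKit dependency):

```typescript
// Estimated file size for constant-bitrate Opus audio: bytes = seconds × bits/sec ÷ 8.
export function opusFileMB(hours: number, kbps = 96): number {
  const bytes = hours * 3600 * (kbps * 1000) / 8;
  return bytes / (1024 * 1024);
}
```

For example, one hour at 64 kbps comes out around 27 MB and at 96 kbps around 41 MB, consistent with the ~30-50 MB/hour range above.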

Lifecycle Policies:

  • Auto-delete recordings after 30 days (GDPR compliance, cost reduction)
  • Move to cheaper storage tier (S3 Glacier, GCS Nearline) after 7 days
  • Store only audio needed for analysis (delete video tracks if accidentally recorded)
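Expressed as an S3 lifecycle rule, the two policies above collapse into one rule object. The shape follows the AWS S3 `PutBucketLifecycleConfiguration` API as I understand it; check the field names against your SDK before relying on it:

```typescript
// One rule implementing both policies above: Glacier after 7 days, delete after 30.
export function recordingLifecycleRule(prefix = 'recordings/') {
  return {
    ID: 'voice-recordings-lifecycle',
    Status: 'Enabled',
    Filter: { Prefix: prefix },                           // only recording objects
    Transitions: [{ Days: 7, StorageClass: 'GLACIER' }],  // cheaper tier after 7 days
    Expiration: { Days: 30 },                             // auto-delete after 30 days
  };
}
```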

Security Considerations

1. Token Security

Best Practices:

  • Never expose API secret to client (always generate tokens server-side)
  • Short TTL: Set token expiration to session length (1-2 hours), not days
  • Room-specific tokens: Token grants access to ONE room only (prevent room-hopping)
  • Metadata validation: Backend verifies user owns session before generating token
// Secure token generation (getServerSession and getSessionOwner are your own auth/DB helpers)
import { AccessToken } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  // 1. Verify user is authenticated
  const session = await getServerSession();
  if (!session) return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });

  // 2. Validate user owns this session
  const { roomName } = await request.json();
  const sessionOwner = await getSessionOwner(roomName);
  if (sessionOwner !== session.user.id) {
    return NextResponse.json({ error: 'Forbidden' }, { status: 403 });
  }

  // 3. Generate token with minimal permissions
  const token = new AccessToken(process.env.LIVEKIT_API_KEY!, process.env.LIVEKIT_API_SECRET!, {
    identity: session.user.id,
    ttl: 3600  // 1 hour only
  });

  token.addGrant({
    roomJoin: true,
    room: roomName,
    canPublish: true,
    canPublishData: false,  // No data channel if not needed
    canSubscribe: false     // User doesn't need to receive (for 1-person recording)
  });

  return NextResponse.json({ token: await token.toJwt() });
}

2. End-to-End Encryption (E2EE)

When to use: For highly sensitive voice data (healthcare, legal, financial)

// Enable E2EE in the client (livekit-client; API per current SDK, verify for your version)
import { Room, ExternalE2EEKeyProvider } from 'livekit-client';

const keyProvider = new ExternalE2EEKeyProvider();
await keyProvider.setKey('shared-passphrase-distributed-out-of-band');

const room = new Room({
  e2ee: {
    keyProvider,
    worker: new Worker(new URL('livekit-client/e2ee-worker', import.meta.url)),
  },
});

// Note: E2EE prevents server-side access to audio
// Cannot use server-side Egress recording with E2EE
// Must record client-side and upload encrypted files

3. Recording Consent

Legal requirements:

  • Inform users recording is active (display "Recording" indicator)
  • Obtain explicit consent before starting (checkbox, button click)
  • Allow users to opt out (stop recording, delete immediately)
  • Store consent timestamp and IP address for compliance
// Consent UI before joining room (markup sketch: labels and class names are illustrative)
function ConsentModal({ onAccept, onDecline }: { onAccept: () => void; onDecline: () => void }) {
  return (
    <div className="consent-modal">
      <h2>Voice Recording Consent</h2>
      <p>
        This session will be recorded for voice analysis. Your recording will be
        processed by ML models and stored securely. You can request deletion at any time.
      </p>
      <button onClick={onAccept}>I consent to recording</button>
      <button onClick={onDecline}>Decline</button>
    </div>
  );
}

Cost Analysis

Self-Hosted Costs

Server (Fly.io example):

  • LiveKit server: 1 shared-cpu-4x (4 vCPU, 8 GB RAM) = $62/month
  • Egress worker: 1 shared-cpu-2x (2 vCPU, 4 GB RAM) = $31/month
  • Redis: 256 MB = $2/month
  • Total: ~$95/month + bandwidth

Bandwidth (Cloudflare R2 example):

  • Egress recording uploads: $0 (R2 has no egress fees)
  • Storage: $0.015/GB/month (10,000 1-hour recordings × 40 MB = 400 GB = $6/month)

Total self-hosted: ~$100/month for 10,000 sessions

LiveKit Cloud Costs

Pricing (as of 2024):

  • Free tier: 10,000 participant-minutes/month
  • Paid: $0.0015/participant-minute ($0.09/hour per participant)
  • Egress recording: $0.003/minute output ($0.18/hour recording)
  • Storage: Bring your own (S3, GCS, etc.)

Example: 10,000 1-hour voice sessions

  • Participant time: 10,000 hours × $0.09 = $900
  • Recording: 10,000 hours × $0.18 = $1,800
  • Storage (S3): 400 GB × $0.023 = $9
  • Total: ~$2,709/month

Breakeven: At these rates (~$0.27 per recorded hour on Cloud vs ~$100/month self-hosted), self-hosting becomes cheaper above roughly 400 session-hours/month
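As a sanity check, the arithmetic behind these estimates (rates are the illustrative 2024 figures quoted above, not current pricing):

```typescript
// Monthly LiveKit Cloud cost for single-participant recorded sessions:
// participant time + Egress recording, both billed per hour.
export function cloudMonthlyCost(
  sessionHours: number,
  perParticipantHour = 0.09,
  perRecordingHour = 0.18
): number {
  return sessionHours * (perParticipantHour + perRecordingHour);
}

// Hours/month at which a flat self-hosted bill beats Cloud's metered pricing.
export function breakevenHours(selfHostedMonthly = 100, cloudHourly = 0.27): number {
  return selfHostedMonthly / cloudHourly;
}
```

At 10,000 session-hours this reproduces the ~$2,700/month Cloud figure above (plus a few dollars of storage), and the breakeven lands in the high 300s of hours per month.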

Alternatives Comparison

  • LiveKit. Pros: open-source, self-hostable, server-side recording, great docs. Cons: requires infrastructure management. Best for voice analysis apps needing control + quality.
  • Twilio Voice. Pros: fully managed, phone system integration (PSTN). Cons: expensive ($0.0085/min), 8 kHz audio (poor for ML). Best for traditional telephony apps.
  • Agora. Pros: low latency, global edge network, SDKs for all platforms. Cons: expensive ($0.99/1,000 min), closed-source. Best for large-scale social audio apps.
  • Daily.co. Pros: simple API, great developer experience, managed service. Cons: no self-hosting, limited customization. Best for rapid prototyping, small-scale apps.
  • Jitsi. Pros: free, open-source, self-hostable. Cons: less polished, weaker recording features. Best for budget-conscious projects.
  • Custom WebRTC. Pros: full control, no vendor lock-in. Cons: months of development, complex to maintain. Best for only if you need extreme customization.

Verdict for Voice Analysis: LiveKit offers the best balance of audio quality (48 kHz), reliability (server-side recording), cost (self-hostable), and developer experience (excellent docs/SDKs).

Voice Mirror Implementation Example

Here's how Voice Mirror uses LiveKit for real-time voice interviews:

Session Flow

  1. User clicks "Start Voice Analysis"
     - Frontend calls `POST /api/voicemirror/session/create`
     - Backend creates database record, generates unique room name (`voicemirror-{userId}-{timestamp}`)
     - Backend generates LiveKit token with 2-hour TTL
     - Returns token + server URL to frontend
  2. User joins LiveKit room
     - React component uses `<LiveKitRoom>` with token
     - User grants microphone permission
     - LiveKit publishes audio track to server
  3. AI interviewer joins
     - Python agent connects as second participant
     - Agent uses STT to transcribe user speech
     - Agent generates interview questions via LLM
     - Agent synthesizes speech via TTS and publishes audio track
  4. Recording starts automatically
     - When user's audio track published, backend starts Egress
     - Egress records room audio (both user + AI) to OGG file
     - Uploads to Supabase Storage (`recordings/{userId}/{sessionId}.ogg`)
  5. Interview completes
     - User clicks "End Interview" or 30-minute timeout
     - Frontend disconnects from room
     - Room auto-closes (all participants left)
  6. Webhook triggers analysis
     - Egress sends `egress_ended` webhook
     - Backend receives recording URL
     - Backend queues voice analysis job (Python service)
     - Analysis extracts 6,000+ acoustic features
     - Results saved to database
  7. User views results
     - Frontend polls `GET /api/voicemirror/results/{sessionId}`
     - Shows processing status (0-100%)
     - Displays complete analysis when done

Key Configuration

// Audio optimized for voice analysis
const audioConstraints = {
  echoCancellation: true,
  noiseSuppression: false,  // Preserve voice characteristics
  autoGainControl: false,   // Preserve volume variations
  sampleRate: 48000,
  channelCount: 1
};

// Room configured for interview
token.addGrant({
  roomJoin: true,
  room: `voicemirror-${userId}-${timestamp}`,
  canPublish: true,
  canSubscribe: true  // Receive AI interviewer audio
});

// Egress records both participants
await egressClient.startRoomCompositeEgress(roomName, {
  file: {
    fileType: 'OGG',
    filepath: `recordings/${userId}/${sessionId}.ogg`,
    s3: supabaseS3Config
  },
  audioOnly: true  // No video processing needed
});

Resources and Next Steps

Official Documentation

  • LiveKit Docs: docs.livekit.io - Comprehensive guides and API reference
  • GitHub: github.com/livekit - Source code, examples, issue tracking
  • Discord Community: Active community for questions and support
  • Cloud Dashboard: cloud.livekit.io - Managed service signup

Example Projects

  • meet.livekit.io: Video conferencing demo (React)
  • LiveKit Agents: Voice AI examples (Python)
  • Voice Recorder Template: Simple audio recording starter

Next Articles in Series

  • Speech-to-Text Comparison: Evaluating Whisper, Deepgram, Google, AWS for voice analysis
  • Training ML Models: Building custom voice classifiers
  • openSMILE Configuration: Extracting acoustic features
  • Real-Time Processing: Streaming voice analysis architecture

The Bottom Line

LiveKit provides production-ready infrastructure for voice analysis applications, handling the complex WebRTC networking, reliable recording, and scalable media routing—so you can focus on voice analysis algorithms rather than reinventing real-time communication.

Key advantages for voice analysis: 48 kHz audio quality (preserves acoustic features ML models need), server-side Egress recording (eliminates client-side failure points), flexible deployment (self-host for control or use LiveKit Cloud for simplicity), comprehensive SDKs (React, Swift, Kotlin, Python), and webhook-driven workflows (trigger analysis pipeline automatically).

The setup process is straightforward: deploy LiveKit server (Docker or cloud), integrate client SDK (React component), generate tokens server-side, configure audio constraints for quality, start Egress recording, handle webhook for completion, and trigger voice analysis pipeline. With proper configuration, you'll have professional-grade voice capture infrastructure in hours, not months.

For most voice analysis applications, self-hosting LiveKit is the optimal choice: lower cost at scale (~$100/month handles 10,000 sessions vs $2,700 on LiveKit Cloud), full control over audio quality settings, data sovereignty (recordings never leave your infrastructure), and ability to customize (add features, optimize performance). LiveKit Cloud makes sense for rapid prototyping or small-scale apps where managed service convenience outweighs cost.

Whether you're building voice biometrics, speech analysis, mental health screening, or conversational AI, LiveKit provides the foundation—reliable, scalable, high-quality real-time voice infrastructure that just works.

Ready to integrate LiveKit into your voice analysis platform?

See LiveKit in Action (Voice Mirror Demo)

Voice Mirror uses LiveKit for real-time voice interviews, recording 48 kHz audio that preserves the acoustic detail needed for accurate voice analysis. Our setup handles 1,000+ concurrent sessions with 99.9% recording reliability.

#LiveKit #WebRTC #real-time-audio #voice-recording #infrastructure #deployment
