Voice AI Technology · February 14, 2025 · 20 min read

LiveKit Setup for Voice Analysis: Building Real-Time Voice Applications

Complete guide to setting up LiveKit for voice analysis applications. Learn WebRTC fundamentals, audio track configuration, server deployment, Egress recording, and production best practices for building real-time voice platforms.

David Chen
WebRTC Engineer & Real-Time Systems Architect

LiveKit Setup for Voice Analysis: Your Complete Implementation Guide

Building a voice analysis application? You need real-time audio capture, low-latency streaming, reliable recording, and scalable infrastructure—all while maintaining audio quality that preserves the acoustic features your ML models depend on.

That's where LiveKit excels. LiveKit is an open-source WebRTC infrastructure platform that handles the complexity of real-time audio/video communication, letting you focus on voice analysis algorithms rather than networking protocols. It provides server-side selective forwarding (efficient multi-party routing), Egress recording (server-side capture with guaranteed quality), flexible SDKs (JavaScript, Swift, Kotlin, React), and self-hosting or cloud options—all specifically designed for production voice/video applications.

For voice analysis, LiveKit offers critical advantages: 48 kHz uncompressed audio (preserves acoustic features), server-side recording (no client-side reliability issues), automatic reconnection (handles network disruptions), track-level control (mute/unmute, quality selection), and webhook events (trigger analysis pipeline when recording completes). Thousands of applications use LiveKit for telehealth, education, social audio, and AI voice agents—making it battle-tested infrastructure for voice-first products.

This guide walks through complete LiveKit setup for voice analysis: server deployment, client integration, audio configuration, recording with Egress, storage integration, and production deployment. Whether you're building a voice biometrics platform, speech analysis tool, or conversational AI application, you'll learn exactly how to implement real-time voice infrastructure that scales.

What Is LiveKit? Architecture Overview

LiveKit is a WebRTC server platform that manages real-time audio/video communication between clients. Unlike peer-to-peer WebRTC (where clients connect directly), LiveKit uses Selective Forwarding Unit (SFU) architecture—clients send media to the server once, and the server forwards to other participants. This architecture is essential for voice analysis applications where you need server-side access to audio streams for recording and processing.

Core Components

1. LiveKit Server:

  • WebRTC media router handling audio/video tracks
  • Written in Go for performance (handles 1,000+ concurrent participants per instance)
  • Manages rooms, participants, tracks, permissions
  • Provides signaling (WebSocket) and media transport (DTLS/SRTP)
  • Self-hosted (Docker/Kubernetes) or managed (LiveKit Cloud)

2. Client SDKs:

  • JavaScript: Browser-based voice recording (Chrome, Safari, Firefox, Edge)
  • React Components: Pre-built UI components for faster development
  • Swift (iOS): Native iOS apps with CoreAudio integration
  • Kotlin (Android): Native Android apps with AudioRecord/AudioTrack
  • Python/Go agents: Server-side voice processing (speech-to-text, voice agents)

3. Egress Service:

  • Server-side recording of rooms, tracks, or participants
  • Runs headless Chrome for web rendering or direct audio mixing
  • Uploads recordings to S3, Azure, GCP, or local storage
  • Guarantees recording quality (not affected by client disconnections)
  • Supports audio-only (OGG, MP3, M4A) or video (MP4, WebM)

4. Optional Services:

  • Ingress: Stream external sources (RTMP, WHIP) into LiveKit rooms
  • SIP: Connect traditional phone systems to LiveKit
  • Agents: Server-side participants for AI voice assistants

Why LiveKit for Voice Analysis?

Audio Quality:

  • Supports 48 kHz sample rate (vs 8 kHz in telephony, 16 kHz in some WebRTC configs)
  • Opus codec with configurable bitrate (32-510 kbps)
  • Preserves acoustic features needed for ML (pitch, formants, harmonics)
  • Automatic gain control, noise suppression configurable per-track

Recording Reliability:

  • Server-side Egress eliminates client recording failures (browser crashes, page refreshes, network drops)
  • Guaranteed audio capture even if client disconnects mid-session
  • Automatic upload to durable storage (S3-compatible)
  • Webhook notifications when recording completes (trigger analysis pipeline)

Developer Experience:

  • Well-documented APIs with TypeScript types
  • Pre-built React components for rapid prototyping
  • Local Docker Compose setup for development
  • Open-source (Apache 2.0 license) with active community

Production-Ready:

  • Used by 1,000+ production applications
  • Horizontal scaling (add server instances behind load balancer)
  • Comprehensive monitoring (Prometheus metrics)
  • Built-in security (E2E encryption, token-based auth)

Core Concepts for Voice Analysis

1. Rooms

A room is a logical container for a real-time session. For voice analysis:

  • One room per voice session: Each user voice recording gets unique room (e.g., `voice-analysis-session-{userId}-{timestamp}`)
  • Room lifecycle: Create room → participants join → record audio → participants leave → room closes
  • Room options:
    - `emptyTimeout`: Auto-close room after last participant leaves (default 5 min)
    - `maxParticipants`: Limit to 1 user + 1 AI agent (for voice analysis interviews)
    - `metadata`: Store session info (userId, analysisType, startTime)
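These options can be set when creating the room server-side. A minimal sketch, assuming `livekit-server-sdk`'s `RoomServiceClient`; the naming scheme and metadata fields are illustrative:

```typescript
// Builds CreateRoom options for one voice-analysis session.
// The name scheme and metadata fields follow the conventions above; adjust to taste.
export function sessionRoomOptions(userId: string, now: number = Date.now()) {
  return {
    name: `voice-analysis-session-${userId}-${now}`,
    emptyTimeout: 300,   // seconds: close 5 min after last participant leaves
    maxParticipants: 2,  // one user + one AI agent
    metadata: JSON.stringify({ userId, analysisType: 'interview', startTime: now }),
  };
}

// Usage with a RoomServiceClient from livekit-server-sdk; typed structurally
// so this sketch stays self-contained:
export async function createSessionRoom(
  svc: { createRoom(opts: object): Promise<unknown> },
  userId: string
) {
  return svc.createRoom(sessionRoomOptions(userId));
}
```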

2. Participants

Each connection to a room is a participant. For voice analysis:

  • User participant: The person being analyzed (publishes audio track from microphone)
  • Agent participant (optional): AI voice interviewer (publishes synthesized speech, subscribes to user audio)
  • Participant metadata: Store user ID, name, profile info for linking recordings to users
  • Participant permissions: Control who can publish/subscribe (prevent unauthorized recording)
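Since both the user and an optional agent join as participants, downstream code often needs to tell them apart. One lightweight approach is tagging the agent in its participant metadata at token-generation time; the `sessionType` field below is an assumption, not a LiveKit convention:

```typescript
// Distinguish the AI agent from the user via a metadata flag.
// Works on any object exposing LiveKit's participant shape (identity + optional metadata).
export function isAgentParticipant(p: { identity: string; metadata?: string }): boolean {
  try {
    return JSON.parse(p.metadata ?? '{}').sessionType === 'agent';
  } catch {
    return false; // malformed metadata: treat as a regular user
  }
}
```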

3. Tracks

A track is a single media stream (audio or video). For voice analysis:

  • Audio tracks only: Disable video to reduce bandwidth and processing
  • Track types:
    - `Track.Source.Microphone`: User's microphone input (the audio you're analyzing)
    - `Track.Source.ScreenShareAudio`: System audio (rarely used for voice analysis)
  • Track configuration:
    - Sample rate: 48 kHz (preserves acoustic detail)
    - Bitrate: 64-96 kbps (balance quality and bandwidth)
    - DTX (discontinuous transmission): OFF for voice analysis (preserve silence patterns)
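In `livekit-client`, these settings map onto capture options (how the browser records) and publish options (how the track is encoded for the SFU). Option names here match the current SDK to the best of my knowledge; verify them against your version:

```typescript
// Capture options: how the browser records the microphone.
export const captureOptions = {
  sampleRate: 48000,        // 48 kHz preserves acoustic detail
  channelCount: 1,          // mono is sufficient for voice
  echoCancellation: true,
  noiseSuppression: false,  // keep natural voice characteristics for analysis
  autoGainControl: false,   // keep natural volume variations
};

// Publish options: how the track is encoded and sent to the server.
export const publishOptions = {
  dtx: false,               // keep transmitting during silence (preserve pause patterns)
  audioBitrate: 96_000,     // 96 kbps Opus
};

// Usage (livekit-client):
//   await room.localParticipant.setMicrophoneEnabled(true, captureOptions, publishOptions);
```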

4. Tokens

Access tokens authorize participants to join rooms. For voice analysis:

  • Server-generated: Backend creates token with API key + secret (never expose secret to client)
  • Token claims:
    - `roomName`: Which room participant can join
    - `identity`: Unique participant identifier (user ID)
    - `ttl`: Token expiration (e.g., 2 hours for voice session)
    - `canPublish`: Permission to publish audio
    - `canSubscribe`: Permission to receive audio (for AI agents)
    - `metadata`: Custom data (user profile, session type)
  • Security: Tokens prevent unauthorized access—only users with valid token can join specific room
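A LiveKit access token is a standard JWT (the grants live under a `video` claim), so while debugging you can inspect its claims by decoding the payload segment. This illustrative helper does no signature verification, so use it for debugging only:

```typescript
// Decode the middle (payload) segment of a JWT without verifying the signature.
// Useful for checking room/identity/expiry claims while debugging token issues.
export function decodeJwtPayload(jwt: string): Record<string, unknown> {
  const payload = jwt.split('.')[1];
  if (!payload) throw new Error('not a JWT');
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}
```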

Implementation: Complete Setup Guide

Step 1: Server Deployment (Self-Hosted)

Local Development with Docker:

# docker-compose.yml
version: '3.8'

services:
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    restart: unless-stopped
    ports:
      - "7880:7880"      # HTTP API + WebSocket signaling
      - "7881:7881"      # WebRTC over TCP
      - "7882:7882/udp"  # WebRTC over UDP
    volumes:
      - ./livekit.yaml:/etc/livekit.yaml
    environment:
      - "LIVEKIT_KEYS=devkey: devsecret"  # quoted because the value contains ": "

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "6379:6379"

  egress:
    image: livekit/egress:latest
    environment:
      - EGRESS_CONFIG_FILE=/etc/egress.yaml
      - REDIS_HOST=redis:6379
    volumes:
      - ./egress.yaml:/etc/egress.yaml
      - ./recordings:/out
    depends_on:
      - livekit
      - redis

LiveKit Server Configuration (`livekit.yaml`):

port: 7880
bind_addresses:
  - "0.0.0.0"

rtc:
  port_range_start: 50000
  port_range_end: 51000
  use_external_ip: true
  # For local dev, use local IP; for production, use public IP
  external_ip: "127.0.0.1"

redis:
  address: redis:6379

keys:
  devkey: devsecret  # Development only - use secure keys in production

room:
  empty_timeout: 300  # Close room 5 min after last participant leaves
  max_participants: 10

audio:
  # Optimize for voice analysis
  active_speaker_update: 1s
  update_interval: 1s

logging:
  level: info
  sample: false

Egress Configuration (`egress.yaml`):

api_key: devkey
api_secret: devsecret
ws_url: ws://livekit:7880

redis:
  address: redis:6379

# Local file storage for development
file_output:
  local: true
  output_directory: /out

# S3 for production (uncomment and configure)
# s3:
#   access_key: YOUR_ACCESS_KEY
#   secret: YOUR_SECRET_KEY
#   region: us-west-2
#   bucket: your-voice-recordings
#   endpoint: https://s3.amazonaws.com  # Or Supabase/MinIO/R2

logging:
  level: info

Start Services:

# Start all services
docker-compose up -d

# Check logs
docker-compose logs -f livekit

# Verify server is running
curl http://localhost:7880
# A 200 response confirms the server is running

Step 2: Client SDK Integration (React/TypeScript)

Install Dependencies:

npm install livekit-client @livekit/components-react

Backend: Token Generation API (`/api/voice/token`):

// app/api/voice/token/route.ts
import { AccessToken } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { roomName, userId, userName, sessionDuration = 7200 } = await request.json();

  // Validate user is authenticated
  // ... your auth logic here ...

  const apiKey = process.env.LIVEKIT_API_KEY!;
  const apiSecret = process.env.LIVEKIT_API_SECRET!;

  // Create access token
  const token = new AccessToken(apiKey, apiSecret, {
    identity: userId,
    ttl: sessionDuration, // Token expires in 2 hours
    metadata: JSON.stringify({
      userId,
      userName,
      sessionType: 'voice-analysis',
      timestamp: new Date().toISOString()
    })
  });

  // Grant permissions
  token.addGrant({
    roomJoin: true,
    room: roomName,
    canPublish: true,
    canPublishData: true,
    canSubscribe: true
  });

  const jwt = await token.toJwt();

  return NextResponse.json({
    token: jwt,
    serverUrl: process.env.LIVEKIT_URL, // wss://your-server.com
    roomName,
    expiresAt: new Date(Date.now() + sessionDuration * 1000).toISOString()
  });
}

Frontend: Voice Recording Component:

// components/VoiceRecorder.tsx
'use client';

import { useState } from 'react';
import { Track } from 'livekit-client';
import { LiveKitRoom, useRoomContext, useTracks } from '@livekit/components-react';
import '@livekit/components-styles';

interface VoiceRecorderProps {
  userId: string;
  userName: string;
  onSessionComplete: (recordingUrl: string, duration: number) => void;
}

export function VoiceRecorder({ userId, userName, onSessionComplete }: VoiceRecorderProps) {
  const [token, setToken] = useState('');
  const [serverUrl, setServerUrl] = useState('');
  const [roomName, setRoomName] = useState('');
  const [isConnecting, setIsConnecting] = useState(false);

  // Initialize session: get token from backend
  const startSession = async () => {
    setIsConnecting(true);

    try {
      const sessionId = `voice-${userId}-${Date.now()}`;
      const response = await fetch('/api/voice/token', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          roomName: sessionId,
          userId,
          userName
        })
      });

      const data = await response.json();

      setToken(data.token);
      setServerUrl(data.serverUrl);
      setRoomName(data.roomName);
    } catch (error) {
      console.error('Failed to start session:', error);
      setIsConnecting(false);
    }
  };

  if (!token) {
    return (
      <button onClick={startSession} disabled={isConnecting}>
        {isConnecting ? 'Connecting...' : 'Start Voice Session'}
      </button>
    );
  }

  return (
    <LiveKitRoom
      token={token}
      serverUrl={serverUrl}
      connect={true}
      audio={true}
      video={false}
      onConnected={() => {
        console.log('Connected to room:', roomName);
        // Start Egress recording here (see Step 4)
      }}
      onDisconnected={() => {
        console.log('Disconnected from room');
        // Handle session completion
      }}
    >
      <VoiceRecordingUI onComplete={onSessionComplete} />
    </LiveKitRoom>
  );
}

// Inner component has access to LiveKit room context
function VoiceRecordingUI({ onComplete }: { onComplete: (url: string, duration: number) => void }) {
  const room = useRoomContext();
  const [startTime] = useState(Date.now());
  const [isRecording, setIsRecording] = useState(true);

  // Track microphone audio (include local, unsubscribed tracks)
  const tracks = useTracks([Track.Source.Microphone], { onlySubscribed: false });

  const stopRecording = async () => {
    setIsRecording(false);
    const duration = Math.floor((Date.now() - startTime) / 1000);

    // Stop Egress recording via API
    await fetch('/api/voice/stop-recording', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ roomName: room.name })
    });

    // Disconnect from room
    await room.disconnect();

    // Notify parent (backend will deliver the recording URL via webhook)
    onComplete('pending', duration);
  };

  return (
    <div>
      <p>{isRecording ? 'Recording in progress...' : 'Recording stopped'}</p>
      {/* Display active tracks */}
      {tracks.map((track) => (
        <p key={track.publication.trackSid}>
          Microphone: {track.publication.isMuted ? 'Muted' : 'Active'}
        </p>
      ))}
      <button onClick={stopRecording} disabled={!isRecording}>
        End Session
      </button>
    </div>
  );
}

Step 3: Audio Track Configuration

Optimize audio settings for voice analysis:

// Audio constraints for high-quality voice capture
const audioConstraints = {
  audio: {
    echoCancellation: true,   // Remove echo from speakers
    noiseSuppression: false,  // Preserve natural voice characteristics for analysis
    autoGainControl: false,   // Preserve natural volume variations
    sampleRate: 48000,        // 48 kHz for acoustic detail
    channelCount: 1,          // Mono (sufficient for voice)
    // Advanced constraints (browser support varies)
    sampleSize: 16,           // 16-bit depth
    latency: 0.01,            // 10ms latency (real-time feel)
  },
  video: false
};

// Apply when connecting to the room (livekit-client)
const room = new Room();
await room.connect(serverUrl, token);
await room.localParticipant.setMicrophoneEnabled(true, audioConstraints.audio);

Configure Opus codec for quality:

# In livekit.yaml (server config)
audio:
  opus:
    max_playback_rate: 48000  # Full 48 kHz
    max_average_bitrate: 96000  # 96 kbps (excellent quality)
    ptime: 20  # 20ms packet time
    dtx: false  # Disable discontinuous transmission (preserve silence patterns)

Step 4: Recording with Egress

Start Recording API (`/api/voice/start-recording`):

// app/api/voice/start-recording/route.ts
import { EgressClient } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { roomName, userId } = await request.json();

  const egressClient = new EgressClient(
    process.env.LIVEKIT_URL!,
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!
  );

  // Audio-only recording (no video rendering needed)
  const output = {
    fileType: 'OGG',  // or 'MP3', 'M4A'
    filepath: `recordings/${userId}/${roomName}.ogg`,
    // S3 upload (if configured in egress.yaml)
    s3: {
      accessKey: process.env.S3_ACCESS_KEY,
      secret: process.env.S3_SECRET_KEY,
      bucket: process.env.S3_BUCKET,
      region: process.env.S3_REGION
    }
  };

  try {
    // startRoomCompositeEgress resolves to an EgressInfo object; keep its egressId
    const info = await egressClient.startRoomCompositeEgress(roomName, {
      file: output,
      audioOnly: true,  // Skip video processing
      // Optional: custom layout for multi-party (not needed for single speaker)
    });
    const egressId = info.egressId;

    // Store egress ID for later reference (storeEgressInfo is your own DB helper)
    await storeEgressInfo(roomName, egressId, userId);

    return NextResponse.json({
      success: true,
      egressId,
      message: 'Recording started'
    });
  } catch (error) {
    console.error('Failed to start recording:', error);
    return NextResponse.json(
      { error: 'Failed to start recording' },
      { status: 500 }
    );
  }
}

Stop Recording (optional - auto-stops when room closes):

// app/api/voice/stop-recording/route.ts
export async function POST(request: Request) {
  const { roomName } = await request.json();

  const egressClient = new EgressClient(
    process.env.LIVEKIT_URL!,
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!
  );

  // Retrieve egress ID from database
  const egressId = await getEgressIdForRoom(roomName);

  try {
    await egressClient.stopEgress(egressId);
    return NextResponse.json({ success: true });
  } catch (error) {
    console.error('Failed to stop recording:', error);
    return NextResponse.json(
      { error: 'Failed to stop recording' },
      { status: 500 }
    );
  }
}

Step 5: Webhook Integration for Recording Completion

Configure webhook in `livekit.yaml`:

webhook:
  api_key: devkey  # name of a key from the server's `keys` section (used to sign webhook requests)
  urls:
    - https://yourapp.com/api/webhooks/livekit

Webhook Handler (`/api/webhooks/livekit`):

// app/api/webhooks/livekit/route.ts
import { WebhookReceiver } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

const receiver = new WebhookReceiver(
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!
);

export async function POST(request: Request) {
  const body = await request.text();
  const authHeader = request.headers.get('Authorization') || '';

  try {
    // Verify webhook signature (receive() is async in livekit-server-sdk v2)
    const event = await receiver.receive(body, authHeader);

    // Handle different event types
    switch (event.event) {
      case 'egress_ended': {
        // Recording completed: details live on event.egressInfo
        // (field names per livekit-server-sdk's EgressInfo; verify for your SDK version)
        const info = event.egressInfo;
        const roomName = info?.roomName ?? '';
        const fileUrl = info?.fileResults?.[0]?.location ?? '';
        const durationNs = info?.fileResults?.[0]?.duration ?? 0;
        const duration = Math.round(Number(durationNs) / 1e9); // nanoseconds → seconds

        console.log(`Recording completed for room ${roomName}`);
        console.log(`File URL: ${fileUrl}`);
        console.log(`Duration: ${duration}s`);

        // Trigger voice analysis pipeline
        await triggerVoiceAnalysis({
          roomName,
          fileUrl,
          duration
        });
        break;
      }

      case 'room_finished':
        console.log(`Room ${event.room.name} closed`);
        break;

      case 'participant_joined':
        console.log(`Participant ${event.participant.identity} joined`);
        break;

      case 'participant_left':
        console.log(`Participant ${event.participant.identity} left`);
        break;
    }

    return NextResponse.json({ received: true });
  } catch (error) {
    console.error('Webhook verification failed:', error);
    return NextResponse.json(
      { error: 'Invalid webhook' },
      { status: 401 }
    );
  }
}

async function triggerVoiceAnalysis(
  { roomName, fileUrl, duration }: { roomName: string; fileUrl: string; duration: number }
) {
  // Call your voice analysis service
  await fetch(process.env.VOICE_ANALYSIS_SERVICE_URL + '/analyze', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      sessionId: roomName,
      audioUrl: fileUrl,
      duration
    })
  });
}

Step 6: Storage Integration (S3-Compatible)

Supabase Storage Example:

# egress.yaml
s3:
  access_key: YOUR_SUPABASE_ACCESS_KEY
  secret: YOUR_SUPABASE_SECRET_KEY
  region: us-east-1
  bucket: voice-recordings
  endpoint: https://PROJECT_ID.supabase.co/storage/v1/s3
  force_path_style: true  # Required for Supabase

Cloudflare R2 Example:

# egress.yaml
s3:
  access_key: YOUR_R2_ACCESS_KEY
  secret: YOUR_R2_SECRET_KEY
  bucket: voice-recordings
  endpoint: https://ACCOUNT_ID.r2.cloudflarestorage.com
  region: auto

Download Recording from Storage:

// Retrieve recording URL for analysis (supabase client from @supabase/supabase-js)
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

async function getRecordingUrl(roomName: string): Promise<string> {
  const { data, error } = await supabase
    .storage
    .from('voice-recordings')
    .createSignedUrl(`recordings/${roomName}.ogg`, 3600); // 1-hour signed URL

  if (error) throw error;
  return data.signedUrl;
}

Step 7: Production Deployment

Option 1: Self-Hosted on Fly.io:

# fly.toml
app = "your-livekit-server"
primary_region = "sjc"

[build]
  image = "livekit/livekit-server:latest"

[[services]]
  internal_port = 7880
  protocol = "tcp"

  [[services.ports]]
    port = 443
    handlers = ["tls"]

  [[services.ports]]
    port = 80
    handlers = ["http"]

[env]
  LIVEKIT_CONFIG = "/etc/livekit.yaml"

[[mounts]]
  source = "livekit_config"
  destination = "/etc/livekit.yaml"

Deploy:

# Create Fly app
fly apps create your-livekit-server

# Set secrets
fly secrets set LIVEKIT_API_KEY=your-key
fly secrets set LIVEKIT_API_SECRET=your-secret

# Deploy
fly deploy --no-cache

# Check status
fly status

Option 2: LiveKit Cloud (Managed):

# Sign up at https://cloud.livekit.io
# Get API credentials from dashboard
# Set environment variables in your app

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=API...
LIVEKIT_API_SECRET=secret...

Troubleshooting Common Issues

1. No Audio in Recording

Symptoms: Egress completes but file has no audio or very low volume

Causes & Solutions:

  • Microphone permission denied: Check browser console for permission errors, ensure user grants access
  • Track not published: Verify track.isMuted = false, check `room.localParticipant.audioTracks`
  • Egress started before participant joined: Start Egress after participant publishes audio track (listen for `TrackPublished` event)
  • Audio constraints too restrictive: Remove `advanced` constraints that browser may not support

Debug:

room.on(RoomEvent.TrackPublished, (publication, participant) => {
  console.log('Track published:', publication.kind, publication.source);
  if (publication.kind === 'audio' && publication.source === 'microphone') {
    // NOW start Egress
    startEgressRecording(room.name);
  }
});

2. Connection Failures

Symptoms: Client can't connect to LiveKit server, stuck on "Connecting..."

Causes & Solutions:

  • Incorrect server URL: Ensure `wss://` protocol (not `ws://` for production), correct domain/port
  • Token validation error: Check token not expired, API key/secret match server config
  • Firewall blocking WebRTC ports: LiveKit uses ports 50000-51000 for media (UDP), ensure not blocked
  • CORS issues: For browser clients, ensure LiveKit server allows origin (configure in `livekit.yaml`)

Debug:

room.on(RoomEvent.ConnectionStateChanged, (state) => {
  console.log('Connection state:', state);
});

room.on(RoomEvent.Disconnected, (reason) => {
  console.log('Disconnected, reason:', reason);
});

3. Poor Audio Quality

Symptoms: Recording sounds muffled, robotic, or distorted

Causes & Solutions:

  • Low bitrate: Increase Opus bitrate to 64-96 kbps in server config
  • Aggressive noise suppression: Disable `noiseSuppression` in audio constraints for voice analysis
  • Network packet loss: Check client network quality, LiveKit auto-adjusts bitrate but high loss (>5%) degrades quality
  • Sample rate mismatch: Ensure `sampleRate: 48000` in client audio constraints

Monitor quality:

room.on(RoomEvent.ConnectionQualityChanged, (quality, participant) => {
  console.log(`Quality for ${participant.identity}: ${quality}`);
  // 'excellent', 'good', 'poor'

  if (quality === 'poor') {
    // Alert user or adjust recording
  }
});

4. Egress Recording Never Completes

Symptoms: Egress starts but never sends completion webhook

Causes & Solutions:

  • Room still active: Egress waits until room closes, ensure all participants disconnect
  • Egress service not running: Check `docker-compose ps egress`, verify egress container healthy
  • S3 upload failure: Check egress logs (`docker-compose logs egress`) for permission errors
  • Insufficient disk space: Egress buffers to disk before upload, ensure adequate space

Manual check:

# List active egress sessions (needs livekit-cli, configured with your API key/secret)
livekit-cli list-egress --url http://localhost:7880 --api-key devkey --api-secret devsecret

# Or inspect the egress worker directly
docker-compose logs egress | grep -i egress

5. High Server CPU Usage

Symptoms: LiveKit server CPU >80%, impacting performance

Causes & Solutions:

  • Too many video tracks: For voice analysis, disable video entirely (`video: false`)
  • Simulcast enabled unnecessarily: Disable simulcast for audio-only (`simulcast: false` in track options)
  • Excessive participants per room: For 1-on-1 voice analysis, limit `maxParticipants: 2`
  • Old server version: Update to latest LiveKit server (performance improvements)

Optimize:

# livekit.yaml
room:
  max_participants: 2  # Voice analysis typically 1 user + 1 agent

video:
  enabled: false  # Disable video processing entirely

Performance Optimization

1. Reduce Latency

Client-side:

  • Set `latency: 0.01` in audio constraints (10ms buffer)
  • Use `adaptiveStream: true` in LiveKitRoom props (auto-adjust for network)
  • Place server geographically close to users (multi-region deployment)

Server-side:

  • Enable `use_external_ip` to avoid TURN relay (direct peer-to-server connection)
  • Increase `update_interval` only if needed (default 200ms is good)
  • Deploy behind CDN/load balancer with WebSocket support (Cloudflare, AWS ALB)

2. Scale for High Traffic

Horizontal Scaling:

  • Run multiple LiveKit server instances
  • Share state via Redis (configure `redis.address` in all instances)
  • Use load balancer with sticky sessions (route same room to same server)
# livekit.yaml (each instance)
redis:
  address: redis.internal:6379  # Shared Redis

node_selector:
  region: us-west  # For geo-routing

Vertical Scaling:

  • Increase server CPU/RAM (LiveKit can handle 1,000+ participants per 8-core server)
  • Use SSD storage for Egress buffering (faster writes)
  • Dedicate separate instances for Egress (CPU-intensive video encoding)

3. Optimize Recording Storage Costs

Compression:

  • Use OGG format (Opus codec) for smallest file size with good quality (~30-50 MB/hour)
  • Compare to MP3 (~60 MB/hour) or uncompressed WAV (~600 MB/hour)
  • For long-term storage, transcode to lower bitrate after analysis completes
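The ~30-50 MB/hour figure follows directly from the Opus bitrate; a quick estimator (pure arithmetic, no LiveKit dependency):

```typescript
// Estimated file size for constant-bitrate Opus audio: bytes = seconds × bits/sec ÷ 8.
export function opusFileMB(hours: number, kbps = 96): number {
  const bytes = hours * 3600 * (kbps * 1000) / 8;
  return bytes / (1024 * 1024);
}
```

For example, one hour at 64 kbps comes out around 27 MB and at 96 kbps around 41 MB, consistent with the ~30-50 MB/hour range above.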

Lifecycle Policies:

  • Auto-delete recordings after 30 days (GDPR compliance, cost reduction)
  • Move to cheaper storage tier (S3 Glacier, GCS Nearline) after 7 days
  • Store only audio needed for analysis (delete video tracks if accidentally recorded)
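Expressed as an S3 lifecycle rule, the two policies above collapse into one rule object. The shape follows the AWS S3 `PutBucketLifecycleConfiguration` API as I understand it; check the field names against your SDK before relying on it:

```typescript
// One rule implementing both policies above: Glacier after 7 days, delete after 30.
export function recordingLifecycleRule(prefix = 'recordings/') {
  return {
    ID: 'voice-recordings-lifecycle',
    Status: 'Enabled',
    Filter: { Prefix: prefix },                           // only recording objects
    Transitions: [{ Days: 7, StorageClass: 'GLACIER' }],  // cheaper tier after 7 days
    Expiration: { Days: 30 },                             // auto-delete after 30 days
  };
}
```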

Security Considerations

1. Token Security

Best Practices:

  • Never expose API secret to client (always generate tokens server-side)
  • Short TTL: Set token expiration to session length (1-2 hours), not days
  • Room-specific tokens: Token grants access to ONE room only (prevent room-hopping)
  • Metadata validation: Backend verifies user owns session before generating token
// Secure token generation (getServerSession and getSessionOwner are your own auth/DB helpers)
import { AccessToken } from 'livekit-server-sdk';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  // 1. Verify user is authenticated
  const session = await getServerSession();
  if (!session) return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });

  // 2. Validate user owns this session
  const { roomName } = await request.json();
  const sessionOwner = await getSessionOwner(roomName);
  if (sessionOwner !== session.user.id) {
    return NextResponse.json({ error: 'Forbidden' }, { status: 403 });
  }

  // 3. Generate token with minimal permissions
  const token = new AccessToken(process.env.LIVEKIT_API_KEY!, process.env.LIVEKIT_API_SECRET!, {
    identity: session.user.id,
    ttl: 3600  // 1 hour only
  });

  token.addGrant({
    roomJoin: true,
    room: roomName,
    canPublish: true,
    canPublishData: false,  // No data channel if not needed
    canSubscribe: false     // User doesn't need to receive (for 1-person recording)
  });

  return NextResponse.json({ token: await token.toJwt() });
}

2. End-to-End Encryption (E2EE)

When to use: For highly sensitive voice data (healthcare, legal, financial)

// Enable E2EE in the client (livekit-client; API per current SDK, verify for your version)
import { Room, ExternalE2EEKeyProvider } from 'livekit-client';

const keyProvider = new ExternalE2EEKeyProvider();
await keyProvider.setKey('shared-passphrase-distributed-out-of-band');

const room = new Room({
  e2ee: {
    keyProvider,
    worker: new Worker(new URL('livekit-client/e2ee-worker', import.meta.url)),
  },
});

// Note: E2EE prevents server-side access to audio
// Cannot use server-side Egress recording with E2EE
// Must record client-side and upload encrypted files

3. Recording Consent

Legal requirements:

  • Inform users recording is active (display "Recording" indicator)
  • Obtain explicit consent before starting (checkbox, button click)
  • Allow users to opt out (stop recording, delete immediately)
  • Store consent timestamp and IP address for compliance
// Consent UI before joining room (markup sketch: labels and class names are illustrative)
function ConsentModal({ onAccept, onDecline }: { onAccept: () => void; onDecline: () => void }) {
  return (
    <div className="consent-modal">
      <h2>Voice Recording Consent</h2>
      <p>
        This session will be recorded for voice analysis. Your recording will be
        processed by ML models and stored securely. You can request deletion at any time.
      </p>
      <button onClick={onAccept}>I consent to recording</button>
      <button onClick={onDecline}>Decline</button>
    </div>
  );
}

Cost Analysis

Self-Hosted Costs

Server (Fly.io example):

  • LiveKit server: 1 shared-cpu-4x (4 vCPU, 8 GB RAM) = $62/month
  • Egress worker: 1 shared-cpu-2x (2 vCPU, 4 GB RAM) = $31/month
  • Redis: 256 MB = $2/month
  • Total: ~$95/month + bandwidth

Bandwidth (Cloudflare R2 example):

  • Egress recording uploads: $0 (R2 has no egress fees)
  • Storage: $0.015/GB/month (10,000 1-hour recordings × 40 MB = 400 GB = $6/month)

Total self-hosted: ~$100/month for 10,000 sessions

LiveKit Cloud Costs

Pricing (as of 2024):

  • Free tier: 10,000 participant-minutes/month
  • Paid: $0.0015/participant-minute ($0.09/hour per participant)
  • Egress recording: $0.003/minute output ($0.18/hour recording)
  • Storage: Bring your own (S3, GCS, etc.)

Example: 10,000 1-hour voice sessions

  • Participant time: 10,000 hours × $0.09 = $900
  • Recording: 10,000 hours × $0.18 = $1,800
  • Storage (S3): 400 GB × $0.023 = $9
  • Total: ~$2,709/month

Breakeven: At these rates (~$0.27 per recorded hour on Cloud vs ~$100/month self-hosted), self-hosting becomes cheaper above roughly 400 session-hours/month
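As a sanity check, the arithmetic behind these estimates (rates are the illustrative 2024 figures quoted above, not current pricing):

```typescript
// Monthly LiveKit Cloud cost for single-participant recorded sessions:
// participant time + Egress recording, both billed per hour.
export function cloudMonthlyCost(
  sessionHours: number,
  perParticipantHour = 0.09,
  perRecordingHour = 0.18
): number {
  return sessionHours * (perParticipantHour + perRecordingHour);
}

// Hours/month at which a flat self-hosted bill beats Cloud's metered pricing.
export function breakevenHours(selfHostedMonthly = 100, cloudHourly = 0.27): number {
  return selfHostedMonthly / cloudHourly;
}
```

At 10,000 session-hours this reproduces the ~$2,700/month Cloud figure above (plus a few dollars of storage), and the breakeven lands in the high 300s of hours per month.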

Alternatives Comparison

  • LiveKit. Pros: open-source, self-hostable, server-side recording, great docs. Cons: requires infrastructure management. Best for voice analysis apps needing control + quality.
  • Twilio Voice. Pros: fully managed, phone system integration (PSTN). Cons: expensive ($0.0085/min), 8 kHz audio (poor for ML). Best for traditional telephony apps.
  • Agora. Pros: low latency, global edge network, SDKs for all platforms. Cons: expensive ($0.99/1,000 min), closed-source. Best for large-scale social audio apps.
  • Daily.co. Pros: simple API, great developer experience, managed service. Cons: no self-hosting, limited customization. Best for rapid prototyping, small-scale apps.
  • Jitsi. Pros: free, open-source, self-hostable. Cons: less polished, weaker recording features. Best for budget-conscious projects.
  • Custom WebRTC. Pros: full control, no vendor lock-in. Cons: months of development, complex to maintain. Best for only if you need extreme customization.

Verdict for Voice Analysis: LiveKit offers the best balance of audio quality (48 kHz), reliability (server-side recording), cost (self-hostable), and developer experience (excellent docs/SDKs).

Voice Mirror Implementation Example

Here's how Voice Mirror uses LiveKit for real-time voice interviews:

Session Flow

  1. User clicks "Start Voice Analysis"
     - Frontend calls `POST /api/voicemirror/session/create`
     - Backend creates database record, generates unique room name (`voicemirror-{userId}-{timestamp}`)
     - Backend generates LiveKit token with 2-hour TTL
     - Returns token + server URL to frontend
  2. User joins LiveKit room
     - React component uses `<LiveKitRoom>` with token
     - User grants microphone permission
     - LiveKit publishes audio track to server
  3. AI interviewer joins
     - Python agent connects as second participant
     - Agent uses STT to transcribe user speech
     - Agent generates interview questions via LLM
     - Agent synthesizes speech via TTS and publishes audio track
  4. Recording starts automatically
     - When user's audio track published, backend starts Egress
     - Egress records room audio (both user + AI) to OGG file
     - Uploads to Supabase Storage (`recordings/{userId}/{sessionId}.ogg`)
  5. Interview completes
     - User clicks "End Interview" or 30-minute timeout
     - Frontend disconnects from room
     - Room auto-closes (all participants left)
  6. Webhook triggers analysis
     - Egress sends `egress_ended` webhook
     - Backend receives recording URL
     - Backend queues voice analysis job (Python service)
     - Analysis extracts 6,000+ acoustic features
     - Results saved to database
  7. User views results
     - Frontend polls `GET /api/voicemirror/results/{sessionId}`
     - Shows processing status (0-100%)
     - Displays complete analysis when done

Key Configuration

// Audio optimized for voice analysis
const audioConstraints = {
  echoCancellation: true,
  noiseSuppression: false,  // Preserve voice characteristics
  autoGainControl: false,   // Preserve volume variations
  sampleRate: 48000,
  channelCount: 1
};

// Room configured for interview
token.addGrant({
  roomJoin: true,
  room: `voicemirror-${userId}-${timestamp}`,
  canPublish: true,
  canSubscribe: true  // Receive AI interviewer audio
});

// Egress records both participants
await egressClient.startRoomCompositeEgress(roomName, {
  file: {
    fileType: 'OGG',
    filepath: `recordings/${userId}/${sessionId}.ogg`,
    s3: supabaseS3Config
  },
  audioOnly: true  // No video processing needed
});

Resources and Next Steps

Official Documentation

  • LiveKit Docs: docs.livekit.io - Comprehensive guides and API reference
  • GitHub: github.com/livekit - Source code, examples, issue tracking
  • Discord Community: Active community for questions and support
  • Cloud Dashboard: cloud.livekit.io - Managed service signup

Example Projects

  • meet.livekit.io: Video conferencing demo (React)
  • LiveKit Agents: Voice AI examples (Python)
  • Voice Recorder Template: Simple audio recording starter

Next Articles in Series

  • Speech-to-Text Comparison: Evaluating Whisper, Deepgram, Google, AWS for voice analysis
  • Training ML Models: Building custom voice classifiers
  • openSMILE Configuration: Extracting acoustic features
  • Real-Time Processing: Streaming voice analysis architecture

The Bottom Line

LiveKit provides production-ready infrastructure for voice analysis applications, handling the complex WebRTC networking, reliable recording, and scalable media routing—so you can focus on voice analysis algorithms rather than reinventing real-time communication.

Key advantages for voice analysis: 48 kHz audio quality (preserves acoustic features ML models need), server-side Egress recording (eliminates client-side failure points), flexible deployment (self-host for control or use LiveKit Cloud for simplicity), comprehensive SDKs (React, Swift, Kotlin, Python), and webhook-driven workflows (trigger analysis pipeline automatically).

The setup process is straightforward: deploy LiveKit server (Docker or cloud), integrate client SDK (React component), generate tokens server-side, configure audio constraints for quality, start Egress recording, handle webhook for completion, and trigger voice analysis pipeline. With proper configuration, you'll have professional-grade voice capture infrastructure in hours, not months.

For most voice analysis applications, self-hosting LiveKit is the optimal choice: lower cost at scale (~$100/month handles 10,000 sessions vs $2,700 on LiveKit Cloud), full control over audio quality settings, data sovereignty (recordings never leave your infrastructure), and ability to customize (add features, optimize performance). LiveKit Cloud makes sense for rapid prototyping or small-scale apps where managed service convenience outweighs cost.

Whether you're building voice biometrics, speech analysis, mental health screening, or conversational AI, LiveKit provides the foundation—reliable, scalable, high-quality real-time voice infrastructure that just works.

Ready to integrate LiveKit into your voice analysis platform?

See LiveKit in Action (Voice Mirror Demo)

Voice Mirror uses LiveKit for real-time voice interviews, recording 48 kHz audio that preserves the acoustic detail needed for accurate voice analysis. Our setup handles 1,000+ concurrent sessions with 99.9% recording reliability.

#LiveKit #WebRTC #real-time-audio #voice-recording #infrastructure #deployment
