OpsVoice System Architecture

Voice-First SRE Copilot powered by Gemini Live + Google ADK

User Layer
Browser
SRE operator interface with real-time voice, text, and camera inputs
Voice + Text + Camera
Frontend (In-Browser)
WebSocket Connection
Full-duplex bidirectional streaming between browser and server
Bidirectional
Audio Processing
Real-time PCM capture at 16 kHz mono, base64-encoded for streaming
PCM 16 kHz
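As a rough illustration of the audio path described above, the following Python sketch packs samples as PCM and base64-encodes them, then reverses the process as a server might. The diagram states only the sample rate and channel count, so the 16-bit little-endian sample format and the function names `encode_pcm_chunk` and `decode_pcm_chunk` are assumptions made for this example, not details of OpsVoice itself.

```python
import base64
import struct

SAMPLE_RATE_HZ = 16_000  # from the diagram: 16 kHz mono


def encode_pcm_chunk(samples):
    """Convert float samples in [-1.0, 1.0] to base64-encoded PCM text.

    Assumption: 16-bit signed little-endian samples. The diagram does not
    state the bit depth or byte order.
    """
    # Clip out-of-range values so they cannot overflow the int16 range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    raw = struct.pack(f"<{len(ints)}h", *ints)
    return base64.b64encode(raw).decode("ascii")


def decode_pcm_chunk(payload):
    """Inverse step: base64 text back to a list of int16 sample values."""
    raw = base64.b64decode(payload)
    count = len(raw) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"<{count}h", raw))
```

Base64 makes the binary audio safe to embed in a text (for example JSON) WebSocket message, at the cost of roughly one third more bytes than sending binary frames.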
Camera Frame Capture
Periodic frame capture from the device camera, JPEG-compressed
JPEG Frames
WebSocket / WSS
Backend (Google Cloud Run)
FastAPI Server
Async Python server handling WebSocket upgrades and HTTP endpoints
Python / Async
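One way an async server could route the incoming voice, text, and camera messages is a small dispatch table keyed on message type. The sketch below uses only the standard library and leaves out the FastAPI WebSocket wiring. The message types (`audio`, `text`, `image`), field names, and reply shapes are all hypothetical, because the diagram does not describe the wire format.

```python
import asyncio
import json


async def handle_audio(msg):
    # Hypothetical: "data" carries a base64-encoded PCM chunk.
    return {"type": "ack", "kind": "audio", "chars": len(msg.get("data", ""))}


async def handle_text(msg):
    return {"type": "ack", "kind": "text", "text": msg.get("text", "")}


async def handle_image(msg):
    # Hypothetical: "data" carries a base64-encoded JPEG frame.
    return {"type": "ack", "kind": "image", "chars": len(msg.get("data", ""))}


# Dispatch table mapping the assumed "type" field to a coroutine handler.
HANDLERS = {"audio": handle_audio, "text": handle_text, "image": handle_image}


async def dispatch(raw):
    """Parse one incoming message and route it to the matching handler."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return {"type": "error", "reason": "invalid_json"}
    if not isinstance(msg, dict):
        return {"type": "error", "reason": "invalid_message"}
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        return {"type": "error", "reason": "unknown_type"}
    return await handler(msg)
```

In a real server, each handler would forward its payload to the agent runner instead of returning an acknowledgement.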
Google ADK Runner
Agent Development Kit orchestrating tool calls and Gemini sessions
ADK Framework
Session Management
Per-user session state, conversation history, and context tracking
Stateful
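A minimal in-memory sketch of per-user session state with conversation history and context might look like the following. The `Session` and `SessionStore` names and their fields are hypothetical. ADK provides its own session abstractions, which a real deployment would more likely use, and in-memory state would not survive a Cloud Run instance restart.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user session record."""
    user_id: str
    history: list = field(default_factory=list)   # conversation turns, in order
    context: dict = field(default_factory=dict)   # e.g. the incident in focus

    def add_turn(self, role, text):
        self.history.append({"role": role, "text": text})


class SessionStore:
    """Keeps one Session per user ID, created on first access."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, user_id):
        if user_id not in self._sessions:
            self._sessions[user_id] = Session(user_id=user_id)
        return self._sessions[user_id]

    def drop(self, user_id):
        # Remove a session, for example when the WebSocket closes.
        self._sessions.pop(user_id, None)
```

Usage: call `get_or_create` when a connection opens, `add_turn` for each user or model turn, and `drop` on disconnect.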
Proactive Alert Monitor
Background task that polls service health and broadcasts alerts via WebSocket
Async Broadcast
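The polling-and-broadcast pattern described above can be sketched with `asyncio` alone. In this example each connected client is stood in for by an `asyncio.Queue` rather than a real WebSocket. The `AlertMonitor` class, the `"healthy"` status string, the alert message shape, and the 30-second default interval are all assumptions made for illustration.

```python
import asyncio


class AlertMonitor:
    """Polls a health check and fans alerts out to every registered client."""

    def __init__(self, check_health, interval_s=30.0):
        # check_health: async callable returning {service_name: status}.
        self._check_health = check_health
        self._interval_s = interval_s  # assumed polling interval
        self._clients = set()

    def register(self, queue):
        self._clients.add(queue)

    def unregister(self, queue):
        self._clients.discard(queue)

    async def poll_once(self):
        """Run one health check and broadcast an alert per unhealthy service."""
        statuses = await self._check_health()
        alerts = [
            {"type": "alert", "service": name, "status": status}
            for name, status in statuses.items()
            if status != "healthy"
        ]
        for alert in alerts:
            # Copy the set so clients may unregister during the broadcast.
            for queue in list(self._clients):
                queue.put_nowait(alert)
        return alerts

    async def run(self):
        """Background loop; start it with asyncio.create_task(monitor.run())."""
        while True:
            await self.poll_once()
            await asyncio.sleep(self._interval_s)
```

Separating `poll_once` from `run` keeps the polling logic testable without waiting on the sleep interval.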
Gemini Live API
AI Layer
Gemini 2.5 Flash
Multimodal AI with native audio understanding and generation capabilities
Native Audio
Live Bidirectional Streaming
Real-time audio-in, audio-out with interruptible conversation flow
Real-Time
Tool Calls
Infrastructure
Agent Tools
check_service_health
Poll status of monitored services
create_incident
Open new incident records in Firestore
get_open_incidents
List active incident records
update_incident_status
Resolve or escalate incidents
get_runbook
Retrieve remediation playbooks
google_search
Web search for troubleshooting context
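To make the incident tools concrete, here is a hedged sketch of three of them as plain Python functions. An in-memory dict stands in for Firestore so the example runs on its own. The parameter names, the `INC-n` ID scheme, the status values, and the returned record shape are assumptions, since the diagram names the tools but not their signatures.

```python
import itertools

# In-memory stand-in for the Firestore incidents collection (assumption).
_INCIDENTS = {}
_next_id = itertools.count(1)

VALID_STATUSES = ("open", "escalated", "resolved")  # assumed status values


def create_incident(title: str, severity: str = "medium") -> dict:
    """Open a new incident record and return a copy of it."""
    incident_id = f"INC-{next(_next_id)}"
    record = {"id": incident_id, "title": title,
              "severity": severity, "status": "open"}
    _INCIDENTS[incident_id] = record
    return dict(record)


def get_open_incidents() -> list:
    """List every incident that has not been resolved."""
    return [dict(r) for r in _INCIDENTS.values() if r["status"] != "resolved"]


def update_incident_status(incident_id: str, status: str) -> dict:
    """Resolve or escalate an incident; returns an error dict on bad input."""
    if status not in VALID_STATUSES:
        return {"error": f"invalid status {status!r}"}
    record = _INCIDENTS.get(incident_id)
    if record is None:
        return {"error": f"unknown incident {incident_id!r}"}
    record["status"] = status
    return dict(record)
```

Returning an error dict rather than raising lets the model read the failure and recover within the conversation, which is a common choice for tool functions.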
Google Cloud Services
Cloud Run
Serverless container hosting
Firestore
Incidents database (NoSQL)
Secret Manager
Secure API key storage
Artifact Registry
Container image repository
Cloud Build
CI/CD pipeline
Cloud Logging
Centralized log aggregation

Data Flow Paths

1. Voice/Text Request
Browser → WebSocket → FastAPI → ADK Runner → Gemini Live API

2. Tool Execution
Gemini Live API → Tool Calls → Service Health / Firestore

3. AI Response
Gemini Live API → Audio/Text → WebSocket → Browser

4. Proactive Alerts
Alert Monitor → WS Broadcast → All Browsers