3:00 AM
$ kubectl get pods
ERROR: connection refused - port 5432
CRITICAL: api-server CrashLoopBackOff
FATAL: OOM Kill - container exceeded 512Mi
Your senior DevOps engineer is not answering...
What if you had one that never sleeps?
OpsVoice
"Voice-First SRE Copilot powered by Gemini Live + Google ADK"
🎤
VOICE + VISION
Real-Time Streaming
Talk naturally while sharing your camera - OpsVoice sees errors, hears you, and responds instantly with native audio
🚨
INCIDENT MGMT
Create, Track, Resolve
Full incident lifecycle - create incidents, check open issues, update status, and retrieve runbooks hands-free
PROACTIVE
Alert Monitor
Background service health polling with proactive WebSocket alerts - OpsVoice warns you before things break
Powered by: Gemini 2.5 Flash Live API · Google ADK · Cloud Run
🔴 LIVE DEMO
Watch OpsVoice in Action
OpsVoice
System Architecture
🖥
Browser
SRE operator interface with real-time voice, text, and camera inputs
VOICE + TEXT + CAMERA
SEND
RECEIVE
➡️
WebSocket Connection
Full-duplex bidirectional streaming between browser and server
BIDIRECTIONAL
🎙️
Audio Processing
Real-time PCM capture at 16 kHz mono, base64-encoded for streaming
PCM 16KHZ
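A minimal sketch of the audio framing described above, shown in Python for consistency with the server side (the browser would do the equivalent in JavaScript). It assumes 16-bit little-endian samples and a JSON envelope; the field names `type`, `mime_type`, and `data` are hypothetical and not taken from the OpsVoice source.

```python
import base64
import json
import struct

SAMPLE_RATE_HZ = 16_000  # from the slide: 16 kHz mono


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # clamp out-of-range input
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_audio_message(pcm_bytes):
    """Wrap raw PCM in a JSON text frame (assumed message shape)."""
    return json.dumps({
        "type": "audio",
        "mime_type": f"audio/pcm;rate={SAMPLE_RATE_HZ}",
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
    })


def decode_audio_message(text):
    """Recover the raw PCM bytes from a frame built by encode_audio_message."""
    return base64.b64decode(json.loads(text)["data"])
```

Base64 inflates the payload by roughly a third, so a binary WebSocket frame would be a reasonable alternative where bandwidth matters.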
📷
Camera Frame Capture
Periodic frame capture from device camera, JPEG compressed
JPEG FRAMES
⬇️ WEBSOCKET / WSS ⬆️
</>
FastAPI Server
Async Python server handling WebSocket upgrades and HTTP endpoints
PYTHON / ASYNC
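As a rough illustration of what the server does with each inbound WebSocket frame, the sketch below parses one JSON text message into a kind and a payload. The FastAPI endpoint wiring is omitted, and the message shape (`type`, `data`, `text`) is an assumption, not the project's actual protocol.

```python
import base64
import json


def parse_client_message(raw):
    """Return (kind, payload) for one inbound text frame.

    kind is 'audio', 'image', or 'text'; payload is bytes for media
    and str for text. The envelope fields are hypothetical.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind in ("audio", "image"):
        return kind, base64.b64decode(msg["data"])
    if kind == "text":
        return kind, str(msg["text"])
    raise ValueError(f"unsupported message type: {kind!r}")
```

In a real handler, the returned payload would then be forwarded to the agent session; rejecting unknown types early keeps malformed frames out of the model stream.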
⚙️
Google ADK Runner
Agent Development Kit orchestrating tool calls and Gemini sessions
ADK FRAMEWORK
👤
Session Management
Per-user session state, conversation history, and context tracking
STATEFUL
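One way to picture per-user session state is an in-memory store keyed by user ID. This is a sketch under stated assumptions: the `Session` and `SessionStore` names, the history cap, and the in-process dictionary are all illustrative, and the actual project may rely on ADK's own session services instead.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    user_id: str
    history: list = field(default_factory=list)   # (role, text) tuples
    context: dict = field(default_factory=dict)   # e.g. the incident under discussion


class SessionStore:
    """Hypothetical in-memory store; state is lost when the process restarts."""

    def __init__(self, max_turns=50):
        self._sessions = {}
        self._max_turns = max_turns  # assumed cap, must be positive

    def get_or_create(self, user_id):
        if user_id not in self._sessions:
            self._sessions[user_id] = Session(user_id)
        return self._sessions[user_id]

    def add_turn(self, user_id, role, text):
        session = self.get_or_create(user_id)
        session.history.append((role, text))
        # Keep only the most recent max_turns entries.
        del session.history[:-self._max_turns]
        return session
```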
⚠️
Proactive Alert Monitor
Background task that polls service health and broadcasts alerts via WebSocket
ASYNC BROADCAST
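The polling-and-broadcast pattern can be sketched with plain asyncio. Everything here is an assumption made for illustration: the `AlertMonitor` class, the `check_health` callable returning a service-to-status mapping, the 30-second interval, and the use of one queue per connected client in place of real WebSocket connections.

```python
import asyncio


class AlertMonitor:
    def __init__(self, check_health, interval_s=30.0):
        self._check_health = check_health  # hypothetical: () -> {service: status}
        self._interval_s = interval_s
        self._subscribers = set()          # one asyncio.Queue per connected client
        self._last_status = {}

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    async def poll_once(self):
        """Check health once and fan out an alert for each newly unhealthy state."""
        status = self._check_health()
        alerts = [
            {"type": "alert", "service": service, "status": state}
            for service, state in status.items()
            if state != "healthy" and self._last_status.get(service) != state
        ]
        self._last_status = dict(status)
        for alert in alerts:
            for queue in self._subscribers:
                queue.put_nowait(alert)
        return alerts

    async def run(self):
        """Background loop; start it with asyncio.create_task(monitor.run())."""
        while True:
            await self.poll_once()
            await asyncio.sleep(self._interval_s)
```

Alerting only on state changes avoids repeating the same warning on every poll; each WebSocket handler would drain its own queue and forward the alerts to its browser.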
⬇️ GEMINI LIVE API ⬆️
Gemini 2.5 Flash
Multimodal AI with native audio understanding and generation capabilities
NATIVE AUDIO
Live Bidirectional Streaming
Real-time audio-in, audio-out with interruptible conversation flow
REAL-TIME
⬇️ TOOL CALLS
⬇️ INFRASTRUCTURE
⚔️
check_service_health
Poll status of monitored services
📝
create_incident
Open new incident records in Firestore
📋
get_open_incidents
List active incident records
update_incident_status
Resolve or escalate incidents
📖
get_runbook
Retrieve remediation playbooks
🔍
google_search
Web search for troubleshooting context
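To make the incident tools concrete, here is a sketch of four of them as plain Python functions. The function names come from the slide; the parameters, return shapes, ID format, and the in-memory dictionary standing in for Firestore are assumptions, and the runbook text is a placeholder.

```python
import itertools

_INCIDENTS = {}                  # stand-in for the Firestore collection
_next_id = itertools.count(1)
_RUNBOOKS = {                    # placeholder content, not a real runbook
    "oom-kill": "Check memory usage, then raise the limit or fix the leak.",
}


def create_incident(title: str, severity: str = "medium") -> dict:
    """Open a new incident record."""
    incident_id = f"INC-{next(_next_id):04d}"
    incident = {"id": incident_id, "title": title,
                "severity": severity, "status": "open"}
    _INCIDENTS[incident_id] = incident
    return incident


def get_open_incidents() -> list:
    """List every incident that has not been resolved."""
    return [i for i in _INCIDENTS.values() if i["status"] != "resolved"]


def update_incident_status(incident_id: str, status: str) -> dict:
    """Set a new status such as 'resolved' or 'escalated'."""
    if incident_id not in _INCIDENTS:
        return {"error": f"unknown incident {incident_id}"}
    _INCIDENTS[incident_id]["status"] = status
    return _INCIDENTS[incident_id]


def get_runbook(topic: str) -> dict:
    """Look up a remediation playbook by topic."""
    return {"topic": topic, "steps": _RUNBOOKS.get(topic, "No runbook found.")}
```

Returning plain dictionaries, including for the error case, gives the model a structured result it can read back to the operator by voice.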
☁️
Cloud Run
Serverless container hosting
💾
Firestore
Incidents database (NoSQL)
🔐
Secret Manager
Secure API key storage
📦
Artifact Registry
Container image repository
</>
Cloud Build
CI/CD pipeline
📊
Cloud Logging
Centralized log aggregation
Data Flow Paths
1. Voice/Text Request
Browser → WebSocket → FastAPI → ADK Runner → Gemini Live API
2. Tool Execution
Gemini Live API → Tool Calls → Service Health / Firestore
3. AI Response
Gemini Live API → Audio/Text → WebSocket → Browser
4. Proactive Alerts
Alert Monitor → WS Broadcast → All Browsers
User
Frontend
Backend
AI
Tools
Cloud
Technical Highlights
Gemini Native Audio
gemini-2.5-flash-native-audio-preview with Live API
🤖
Google ADK
Runner.run_live() + LiveRequestQueue
🎤
Voice + Camera + Text
PCM 16kHz audio + JPEG frames, all via WebSocket
🚨
6 Agent Tools
Incident CRUD, health checks, runbooks, grounded search
Proactive Alerts
Background health polling with async WebSocket broadcast
☁️
Cloud Native
Cloud Run + Firestore + Secret Manager + Terraform
"AI interaction does not need a text box.
By combining real-time voice with computer vision,
we created an experience that feels like having a
senior engineer right beside you."
OpsVoice
Thank you for watching