OpsVoice Demo Presentation

3:00 AM

$ kubectl get pods

ERROR: connection refused - port 5432

CRITICAL: api-server CrashLoopBackOff

FATAL: OOM Kill - container exceeded 512Mi

Your senior DevOps engineer is not answering...

What if you had one that never sleeps?

🔧

OpsVoice

"Voice-First SRE Copilot powered by Gemini Live + Google ADK"

🎤

VOICE + VISION

Real-Time Streaming

Talk naturally while sharing your camera - OpsVoice sees errors, hears you, and responds instantly with native audio

🚨

INCIDENT MGMT

Create, Track, Resolve

Full incident lifecycle - create incidents, check open issues, update status, and retrieve runbooks hands-free

⚡

PROACTIVE

Alert Monitor

Background service health polling with proactive WebSocket alerts - OpsVoice warns you before things break

Powered by Gemini 2.5 Flash Live API + Google ADK + Cloud Run

🔴 LIVE DEMO

Watch OpsVoice in Action

Switch to your app screen recording here

OpsVoice

System Architecture

Voice-First SRE Copilot powered by Gemini Live + Google ADK

User Layer

🖥

Browser

SRE operator interface with real-time voice, text, and camera inputs

VOICE + TEXT + CAMERA

SEND

RECEIVE

Frontend (In-Browser)

➡️

WebSocket Connection

Full-duplex bidirectional streaming between browser and server

BIDIRECTIONAL

🎙️

Audio Processing

Real-time PCM capture at 16kHz mono, base64 encoded for streaming

PCM 16KHZ
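
A minimal sketch of how one audio chunk might be packaged for the WebSocket, assuming 16-bit little-endian samples and a JSON envelope. The slide only states 16 kHz mono PCM with base64 encoding, so the sample width, the field names (`type`, `mime_type`, `data`), and both helper functions are hypothetical. The browser would do the equivalent in JavaScript.

```python
import base64
import json
import struct

SAMPLE_RATE_HZ = 16_000  # from the slide: PCM at 16 kHz, mono


def encode_pcm_chunk(samples: list[float]) -> str:
    """Convert float samples in [-1.0, 1.0] to 16-bit PCM and wrap them in JSON."""
    # Clamp each sample, then scale to the signed 16-bit range (assumed width).
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16
    return json.dumps({
        "type": "audio",  # hypothetical envelope field
        "mime_type": f"audio/pcm;rate={SAMPLE_RATE_HZ}",
        "data": base64.b64encode(pcm).decode("ascii"),
    })


def decode_pcm_chunk(message: str) -> bytes:
    """Server side: recover the raw PCM bytes from the JSON envelope."""
    return base64.b64decode(json.loads(message)["data"])
```

Base64 inflates the payload by roughly a third, which is the price of sending audio inside text frames rather than binary WebSocket frames.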

📷

Camera Frame Capture

Periodic frame capture from device camera, JPEG compressed

JPEG FRAMES

⬇️ WEBSOCKET / WSS ⬆️

Backend (Google Cloud Run)

</>

FastAPI Server

Async Python server handling WebSocket upgrades and HTTP endpoints

PYTHON / ASYNC

⚙️

Google ADK Runner

Agent Development Kit orchestrating tool calls and Gemini sessions

ADK FRAMEWORK

👤

Session Management

Per-user session state, conversation history, and context tracking

STATEFUL
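
One way to picture per-user session state is a small in-memory store keyed by user ID. This is only an illustrative sketch: `SessionState`, `SessionStore`, and their fields are hypothetical names, and the actual project may rely on a session service provided by ADK instead.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical per-user state; field names are illustrative."""
    user_id: str
    history: list[dict] = field(default_factory=list)  # conversation turns
    context: dict = field(default_factory=dict)        # e.g. last incident discussed

    def add_turn(self, role: str, text: str) -> None:
        self.history.append({"role": role, "text": text})


class SessionStore:
    """In-memory store keyed by user ID (state is lost on restart)."""

    def __init__(self) -> None:
        self._sessions: dict[str, SessionState] = {}

    def get_or_create(self, user_id: str) -> SessionState:
        # Reuse the existing session so history survives across messages.
        if user_id not in self._sessions:
            self._sessions[user_id] = SessionState(user_id=user_id)
        return self._sessions[user_id]
```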

⚠️

Proactive Alert Monitor

Background task that polls service health and broadcasts alerts via WebSocket

ASYNC BROADCAST
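
The polling-and-broadcast pattern can be sketched with plain `asyncio`. Everything here is an assumption made for illustration: the `AlertMonitor` class, the probe returning a `{service: status}` mapping, the `"healthy"` status string, the 30-second interval, and the choice to alert only when a service's status changes. Each connected client is represented by an async send function, standing in for a WebSocket's send method.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical types: a probe returns {service: status};
# each client is represented by an async send function.
HealthProbe = Callable[[], Awaitable[dict[str, str]]]
Send = Callable[[dict], Awaitable[None]]


class AlertMonitor:
    def __init__(self, probe: HealthProbe, interval_s: float = 30.0) -> None:
        self.probe = probe
        self.interval_s = interval_s
        self.clients: set[Send] = set()
        self._last: dict[str, str] = {}

    async def poll_once(self) -> list[dict]:
        """Probe once and broadcast an alert for each newly unhealthy service."""
        statuses = await self.probe()
        alerts = [
            {"type": "alert", "service": name, "status": status}
            for name, status in statuses.items()
            if status != "healthy" and self._last.get(name) != status
        ]
        self._last = statuses
        for alert in alerts:
            # Send to all clients concurrently; one failed send does not stop the rest.
            await asyncio.gather(
                *(send(alert) for send in self.clients), return_exceptions=True
            )
        return alerts

    async def run(self) -> None:
        """Background loop; start it with asyncio.create_task(monitor.run())."""
        while True:
            await self.poll_once()
            await asyncio.sleep(self.interval_s)
```

Remembering the previous statuses keeps the monitor from repeating the same alert on every poll while a service stays down.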

⬇️ GEMINI LIVE API ⬆️

AI Layer

✨

Gemini 2.5 Flash

Multimodal AI with native audio understanding and generation capabilities

NATIVE AUDIO

⚡

Live Bidirectional Streaming

Real-time audio-in, audio-out with interruptible conversation flow

REAL-TIME

⬇️ TOOL CALLS

⬇️ INFRASTRUCTURE

Agent Tools

⚔️

check_service_health

Poll status of monitored services

📝

create_incident

Open new incident records in Firestore

📋

get_open_incidents

List active incident records

✅

update_incident_status

Resolve or escalate incidents

📖

get_runbook

Retrieve remediation playbooks

🔍

google_search

Web search for troubleshooting context
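
The three incident tools above could look roughly like the following. This is a sketch under stated assumptions, not the project's code: a module-level dictionary stands in for the Firestore collection, and the parameter names, the `INC-0001` ID format, and the status strings are all hypothetical.

```python
import itertools
from datetime import datetime, timezone

# In-memory stand-in for the Firestore incidents collection (assumption).
_INCIDENTS: dict[str, dict] = {}
_ids = itertools.count(1)


def create_incident(title: str, severity: str = "medium") -> dict:
    """Open a new incident record and return it."""
    incident_id = f"INC-{next(_ids):04d}"
    record = {
        "id": incident_id,
        "title": title,
        "severity": severity,
        "status": "open",
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    _INCIDENTS[incident_id] = record
    return record


def get_open_incidents() -> list[dict]:
    """List incidents that have not been resolved."""
    return [r for r in _INCIDENTS.values() if r["status"] != "resolved"]


def update_incident_status(incident_id: str, status: str) -> dict:
    """Set a new status, for example 'resolved' or 'escalated'."""
    if incident_id not in _INCIDENTS:
        # Return an error payload instead of raising, so the model can relay it.
        return {"error": f"unknown incident {incident_id}"}
    _INCIDENTS[incident_id]["status"] = status
    return _INCIDENTS[incident_id]
```

Plain functions with typed parameters and docstrings are a natural fit for an agent framework's function-tool support, since the model can read the signature and description to decide when to call each one.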

Google Cloud Services

☁️

Cloud Run

Serverless container hosting

💾

Firestore

Incidents database (NoSQL)

🔐

Secret Manager

Secure API key storage

📦

Artifact Registry

Container image repository

</>

Cloud Build

CI/CD pipeline

📊

Cloud Logging

Centralized log aggregation

Data Flow Paths

1. Voice/Text Request: Browser → WebSocket → FastAPI → ADK Runner → Gemini Live API

2. Tool Execution: Gemini Live API → Tool Calls → Service Health / Firestore

3. AI Response: Gemini Live API → Audio/Text → WebSocket → Browser

4. Proactive Alerts: Alert Monitor → WS Broadcast → All Browsers

Legend: User, Frontend, Backend, AI, Tools, Cloud

Technical Highlights

✨

Gemini Native Audio

gemini-2.5-flash-native-audio-preview with Live API

🤖

Google ADK

Runner.run_live() + LiveRequestQueue

🎤

Voice + Camera + Text

PCM 16kHz audio + JPEG frames, all via WebSocket

🚨

6 Agent Tools

Incident CRUD, health checks, runbooks, grounded search

⚡

Proactive Alerts

Background health polling with async WebSocket broadcast

☁️

Cloud Native

Cloud Run + Firestore + Secret Manager + Terraform

"AI interaction does not need a text box.
By combining real-time voice with computer vision,
we created an experience that feels like having a
senior engineer right beside you."

OpsVoice

💻 Open Source on GitHub

☁️ Live on Google Cloud

📝 Blog Post on Dev.to

Thank you for watching