Natural, voice‑first conversational AI inspired by Acharya Chanakya: Speak → Transcribe (AssemblyAI) → Reason (Gemini, Chanakya persona) → Respond with realistic speech (Murf)
- One‑tap voice chat (microphone → AI answer with auto‑played voice)
- Multi‑stage pipeline: STT → LLM → TTS
- Persistent in‑memory session history (per browser session id)
- Real‑time web search via Tavily (Gemini Function Calling)
- WebSocket live transcripts + streamed TTS playback
- Public demo safety: features are gated until users provide their own API keys (no shared secrets)
- Sidebar Tools:
  - Text‑to‑Speech generator (choose text → Murf voice output)
  - Echo Bot (record → transcribe → re‑speak your words in another voice)
- Keyboard shortcut: press "m" to toggle mic on/off
- User presses Start Speaking → Browser records audio (MediaRecorder)
- Audio uploaded to `/agent/chat/{session_id}`
- AssemblyAI transcribes bytes → text
- Chat history compiled into a Gemini prompt
- Gemini generates assistant reply
- Murf API converts reply text to speech (default voice: en-US-charles)
- Frontend auto‑plays the returned audio & renders chat bubbles
User Voice → FastAPI → AssemblyAI → Gemini → Murf → Browser Playback
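The sketch below condenses that flow into a single FastAPI handler. The helper names (`stt_service.transcribe`, `llm_service.generate_reply`, `tts_service.synthesize`) are illustrative stand-ins for the real service-layer functions, not the exact signatures used in this repo.

```python
# Simplified sketch of the voice-chat pipeline (helper names are illustrative;
# the real handler lives in app/main.py and calls into app/services/).
from fastapi import FastAPI, File, UploadFile

from app.services import llm_service, stt_service, tts_service  # hypothetical imports

app = FastAPI()
CHAT_HISTORY: dict[str, list[dict]] = {}  # session_id -> [{"role": ..., "content": ...}]

@app.post("/agent/chat/{session_id}")
async def agent_chat(session_id: str, file: UploadFile = File(...)):
    audio_bytes = await file.read()

    # 1. STT: AssemblyAI turns the recorded audio into text
    user_text = stt_service.transcribe(audio_bytes)

    # 2. LLM: compile the session history into a Gemini prompt and get a reply
    history = CHAT_HISTORY.setdefault(session_id, [])
    history.append({"role": "user", "content": user_text})
    reply_text = llm_service.generate_reply(history)
    history.append({"role": "assistant", "content": reply_text})

    # 3. TTS: Murf converts the reply to speech (default voice en-US-charles)
    audio_url = tts_service.synthesize(reply_text, voice_id="en-US-charles")

    return {"transcript": user_text, "reply": reply_text, "audio_url": audio_url}
```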
Also supports real‑time streaming via WebSocket (/ws) with partial transcripts and chunked TTS audio.
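For the streaming path, a minimal Python test client could look like the sketch below. The framing (binary audio chunks upstream, text/JSON transcript events and audio chunks back) is an assumption based on the description above, not the exact protocol used by `/ws`.

```python
# Minimal test client for the /ws endpoint, using the third-party `websockets`
# package. The message framing here is assumed, not taken from the server code.
import asyncio
import websockets

async def stream_audio(path: str = "recording.webm"):
    async with websockets.connect("ws://127.0.0.1:8000/ws") as ws:
        # Push recorded audio upstream in small binary chunks
        with open(path, "rb") as f:
            while chunk := f.read(4096):
                await ws.send(chunk)
        # Print whatever comes back: partial transcripts (text) or TTS audio (bytes)
        async for message in ws:
            if isinstance(message, (bytes, bytearray)):
                print(f"<audio chunk: {len(message)} bytes>")
            else:
                print(message)

asyncio.run(stream_audio())
```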
```
app/
├── main.py                       # FastAPI entrypoint (routes import service layer)
├── services/                     # Separated domain/service logic
│   ├── stt_service.py            # AssemblyAI transcription helpers
│   ├── tts_service.py            # Murf.ai TTS client wrapper
│   ├── llm_service.py            # Gemini client + prompt builder + function calling
│   ├── weather_service.py
│   ├── murf_ws_service.py        # Murf WebSocket streaming (chunked TTS)
│   ├── web_search_service.py     # Tavily search wrapper
│   └── streaming_transcriber.py  # AssemblyAI streaming transcription
├── schemas/                      # Pydantic request/response models
│   └── tts.py                    # TextToSpeechRequest, ChatResponse, etc.
├── templates/
│   └── index.html                # UI shell (chat + sidebar tools)
├── static/
│   ├── css/style.css             # Styles (layout + responsive + theme)
│   ├── JS/script.js              # Frontend logic (record, upload, autoplay)
│   ├── images/                   # Logo, screenshot, demo GIF
│   │   ├── logo.png
│   │   ├── ui-screenshot.png
│   │   └── demo.gif
│   └── sounds/                   # Mic UI feedback
│       ├── mic_start.mp3
│       └── mic_stop.mp3
└── uploads/                      # (Optional) temp upload storage placeholder
requirements.txt                  # Dependencies
.env                              # Optional server fallback keys (NOT committed)
.gitignore                        # Ignore rules
README.md                         # This file
```
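The `schemas/` layer above holds the Pydantic request/response models. A representative sketch is shown below; only the model names come from the tree, the fields and defaults are assumptions for illustration.

```python
# Representative sketch of app/schemas/tts.py (field names are assumptions).
from typing import Optional

from pydantic import BaseModel

class TextToSpeechRequest(BaseModel):
    text: str
    voice_id: str = "en-US-charles"   # assumed default, matching the chat voice

class ChatResponse(BaseModel):
    transcript: str                   # what AssemblyAI heard
    reply: str                        # Gemini's answer in the Chanakya persona
    audio_url: Optional[str] = None   # Murf audio for the reply, if TTS succeeded
```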
Create a .env file in the project root (optional; for local fallback):
```env
ASSEMBLYAI_API_KEY=your_assemblyai_key
GEMINI_API_KEY=your_gemini_key
MURF_API_KEY=your_murf_key
TAVILY_API_KEY=your_tavily_key
OPENWEATHER_API_KEY=your_openweather_key
```
Notes:
- For public deployments, users must enter their own keys via the in‑app Settings modal. Server keys are optional fallback for private/dev.
- Do not commit `.env`. Share a `.env.example` with placeholders instead.
- AssemblyAI: https://www.assemblyai.com/app/account
- Gemini (Google AI Studio): https://aistudio.google.com/app/apikey
- Murf AI: https://murf.ai/api (Account settings → API key)
- Tavily: https://app.tavily.com/ (Dashboard → API Keys)
- OpenWeather: https://home.openweathermap.org/api_keys
Tip: copy .env.example to .env and fill your values. Never commit .env.
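On the server side, the optional fallback keys are read from the environment. A minimal sketch assuming `python-dotenv` (the variable names match the `.env` example above):

```python
# Minimal sketch of server-side fallback key loading with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root if present

ASSEMBLYAI_API_KEY = os.getenv("ASSEMBLYAI_API_KEY")  # None -> user must supply a key in Settings
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
MURF_API_KEY = os.getenv("MURF_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
OPENWEATHER_API_KEY = os.getenv("OPENWEATHER_API_KEY")
```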
```bash
# 1. Create & activate a virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux

# 2. Install dependencies
pip install -r requirements.txt

# 3. Add your .env file (see above)

# 4. Run the server (simple dev mode)
cd app && python main.py

# 5. Open in browser
# http://127.0.0.1:8000/

# (Alt) Use uvicorn directly for auto-reload (optional)
# cd app && uvicorn main:app --reload
```

| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/agent/chat/{session_id}` | Voice chat: audio → transcription → LLM → TTS |
| POST | `/tts/echo` | Echo tool (repeat what you said with Murf) |
| POST | `/generate_audio` | Direct text → speech (Murf) |
| POST | `/transcribe/file` | Raw transcription (AssemblyAI) |
| WS | `/ws` | Streaming: partial transcripts + chunked TTS |
| GET | `/debug/web_search` | Tavily test: `?query=your+question` |
| GET | `/debug/llm_chat` | LLM (no audio): `?q=hello` |
| POST | `/debug/llm_chat_text` | LLM (no audio): `{ "text": "hello" }` |
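For a quick smoke test without the microphone flow, the text-only debug endpoints can be called directly. A sketch using `requests` (the exact response shape may differ):

```python
# Smoke test for the text-only debug endpoints (no audio involved).
import requests

BASE = "http://127.0.0.1:8000"

r = requests.get(f"{BASE}/debug/llm_chat", params={"q": "hello"})
print(r.status_code, r.json())

r = requests.post(f"{BASE}/debug/llm_chat_text", json={"text": "Who was Chanakya?"})
print(r.status_code, r.json())

r = requests.get(f"{BASE}/debug/web_search", params={"query": "latest AI news"})
print(r.status_code, r.json())
```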
- FastAPI backend with service + schema layering (clean separation)
- AssemblyAI transcription (resilient + fallback path)
- Google Gemini (gemini-1.5-flash) via reusable client & retry logic
- Gemini Function Calling with a `web_search` tool backed by Tavily (see the sketch after this list)
- Murf AI TTS wrapped in a lightweight client (consistent error handling)
- Murf WebSocket streaming with safe chunking to speak full answers
- MediaRecorder + multipart upload for low-latency voice capture
- Autoplay + replay logic with audio unlock and retry
- Structured Pydantic responses for clearer API contracts
- Per‑session key overrides wired from UI → backend (no keys echoed back)
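As referenced in the list above, the `web_search` tool is exposed to Gemini through function calling. A condensed sketch using the `google-generativeai` and `tavily-python` SDKs is shown below; the actual wiring in `llm_service.py` / `web_search_service.py` may differ.

```python
# Condensed sketch: a Tavily-backed web_search tool registered with Gemini's
# automatic function calling (google-generativeai + tavily-python).
import os

import google.generativeai as genai
from tavily import TavilyClient

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

def web_search(query: str) -> dict:
    """Search the live web and return Tavily's top results."""
    return tavily.search(query=query, max_results=3)

model = genai.GenerativeModel("gemini-1.5-flash", tools=[web_search])
chat = model.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("What are today's top technology headlines?").text)
```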
Browser session id is appended to the URL (query param). History is stored in an in‑memory dict (CHAT_HISTORY) — suitable for prototyping; swap with Redis or DB for production scaling.
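If you do swap the dict out, a Redis-backed version of the same history API could look like the sketch below (key naming and client settings are assumptions):

```python
# Sketch: replacing the in-memory CHAT_HISTORY dict with Redis so history
# survives restarts and can be shared across workers.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def append_message(session_id: str, role: str, content: str) -> None:
    r.rpush(f"chat:{session_id}", json.dumps({"role": role, "content": content}))

def get_history(session_id: str) -> list[dict]:
    return [json.loads(item) for item in r.lrange(f"chat:{session_id}", 0, -1)]
```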
- Public mode gates features until users provide keys (Settings auto‑opens on first use)
- Not production-hardened (no auth, rate limiting, or persistence yet)
- API keys must remain secret (.env not committed)
- In-memory history resets on server restart (swap with Redis/DB later)
- Gemini key must be loaded before first request (lazy reconfigure added)
Prototype phase — feel free to open issues with ideas (latency, UI/UX, voice packs, multilingual support). PRs welcome after discussion.
This project is licensed under the MIT License. See LICENSE.txt for details.
- AssemblyAI for speech-to-text
- Google Gemini for language understanding
- Murf AI for high-quality synthetic voices
- FastAPI for the rapid backend framework
Built as part of a 30‑Day AI Voice Agent Challenge by Murf.ai
