Voxcode is built around voice. You can describe what you want to build, ask for edits, and have a back-and-forth conversation with the AI — all without touching a keyboard. This page explains how the voice pipeline works and how to get the most out of it.

Starting a recording

Tap the microphone button in the bottom navigation bar to begin recording. The button expands into the Voice Orb — a 3D animated sphere that reacts to your speech in real time.
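The orb's reaction can be thought of as mapping the current microphone level to a sphere scale, with smoothing between animation frames. The names and ranges below are illustrative sketches, not Voxcode's actual code:

```typescript
// Hypothetical sketch: map a normalized mic level (0..1) to an orb scale
// factor. The names (orbScale, smooth) and ranges are assumptions.
function orbScale(level: number, min = 1.0, max = 1.6): number {
  const clamped = Math.min(1, Math.max(0, level));
  return min + (max - min) * clamped;
}

// Exponential smoothing between frames so the sphere moves fluidly
// instead of jumping with every volume sample.
function smooth(prev: number, next: number, alpha = 0.2): number {
  return prev + alpha * (next - prev);
}
```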
1. Tap the mic button

Tap the microphone icon on the right side of the bottom bar. On mobile, the orb rises from the bottom of the screen and takes over the lower half of the display.
2. Speak your request

Talk naturally. Voxcode captures your audio and shows a live transcription while you speak. There is no need to use special syntax or trigger words.
3. Pause and let it process

On desktop, Voxcode automatically detects a brief pause in your speech and sends your message. On mobile, it waits for a short silence after you finish speaking, then transcribes and sends your request.
4. Receive the response

The AI streams its reply into the chat. If voice mode is active, ElevenLabs synthesizes the text response into speech and plays it back automatically.
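The silence detection in step 3 can be sketched as a check over recent microphone volume samples. This is a hypothetical implementation; the function name, threshold, and frame count are assumptions, not Voxcode's actual settings:

```typescript
// Hypothetical pause detector: given per-frame volume samples (0..1),
// report "send" once the most recent `silentFrames` samples all fall
// below `threshold`. The specific numbers are illustrative only.
function shouldSend(
  volumes: number[],
  threshold = 0.05,
  silentFrames = 30, // roughly one second at 30 samples per second
): boolean {
  if (volumes.length < silentFrames) return false;
  return volumes.slice(-silentFrames).every((v) => v < threshold);
}
```

A caller would push one volume sample per animation frame and send the transcription as soon as `shouldSend` returns true.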

What happens when you speak

The voice pipeline has three stages:
1. Transcription

On desktop, Voxcode transcribes your audio in real time using the browser’s built-in speech recognition, giving you immediate feedback as words appear in the input field. On mobile, audio is recorded and sent to OpenAI’s Whisper for transcription after you stop speaking.
2. AI processing

Your transcribed text is sent to the selected AI model via the OpenRouter API. Voxcode automatically decides whether to use a lite system prompt (for simple requests) or the full system prompt (for complex coding tasks) to keep responses fast and relevant.
3. Code generation and speech

The AI streams its response back. Any code blocks in the response are automatically extracted, saved to the Virtual File System, and displayed as interactive CodeSnippet cards. The text portion of the response is synthesized into speech by ElevenLabs and played back through your device.
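The extraction in stage 3 can be pictured as splitting the finished reply into fenced code blocks (headed for the Virtual File System) and the remaining prose (the part sent to text-to-speech). A minimal sketch; `splitReply` and its regex are illustrative assumptions, not Voxcode's implementation:

```typescript
interface Snippet { lang: string; code: string; }

// Hypothetical sketch: pull fenced code blocks out of a completed AI
// reply, returning the snippets and the prose that would be spoken.
function splitReply(reply: string): { snippets: Snippet[]; spoken: string } {
  const snippets: Snippet[] = [];
  const marker = "`".repeat(3); // triple backtick, built up to avoid a literal fence
  const fence = new RegExp(marker + "(\\w*)\\n([\\s\\S]*?)" + marker, "g");
  const spoken = reply
    .replace(fence, (_m, lang: string, code: string) => {
      snippets.push({ lang: lang || "text", code: code.trimEnd() });
      return ""; // code is saved and displayed, not read aloud
    })
    .replace(/\n{3,}/g, "\n\n")
    .trim();
  return { snippets, spoken };
}
```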

Interrupting the AI mid-response

You do not have to wait for the AI to finish speaking. Voxcode supports voice interruption — simply start speaking while the AI is talking.
  • Desktop: As soon as your voice is detected with the mic open, the AI audio pauses immediately and your new input is captured.
  • Mobile: Voxcode monitors the microphone volume continuously. When your voice is detected while the AI is speaking, the audio stops and recording restarts for your next message.
This makes conversations feel natural rather than turn-based.
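The interruption behavior can be modeled as a small state transition driven by the microphone level. A sketch under assumed names and an assumed threshold:

```typescript
// Hypothetical interruption monitor: while AI audio is playing, watch the
// mic level; if the user speaks over it, switch to recording. The state
// names and 0.15 threshold are illustrative, not Voxcode's actual API.
type State = "aiSpeaking" | "recording";

function nextState(
  state: State,
  micLevel: number,
  speechThreshold = 0.15,
): State {
  if (state === "aiSpeaking" && micLevel > speechThreshold) {
    return "recording"; // user interrupted: pause AI audio, capture input
  }
  return state;
}
```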

Example voice commands

The following commands work exactly as spoken:
  • "Create a button component with hover effects"
  • "Build a navbar with dark mode toggle"
  • "Make a landing page for a gym"
  • "Add a contact form to the site"
  • "Change the background to blue"
  • "Make the text bigger"
  • "Create a React todo app with local storage"
You can also ask follow-up questions in the same session:
  • "Now add filters to the todo list"
  • "Change the font to Inter"
  • "Make it mobile responsive"

Switching to text mode

If you prefer to type, you can switch to text mode at any time without leaving the current session.
Tap the Type… button that appears on the left side of the bottom bar while the orb is active. The orb closes and the text input field reappears. Your conversation history is preserved.

Mobile swipe gestures

On mobile, two swipe gestures give you quick access to your project context:
  • Swipe right: Open the History sidebar, where you can browse and resume previous conversations
  • Swipe left: Open the File Explorer, where you can view and manage files saved in the Virtual File System
These gestures work from anywhere in the main chat view.
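Under the hood, a swipe classifier like the following could drive these gestures. The function, labels, and the 60 px threshold are assumptions for illustration:

```typescript
// Hypothetical swipe classifier: given touch start/end x-coordinates,
// decide which panel to open. The 60 px minimum distance is an assumption.
type Swipe = "history" | "files" | null;

function classifySwipe(startX: number, endX: number, minDistance = 60): Swipe {
  const dx = endX - startX;
  if (dx > minDistance) return "history"; // swipe right opens History
  if (dx < -minDistance) return "files";  // swipe left opens File Explorer
  return null; // too short to count as a swipe
}
```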

Tip: Be as specific as possible in your voice commands. Instead of saying “make a form,” say “create a contact form with name, email, and message fields, and a submit button that shows a success message.” The more context you provide, the more accurate and complete the generated code will be.