vocal.nvim

Overview

vocal.nvim is a lightweight Neovim plugin that enables speech-to-text transcription directly within the editor. It allows users to record audio through their microphone and automatically transcribe it into text using either OpenAI's Whisper API or local Whisper models. The transcribed text is then inserted at the cursor position or replaces selected text in the buffer.

The plugin adds hands-free dictation to Neovim, which makes it well suited to drafting documents, taking notes, or composing text without typing.

Capabilities

Core Recording & Transcription

Audio Recording: Start/stop audio recording using the :Vocal command or configured keymaps (default: <leader>v)
Dual Transcription Modes:
- Local transcription using OpenAI Whisper models (primary method, default)
- API-based transcription using OpenAI Whisper API (secondary, requires API key)
Automatic Model Management: For local models, the plugin automatically downloads, caches, and reuses Whisper models of various sizes (tiny, base, small, medium, large)

Buffer Manipulation

Cursor Insertion: Transcribed text inserts at the current cursor position
Selection Replacement: In visual mode, transcribed text replaces the selected text
Asynchronous Operations: Non-blocking transcription keeps Neovim responsive
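
As a rough illustration of the two insertion paths, the sketch below uses Neovim's nvim_buf_set_text. It is not the plugin's actual code: the names to_lines, insert_at_cursor, and replace_selection are hypothetical, and the selection path assumes visual mode has already been exited so that the '< and '> marks are set. It also assumes the selection ends on a single-byte character.

```lua
-- Split text into a list of lines, the shape nvim_buf_set_text expects.
local function to_lines(text)
  local lines = {}
  for line in (text .. "\n"):gmatch("(.-)\n") do
    lines[#lines + 1] = line
  end
  return lines
end

-- Insert text at the cursor of the current window (hypothetical helper).
local function insert_at_cursor(text)
  local cursor = vim.api.nvim_win_get_cursor(0) -- { row (1-based), col (0-based bytes) }
  local row, col = cursor[1] - 1, cursor[2]
  vim.api.nvim_buf_set_text(0, row, col, row, col, to_lines(text))
end

-- Replace the last visual selection with text (hypothetical helper).
local function replace_selection(text)
  local s = vim.api.nvim_buf_get_mark(0, "<")
  local e = vim.api.nvim_buf_get_mark(0, ">")
  local end_line = vim.api.nvim_buf_get_lines(0, e[1] - 1, e[1], true)[1]
  -- The '> column is inclusive (and very large for linewise selections),
  -- while nvim_buf_set_text takes an exclusive end column, so add one and clamp.
  local end_col = math.min(e[2] + 1, #end_line)
  vim.api.nvim_buf_set_text(0, s[1] - 1, s[2], e[1] - 1, end_col, to_lines(text))
end
```

Both helpers only touch the buffer when called inside Neovim; to_lines is plain Lua and can be checked on its own.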

User Interface & Feedback

Recording Status Indicators: Visual feedback showing when recording is active
Pending Transcription Status: UI updates indicating transcription in progress
Configurable Status Display: Status window behavior with customizable update intervals and display duration
Duration Calculation: Tracks and displays the length of each audio recording

Configuration & Flexibility

API Key Management:
- String value for direct API key
- Command-based retrieval (e.g., from password managers)
- Environment variable support (OPENAI_API_KEY)
Recording Storage: Configurable directory for saving audio recordings
Automatic Cleanup: Optional automatic deletion of recordings after transcription
Local Model Configuration: Control model size, custom download paths
API Configuration: Customize model selection, language, response format, temperature, and timeout
Custom Keybindings: Configure or disable the default keymaps
Debug Mode: Comprehensive debug logging to support troubleshooting
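
The three API key sources listed above could be resolved along these lines. This is a minimal sketch, not the plugin's implementation: the function name resolve_api_key, the representation of a command as a list of arguments, and the precedence order (string, then command, then environment) are all assumptions. It uses plain Lua's io.popen with POSIX shell quoting, so it will not work as written on Windows.

```lua
-- Quote one argument for a POSIX shell (assumption: sh-compatible shell).
local function shell_quote(arg)
  return "'" .. arg:gsub("'", "'\\''") .. "'"
end

-- Resolve an API key from a string, a command (list of arguments),
-- or the OPENAI_API_KEY environment variable. Hypothetical helper.
local function resolve_api_key(api_key)
  if type(api_key) == "string" and api_key ~= "" then
    return api_key
  end
  if type(api_key) == "table" then
    local parts = {}
    for i, arg in ipairs(api_key) do
      parts[i] = shell_quote(arg)
    end
    local handle = io.popen(table.concat(parts, " "), "r")
    if handle then
      local out = handle:read("*a")
      handle:close()
      out = (out:gsub("%s+$", "")) -- drop the trailing newline
      if out ~= "" then
        return out
      end
    end
  end
  return os.getenv("OPENAI_API_KEY")
end
```

For example, resolve_api_key({ "pass", "show", "openai" }) would run a password manager command and use its output, where the command itself is only an illustration.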

Language & Audio Support

Language Detection: Automatic language detection or manual language specification (ISO 639-1 format)
Audio Format: Records in WAV format at 44.1kHz mono using sox
Response Formats: JSON output format for API responses
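
To make the audio settings concrete, here is a small sketch of a sox invocation for 44.1kHz mono WAV and of how a recording's length follows from its size. The helper names are hypothetical, the exact command the plugin runs may differ, and the 16-bit sample depth is an assumption (the plugin may also simply time the recording with a clock).

```lua
-- Build an argument list that records from the default input device (-d)
-- to a WAV file at 44.1kHz (-r 44100), mono (-c 1).
local function sox_record_args(path)
  return { "sox", "-d", "-r", "44100", "-c", "1", path }
end

-- Duration in seconds of uncompressed PCM audio, given the size of the
-- audio data in bytes (excluding the WAV header).
-- Defaults: 44100 Hz, 1 channel, 16 bits per sample (assumed).
local function wav_duration_seconds(data_bytes, sample_rate, channels, bits_per_sample)
  sample_rate = sample_rate or 44100
  channels = channels or 1
  bits_per_sample = bits_per_sample or 16
  local bytes_per_second = sample_rate * channels * (bits_per_sample / 8)
  return data_bytes / bytes_per_second
end
```

Under these assumptions, one second of audio takes 44100 samples of 2 bytes each, or 88200 bytes.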

Configuration

require("vocal").setup({
  local_model = {
    model = "base",
    path = "~/whisper",
  },
})

Local model transcription is the default and works without an API key. Swap to the API by setting api_key and omitting local_model.
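
A minimal sketch of the API-based setup, assuming that passing only api_key is enough to switch modes as described above. Only the api_key option name comes from this page; reading it from OPENAI_API_KEY here is just one way to avoid hard-coding the key.

```lua
-- Sketch: API-based transcription with local_model omitted.
-- If OPENAI_API_KEY is unset, api_key is nil and no key is passed.
require("vocal").setup({
  api_key = os.getenv("OPENAI_API_KEY"),
})
```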

Use Cases

Primary Users

Content Writers & Bloggers: Compose articles and posts via dictation
Neovim Power Users: Those who prefer staying within their editor for all tasks
Documentation Authors: Generate documentation through spoken notes
Accessibility: Users who prefer or need voice-based text input
Note-Taking: Quick capture of ideas and thoughts while coding

Ideal Scenarios

Drafting long-form content without breaking flow
Recording voice notes directly into project files
Composing documentation and comments in code
Quick text capture during meetings or brainstorming sessions
Reducing repetitive strain from excessive typing

What I learned

The gap between "record audio" and "get clean text in the buffer" is bigger than it looks. Sox handles recording fine, but getting asynchronous Whisper inference — especially the local Python subprocess — to report back to Neovim without blocking required careful use of vim.loop and job control. The visual-mode replace path also needed separate handling since nvim_buf_set_text behaves differently with active selections.
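
The non-blocking pattern looks roughly like the sketch below. It is not the plugin's code: it uses vim.system (available in recent Neovim versions) rather than vim.loop or Plenary directly, the helper names are hypothetical, and it assumes the openai-whisper CLI is on PATH and writes its transcript as the audio file's base name with a .txt extension.

```lua
-- Build the whisper CLI argument list (assumes the openai-whisper CLI).
local function whisper_args(audio_path, model, out_dir)
  return { "whisper", audio_path, "--model", model,
           "--output_format", "txt", "--output_dir", out_dir }
end

-- Where the transcript is expected to land (assumed naming: <stem>.txt).
local function transcript_path(audio_path, out_dir)
  local stem = audio_path:match("([^/]+)%.%w+$") or audio_path:match("([^/]+)$")
  return out_dir .. "/" .. stem .. ".txt"
end

-- Run whisper without blocking the editor, then call on_done(text, err).
local function transcribe_async(audio_path, model, out_dir, on_done)
  vim.system(whisper_args(audio_path, model, out_dir), { text = true }, function(result)
    -- The exit callback runs outside the main loop, so defer any work
    -- that may touch buffers or other Neovim API with vim.schedule.
    vim.schedule(function()
      if result.code ~= 0 then
        on_done(nil, result.stderr)
        return
      end
      local file = io.open(transcript_path(audio_path, out_dir), "r")
      if not file then
        on_done(nil, "transcript file not found")
        return
      end
      local text = file:read("*a")
      file:close()
      on_done(text, nil)
    end)
  end)
end
```

The key point is the vim.schedule call: the subprocess exit callback cannot safely edit buffers directly, so the result is handed back to the main loop before any insertion happens.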

Stack

Neovim: Version 0.11.0 or higher
Language: Lua with comprehensive error handling and logging
Audio: sox (Sound eXchange) - WAV format at 44.1kHz mono
Transcription: OpenAI Whisper API or local openai-whisper Python package
Dependencies: Plenary.nvim for async operations and job management
Code Quality: StyLua formatter, .luarc.json (lua-language-server) for type checking
Platform Support: Fully tested on Linux (Arch), not fully tested on macOS, known issues on Windows
