We all know AI coding agents like Claude Code, Codex, Cursor, or GitHub Copilot work better when given more context and clear instructions. When we work with coding agents, we give them instructions multiple times a day, every day. Over time, I believe we end up giving much better instructions and more detailed context if we use speech-to-text tools instead of manually typing all those instructions every time.
This is why I encourage everyone to use speech-to-text tools to give detailed context to coding agents. As a developer, I love my keyboard, and I can understand if you're skeptical. I was too. But using speech-to-text is one of the highest-leverage things you can do as a developer.
There are plenty of proprietary options (WisprFlow, SuperWhisper), but if you prefer open-source tools where you can inspect the code, avoid subscriptions, and keep your audio private, you have real choices now. Here are the 5 best open-source voice typing tools that can paste transcribed text directly into your active window, including terminals running your coding agents.
I regularly write about agentic coding tools and updates. If you want deep dives like these in your inbox, subscribe below. Thanks!
1. Handy
GitHub | 18k stars | MIT License
Platforms: macOS, Windows, Linux
Engine: Parakeet V3/V2, Whisper (small/medium/large/turbo), Moonshine V2, GigaAM v3, SenseVoice, Breeze ASR, custom models
Mode: Hybrid (local transcription, optional cloud/local LLM post-processing)
Handy is the most popular open-source speech-to-text tool right now. Built with Tauri (Rust backend, React/TypeScript frontend), it's fast, works completely offline, and runs on all three desktop platforms.
The project was born from the creator (CJ Pais) breaking their finger and needing something more extensible than MacWhisper or SuperWhisper. The philosophy is therefore "the most forkable speech-to-text app" rather than the most polished one.
For usage, you press a configurable keyboard shortcut (default: Option+Space on macOS, Ctrl+Space on Windows/Linux), speak, and the transcribed text gets typed into whatever window is active. It supports both push-to-talk (hold to record, release to transcribe) and toggle mode (press once to start, again to stop). An overlay shows recording and processing states, with configurable position (top, bottom, or hidden).
Handy has the widest selection of transcription models. Parakeet V3 (~478 MB) is the recommended default: fast, accurate, auto-detects 25 European languages, and runs on CPU. Whisper models (487 MB to 1.6 GB) cover 99+ languages with GPU acceleration via Metal (macOS) or Vulkan (Windows/Linux). Specialized models are also available: Moonshine V2 (31-192 MB) for fast English transcription on low-powered hardware, GigaAM v3 for Russian, SenseVoice for Chinese/Japanese/Korean/Cantonese, and Breeze ASR for Taiwanese Mandarin.
Handy's auto-submit feature can press Enter after pasting the transcription into the active window. Pretty useful for terminal and chat workflows with coding agents.
It also has a post-processing feature that runs the transcription through an LLM before pasting, for grammar fixes, translation, or reformatting. Other features include a custom-words dictionary for misheard terms, memory unloading of idle models, and a transcription history with audio playback.
Handy has hit the Hacker News front page twice: 247 points in January 2026 and 237 points in September 2025. Users describe it as stunningly fast and accurate with Parakeet V3.
2. Epicenter Whispering
GitHub | 4.3k stars | AGPL-3.0 License
Platforms: macOS, Windows, Linux, Web, Chrome extension
Engine: Whisper C++, Speaches (local); Groq, OpenAI, ElevenLabs (cloud)
Mode: Hybrid (local or cloud)
Whispering has the highest Hacker News score of any tool on this list, 591 points and 152 comments on its Show HN. It's also YC-backed (the creator got into Y Combinator right out of college).
The workflow is as you'd expect: press a shortcut, speak, get text at your cursor. IMO its voice-activated mode really sets Whispering apart. You just press once to start a session, and it automatically detects when you start and stop speaking. This is particularly nice for hands-free coding agent workflows and for dictating notes and docs.
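If you're curious how voice-activated mode can tell when you start and stop speaking, the usual approach is voice activity detection. Here's a simplified, energy-based sketch of the idea (this is my own illustration, not Whispering's actual implementation; the threshold and hang_frames values are made-up parameters):

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech(frames, threshold=0.05, hang_frames=3):
    """Return (start, end) frame-index spans where energy exceeds the
    threshold. Up to `hang_frames` quiet frames are tolerated inside a
    span so short pauses don't split one utterance in two."""
    spans = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hang_frames:
                spans.append((start, i - quiet))
                start = None
                quiet = 0
    if start is not None:
        spans.append((start, len(frames) - 1 - quiet))
    return spans

# Synthetic demo: silence, then "speech" (loud frames), then silence.
silence = [[0.0] * 160 for _ in range(5)]
speech = [[0.5] * 160 for _ in range(10)]
frames = silence + speech + silence
print(detect_speech(frames))  # → [(5, 14)]
```

Real tools use more robust detectors (spectral features or small neural models rather than raw energy), but the session loop is the same: when a speech span ends, the buffered audio is sent to the transcription engine and the text lands at your cursor.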
It also supports chaining custom AI transformations on your transcription, such as grammar fixes, translation, and reformatting. Built with Tauri and Svelte 5, it runs as a desktop app but also offers a web app and a Chrome extension.
Whispering is part of the broader Epicenter project, which aims to build a suite of open-source, local-first apps sharing a single folder of plain text and SQLite. The transcription engine is flexible: you can use Whisper C++ or Speaches for fully local processing, or bring your own API key for Groq (fastest and cheapest cloud option at ~$0.04/hour), OpenAI, or ElevenLabs.
A Linux user on The Autodidacts wrote this about Whispering:
At long last there exists open source dictation software for Linux that actually works. For decades, I’ve contemplated capitulating and buying a proprietary dictation app, such as Dragon Naturally Speaking. Now I don't have to.
3. VoiceInk
GitHub | 4.3k stars | GPLv3 License
Platforms: macOS (14.4+)
Engine: WhisperKit, whisper.cpp, Parakeet
Mode: Hybrid (local default, optional cloud)
VoiceInk is a native Swift macOS app with the goal of being the most efficient and privacy-focused voice-to-text solution for macOS. It runs from the menu bar, uses a global keyboard shortcut, and types directly into whatever app is focused. This is the tool I currently use for voice typing.
Whisper and Parakeet models are supported for local transcription, and cloud models can also be used with your own API key. For AI enhancement of the transcribed text, providers such as Groq, OpenAI, DeepSeek, OpenRouter, and even Ollama are supported.
The standout features IMO are Power Mode, which applies different formatting per app, and context awareness, which takes a screenshot, uses OCR to detect surrounding text, and adapts formatting accordingly. It also has a personal dictionary for custom terminology and multiple "Smart Modes" optimized for different writing contexts.
VoiceInk is open-source under GPLv3, but also sells a commercial version. A lifetime license costs $39 (compared to SuperWhisper's $250), which gets you automatic updates and priority support. You can also build from source for free following their instructions, or install via brew install --cask voiceink.
4. OpenWhispr
GitHub | 1.9k stars | MIT License
Platforms: macOS (12+), Windows (10+), Linux
Engine: Whisper (via sherpa-onnx), NVIDIA Parakeet, OpenWhispr Cloud, OpenAI Realtime API
Mode: Hybrid
OpenWhispr started as a dictation app but has grown into something closer to a voice-first productivity suite with an AI agent mode, meeting transcription, a notes system, and Google Calendar integration.
The core voice typing flow is similar to the others: press the configurable hotkey (default: backtick), speak, and text gets pasted at your cursor. Parakeet and Whisper models are supported for local transcription. OpenWhispr also offers its own cloud service and supports BYOK with OpenAI's Realtime API via WebSocket streaming.
It also has a custom dictionary for names and technical terms that improves transcription accuracy, and it auto-learns from your corrections to update the dictionary.
5. FluidVoice
GitHub | 1.5k stars | GPLv3 License
Platforms: macOS (14.0+ Sonoma, Apple Silicon or Intel)
Engine: Parakeet TDT v3 (default, 25 languages), Parakeet TDT v2 (English), Apple Speech, Whisper
Mode: Hybrid (local default, optional AI enhancement via OpenAI/Groq/custom)
FluidVoice is a native Swift macOS app with a focus on real-time transcription. Its signature feature is the live preview overlay: you see your words appear in real time as you speak, before they get inserted into the active window.
Parakeet TDT v3 is the default transcription model; Parakeet v2, Apple Speech, and Whisper models are also supported. AI enhancement is optional: you can connect OpenAI, Groq, or a custom provider for post-processing. Beyond basic dictation, FluidVoice also has a Command Mode for executing actions via voice and a Write Mode for text composition and editing.
To give coding agents better instructions, all you need is a tool that can transcribe your speech and paste it wherever your cursor is. All tools mentioned above do that. To pick one, the differences come down to platform, engine preference, and how much you want beyond basic dictation.
For most people, Handy is the best default if you want the biggest open-source project with broad platform support. If you're on macOS and want a polished native experience, VoiceInk or FluidVoice are strong choices. If you want maximum flexibility with cloud providers and AI transformations, Whispering or OpenWhispr give you the most options.
If you're still skeptical about dictating prompts instead of typing, I recommend picking one of these tools, giving it a week, seeing how it changes your workflow with coding agents, and then re-evaluating.




