In my last post, I described trying out the Kokoro text-to-speech (TTS) model via the Kokoro-FastAPI web UI in a macOS (native) container. Here, I install Kokoro-TTS and Abogen on Windows to take advantage of my Nvidia GPU.

Kokoro-TTS on Windows

I mentioned Kokoro-TTS in passing in my previous post. Kokoro-TTS is...

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.

On Windows, I did the basic install:

  • run uv tool install kokoro-tts (or pip install kokoro-tts) in a working directory,
  • and download the ONNX model file and voices file to the same directory.
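The two steps above, as a sketch — the download URLs are placeholders, and the model/voices filenames are the ones the kokoro-tts README pointed at when I looked, so check the project page for the current release links:

```
# install the CLI into a working directory
uv tool install kokoro-tts

# fetch the ONNX model and voices bundle into the same directory
curl -LO <release URL for kokoro-v1.0.onnx>
curl -LO <release URL for voices-v1.0.bin>
```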

But, alas, I could not get Kokoro-TTS to use my Nvidia GPU: it runs an ONNX model rather than PyTorch, and I cannot be bothered to figure out the GPU-enabled installation...

Abogen on Windows

Enter Abogen, which has a nice cross-platform PyQt desktop GUI and is much faster, utilising my GPU rather than CPU only.

Abogen is a powerful text-to-speech conversion tool that makes it easy to turn ePub, PDF, or text files into high-quality audio with matching subtitles in seconds.

Again following the instructions, I installed Abogen v1.1.16:

  • first, download and install espeak-ng.msi;
  • then, for Nvidia (and using uv instead of pip):
    mkdir abogen
    cd abogen
    uv venv
    .venv\Scripts\activate.bat
    uv pip install abogen

    [Screenshot: Kokoro Text-to-Speech with Abogen v1.1.6]
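Before converting anything, it's worth a quick sanity check that the venv's PyTorch (which Abogen pulls in) can actually see the GPU. A small sketch, written defensively so it still prints something useful if torch isn't importable:

```shell
# If this prints "CUDA available: False", synthesis will fall back to CPU only.
python -c "import importlib.util as u; print(('CUDA available: ' + str(__import__('torch').cuda.is_available())) if u.find_spec('torch') else 'torch not installed')"
```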

Abogen is easy to use via the desktop user interface, and options include:

  • voice blending!
  • converting the ePub into a single audio file - I prefer .m4b (MPEG-4 audiobook format) with chapter markers and metadata like title and author,
  • optionally generating separate audio files for each chapter - I prefer .mp3, which is more widely supported on my devices (but carries no metadata),
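To confirm the chapter markers and metadata actually made it into the output, ffprobe (ships with ffmpeg) can dump them; book.m4b here is a stand-in for your generated file:

```
# list chapters and container metadata of the generated audiobook
ffprobe -hide_banner -show_chapters -show_format book.m4b
```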

Abogen is about 12x faster on my GPU than with just my CPU!
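For a rough sense of scale of that ~12x figure (the CPU time here is a hypothetical number, not a measurement from my runs):

```shell
# hypothetical example: a book needing 360 minutes of CPU-only synthesis
cpu_minutes=360
speedup=12
gpu_minutes=$((cpu_minutes / speedup))
echo "roughly ${gpu_minutes} minutes on the GPU"   # prints: roughly 30 minutes on the GPU
```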