llm-recipes

Audiobook Creation Setup Guide

This guide walks you through setting up the environment and tools needed to create an audiobook with Python.

Prerequisites

Step-by-Step Setup

1. Install Required System Libraries

  1. ffmpeg for audio management:
    sudo apt update
    sudo apt install ffmpeg 
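
To confirm that the install step worked, a short Python check along these lines may help. This is a minimal sketch using only the standard library; the helper names `find_tool` and `tool_version` are hypothetical and not part of this repository.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def tool_version(path: str) -> str:
    """Return the first line printed by `<path> -version` (empty if none)."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    ffmpeg_path = find_tool()
    if ffmpeg_path is None:
        print("ffmpeg not found on PATH; re-run the install step above")
    else:
        print(tool_version(ffmpeg_path))
```

If the script reports that ffmpeg is missing, open a new shell before retrying so that PATH changes are picked up.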
    

2. Create a Virtual Environment and Install Libraries

  1. Create a Virtual Environment:
    python -m venv venv
    
  2. Activate the Virtual Environment:
    • On macOS/Linux:
      source venv/bin/activate
      
    • On Windows:
      .\venv\Scripts\activate
      
  3. Install Required Libraries:
    pip install -r requirements.txt
    
  4. For PyTorch Model Development Users:
    pip install -r pytorch-requirements.txt
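
Before running `pip install`, it can be worth checking that the virtual environment is actually active, since otherwise the packages land in the system interpreter. The sketch below relies on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a `venv`; the function name `in_virtualenv` is a hypothetical helper.

```python
import sys


def in_virtualenv(
    prefix: str = sys.prefix, base_prefix: str = sys.base_prefix
) -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtual environment active: {sys.prefix}")
    else:
        print("no virtual environment active; run the activate step first")
```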
    

3. Set Up Ollama for PDF Parsing

  1. Start Docker Compose:
    docker compose -f docker/llm-compose.yml up -d 
    
  2. Pull Necessary LLM Models:
    ollama pull deepseek-r1:7b
    ollama pull qwen2.5
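
Once the pulls finish, the models can be checked through Ollama's HTTP API, which lists local models at `/api/tags`. The sketch below assumes the server is reachable at `http://localhost:11434` (Ollama's default port; the compose file in this repository may map a different one) and that an untagged name such as `qwen2.5` is stored as `qwen2.5:latest`. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: default Ollama address; adjust if docker/llm-compose.yml differs.
OLLAMA_URL = "http://localhost:11434"
REQUIRED_MODELS = ["deepseek-r1:7b", "qwen2.5"]


def normalize(name: str) -> str:
    """Append ':latest' when a model name carries no explicit tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(required: List[str], payload: Dict) -> List[str]:
    """Return the required models absent from an /api/tags response."""
    installed = {model["name"] for model in payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Dict:
    """Fetch the list of locally available models from the Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except (OSError, ValueError) as exc:
        print(f"could not query Ollama at {OLLAMA_URL}: {exc}")
    else:
        missing = missing_models(REQUIRED_MODELS, tags)
        print("all models present" if not missing else f"missing: {missing}")
```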
    

4. Set Up Parler-TTS for Speech Generation

  1. Pull Necessary TTS and Audio Models: To get started, download the following models using the Hugging Face CLI:

    huggingface-cli download parler-tts/parler-tts-mini-v1.1
    huggingface-cli download parler-tts/parler-tts-mini-multilingual-v1.1
    huggingface-cli download facebook/audiogen-medium
    huggingface-cli download facebook/audio-magnet-medium
    
  2. Start TTS Server for Speech Creation:
    • For RTX 40 series (fast inference with torch.compile):
      docker compose -f docker/tts-server-fast.yml up -d
      
    • For GTX series:
      docker compose -f docker/tts-server.yml up -d
      
  3. Start Audiocraft Server for Sound/Music Creation:
    docker compose -f docker/audiocraft-server.yml up -d 
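
The model downloads in item 1 of this step can also be scripted from Python instead of calling the CLI four times. The sketch below assumes the `huggingface_hub` package is installed (it ships the `huggingface-cli` command) and uses its `snapshot_download` function; `download_models` and its `downloader` parameter are hypothetical names, with the parameter added so the function can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional

# Repositories listed in the download step of this guide.
MODEL_REPOS = [
    "parler-tts/parler-tts-mini-v1.1",
    "parler-tts/parler-tts-mini-multilingual-v1.1",
    "facebook/audiogen-medium",
    "facebook/audio-magnet-medium",
]


def download_models(
    repos: Iterable[str] = MODEL_REPOS,
    downloader: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Download each repo into the local cache and return repo -> local path."""
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    return {repo: downloader(repo_id=repo) for repo in repos}
```

Calling `download_models()` with no arguments fetches all four repositories into the default Hugging Face cache, which is where the CLI commands above place them as well.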
    

5. Additional Tips

Troubleshooting