llm-recipes

Detailed Workflow:

Workflow - Optimised for Production

  1. User uploads a PDF/Doc on the main page
  2. An entry is made in the DB for the Async Main Actions run
  3. An entry is made in Celery to create async calls:
    • Script parser (LLM with a large context window)
    • The script is split into sessions, and each session is split into background music, narrator and speakers
    • Calls are executed session by session to allow human feedback
    • Async calls for background music audio generation, keyed by script-id, session-id and background-music-id
    • An async call to the TTS service is made per session for the narrator and for each speaker, keyed by session-id and speaker-turn-id
    • Depending on GPU availability, async calls can run sequentially or in parallel (a later optimisation problem)
  4. Script Parser
    • Using structured outputs, splits the script into sessions based on a template: Scene, Background Score, Conversation
    • The background score is split into distinct sounds, each with an approximate duration for audio generation
    • The conversation is analysed and the voice description for each session is updated based on the inputs. Example:
      • Emma (whispering)
        • Leo… What have you done?
      • Leo (excited)
        • I think… we just discovered something big.
    • A narrator voice is chosen and given a distinct personality based on user choice.
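
The parser output and the per-session audio calls described above could be modelled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names, field names, the `parse_cue`, `build_turns` and `audio_jobs` helpers, and the `<session-id>-turn-<n>` id format are all assumptions made for illustration. Only the ids (script-id, session-id, background-music-id, speaker-turn-id) and the Scene / Background Score / Conversation template come from the workflow itself.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical schema for one parsed session; field names are assumptions.
@dataclass
class BackgroundSound:
    background_music_id: str
    description: str
    approx_duration_s: float  # approximate duration used for audio generation


@dataclass
class SpeakerTurn:
    speaker_turn_id: str
    speaker: str
    voice_description: str  # e.g. "whispering", "excited"
    text: str


@dataclass
class Session:
    session_id: str
    scene: str
    background_score: List[BackgroundSound] = field(default_factory=list)
    conversation: List[SpeakerTurn] = field(default_factory=list)


# Matches cue lines such as "Emma (whispering)".
CUE = re.compile(r"^\s*(?P<speaker>[^()]+?)\s*\((?P<voice>[^()]*)\)\s*$")


def parse_cue(line: str) -> Tuple[str, str]:
    """Split a cue line into (speaker, voice description)."""
    match = CUE.match(line)
    if match is None:
        return line.strip(), ""  # no parenthesised description present
    return match.group("speaker"), match.group("voice").strip()


def build_turns(session_id: str, lines: List[str]) -> List[SpeakerTurn]:
    """Pair each cue line with the dialogue line that follows it."""
    turns = []
    for i in range(0, len(lines) - 1, 2):
        speaker, voice = parse_cue(lines[i])
        turn_id = f"{session_id}-turn-{i // 2 + 1}"  # assumed id format
        turns.append(SpeakerTurn(turn_id, speaker, voice, lines[i + 1].strip()))
    return turns


def audio_jobs(script_id: str, session: Session) -> List[dict]:
    """Build one payload per background sound and one per speaker turn."""
    jobs = []
    for sound in session.background_score:
        jobs.append({
            "kind": "background_music",
            "script_id": script_id,
            "session_id": session.session_id,
            "background_music_id": sound.background_music_id,
        })
    for turn in session.conversation:
        jobs.append({
            "kind": "tts",
            "session_id": session.session_id,
            "speaker_turn_id": turn.speaker_turn_id,
        })
    return jobs
```

Each payload carries only the ids named in step 3, so a worker would be expected to look up the text and voice description itself. Whether the real tasks pass ids or full content is not specified here.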


LLM Generated

  1. User Uploads PDF/Doc:
    • User uploads the document via a user-friendly interface.
    • The system validates the document format and integrity.
    • Confirmation message is displayed to the user.
  2. Database Entry:
    • An entry is made in the database for the new document.
    • Log the entry and handle any potential errors.
  3. Async Tasks with Celery:
    • Tasks are created in Celery with clear dependencies.
    • Script parsing and session splitting tasks are initiated.
    • Human feedback tasks are scheduled.
    • Audio generation tasks are queued, considering GPU availability.
    • Retry mechanism is implemented for failed tasks.
  4. Script Parser:
    • The script is parsed using structured outputs.
    • Sessions are split based on the template, with flexibility for different formats.
    • Background scores are accurately split into distinct sounds.
    • Conversations are analyzed for context and emotions.
    • Voice descriptions are generated using advanced NLP techniques.
    • Narrator voice is chosen based on user preference, with options for preview.
  5. Quality Assurance:
    • The generated audiobook is reviewed for quality and consistency.
    • User notifications are sent at each stage of the process.
  6. User Feedback:
    • Collect feedback from users to improve the process.
    • Implement changes based on user feedback.
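
The GPU-aware queuing and retry behaviour from step 3 can be illustrated with a small standard-library sketch. It uses `asyncio` as a stand-in for Celery, so it shows the scheduling idea rather than the actual task setup. The names `run_with_retry`, `run_audio_jobs`, `gpu_slots`, `max_retries` and `base_delay_s` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List

Job = Callable[[], Awaitable[str]]


async def run_with_retry(job: Job, max_retries: int = 3,
                         base_delay_s: float = 0.0) -> str:
    """Run a job, retrying up to max_retries times after a failure."""
    attempt = 0
    while True:
        try:
            return await job()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Exponential backoff; with base_delay_s=0 the retry is immediate.
            await asyncio.sleep(base_delay_s * (2 ** attempt))
            attempt += 1


async def run_audio_jobs(jobs: List[Job], gpu_slots: int = 1,
                         max_retries: int = 3,
                         base_delay_s: float = 0.0) -> List[str]:
    """Run jobs with at most gpu_slots executing at the same time.

    gpu_slots=1 runs the jobs one at a time; larger values let them
    run in parallel. Results are returned in the order of `jobs`.
    """
    gate = asyncio.Semaphore(gpu_slots)

    async def guarded(job: Job) -> str:
        async with gate:  # wait here until a GPU slot is free
            return await run_with_retry(job, max_retries, base_delay_s)

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A job here is any zero-argument coroutine function, for example one that calls the TTS service for a single speaker turn. `asyncio.run(run_audio_jobs(jobs, gpu_slots=2))` would then run up to two of them at once.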

Workflow Improvements for Creating Audiobook

1. User Uploads PDF/Doc to Main Page

2. Entry is made in DB for Async Main Actions Run

3. Entry is made into Celery for creating async calls

Subtasks:

4. Script Parser

Additional Improvements: