WalkThroughTheDoorAndDoTheDinosaur 048b346370 I love python packages
2024-04-30 20:50:07 -07:00
2024-04-30 09:51:06 -07:00
2024-04-30 20:50:07 -07:00
2023-07-30 15:43:54 -04:00
2023-08-15 02:16:54 +00:00
2023-07-30 13:05:26 -04:00
2023-08-02 19:07:53 -04:00
2024-04-30 20:50:07 -07:00
2023-08-15 01:38:01 +00:00
wip
2023-08-04 10:46:46 -04:00
2023-08-14 20:47:55 -04:00
2024-04-30 20:50:07 -07:00
wip
2023-08-04 10:46:46 -04:00
wip
2023-08-04 10:46:46 -04:00
2024-04-30 20:50:07 -07:00
2024-04-30 20:50:07 -07:00
2023-12-28 15:32:24 -05:00
2023-08-05 09:26:23 -04:00
2023-08-05 19:29:33 -04:00

TL/DW: Too Long, Didnt Watch

YouTube contains an incredible amount of knowledge, much of which is locked inside multi-hour videos. Let's extract and summarize with AI!

Pieces

  • diarize.py - download, transcribe and diarize audio
    1. First uses yt-dlp to download audio(optionally video) from supplied URL
    2. Next, it uses ffmpeg to convert the resulting .m4a file to .wav
    3. Then it uses faster_whisper to transcribe the .wav file to .txt
    4. After that, it uses pyannote to perform 'diarorization'
    5. Finally, it'll send the resulting txt to an LLM endpoint of your choice for summarization of the text.
    • Goal is to support OpenAI/Claude/Cohere/Groq/local OpenAI endpoint (oobabooga/llama.cpp/exllama2) so you can either do a batch query to X endpoint, or just feed them one at a time. Your choice.
  • chunker.py - break text into parts and prepare each part for LLM summarization
  • roller-*.py - rolling summarization
    • can-ai-code - interview executors to run LLM inference
  • compare.py - prepare LLM outputs for webapp
  • compare-app.py - summary viewer webapp

Setup

  • Linux
    1. X
    2. Create a virtual env: python -m venv
    3. Launch/activate your virtual env: . .\scripts\activate.sh
    4. pip install -r requirements.txt
  • Windows
    1. X
    2. Create a virtual env: python -m venv
    3. Launch/activate your virtual env: . .\scripts\activate.ps1
    4. pip install -r requirements.txt

Credits

Description
No description provided
Readme 32 MiB
Languages
Python 99.5%
Dockerfile 0.5%