Mirror of https://github.com/jlengrand/tldw.git
TL/DW: Too Long, Didn't Watch
YouTube contains an incredible amount of knowledge, much of which is locked inside multi-hour videos. Let's extract and summarize with AI!
Pieces
- `diarize.py` - download, transcribe and diarize audio
  - First uses yt-dlp to download audio (optionally video) from the supplied URL
  - Next, uses ffmpeg to convert the resulting `.m4a` file to `.wav`
  - Then uses faster_whisper to transcribe the `.wav` file to `.txt`
  - After that, uses pyannote to perform diarization
  - Finally, sends the resulting txt to an LLM endpoint of your choice for summarization of the text
  - Goal is to support OpenAI/Claude/Cohere/Groq/local OpenAI endpoints (oobabooga/llama.cpp/exllama2), so you can either do a batch query to one endpoint or feed them in one at a time. Your choice.
- `chunker.py` - break text into parts and prepare each part for LLM summarization
- `roller-*.py` - rolling summarization
- `can-ai-code` - interview executors to run LLM inference
- `compare.py` - prepare LLM outputs for the webapp
- `compare-app.py` - summary viewer webapp
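The splitting that `chunker.py` performs can be sketched roughly as below. This is a minimal illustration, not the actual implementation: the function name, chunk size, and overlap value are illustrative assumptions, and the real script also prepares each part for the LLM prompt.

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping chunks that fit an LLM context window."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so a sentence cut at a chunk boundary
        # appears in both neighboring chunks.
        start = end - overlap
    return chunks
```

Each chunk can then be summarized independently, and the per-chunk summaries fed back in for a final pass (the rolling summarization that `roller-*.py` refers to).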
Setup
- Linux
  - Download the necessary packages (Python3, ffmpeg [`sudo apt install ffmpeg` / `dnf install ffmpeg`], ?)
  - Create a virtual env: `python -m venv ./`
  - Launch/activate your virtual env: `. ./bin/activate`
  - See Linux && Windows below
- Windows
  - Download the necessary packages (Python3, ffmpeg, ?)
  - Create a virtual env: `python -m venv .\`
  - Launch/activate your virtual env: `.\Scripts\activate.ps1`
  - See Linux && Windows below
- Linux && Windows
  - `pip install -r requirements.txt` - may take a bit of time...
  - Run `python ./diarize.py <video_url>`
    - The video URL does not have to be a YouTube URL; it can be any site that yt-dlp supports.
  - You'll then be asked whether you'd like to run the transcription on GPU (1) or CPU (2).
  - Next, the video will be downloaded to the local directory by yt-dlp.
  - Then the video will be transcribed by faster_whisper. (You can follow this in the console output.)
    - The resulting transcription is stored as both a JSON file with timestamps and a txt file without timestamps.
  - Finally, you can have the transcription summarized by feeding it into an LLM of your choice.
    - For running it locally, here are the commands to do so: FIXME
    - For feeding the transcriptions to the API of your choice, simply use the corresponding script for your API provider. FIXME: add scripts for OpenAI API (generic) and others
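Because the local backends listed above (oobabooga, llama.cpp) expose OpenAI-compatible endpoints, the request body for the summarization step has the same shape whether you point it at OpenAI or at a local server. A hedged sketch of building that payload follows; the function name, model name, prompt wording, and temperature are illustrative assumptions, not what `diarize.py` actually sends.

```python
import json

def build_summary_request(transcript: str, model: str = "gpt-3.5-turbo") -> str:
    """Build a chat-completions JSON body for an OpenAI-compatible endpoint.

    The same payload shape works for OpenAI itself and for local servers
    (oobabooga / llama.cpp) that serve /v1/chat/completions.
    """
    payload = {
        "model": model,  # local servers typically ignore or remap this
        "messages": [
            {"role": "system",
             "content": "Summarize the following video transcript concisely."},
            {"role": "user", "content": transcript},
        ],
        "temperature": 0.3,  # keep summaries stable rather than creative
    }
    return json.dumps(payload)
```

The body would then be POSTed with any HTTP client to the endpoint's `/v1/chat/completions` path with an `Authorization: Bearer <key>` header; local endpoints usually accept any key.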
Credits