Mirror of https://github.com/jlengrand/tldw.git
TL/DW: Too Long, Didn't Watch
YouTube contains an incredible amount of knowledge, much of which is locked inside multi-hour videos. Let's extract and summarize it with AI!
Pieces
diarize.py - download, transcribe and diarize audio
- First uses yt-dlp to download audio (optionally video) from the supplied URL
- Next, uses ffmpeg to convert the resulting .m4a file to .wav
- Then uses faster_whisper to transcribe the .wav file to .txt
- After that, uses pyannote to perform diarization (labeling who is speaking when)
- Finally, sends the resulting .txt to an LLM endpoint of your choice for summarization of the text
- Goal is to support OpenAI/Claude/Cohere/Groq/local OpenAI-compatible endpoints (oobabooga/llama.cpp/exllama2), so you can either batch-query one endpoint or feed transcripts in one at a time. Your choice.
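The download/convert/transcribe steps above can be sketched as follows. This is an illustrative outline, not the repo's actual code: it assumes yt-dlp and ffmpeg are on your PATH and that the faster-whisper package is installed, and the model size ("base") is a placeholder choice.

```python
import subprocess

def ytdlp_cmd(url, stem):
    # yt-dlp: grab the best audio-only stream as .m4a
    return ["yt-dlp", "-f", "bestaudio[ext=m4a]", "-o", f"{stem}.m4a", url]

def ffmpeg_cmd(stem):
    # ffmpeg: convert the .m4a to 16 kHz mono .wav, the format Whisper expects
    return ["ffmpeg", "-y", "-i", f"{stem}.m4a",
            "-ar", "16000", "-ac", "1", f"{stem}.wav"]

def transcribe_url(url, stem="audio"):
    subprocess.run(ytdlp_cmd(url, stem), check=True)
    subprocess.run(ffmpeg_cmd(stem), check=True)
    # Lazy import: faster-whisper is a heavy optional dependency
    from faster_whisper import WhisperModel
    model = WhisperModel("base", compute_type="int8")
    segments, _info = model.transcribe(f"{stem}.wav")
    text = " ".join(seg.text.strip() for seg in segments)
    with open(f"{stem}.txt", "w") as f:
        f.write(text)
    return text
```

Diarization with pyannote and the LLM call would follow the transcription step; both are omitted here for brevity.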
- chunker.py - break text into parts and prepare each part for LLM summarization
- roller-*.py - rolling summarization
- can-ai-code - interview executors to run LLM inference
- compare.py - prepare LLM outputs for the webapp
- compare-app.py - summary viewer webapp
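The chunking step that feeds rolling summarization can be sketched as below. This is a hypothetical illustration, not chunker.py's actual logic: the word budget and overlap values are made-up defaults, and a real chunker would likely count tokens rather than words.

```python
def chunk_text(text, max_words=2000, overlap=100):
    # Split a transcript into word-budgeted, slightly overlapping pieces
    # so each piece fits in an LLM context window.
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks
```

Rolling summarization would then summarize chunk 1, prepend that summary to chunk 2, summarize again, and so on until the whole transcript is condensed.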
Setup
- Linux
  - X
  - Create a virtual env: python -m venv
  - Launch/activate your virtual env: . ./scripts/activate.sh
  - Install dependencies: pip install -r requirements.txt
- Windows
  - X
  - Create a virtual env: python -m venv
  - Launch/activate your virtual env: . .\scripts\activate.ps1
  - Install dependencies: pip install -r requirements.txt
Credits