yt-dlp - download audio tracks of YouTube videos
ffmpeg - decode the audio and resample it to 16 kHz mono WAV
whisper.cpp - transcribe audio to text
chunk.py - break text into parts and prepare each part for LLM summarization
can-ai-code - run LLM inference with its `interview_cuda` or `interview-llamacpp` executor
compare.py - prepare LLM outputs for the webapp
compare-app.py - summary viewer webapp
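The chunking step can be illustrated with a minimal Python sketch. It is a hypothetical stand-in, not the code in chunk.py: it assumes fixed word-count windows with a small overlap, so that a sentence cut at a boundary still appears whole in one of the two neighboring parts. `chunk_words`, `build_prompt`, and the prompt wording are all invented for illustration.

```python
from typing import List


def chunk_words(text: str, max_words: int = 500, overlap: int = 50) -> List[str]:
    """Split a transcript into windows of at most max_words words.

    Consecutive windows share `overlap` words (hypothetical chunking scheme).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the transcript
    return chunks


def build_prompt(chunk: str) -> str:
    """Wrap one chunk in a summarization instruction (placeholder wording)."""
    return f"Summarize the following transcript excerpt:\n\n{chunk}\n\nSummary:"
```

For example, `[build_prompt(part) for part in chunk_words(open("ufo.txt").read())]` yields one prompt per part of the hearing transcript, ready to hand to an inference executor.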

This project is under active development and is not ready for production use.

Demo: available as a Hugging Face Space

Video Transcript Datasets

Filename	Title	Whisper Model	URL
ufo.txt	Subcommittee on National Security, the Border, and Foreign Affairs Hearing	small.en	https://www.youtube.com/watch?v=KQ7Dw-739VY
aoe-grand-finale.txt	GRAND FINAL $10,000 AoE2 Event (The Resurgence)	medium.en	https://www.youtube.com/watch?v=jnoxjLJind4

Download the audio track:

pip install yt-dlp
yt-dlp -f "bestaudio[ext=m4a]" --extract-audio 'https://www.youtube.com/watch?v=<video>'
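The same download can be scripted from Python through yt-dlp's embedding API (`yt_dlp.YoutubeDL`). This is a sketch under the assumption that the `FFmpegExtractAudio` post-processor is the Python-side equivalent of `--extract-audio`; `build_ydl_opts` and `download_audio` are hypothetical helper names, and yt-dlp is imported lazily so the options can be inspected without it installed.

```python
from typing import Any, Dict, List


def build_ydl_opts(audio_format: str = "bestaudio[ext=m4a]") -> Dict[str, Any]:
    """Options mirroring `-f "bestaudio[ext=m4a]" --extract-audio`."""
    return {
        "format": audio_format,
        # Post-processor that --extract-audio enables; with no preferredcodec
        # it keeps the downloaded audio codec.
        "postprocessors": [{"key": "FFmpegExtractAudio"}],
    }


def download_audio(urls: List[str]) -> int:
    """Download the audio tracks of `urls`; returns yt-dlp's error code (0 on success)."""
    import yt_dlp  # lazy import: requires `pip install yt-dlp`

    with yt_dlp.YoutubeDL(build_ydl_opts()) as ydl:
        return ydl.download(urls)
```

For example, `download_audio(["https://www.youtube.com/watch?v=KQ7Dw-739VY"])` fetches the audio for the ufo.txt transcript listed in the table above.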

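Convert the audio track to a 16 kHz, mono, 16-bit PCM WAV file, the input format whisper.cpp expects (the `*.m4a` glob assumes the directory holds exactly one .m4a file):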

ffmpeg -i *.m4a -hide_banner -vn -loglevel error -ar 16000 -ac 1 -c:a pcm_s16le -y resampled.wav
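A Python sketch of this conversion using `subprocess` follows. Rather than relying on the shell to expand `*.m4a` (if several files match, ffmpeg treats the extra names as output files), it insists on exactly one match, and it verifies the result with the standard-library `wave` module. The helper names are hypothetical; the 16 kHz, mono, 16-bit target is taken from the command above.

```python
import glob
import os
import subprocess
import wave
from typing import List


def find_single_m4a(directory: str = ".") -> str:
    """Return the only .m4a file in `directory`; raise if there are zero or several."""
    matches = sorted(glob.glob(os.path.join(directory, "*.m4a")))
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one .m4a file, found {len(matches)}")
    return matches[0]


def ffmpeg_resample_cmd(src: str, dst: str = "resampled.wav") -> List[str]:
    """Argument list equivalent to the ffmpeg command above, for one input file."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", src,
        "-vn",                 # drop any video stream
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit signed little-endian PCM
        "-y", dst,             # overwrite dst if it already exists
    ]


def is_whisper_ready_wav(path: str) -> bool:
    """True if `path` is a 16 kHz, mono, 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == 16000
            and w.getnchannels() == 1
            and w.getsampwidth() == 2
        )


def resample(directory: str = ".", dst: str = "resampled.wav") -> str:
    """Convert the single downloaded .m4a file to `dst` and check the result."""
    subprocess.run(ffmpeg_resample_cmd(find_single_m4a(directory), dst), check=True)
    if not is_whisper_ready_wav(dst):
        raise RuntimeError(f"{dst} is not 16 kHz mono 16-bit PCM")
    return dst
```

Passing an argument list to `subprocess.run` avoids shell quoting problems with yt-dlp's default filenames, which contain the video title and often spaces or brackets.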

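Transcribe the WAV file to text with whisper.cpp. `-t` sets the number of threads (32 here; match it to your CPU), and recent whisper.cpp releases ship the `main` binary as `whisper-cli`: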

main -m ../models/ggml-medium.en.bin -f resampled.wav -t 32 -otxt
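From Python, the transcription command can be assembled with the thread count taken from `os.cpu_count()` instead of hard-coded to 32. This is a sketch: `whisper_cmd` and `transcribe` are hypothetical helpers, the binary is assumed to be reachable as `main` (pass `binary="whisper-cli"` or a full path if yours differs), and the returned path assumes whisper.cpp's default of writing the `-otxt` output next to the input, as `resampled.wav.txt`.

```python
import os
import subprocess
from typing import List, Optional


def whisper_cmd(
    wav: str = "resampled.wav",
    model: str = "../models/ggml-medium.en.bin",
    binary: str = "main",
    threads: Optional[int] = None,
) -> List[str]:
    """Build the whisper.cpp command line; threads default to the number of CPUs."""
    if threads is None:
        threads = os.cpu_count() or 1  # cpu_count() can return None
    return [binary, "-m", model, "-f", wav, "-t", str(threads), "-otxt"]


def transcribe(wav: str = "resampled.wav", **kwargs) -> str:
    """Run whisper.cpp on `wav` and return the path of the transcript it writes."""
    subprocess.run(whisper_cmd(wav, **kwargs), check=True)
    # Assumes no -of flag: whisper.cpp then appends ".txt" to the input file name.
    return wav + ".txt"
```

For example, `transcribe(model="../models/ggml-small.en.bin")` uses the small.en model listed for ufo.txt in the table above.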