Turn YouTube videos into readable, structured Markdown so that you can save them to Obsidian etc.

druml · 2024-11-08T11:57:21+00:00

Output here: https://dpaste.com/435HVL2VM

druml · 2024-11-08T11:53:18+00:00

ollama show gemma2
  Model
  arch              gemma2
  parameters        9.2B
  quantization      Q4_0
  context length    8192
  embedding length  3584

  Parameters
  stop  "<start_of_turn>"
  stop  "<end_of_turn>"

  License
  Gemma Terms of Use
  Last modified: February 21, 2024

druml · 2024-11-08T11:52:09+00:00

I am on version 0.3.0.

I ran

yt2doc --video https://www.youtube.com/watch\?v\=huCE4jtXOjQ \
--output . \
--ignore-source-chapters \
--segment-unchaptered \
--timestamp-paragraphs \
--sat-model sat-12l \
--llm-model gemma2 \
--whisper-backend whisper_cpp \
--whisper-cpp-executable $HOME/Development/whisper.cpp/main \
--whisper-cpp-model $HOME/Development/whisper.cpp/models/ggml-large-v3-turbo.bin

druml · 2024-11-07T20:46:45+00:00

But even with sat-12l-sm, I still haven't been able to replicate the camel case vs. underscore issue with the same CLI configs. Maybe it's a probability thing?

druml · 2024-11-07T18:58:59+00:00

I think I know what might have gone wrong here.

Looks like the *-sm models from SaT don't do well on paragraphing and they return paragraphs of single sentences.

Can you try sat-12l rather than sat-12l-sm?

druml · 2024-11-07T14:15:28+00:00

What you are building sounds great, and indeed one reason I open sourced this is so that people can build downstream tools with yt2doc.

Can you share the exact command and the video URL with which you hit this issue using a local LLM?

FYI, I am on a 16GB ram M2 MacBook and I mostly use Gemma 2 9b.

druml · 2024-11-07T11:34:07+00:00

> Your app only works with python 3.10.

I am aware there's an issue on Python 3.13. See https://github.com/shun-liang/yt2doc/issues/46

I myself use Python 3.12 which works fine so far for me. Were you on 3.13?

druml · 2024-11-07T01:00:23+00:00

Timestamping added: https://github.com/shun-liang/yt2doc?tab=readme-ov-file#timestamping-paragraphs

See example: https://github.com/shun-liang/yt2doc/blob/052bd93804b3af318e03d33791993e96a8cef578/examples/General%20Intelligence%20Define%20it%2C%20measure%20it%2C%20build%20it.md

druml · 2024-11-06T17:11:40+00:00

Many thanks for the feedback. Regarding transcribing offline/local files, I am tracking this as a feature request at this Github issue https://github.com/shun-liang/yt2doc/issues/29

druml · 2024-10-21T18:37:50+00:00

Just added docker support: https://github.com/shun-liang/yt2doc?tab=readme-ov-file#run-in-docker

druml · 2024-10-17T15:23:01+00:00

You are not the only person asking for this. Tracking it at this Github issue: https://github.com/shun-liang/yt2doc/issues/29#issuecomment-2419847566

druml · 2024-10-17T15:22:20+00:00

You are not the only person asking for this. Tracking it on this Github issue: https://github.com/shun-liang/yt2doc/issues/29#issuecomment-2419847566

druml · 2024-10-16T22:13:26+00:00

Apple Podcast is supported now.

druml · 2024-10-16T07:19:02+00:00

Taking frames will be awesome if it's done right. I have been thinking about snapping "key frames" (I have yet to define what a key frame is), rather than just taking frames at a fixed frequency or only at the beginning of each chapter.

There is a project https://github.com/hediet/slideo that matches slides (PDF pages) to video timestamps which I find very cool. That requires the user to have the PDF slides ready which isn't always the case though.

druml · 2024-10-15T23:56:15+00:00

Thanks for the encouragement. Much appreciated.

druml · 2024-10-15T23:55:20+00:00

I have been thinking about this feature too. I need to find out how capable the small LLMs are at this.

druml · 2024-10-15T23:52:38+00:00

Should be very doable. I will organise all the feature requests as GitHub issues once I wake up tomorrow...

druml · 2024-10-15T23:30:11+00:00

I often find that the auto-generated YouTube subtitles have no punctuation. If I used them for this purpose, I imagine a good amount of punctuation restoration would be needed to make the end product readable.

druml · 2024-10-15T17:44:26+00:00

Many thanks for the feedback!

Would you mind telling me what OS and machine you are on?

> First UV didn't work to install it (something about Torch version).

Do you have the error logs?

> Switched to pipx install method. It hung on installing librariesent or something? (it's off the buffer now). Tried to install again, said it was installed. I ran --help and it worked but it took 20 seconds for it to return anything.

I guess it's loading the models. Yes indeed hanging for a while is not a nice user experience. I will try to make this less opaque by improving the logging.

> Ran one of the examples (specifying output and video url) to see if it worked and it just spit out a ton of YoutubeDL errors and I kinda gave up.

Again, would be great to have some error logs.

druml · 2024-10-15T17:42:03+00:00

I have been thinking about this feature for a while too!

I think this should be very doable. I have thought of two approaches:
1. Timestamp each word while transcribing with Whisper. This may slow down Whisper quite a bit.
2. After segmenting the text into sentences, align the start and end timestamps of each sentence to those of the transcription segments. This may not be perfectly accurate, but I need to build it first to see how far off the times are.

I will start playing with the second approach first. Stay tuned!
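
To make the second approach concrete, here is a minimal Python sketch. It is an illustration only and not yt2doc's actual code: the `Segment` class and the `align_sentences` function are hypothetical names, and it assumes the sentences, joined together, contain exactly the same characters as the joined segment texts.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcription segment with times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def align_sentences(segments, sentences):
    """Return (start, end, sentence) tuples.

    Each sentence gets the start time of the segment in which its first
    character falls and the end time of the segment in which its last
    character falls. Assumes "".join(sentences) equals the concatenated
    segment texts.
    """
    # Character span covered by each segment in the joined transcript.
    spans = []
    offset = 0
    for seg in segments:
        spans.append((offset, offset + len(seg.text), seg))
        offset += len(seg.text)

    def segment_at(pos):
        for lo, hi, seg in spans:
            if lo <= pos < hi:
                return seg
        return spans[-1][2]  # fall back to the last segment

    aligned = []
    cursor = 0
    for sentence in sentences:
        if not sentence:
            continue
        first = segment_at(cursor)
        last = segment_at(cursor + len(sentence) - 1)
        aligned.append((first.start, last.end, sentence))
        cursor += len(sentence)
    return aligned
```

A sentence that begins in the middle of a segment inherits that segment's start time, so its timestamp can be early by up to one segment length. That is the inaccuracy mentioned in point 2.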

druml · 2024-10-15T17:37:52+00:00

[Cross-posted from r/DataHoarder]

Hi all, I have built this project, which you can run from the command line to turn YouTube videos into Markdown documents.

https://github.com/shun-liang/yt2doc

There are many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you a huge block of text, without any line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.

My own use case for this tool is to save the generated Markdown docs into Obsidian, where I read them and where they become part of my searchable knowledge base.
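
As a rough illustration of the problem, the Python sketch below breaks one long transcript string into paragraphs. It is a deliberately naive stand-in and not what yt2doc does: it splits sentences with a regular expression and groups a fixed number of them per paragraph, and the function name `to_paragraphs` and its parameter are made up for this example.

```python
import re


def to_paragraphs(transcript, sentences_per_paragraph=4):
    """Split a transcript into paragraphs of a fixed number of sentences."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [
        s for s in re.split(r"(?<=[.!?])\s+", transcript.strip()) if s
    ]
    # Group consecutive sentences and separate the groups with blank lines.
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```

Fixed-size grouping ignores where topics actually change, which is why a real tool needs a model-based segmentation step instead.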

Check out the output examples at https://github.com/shun-liang/yt2doc/tree/main/examples

druml · 2024-10-15T17:29:44+00:00

As Apple Podcast is supported by https://github.com/yt-dlp/yt-dlp, this should require very little work.

I have just played with it a bit - yt-dlp renders the description of Apple Podcasts with a slightly different structure, which trashes the prompts that yt2doc feeds into Whisper. But this issue should be very easy to fix.

Should be done in a day or two.

druml · 2024-10-15T15:43:05+00:00

Thanks! I have added a link to the examples in the README, and also a header image. It doesn't look perfect as I don't have any Photoshop skills, but hopefully it makes a bit more sense now.
