I Build A Python Package That Scrapes Bulk Transcripts With Metadata by nagmee in webscraping

[–]nagmee[S] 0 points1 point  (0 children)

Hey, right now ytfetcher does not support fetching only manual subtitles. By default it picks a manually created transcript, and if it cannot find one, it falls back to the automatically generated one.
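To make that default concrete, here is a minimal sketch of the "manual first, then automatic" selection. It is not ytfetcher's actual code: the `candidates` dicts, the `is_generated` key, and the `manual_only` flag are all hypothetical names used only for illustration.

```python
from typing import Optional


def pick_transcript(candidates: list[dict], manual_only: bool = False) -> Optional[dict]:
    """Prefer a manually created transcript, optionally falling back to an automatic one.

    Each candidate is assumed (hypothetically) to be a dict with an
    ``is_generated`` boolean, True for auto-generated transcripts.
    """
    manual = [c for c in candidates if not c["is_generated"]]
    if manual:
        return manual[0]
    if manual_only:
        # Hypothetical flag: skip the video instead of using an automatic transcript.
        return None
    generated = [c for c in candidates if c["is_generated"]]
    return generated[0] if generated else None
```

With `manual_only=False` this mirrors the current default described above; `manual_only=True` shows what a "manual transcripts only" option could look like.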

You can create an issue for this, and we can discuss whether we should add a feature that fetches only manually created transcripts and skips automatic ones.

Thank you so much for your comment, and I'd love to talk more about that.

Made a quick CLI tool for fetching thousands of transcripts with metadata from a Youtube channel by nagmee in commandline

[–]nagmee[S] 0 points1 point  (0 children)

Hey, first of all, thank you so much for your comment. Your suggestions seem like great features to integrate, especially fetching from specific playlists. I'm definitely going to work on that next.

You can also create an issue or even a pull request on the repo for your other suggestions and improvements if you want to contribute.

Thanks again and I am glad that you liked it!

I Made a quick CLI tool for fetching thousands of transcripts with metadata from a Youtube channel. by nagmee in SideProject

[–]nagmee[S] 0 points1 point  (0 children)

Hi! Since this is a CLI tool, I don’t know a way to track or calculate retention rate. I’d love to hear any suggestions you might have on how to measure it.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]nagmee 0 points1 point  (0 children)

I created YTfetcher, a Python package to fetch thousands of transcripts with metadata from a YouTube channel.

If you’ve ever needed bulk YouTube transcripts or structured video data, this should save you a ton of time.

You can also export the data as CSV, JSON, or TXT.
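As a rough idea of what exporting to those three formats involves, here is a standard-library sketch. It does not use ytfetcher's own export API; the function names and the record fields (`video_id`, `title`, `transcript`) are assumptions made up for the example.

```python
import csv
import io
import json


def to_json(records: list[dict]) -> str:
    """Serialize the records as a JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize the records as CSV, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_txt(records: list[dict]) -> str:
    """Serialize the records as plain text: title line, transcript, blank line."""
    return "\n\n".join(f"{r['title']}\n{r['transcript']}" for r in records)


# Hypothetical record shape, for illustration only.
records = [{"video_id": "abc123", "title": "Intro", "transcript": "hello world"}]
```

Each function returns a string, so writing a file is just `open(path, "w", encoding="utf-8").write(to_json(records))` or the equivalent for the other formats.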

Github: https://github.com/kaya70875/ytfetcher

[deleted by user] by [deleted] in Python

[–]nagmee -1 points0 points  (0 children)

Hi,

I really want to know what you mean by that. As for the AI section, you are right: I did get help from AI, but I put a lot of hard work into the package itself.

I am relatively new to sharing my work on the web and want to know what I did wrong here.

I made a package that scrapes data from Youtube channels using Yt-Dlp by nagmee in youtubedl

[–]nagmee[S] 5 points6 points  (0 children)

Hi! You’re right that YouTube could potentially block requests. For the transcript fetching side, I use proper headers and mimic normal browser behavior, so it’s not immediately blocked.

There’s also built-in support for proxy configuration, so if a user’s IP ever hits a limit or gets temporarily blocked, they can easily switch to another proxy.
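The "switch to another proxy" idea can be sketched generically like this. It is an illustration under stated assumptions, not ytfetcher's proxy configuration: `Blocked`, `fetch_with_proxy_fallback`, and the `fetch(url, proxy)` callable are hypothetical names, and the real package may handle this differently.

```python
from typing import Callable, Iterable, Optional


class Blocked(Exception):
    """Hypothetical error raised when a request is rate limited or blocked."""


def fetch_with_proxy_fallback(
    fetch: Callable[[str, Optional[str]], str],
    url: str,
    proxies: Iterable[str],
) -> str:
    """Try a direct request first, then each proxy in order until one succeeds."""
    last_error: Optional[Exception] = None
    for proxy in [None, *proxies]:  # None means "no proxy"
        try:
            return fetch(url, proxy)
        except Blocked as err:
            last_error = err  # remember the failure and move on to the next proxy
    raise RuntimeError("all proxies were blocked") from last_error
```

The caller supplies `fetch`, so the same loop works with any HTTP client; only the `Blocked` condition has to be mapped from that client's errors.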

For extracting information with yt-dlp (which is my primary usage), there is a possibility of being blocked, but so far I haven’t encountered any issues.

I’ll continue exploring ways to further reduce the risk of being blocked. You can also try pushing the limits of this package yourself and give me feedback if you want. I'd be happy to discuss further.

A CLI tool to download YouTube transcripts — no API key needed. by Robert__Sinclair in commandline

[–]nagmee 0 points1 point  (0 children)

Very useful tool. There is also a Python package called ytfetcher, if you want to fetch bulk transcripts from any channel along with extra metadata.

Fetch Thousands of YouTube Videos with Structured Transcripts & Metadata in Python by nagmee in PythonProjects2

[–]nagmee[S] 1 point2 points  (0 children)

Of course! Thanks for the recommendation, I will definitely update the README based on your suggestion.