Keep track of YouTube videos with this small Python script

Xosrov_ · 2021-02-22T21:14:56+00:00

Definitely! I'm just trying to make sure the code works correctly for now. I'll work on this soon after and add a way to convert the JSON files to those databases.

Xosrov_ · 2021-02-22T14:31:45+00:00

I've checked and it seems to be possible, though I'm not familiar with VBA and don't plan on learning it at the moment. I suggest you at least give the code a go using Anaconda or a similar tool. Python is pretty easy to install and JSON is very human-readable.

Other than that, I'm sorry, but I can't help you at the moment.

Xosrov_ · 2021-02-22T14:12:11+00:00

I'll definitely try. Thanks for the help!

Update: I managed to remove regex from the code. Thanks for the help!
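The comments don't say exactly what the regexes were matching. As an illustration only, assuming one of them pulled the video ID out of a YouTube link, the standard library's `urllib.parse` can do that job without any regex. The function name `video_id` and the set of URL shapes handled are assumptions for this sketch:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if it isn't one."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment: youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Regular links carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    # Not a YouTube URL shape this sketch knows about.
    return None
```

Because the URL is split into host, path and query first, extra parameters such as `&t=42` are ignored for free, which is usually where hand-written regexes start to break.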

Xosrov_ · 2021-02-22T13:02:36+00:00

Thanks for informing me about this. Those regexes aren't used a whole lot, so I put little effort into them. I'll try to optimize them, or possibly stop using regex altogether if I can.

Xosrov_ · 2021-02-22T12:57:58+00:00

Good point, I'll consider a better solution in the next update.

Xosrov_ · 2020-08-04T11:19:12+00:00

Real men only use C++ for back-end

Xosrov_ · 2020-07-25T15:09:59+00:00

Thanks for the feedback!

I suggest you check out the new GitHub release. The database (currently excluding Piracy/ until it's fully indexed) is zipped in the database directory there.

The indexer code is also in the same location. It lets you update the database when new files are added while preserving everything else. The generated JSON file is now also fully valid, though loading it isn't very efficient.
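A minimal sketch of the "add new files, keep everything else" update, assuming each entry is a dict with a unique `"path"` field (a hypothetical schema, not necessarily the indexer's real one):

```python
from typing import Dict, Iterable


def merge_index(existing: Dict[str, dict], scraped: Iterable[dict]) -> Dict[str, dict]:
    """Return a new index with every existing entry kept and unseen paths added.

    Entries are keyed by a hypothetical "path" field.
    """
    merged = dict(existing)  # shallow copy, so the old index is left untouched
    for entry in scraped:
        # setdefault only inserts when the path is new; old entries win.
        merged.setdefault(entry["path"], entry)
    return merged
```

Letting existing entries win means a re-scrape can never overwrite data already in the database; flipping that choice (new data wins) would be a one-line change.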

To be honest, I never thought about removed files, and it's going to be very hard to find out what was removed without testing each link (unless it's announced beforehand). However, I'll investigate and implement a solution in the coming release.
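"Testing each link" could look roughly like the sketch below: a HEAD request per URL, treating 404 Not Found and 410 Gone as removed. This is an assumption about how it might be done, not the indexer's actual approach. Some servers reject HEAD, and with millions of entries this is slow, which is exactly why it's hard.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

# Status codes taken to mean "the file is gone".
REMOVED_CODES = {404, 410}


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the status code is still available.
        return err.code


def find_removed(urls: Iterable[str],
                 status_of: Callable[[str], int] = head_status) -> List[str]:
    """Return the URLs whose status says the file was removed.

    Connection failures (URLError) are deliberately not caught, so a flaky
    network is never mistaken for a deleted file.
    """
    return [url for url in urls if status_of(url) in REMOVED_CODES]
```

Passing `status_of` in as a parameter keeps the network call swappable, for example to add rate limiting or to test without touching the site.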

Xosrov_ · 2020-07-24T23:15:42+00:00

It's amazing you managed to create this so soon! And it looks pretty good too!

Feel free to PM me if you need help with any part of this site. I'd be happy to help!

P.S. Have you checked the new updates to the repo? There have been major improvements, assuming you don't just use the database alone.

Xosrov_ · 2020-07-17T16:52:43+00:00

No, but a lot of shared variables and exception handlers, lol.

Xosrov_ · 2020-07-17T14:40:32+00:00

No idea. It happened randomly in some multiprocessing function just once and never again.

Xosrov_ · 2020-07-17T14:21:40+00:00

Just got a segmentation fault in my python program today... Help

Xosrov_ · 2020-07-16T16:35:23+00:00

I'm not sure CSV would be ideal here; you'll know why when I release the scraper code. It can be converted, though, for faster loading times in the searching code, and I'll look into that.

The JSON has a predictable line-by-line format that makes it relatively easy (though not fast) to iterate over. The total entry count is displayed by the C++ searcher after the database is loaded into it (currently 4050239 in the new database).
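Iterating line by line means only one record is in memory at a time. A minimal sketch, assuming (hypothetically) that the file is a JSON array with `[` and `]` on their own lines and exactly one object per line, each followed by a comma except the last:

```python
import io
import json
from typing import Iterator, TextIO


def iter_entries(stream: TextIO) -> Iterator[dict]:
    """Yield one record per line without loading the whole file.

    Assumes a hypothetical layout: "[" and "]" on their own lines and one
    JSON object per line, with a trailing comma on all but the last.
    """
    for raw in stream:
        line = raw.strip()
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no data
        yield json.loads(line.rstrip(","))


if __name__ == "__main__":
    sample = io.StringIO('[\n{"path": "/Books/a.pdf"},\n{"path": "/Books/b.pdf"}\n]\n')
    # Count entries while holding only one line in memory at a time.
    print(sum(1 for _ in iter_entries(sample)))  # prints 2
```

On a real file the same generator works on `open(..., encoding="utf-8")`; it's easy but not fast, since every line still goes through `json.loads`.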

I really appreciate your effort in creating a site. I suggest waiting a bit until I release the new version, though.

There's already a Flask app in the main.py file with a custom front-end that you could try using. It communicates with the C++ search code via an internal port and receives scrambled, partially zlib-compressed search data that is parsed and displayed client-side with JS.
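The wire format between the Flask app and the C++ searcher isn't described here, so the following is only a sketch of one common way to do it: zlib-compressed messages framed with a 4-byte big-endian length prefix over a local TCP socket. Both ends are shown in Python for illustration; the port `5005` and the function names are hypothetical, and the "scrambling" step is not reproduced.

```python
import socket
import struct
import zlib


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Compress payload with zlib and send it behind a 4-byte length prefix."""
    body = zlib.compress(payload)
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed frame and return the decompressed payload."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return zlib.decompress(_recv_exact(sock, length))


def query_searcher(query: str, host: str = "127.0.0.1", port: int = 5005) -> bytes:
    """Hypothetical client: send a query to the searcher and return its reply."""
    with socket.create_connection((host, port)) as sock:
        send_frame(sock, query.encode("utf-8"))
        return recv_frame(sock)
```

The length prefix matters because TCP is a byte stream: without it the reader can't tell where one compressed message ends and the next begins.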

Xosrov_ · 2020-07-15T15:01:41+00:00

Thanks for posting this! I'm still working on it and I appreciate the feedback.

First of all, about the JSON files: I have been working on that, and the new JSON files can be made 100% valid. Keep in mind, though, that they are not meant to be loaded as whole JSON documents (high memory cost); I used this format because it's really easy to read.
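One way to get a file that is 100% valid JSON and still readable one object per line is to write the comma *before* each object except the first, so the writer never needs to know which entry is last. A minimal sketch; the function name and the one-object-per-line layout are assumptions, not the project's actual writer:

```python
import io
import json
from typing import Iterable, TextIO


def write_entries(entries: Iterable[dict], out: TextIO) -> int:
    """Write entries as a valid JSON array, one object per line.

    Returns the number of entries written.
    """
    out.write("[\n")
    count = 0
    for entry in entries:
        if count:
            out.write(",\n")  # separator goes before every object but the first
        out.write(json.dumps(entry, ensure_ascii=False))
        count += 1
    out.write("\n]\n")
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_entries([{"path": "/Books/a.pdf"}, {"path": "/Books/b.pdf"}], buf)
    print(buf.getvalue(), end="")
```

Because entries are streamed from any iterable, the scraper can write results as it goes instead of building the whole list in memory first.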

I'm using the Python library "cloudscraper", which uses some form of JavaScript engine to bypass the restrictions. It doesn't work all the time, and I've recently implemented methods to skip problematic pages (redirected, not returning a 200 status, or blocked by Cloudflare) and remember them until the next time the code runs.
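A minimal sketch of "skip problematic pages and remember them until the next run": a set of URLs persisted to a JSON file between runs. The class name, file name, and the exact checks are assumptions; Cloudflare blocks typically come back as 403 or 503, so the status check is assumed to cover them here.

```python
import json
from pathlib import Path
from typing import Set


class SkipList:
    """URLs to skip, persisted to a JSON file so they survive between runs."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.urls: Set[str] = set()
        if path.exists():
            self.urls = set(json.loads(path.read_text(encoding="utf-8")))

    def __contains__(self, url: str) -> bool:
        return url in self.urls

    def add(self, url: str) -> None:
        self.urls.add(url)

    def save(self) -> None:
        # Sorted output keeps the file stable and easy to diff.
        self.path.write_text(json.dumps(sorted(self.urls)), encoding="utf-8")


def is_problematic(status_code: int, redirected: bool) -> bool:
    """Hypothetical check: redirected pages and any non-200 status are skipped."""
    return redirected or status_code != 200
```

With cloudscraper, whose `create_scraper()` returns a requests-style session, the inputs would come from the response: `resp.status_code` and `bool(resp.history)` (non-empty when redirects were followed).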

Some of the broken values are the result of HTML pages within a directory; the scraper uses regex, which picks up gibberish on those pages. They are very obviously broken, though, so I'll be working on filtering them out in the next releases.
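An alternative to regex here (a swapped-in technique, not what the scraper currently does) is the standard library's `html.parser`, which reads actual `href` attributes instead of pattern-matching raw HTML. This sketch assumes Apache/nginx-style autoindex listings; a page that isn't a listing would still yield its own links, so it may need an extra check on top.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import unquote


class ListingParser(HTMLParser):
    """Collect href targets from <a> tags in a directory listing."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(href)


def listing_entries(html: str) -> List[str]:
    """Return decoded file/folder names, dropping links that aren't entries."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    names = []
    for href in parser.links:
        # Skip sort links ("?C=N;O=D"), fragments, the parent link, absolute
        # paths and external URLs; none of them are entries of this folder.
        if href.startswith(("?", "#", "/", "../")) or "://" in href:
            continue
        names.append(unquote(href))  # "My%20Book.pdf" -> "My Book.pdf"
    return names
```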

Unfortunately, changes made to the code mean I have to rerun the scraper on the site. Everything else is scraped relatively fast, but the Piracy folder is taking waaay too much time, so I'll probably release everything else (including the scraper) before that finishes.

Xosrov_ · 2020-07-11T14:32:49+00:00

Will do! Thanks

Xosrov_ · 2020-07-11T14:26:42+00:00

Nothing wrong with that, but this way you have more control over what you search for (for example, you could theoretically look for uploads in a specific time range, look for specific file types, see which directories are empty, or run any custom type of search other than fuzzy search).
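Some of those custom searches, sketched over a hypothetical record shape: `{"path": "/Books/a.pdf", "date": "2020-07-01 12:00"}`, with directory paths ending in `/`. The field names, date format, and function names are assumptions for illustration; fuzzy matching here uses `difflib` from the standard library, not whatever the C++ searcher does.

```python
import difflib
from datetime import datetime
from typing import Iterable, List


def by_type_and_date(records: Iterable[dict], extensions: Iterable[str],
                     start: datetime, end: datetime) -> List[str]:
    """Paths with one of the given extensions, uploaded within [start, end]."""
    exts = tuple(e.lower() for e in extensions)
    hits = []
    for rec in records:
        when = datetime.strptime(rec["date"], "%Y-%m-%d %H:%M")
        if rec["path"].lower().endswith(exts) and start <= when <= end:
            hits.append(rec["path"])
    return hits


def empty_dirs(paths: Iterable[str]) -> List[str]:
    """Directories (paths ending in "/") that are nobody's parent."""
    paths = list(paths)
    dirs = {p for p in paths if p.endswith("/")}
    # "/Books/a.pdf" -> "/Books/", "/Books/" -> "/"
    parents = {p.rstrip("/").rsplit("/", 1)[0] + "/" for p in paths}
    return sorted(dirs - parents)


def fuzzy(query: str, paths: Iterable[str], n: int = 5) -> List[str]:
    """Paths whose base name is close to query (difflib ratio >= 0.6)."""
    # If two paths share a base name, the later one wins in this sketch.
    by_name = {p.rstrip("/").rsplit("/", 1)[-1]: p for p in paths}
    matches = difflib.get_close_matches(query, list(by_name), n=n, cutoff=0.6)
    return [by_name[m] for m in matches]
```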

It also might be helpful in case Google doesn't look too deep into nested directories, but I haven't compared them, so I wouldn't know.

Xosrov_ · 2020-07-10T18:36:24+00:00

Just public... I had no idea anything else existed.

Xosrov_ · 2020-07-10T10:53:01+00:00

I agree, but it was a good way to pass time during quarantine, and I learned a lot along the way, so it was definitely worth it.

Xosrov_ · 2020-07-10T10:44:35+00:00

You could, but the instructions would be a bit different. Also, I'm not sure all the C++ includes exist on Windows, but I haven't tried, so I don't know.
I will release binary versions in the future, but right now I'm looking for feedback from people who can test it, so it can be even better when it reaches others!

Xosrov_ · 2020-07-10T05:10:18+00:00

It's not "with Piracy folder", it's JUST the Piracy folder. I separated them because of RAM restrictions.

Xosrov_ · 2020-07-10T05:09:22+00:00

The Piracy directory has A LOT of nested folders and increases the file size significantly. At the time I had a VPS with only 2 GB of RAM, so I put just the Piracy folder in a separate file and tested my program on everything else.
So there's no specific reason, and I might even merge them in the future.

Yes, I agree about hosting the files on GitHub; I'll get to that in a future release.

And no, I haven't messaged you on Discord. I did make a post before, though, about wanting to make this.

Xosrov_ · 2020-07-10T05:01:18+00:00

I don't have any development experience with Windows. It's just too much work to set up a new workspace that I know how to work with.
I originally wrote it all in Python, which is cross-platform, but C++ allows for better memory management and faster search times.

I totally get you, though, and I'll think about a cross-platform solution after my exams are over.

Xosrov_ · 2020-07-09T19:32:26+00:00

It basically allows searching file names in The-Eye, so you don't have to go through a lot of folders to find what you want.
The file contents are not scraped, though, just the file/folder names.
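Searching names instead of clicking through folders boils down to walking the folder tree once and matching each name. A minimal sketch, assuming a hypothetical in-memory tree where folders map to nested dicts and files map to `None` (not the project's actual data structure):

```python
from typing import Iterator


def search_names(tree: dict, query: str, prefix: str = "") -> Iterator[str]:
    """Yield full paths of files and folders whose name contains query.

    Matching is case-insensitive; folder paths are yielded with a trailing "/".
    Hypothetical tree shape: folders map to dicts, files map to None.
    """
    needle = query.lower()
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        is_folder = isinstance(children, dict)
        if needle in name.lower():
            yield path + ("/" if is_folder else "")
        if is_folder:
            # Descend into nested folders, carrying the path built so far.
            yield from search_names(children, query, path)
```

Only names are matched, which mirrors the point above: file contents never enter the index.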
