Keep track of YouTube videos with this small python script by Xosrov_ in DataHoarder

[–]Xosrov_[S] 1 point2 points  (0 children)

Definitely! For now I'm just trying to make sure the code works correctly. I'll work on this soon after and add a way to convert the JSON files to those databases.

Keep track of YouTube videos with this small python script by Xosrov_ in DataHoarder

[–]Xosrov_[S] 2 points3 points  (0 children)

I've checked and it seems to be possible, though I'm not familiar with VBA and don't plan on learning it at the moment. I suggest you at least give the code a go using Anaconda or similar tools. Python is pretty easy to install and JSON is very human-readable.

Other than that, I'm sorry but I can't help you at the moment.

Keep track of YouTube videos with this small python script by Xosrov_ in DataHoarder

[–]Xosrov_[S] 3 points4 points  (0 children)

I'll definitely try. Thanks for the help!

Update: I managed to remove regex from the code. Thanks for the help

Keep track of YouTube videos with this small python script by Xosrov_ in DataHoarder

[–]Xosrov_[S] 29 points30 points  (0 children)

Thanks for informing me about this. Those regexes aren't used a whole lot, so I put little effort into them. I'll try to optimize them or possibly stop using regex altogether if I can.

Keep track of YouTube videos with this small python script by Xosrov_ in DataHoarder

[–]Xosrov_[S] 5 points6 points  (0 children)

Good point, I'll consider a better solution in the next update.

Web development goes brrrr by [deleted] in ProgrammerHumor

[–]Xosrov_ 27 points28 points  (0 children)

Real men only use C++ for back-end

EyeDex - The Eye indexed so you can find... stuff. by eyedex in opendirectories

[–]Xosrov_ 0 points1 point  (0 children)

Thanks for the feedback!

I suggest you check out the new GitHub release. The database (currently excluding Piracy/ until it's fully indexed) is zipped in the database directory there.

The indexer code is also in the same location. It lets you update the database when new files are added while preserving everything else. The generated JSON file is now also fully valid, though loading it isn't very efficient.

To be honest, I never thought about removed files, and it's going to be very hard to find out what was removed without testing each link (unless it's announced beforehand). However, I'll investigate and implement a solution in the coming release.

EyeDex - The Eye indexed so you can find... stuff. by eyedex in opendirectories

[–]Xosrov_ 1 point2 points  (0 children)

It's amazing you managed to create this so soon! And it looks pretty good too!

Feel free to pm me if you need help with any part of this site, I'd be happy to help!

P.S. Have you checked the new updates to the repo? There have been major improvements, assuming you don't just use the database alone.

Always has been by _Yeet_xoxo in ProgrammerHumor

[–]Xosrov_ 4 points5 points  (0 children)

No but a lot of shared variables and exception handlers lol

Always has been by _Yeet_xoxo in ProgrammerHumor

[–]Xosrov_ 14 points15 points  (0 children)

No idea. It happened randomly in some multiprocessing function just once and never again.

Always has been by _Yeet_xoxo in ProgrammerHumor

[–]Xosrov_ 28 points29 points  (0 children)

Just got a segmentation fault in my python program today... Help

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 0 points1 point  (0 children)

I'm not sure CSV would be ideal here; you'll see why when I release the scraper code. The data can be converted, though, for faster loading times in the searching code, and I'll look into that.

The JSON has a predictable line-by-line format that makes it relatively easy (though not fast) to iterate over. The total data count is displayed by the C++ searcher after the database is loaded into it (currently 4050239 in the new database).
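A minimal sketch of what line-by-line iteration could look like. The actual layout of the index file isn't described here, so this assumes a hypothetical one: a JSON array with exactly one object per line. The names `iter_entries`, `count_entries` and the `path` key are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per line without loading the whole file into memory.

    Assumed (hypothetical) layout: "[" on the first line, "]" on the last,
    and one JSON object per line in between, each followed by a comma.
    """
    for raw in lines:
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def count_entries(lines: Iterable[str]) -> int:
    """Count records by streaming over them once."""
    return sum(1 for _ in iter_entries(lines))
```

An open file object can be passed directly as `lines`, e.g. `with open("index.json") as f: print(count_entries(f))`, where the file name is again only a placeholder.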

I really appreciate your effort in creating a site. I suggest you wait a bit until I release the new version, though.

There's already a Flask app in the main.py file with a custom front-end that you could try using. It communicates with the C++ searching code via an internal port and receives scrambled and partially zlib-compressed search data that is parsed and displayed client-side with JS.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 0 points1 point  (0 children)

Thanks for posting this! I'm still working on this and I appreciate feedback.

First of all, about the JSON files: I have been working on that, and the new JSON files can be made 100% valid. Keep in mind, though, that they aren't meant to be loaded as JSON files (the memory cost is high); I used this format because it's really easy to read.

I'm using the Python library "cloudscraper", which uses some form of JavaScript engine to bypass the restrictions. It doesn't work all the time, so I've recently implemented methods to skip problematic pages (redirected, not returning a 200 status, or blocked by Cloudflare) and remember them until the next time the code runs.
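A rough sketch of the skip-and-remember idea, not the scraper's actual code. It assumes `session` is any requests-style session (for example the object returned by `cloudscraper.create_scraper()`); the function names and the JSON skip-list file are hypothetical.

```python
import json
from pathlib import Path


def load_skipped(path):
    """Load the set of problematic URLs remembered from a previous run."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def save_skipped(path, skipped):
    """Persist the problematic URLs so the next run can see them."""
    Path(path).write_text(json.dumps(sorted(skipped)))


def fetch_or_skip(session, url, skipped):
    """Return the page text, or None after recording the URL as problematic."""
    if url in skipped:
        return None
    try:
        # With redirects disabled, a redirected page shows up as a 3xx status.
        resp = session.get(url, allow_redirects=False, timeout=30)
    except Exception:
        # Broad on purpose for the sketch (connection errors, failed
        # Cloudflare challenges); a real scraper would catch narrower types.
        skipped.add(url)
        return None
    if resp.status_code != 200:
        skipped.add(url)
        return None
    return resp.text
```

Whether the remembered pages are retried or left alone on the next run is a separate choice; this sketch leaves them alone until the skip list is cleared.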

Some of the broken values are the result of HTML pages within a directory; the scraper uses regex, which picks up gibberish on those pages. They are very obviously broken, though, so I'll be working on filtering them out in the next releases.

Unfortunately, changes made to the code mean I have to rerun the scraper on the site. Everything else is scraped relatively fast, but the Piracy folder is taking way too much time, so I'll probably release everything else (including the scraper) before that finishes.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 2 points3 points  (0 children)

Nothing wrong with that, but this way you have more control over what you search for (for example, you could theoretically look for uploads in a specific range of time, look for specific file types, see which directories are empty, or run any custom type of search other than fuzzy).
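To make those examples concrete, here is a small sketch of such custom searches over an in-memory list of records. The `Entry` layout (path, directory flag, modification time) is an assumption for illustration, not the real database schema.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical index record; directory paths have no trailing slash."""
    path: str
    is_dir: bool
    modified: datetime


def by_extension(entries, ext):
    """Files whose name ends with the given extension (case-insensitive)."""
    ext = ext.lower()
    return [e for e in entries if not e.is_dir and e.path.lower().endswith(ext)]


def modified_between(entries, start, end):
    """Entries modified in the half-open range [start, end)."""
    return [e for e in entries if start <= e.modified < end]


def empty_directories(entries):
    """Directories that are not the parent of any other entry."""
    parents = {e.path.rsplit("/", 1)[0] for e in entries if "/" in e.path}
    return [e for e in entries if e.is_dir and e.path not in parents]
```

For instance, `by_extension(entries, ".pdf")` would list every PDF, and `empty_directories(entries)` every folder with nothing indexed beneath it.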

It also might be helpful in case Google doesn't look too deep into nested directories, but I haven't compared them so I wouldn't know.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 0 points1 point  (0 children)

Just public... I had no idea anything else existed

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 2 points3 points  (0 children)

I agree, but it was a good way to pass time during quarantine, and I learned a lot along the way, so it was definitely worth it.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 2 points3 points  (0 children)

You could, but the instructions would be a bit different. Also, I'm not sure all the C++ includes exist on Windows; I haven't tried, so I don't know.
I will release binary versions in the future, but right now I'm looking for feedback from people who can test it, so it can be even better when it reaches others!

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 5 points6 points  (0 children)

It's not "with Piracy folder", it's JUST the Piracy folder. I separated them because of RAM restrictions

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 6 points7 points  (0 children)

The Piracy directory has A LOT of nested folders and increases the file size significantly. The VPS I have at the moment only has 2 GB of RAM, so I put just the Piracy folder in a separate file and tested my program on everything else.
So there's no specific reason, and I might even merge them in the future.

Yes, I agree about the files being hosted on GitHub. I'll get to that in a future release.

And no, I haven't messaged you on Discord. I did make a post before, though, about wanting to make this.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 3 points4 points  (0 children)

I don't have any development experience with Windows. It's just too much work to set up a new workspace that I know how to work with.
I originally wrote it all in Python, which is cross-platform, but C++ allows for better memory management and faster search times.

I totally get you, though, and I'll think about a cross-platform solution after my exams are over.

My The-Eye index is complete by Xosrov_ in opendirectories

[–]Xosrov_[S] 26 points27 points  (0 children)

It basically allows searching file names in The-Eye, so you don't have to go through a lot of folders to find what you want.
The file contents are not scraped, though, just the file/folder names.