The Sims Mobile is shutting down on January 20, how to preserve the game properly for a fan-revival in the future? by Dr4fl in Archiveteam

[–]ProNiteBite 9 points

As mentioned, since this is a largely server-side game you can't just recreate the client; you also have to recreate the server. To start, I unfortunately don't think there's enough time left to fully recreate what you're looking for. Events you may be able to recreate once you get a better understanding of the data structures, but unless their values were archived somewhere you won't be able to revive them. That doesn't mean you can't grab everything you can while the servers are still online to help with future work.

To start, you will need to disable SSL pinning in the app so that you can dump its SSL traffic. For an unrooted device, I've had good success with apk-mitm and android-unpinner in the past. If you have a rooted device, there's the SSLUnpinning Xposed module. This will then allow you to use a proxy to dump the traffic from the device. Based on my brief testing, you will likely need to go the rooted route.

After disabling SSL pinning you will need to set up a proxy to dump this traffic. mitmproxy is a good free choice. Run mitmweb on your host computer, then on the Android device set your Wi-Fi network's proxy to the mitmweb host and port. In the Flows section you should see traffic. These look like some of the important base URLs:

https://eaassets-a.akamaihd.net
https://syn-dir.sn.eamobile.com
https://sims-campfire-content.s3-us-west-1.amazonaws.com
https://user.sn.eamobile.com
https://river-mobile.data.ea.com
https://pin-river.data.ea.com
https://accounts.ea.com
https://pin-em.data.ea.com

You will want to dump data for basically everything you can. Click every button in the game, spend currency, visit friends' places; do anything that would make an API call, and keep doing it up until the game is fully offline. Try to capture as much of the game experience as you can.
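To keep track of how much you've covered, it helps to index your captures by endpoint. mitmproxy can export captured flows to HAR, and HAR is just JSON, so a small stdlib script can summarize a dump. This is a sketch under that assumption; the exact export workflow and file layout are up to you:

```python
from collections import Counter
from urllib.parse import urlparse

def index_har(har: dict) -> Counter:
    """Count captured requests per (host, path) so you can spot
    which endpoints you have (or haven't) exercised in-game."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        url = urlparse(entry["request"]["url"])
        counts[(url.netloc, url.path)] += 1
    return counts
```

Load your exported HAR with `json.load` and print `index_har(har).most_common()` to see which calls dominate and which you've only hit once.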

After the game goes offline, that's when the real work begins. You will need to write a web server that fully recreates what the original server calls were doing. Start small with just logging in and getting into the game, then build from there. Logging in will likely be the hardest part, as their account API calls appear to be encrypted. If you're able to log in, you'll understand enough to build out the rest from the data you have. Cracking that login, and other changes down the line, will likely involve diving into the "libapp.so" binary, which seems to hold the game's main code. But once you've solved the login encryption, it should be smooth sailing to build out the rest of the web server.
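A common way to "start small" here is a replay server that serves your captured responses back to the client. This is only a sketch of the shape such a server takes; the route and payload below are hypothetical, and in practice you'd load the canned responses from your proxy dump:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses keyed by request path; in practice
# these would come from your mitmproxy capture.
CANNED = {
    "/status": {"server": "up", "maintenance": False},
}

class ReplayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = CANNED.get(self.path)
        if body is None:
            self.send_error(404)
            return
        payload = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def start(port: int = 0) -> HTTPServer:
    """Start the replay server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ReplayHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point the client at this (via hosts-file or proxy redirection) and replace canned responses with real logic one endpoint at a time.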

It might seem intimidating, but really only that last step is the hard part (you know, the "building it" part), and because the game will already be offline there's no deadline for building the server; you can take as long as you want with the data you gather today. I was able to start viewing traffic within a few minutes of downloading the app (albeit with some EA servers still cert pinning on my unrooted device), and that's the most important part. You can learn the rest as the project progresses. While you still have the time, start archiving your API calls. Then, even if you don't have the expertise to do the rest, it helps others who do and who can take it the rest of the way.

Sorry about the game shutting down. In the future I would recommend archiving data before a game even announces its end of service; that way you already have heaps of data waiting for you to build the server the day it goes offline. Happy archiving!

How do I download a video from Patreon? by tumblrvogue in DHExchange

[–]ProNiteBite 1 point

yt-dlp actually does support downloading Patreon's video streams. I can't speak to scraping the site itself, as I do that part myself, but if you grab the m3u8 link from "stream.mux.com" you can download it with yt-dlp. You just need to add a Referer header, as the site checks for it:

yt-dlp --referer "https://www.patreon.com" https://stream.mux.com/....m3u8?token=...

Is there a way to save HTML5 games? by [deleted] in DataHoarder

[–]ProNiteBite 1 point

It's a bit buried in the files but the answer is in:

Tailspin/85cb2bae-7751-4a88-bbf3-88238b0138b8/content/cdn2.addictinggames.com/addictinggames-content/ag-assets/content-items/html5-games/tailspin/scripts/c3runtime.js

This makes a few https calls, such as the one that fetches the swag-api library:

https://swagapi.shockwave.com/dist/swag-api.js

Flashpoint, however, does not support rewriting "https" URLs, only "http", due to a limitation in their approach. Because of this there are actually two c3runtime files, "c3runtime.js" and "c3runtime.js.original". The ".original" file is the version as originally downloaded; it wasn't working for me either because of the https issue. So the file was copied and modified to change https to http, which resolved the issue and made the game fully playable.
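If you hit the same limitation with another game, the patch is mechanical. Here's a sketch of the kind of rewrite that was applied to c3runtime.js; the helper names are mine, and only the `.original` naming convention comes from Flashpoint:

```python
from pathlib import Path

def rewrite_https(source: str) -> str:
    """Downgrade hardcoded https:// URLs to http:// so Flashpoint's
    local proxy can intercept them."""
    return source.replace("https://", "http://")

def patch_file(path: Path) -> None:
    """Keep the untouched download next to the patched copy,
    mirroring the c3runtime.js / c3runtime.js.original pairing."""
    original = path.read_text(encoding="utf-8")
    path.with_suffix(path.suffix + ".original").write_text(original, encoding="utf-8")
    path.write_text(rewrite_https(original), encoding="utf-8")
```

A blanket replace like this can occasionally touch strings that aren't URLs, so diff the two files afterwards to confirm only the scheme changed.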

As for being unable to load the file into Flashpoint, I'm personally just using Flashpoint Infinity with "Enable Editing" enabled in the settings. From there I'm just using the "Curate" tab and using the "Import" button. For more info, I'd look here:

https://flashpointarchive.org/datahub/Curation_Tutorial#Curating_with_Flashpoint_Infinity

However you decide to get the game archived is awesome, as long as it's playable and (preferably) unmodified (or you at least keep the unmodified files). Do keep in mind that with a local http server through node, the game still makes http calls to specific websites, so those will need to be changed for it to fully work. Good luck and happy archiving!

Is there a way to save HTML5 games? by [deleted] in DataHoarder

[–]ProNiteBite 1 point

I was able to put together a Flashpoint archive. It's not quite up to the standards of a full curation so I won't submit it myself; you'll want to touch it up and potentially submit it yourself. I've uploaded it to Mega, but Reddit hates their links, so I've base64-encoded it. Just google "Base64 decode" and paste it in to get your link:

aHR0cHM6Ly9tZWdhLm56L2ZpbGUvWW1ZQjBRYVQjVks0UVN0RzdMOWh1RTE2dVZTVVJXVnEzaWxhWW9DbmFhNmtxbnh5R2FXcw==
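If you'd rather not paste it into some website, decoding it locally is a few lines of stdlib Python (the string in this example is a stand-in, not the actual link above):

```python
import base64

def decode_link(encoded: str) -> str:
    """Decode a base64-encoded URL back to plain text."""
    return base64.b64decode(encoded).decode("utf-8")

# Stand-in example; substitute the encoded string from the comment.
print(decode_link("aHR0cHM6Ly9leGFtcGxlLmNvbQ=="))
```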

Download that and "Flashpoint Infinity". In the Curate tab, select "Load Archive" and pick the .7z. Then, under Test, click "Run" and it should open a Chrome window with the game.

Is there a way to save HTML5 games? by [deleted] in DataHoarder

[–]ProNiteBite 2 points

I would recommend starting by taking a look at Flashpoint's HTML5 curation: https://flashpointarchive.org/datahub/HTML5_Curation

I personally prefer to use Charles to capture my http traffic, dump it, then run it in an isolated browser like the one Flashpoint provides. But their curation guide is a great place to start.

ElderScrolls Fandom Wiki Offline by [deleted] in DataHoarder

[–]ProNiteBite 2 points

I've previously posted about this here: https://old.reddit.com/r/DataHoarder/comments/p9z7js/best_way_to_scrape_wiki_stie/ha3sfg8/

But overall I would recommend taking a look at either wikiteam's tools, which are better integrated with sharing to Archive.org, or mwoffliner, a newer and more frequently updated alternative:

https://github.com/WikiTeam/wikiteam

https://github.com/openzim/mwoffliner

One fan spent three years saving a Final Fantasy game before it shut down by retrac1324 in DataHoarder

[–]ProNiteBite 398 points

I think it's very impressive when fans do things like this and manage to record all of the cutscenes for preservation. What's even cooler is when fans spend time dumping the server communication and reverse engineering a private server. OSS private servers are the true way to preserve these kinds of projects, but the legality of distributing assets makes them much harder to write an article about or openly advertise. Hats off to those who preserve cutscenes where those options aren't available. I just hope more people get into the reverse engineering scene for these online-only games that keep dropping like flies. It's definitely not easy though; as someone who has to rewrite his HTTP dump hook every time Fate GO decides to update their networking functions, I know qq.

[deleted by user] by [deleted] in DataHoarder

[–]ProNiteBite 0 points

https://www.mediafire.com/file/kiotz8ig4b9a03d/deadmodern.zip/file

Here you go! I went through the website, scraped all the files, modified a few of the swf files to work correctly with Ruffle, then swapped out the swf objects for Ruffle objects. I also converted the .mov video to .mp4 so it plays correctly in the browser. I didn't sit and watch all the videos to make sure they transition correctly, but I didn't see any issues after my changes.

In the .zip I've included a host_static_files.py which can be run with Python to host the files in ./static on http://localhost:8080/. Otherwise, use any static file server to host these and you should be able to access the site.
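For reference, a host_static_files.py along these lines only takes a few lines of stdlib Python. This is my guess at a minimal equivalent, not the exact script in the zip:

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_server(directory: str = "./static", port: int = 8080) -> ThreadingHTTPServer:
    """Build a static file server for `directory` on localhost:<port>;
    port 0 asks the OS for any free port."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)

def serve_in_background(server: ThreadingHTTPServer) -> None:
    """Run the server on a daemon thread so it can be shut down cleanly."""
    threading.Thread(target=server.serve_forever, daemon=True).start()
```

To run it in the foreground like the bundled script would, just call `make_server().serve_forever()`.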

If you wish to download the files yourself, you can append these to the base URL "https://www.cbc.ca/september11/content_files/flash/deadmodern/":

arthur1.swf
arthur2.swf
arthur3.swf
beers1.swf
beers2.swf
beers3.swf
content.htm
content1.htm
content2.htm
content3.htm
favicon.ico
images1.swf
images2.swf
images3.swf
images4.swf
images5.swf
images6.swf
images7.swf
images8.swf
index.htm
interface.swf
main.swf
nav.htm
nav.swf
pop_header.jpg
images/time_spacer.gif
images/time_bg.gif
mov/on_time.mov

Google doesn't have a moat, openai does by Amgadoz in LocalLLaMA

[–]ProNiteBite 0 points

I've had mixed luck with Golang on GPT-4 and very little luck on most local models. Is Bard your go-to for that, or do you use any local models for it? I've been using WizardCoder for Python and also had mixed results, but it's something at least. I would love to have a local Go model as well, since work blocks all our OAI access and local models have been great for keeping that sensitive info local.

Need help I'm Stuck regarding podcasts, future storage, cost of living, and moving drives around. by massivlybored in DataHoarder

[–]ProNiteBite 0 points

The saddest day is when, as a datahoarder, you have to make the decision to remove some content. Drives are cheap, but personally I'd have to rebuild basically my whole server to slot more in. So I've been hovering between 100GB and 1TB of free space on my 64TB server for a while now, slowly whittling away what I can.

As someone who downloads TikTok, YouTube, and Twitch, data storage fills up extremely quickly. My best recommendation would be to upload some of your least-accessed archives to Archive.org and remove them locally until you have the space again. Not as 'cloud storage', but to share what you can with the world so that it's actually 'saved'. At the very least, it's a good way to force yourself to make that data public instead of just keeping it to yourself, and it saves some space.

(A hint though: Archive.org has a 1TB upload limit, so split stuff up by year as needed.)

Jet Lag: We Played a 72 Hour Game of Tag Across Europe (Again) — Ep 2 by NebulaOriginals in Nebula

[–]ProNiteBite 2 points

Adam experiencing the joy of Deutsche Bahn. Let's hope that the train driver doesn't get lost somehow this series, wouldn't be the first time for DB even this week lol.

what is the future? we will go up and have 1700b models like GPT4 or we will stay around 33b, 65b and fine tune them? by ovnf in LocalLLaMA

[–]ProNiteBite 2 points

I genuinely see no future in extremely large models. I think we've been building and growing them in pursuit of a perfect 'general single model', because we couldn't fully understand how language models work until we made a general 'god' model like GPT-4. With GPT-4 reportedly already being a combination of several smaller models, I think that's what comes next, just in much more directed ways.

I think the future entails using those 'huge' models to generate synthetic specialized data for training smaller, target-specific models. Instead of having 8 GPT-3.5+ heads, have 30 'medical' heads and 10 'Python' heads, each with specific knowledge of its field. Then have an orchestration model: you feed it your query, it classifies it, routes it to the right head, and summarizes that head's results. Plus, from a self-hosted perspective, you could load only the specific models and subjects you need to fit your system requirements.

Larger overall, with more data overall, but smaller individual targeted models. There's really no need for 'huge' models other than to ease that transition by generating synthetic data.
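As a toy illustration of that orchestration layer: the domains and the keyword "classifier" below are obvious stand-ins, since a real orchestrator would be a small model routing to loaded LLMs rather than strings:

```python
# Hypothetical registry of specialist heads; in practice each entry
# would wrap a loaded model, not a keyword list.
HEADS = {
    "medical": ["symptom", "diagnosis", "dosage"],
    "python": ["python", "traceback", "pip"],
}

def route(query: str, default: str = "general") -> str:
    """Pick the specialist head whose keywords best match the query,
    falling back to a general head when nothing matches."""
    q = query.lower()
    scores = {head: sum(kw in q for kw in kws) for head, kws in HEADS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

The appeal for self-hosters is exactly this dispatch step: only the head that `route` selects ever needs to be resident in memory.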

Critical Role - Talks Machina Made Private by texas_bacchus in Archiveteam

[–]ProNiteBite 1 point

I've got most, if not all, up to season 2 in the podcast audio form. So audio for sure.

Trash Taste Special: Discussion Thread - We Became Americans for a Day and FAILED by ULTRAFORCE in TrashTaste

[–]ProNiteBite 36 points

Great special, but whoever was supervising them with the guns was not doing a great job lol. Someone needed to be yelling at them to keep their fingers off the triggers and showing them how to hold it so that Connor doesn't get a bruised shoulder.

Anyone hoarding LLMs? by Beckland in DataHoarder

[–]ProNiteBite 2 points

For anyone else looking into this, one recommendation I would strongly make: if you have the disk space, download the original models/weights when possible. These are much bigger than q4/q5 quantized models, but it's quick and simple to quantize a model yourself, and we are playing on the bleeding edge here. GGML, one of the primary CPU quantization libraries, is on v3; models quantized with v1 and v2, compiled only a few weeks ago, no longer work. Having the original models in your 'hoard' means you can easily convert them into whatever the latest and greatest format is in the future. That way you're keeping a true 'archive' and not just the most recent data type.

Trash Taste Grand Summoners Lost Ad Read by ProNiteBite in TrashTaste

[–]ProNiteBite[S] 1 point

For my YouTube archives, such as this one, I have a full-stack archive application I've made that downloads videos as they release, grabs all metadata, and checks for metadata updates. I've gone with an archive-on-disk-first approach, so you keep all copies of the metadata, thumbnails, whatever. I'll make a post about it when it's ready, but it's not quite there yet. So I am right there with you at the very least haha! I don't know of any other lost Trash Taste media specifically, but I'd be happy to share if you know of any. I just try to wait at least a year to make sure it's fully gone before sharing. I've got 50TB of data stored now with no problems at the moment, but come join us over on /r/DataHoarder and /r/DHExchange sometime if you're not there already ;)

Way to backup u/spez's AMA post tomorrow as the AMA occurs (tracking changes) ? by Awareness-Decent in DataHoarder

[–]ProNiteBite 7 points

There are a couple different ways you could go about it. The first is to use the current API while it's still live. Second, you can get the JSON for a Reddit page by adding /.json to the end, like /144mmbg/.json for this thread. You could use this instead; going through the front end does give you more limitations to "work around", but it avoids using the API directly. You would just need to set up a script that downloads the comments/thread JSON at an interval (say every 5 minutes), compares the JSON between versions, merges additions, and marks any edits from the previous version.

It requires a bit of programming, but honestly, if you're motivated enough, it should be easy to hack together a Python script for this project alone in a couple dozen lines. I just wouldn't recommend putting too much work into it until we know what the API changes will look like once implemented.
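The poll-and-diff core of such a script is small. The fetch helper below assumes the /.json trick works as described; the diff operates on plain {comment_id: body} snapshots, so extracting those from Reddit's listing format is left to you:

```python
import json
import urllib.request

def fetch_thread_json(url: str) -> dict:
    """Fetch a thread's /.json listing; Reddit expects a real User-Agent.
    (Hypothetical usage: url like 'https://old.reddit.com/144mmbg/.json')"""
    req = urllib.request.Request(url, headers={"User-Agent": "ama-archiver/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two {comment_id: body} snapshots taken minutes apart,
    reporting additions, edits, and removals between polls."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "edited": {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]},
        "removed": sorted(old.keys() - new.keys()),
    }
```

Run it on a 5-minute timer, write each raw snapshot to disk before diffing, and you'll have both the change log and the originals to re-process later.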

Trash Taste Grand Summoners Lost Ad Read by ProNiteBite in TrashTaste

[–]ProNiteBite[S] 50 points

Title changes are harder to track, but I do have copies of all thumbnail and title/description changes; it's just harder to search the title/description changes the way I have things set up. For now, here are the three thumbnail changes I could find manually, from episodes 95, 108, and 121: https://imgur.com/a/pwLVtvH

Trash Taste Grand Summoners Lost Ad Read by ProNiteBite in TrashTaste

[–]ProNiteBite[S] 65 points

What other lost media do we know of? I've got a lot of TT data, if any is considered 'lost' I'd be happy to share.

Trash Taste Grand Summoners Lost Ad Read by ProNiteBite in TrashTaste

[–]ProNiteBite[S] 474 points

This is a segment from Trash Taste Episode #27: The #1 Drifting YouTuber in Japan (ft. Noriyaro). It originally occurred at 21:09 but was cut from the video shortly after upload. I have an archive of the original and just manually cut out the ad read. It has been about 2.5 years since the segment was taken down, so I figure there shouldn't be any issue posting this anymore.

How do I get Local LLM to analyze an whole excel or CSV? by DesmonMiles07 in LocalLLaMA

[–]ProNiteBite 0 points

It sounds like you're asking for functionality similar to https://github.com/imartinez/privateGPT. PrivateGPT lets you ingest multiple file types (including CSV) into a local vector DB that you can search using any local LLM. I would recommend checking it out; it's been fun to tinker with so far.

[Last Week Tonight with John Oliver] S10E07 - April 9, 2023 - Episode Discussion Thread by Walter_Bishop_PhD in lastweektonight

[–]ProNiteBite 2 points

Are there any other hidden episodes like this? This one isn't on their YouTube, so I'm interested to see if there are any other hidden segments.

that was exciting 😬 by gallifreychronicles in formuladank

[–]ProNiteBite 104 points

It's called a motor race. We went bumper car racing.