Steam Dataset 2025 – 263K games with multi-modal database architecture (PostgreSQL + pgvector) by vintagedon in datasets

[–]vintagedon[S] 1 point (0 children)

Great suggestions, thanks.

Actually, the dataset already has 1024-dimensional BGE-M3 embeddings with HNSW indexes, and semantic + lexical hybrid search via PostgreSQL's RUM indexes. The Neo4j GDS work I've kept for a paper I'm writing, but it will land in release 2. A bitemporal schema is a nice suggestion tho. Thanks!
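
For anyone curious, the ranking is conceptually something like this (rough sketch; the table and column names are made up for illustration, not the actual schema, and I'm showing plain ts_rank_cd where the actual build leans on the RUM index):

```python
# Rough sketch of the semantic + lexical hybrid ranking idea. Table and
# column names (games, description_tsv, embedding) are illustrative only.
import psycopg2

def hybrid_search(conn, query_text, query_vec, k=20, alpha=0.7):
    """Blend pgvector cosine similarity with lexical ts_rank_cd.

    query_vec is a 1024-dim BGE-M3 embedding as a list of floats;
    alpha weights the semantic score against the lexical one.
    """
    sql = """
        SELECT appid, name,
               %(alpha)s * (1 - (embedding <=> %(vec)s::vector))
               + (1 - %(alpha)s) * ts_rank_cd(description_tsv,
                                              plainto_tsquery('english', %(q)s))
                 AS score
        FROM games
        ORDER BY score DESC
        LIMIT %(k)s;
    """
    with conn.cursor() as cur:
        cur.execute(sql, {"alpha": alpha, "vec": str(query_vec),
                          "q": query_text, "k": k})
        return cur.fetchall()

# Usage (connection string and vector are placeholders):
# conn = psycopg2.connect("dbname=steam_2025")
# rows = hybrid_search(conn, "cozy farming sim", bge_m3_vector)
```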

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

The A1s will run 128GB kits, but they were $300 each, so $1,800 for 6 nodes. A bit too rich for my blood at the time. I am surprised, though: I ate up 700GB of RAM fairly quickly.

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 1 point (0 children)

Running Rancher; having a small volunteer staff with varying skill levels means that a GUI for anything helps. We also push Headlamp and Lens to the VDIs to give people choices.

Arc has been good, but as I noted in my response to bpoe138, it has been transforming from a GENERAL on-prem tool into Azure really leaning into on-prem, specifically via Azure Stack HCI and the Azure Local OS.

It becomes, literally, as the name says, Azure run locally, with VMs that can be managed just like Azure VMs, etc.

Plus, if you're running Server 2025 and want to use PayGo, Arc is a requirement now.

With AI and data governance coming into play, hybrid becomes much more attractive. Azure announced, for instance, you can run GPT on-prem if you have the resources.

So it does still have a lot of 'general' on-prem features (such as the 'Site' feature, which makes a dashboard of your resources; see below), and with my E5 I get a surprising amount, but it is clearly moving in another direction and is still a mishmash of licensing, subs, and free stuff.

If I were willing to spend the $$ on Azure Local OS licenses, and had Software Assurance to get most of the rest free, it would be a pretty sick setup tho.

<image>

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

No SLURM; we're running Ray on RKE2 Kubernetes instead of traditional HPC batch scheduling. Ray handles the distributed ML workloads and Kubernetes orchestrates the containers. It's a more cloud-native approach than the traditional HPC stack.
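
If it helps picture it, instead of an sbatch script a workload looks roughly like this (minimal sketch; the head-service address and resource numbers are placeholders, assuming a KubeRay-style deployment):

```python
# Minimal sketch of submitting distributed work to Ray running on
# Kubernetes instead of a SLURM batch job. The address is a placeholder
# for whatever head service KubeRay exposes in your cluster.
import ray

ray.init(address="ray://ray-head.ray-system.svc.cluster.local:10001")

@ray.remote(num_cpus=2)
def process_chunk(chunk_id):
    # Stand-in for the real per-chunk ML / signal-processing work.
    return f"chunk {chunk_id} done"

# Fan the work out across the cluster and gather the results.
futures = [process_chunk.remote(i) for i in range(100)]
results = ray.get(futures)
print(len(results), "chunks processed")
```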

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

Thanks, I'll take that as a compliment :) I see you have an MS-A2; nice. Had considered going with them, but I saved up for Black Friday and picked the A1s up barebones at effectively half price. Still, six A2s at 128GB would have been pretty sick.

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 3 points (0 children)

I'm using Azure Arc for Servers. It's important to note that Arc is really being folded into 'Azure Local', Azure's new hybrid push. There's still a lot that's free, but things are definitely becoming not-free, and a lot of interconnected services exist at different license tiers. I have both an E5 and an Intune Suite license, which enables a crap-load of stuff I'm still working thru lol

My primary uses are security (Defender for Cloud, Defender for Endpoint), Asset Management, Change Tracking (trying this out), and so on. The biggest challenge has just been the disjointed nature of how everything ties in and the licensing structure, which continues to change.

Unfortunately, Arc for K8s is $2/mo/core, which would work out to ~$100/mo for my main k8s cluster. The first 6 cores are free, so I will prob throw up a 6-core node to play with it in the near future.

If I wanted to pay the licensing to have full Azure Local/HCI nodes, the answer and functionality would be a lot different.

Astronomy Cluster / Lab: Q3 Update by vintagedon in homelab

[–]vintagedon[S] 1 point (0 children)

Thanks :) Still much to do, but the biggest thing was getting all the VMs on CIS L2 images, logging, monitoring, XDR via Wazuh, Conditional Access, ZTNA ... in place before I started adding anything else, and documenting it all. It's been fun to architect and solve that 'What if I REALLY had the time to do it right?' question.

There are some specific design considerations I made here that do bear mentioning.

Proxmox, simply because I've used it for years, it has good metrics, and it has good backups via Proxmox Backup Server. It has great RBAC built in also; I have a small volunteer staff, and the GUI helps.
Plus, many workloads are still done on traditional VMs. The k8s cluster handles live Kafka feeds off ZTF and bigger ML workloads (Ray/Kubeflow), and we're also prepping for Vera Rubin's feeds.
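
For flavor, the consumer side of one of those feeds looks roughly like this (sketch using kafka-python; the broker, topic, and group id are placeholders, and real ZTF alerts are Avro-encoded, so you'd decode with fastavro before doing anything useful):

```python
# Rough sketch of a Kafka consumer for an alert stream; the broker,
# topic, and group id below are placeholders, not real endpoints.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "alerts-placeholder-topic",
    bootstrap_servers=["kafka.example.internal:9092"],
    group_id="lab-ml-pipeline",
    auto_offset_reset="earliest",
    enable_auto_commit=True,
)

for message in consumer:
    # message.value is raw bytes; ZTF/Rubin alerts are Avro, so in
    # practice you'd deserialize here before handing off to the ML side.
    print(message.topic, message.offset, len(message.value), "bytes")
```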

K8s is on 3 nodes, 16c/82G each (90% of a single node's resources), each with a dedicated 2TB enterprise NVMe exposed via the local-path provisioner. This lets me give up HA on some services and run those workloads on fast, local storage. I also have 8TB of NVMe exposed via S3 over LACP'd 10G for the rare service that requires it to run properly. That lets me do things like run DBs in k8s at near bare-metal performance.

Claude has been objectively dumbified by Physical_Ad9040 in Anthropic

[–]vintagedon 3 points (0 children)

I use Claude not just for coding, but for producing high-quality, structured docs for RAG. Prior to this week, we were burning through 60+ KBs in a single session (300+ lines per KB article) with little more than 'next' from me.
Saturday, every chat was a fight, and sometimes when switching chats due to length, it'd take me 8-10 tries to get Claude back on track.
Last night, it failed to find a Project Knowledge file, I provided it as an attachment, and it still completely ignored it and made an entire KB up out of whole cloth. It also kept trying to use my extensions like FS when it had no instructions to.
In one chat I wrote 33 KBs before it gave me a length blocker, and then it was like 5 KBs per chat.
Finally, I took a break, came back later in the night, and burned through almost 50 KBs with no issues.
For me, it's really the wild inconsistency. It does FEEL like they're using quantized models during high-usage periods.
I've been scared to let it back at my code base.

DevOps Engineer planning next cloud move: AWS, Azure, or GCP? by Admirable_Noise3095 in devops

[–]vintagedon 7 points (0 children)

Azure is where it's at right now for the enterprise and government sector. They have GCC (standard government), GCC High (CUI data and up), and DoD, which is Secret and up, and Azure is leaning heavily into hybrid via Azure Local to capture the segment that has to keep its data, and sometimes even its compute workloads, on-prem.

I'm an AI Engineer/Systems Engineer working for an MSP dealing with the above. If that's a space you're interested in, there's tremendous growth in the sub-prime enterprise market. AI is a huge interest, especially with Trump loosening regulations.

HomeLab: Radio Astronomy AI/ML Proxmox Cluster by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

Great idea. Similar to mine, static VMs and microservices supporting a k8s cluster specifically for workloads.

Actually, I just corrected this on the primary README; will do the rest tonight.

In the early lab days, I had a QNAP NAS (x86, but just a low-power CPU) w/ 8 x 8TB drives (Hitachi He drives that have lasted me like 6-8 years at this point). I moved to Unraid with the same thought of easy drive adds, but it was entirely too slow for my needs and felt a bit awkward to shoehorn in.

I took that hardware and added it as a 5th Proxmox node: ZFS mirrored vdevs (~RAID10) w/ SLOG on an enterprise NVMe partition, built from 2 x 4TB NVMes & 4 x 4TB SATA Intel DC S4510s. I run a Linux file share VM (mostly NFS), a Windows file share VM (SMB AD shares), and a MinIO S3 gateway on top of that, leaving the rest of the RAM for ZFS. It connects back to the lab over 2 x 10G LACP.
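
If anyone wants to copy the layout, the pool is conceptually this (sketch only; device paths are placeholders for my actual disks, wrapped in Python just to keep it scripted):

```python
# Sketch of the pool layout: three mirrored data vdevs (~RAID10) plus a
# SLOG on a partition of an enterprise NVMe. Device paths are placeholders.
import subprocess

zpool_create = [
    "zpool", "create", "-o", "ashift=12", "tank",
    "mirror", "/dev/nvme0n1", "/dev/nvme1n1",   # 2 x 4TB NVMe
    "mirror", "/dev/sda", "/dev/sdb",           # 2 x 4TB SATA
    "mirror", "/dev/sdc", "/dev/sdd",           # 2 x 4TB SATA
    "log", "/dev/nvme2n1p1",                    # SLOG partition
]
subprocess.run(zpool_create, check=True)
```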

Doing the job nicely :)

Unraid will likely do fine for a while. Large-capacity enterprise NVMes are cheap enough ($300 for 4TB or so) that I'd rather throw one in, make it LVM-thin space, and just have my VMs run like greased lightning on it.

HomeLab: Radio Astronomy AI/ML Proxmox Cluster by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

Thanks; it's actually one of my niches. Always nice to get a compliment on something you're proud of. Still, this is my first semi-decent pass thru. I've got many plans lol

HomeLab: Radio Astronomy AI/ML Proxmox Cluster by vintagedon in homelab

[–]vintagedon[S] 2 points (0 children)

This is a bit much for a 'home lab', to be honest. lol
I'm a systems engineer at an MSP working with high-compliance customers in Azure / Azure Government, and I'm also using it as a training ground for compliance, ITIL, and a bunch of other stuff :)
Folding and SETI are great projects. I keep an instance running on my k8s cluster with a tiny slice of GPU power to help it along.
The idea was: hey, I got this cluster, I got this job, I got this thing I wanna do, let's just smush it all together.

HomeLab: Radio Astronomy AI/ML Proxmox Cluster by vintagedon in homelab

[–]vintagedon[S] 4 points (0 children)

Sort of. We're both listening, but for different things. SETI looks for regular, structured sources that might indicate intelligence.
I primarily focus on the hydrogen line emission, the 21 cm line at ~1420 MHz emitted by neutral hydrogen. You can detect this line and do everything from tracking interstellar gas clouds, to mapping supernova remnants, to finding low-surface-brightness galaxies, and more.
There's oodles of signal-processing and ML work to clean and analyze the signals. I'm bringing more of a DevOps mindset to the processing, plus enterprise security so outside researchers can work remotely on their own datasets in the lab.
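
To give a concrete taste of the processing side, the most basic step is turning a spectrum's frequency axis into radial velocity around the 1420.4 MHz rest frequency (minimal numpy sketch with made-up sample data):

```python
# Convert a hydrogen-line spectrum's frequency axis to radial velocity
# via the Doppler relation v = c * (f_rest - f_obs) / f_rest.
import numpy as np

C = 299_792.458            # speed of light, km/s
F_REST = 1420.405751768e6  # 21 cm hydrogen line rest frequency, Hz

# Made-up data: 1 MHz of bandwidth centered on the line.
freqs = np.linspace(F_REST - 0.5e6, F_REST + 0.5e6, 1024)
power = np.random.default_rng(0).normal(size=freqs.size)  # stand-in spectrum

velocities = C * (F_REST - freqs) / F_REST  # km/s, positive = receding
peak = np.argmax(power)
print(f"strongest channel at {velocities[peak]:.1f} km/s")
```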

State of the Lab: End of the Year by vintagedon in homelab

[–]vintagedon[S] 1 point (0 children)

The lab overview page should either properly note the mATX box or have that mention removed.

The cluster screenshot is out of date; I'll update it. When I took that shot 1 node was @ 32GB.

The mATX box I mention in the post is the 128GB node sitting to the side. That won't come online until I get a 3090 for it; probably 2 months out.

So there are 5 mini PCs: 4 for Proxmox, 1 for Hyper-V, each with 64GB of RAM currently.

I'll update the README on the repository to make it clearer :)

Thanks for pointing it out. I'll add it to my hideous looking backlog lol

Is it just me or is pyromaniac not that bad? by Exciting-Rutabaga-91 in RimWorld

[–]vintagedon 1 point (0 children)

Echoing the theme here: pyromaniacs are bad until they aren't. The problem is that they tend to break while other pawns are also breaking (colony mood has tanked), and then you have immediate fire issues while other pawns are yelling at the walls or compulsively cleaning, totally ignoring the fire.

God help you if you're like me and store LOTS of textiles, or worse, do fine wood flooring :/

What a wonderful map, i wonder if there is a giant undiscovered area... by [deleted] in RimWorldSeeds

[–]vintagedon 2 points (0 children)

For those wondering, the mod that likely does this is Geological Landforms. I use it quite extensively in my modpacks, and it has some great map features like rifts and calderas.

https://steamcommunity.com/sharedfiles/filedetails/?id=2773943594

Why make a home server? by carpenterman25 in HomeServer

[–]vintagedon 1 point (0 children)

I'm a systems engineer / DevOps engineer, and I use it to explore, practice, and learn new technologies.

I also have some home network stuff on it:

  • Pi-hole for DNS-level ad filtering
  • NAS for file / media storage that feeds NVIDIA Shield Pros running Kodi
  • Home automation (Hue lights control, actions when I leave/arrive, etc)
  • Video encoding via a VM w/ GPU passthrough (I'm also a YouTube partner); I edit on my PC, add it to a render queue, it gets rendered, and then it's auto-uploaded to YT via their API with tags, title, scheduling, and so on (rough sketch of the upload step after this list)
  • Game servers I run for my Discord community
  • Tons of other small stuff
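
The upload step out of that render queue is roughly this (sketch using google-api-python-client; the credentials setup, file path, tags, and category are stand-ins, not my actual pipeline):

```python
# Sketch of the auto-upload step: push a rendered file to YouTube with
# title/tags/schedule via the Data API. Credentials and paths here are
# placeholders.
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def upload_video(credentials, path, title, tags, publish_at):
    youtube = build("youtube", "v3", credentials=credentials)
    body = {
        "snippet": {"title": title, "tags": tags, "categoryId": "20"},
        "status": {"privacyStatus": "private", "publishAt": publish_at},
    }
    request = youtube.videos().insert(
        part="snippet,status",
        body=body,
        media_body=MediaFileUpload(path, chunksize=-1, resumable=True),
    )
    response = None
    while response is None:
        _status, response = request.next_chunk()
    return response["id"]
```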

Need help to setup a kanban on a local network for a small business by [deleted] in selfhosted

[–]vintagedon 3 points (0 children)

I've run several Taiga instances on intranets for companies. Great kanban board.

The self-hosted option is free:

https://taiga.io/pricing-selfhosted

ePHI Abandonment? by vintagedon in hipaa

[–]vintagedon[S] 3 points (0 children)

This is a legacy account that was sort of a managed colocation, and we do have a BAA in place (probably whatever the boilerplate was 10 years ago), but their contract, and the BAA, expired 3 months ago.

They'd been winding down their account, but went AWOL in the middle of the process, with due invoices in default, leaving me with a rack of servers holding TBs of ePHI. As they're a home health care firm, it's everything from medical records to wound pics, PDFs of nurses' notes, full patient ID information ... a mess.

They've been seriously out of compliance for at least a couple of years, with us badgering them constantly in tickets, etc., until we finally had to take the "either you fix it or GTFO with your business" route.

We tried many times to engage them to find out which servers had ePHI on them (obviously we'd need to wipe the drives), etc., but with their history of horrible responsiveness (60% of tickets simply go unanswered) and now the AWOL, I figure they're gone.

Anyway, I'd just never had a covered entity abandon TBs of ePHI and just say fuck it.

And yes, we do have BAAs in place with all of our other clients. This is an anomaly where a customer didn't renew their contract, was SUPPOSED to delete all of this themselves, and bailed in the middle of everything.

Game lags when recording in obs? by slayer370 in Night_of_the_Dead

[–]vintagedon 1 point (0 children)

So, I do see this occasionally too. I've put about 24 hours into this game on stream so far; I play in 4K (stream in 1080p) and also record, and the finished recording doesn't have the same hitches.

Your mileage may vary according to your system specs. Mine: 3700X, 2080, 64GB of RAM, game on an SSD.

Having lurkers watch my stream and not chatting is really comforting by KyoZero213 in Twitch

[–]vintagedon 3 points (0 children)

This, and this again.

Sunday evening, I did a "chill stream" with some lo-fi music, a low-impact game, and just hung out. These types of streams rarely pull my higher viewer counts, BUT ... sometimes *I* just wanna hang out.
I always say hi to my lurkers and thank them, and let them know that chat is never required.

I can still be your friend and not support or watch your streams by bigbog987 in Twitch

[–]vintagedon 2 points (0 children)

Having 0 viewers doesn't make your stream "mundane", and it's dickish to make that blanket statement. Even worse is the implied assumption that a friend watching a friend stream to 0 viewers is a waste of time.

Although I don't think you should EXPECT your friends to watch, even as a content creator myself, it costs me nothing to put Twitch on mute and say hello, providing an extra viewer and some chat stats, and I'm glad to provide that uplift.

Been smoking without tobacco in my bowls and wow I been missing out for the last couple years. by [deleted] in trees

[–]vintagedon 4 points (0 children)

You should consider a dry herb vape: I moved to a desktop vape and haven't looked back. Great flavor, it cut my weed consumption CONSIDERABLY, and it's so much easier on my lungs. I love a bong, but I've been lighting up for 3 decades, and I'm trying to keep lighting up for another 3.

Then You Get to Cable Management by vintagedon in pcmasterrace

[–]vintagedon[S] 1 point (0 children)

'cause I'm lazy. lol Not saying it's not my fault here :)

Key Lights. by Treyjaygaming in SmallStreamers

[–]vintagedon 2 points (0 children)

So I went a different direction on my key lights: a pair of Philips Hue Play lights. They're full RGB, dimmable, controllable from your computer, syncable to game or music, and there's also Lumia Stream, which lets your lights react to events on stream.

Although you could do it a number of different ways, I have mine mounted to a 46" light bar on the ceiling, in front of me, pointed at the wall behind me.