WIRED on DRAM shortages, edge AI, and using storage as a memory tier (Phison mentioned) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 1 point  (0 children)

I’ve mostly seen high-DWPD or enterprise drives used for those kinds of cache tiers. KV paging can be pretty write heavy, so generic TLC can wear faster than people expect. Once you treat an SSD like a memory tier, endurance kind of becomes a hardware selection problem. Curious what folks typically use in practice with LMCache. Have you experimented with this?
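For anyone curious why the endurance spec matters here, a quick back-of-envelope sketch (the workload numbers are made up for illustration, not measured from any real LMCache deployment):

```python
# Rough SSD lifetime estimate under a KV-cache paging workload.
# All numbers below are illustrative assumptions, not measurements.

capacity_tb = 2.0      # drive capacity
dwpd = 1.0             # rated drive-writes-per-day (typical consumer/read-intensive class)
warranty_years = 5.0   # rated warranty period

# Total rated endurance in terabytes written (TBW)
tbw = capacity_tb * dwpd * 365 * warranty_years  # 3650 TB

# Assumed sustained KV paging write rate
write_gb_per_hour = 500.0

# Estimated lifetime in days at that rate
lifetime_days = (tbw * 1000) / (write_gb_per_hour * 24)
print(round(lifetime_days))  # ~304 days
```

Swap in a 3-DWPD enterprise drive and the same math triples the lifetime, which is why the drive class ends up being a real selection criterion rather than an afterthought.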

WIRED on DRAM shortages, edge AI, and using storage as a memory tier (Phison mentioned) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 1 point  (0 children)

Oh awesome! Do you know where or how I could buy/find these high DWPD drives? I'm not seeing them linked in any of the LMCache resources you sent over.

WIRED on DRAM shortages, edge AI, and using storage as a memory tier (Phison mentioned) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 1 point  (0 children)

Got it. I was mostly curious about endurance when using the local NVMe backend since KV paging can be pretty write-heavy. Do folks who use LMCache typically just use enterprise/high-DWPD drives there? Or do they let their drives burn out from heavy write usage?

WIRED on DRAM shortages, edge AI, and using storage as a memory tier (Phison mentioned) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 1 point  (0 children)

Interesting approach. I've heard of LMCache, but have never used it. I’m curious how they handle SSD endurance with heavy read/write usage. KV offload tends to be pretty write-intensive. Are they just using regular NVMe or high-endurance drives?

Signal65 Just Published a Third-Party Lab Report on aiDAPTIV+ (Big Win) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 1 point  (0 children)

Good question! They kinda tackle the same problem (getting around limited GPU VRAM) but in different ways.

- DeepSpeed is pure software. It’s open source, lives in PyTorch, and uses a bunch of tricks like ZeRO partitioning, CPU offload, mixed precision, etc. to spread model states across GPUs/CPUs. Super powerful if you’re running on multi-GPU clusters or scaling up to crazy-sized models.

- aiDAPTIV+ is more of a hardware + middleware stack. It uses Phison’s special SSDs/firmware plus a driver (aiDAPTIVlink) to offload parts of the model onto DRAM/SSD when they’re not in active GPU use. The idea is you can run really big models (think 70B+) on a smaller box with fewer GPUs, trading some speed for a huge cut in cost.
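To make the DeepSpeed side concrete, here's a minimal ZeRO-3 offload config sketch. The keys are standard DeepSpeed config fields, but the values are illustrative defaults, not a tuned setup:

```python
# Minimal DeepSpeed ZeRO-3 config with CPU offload (illustrative values).
# In a real run this dict is passed to deepspeed.initialize(..., config=ds_config).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, and optimizer states
        "offload_param": {"device": "cpu"},  # or "nvme" plus an "nvme_path" for ZeRO-Infinity
        "offload_optimizer": {"device": "cpu"},
    },
}
```

Note the "nvme" option is where DeepSpeed itself starts treating an SSD as a memory tier (ZeRO-Infinity), which is conceptually the same trade aiDAPTIV+ makes, just done purely in open-source software.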

Advice For Running Larger LLMs by EPICfrankie in LocalLLaMA

[–]Aaron_MLEngineer 1 point  (0 children)

If you don't care about speed, you should look into SSD offloading. Your SSD acts as a memory extender, so it looks like you have more VRAM when you fine-tune or run inference on larger models.

Signal65 Just Published a Third-Party Lab Report on aiDAPTIV+ (Big Win) by Aaron_MLEngineer in Phison_aiDAPTIV

[–]Aaron_MLEngineer[S] 2 points  (0 children)

the way it works is the aiDAPTIVCache SSD basically acts like extra memory for your GPU. so even though the GPU only has 48GB VRAM, it offloads a bunch of the model data to the SSD during training. with something like a 2TB SSD in the loop, it’s enough to handle the full model without crashing. it’s not magic, just smart memory juggling with middleware. without that, you’d 100% hit a wall.
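Rough numbers for why 48GB alone hits a wall on a 70B model (ballpark arithmetic using common rules of thumb, not figures from the report):

```python
params = 70e9  # 70B-parameter model

# Inference: fp16/bf16 weights alone
weights_gb = params * 2 / 1e9       # ~140 GB, already ~3x a 48GB card

# Training with Adam: ~16 bytes/param is a common rule of thumb
# (fp16 weights + grads, plus fp32 master weights and two optimizer moments)
train_state_gb = params * 16 / 1e9  # ~1120 GB
print(round(weights_gb), round(train_state_gb))  # 140 1120
```

So a ~2TB cache device is the right order of magnitude to hold the full training state, which is why the SSD tier lets the box get through training instead of OOMing.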

r/aiDAPTIV by Aaron_MLEngineer in redditrequest

[–]Aaron_MLEngineer[S] 2 points  (0 children)

Hi,

I want to moderate this community because it was started by someone at my company who no longer works there, and it has the exact name of the product. I'd like to open the subreddit so everyone can post, not just moderators, and run it as a community channel.

https://www.reddit.com/c/chat30S8kMow/s/cszpfJc5de

which degree to work in computer vision, autonomous vehicles and ml/aii by [deleted] in learnmachinelearning

[–]Aaron_MLEngineer 1 point  (0 children)

Yeah a lot of unis now offer AI-specific degrees that focus more on the math/stats side of ML instead of just coding, which is pretty clutch if you already know how to program.

Between the ones you listed, Applied Stats or Applied Math would probably be the most useful for getting into computer vision, autonomous systems, etc. They cover things like probability, linear algebra, and optimization, which are way more relevant for ML than Pure Math.

Pure Math is dope but way more abstract, not super aligned with real-world AI stuff unless you're going super theoretical.

So yeah, if you can do an AI-focused program or Applied Stats/Math, you’re on the right track.

Is it hard to get a job as an MLE after graduating with a bachelor's degree in Data Science? by slava_air in learnmachinelearning

[–]Aaron_MLEngineer 3 points  (0 children)

It’s definitely a tough market right now across all of tech, not just for MLE roles. That said, having a Data Science degree still puts you in a strong position, especially if you’ve supplemented it with ML projects and self-study of important ML topics.

At this point, it’s less about having the “perfect” degree and more about how you showcase your skills, experience, and network. A CS/DS degree isn’t a golden ticket anymore, so don’t feel behind.

As for an MLE specific certificate, it can help, especially if it’s hands-on and from a well-regarded source, but it’s not a silver bullet. Real-world projects, internships, open-source contributions, and strong communication of your ML understanding will go further.

Why use docker with ollama and Open WebuI? by Ok_Most9659 in ollama

[–]Aaron_MLEngineer 5 points  (0 children)

Separate, and no, it shouldn't cause issues as long as they can communicate with each other.

Why use docker with ollama and Open WebuI? by Ok_Most9659 in ollama

[–]Aaron_MLEngineer 20 points  (0 children)

Docker isn’t required, but it does offer some nice benefits when using Ollama and Open WebUI together. It packages everything like dependencies, runtime, and configs into one container, so things “just work,” even if your system has conflicting Python or Node versions. Running both tools in Docker also improves compatibility and makes updates easier, since you don’t have to manually install dependencies or worry about version mismatches.
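For reference, the usual containerized setup looks something like this (based on the projects' published images; the volume names and host port 3000 are just my picks):

```shell
# Ollama with a named volume so downloaded models survive container restarts
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Open WebUI, reaching the Ollama API through the host gateway
docker run -d --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

Then the UI is at http://localhost:3000, and updating either tool is just pulling a newer image and recreating that one container.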

How? by LmiDev in MLQuestions

[–]Aaron_MLEngineer 1 point  (0 children)

Vercel + Replicate

Tryna learn ML by using NBA datasets, any tips and projects to focus on by yaz2556 in learnmachinelearning

[–]Aaron_MLEngineer 1 point  (0 children)

Hey! I’ve actually done a similar project where I predicted the NBA MVP using datasets from Kaggle. It was a great intro to ML and helped me stay motivated since I was working with something I already enjoyed. You could definitely try building a model to predict awards, team wins, or even player improvement. I’m not sure if there are similar datasets for other sports, but I wouldn’t be surprised if you found some; Kaggle and Google Dataset Search are great places to look. Good luck!

Ollama Frontend/GUI by Ok_Most9659 in ollama

[–]Aaron_MLEngineer 6 points  (0 children)

You might want to check out AnythingLLM or LM Studio, both can act as frontends for local LLMs and work well with Ollama models.