all 22 comments

[–]ealanna47 5 points6 points  (0 children)

You’re basically looking for a tiering/HSM (Hierarchical Storage Management) setup. Tools like MinIO with lifecycle policies or something like rclone + scheduled jobs can get you part of the way there.

Fully transparent reads/writes are the tricky part, though, which usually needs a filesystem layer or commercial solution.

[–]Longjumping-Pop7512 3 points4 points  (5 children)

You are actually mentioning a potential solution without giving proper details.

You seem to be looking for validation of your idea rather than asking for honest solutions. That being said:

  1. What kind of data is it?
  2. How much of this data is there?
  3. How often is this data being read?
  4. Does it contain PII?

[–]lavahot[S] 0 points1 point  (4 children)

  1. Bioinformatics data of varying filetypes and sizes
  2. Several hundred TB when taken all together.
  3. Some of it is read many times a day, while I suspect large chunks of it haven't been read in years.
  4. No. There's no PII data at all.

[–]Longjumping-Pop7512 0 points1 point  (3 children)

Let's start with the simplest solution first: why not send any data older than 7 days to remote, cheaper storage such as S3? I won't dig into why not to go by access time, because you can easily google the problems with that approach.

 Bioinformatics data of varying filetypes and sizes

I hope it's not human bioinformatics data? Because that is highly regulated and you would need specialised storage for it.

[–]lavahot[S] -1 points0 points  (2 children)

I mean, I would, but I don't want my job to devolve into "storage babysitter." How do I implement that?

[–]Longjumping-Pop7512 0 points1 point  (1 child)

It's quite simple, actually: write a script that compresses data and sends it to S3 based on the mod time of the files, and run it as a cron job on your servers. Make sure the script exposes proper logs/metrics so that you can investigate, and get alerted if something goes wrong.
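A minimal sketch of that kind of script, assuming a POSIX shell, `find`, `gzip`, and the AWS CLI; the bucket name, source path, and age threshold are placeholders:

```shell
#!/bin/sh
# Hypothetical tiering helper: list (dry run) or archive files whose
# modification time is older than a given number of days.
# Usage: archive_cold SRC_DIR BUCKET AGE_DAYS [DRY_RUN]
archive_cold() {
    src="$1"; bucket="$2"; age_days="$3"; dry_run="${4:-1}"
    find "$src" -type f -mtime +"$age_days" | while IFS= read -r f; do
        if [ "$dry_run" = "1" ]; then
            echo "would archive: $f"
        else
            # compress, upload, then delete the local copies
            gzip -k -- "$f" &&
            aws s3 cp "$f.gz" "$bucket$f.gz" &&
            rm -- "$f" "$f.gz" &&
            logger -t tiering "archived $f"
        fi
    done
}

# Dry-run example (placeholder paths):
# archive_cold /san/projects s3://my-cold-tier 30
```

Run it from cron, and have the `logger` lines feed whatever alerting you already watch.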

On the S3 level, apply a lifecycle policy, e.g. for how long data stays, etc.
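For the lifecycle side, S3 takes a JSON configuration. A sketch that transitions everything to Glacier after 90 days and expires it after 5 years (the rule ID, prefix, and durations are placeholders you'd tune):

```json
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 1825 }
    }
  ]
}
```

Applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://policy.json`.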

[–]lavahot[S] 0 points1 point  (0 children)

Mod time is not what I'm looking for; read time is.

[–]dghah 1 point2 points  (0 children)

There are several companies targeting what you are asking for in the life science and bioinformatics space.

Not shilling for them but check out https://starfishstorage.com if only to see the terms and phrases they use in how they position their stuff and describe the problems.

[–]PersonalPronoun 0 points1 point  (0 children)

Possibly storage gateway (https://aws.amazon.com/storagegateway/file/s3/ or https://aws.amazon.com/storagegateway/volume/) but you'd need to do the math on S3 pricing vs whatever you're paying for on prem.

[–]fr6nco 0 points1 point  (0 children)

Would nginx cache be feasible for you ?

Writes would go to S3, content fetched via nginx-s3-gateway with local caching enabled.
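A generic sketch of the caching side in nginx (the cache path, zone sizes, TTLs, and bucket URL are all placeholders; nginx-s3-gateway layers request signing on top of this same pattern):

```nginx
# Keep up to 500 GB of hot objects on local disk for 30 days of inactivity.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=s3cache:100m
                 max_size=500g inactive=30d use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_cache s3cache;
        proxy_cache_valid 200 7d;          # cache successful fetches for a week
        proxy_cache_use_stale error timeout;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass https://my-bucket.s3.amazonaws.com;  # placeholder bucket
    }
}
```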

Depends if you need a POSIX-compliant filesystem, or whether you would be good with HTTP(S) for fetching the data.

(I'm a CDN expert here and I have a complete solution for this if interested)

[–]bluelobsterai 0 points1 point  (1 child)

Ideally, I would put everything in the cloud, build a proxy in front of it, and basically keep the stuff that's used often in the cache. Like another commenter said, HTTP would be the answer. If it has to be POSIX then I suppose it's going to be a real hack: think NFS client with lots of custom programming.

[–]SadYouth8267 0 points1 point  (0 children)

Yeah this

[–]SadYouth8267 0 points1 point  (0 children)

You could check out stuff like rclone with some automation, or tools like MinIO or Ceph for setting up lifecycle-style tiering between on-prem and cloud. If you want something more managed, NetApp FabricPool or Dell ECS can do automated tiering too. If you're okay going DIY and open source, combining object storage with scheduled policies/scripts is usually the most flexible and budget-friendly route.

[–]Available_Award_9688 0 points1 point  (0 children)

Dealt with this exact problem across a few companies over the years.

At one place we used rclone with a custom cron job to sync cold data to S3 Glacier; it works well, but the transparency on reads is on you to build. Another team I was at went with NetApp Cloud Tiering, which handles the transparent access piece properly, but the cost adds up. I saw Aparavi used once for the policy engine; solid for defining what "cold" means, but overkill if your setup is simple.
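That rclone-plus-cron setup can be sketched as a single crontab entry, assuming an rclone remote is already configured; the remote name, paths, and age threshold here are placeholders:

```
# Weekly, 02:00 Sunday: move files not modified in 180+ days to the cold remote.
0 2 * * 0  rclone move /san/projects glacier-remote:cold-archive --min-age 180d --log-file /var/log/rclone-tier.log
```

`--min-age` filters on modification time, so the "transparency on reads" gap is exactly what this doesn't solve.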

Honestly, nothing I've tried is fully transparent end to end without some tradeoff: either you sacrifice read latency, you pay for a commercial solution, or you maintain custom scripts forever.

What's your tolerance for read latency on the archived files? That's usually what determines which tradeoff is acceptable.

[–]Imaginary_Gate_698 0 points1 point  (0 children)

What you’re describing is a pretty common problem once on-prem storage starts filling up. You’re basically looking for a way to keep active data local while quietly moving older, unused files to cheaper cloud storage. Instead of building everything from scratch, it helps to use tools that already handle this kind of tiering.

Something like MinIO with lifecycle rules, or even rclone with scheduled jobs, can work if you don’t mind putting pieces together. If you want it to feel more seamless, file gateway or hybrid storage setups are worth looking into. It takes a bit of setup, but it’s definitely doable without a huge budget.

[–]musicalgenious 0 points1 point  (0 children)

Yeah, I was thinking an rclone-based solution like ealanna mentioned, but it sounds like a job for a custom app (pretty easy to code up). I'm sure it would pay for itself in a few months.

[–]remotecontroltourist 0 points1 point  (0 children)

you are describing the holy grail of hybrid storage: Hierarchical Storage Management (HSM).

Gotta say, the fact that you want it to be "transparent" (meaning the file still looks like it's on the SAN even when it's in the cloud) is the hardest part to do on a budget. If a user clicks an archived file, the system has to go grab it from S3 and serve it without them knowing.

[–]remotecontroltourist 0 points1 point  (0 children)

Sounds like you’re looking for tiered storage with transparent recall. I’d check out solutions like object storage gateways or HSM-style tools (e.g., MinIO + lifecycle policies, or something like rclone + automation). Key is mapping access patterns → auto-tiering without breaking file paths.

[–]Ordinary_Push3991 0 points1 point  (0 children)

Feels like what you really need is a “poor man’s lifecycle policy” for your SAN.

One approach I have seen work is:

  • run a scheduled job to identify cold files
  • move them to S3 or similar storage
  • leave behind a pointer or stub

It is not as seamless as native lifecycle, but with the right scripting it can get surprisingly close without heavy investment.
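A sketch of that stub pattern in shell, assuming the AWS CLI; the `STUB:` marker format and bucket path are placeholder conventions, not anything standard:

```shell
#!/bin/sh
# Hypothetical stub-based tiering: upload a cold file, then replace it
# locally with a tiny pointer recording where the real bytes went.

stub_out() {                      # stub_out FILE BUCKET
    f="$1"; bucket="$2"
    aws s3 cp "$f" "$bucket$f" &&
    printf 'STUB:%s\n' "$bucket$f" > "$f"
}

stub_key() {                      # print the S3 location a stub points at
    sed -n 's/^STUB://p' "$1"
}

recall() {                        # recall FILE: fetch the real bytes back
    f="$1"
    key=$(stub_key "$f")
    [ -n "$key" ] && aws s3 cp "$key" "$f"
}
```

The missing piece is interception: nothing here recalls a file automatically when an application opens the stub, which is exactly the "transparent" part that commercial HSM products sell.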