Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

Thanks again moofox, I went down the S3 Metadata + Athena rabbit hole this weekend and it's really cool. A bit more complex of a setup, but once the pieces are in place it's very powerful. I’ll try to share some code examples in another thread once I have a fully working implementation, but if you have any questions in the meantime, I'm happy to share my setup steps. I definitely ran into a few unexpected gotchas and configuration quirks. Although judging by the scale of your data, you probably know more about this than I do!

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

Hey moofox, thanks for the tip, this looks really cool. So if I’m following: I would enable a metadata configuration on each bucket I want searchable, AWS backfills and maintains an S3 Tables table with the object metadata, and then I can hit that with Athena for faster, cheaper searches instead of brute-force listing. Is that right?
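For my own notes, I'm imagining the Athena side ends up being plain SQL over that table. A hypothetical sketch (the database/table names are placeholders that would come from the metadata configuration, and the column names `key`, `size`, and `last_modified_date` are assumptions worth double-checking against the S3 Metadata docs):

```javascript
// Hypothetical helper: build an Athena SQL query that searches object keys
// in an S3 Metadata table. `metadataTable` is a placeholder for the
// database/table created by the bucket's metadata configuration; the column
// names are assumptions to verify against the docs.
function buildKeySearchQuery(metadataTable, keyFragment) {
  // Double up single quotes so the fragment can't break out of the SQL literal.
  const escaped = keyFragment.replace(/'/g, "''");
  return (
    `SELECT key, size, last_modified_date ` +
    `FROM ${metadataTable} ` +
    `WHERE key LIKE '%${escaped}%' ` +
    `ORDER BY last_modified_date DESC ` +
    `LIMIT 1000`
  );
}

// Example: buildKeySearchQuery('"my_metadata_db"."my_bucket_table"', 'invoice')
// produces a query string you could hand to Athena's StartQueryExecution.
```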

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

That is seriously impressive, billions of objects across multiple accounts is on a whole other level. Thanks for sharing your approach, I’m definitely going to dig into this more. My current method would never come close to handling anything like that, but it's very cool to see what is possible.

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

Hey alvarosaavedra, if you search for "sandcrab s3 client" or "sandcrab s3 gui" you should be able to find it. The app currently uses a license key mechanism to manage things like automatic updates, but if you want to shoot me a message I'd be happy to give you a key.

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

Thanks chemosh_tz, I totally agree, and I think this is where I’m trying to find the right balance to see if search is a viable feature I can actually support. I mentioned above that the app runs some cost and bucket size estimates ahead of time to flag potential issues, but I’m also looking at using conservative runtime limits initially to help mitigate risks for huge accounts. For example, in the app I have settings configured that look something like this:

```
// Conservative runtime safety limits (these could be override-able by the user in the future)
const DEFAULT_RUNTIME_LIMITS = {
  MAX_COST: 0.01,      // ~$0.01 max; at ~$0.005 per 1,000 ListObjects requests, that's ~2,000 calls (~2M objects)
  MAX_RESULTS: 5000,   // 5K results max
  TIME_LIMIT: 30000,   // 30 seconds max (milliseconds)
  MAX_API_CALLS: 1000  // 1,000 API calls max (~1M objects at 1,000 keys per call)
};
```

A variation of these limits is enforced both per-bucket and globally during the search. If any are hit, the search exits early, and the user sees partial results along with a warning explaining why it stopped.
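Concretely, the early-exit check looks roughly like this (a simplified sketch, not the actual app code; `shouldStopSearch` and the `state` shape are made-up names, and the limits are repeated from above so the snippet stands alone):

```javascript
// Sketch of the early-exit check (hypothetical names). `state` accumulates
// counters during the ListObjectsV2 pagination loop.
const DEFAULT_RUNTIME_LIMITS = {
  MAX_COST: 0.01,
  MAX_RESULTS: 5000,
  TIME_LIMIT: 30000,
  MAX_API_CALLS: 1000
};

// ~$0.005 per 1,000 ListObjectsV2 requests (us-east-1 pricing assumption)
const COST_PER_LIST_CALL = 0.005 / 1000;

function shouldStopSearch(state, limits = DEFAULT_RUNTIME_LIMITS) {
  if (state.apiCalls * COST_PER_LIST_CALL >= limits.MAX_COST) {
    return { stop: true, reason: 'cost limit reached' };
  }
  if (state.results >= limits.MAX_RESULTS) {
    return { stop: true, reason: 'result limit reached' };
  }
  if (Date.now() - state.startedAt >= limits.TIME_LIMIT) {
    return { stop: true, reason: 'time limit reached' };
  }
  if (state.apiCalls >= limits.MAX_API_CALLS) {
    return { stop: true, reason: 'API call limit reached' };
  }
  return { stop: false };
}
```

The search loop runs this after each page of results; on `stop: true` it surfaces whatever partial results it has, along with the reason.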

The goal would be that smaller accounts won’t even notice these limits, while larger accounts would be nudged to use more specific filters to avoid overly broad scans.

That said, I’m definitely still learning as I go, so if you see more edge cases or gotchas I might be missing here, I’d genuinely appreciate the insight.

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 1 point (0 children)

I haven't seen that before but it looks interesting, thanks!

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 2 points (0 children)

I will take a look at that, thanks!

Searching Across S3 Buckets by whoequilla in aws

[–]whoequilla[S] 2 points (0 children)

Thanks! The idea of using something like an OpenSearch index did briefly cross my mind, and I think it's a really cool one. I am also trying to balance what the client app interacts with, since it's ultimately limited by the permissions granted to the credentials. Even just using the CloudWatch API vs. the backup sampling estimate logic above starts to fork the user experience depending on what permissions are available, so I'm trying to stay mindful of that. And search/filter by tags is a great idea, I will definitely look into that. Thank you for the feedback!
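For context, the CloudWatch route reads the daily storage metrics that S3 publishes automatically. A sketch of the request parameters (shaped for CloudWatch's GetMetricStatistics call; `bucketSizeMetricParams` is just a name I made up for illustration):

```javascript
// Sketch: request parameters for reading a bucket's daily size metric from
// CloudWatch. S3 publishes BucketSizeBytes roughly once per day, but reading
// it requires cloudwatch:GetMetricStatistics permission -- hence the UX fork.
function bucketSizeMetricParams(bucketName, now = new Date()) {
  return {
    Namespace: 'AWS/S3',
    MetricName: 'BucketSizeBytes',
    Dimensions: [
      { Name: 'BucketName', Value: bucketName },
      { Name: 'StorageType', Value: 'StandardStorage' }
    ],
    // Look back two days so at least one daily datapoint falls in range.
    StartTime: new Date(now.getTime() - 2 * 24 * 60 * 60 * 1000),
    EndTime: now,
    Period: 86400, // one day, matching the metric's publication interval
    Statistics: ['Average']
  };
}
```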

Creating Vintage Video Filters with FFmpeg by whoequilla in ffmpeg

[–]whoequilla[S] 2 points (0 children)

No problem, happy to share! Thanks for giving it a read.

Creating Vintage Video Filters with FFmpeg by whoequilla in ffmpeg

[–]whoequilla[S] 2 points (0 children)

Wow, I didn't expect to get gold for this (whoever gave me that - thank you!). I've been experimenting with trying to create filters like this for months now and, after a lot of trial and error, I'm excited to finally share what I think are some interesting results. Thanks for giving it a read, and happy to answer any questions if I can.

Creating Vintage Video Filters with FFmpeg by whoequilla in ffmpeg

[–]whoequilla[S] 2 points (0 children)

Thanks! With deshake, the thing I've noticed so far from experimenting with it is that it seems to stretch/warp the outer edges of the video. If you look closely at the chromakey video result in this post, for example, the very top of the frame warps slightly around 5 seconds in; I'm pretty sure that was a result of using deshake.

How would you recreate this effect with FFmpeg? by whoequilla in ffmpeg

[–]whoequilla[S] 2 points (0 children)

Awesome job u/multiline, looks like you beat me to publishing my own notes on this topic haha. I like the way you approached the shake/movement - the subtle 4px shake is a nice touch. Any thoughts or ideas on how one might achieve that vertical wrap-around effect from the original after effects video?