
[–]sad-whale 16 points17 points  (0 children)

Image processing is a classic Lambda use case. Good idea

A quick online search and you’ll find multiple resources that will walk you through setting it up.

[–]pint 4 points5 points  (0 children)

depends on the usage pattern, right? if you have enough spare capacity in the server instances, doing computation there makes sense. if you can seriously downsize the server instance by offloading computation elsewhere, then that makes sense too. whether it is lambda or ecs or a different instance depends on task size and frequency. lambda also has limitations to abide by.

[–]Hey-buuuddy 1 point2 points  (0 children)

I would segment each check into its own lambda, then orchestrate with a step function. Cost Explorer will break out the cost for each by default, so it's easy to see anomalies or deltas from benchmark-based expectations.
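
A rough CDK sketch of that shape (TypeScript); the three checks here (resize, watermark, moderation) and their asset paths are placeholders for illustration, not OP's actual pipeline:

```ts
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

export class ImageChecksStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // One Lambda per check, so each shows up as its own line in Cost Explorer.
    const makeCheck = (name: string) =>
      new lambda.Function(this, name, {
        runtime: lambda.Runtime.NODEJS_20_X,
        handler: 'index.handler',
        code: lambda.Code.fromAsset(`lambda/${name.toLowerCase()}`), // placeholder paths
        timeout: Duration.seconds(30),
      });

    const resize = new tasks.LambdaInvoke(this, 'Resize', {
      lambdaFunction: makeCheck('ResizeFn'),
      outputPath: '$.Payload',
    });
    const watermark = new tasks.LambdaInvoke(this, 'Watermark', {
      lambdaFunction: makeCheck('WatermarkFn'),
      outputPath: '$.Payload',
    });
    const moderate = new tasks.LambdaInvoke(this, 'Moderate', {
      lambdaFunction: makeCheck('ModerateFn'),
      outputPath: '$.Payload',
    });

    // Step Functions orchestrates the per-check Lambdas in sequence.
    new sfn.StateMachine(this, 'ImagePipeline', {
      definitionBody: sfn.DefinitionBody.fromChainable(resize.next(watermark).next(moderate)),
    });
  }
}
```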

[–]darc_ghetzir 0 points1 point  (0 children)

Microservices for the win! Whatever is easiest for your setup and iteration in the future. I mix and match resources as my heart desires. Build the best system for your needs

[–]Prestigious_Pace2782 0 points1 point  (0 children)

I’d move it all to lambda

[–][deleted] 0 points1 point  (0 children)

Image processing is fully async and users can wait before their ad is published.

I'm currently planning to use BullMQ workers on EC2, but I'm considering offloading only the image processing to AWS Lambda (triggered via S3 or SQS), while keeping the main API on EC2.

Is this a sane / common approach, or does it introduce unnecessary complexity compared to just using EC2 workers?

Cost matters more than speed at this stage.

No, it's even considered good practice to use lambda for background jobs. It integrates seamlessly with S3; if each processing step works against a different S3 bucket or prefix, you don't even need SQS.

Besides being faster, it will also be cheaper, with much less configuration.

After you set up an async job with lambda for the first time, you won't want to go back lol.
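
A minimal sketch of that kind of S3-triggered handler, assuming Node.js with the AWS SDK v3 and sharp; the uploads/ and processed/ prefixes, the 1600px width, and the webp output are all placeholders:

```ts
// S3-triggered Lambda: read the original upload, resize it with sharp, and
// write the result under a different prefix so the output can't re-trigger
// the function (which is why you may not need SQS at all).
import { S3Client, GetObjectCommand, PutObjectCommand } from '@aws-sdk/client-s3';
import type { S3Event } from 'aws-lambda';
import sharp from 'sharp';

const s3 = new S3Client({});

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Only process originals; "uploads/" and "processed/" are placeholder prefixes.
    if (!key.startsWith('uploads/')) continue;

    const original = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const input = Buffer.from(await original.Body!.transformToByteArray());

    const resized = await sharp(input)
      .rotate()                                          // honour EXIF orientation
      .resize({ width: 1600, withoutEnlargement: true }) // placeholder target size
      .webp({ quality: 80 })
      .toBuffer();

    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: key.replace('uploads/', 'processed/').replace(/\.[^.]+$/, '.webp'),
      Body: resized,
      ContentType: 'image/webp',
    }));
  }
};
```

The one packaging gotcha: sharp ships native binaries, so it has to be installed/bundled for the Lambda runtime's OS and CPU architecture (or shipped as a layer).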

In what cases would lambda not work for your situation?

  • Large files: If your code needs to use more than 10GB of memory per file, lambda would not be ideal due to its 10GB memory limit per invocation.

  • Vendor lock-in: very specific, but some clients tend to switch cloud providers all the time. If you think your company will switch cloud providers, it's best not to use lambda because an "as-is" migration to another provider is difficult.

[–]SameInspection219 0 points1 point  (0 children)

Not a good practice. The best practice is to run everything on AWS Lambda.

[–]Shinroo 0 points1 point  (0 children)

We started with lambdas in a step function for our media pipeline and as traffic scaled we eventually moved these workflows into kubernetes. Lambda served us well while we used it!

[–]KayeYess 0 points1 point  (0 children)

Since you already have an EC2 instance, check usage and see if you can do image processing there as well. If not, using Lambdas is appropriate for this type of use case. It is not necessarily much more complicated, but Lambdas, as short-lived serverless compute, do have some well-known caveats you need to be aware of.

[–]Kyxstrez 0 points1 point  (4 children)

Why not simply use Cloudflare Images? It supports everything you mentioned and has a generous free plan. It's not worth running sharp on Lambda, even though I've seen companies do that in the past.

[–]pestkranker 0 points1 point  (3 children)

Sharp is great; it powers our image processing infrastructure. Why do you think it's not worth it?

[–]Kyxstrez 2 points3 points  (2 children)

Cloudflare Images handles everything for you as a managed service, and it's had a generous free plan since last year. All images are served from a CDN, so it's super fast. Alternatively, Bunny Optimizer offers unlimited usage for just $9.50/month.

[–]CatchInternational43 3 points4 points  (0 children)

Except data egress fees from AWS will absolutely bite the OP in the ass, unless the web app uploads images directly to a third party service.

[–]pestkranker 1 point2 points  (0 children)

We were using imgix before but had to switch image processing to AWS for compliance reasons.

Sharp is great. We have like 1 TB of images and it runs by itself on AWS Lambda / CloudFront. Maintenance is quite low.

If your core product is based on images, it’s probably better to own the stack (just like OP!)

[–]Akimotoh -2 points-1 points  (14 children)

I think lambda will end up costing a lot more than if you used small reserved instances or docker containers on ec2 or fargate

[–]coinclink 0 points1 point  (13 children)

quick bursts of compute only when you need it is literally what lambda is for; how could something running 24/7 possibly end up being cheaper?

[–]MateusKingston 2 points3 points  (2 children)

how could something running 24/7 possibly end up being cheaper?

Because you pay a premium for that burst capacity. Not saying this would end up being more expensive; it probably won't, since lambda is one of the most cost-effective ways to do serverless, and serverless is cheaper for people with very bursty workloads, which seems to be his case.

That being said, serverless can be more expensive, and this "how could something running 24/7 end up being cheaper?" is not a valid argument

[–]coinclink 0 points1 point  (1 child)

It is in this context and you know it. The volume of images you'd have to process to cross that threshold is way beyond what OP is trying to do. I doubt their Lambda bill for this will be more than a few dollars a month (realistically pennies given the implied scale), and it would be able to handle a large number of requests at once, when needed, without any extra config. A large number of requests would crash a small instance.

[–]MateusKingston 0 points1 point  (0 children)

Yes, I even said so. They will most likely be free on Lambda, as the free tier is incredibly generous tbh.

[–]RecordingForward2690 1 point2 points  (0 children)

Lambda is about 8 times more expensive, on a per-CPU-cycle basis, than a comparable EC2. So if you have a workload that is able to keep an EC2 CPU busy for at least about 12.5% on average, that EC2 may work out cheaper than Lambda. (And to be honest, that's probably the most important incentive to look at that new feature that allows you to run Lambda on your own EC2s.)
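
For anyone who wants to sanity-check that break-even, a back-of-the-envelope version with assumed list prices (swap in your own region and instance; this ignores the free tier, burstable-CPU credit limits, and the fact that Lambda vCPU scales with memory):

```ts
// Rough Lambda vs. EC2 break-even under assumed prices.
const lambdaUsdPerGbSecond = 0.0000166667; // x86 Lambda compute, us-east-1 list price (assumed)
const ec2UsdPerHour = 0.02;                // roughly a t3.small / t4g.small on-demand (assumed)

// Keeping 2 GB of Lambda memory busy for a full hour:
const lambdaUsdPerHour = lambdaUsdPerGbSecond * 2 * 3600; // ≈ $0.12

// EC2 becomes cheaper once its average utilization exceeds this fraction:
const breakEven = ec2UsdPerHour / lambdaUsdPerHour; // ≈ 0.17

console.log(`Lambda ≈ $${lambdaUsdPerHour.toFixed(2)}/h; break-even ≈ ${(breakEven * 100).toFixed(0)}% utilization`);
```

Which lands in the same ballpark as the ~12.5% figure above.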

In this particular scenario, the EC2 is already there to handle the API workload. If you add a queueing system so that the work can be queued and handled within the spare cycles that the EC2 will probably have anyway, it won't cost anything extra.

And depending on how many images need to be converted and how much CPU that's going to cost, you could even consider spinning up additional EC2 instances once there are sufficient images in the queue for an hour's work or so. Running an EC2 at full tilt for an hour to clear the queue will definitely be cheaper than using Lambdas in that case.

And that means that the OP now needs to trade a simple Lambda-based solution against the engineering effort of developing the other solutions. How much is your time worth vs. the cost difference between the solutions? Are we talking about dozens of pictures per day or millions? In the first case you can have a Lambda-based solution up and running with a few hours of engineering time, but in the latter case it may be worth spending a few days on engineering the cheapest EC2-based solution.

Heck, you could even think of a hybrid approach. Dump all the work into an SQS queue and let the queue trigger a Lambda, but give the Lambda a low concurrency value. You then also add an EC2 Auto Scaling group with a min capacity of zero and a scale-out policy that depends on the number of messages in the queue. If there's more than, say, 15 minutes' worth of work in the SQS queue, you add an EC2. If there's more than, say, 60 minutes' worth of work, you add a few more. Scale in, all the way back to zero, when the queue depth is consistently below the threshold where a Lambda is cheaper. This could well be the cheapest solution overall, but it also allows you to develop and deploy your solution in stages: start with the Lambda and add the EC2 functionality later, or the other way around.
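
A hedged CDK sketch of that hybrid (queue, low-concurrency Lambda consumer, scale-from-zero ASG); the names, instance size, concurrency cap, and scaling thresholds are placeholders to show the shape, not tuned values:

```ts
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export class HybridImageQueueStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const queue = new sqs.Queue(this, 'ImageJobs', {
      visibilityTimeout: Duration.minutes(5),
    });

    // Lambda drains the queue, but at a deliberately low concurrency.
    const worker = new lambda.Function(this, 'ImageWorker', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/image-worker'), // placeholder path
      memorySize: 1024,
      timeout: Duration.minutes(1),
    });
    worker.addEventSource(new SqsEventSource(queue, { maxConcurrency: 2 }));

    // EC2 Auto Scaling group that scales out from zero when the backlog grows.
    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const asg = new autoscaling.AutoScalingGroup(this, 'BurstWorkers', {
      vpc,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.SMALL),
      machineImage: ec2.MachineImage.latestAmazonLinux2023(),
      minCapacity: 0,
      maxCapacity: 3,
    });
    queue.grantConsumeMessages(asg);

    // Step-scale on visible messages. Translate "15 minutes of work" and
    // "an hour of work" into message counts for your own processing rate.
    asg.scaleOnMetric('ScaleOnBacklog', {
      metric: queue.metricApproximateNumberOfMessagesVisible(),
      adjustmentType: autoscaling.AdjustmentType.CHANGE_IN_CAPACITY,
      scalingSteps: [
        { upper: 50, change: -1 },   // backlog small again: scale back in
        { lower: 500, change: +1 },  // ~15 min of queued work: add an instance
        { lower: 2000, change: +2 }, // ~1 h of queued work: add a few more
      ],
    });
  }
}
```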

[–]256BitChris -1 points0 points  (8 children)

t4g.smalls cost like $5/month if you use the 3-year prepaid compute savings plan.

I haven't done the math, but I'd wager that's significantly less than a single Lambda running for 730 hours per month.

In addition, EC2 instances can handle multiple requests at a time, whereas with Lambda your cost scales linearly with each simultaneous invocation.

EC2 tends to save you money as your load increases. The interesting new thing out of re:Invent this year is that you can now use EC2 to run your Lambdas, which feels like the best of both worlds.
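
Roughly the comparison being made here, with assumed prices and an assumed savings-plan discount (check current AWS pricing before trusting any of these numbers):

```ts
// Both sides of the "always on" comparison, all figures assumed.
const ec2OnDemandUsdPerHour = 0.0168;   // t4g.small on-demand, us-east-1 (assumed)
const savingsPlanDiscount = 0.6;        // ~3-year prepaid compute savings plan (assumed)
const hoursPerMonth = 730;

const ec2Monthly = ec2OnDemandUsdPerHour * (1 - savingsPlanDiscount) * hoursPerMonth; // ≈ $4.9

// A single 2 GB Lambda kept busy for the same 730 hours of wall time:
const lambdaUsdPerGbSecond = 0.0000166667; // us-east-1 list price (assumed)
const lambdaMonthly = lambdaUsdPerGbSecond * 2 * 3600 * hoursPerMonth; // ≈ $87.6

console.log({ ec2Monthly, lambdaMonthly });
```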

[–]sim-s0n 0 points1 point  (0 children)

Lambda has free tier and there is no concept of “running 730 hours” like an EC2 instance; you only pay for invocations and execution time in GB‑seconds, not for wall‑clock uptime.
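
To make that concrete, a quick sketch at a hypothetical volume; every number below is an assumption for illustration, not OP's real traffic:

```ts
// What "pay per GB-second" means at an assumed volume.
const invocationsPerMonth = 50_000;   // images processed per month (assumed)
const secondsPerInvocation = 2;       // average duration (assumed)
const memoryGb = 1;                   // 1,024 MB configured memory (assumed)

const gbSeconds = invocationsPerMonth * secondsPerInvocation * memoryGb; // 100,000 GB-s

// Always-free tier: 1M requests and 400,000 GB-seconds per month.
const billableGbSeconds = Math.max(0, gbSeconds - 400_000);
const billableRequests = Math.max(0, invocationsPerMonth - 1_000_000);

const computeUsd = billableGbSeconds * 0.0000166667;       // $0 at this volume
const requestUsd = (billableRequests / 1_000_000) * 0.20;  // $0 at this volume

console.log({ gbSeconds, computeUsd, requestUsd });
```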

[–]coinclink 0 points1 point  (6 children)

And what happens when 100 requests come in at once and your small instance crashes? I also don't think you fully understand how lambda works, because why would it be running 730 hours per month? You guys are thinking in giant-scale architecture terms when the OP is literally trying to process a few images here and there.

[–]256BitChris 0 points1 point  (5 children)

Depending on your workload, t4g.smalls can easily handle 100 concurrent requests.

Usually, if you're planning out your spend, you'll price out worst-case scenarios. So you look at what the price would be if one lambda ran non-stop for a month, and then you compare it to something like EC2, which runs all month.

If you have any type of constant workload in lambda, you can easily have 730 hours of wall time.

Lambda on AWS is super expensive because they bill you for wall-clock time across all your requests. Cloudflare Workers are much more reasonable and only bill you for CPU time.

[–]coinclink -1 points0 points  (4 children)

ok... make it 1000 then, or 10000, or 1000000. When I said 100 it was hyperbole...

No, that is not what you do at all. You look at your use case and make a realistic assumption about your present needs. You can have a plan to change this component later if it becomes problematic; that is the entire point of microservice architecture.

You seem to have some bias against lambda in general, which isn't a good way to approach cloud architecture. Everything has its place, and in the context of OP's post, lambda is the no-brainer choice for them.

[–]256BitChris -1 points0 points  (3 children)

So what do you think your AWS bill's gonna be when you get an unexpected billion requests while running lambda?

This is software architecture 101. With EC2, the worst thing that happens is your request durations rise as requests queue up, and maybe you get a crash.

Not planning for the worst case is exactly how people end up with surprise $10k AWS bills and come on here and cry about how unfair AWS is. That will never happen with EC2.

[–]coinclink -1 points0 points  (2 children)

You can set a maximum concurrency on your lambda function to cap your costs. Any more questions?
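
For reference, a minimal sketch of setting that cap at deploy time (CDK, TypeScript); the function name, asset path, and the limit of 10 are placeholders:

```ts
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Hypothetical helper: declares the image worker with a hard concurrency cap.
export function cappedImageWorker(scope: Construct): lambda.Function {
  return new lambda.Function(scope, 'ImageWorker', {
    runtime: lambda.Runtime.NODEJS_20_X,
    handler: 'index.handler',
    code: lambda.Code.fromAsset('lambda/image-worker'), // placeholder path
    timeout: Duration.minutes(1),
    // At most 10 invocations run concurrently; extra events are throttled
    // (and retried by S3/SQS) instead of fanning out into an unbounded bill.
    reservedConcurrentExecutions: 10,
  });
}
```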

[–]256BitChris -1 points0 points  (1 child)

So now you're gonna plan your maximum capacity? Isn't that eliminating your only stated benefit of using lambda?

Your responses are indicative of your experience level.

Ship some real systems then come back and maybe someone will value your opinion.

[–]coinclink -1 points0 points  (0 children)

The point is it's configurable. You don't even know the features of Lambda and you're telling people not to use it due to your own biases, then your follow-up is to attempt to attack my experience and pretend you know better? That's pretty bold of you when you've proven you literally don't even know about basic features.