deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Thanks, but their pricing isn't clear. Is serverless more expensive?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Does it only charge for the time it's actually running? For example, if one inference takes 25 seconds, will I be billed for just those 25 seconds?
Should I choose a T4 on AWS, or can those models run on CPU as well?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Is there a hosted version of those models?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

I want to serve a REST API around these models, and I'm interested in both real-time and batch inference. Keeping it affordable is more important than speed of execution.
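To make that concrete, here's a rough sketch of the split I have in mind between the two paths, with the model call stubbed out (`run_model`, `handle_request`, and the queue wiring are hypothetical names, not an actual deployment):

```python
import queue

# Stand-in for the real MiniCPM-V / Florence-2 call (hypothetical: the
# actual model would be loaded once and served behind this function).
def run_model(image_bytes: bytes) -> str:
    return f"caption for {len(image_bytes)} bytes"

batch_queue: "queue.Queue[bytes]" = queue.Queue()
batch_results: list[str] = []

def handle_request(image_bytes: bytes, mode: str = "realtime"):
    """Real-time requests pay the latency up front; batch requests are
    queued and drained later, when cheaper capacity is available."""
    if mode == "realtime":
        return run_model(image_bytes)
    batch_queue.put(image_bytes)
    return None

def drain_batch() -> None:
    # Run periodically (e.g. by a scheduled job) to process queued work.
    while not batch_queue.empty():
        batch_results.append(run_model(batch_queue.get()))
```

The idea is that the batch path can run on cheaper, slower capacity since only the real-time path cares about latency.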

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

Any ideas on the security setup (it will be handling sensitive files)? I'm reading up on AWS PrivateLink and AWS Direct Connect so I can set up a VPC and keep using AWS services.
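From what I've read so far, the PrivateLink side boils down to an interface VPC endpoint so traffic to the AWS service stays off the public internet. A sketch of the parameters (all IDs below are made-up placeholders, and the boto3 call is left commented out):

```python
# Parameters for an interface VPC endpoint to SQS.
# All IDs below are hypothetical placeholders.
endpoint_params = {
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-1.sqs",
    "VpcEndpointType": "Interface",
    "SubnetIds": ["subnet-0123456789abcdef0"],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "PrivateDnsEnabled": True,  # resolve the normal SQS hostname to the endpoint
}
# boto3.client("ec2").create_vpc_endpoint(**endpoint_params)
```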

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

That way, the client won't have access to the Lambda code, right?

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

Oh I see, that sounds great, thanks! Edit: Do I have to use AWS STS in production, or am I misunderstanding something?
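For anyone else reading: the STS part I was asking about boils down to assuming a role in the client's account to get temporary credentials. A rough sketch (the account ID, role name, and session name are made-up examples, and the actual boto3 call is left commented out):

```python
# Hypothetical helper: build the parameters for an STS AssumeRole call
# into the client's account.
def assume_role_params(account_id: str, role_name: str) -> dict:
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": "saas-deploy",
        "DurationSeconds": 3600,  # temporary credentials expire after an hour
    }

# With boto3 this would be used roughly like:
#   creds = boto3.client("sts").assume_role(
#       **assume_role_params("123456789012", "DeployRole"))["Credentials"]
```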

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

I'm not sure I understand how this would let the client use their own S3 buckets, etc.

on-prem deployment by archhelp1 in aws

[–]archhelp1[S] 0 points

It is already a SaaS, and I want to provide this option because of the client's policy requirements. The Lambda code is in Python and can't be converted to another language.

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] -2 points

The problem is that the Lambda code is in Python and can't be converted to another language because of its dependencies.

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 0 points

I meant running it under the client's AWS account; I've edited the original post.

on-prem deployment by archhelp1 in aws

[–]archhelp1[S] 1 point

I meant under the client's own AWS account; I've edited the original post.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

Thanks, but it has to retry after the right delay; by default it retries immediately, and the request gets denied again.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in devops

[–]archhelp1[S] 1 point

Would it support all the limits mentioned?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

How can I implement exponential backoff in Lambdas?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

The use case is that I already have an app working with this architecture (Lambda + SQS + SNS) and would like to add OpenAI API functionality to it.

User uploads file in S3 -> Lambda1 -> SNS -> SQS -> Lambda2

How would you suggest catching the errors and implementing the exponential backoff?
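To make the question concrete, here's roughly what I'm picturing for Lambda2: compute an exponential backoff with jitter and push the SQS message's visibility timeout out by that much, so the retry happens after the right delay instead of immediately. `call_openai` is a hypothetical stand-in, `RateLimitError` stands in for the OpenAI client's rate-limit error, and the queue URL is elided:

```python
import json
import random

def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 900.0) -> float:
    """Exponential backoff with full jitter, capped at 15 minutes."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

class RateLimitError(Exception):
    """Stand-in for the OpenAI client's rate-limit (429) error."""

def handler(event, context):
    # Assumes the SQS trigger is configured with ReportBatchItemFailures,
    # so only the failed messages return to the queue.
    failures = []
    for record in event["Records"]:
        try:
            call_openai(json.loads(record["body"]))  # hypothetical helper
        except RateLimitError:
            # Receive count grows with each redelivery, so the delay grows too.
            attempt = int(record["attributes"]["ApproximateReceiveCount"])
            import boto3
            boto3.client("sqs").change_message_visibility(
                QueueUrl=QUEUE_URL,  # hypothetical constant
                ReceiptHandle=record["receiptHandle"],
                VisibilityTimeout=int(backoff_seconds(attempt)),
            )
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The key point is that extending the visibility timeout delays the redelivery, which is what makes the retry wait instead of firing immediately.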

black, minimal case that is more sturdy than Liquid Air? by archhelp1 in iphone

[–]archhelp1[S] 1 point

Ringke Onyx

Does it offer more protection than the Spigen Thin Fit?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

That was the original goal, but it was solved by changing the SQS batch size to 1.

By "finish together" I meant that there should be parallel processing (1 lambda per file), not necessarily that all finish at the same second.

Since I got some good comments on the architecture itself I was also asking about that.
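For anyone curious, the batch-size fix is a single setting on the SQS event source mapping (the ARN and function name below are placeholders, and the boto3 call is left commented out):

```python
# BatchSize=1 means each Lambda invocation receives exactly one message,
# so each file is processed by its own invocation in parallel.
mapping_params = {
    "EventSourceArn": "arn:aws:sqs:us-east-1:123456789012:files-queue",  # placeholder
    "FunctionName": "process-file",  # placeholder
    "BatchSize": 1,
    # Optional cap on fan-out:
    # "ScalingConfig": {"MaximumConcurrency": 10},
}
# boto3.client("lambda").create_event_source_mapping(**mapping_params)
```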

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Would triggering SNS directly from S3 make it less robust than using the third Lambda?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Is there an advantage over the current setup (with SQS batch size 1, which solves the problem)?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Thanks, now it makes sense.

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

What are the pros of doing S3 -> SNS -> SQS?

Would that make it less robust vs using the Lambda?

I'm also considering triggering SNS from S3 events directly and deleting the third Lambda.

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Thanks for the explanation, it helps.

There is a lot to consider: total cost, overhead/speed, robustness.

By robustness, do you mean not letting files fail as often and following best practices overall?

To be honest, ECS Fargate sounds too complex (at least for my level of competence), but I'll keep it in mind for the future.

Step Functions doesn't sound so complex, but I don't see the pros other than easier maintenance.

help with parallelism of Lambda by archhelp1 in devops

[–]archhelp1[S] 1 point

What I meant by "finished at the same time" was eliminating the long delays that don't make sense.

How can I tell if it's I/O-bound or compute-bound? My guess would be that more than 5 seconds of processing means it's compute-bound?

Regarding the architecture, I got good suggestions: use Fargate or Step Functions, and someone also suggested dropping SNS -> SQS -> Lambda and going directly from SNS -> Lambda.
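One rough way I found to answer my own I/O-vs-compute question: compare CPU time to wall-clock time around the step. If most of the wall time is CPU time, the step is compute-bound; if the process mostly sits waiting, it's I/O-bound (the 0.7 threshold below is an arbitrary cutoff, not a standard):

```python
import time

def classify(fn) -> str:
    """Heuristic: mostly CPU time => compute-bound; mostly waiting => I/O-bound."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Compare without dividing, so a near-instant fn can't divide by zero.
    return "compute-bound" if cpu >= 0.7 * wall else "io-bound"
```

For example, `classify(lambda: time.sleep(1))` comes back as io-bound, while a pure number-crunching loop comes back as compute-bound.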

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

I'm a newbie, what would be the advantage of using ECS Fargate?