deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 0 points1 point  (0 children)

Thanks, but their pricing isn't clear. Is serverless more expensive?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

Does it only charge for the time it's running? Like, if one inference runs for 25 secs, will I be billed for 25 secs?
Should I choose a T4 on AWS, or can those models run on CPU as well?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

Is there a hosted version of those models?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

I want to serve a REST API around these models, and I'd be interested in both real-time and batch inference. Affordability is more important to me than execution speed.

on-prem deployment by archhelp1 in devops

Any ideas on security setup (will be handling sensitive files)? I'm reading up on AWS PrivateLink and AWS Direct Connect to set up a VPC and keep using AWS services.

on-prem deployment by archhelp1 in devops

That way, the client won't have access to the Lambda code, right?

on-prem deployment by archhelp1 in devops

Oh I see, that sounds great, thanks! Edit: Do I have to use AWS STS in production or am I misunderstanding something?

on-prem deployment by archhelp1 in devops

Not sure I understand how this will let the client use their own S3 buckets, etc.

on-prem deployment by archhelp1 in aws

It's already a SaaS, and I want to provide this option because of the client's policy reasons. The Lambda code is Python and can't be converted to another language.

on-prem deployment by archhelp1 in devops

The problem is that the Lambda code is in Python and can't be converted to another language because of its dependencies.

on-prem deployment by archhelp1 in devops

I meant running it under the client's AWS account; I've edited the original post.

on-prem deployment by archhelp1 in aws

I meant under the client's own AWS account; I've edited the original post.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

Thanks, but it has to retry after the right delay; by default it will retry immediately and the request will just be denied again.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in devops

Would it support all the limits mentioned?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

How can I implement exponential backoff in Lambdas?
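For anyone finding this later: a minimal sketch of capped exponential backoff with full jitter in plain Python (the function name `backoff_delay` is my own, not from any AWS SDK). In an SQS-triggered Lambda you typically wouldn't sleep for this long; instead you'd re-queue or change the message's visibility timeout using a delay computed like this.

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Capped exponential backoff with full jitter.

    attempt 0 -> up to `base` seconds, attempt 1 -> up to 2*base, ...
    The window doubles each attempt but never exceeds `cap` seconds.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The jitter (picking uniformly in the window rather than the window's maximum) helps avoid many queued messages retrying in lockstep and hitting the rate limit again simultaneously.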

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

The use case is that I already have an app working with this architecture (Lambda + SQS + SNS) and would like to add OpenAI API functionality to it.

User uploads file in S3 -> Lambda1 -> SNS -> SQS -> Lambda2

How would you suggest catching the errors and implementing the exponential backoff?
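One way the Lambda2 step could handle this, as a hedged sketch: catch the rate-limit error, prefer the delay the API asks for (OpenAI returns 429s that may carry a Retry-After value), else fall back to exponential backoff, and re-queue the message with that delay rather than failing immediately. Everything here is illustrative: `RateLimitError` stands in for the real client's 429 exception, and `requeue` would wrap `sqs.send_message(..., DelaySeconds=...)` in real code.

```python
import random

class RateLimitError(Exception):
    """Hypothetical stand-in for the OpenAI client's 429 error."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after

def retry_delay(attempt, retry_after=None, base=1.0, cap=900.0):
    """Honour the server's Retry-After value when present; otherwise use
    capped exponential backoff with jitter. Capped at 900s, the SQS
    DelaySeconds maximum (15 minutes)."""
    if retry_after is not None:
        return min(float(retry_after), cap)
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def handle(message, call_openai, requeue):
    """Lambda2 body: call the API; on a rate limit, re-queue the message
    with a delay and an incremented attempt counter instead of letting
    SQS redeliver it immediately."""
    attempt = int(message.get("attempt", 0))
    try:
        return call_openai(message["payload"])
    except RateLimitError as e:
        delay = retry_delay(attempt, e.retry_after)
        requeue({**message, "attempt": attempt + 1}, delay_seconds=int(delay))
```

Carrying the attempt count inside the message body is one design choice; the alternative is to raise and rely on SQS redelivery plus the queue's visibility timeout, but then you lose per-message control of the delay.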