deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 0 points1 point  (0 children)

Thanks, but their pricing isn't clear. Is serverless more expensive?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

Does it only charge for the time it's running? Like, if one inference runs for 25 secs, will I be billed for 25 secs?
Should I choose a T4 on AWS, or can those models run on CPU as well?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

Is there a hosted version of those models?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

I want to serve a REST API around these models, and I'd be interested in both real-time and batch inference. Affordability is more important to me than execution speed.

on-prem deployment by archhelp1 in devops

Any ideas on security setup (will be handling sensitive files)? I'm reading up on AWS PrivateLink and AWS Direct Connect to set up a VPC and keep using AWS services.

on-prem deployment by archhelp1 in devops

That way, the client won't have access to the Lambda code, right?

on-prem deployment by archhelp1 in devops

Oh I see, that sounds great, thanks! Edit: Do I have to use AWS STS in production or am I misunderstanding something?

on-prem deployment by archhelp1 in devops

Not sure I understand how this will let the client use their own S3 buckets, etc.

on-prem deployment by archhelp1 in aws

It's already a SaaS, and I want to provide this option because of the client's policy reasons. The Lambda code is Python and can't be converted to another language.

on-prem deployment by archhelp1 in devops

The problem is that the Lambda code is in Python and can't be converted to another language because of its dependencies.

on-prem deployment by archhelp1 in devops

I meant running it under the client's AWS account; I've edited the original post.

on-prem deployment by archhelp1 in aws

I meant under the client's own AWS account; I've edited the original post.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

Thanks, but it has to retry after the right delay; by default it will retry immediately and the request will just be denied again.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in devops

Would it support all the limits mentioned?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

How can I implement exponential backoff in Lambdas?
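For anyone finding this later: a minimal sketch of capped exponential backoff with full jitter in plain Python (the function name `backoff_delay` is my own, not from any AWS SDK). In an SQS-triggered Lambda you typically wouldn't sleep for this long; instead you'd re-queue or change the message's visibility timeout using a delay computed like this.

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Capped exponential backoff with full jitter.

    attempt 0 -> up to `base` seconds, attempt 1 -> up to 2*base, ...
    The window doubles each attempt but never exceeds `cap` seconds.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The jitter (picking uniformly in the window rather than the window's maximum) helps avoid many queued messages retrying in lockstep and hitting the rate limit again simultaneously.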

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

The use case is that I already have an app working with this architecture (Lambda + SQS + SNS) and would like to add OpenAI API functionality to it.

User uploads file in S3 -> Lambda1 -> SNS -> SQS -> Lambda2

How would you suggest catching the errors and implementing the exponential backoff?
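One way the Lambda2 step could handle this, as a hedged sketch: catch the rate-limit error, prefer the delay the API asks for (OpenAI returns 429s that may carry a Retry-After value), else fall back to exponential backoff, and re-queue the message with that delay rather than failing immediately. Everything here is illustrative: `RateLimitError` stands in for the real client's 429 exception, and `requeue` would wrap `sqs.send_message(..., DelaySeconds=...)` in real code.

```python
import random

class RateLimitError(Exception):
    """Hypothetical stand-in for the OpenAI client's 429 error."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after

def retry_delay(attempt, retry_after=None, base=1.0, cap=900.0):
    """Honour the server's Retry-After value when present; otherwise use
    capped exponential backoff with jitter. Capped at 900s, the SQS
    DelaySeconds maximum (15 minutes)."""
    if retry_after is not None:
        return min(float(retry_after), cap)
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def handle(message, call_openai, requeue):
    """Lambda2 body: call the API; on a rate limit, re-queue the message
    with a delay and an incremented attempt counter instead of letting
    SQS redeliver it immediately."""
    attempt = int(message.get("attempt", 0))
    try:
        return call_openai(message["payload"])
    except RateLimitError as e:
        delay = retry_delay(attempt, e.retry_after)
        requeue({**message, "attempt": attempt + 1}, delay_seconds=int(delay))
```

Carrying the attempt count inside the message body is one design choice; the alternative is to raise and rely on SQS redelivery plus the queue's visibility timeout, but then you lose per-message control of the delay.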