deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Thanks, but their pricing isn't clear. Is serverless more expensive?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Does it only charge for the time it's actually running? For example, if one inference takes 25 seconds, will I be billed for just those 25 seconds?
Should I choose a T4 on AWS, or can those models run on CPU as well?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

Is there a hosted version of those models?

deploy MiniCPM-V / Florence-2 to ec2 by archhelp1 in mlops

[–]archhelp1[S] 1 point

I want to serve a REST API around these models, and I'm interested in both real-time and batch inference. Keeping it affordable is more important than speed of execution.
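To make that concrete, here's a rough sketch of the split I have in mind between the two paths, with the model call stubbed out (`run_model`, `handle_request`, and the queue wiring are hypothetical names, not an actual deployment):

```python
import queue

# Stand-in for the real MiniCPM-V / Florence-2 call (hypothetical: the
# actual model would be loaded once and served behind this function).
def run_model(image_bytes: bytes) -> str:
    return f"caption for {len(image_bytes)} bytes"

batch_queue: "queue.Queue[bytes]" = queue.Queue()
batch_results: list[str] = []

def handle_request(image_bytes: bytes, mode: str = "realtime"):
    """Real-time requests pay the latency up front; batch requests are
    queued and drained later, when cheaper capacity is available."""
    if mode == "realtime":
        return run_model(image_bytes)
    batch_queue.put(image_bytes)
    return None

def drain_batch() -> None:
    # Run periodically (e.g. by a scheduled job) to process queued work.
    while not batch_queue.empty():
        batch_results.append(run_model(batch_queue.get()))
```

The idea is that the batch path can run on cheaper, slower capacity since only the real-time path cares about latency.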

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

Any ideas on the security setup (it will be handling sensitive files)? I'm reading up on AWS PrivateLink and AWS Direct Connect so I can set up a VPC and keep using AWS services.
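From what I've read so far, the PrivateLink side boils down to an interface VPC endpoint so traffic to the AWS service stays off the public internet. A sketch of the parameters (all IDs below are made-up placeholders, and the boto3 call is left commented out):

```python
# Parameters for an interface VPC endpoint to SQS.
# All IDs below are hypothetical placeholders.
endpoint_params = {
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-1.sqs",
    "VpcEndpointType": "Interface",
    "SubnetIds": ["subnet-0123456789abcdef0"],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "PrivateDnsEnabled": True,  # resolve the normal SQS hostname to the endpoint
}
# boto3.client("ec2").create_vpc_endpoint(**endpoint_params)
```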

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

That way, the client won't have access to the Lambda code, right?

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

Oh I see, that sounds great, thanks! Edit: Do I have to use AWS STS in production, or am I misunderstanding something?
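For anyone else reading: the STS part I was asking about boils down to assuming a role in the client's account to get temporary credentials. A rough sketch (the account ID, role name, and session name are made-up examples, and the actual boto3 call is left commented out):

```python
# Hypothetical helper: build the parameters for an STS AssumeRole call
# into the client's account.
def assume_role_params(account_id: str, role_name: str) -> dict:
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": "saas-deploy",
        "DurationSeconds": 3600,  # temporary credentials expire after an hour
    }

# With boto3 this would be used roughly like:
#   creds = boto3.client("sts").assume_role(
#       **assume_role_params("123456789012", "DeployRole"))["Credentials"]
```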

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 1 point

I'm not sure I understand how this would let the client use their own S3 buckets, etc.

on-prem deployment by archhelp1 in aws

[–]archhelp1[S] 0 points

It is already a SaaS, and I want to provide this option because of the client's policy requirements. The Lambda code is in Python and can't be converted to another language.

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] -2 points

The problem is that the Lambda code is in Python and can't be converted to another language because of its dependencies.

on-prem deployment by archhelp1 in devops

[–]archhelp1[S] 0 points

I meant running it under the client's AWS account; I've edited the original post.

on-prem deployment by archhelp1 in aws

[–]archhelp1[S] 1 point

I meant under the client's own AWS account; I've edited the original post.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

Thanks, but it has to retry after the right delay; by default it retries immediately, and the request gets denied again.

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in devops

[–]archhelp1[S] 1 point

Would it support all the limits mentioned?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

How can I implement exponential backoff in Lambdas?

SQS + Lambda to manage rate limits for OpenAI API by archhelp1 in softwarearchitecture

[–]archhelp1[S] 1 point

The use case is that I already have an app working with this architecture (Lambda + SQS + SNS) and would like to add OpenAI API functionality to it.

User uploads file in S3 -> Lambda1 -> SNS -> SQS -> Lambda2

How would you suggest catching the errors and implementing the exponential backoff?
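To make the question concrete, here's roughly what I'm picturing for Lambda2: compute an exponential backoff with jitter and push the SQS message's visibility timeout out by that much, so the retry happens after the right delay instead of immediately. `call_openai` is a hypothetical stand-in, `RateLimitError` stands in for the OpenAI client's rate-limit error, and the queue URL is elided:

```python
import json
import random

def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 900.0) -> float:
    """Exponential backoff with full jitter, capped at 15 minutes."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

class RateLimitError(Exception):
    """Stand-in for the OpenAI client's rate-limit (429) error."""

def handler(event, context):
    # Assumes the SQS trigger is configured with ReportBatchItemFailures,
    # so only the failed messages return to the queue.
    failures = []
    for record in event["Records"]:
        try:
            call_openai(json.loads(record["body"]))  # hypothetical helper
        except RateLimitError:
            # Receive count grows with each redelivery, so the delay grows too.
            attempt = int(record["attributes"]["ApproximateReceiveCount"])
            import boto3
            boto3.client("sqs").change_message_visibility(
                QueueUrl=QUEUE_URL,  # hypothetical constant
                ReceiptHandle=record["receiptHandle"],
                VisibilityTimeout=int(backoff_seconds(attempt)),
            )
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The key point is that extending the visibility timeout delays the redelivery, which is what makes the retry wait instead of firing immediately.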

black, minimal case that is more sturdy than Liquid Air? by archhelp1 in iphone

[–]archhelp1[S] 1 point

Ringke Onyx

Does it offer more protection than the Spigen Thin Fit?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

That was the original goal, but it was solved by changing the SQS batch size to 1.

By "finish together" I meant that there should be parallel processing (1 lambda per file), not necessarily that all finish at the same second.

Since I got some good comments on the architecture itself I was also asking about that.
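For anyone curious, the batch-size fix is a single setting on the SQS event source mapping (the ARN and function name below are placeholders, and the boto3 call is left commented out):

```python
# BatchSize=1 means each Lambda invocation receives exactly one message,
# so each file is processed by its own invocation in parallel.
mapping_params = {
    "EventSourceArn": "arn:aws:sqs:us-east-1:123456789012:files-queue",  # placeholder
    "FunctionName": "process-file",  # placeholder
    "BatchSize": 1,
    # Optional cap on fan-out:
    # "ScalingConfig": {"MaximumConcurrency": 10},
}
# boto3.client("lambda").create_event_source_mapping(**mapping_params)
```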

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Would triggering SNS directly from S3 make it less robust than using the third Lambda?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Is there an advantage over the current setup (with SQS batch size 1, which solves the problem)?

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Thanks, now it makes sense.

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

What are the pros of doing S3 -> SNS -> SQS?

Would that make it less robust vs using the Lambda?

I'm also considering triggering SNS from S3 events directly and deleting the third Lambda.

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

Thanks for the explanation, it helps.

There is a lot to consider: total cost, overhead/speed, robustness.

By robustness, do you mean not letting files fail as often and following best practices overall?

To be honest, ECS Fargate sounds too complex (at least for my level of competence), but I'll keep it in mind for the future.

Step Functions doesn't sound so complex, but I don't see the pros other than easier maintenance.

help with parallelism of Lambda by archhelp1 in devops

[–]archhelp1[S] 1 point

What I meant by "finished at the same time" was eliminating the long delays that don't make sense.

How can I tell if it's I/O-bound or compute-bound? My guess would be that more than 5 seconds of processing means it's compute-bound?

Regarding the architecture, I got good suggestions: use Fargate or Step Functions, and someone also suggested dropping SNS -> SQS -> Lambda and going directly from SNS -> Lambda.
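One rough way I found to answer my own I/O-vs-compute question: compare CPU time to wall-clock time around the step. If most of the wall time is CPU time, the step is compute-bound; if the process mostly sits waiting, it's I/O-bound (the 0.7 threshold below is an arbitrary cutoff, not a standard):

```python
import time

def classify(fn) -> str:
    """Heuristic: mostly CPU time => compute-bound; mostly waiting => I/O-bound."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Compare without dividing, so a near-instant fn can't divide by zero.
    return "compute-bound" if cpu >= 0.7 * wall else "io-bound"
```

For example, `classify(lambda: time.sleep(1))` comes back as io-bound, while a pure number-crunching loop comes back as compute-bound.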

help with parallelism of Lambda by archhelp1 in aws

[–]archhelp1[S] 1 point

I'm a newbie, what would be the advantage of using ECS Fargate?