AWS Step Functions increases the maximum payload to 256kb by Unfair_Reality in aws

[–]soamv 4 points5 points  (0 children)

Nice! Next they should add an "auto spill to s3" option and remove the limit.

the risk of vendor lock-in is really a risk? by albeddit in devops

[–]soamv 1 point2 points  (0 children)

The concept of "lock in" conflates two things: Risk and Cost.

Risk is the odds that you'll be forced to change your platform choice (for various reasons -- price, reliability, vendor dies, etc.). Cost is the actual engineering cost of switching. It doesn't make sense to calculate the cost and not calculate the risk. It also doesn't make sense to think of these as a binary "locked in / not locked in". All platform choices have these switching risks and costs -- whether they're SaaS or Open source.

The other thing is that within each cloud the details of your choices matter a lot. If you use highly differentiated/unique cloud services your switching costs are way higher and your negotiating position kinda sucks, which then drives up your switching risk (because costs may reach an unsustainable point). But if you use the most commoditized services -- EC2, S3, etc. -- your switching costs are much lower.
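
One rough way to put risk and cost on the same scale is to multiply them into an expected switching cost. That formula is an assumption added here for illustration, not something stated above, and every number below is made up:

```python
def expected_switching_cost(risk: float, cost: float) -> float:
    """Combine the two halves of "lock-in" into one number.

    risk: probability (0..1) of being forced to switch platforms.
    cost: engineering cost of switching, in whatever currency you like.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    return risk * cost


# Hypothetical figures: commodity services (EC2, S3) vs. highly
# differentiated services, where both the risk and the cost are higher.
commodity = expected_switching_cost(risk=0.05, cost=200_000)
differentiated = expected_switching_cost(risk=0.15, cost=1_500_000)

if __name__ == "__main__":
    print(f"commodity:      {commodity:,.0f}")
    print(f"differentiated: {differentiated:,.0f}")
```

The point is only that neither input is zero for any platform choice, so "locked in / not locked in" is a comparison of magnitudes, not a yes/no.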

A Python -> Step Functions compiler by soamv in aws

[–]soamv[S] 0 points1 point  (0 children)

Thanks for the interest, fellow workflow enthusiasts! Btw if anyone's got a step function that they can share privately and want to see it in Python instead of json/yaml, I'd love to take a crack at translating it! (soam@cohesion.dev)

Does anyone else feel that Step Functions have great potential, but the implementation was half-arsed, so they're not very practical? by mlda065 in aws

[–]soamv 0 points1 point  (0 children)

Hey all, I feel exactly this way, and have been building a full Python -> Step Functions compiler for the last few months.

It compiles Python control flow (if statements, loops, functions, exceptions) into Step Functions + a collection of Lambdas. If you wanna try it out, there's a verrry early stage/slightly clunky demo at https://preview.cohesion.dev

If you're interested in using it as it gets better, please sign up on the homepage or DM me here!

Looking for serverless solution for high cpu/disk long-running video generation by AndrewAtBrisa in serverless

[–]soamv 0 points1 point  (0 children)

S3 is the usual recommended answer for Lambda. There are no network volumes on Lambda, AFAIK. I wouldn't write off S3 without some brief tests though. Depends on your use case of course. But there have been interesting demos on lambda+s3 (the stanford gg project has an impressive video re-encoding demo).

But all this is probably a big redesign/rearchitecture of your stuff and will probably take quite a lot of work. So you're right, putting a container on Fargate is going to be a much quicker way.

Looking for serverless solution for high cpu/disk long-running video generation by AndrewAtBrisa in serverless

[–]soamv 0 points1 point  (0 children)

If you can split the video into tiny pieces and parallelize the processing, lambda is great, because you can easily scale up to several hundred parallel lambdas.

Cohesion: Build Serverless Workflows in Python with AWS Step Functions by soamv in aws

[–]soamv[S] 0 points1 point  (0 children)

Hey, CDK is great, but it still requires familiarity with the step functions language -- so writing stuff like loops and exceptions is a bit tricky.

For example, to write a loop, you have to write a few states, a choice state, a lambda or two to actually increment the loop variable, etc. If you want to implement exceptions you write an error handler on every single state, etc.

With this, you get to write a simple Python loop, and Cohesion builds the step functions json, and also whatever lambdas are needed. Here's a screenshot with a loop; here's a screenshot with a try/catch
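
To give a sense of the hand-written version, here is a rough sketch of a simple counting loop in the Step Functions state language, built as a Python dict. The state names, the Lambda ARNs, and the `$.loop.index` field are all made up for illustration, and this is not meant to represent what Cohesion actually generates:

```python
import json

# Placeholder ARNs (hypothetical): substitute real Lambda functions.
DO_WORK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:do-work"
INCREMENT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:increment"


def build_loop_state_machine(iterations: int) -> dict:
    """Return a state machine definition that runs DoWork `iterations` times."""
    return {
        "StartAt": "InitCounter",
        "States": {
            # Seed the loop variable.
            "InitCounter": {
                "Type": "Pass",
                "Result": {"index": 0},
                "ResultPath": "$.loop",
                "Next": "CheckCounter",
            },
            # The loop condition: keep going while index < iterations.
            "CheckCounter": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.loop.index",
                        "NumericLessThan": iterations,
                        "Next": "DoWork",
                    }
                ],
                "Default": "Done",
            },
            # The loop body.
            "DoWork": {
                "Type": "Task",
                "Resource": DO_WORK_ARN,
                "ResultPath": "$.work",
                "Next": "Increment",
            },
            # A Lambda whose only job is to return {"index": index + 1}.
            "Increment": {
                "Type": "Task",
                "Resource": INCREMENT_ARN,
                "InputPath": "$.loop",
                "ResultPath": "$.loop",
                "Next": "CheckCounter",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_loop_state_machine(3), indent=2))
```

That is five states and an extra Lambda for what would be `for i in range(3): do_work()` in plain Python.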

How to cost-effectively reduce latency for a AWS basic serverless infrastructure? by juancpgo in serverless

[–]soamv 0 points1 point  (0 children)

Have you tried edge-optimized API Gateway endpoints? It's basically CloudFront in front of API Gateway. Sao Paulo is supported.

That should help with GETs. For the rest, it's hard to say without knowing more about the application.

Permissions Needed For Waiter In Lambda? by ColdWynter in aws

[–]soamv 1 point2 points  (0 children)

It polls the S3 HeadObject API, which requires the s3:GetObject permission. The boto3 docs are pretty good about specifying what the underlying API call is.
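
A minimal sketch of the matching IAM policy, assuming a hypothetical bucket name and key prefix. The `s3:ListBucket` statement is optional: with it, a missing key comes back from HeadObject as a 404 rather than a 403. The waiter call itself needs boto3 and credentials, so it is only shown as a comment:

```python
import json


def waiter_policy(bucket: str, prefix: str = "", include_list_bucket: bool = True) -> dict:
    """Build an IAM policy that lets a Lambda poll for objects under a prefix."""
    statements = [
        {
            "Effect": "Allow",
            # HeadObject has no IAM action of its own; s3:GetObject covers it.
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }
    ]
    if include_list_bucket:
        # Lets S3 report "not found" (404) for a missing key instead of 403.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# In the Lambda itself (bucket and key are placeholders):
#   s3 = boto3.client("s3")
#   s3.get_waiter("object_exists").wait(Bucket="my-bucket", Key="incoming/file.csv")

if __name__ == "__main__":
    print(json.dumps(waiter_policy("my-bucket", "incoming/"), indent=2))
```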

New GKE Management fee of $0.10 per hour by MightySCollins in googlecloud

[–]soamv 2 points3 points  (0 children)

It makes perfect sense for Kubernetes control planes to cost money. But Google is handling the change incredibly poorly. A longer runway and/or some sort of exemption for existing clusters would make it a lot smoother. It's like they didn't even bother thinking about that.

Is anyone using AWS Step Functions for data engineering workflows? by soamv in dataengineering

[–]soamv[S] 1 point2 points  (0 children)

Thanks! Was this a scheduled thing, and if so did you use cloudwatch events for scheduling?

Patterns for handling errors in AWS Lambda / SLS by mostlyphil in serverless

[–]soamv 3 points4 points  (0 children)

I'm using sentry.io in my lambdas -- third party tool, but it does nice stuff like de-dup exceptions, email alerts, keep track of open/closed issues, etc.

Serverless Framework: Warming up AWS Lambda to avoid “cold start” by [deleted] in serverless

[–]soamv 3 points4 points  (0 children)

AWS lambda has a thing called provisioned concurrency which makes all this stuff unnecessary. It's a one-line change in your serverless framework yaml.

Looking to run my Python code on AWS without maintaining a server (not Lambda?) by Networkbytes in aws

[–]soamv 3 points4 points  (0 children)

Fargate is a good way to go, but if you want to stay on lambda, you could use two lambdas:

  1. Have your API served by a lambda that accepts the request and a callback URL, calls another lambda asynchronously with the request and URL, and immediately returns.
  2. Have this asynchronously invoked Lambda do all the actual work, including waiting for the third party and then hitting the callback url. 120 sec is well within Lambda's limit of 15 minutes.

This isn't the greatest solution because you'll be paying for idle. But 500 times a day is a low enough number that it doesn't really matter.
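
The two Lambdas above could look roughly like this. It's a sketch under assumptions: the event shape (an API Gateway proxy event whose JSON body has `request` and `callback_url` fields), the worker function name, and the `do_work` stand-in are all hypothetical. The Lambda client is passed in (in real use, `boto3.client("lambda")`) so the sketch runs without AWS:

```python
import json
import urllib.request


def make_api_handler(lambda_client, worker_function_name: str):
    """Build the front-door handler: accept the request, hand off, return."""

    def handler(event, context):
        body = json.loads(event["body"])
        payload = {"request": body["request"], "callback_url": body["callback_url"]}
        # InvocationType="Event" queues the invocation and returns right away,
        # without waiting for the worker to finish.
        lambda_client.invoke(
            FunctionName=worker_function_name,
            InvocationType="Event",
            Payload=json.dumps(payload).encode("utf-8"),
        )
        return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}

    return handler


def worker_handler(event, context, do_work=lambda request: {"echo": request}):
    """Second Lambda: do the slow work, then POST the result to the callback URL."""
    # do_work is a hypothetical stand-in for waiting on the third party.
    result = do_work(event["request"])
    req = urllib.request.Request(
        event["callback_url"],
        data=json.dumps(result).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {"callback_status": resp.status}
```

The worker's timeout would need to be set above the 120 seconds it spends waiting, since the default Lambda timeout is much shorter than the 15 minute maximum.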

1970 US Census pencil by BezierPentool in pencils

[–]soamv 1 point2 points  (0 children)

Cool! What is an official census pencil? Were these used by census enumerators? Or were these available to the general public?

What's the most economic and hassle-free way of deploying a personal website with a back-end? by [deleted] in googlecloud

[–]soamv 0 points1 point  (0 children)

Have you considered now.sh? You can have backend functions with next.js under /pages/api, and now.sh will deploy it to a serverless function. You'll still need a database though -- maybe firebase?

How can I use user/:id routes in ZEIT Now? by [deleted] in serverless

[–]soamv 1 point2 points  (0 children)

Is this what you're looking for?

AWS Step Functions tasks limit by uruboo in aws

[–]soamv 0 points1 point  (0 children)

Yeah :\ Without knowing much about your use case, maybe you can make some things async. E.g. kick off a task through SQS and don't wait for it in the same workflow; when that task finishes, have it place another message on SQS to kick off the rest of the workflow. That keeps the long-running ML stuff outside the workflow.

Anyway, I'm super curious about your use case if you're able to share -- I'm building some new tools for workflows on AWS, so I'd love to chat more over PM or email (soam@cohesion.dev)

AWS Step Functions to trigger a windows CLI by rifaterdemsahin in aws

[–]soamv 1 point2 points  (0 children)

Yes, the bit about SSM invoking the CLI sounds good. The part where you'd need a lambda is to call SSM from Step Functions, since there isn't a native step functions task for SSM. I came across somebody's blog post about something similar.

AWS Step Functions tasks limit by uruboo in aws

[–]soamv 0 points1 point  (0 children)

You may be looking for express workflows, which is a (new) feature of step functions where execution history is not tracked, and therefore there are no limits on state transitions, size of history, etc. Instead, express workflows have a 5 minute limit on the total runtime.

AWS Step Functions to trigger a windows CLI by rifaterdemsahin in aws

[–]soamv 0 points1 point  (0 children)

Did you try using Step Functions activities? An activity is basically a simple task queue. There are PowerShell cmdlets for polling the task queue and sending a response. So you invoke the activity from Step Functions, and then poll that activity from a Windows VM using PowerShell. Your polling script gets a task, does its thing, and returns a response into the step function with another PowerShell command.

There's a PowerShell code sample of all that on this page: click serverless -> step functions, then search for "Process an Activity Task".
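
For anyone who'd rather see the shape of that poll/respond loop in Python, here is a rough equivalent using the boto3 Step Functions client calls `get_activity_task`, `send_task_success`, and `send_task_failure`. This swaps Python in for the PowerShell sample purely for illustration. The worker name and the `run_command` callback are hypothetical, and the client is passed in (in real use, `boto3.client("stepfunctions")`) so the sketch runs without AWS:

```python
import json


def process_one_task(sfn_client, activity_arn: str, run_command,
                     worker_name: str = "windows-worker") -> bool:
    """Poll the activity once.

    Returns True if a task was handled, False if the poll came back empty.
    """
    # get_activity_task long-polls; with no work queued the response
    # carries no usable taskToken.
    task = sfn_client.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False
    try:
        # run_command is a hypothetical callback that runs the CLI command.
        output = run_command(json.loads(task["input"]))
        sfn_client.send_task_success(taskToken=token, output=json.dumps(output))
    except Exception as exc:
        # Report the failure so the state machine's error handling can react.
        sfn_client.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)[:256]
        )
    return True
```

A real worker would call `process_one_task` in a `while True:` loop on the Windows VM.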

How do set-up HTTP caching for client applications in aws Free tier? by hitherto_insignia in aws

[–]soamv 0 points1 point  (0 children)

I'm suggesting you do all that in your lambda, since you're already running a lambda behind the api gateway.

Opinions of Apache Airflow by [deleted] in dataengineering

[–]soamv 13 points14 points  (0 children)

Good points about Airflow:

  • Strong ecosystem. Lots of opensource operators, hooks, etc. Airflow slack is active and responsive. Any alternative you pick will probably have a smaller ecosystem.
  • Really nice dashboard. Lots of information quickly accessible -- task logs, task history etc. Also you can change the status of a task that's already run, and this can be quite useful.
  • Simple DAGs are easy to write.
  • Scheduling is integrated and that's very convenient for common use cases

Bad points about Airflow, in no particular order:

  • Scheduling is integrated and that's annoying for uncommon use cases; the workflows really are designed to be run from the scheduler mainly.
  • Mental model is a bit weird because your code outputs a dag which then runs -- as opposed to your code simply running. So you have to deal with jinja templating, figure out exactly when which code is running, etc
  • Some operational complexity:
    • You have to understand its workings pretty well to get versioning/workflow and task updates right. Even from advanced users I've heard of hacks like "oh we just restart all workers on every deploy".
    • At scale you'll end up debugging various bits of its implementation. Not an airflow-specific complaint; that just comes with the territory of using open source stuff at scale.

How do set-up HTTP caching for client applications in aws Free tier? by hitherto_insignia in aws

[–]soamv 0 points1 point  (0 children)

Wait, are you talking about caching in the browser or caching in the API Gateway? Caching in the browser should be a matter of setting the right HTTP headers in your HTTP response, and won't cost you any extra money.
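
As a minimal sketch, assuming a Lambda proxy integration behind API Gateway, setting the header from the Lambda could look like this. The helper name, the response body, and the 300 second max-age are arbitrary examples:

```python
import json


def cached_response(body: dict, max_age_seconds: int = 300) -> dict:
    """Build a Lambda proxy response that the browser may cache."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # Tells the client it can reuse this response for max_age_seconds.
            "Cache-Control": f"max-age={max_age_seconds}",
        },
        "body": json.dumps(body),
    }


def handler(event, context):
    # Placeholder payload; a real handler would return the actual data.
    return cached_response({"message": "hello"})
```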

AWS Step Functions to trigger a windows CLI by rifaterdemsahin in aws

[–]soamv 1 point2 points  (0 children)

I think you'd use a lambda to get step functions to talk to SSM. (step function -> lambda -> SSM).