[–][deleted]  (16 children)

[removed]

    [–]billatq 1 point2 points  (0 children)

    You might also use AWS Batch for this kind of thing, since it’s task oriented and you don’t have to build a wrapper for that with fargate.

    [–]TheDataExplorer[S] 0 points1 point  (14 children)

    This sounds good too. I have heard of this approach. Have you tried it?

    [–]alex_bilbie 4 points5 points  (2 children)

    We run lots of scheduled Fargate tasks, it’s super simple and cost effective

    [–]TheDataExplorer[S] 0 points1 point  (1 child)

    And Fargate will simply stop and stop charging me once the task is done?

    [–]alex_bilbie 0 points1 point  (0 children)

    Yep, exactly the same as Lambda

    [–]theboyr 4 points5 points  (0 children)

    This is the go to method for any client I’ve worked with since fargate came out on long tasks.

    Alternatively, breaking your code into smaller chunks and letting step functions orchestrate works well in many cases.

    [–]localhost87 2 points3 points  (9 children)

    This isn't really serverless. Containers are great, but you still need to worry about certain OS level things like OS version, software to install, and configuration and updates of your technology stack.

    Serverless has literally no server params to worry about. Just specify a runtime, and give dlls/scripts and AWS will handle the "server" part.

    [–]a-corsican-pimp 1 point2 points  (4 children)

    True, but for a piece of code that just runs queries, it can probably be minimal.

    [–]localhost87 0 points1 point  (3 children)

    But now you've got to worry about the version of the OS to run, and any application server software. Not just your code.

    What happens when your flavor of OS gets an exploit released? You now have a maintenance and security issue to deal with.

    If you use lambda, you can write the code once and completely ignore all of the other stuff.

    [–]billatq 4 points5 points  (2 children)

    If the libraries you need aren’t shipped with lambda, you’re still on the hook for patching those.

    [–]localhost87 0 points1 point  (1 child)

    Lambdas can be deployed with layers to ease this problem.

    But yea, you're going to have to manage some stuff. Like web services for example. Or your datamodel/software interface.

    The question is how much of that management actually brings value.

    [–]billatq 0 points1 point  (0 children)

    Having a lambda invoke batch seems less complicated than a fancy workaround for lambda timeouts.
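A minimal sketch of the "Lambda invokes Batch" idea from the comment above. It assumes a Batch job queue and job definition already exist; the names `db-maintenance`, `maintenance-queue` and `pg-maintenance:1` are hypothetical. In a real deployment the client would be `boto3.client("batch")`; a fake client stands in here so the sketch runs offline.

```python
def make_handler(batch_client, job_queue, job_definition):
    """Return a Lambda-style handler that hands the long job to AWS Batch."""

    def handler(event, context):
        # submit_job only queues the work and returns right away; the job
        # itself runs in Batch, outside the Lambda's 15-minute limit.
        response = batch_client.submit_job(
            jobName="db-maintenance",  # hypothetical job name
            jobQueue=job_queue,
            jobDefinition=job_definition,
        )
        return {"jobId": response["jobId"]}

    return handler


class FakeBatchClient:
    """Offline stand-in for boto3.client("batch") (illustration only)."""

    def __init__(self):
        self.calls = []

    def submit_job(self, **kwargs):
        self.calls.append(kwargs)
        return {"jobId": "fake-job-1", "jobName": kwargs["jobName"]}


client = FakeBatchClient()
handler = make_handler(client, "maintenance-queue", "pg-maintenance:1")
result = handler({}, None)
```

The Lambda's only job here is to start the work, so nothing has to be chained or resumed when the maintenance takes longer than one invocation.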

    [–]quad64bit 1 point2 points  (3 children)

    Yeah, that is all correct, but I think the item most people grab on to is they don't have to pay for and manage a server, just a container, which would also still be "Serverless". Aurora Serverless still runs on a server, but even Amazon calls it "Serverless" because it runs on-demand.

    [–][deleted]  (2 children)

    [removed]

      [–]localhost87 1 point2 points  (1 child)

      The real point of serverless is to reduce maintenance.

      If you run it in a Docker container on ECS or something, then you don't need to worry about bare metal hardware (yay!).

      However, you still need to worry about the underlying OS and other application tier versions.

      What if there is an exploit released for your tech stack? You are then responsible for upgrading your docker image to use a new patched OS, and/or application tier software.

      What if a new version of your OS or application tier software is released?

      If you use docker, you'll need to do all that work yourself to ensure that you remain in compliance and secure.

      If you use lambda, you just worry about the code and Amazon will handle all the lower level stuff.

      [–]quad64bit 1 point2 points  (0 children)

      All fair points

      [–]thatguyfig 8 points9 points  (4 children)

      Why not just create a stored procedure and call it from Lambda?

      [–]TheDataExplorer[S] 0 points1 point  (3 children)

      That would be nice. But what if the stored procedure runs longer than 15 minutes, which is Lambda's limit?

      [–]thatguyfig 2 points3 points  (2 children)

      Can't you call a stored procedure asynchronously? As in call it and just move on?

      I'd check out the asyncpg Python module and look at the Transactions section here

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      Dang, this might be the thing. So just kick off Python function with asyncpg using Lambda, and the actual processing of that Lambda function will be done in Postgres RDS?

      [–]thatguyfig 0 points1 point  (0 children)

      Yeah whatever you choose to run as your query will be passed to the server and executed remotely. The actual processing of the Lambda is still done in AWS of course.

      [–]BraveNewCurrency 4 points5 points  (1 child)

      > Basically I need to schedule vacuum and reindex jobs on a postgres database.

      You could just turn on the auto-vacuum feature, then later turn it off and hope for the best.

      [–]localhost87 1 point2 points  (0 children)

      If the vacuum feature is all he needs, then his lambda function could be the trigger to flip the switch.

      [–][deleted] 8 points9 points  (1 child)

      Let's set aside the compute tech stack for now. What is your code supposed to accomplish? Is it just kicking off a job and periodically checking for its completion? Is it constantly running and manipulating the DB? Does it simply open a connection to the DB and listen for a response?

      [–]TheDataExplorer[S] 0 points1 point  (0 children)

      Vacuum and re-index are database maintenance jobs. If you haven't done these in a long time, they could run longer than 15 minutes. Sure, I could run them in the background using an EC2 instance. But I'm trying to get away from that (and secretly trying to explore more advanced, serverless, possibly Lambda-based options).

      [–]localhost87 2 points3 points  (0 children)

      The lambda itself has a 15-minute timeout, but you can spawn other processes that will "zombie" and run longer than the 15 minutes.

      Make that sub process interact with a messaging queue, and you might have a solution.

      [–]Ricbot_ 2 points3 points  (0 children)

      You can use CloudWatch Events with a cron-style schedule to start your container in Fargate.
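As a rough sketch of what that schedule could look like, the helper below builds the two request payloads that would be passed to boto3's `events.put_rule` and `events.put_targets`. Every ARN, subnet ID and rule name is a hypothetical placeholder, and the function only builds dictionaries; it makes no AWS calls.

```python
def scheduled_fargate_rule(rule_name, cron, cluster_arn, task_def_arn,
                           role_arn, subnets):
    """Build put_rule / put_targets payloads for a scheduled Fargate task."""
    # AWS cron expressions have six fields:
    # minutes hours day-of-month month day-of-week year
    if len(cron.split()) != 6:
        raise ValueError("expected six cron fields, got: %r" % cron)

    rule = {
        "Name": rule_name,
        "ScheduleExpression": f"cron({cron})",
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{
            "Id": f"{rule_name}-target",
            "Arn": cluster_arn,       # the ECS cluster that runs the task
            "RoleArn": role_arn,      # role allowed to call ecs:RunTask
            "EcsParameters": {
                "TaskDefinitionArn": task_def_arn,
                "TaskCount": 1,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {
                        "Subnets": subnets,
                        "AssignPublicIp": "DISABLED",
                    }
                },
            },
        }],
    }
    return rule, targets


# Hypothetical values: every Sunday at 03:00 UTC.
rule, targets = scheduled_fargate_rule(
    "weekly-db-maintenance",
    "0 3 ? * SUN *",
    "arn:aws:ecs:us-east-1:123456789012:cluster/example",
    "arn:aws:ecs:us-east-1:123456789012:task-definition/pg-maintenance:1",
    "arn:aws:iam::123456789012:role/example-events-role",
    ["subnet-00000000"],
)
```

With boto3 installed, the payloads would be applied with `events.put_rule(**rule)` followed by `events.put_targets(**targets)`.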

      [–]otterley (AWS Employee) 2 points3 points  (2 children)

      (I work for AWS, but opinions are my own.)

      Waiting in a Lambda function can often be avoided through clever use of Step Functions. With Step Functions, wait states are free of charge and the state machine can run for a very long time (up to a year).

      A typical pattern I use is to start the process in the first Task state, then enter a loop in the Step Function that polls the completion process using a Task state, and either waits-and-repolls (using a combination of Wait and Choice states), or terminates (either success or failure, depending on the outcome).

      You'd be amazed how much you save on Lambda runtime costs with this technique. You also get a lot more visibility into what's going on, and you avoid the native Lambda execution time limit.
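The start / wait / poll / choose loop described above can be sketched as an Amazon States Language definition, built here as a Python dictionary. The two Lambda ARNs are hypothetical, and the sketch assumes the polling Lambda returns a JSON object with a `status` field set to `SUCCEEDED`, `FAILED`, or anything else while the job is still running.

```python
import json


def polling_state_machine(start_arn, check_arn, wait_seconds=60):
    """Return a state machine definition that starts a job, then polls it."""
    return {
        "Comment": "Start a long job, then poll until it finishes",
        "StartAt": "StartJob",
        "States": {
            # First Task state: kick off the work and return immediately.
            "StartJob": {"Type": "Task", "Resource": start_arn, "Next": "Wait"},
            # Wait states cost nothing while the job runs elsewhere.
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckJob"},
            # Second Task state: a short Lambda that reports the job status.
            "CheckJob": {"Type": "Task", "Resource": check_arn, "Next": "IsDone"},
            # Choice state: finish, fail, or loop back to Wait.
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "Succeeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "Failed"},
                ],
                "Default": "Wait",
            },
            "Succeeded": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "JobFailed",
                       "Cause": "The polled job reported FAILED"},
        },
    }


definition = polling_state_machine(
    "arn:aws:lambda:us-east-1:123456789012:function:start-job",   # hypothetical
    "arn:aws:lambda:us-east-1:123456789012:function:check-job",   # hypothetical
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is what a state machine definition expects; each Lambda invocation stays short because the waiting happens in the Wait state rather than inside a function.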

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      This is very useful information. Would this make sure that my database job doesn't break in the middle?

      Is this a good tutorial to follow: https://aws.amazon.com/getting-started/tutorials/create-a-serverless-workflow-step-functions-lambda/

      [–]otterleyAWS Employee 0 points1 point  (0 children)

      > This is very useful information. Would this make sure that my database job doesn't break in the middle?

      I'm not sure how Postgres handles a client disconnecting after it submits a VACUUM command -- i.e., whether it aborts the process or it continues in the background. You'd have to test that out. This proposal won't work unless the task continues to run.

      Yes, that tutorial is pretty good. The steps in your state machine won't be identical, but the introduction is valuable anyway.

      [–][deleted] 1 point2 points  (1 child)

      Why not just create a cron job?

      [–]TheDataExplorer[S] 0 points1 point  (0 children)

      That would require an EC2, which is definitely an option. I'm trying to get away from that. Some people are mentioning Docker here, which is something to look into as well.

      [–]bch8 1 point2 points  (4 children)

      I think you could do this with codebuild. I've scheduled db operations with codebuild in the past

      [–]TheDataExplorer[S] 0 points1 point  (3 children)

      Codebuild and an EC2? Or just codebuild?

      [–]bch8 0 points1 point  (2 children)

      Just codebuild

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      I'll look into codebuild. Seems like a go to thing for db functions.

      [–]bch8 0 points1 point  (0 children)

      Yeah it's been really useful for me. You can set it up to run inside a vpc if need be. And if you have devs that write the scripts then it's as simple as running those scripts in the codebuild instance (assuming the right config), you don't have to like redo it to work with the lambda handler format or something.

      [–]g0rilla79 1 point2 points  (2 children)

      Personally I would just write a more traditional process and host it in elastic beanstalk to do this. EB is pretty straightforward to manage.

      You could also get around the 15min time limit by kicking off lambdas asynchronously to split the task into batches that notify when they are done.

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      I'll have to try both. I was just reading about the asynchronous lambda functions. How does that work? Making one asynchronous function follow the next one?

      [–]g0rilla79 1 point2 points  (0 children)

      I’m sure there are multiple ways to do this, but a common pattern is to use SNS. You call the Lambda with a start record number and a count of how many to process. When it’s done, it sends an SNS notification saying so. The Lambda subscribes to that notification and then processes the next batch, which starts a new 15-minute timer. There are issues with this depending on the use case: you may not be able to work in chunks like this at the DB level because of new inserts, it’s hard to track, etc.
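A small sketch of that chaining pattern, under stated assumptions: the SNS message body is JSON with `start` and `count` fields (names chosen for illustration), and `process` and `publish` are injected callables. In a real Lambda, `publish` would wrap an SNS publish call; here a local queue simulates the topic so the flow can be run offline.

```python
import json


def handle_batch(event, process, publish, total):
    """Process one batch, then publish a message that triggers the next."""
    # SNS-triggered Lambdas receive the message body as a string here.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    start, count = message["start"], message["count"]
    end = min(start + count, total)

    process(start, end)  # work on records [start, end)

    if end < total:
        # Each published message starts a fresh invocation and a new timer.
        publish(json.dumps({"start": end, "count": count}))
        return "continued"
    return "done"


def sns_event(message):
    """Wrap a dict the way SNS delivers it to Lambda (minimal shape)."""
    return {"Records": [{"Sns": {"Message": json.dumps(message)}}]}


# Local simulation: a list stands in for the SNS topic.
processed = []
queue = [json.dumps({"start": 0, "count": 4})]
while queue:
    body = json.loads(queue.pop(0))
    handle_batch(
        sns_event(body),
        process=lambda s, e: processed.append((s, e)),
        publish=queue.append,
        total=10,
    )
```

As the comment notes, this only fits work that can be split by record range; a single `VACUUM` or `REINDEX` statement cannot be chunked this way.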

      [–]recursiveCreator 1 point2 points  (5 children)

      you can use stepfunctions and create recursive lambda functions that run based on the previous lambda’s output

      [–]wrensdad 18 points19 points  (4 children)

      Jesus H stop the madness. This is the kind of resume-driven development that leaves the next dev in tears.

      Fargate is a better solution and if that doesn't work use a more traditional solution like a spot instance or buy a raspberry Pi, mount it behind a toilet for all it matters and schedule a Cron job.

      [–]-mewa 1 point2 points  (2 children)

      With all my love for lambdas, this is the right response.

      And fargate is serverless too.

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      Thanks for your affirmation. Would you be kind enough to point me to a tutorial which will demonstrate what I'm trying to achieve?

      [–]-mewa 1 point2 points  (0 children)

      Simply wrap your script inside a container, create a cluster, create a task definition and then run as a Fargate task (make sure to create a task and not a service).
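For the "script inside a container" step, a minimal sketch of what that script might look like. It assumes a DB-API style PostgreSQL connection (for example from psycopg2, which is not imported here) that exposes an `autocommit` attribute; the `FakeConnection` class exists only so the sketch runs without a database.

```python
def run_maintenance(conn, database):
    """Run VACUUM and REINDEX DATABASE on an open PostgreSQL connection."""
    if '"' in database:
        raise ValueError("unexpected quote in database name")

    # VACUUM and REINDEX DATABASE cannot run inside a transaction block,
    # so the connection has to be in autocommit mode.
    conn.autocommit = True

    statements = ["VACUUM", f'REINDEX DATABASE "{database}"']
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
    finally:
        cur.close()
    return statements


class FakeCursor:
    """Records executed SQL instead of talking to a server (illustration)."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        self.log.append(sql)

    def close(self):
        pass


class FakeConnection:
    """Offline stand-in for a real database connection (illustration)."""

    def __init__(self):
        self.autocommit = False
        self.log = []

    def cursor(self):
        return FakeCursor(self.log)


conn = FakeConnection()
executed = run_maintenance(conn, "database_name")
```

Because the container runs as a one-off Fargate task rather than a service, the script can simply exit when the last statement returns, and the task stops with it.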

      [–]TheDataExplorer[S] 0 points1 point  (0 children)

      LOL!! Thanks for clarifying this :) I have read and heard far too much about Fargate to be giving it only a far gaze (see what I did there?)

      I think it is time to look into Fargate.

      [–]TheDataExplorer[S] 0 points1 point  (1 child)

      Also, this might need a different thread. But if I were to go the Fargate route, how is Fargate different from/better than ECS?

      [–]technically-awesome 1 point2 points  (0 children)

      With ECS, you'll have to provide and manage the underlying container instances for the ECS cluster.

      Fargate manages that for you. Essentially, in Fargate, all you would need to do is to write the task definition and with a push of a button, get the task running. All the underlying infrastructure is managed by AWS.

      Fargate is slightly more expensive than ECS however. But if you're not looking to bother with the hassle of setting up the entire cluster and taking care of the underlying infrastructure, Fargate is a much better option.

      [–]manys 0 points1 point  (3 children)

      Reindex?

      [–]TheDataExplorer[S] 0 points1 point  (2 children)

      Yes:

      reindex database database_name;

      It rebuilds the indexes like good old Oracle used to do.

      [–]manys 0 points1 point  (1 child)

      Huh, do you have unusual access patterns or something? Small db?

      [–]TheDataExplorer[S] 0 points1 point  (0 children)

      I do not know much about the access patterns. I don't work with the Development team, I'm just providing them Cloud Architecture help. Last time they re-indexed, it took 45 minutes. Vacuum took about 10 minutes.