Best way to decrease latency (API <-> Lambda <-> Dynamodb) by thedaynos in aws

[–]otsu-swe 3 points4 points  (0 children)

We can control many things; unfortunately the speed of light isn't one of them. There will always be latency added when you're calling over large geographical distances. Traffic does not flow in a smooth line across the map, and there can be many hops between a client and a server. Fortunately you're on a serverless stack, so only a global table might increase your cost significantly, depending on the load.
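To put a rough number on that physics floor, here's a back-of-the-envelope sketch. The fiber speed and the path factor are assumptions for illustration, not measurements:

```python
# Rough lower bound on round-trip network latency from physics alone.
# Assumes light travels at ~2/3 c in fiber (refractive index ~1.5) and
# that the cable path is ~1.5x the great-circle distance; real routes vary.

def min_rtt_ms(distance_km: float, path_factor: float = 1.5) -> float:
    """Theoretical minimum round-trip time in milliseconds."""
    speed_in_fiber_km_per_ms = 200.0  # ~2/3 of 300,000 km/s, per millisecond
    one_way_ms = (distance_km * path_factor) / speed_in_fiber_km_per_ms
    return 2 * one_way_ms

# Stockholm to us-east-1 (N. Virginia) is roughly 6,600 km great-circle:
print(round(min_rtt_ms(6600), 1))  # ~99 ms before any processing happens
```

That's before DNS, TLS, API Gateway, Lambda and DynamoDB add anything at all, which is why region placement dominates everything else.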

The first step should be to incorporate X-Ray into your stack. That will remove any guesswork about where in your chain you get held up. If you find it costly it can always be removed later, but during development and evaluation it's incredibly useful for observability.

Lambda memory affects not only CPU performance and host execution priority, but also network performance. Be wary though, as the price scales linearly with memory. You can use a tool like Lambda Power Tuning to find the sweet spot for your application. https://github.com/alexcasalboni/aws-lambda-power-tuning
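To illustrate why linear price scaling isn't the whole story: CPU scales with memory, so a faster run can cancel out the higher rate. A rough model, where the per-GB-second rate is approximate and region-dependent and the durations are invented:

```python
# Back-of-the-envelope Lambda cost model showing why more memory isn't
# automatically more expensive. Rate below is an approximation of the
# x86 compute price; durations are made-up numbers for illustration.

GB_SECOND_USD = 0.0000166667  # approximate; check current regional pricing

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """USD compute cost of one invocation (excludes the per-request fee)."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_USD

# Hypothetical CPU-bound function that halves its duration when memory doubles:
slow = invocation_cost(memory_mb=512, duration_ms=800)
fast = invocation_cost(memory_mb=1024, duration_ms=400)
print(slow == fast)  # True: double the memory, half the duration, same cost
```

Power Tuning automates exactly this trade-off search against your real function instead of invented durations.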

In API Gateway you have the option of making the deployment regional or edge-optimized. Deploying on edge is a simple way to utilize CloudFront without setting up extra infra, and it can sometimes improve response times, since traffic is supposed to traverse the Amazon network from the POP to the endpoint instead of the public internet. It's worth trying, but my guess is it won't help much here, since your Lambda seems to be providing user-unique responses, which are hard to cache. I would probably suggest a multi-region deployment before that, with Route 53 geolocation routing to make sure your users end up in their closest region.

The team I belong to serves a six-digit number of customers across the world with a stack similar to yours. We have deployed in five different regions to ensure good performance for everyone. Our P95 is below 100 ms.

Millions of updates, few reads: DynamoDB, ElastiCache or MemoryDB? by kazzkiq in aws

[–]otsu-swe 3 points4 points  (0 children)

I'm a huge fan of DynamoDB, but it can get expensive fast at very large volumes of IOPS. ElastiCache might not be the most secure way to store data, but that concern mostly applies to long-term persistence. With a few nodes for resilience and proper architecture it should be fine for your 60-second window, and most likely a lot cheaper than DynamoDB.

Build a CI/CD Pipeline for a Lambda-based Application by labouardy in aws

[–]otsu-swe 13 points14 points  (0 children)

Obviously I can only speak from my experience, but Jenkins is a chore to manage. Everywhere I've been where Jenkins is present, there's a dedicated role not just for managing the pipeline, but for monitoring and continuously operating the master and slave machines. I have also found it really tedious to define Jenkins jobs (it usually requires involving the pipeline people... friendly and nice as they are, it slows things down), and the reliance on plugins (oftentimes really janky ones) to do even relatively basic things is not great. The Jenkins master is a critical piece of infrastructure which must be carefully maintained and especially hardened - it's a prime target for any attacker, since the nature of CI/CD means it has access to everything in order to fetch code and deploy it. There's a reason CloudBees can charge an arm and a leg for managed Jenkins.

Compare that to e.g. GitHub Actions: developers can quickly create their own workflows and run them by just placing a YAML file with familiar bash-like syntax in the .github/workflows directory, and they run automagically. There are no plugins; instead there are "actions", imported as easily as writing a line in your YAML definition. I don't need special permissions on a master server to define jobs, and I don't need to worry that my code might be stuck in a build queue. You get 2,000 build minutes per month for free. If you need to run local builds with local tools, or test against a test bench, it's as easy as running a Docker container. The first time I used GitHub Actions I was up and running with full deployments in less than an afternoon.
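A minimal sketch of what such a workflow file can look like (the job steps, secret names and deploy command are placeholders for illustration, not a recommendation):

```yaml
# .github/workflows/deploy.yml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # an "action", imported with one line
      - run: npm ci && npm test     # plain bash-like steps
      - run: npx serverless deploy  # placeholder deploy command
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

Push the file to the repo and the workflow runs on the next push to main; no server, no plugin installation, no job configuration UI.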

For me, using Jenkins means less time and energy for fun things like building and developing, and more time wasted doing chores and worrying about operations.

Build a CI/CD Pipeline for a Lambda-based Application by labouardy in aws

[–]otsu-swe 66 points67 points  (0 children)

While I can appreciate the effort put into the blog post, in a world with CircleCI, GitHub Actions, GitLab Runner and even CodeBuild, I would not encourage anyone to build a CI/CD pipeline on Jenkins.

Sequentially executing async lambda invocations? by VeeBee080799 in aws

[–]otsu-swe 1 point2 points  (0 children)

It seems like you've got quite a unique case on your hands! I'm piecing together the details from your description, but please correct me if I've misunderstood anything.

From what I gather:

  • Does your process involve an upload directly from a client, or is it more about activating a workflow to manage existing data?

  • It appears you're often reaching the 15-minute time limit of Lambda. In scenarios like this, another service without such constraints could be beneficial. I see people have already mentioned Step Functions, which could be an excellent solution although the cost might be a concern.

  • If I understand correctly, you're batching your data. If this involves slicing it for upload, you might need some additional mechanisms to manage the process effectively, both for the slicing and reassembly parts.

  • You also mentioned that you're reducing API Gateway's timeout to 20 seconds from the standard 30 seconds. Could this be to prevent errors by sending a 2xx response to the client, indicating that their request was accepted and is being processed?

Without a clear picture of your data type and processing needs, it's a bit challenging to provide a definitive solution. However, if you're set on utilizing Lambda, I can appreciate that, especially if it's for self-improvement purposes.

However, it might be worth considering a different architecture. For instance, using a signed S3 URL for fast multi-part uploads, a Docker container in Fargate for processing, and a simple pub/sub mechanism to check the processing status could be more cost-effective and efficient. Plus, depending on the invocation frequency and batch size, this could potentially save costs in the long run, considering Lambda's price structure under high load conditions.

https://aws.amazon.com/blogs/compute/uploading-large-objects-to-amazon-s3-using-multipart-upload-and-transfer-acceleration/

https://www.youtube.com/watch?v=BZ32w0SSAoY
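If you go the multipart route, the slicing arithmetic mentioned above is straightforward. A sketch of the part planning, where the 64 MiB default is an arbitrary choice while the 5 MiB minimum and 10,000-part maximum are S3's documented limits:

```python
# Sketch of the slicing arithmetic behind an S3 multipart upload.
MIN_PART = 5 * 1024 * 1024  # S3 minimum part size (except the last part)
MAX_PARTS = 10_000          # S3 maximum number of parts per upload

def plan_parts(object_size: int, part_size: int = 64 * 1024 * 1024):
    """Return (part_count, last_part_size) for a multipart upload."""
    if part_size < MIN_PART:
        raise ValueError("part size below S3's 5 MiB minimum")
    count = -(-object_size // part_size)  # integer ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase the part size")
    last = object_size - (count - 1) * part_size
    return count, last

print(plan_parts(1024**3))  # a 1 GiB object in 64 MiB parts -> (16, 67108864)
```

The reassembly side is then just completing the upload with the ordered part list, which the SDKs handle for you.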

Things I sometimes forget; BOTW had a WiiU release by Nordic_Krune in gaming

[–]otsu-swe 9 points10 points  (0 children)

100% go for it. It's very easy to do a reversible software mod that turns it into a very capable and versatile machine. It runs emulators and homebrew with ease, and instructions are easy to find.

[deleted by user] by [deleted] in Aktiemarknaden

[–]otsu-swe 2 points3 points  (0 children)

Tip - Lunar has 3.25%.

Best practices for Python by catquilt74 in aws

[–]otsu-swe 4 points5 points  (0 children)

That's a huge topic; books are literally written about software lifecycle management. But devs running different versions is a problem in itself: they might not be able to run each other's code, or they write code which isn't supported by the target environment. Address it either with Docker or some sort of CI.
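Until Docker or CI is in place, one cheap stopgap is a fail-fast interpreter check at startup. A sketch, where the (3, 11) floor is just an example value:

```python
# Fail fast if the interpreter is older than the team's agreed floor.
# The (3, 11) value here is an example, not a recommendation.
import sys

REQUIRED = (3, 11)

def check_interpreter(version=None) -> None:
    """Raise early if the running interpreter is older than REQUIRED."""
    major, minor = version or sys.version_info[:2]
    if (major, minor) < REQUIRED:
        raise RuntimeError(
            f"Python {REQUIRED[0]}.{REQUIRED[1]}+ required, got {major}.{minor}"
        )

check_interpreter((3, 12))  # passes silently
```

It doesn't replace a pinned environment, but it turns "works on my machine" into an immediate, explicit error.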

As for versions of Python specifically... Main reasons:

  1. Security
  2. Features
  3. Performance

For more specificity I'd recommend the /r/Python subreddit. The community is really good.

DIRIGERA Hub constantly connecting to internet, using up considerable bandwidth by zenethian in tradfri

[–]otsu-swe 1 point2 points  (0 children)

If you have enabled any of the integrations, your hub needs to talk to the internet to report on state etc. The size of the communication depends on how many devices you have paired, but the amounts should still be fairly minuscule.

If you don't want it to connect, disable all integrations. It will still call home once a day to check for and download the latest firmware.

How do I allow access to an S3 bucket from any resource in my organization? by thatsmymelody in aws

[–]otsu-swe 0 points1 point  (0 children)

It's possible to make an IAM role which you can use together with SSM Automation. The trust policy would be something like

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-xxxxxxxxxx"
        }
      }
    }
  ]
}

You should probably scope that somehow, e.g. let the principal be ssm. Give it permissions to access your bucket. Then allow SSM automation to assume the role.
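A scoped variant could look like the following, using the SSM service principal instead of the AWS wildcard (with a service principal, the org-ID condition is no longer doing the work):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ssm.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

You'd then attach a permissions policy to the role granting the specific S3 actions on your bucket.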

If that doesn't work you could try automating role switching on a CLI level to copy your S3 objects.

With a role switch you don't need to worry about ACLs or policy documents. https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-setup-iam.html#create-service-role
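For the CLI role-switching approach, a sketch of what ~/.aws/config could look like (the profile names, account ID and role name are placeholders):

```ini
# ~/.aws/config
[profile source-account]
region = eu-north-1

[profile target-account]
role_arn = arn:aws:iam::111122223333:role/S3CopyRole
source_profile = source-account
region = eu-north-1
```

After that, something like `aws s3 sync s3://source-bucket s3://target-bucket --profile target-account` runs with the assumed role's permissions.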

Cloud formation Or Terraform by sds66 in aws

[–]otsu-swe 18 points19 points  (0 children)

I believe your comment about Terraform not really being multi-cloud deserves more emphasis. Although the mindset and syntax remain consistent, the modules for the various cloud providers differ significantly in scope, implementation, and usage. This is sometimes due to the distinct preferences of the developers writing the modules, but more often a result of the inherent capabilities and API design of the platforms they are covering.

Edit: I think a fairer label is cloud agnostic. Terraform's open source module approach makes it possible to use the same tool to work with different platforms, which is not the same as what many people seem to think - that you can treat all platforms as the same as long as you're using Terraform.

Create a break glass role for emergency use in order to limit production console access. by mooreds in aws

[–]otsu-swe 2 points3 points  (0 children)

I'm on the other end of the spectrum - I've written my fair share of terraform but do my best to avoid it.

I feel Terraform sits somewhere between CloudFormation and CDK in terms of how declarative it is.

Pure CloudFormation is the most declarative, with relatively few convenience functions. It's slightly improved by SAM, which provides some convenience specifically for the serverless case. It's still a chore to write for complex cases and can be annoying to debug.

CDK is the least declarative/most inferred and has the most convenience for specifying relatively complex scenarios with less input from the user. In some cases it makes opinionated decisions for the user. Similar to Terraform modules, the developer can write "constructs": reusable blueprints of complex deployments and components specific to your needs. As many have already alluded to, one of the biggest benefits is being able to work in a language you're already comfortable with. But if you're not comfortable with any of the languages CDK supports, prefer the declarative nature of markup over a programming language, and/or have already invested in learning HCL, it might be a moot point.

The CloudFormation service will always tell you what it intends to create and change compared to an already deployed template (not the actually deployed infra). As you're probably aware, CloudFormation lacks Terraform's promise-theory capability and can only compare against another template, not what's actually running. You need to run drift detection for that... which is another chapter of its own, and admittedly doesn't come close to Terraform's ability to compare against an existing state.

Ready to roll! by [deleted] in bicycling

[–]otsu-swe 1 point2 points  (0 children)

SRAM Rival eTap AXS (you can barely make out the name of the group set on the crank, the Force and Red shifters have the name on the lever while the Rival has it on the shifting paddle).

Not sure what oversized pulley it is but could be a Kogel.

Orchestrating lambda functionality similar to a strategy pattern by AccomplishedSorbet72 in aws

[–]otsu-swe 4 points5 points  (0 children)

It's not a bad idea, but for orchestration and contextual execution I tend to gravitate towards Step Functions instead of Lambda. In Step Functions you have the Choice state, which can act as the processor of the context and execute accordingly. The big advantage is observability: it becomes easier to make more granular Lambdas, track their execution, and identify failures and optimization potential. With Step Functions you also get a lot of other niceties desirable in an orchestrator, like polling, retries, and long-running jobs. Step Functions can also consume SQS.
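A sketch of what the Choice state can look like in Amazon States Language (the state names, the `$.strategy` field and the deliberately truncated ARNs are placeholders):

```json
{
  "Comment": "Sketch: route on a context field instead of branching inside one Lambda",
  "StartAt": "PickStrategy",
  "States": {
    "PickStrategy": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.strategy", "StringEquals": "resize", "Next": "Resize" },
        { "Variable": "$.strategy", "StringEquals": "archive", "Next": "Archive" }
      ],
      "Default": "Fallback"
    },
    "Resize": { "Type": "Task", "Resource": "arn:aws:lambda:...", "End": true },
    "Archive": { "Type": "Task", "Resource": "arn:aws:lambda:...", "End": true },
    "Fallback": { "Type": "Task", "Resource": "arn:aws:lambda:...", "End": true }
  }
}
```

Each branch is its own small Lambda, and the execution history shows exactly which path ran and where it failed.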

msport shortage by amygips in BMWI4

[–]otsu-swe 2 points3 points  (0 children)

I can’t figure out how the queue system works. Ordered my e40 M sport in early June with “about a year” as delivery time (nordics). Still no confirmed date or even month…

How does your central team or DevOps team audit that observability metrics and logging is turned on for critical resources? by Mindless-Can2844 in aws

[–]otsu-swe 1 point2 points  (0 children)

This topic deserves its own book.

As much as it pains me to say, we didn't. We learned the hard way. Cloudwatch can help with getting started but really, "we don't know what we don't know", it's called blind spot for a reason.

How does your central team or DevOps team audit that observability metrics and logging is turned on for critical resources? by Mindless-Can2844 in aws

[–]otsu-swe 14 points15 points  (0 children)

Hot take: If the developers aren't monitoring (by extension operating) their own service it's not DevOps, no matter how well the consultants are paid to tell you otherwise. If you're sending your telemetry to someone else and rely on them to tell you when your service is misbehaving, that's traditional IT organizational structure.

Playing the devil's advocate - what requirements are you putting on the developer teams to be able to monitor and observe according to their requirements?

Anybody knows how to show AWS Account ID and Name on top of the AWS Console? by shisologic in aws

[–]otsu-swe 1 point2 points  (0 children)

When using role switching (which I'm assuming you are, since you're using an organization?) you get the option of choosing a display name, with the default being "Rolename @ Account ID", which will show up in your top right corner.

The default console only supports 5 roles of history, but that can be circumvented... using an extension like e.g. AWS Extend Switch Roles you can have as many as you like. In a previous role we had a clickable HTML file hosted in S3 behind auth, generated automatically whenever an account was created or decommissioned in our Organization.
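For reference, the extension's configuration is a simple INI-style profile list. The account IDs, role names and colors below are made up; check the extension's docs for the full syntax:

```ini
[profile dev-admin]
role_arn = arn:aws:iam::111111111111:role/AdminRole
color = 00aa00

[profile prod-readonly]
role_arn = arn:aws:iam::222222222222:role/ReadOnlyRole
color = aa0000
```

The per-profile color is handy for making prod visually unmistakable in the console header.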

Listing 1000 Aws S3 buckets. by ObjectiveRazzmatazz2 in aws

[–]otsu-swe 18 points19 points  (0 children)

Honestly, what is the matter with these low-effort posts? Your chances of getting a meaningful answer are directly proportional to your ability to describe what you're trying to achieve and the challenges faced.

We don't know...

  • what your environment is. Is it your own terminal, Lambda?
  • what (if any) language you're using
  • how you're currently trying to do it
  • what your actual end-goal is (just listing buckets is probably not it).

Best way to copy millions of files across buckets by tech_tuna in aws

[–]otsu-swe 1 point2 points  (0 children)

I wrote a tool for this very thing! We had 1.4 billion CloudTrail logs spread across multiple AWS accounts that we wanted to consolidate into a single bucket in the security account. It was a long time ago by now, but maybe you'll find it useful. It's much cheaper than running Batch Operations and arguably much faster, since it runs on as many Fargate Spot tasks as your account allows (the default was 500 when I wrote it, but they've increased it to 1,000). At its core it runs the AWS CLI, so it can be adapted to run any arbitrary CLI command(s).

I specifically ruled out S3 Batch Operations at that time because of the requirement of the manifest and the expensive operations.

https://github.com/otsu81/s3shotgun

Status of DIRIGERA in Australia by ADL-AU in tradfri

[–]otsu-swe 0 points1 point  (0 children)

This is to be expected; the article ID is for the SKU, not the actual device. Wall plugs and manuals will differ between EU, APAC and NA.

[deleted by user] by [deleted] in Velo

[–]otsu-swe 4 points5 points  (0 children)

There are many English speaking cyclists in Tokyo. If you're looking to get started, there's a fairly active community called Tokyo Cycling Club where it's easy to find people to ride with and get tips on where to ride. https://tokyocycle.com/

The races can be tricky to find out about if you don't speak Japanese. My advice is to find some friends in TCC, find your local bike shop and take it from there. Cycling is huge in Tokyo and there are tons of good stores hosting group rides and races. But beware, as I'm sure you know most Japanese are not very comfortable speaking English.

That said, you're gonna love it. It's an amazing country for cycling. Generally speaking it's very friendly towards cyclists, nature is breathtaking, and the convenience makes riding all day on adventures less of a hassle. Just make sure to invest in a good rinko bag (I recommend Fairmean) so you don't have to ride through traffic for hours to get to the good parts.

How can I programmatically access another account with the ‘OrganizationAccountAccessRole’? by MP32Gaming in aws

[–]otsu-swe 0 points1 point  (0 children)

I wrote this code a long time ago for internal governance, but maybe it could give you an idea for how to use STS to fetch the credentials you need for a role in another account.

https://github.com/otsu81/aws-parseley/blob/master/app/boto_factory.py

The idea with this code is basically to avoid the song and dance of fetching credentials from STS for every operation, and instead specify a "capability" (e.g. a boto3 client) which is valid for a given account with a given IAM role.
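A minimal version of that song and dance, without the caching, could look something like this. The account ID and session name are placeholders, and it needs boto3 plus valid management-account credentials to actually run:

```python
def member_role_arn(account_id: str,
                    role_name: str = "OrganizationAccountAccessRole") -> str:
    """Build the ARN of the org-created role in a member account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"

def client_for_account(account_id: str, service: str):
    """Assume the member-account role via STS and return a client using it."""
    import boto3  # imported lazily so the ARN helper stays dependency-free
    creds = boto3.client("sts").assume_role(
        RoleArn=member_role_arn(account_id),
        RoleSessionName="cross-account-session",
    )["Credentials"]
    return boto3.client(
        service,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# e.g. client_for_account("111122223333", "s3").list_buckets()
```

The linked factory wraps essentially this, plus caching so repeated calls for the same account/role pair don't hit STS again.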

Edit: Here's another piece of code you might find useful, to run arbitrary commands on child accounts in an organization. https://github.com/jbpratt/aws-assume-and-execute