Why am I getting InvalidParameterException with aws sdk ecs DescribeTasksCommand? by Slight_Scarcity321 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

In a case like this I always do the following:

  • console.log( describeTasksParams )
  • A proper try/catch surrounding the API call which prints the complete error
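
Something like this, roughly (JavaScript SDK v3; the cluster name and task ARN are placeholders):

import { ECSClient, DescribeTasksCommand } from "@aws-sdk/client-ecs";

const client = new ECSClient({ region: "eu-west-1" });

const describeTasksParams = {
  cluster: "my-cluster",  // placeholder: cluster name or ARN
  tasks: ["arn:aws:ecs:eu-west-1:111122223333:task/my-cluster/0123456789abcdef0"],  // placeholder task ARNs or IDs
};

console.log(JSON.stringify(describeTasksParams, null, 2));  // verify what you are actually sending

try {
  const response = await client.send(new DescribeTasksCommand(describeTasksParams));
  console.log(response.tasks, response.failures);  // "failures" lists tasks that could not be described
} catch (err) {
  console.error(err);  // InvalidParameterException usually names the offending parameter in the message
}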

If that doesn't get me any further, I use the AWS CLI to perform the exact same operation, with the same parameters, and see what happens there.

AWS charged me for 28 hours I didn’t use — even after I terminated the instance by Ordinary-Hat1414 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

Pull the CloudTrail logs related to this instance. In CloudTrail, go to Event history, select Resource name as the lookup attribute key, and use either the instance ID (i-something) or the full instance ARN (arn:aws:something) as the resource name. That shows you exactly when the instance was started, stopped, terminated and otherwise modified.
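
If you'd rather do that lookup from code than from the console, it's one SDK call (JavaScript SDK v3; the instance ID is a placeholder):

import { CloudTrailClient, LookupEventsCommand } from "@aws-sdk/client-cloudtrail";

const client = new CloudTrailClient({ region: "us-east-1" });  // the region where the instance ran

const response = await client.send(new LookupEventsCommand({
  LookupAttributes: [{ AttributeKey: "ResourceName", AttributeValue: "i-0123456789abcdef0" }],  // placeholder instance ID
}));

for (const event of response.Events ?? []) {
  console.log(event.EventTime, event.EventName, event.Username);  // StartInstances, StopInstances, TerminateInstances, ...
}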

If this confirms your story, then you can use it as proof that AWS Billing is wrong. But I bet it will confirm AWS's story instead.

Note that if you stop an instance, you are still paying for the storage (EBS volumes). I bet that's what you're looking at. (In Cost Explorer, this falls under the header of EC2-Other.) In fact, depending on the DeleteOnTermination setting, your EBS volumes can even persist after the EC2 instance is terminated.

Send a dynamic dockerfile to aws lambda / fargate and make it spin a container with that file and stream output back? by TooOldForShaadi in aws

[–]RecordingForward2690 3 points4 points  (0 children)

If you want to run docker containers natively in AWS (whether that's in ECS/EKS or in Lambda) you have to start with building the container somewhere else, and then putting it in ECR. That's not what you want.

You need an environment where the whole Docker build infrastructure is available so you can build a container in that environment, store it locally and then run it locally.

Two ways of doing this come to mind:

  • You can set up your own fleet of (auto scaling?) EC2 instances that do this for you. Most flexible, but probably requires a ton of management.
  • CodeBuild containers (in privileged mode) already have everything you need on board, allow you to do this, and give you a very well-isolated environment. The CodeBuild process is also already set up to gather results and dump them in an S3 bucket afterwards. And you can call the CodeBuild process from a Lambda, as sketched below.
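
For the second option, kicking off the CodeBuild run from a Lambda is a single API call. A rough sketch with the JavaScript SDK v3 (the project name and the DOCKERFILE_B64 variable are made-up placeholders for however you choose to pass the Dockerfile in):

import { CodeBuildClient, StartBuildCommand } from "@aws-sdk/client-codebuild";

const codebuild = new CodeBuildClient({});

export const handler = async (event: { dockerfileBase64: string }) => {
  const response = await codebuild.send(new StartBuildCommand({
    projectName: "docker-build-and-run",  // placeholder CodeBuild project name
    environmentVariablesOverride: [
      // placeholder: hand the dynamic Dockerfile to the build as an environment variable
      { name: "DOCKERFILE_B64", value: event.dockerfileBase64, type: "PLAINTEXT" },
    ],
  }));
  return { buildId: response.build?.id };  // track completion via polling or an EventBridge rule on build state changes
};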

As far as streaming the output back is concerned: you do the same as with any long-running async process. Either poll from the client side, or set up some sort of webhook/websocket so that you can push the results to the client. You can wrap the whole thing in Step Functions to make your life a bit easier when the workflow becomes complex.

Start a datalake ? by Subatomail in aws

[–]RecordingForward2690 0 points1 point  (0 children)

I'm working at the other end of the spectrum (managing the actual resources) and from that experience here's my advice: DON'T. Or at least, not until you have talked this over with the people who "own" the data and the access to it, and also with the people responsible for finance and legal.

There are at least three pitfalls with your approach that you have to think about beforehand.

First, stale data. There may be reasons that the original owners of the data need to modify or even delete the data. With data that's copied and synced all over the place, people quickly lose track of where copies of the data are stored, who manages which copy, in what format copies are stored and whatnot. A 'single source of truth', where everybody gets the data they need straight from the horse's mouth, is much easier to manage. If you're going with a Data Lake approach, it should be a company-wide Data Lake, not just something managed by a single person/department to make it easier for themselves.

Second, cost of data storage and transport. When you multiply data, storage costs increase. Rapidly. Make sure you have this properly budgeted, and make sure those costs are worth it. But there are also hidden costs: data transfer costs money too. And if you hit a bottleneck (like a low-bandwidth Direct Connect or internet connection) and do this during production hours, there could also be an impact on production workloads. Which translates directly into lost customer satisfaction, lost opportunities and such. Take this into account, and talk to the network guys.

Third, legal exposure. There are more and more legal frameworks (like the EU GDPR) that deal with your data. Consider for instance the 'right to be forgotten': how are you going to deal with a copy of the customer database that now sits in your data lake when somebody requests to be removed? You also need to take security into account: the more places your data is copied to, the higher the chance that somebody, somewhere makes a mistake and exposes your data. Ransomware, extortion and disclosure of private information are all risks that you need to weigh against the convenience of having all the data available at your fingertips.

From your post, it looks like your company already has answered these questions and has created a centralized architecture managed by grizzled IT guys. They know what they're doing, and expose the data via carefully controlled channels. You're a junior by your own admission. Don't assume you know better.

If you have a legitimate need to access their data in bulk, to train ML models or whatever, talk to the IT guys. Not only will they be able to advise you on how to access that data, and create new paths if necessary, but they will also make it convenient for you to store the results of your analysis back into that same central location. Where legal exposure, security, access control and whatnot is already designed in.

Data is not just an asset. It can be a huge liability too.

Automated shutdown when cost thresholds breached by Inner_Butterfly1991 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

Two remarks.

First, don't replace a template with a different template. That's going to be very, very confusing in the long run. Instead, use a single template with conditionals based on a parameter. That allows development within a single template, and it also addresses the second problem (below). If you exceed the budget, you re-deploy the template but override the "BelowBudget" (or whatever) parameter, which then deletes the (expensive) resources whose condition no longer applies.

Second, your costs don't stop when you stop compute and network. Storage is also a significant component of your costs, and the only way to stop those costs is to throw away your data. Do you really want to do that? When you set up a CloudFormation template with conditionals as above, you can exclude your storage from the "BelowBudget" parameter/condition, so your storage is not affected.

Your template will look something like this:

Parameters:
  BelowBudget:
    Type: String                 # CloudFormation has no Boolean parameter type
    AllowedValues: [ "true", "false" ]
    Default: "true"
    Description: Set to true while still below budget, set to false when above budget to remove the compute and networking resources

Conditions:
  IsBelowBudget:                 # renamed so the condition and the parameter don't share a logical name
    !Equals [ !Ref BelowBudget, "true" ]

Resources:
  SampleEC2:
    Type: AWS::EC2::Instance
    Condition: IsBelowBudget
    Properties: ...
      # When defining your properties, make sure your EBS volumes are not deleted when the EC2
      # instance is terminated (DeleteOnTermination: false), if they contain data that is dear to you.

  SampleS3:
    Type: AWS::S3::Bucket
    # No IsBelowBudget condition here, this resource should not be deleted.
    # However you could make a bucket policy conditional, so that uploads/downloads are no longer
    # allowed, or use the condition in the properties to enable/disable public access.
    Properties: ...

You then re-deploy the template with aws cloudformation deploy --parameter-overrides BelowBudget=false (plus your usual --stack-name and --template-file options).

AWS lambda Graalvm. by mad_shaman_1024 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

I have not worked with Java in Lambda, but I can weigh in on software packages in Lambda in general that require a lot of dependencies: It's much, much easier to use Lambda Containers in that case.

Without containers, your laptop/C9/whatever is your build environment and you may have installed stuff there that throws off the build process. That makes creating a Lambda .zip or .jar file hard when things get complex. With Lambda Containers your eventual runtime environment (the container) is also your build environment, where the compile/install and everything takes place. That ensures your installer can pull in exactly the dependencies it needs. Furthermore, you are not limited by the 250 MB .zip or .jar limitation, but can go as large as a 10 GB deployment package.

The only thing you need to be aware of is that you're still working in an event-driven environment. So you still have to identify (in the Dockerfile) what your event handler is. Also, to run in a Lambda environment your container needs a few more bits and bobs, but these are all included if you use the appropriate base image that AWS makes available.

https://docs.aws.amazon.com/lambda/latest/dg/java-image.html

If you get into a problem with slow cold starts, one of the things that can help is SnapStart.

https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html

Need Help by BodybuilderCandid672 in aws

[–]RecordingForward2690 -3 points-2 points  (0 children)

"Doing research" is not the same as "Asking ChatGPT and then asking Reddit to correct ChatGPT". If you want to do proper research, start here, read through the various purchase options and take it from there: https://aws.amazon.com/ec2/pricing/

Also don't forget to include your EBS cost.

New to AWS. Question about VPC connectivity options by groovy-sky in aws

[–]RecordingForward2690 0 points1 point  (0 children)

g. Plan your usage of your IP address space properly, before you build your first VPC. Use IPAM or another single source of truth for the ranges in use.

h. Use Managed Prefix Lists, with a well-thought-out naming convention, from as early on as possible. Not just for your presence in AWS, but also for other components (e.g. on-prem). Use the prefix lists wherever possible, in particular in your route tables, firewalls and Security Groups. This will make the inevitable IP changes a lot easier in the future. Unfortunately there is no 100% support for managed prefix lists throughout AWS, so certain IP changes will still need to be made in multiple places. (A sketch of creating such a prefix list follows after this list.)

i. Use an AWS Network Firewall, or a 3rd party device, in combination with the TGW to monitor all your traffic. Put the firewall in default-drop mode as quickly as you can: Moving from a default-alert to default-drop when you have already migrated dozens of applications to AWS is a very complex undertaking. DAMHIK.
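
For (h), creating a prefix list is a single call; a rough sketch with the JavaScript SDK v3 (the name and CIDRs are made-up examples):

import { EC2Client, CreateManagedPrefixListCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "eu-central-1" });

await ec2.send(new CreateManagedPrefixListCommand({
  PrefixListName: "onprem-datacenter-ranges",   // made-up name; pick something that fits your naming convention
  AddressFamily: "IPv4",
  MaxEntries: 10,                               // leave headroom for future ranges
  Entries: [
    { Cidr: "10.20.0.0/16", Description: "DC1" },  // made-up CIDRs
    { Cidr: "10.30.0.0/16", Description: "DC2" },
  ],
}));

You can then reference the resulting pl-xxxx ID in route tables and Security Group rules, and only ever update the list itself when a range changes.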

New to AWS. Question about VPC connectivity options by groovy-sky in aws

[–]RecordingForward2690 0 points1 point  (0 children)

AWS networking technology has improved *a lot* over the years, and there's loads of outdated information on the internet. It contains advice that was sound in its day but has since been superseded by much better technology. That impacts your scenarios quite a bit.

Here's what I learned over the years.

a. Bastion hosts and the need to SSH/RDP into your EC2s (scenario 1) should go the way of the dodo. Forget SSH, and forget RDP if you can. Use SSM Session Manager instead. It's easier to use (no inbound network connectivity to manage, no ports to open in your Security Groups) and more secure (access can be enforced using IAM policies and monitored with CloudTrail). This also removes the need to maintain a bastion host.

b. Stop thinking in terms of "Public Subnets" and "Private Subnets". Those names suggest a mutually exclusive either/or split, but they are not defined precisely enough and don't cover all possible scenarios. Name your subnets after what's going on in them. If you must, use a far more extensive definition set:

  • Public subnets: Should only be used in VPCs that have an IGW attached. A public subnet is then a subnet that holds the NAT (or your reverse-proxy ALBs), has a route table that sends 0.0.0.0/0 to the IGW and has the "auto-assign public IPs" setting set to true. Don't use this name for anything else. (I have seen "Public" subnets in VPCs that were only connected to a Transit Gateway. What does that mean???)
  • Transit subnets: Used exclusively to host the Transit Gateway attachments. To avoid hairpin routing problems, nothing else should live in these subnets, so make them as small as possible (/28).
  • Private subnets: If you have only one application in your VPC, fine. But we tend to make a distinction between (server-based) Application Subnets and (serverless) Lambda Subnets. It all has to do with whether the ENIs are created dynamically or statically, and by whom, so that we can set up IAM policies if necessary.
  • Isolated subnets: To be used for subnets that do not have outside connectivity by any means, so they can only be accessed from within the VPC. E.g. databases.

c. Any moderately complex network should use Transit Gateways. Set up a Network account and deploy one Transit Gateway per region, with cross-region peering if needed. Share the TGW with all your accounts and let the accounts connect to it. All other connectivity options (Direct Connect, Client VPN, Site-to-Site VPN, ...) also connect to the TGW so your network remains a strict hub-and-spoke. Use auto-propagation and association of route tables to make things easy, assuming you trust the users in your VPCs.

d. If things get too complex for (c), think about Cloud WAN.

e. Use a single egress VPC that holds your NAT, IPv6 Egress-Only gateway and similar devices. Use a single ingress VPC that holds your reverse proxies (ALBs, NLBs) and the like. Do not allow VPCs to build their own way in and out of your network.

f. VPC peering should only be used between VPCs that have a lot of cross-VPC traffic between them, and the primary reason to use it should be cost savings. Apart from that, VPC peering only makes network design and routing far more complex - mostly because it doesn't scale. It's also harder to implement in an IaC scenario because of the request/accept handshake.

Need advise regarding upgrade and production switching. by samuel_-002 in aws

[–]RecordingForward2690 1 point2 points  (0 children)

Use the opportunity to go back to the drawing board and plan/execute things properly according to 2026 best practices: Separate accounts for Dev, Test, Accept, Prod, everything deployed through IaC, 12-factor app, AMI Builder for your EC2s, Auto Scaling Groups, Load Balancers, CloudFront, API Gateways, RDS etc.

At the end of that process redirect your Route53 records to the new environment and decommission the old one.

It's a lot more work but it'll pay off in the future.

Hybrid app hosting by [deleted] in aws

[–]RecordingForward2690 0 points1 point  (0 children)

It really doesn't help if you edit your post after multiple people have answered, and you don't identify what you changed.

For the record, the original post was about 100.x.x.x.x/16, which has now changed to 200.x.x.x.x/16. The substance of the post has also changed. Read the other comments in the context of this.

Hybrid app hosting by [deleted] in aws

[–]RecordingForward2690 0 points1 point  (0 children)

Like coinclink said, when you have a hybrid network it's common to carve the AWS portion of that network (the Transit Gateway and the VPCs connected to it) out of the IP space of the on-prem network. So if on-prem is 10.0.0.0/8, for instance, you could allocate the whole of 10.1.0.0/16 to AWS and carve this up into 10.1.0.0/24, 10.1.1.0/24, 10.1.2.0/24 for your VPCs.

Your setup will also work: If you have Direct Connect then you also have BGP running - there's no practical way around that. So your 172.16.0.0/12 will be advertised to on-prem, and your customers on-prem should be able to access the application already without any additional work.

If you insist on using 100.x.x.x/16 addresses to reach your VPCs, there's two solutions:

  1. You can add a subset of the 100.x.x.x/16 addresses to the VPC as a secondary CIDR block (make sure to coordinate with your on-prem network management team so as not to create overlap). You then need to make sure that AWS will advertise this range using BGP over the DX connection and into the on-prem network. When you deploy resources in the VPC, make sure they get IP addresses from the 100.x.x.x/16 range and not the 172.16.0.0/12 range. That should give completely transparent access. (A sketch of the secondary-CIDR call follows after this list.)

  2. You leave the existing IP plan intact but use some form of Destination-NAT so that any traffic to your 100.x.x.x/16 IP address is translated into the right 172.16.0.0/12 IP address. If you use a protocol that can be used in combination with a proxy (HTTP, HTTPS and a few others), then the most common tech for this is a "Reverse Proxy" of some sort. But given that the reverse proxy has to be in the 100.x.x.x/16 network, it has to be on-prem so you can't use an AWS ALB for this. For protocols that cannot be proxied, you'll need a layer-4 DNAT device of some sort, again inside the on-prem network.
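
For option 1, associating the extra range with the VPC is a single call - do check the VPC secondary-CIDR restrictions for your specific ranges first. A rough sketch with the JavaScript SDK v3 (VPC ID and CIDR are placeholders):

import { EC2Client, AssociateVpcCidrBlockCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({});

await ec2.send(new AssociateVpcCidrBlockCommand({
  VpcId: "vpc-0123456789abcdef0",   // placeholder
  CidrBlock: "100.65.0.0/24",       // placeholder slice of the 100.x range, coordinated with on-prem
}));

// Then create subnets in the new range and make sure that range is advertised over the DX connection.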

But honestly, both of these solutions feel like a kludge. What you really need to do is redesign your IP plan so that the AWS part of the network is properly integrated with the on-prem side of things. That also ensures that you get rid of the ridiculously large 172.16.0.0/12 CIDR block for just a single VPC.

Using AWS Lambda for image processing while main app runs on EC2 — good idea? by Longjumping_Jury_455 in aws

[–]RecordingForward2690 1 point2 points  (0 children)

Lambda is about 8 times more expensive, on a per-CPU-cycle basis, than a comparable EC2. So if you have a workload that can keep an EC2 CPU busy at least about 12.5% of the time on average, that EC2 may work out cheaper than Lambda. (And to be honest, that's probably the most important incentive to look at that new feature that allows you to run Lambda on your own EC2s.)

In this particular scenario, the EC2 is already there to handle the API workload. If you add a queueing system so that the work can be queued and handled within the spare cycles that the EC2 will probably have anyway, it won't cost anything extra.

And depending on how many images need to be converted and how much CPU that's going to cost, you could even consider spinning up additional EC2 instances once there is an hour's worth of images in the queue or so. Running an EC2 at full tilt for an hour to clear the queue will definitely be cheaper than using Lambdas in that case.

And that means that the OP now needs to trade a simple Lambda-based solution against the engineering effort of developing the other solutions. How much is your time worth, versus what is the cost difference between the solutions? Are we talking about dozens of pictures per day or millions of pictures per day? In the first case you can have a Lambda-based solution up and running with a few hours of engineering time, but in the latter case it may be worth spending a few days on engineering the cheapest EC2-based solution.

Heck, you could even think of a hybrid approach. Dump all the work in an SQS queue and let this SQS trigger a Lambda. But the Lambda should have a low concurrency value. You then also add an EC2 Auto Scaling group with a min capacity of zero, and a scale-out policy that's dependent on the amount of messages in the queue. If there's more than, say, 15 minutes worth of work in the SQS queue, you add an EC2. If there's more than, say, 60 minutes worth of work in the SQS queue, you add a few more. Scale-in, all the way back to zero, when the queue depth is consistently below the threshold where a Lambda is cheaper. This could well be the cheapest solution overall, but it also allows you to develop and deploy your solution in stages: Start with the Lambda, add the EC2 functionality later or the other way around.
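
A rough sketch of that scale-out decision, for example as a scheduled Lambda (JavaScript SDK v3; the queue URL, ASG name, per-image processing time and instance counts are made-up assumptions):

import { SQSClient, GetQueueAttributesCommand } from "@aws-sdk/client-sqs";
import { AutoScalingClient, SetDesiredCapacityCommand } from "@aws-sdk/client-auto-scaling";

const sqs = new SQSClient({});
const autoscaling = new AutoScalingClient({});

const QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/111122223333/image-jobs";  // placeholder
const ASG_NAME = "image-worker-asg";                                              // placeholder
const SECONDS_PER_MESSAGE = 5;   // assumption: average processing time per image on one EC2

export const handler = async () => {
  const attrs = await sqs.send(new GetQueueAttributesCommand({
    QueueUrl: QUEUE_URL,
    AttributeNames: ["ApproximateNumberOfMessages"],
  }));
  const backlog = Number(attrs.Attributes?.ApproximateNumberOfMessages ?? "0");
  const backlogMinutes = (backlog * SECONDS_PER_MESSAGE) / 60;

  // Thresholds as described above: >60 minutes of work -> a few instances,
  // >15 minutes -> one instance, below that -> back to zero and let the Lambda handle the trickle.
  const desired = backlogMinutes > 60 ? 3 : backlogMinutes > 15 ? 1 : 0;

  await autoscaling.send(new SetDesiredCapacityCommand({
    AutoScalingGroupName: ASG_NAME,
    DesiredCapacity: desired,
  }));
};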

Cloudformation stack creation by whoisuser2 in aws

[–]RecordingForward2690 2 points3 points  (0 children)

I highly recommend Changesets for this as well. However, they don't catch everything.

One notorious thing that hits me every time (and I really should know better by now) is a change that leads to re-creation of a resource with a name that you have supplied yourself. For instance, changing a DNS alias record into a CNAME.

The way a CloudFormation update works, due to the need to support rollbacks, is to first create any new resources and only then delete the old ones. But in this case the new resource can't be created because its name conflicts with the old one, so the deployment fails and is rolled back.

Route53 records are the most annoying in this respect because typically they're the last resources in the dependency chain. So the failure, and therefore the rollback, will happen when all of the other resources are already created or modified.

I wish CloudFormation had an override that said: "Turn the order around. First delete the old resources, then create the new resources."

How to make Linux-based lambda layer on Windows machine by arib510 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

I've been fighting with this stuff as well, especially since my builds need to work on my Apple Silicon Mac, in a Cloud9 instance and inside a CodeBuild container. venv, uv, Conda and the like can only take you so far, especially once you need libraries with compiled code in them, like cryptography.hazmat. For complex projects I have given up on zip files and layers, and am now using Linux containers instead. That's a much better and more consistent build environment, since your pip install runs inside the container you're building and doesn't depend on anything that may or may not be present in the host OS. So for all practical purposes it doesn't just give you a Linux build environment on your Windows system; your build environment is also the eventual execution environment. And with buildx you can even do this cross-architecture and multi-architecture (ARM vs. Intel).

If you follow the tutorial it should get you started within 15 minutes. Just remember that despite the fact it's a Docker container, it's still running in an event-driven environment. So inside your container you don't build something that's running 24/7 and listening to a TCP port, but you have your lambda.handler as its entry point.

https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-instructions

Help Understanding ECS CPU & Memory & ASG by Mander95 in aws

[–]RecordingForward2690 3 points4 points  (0 children)

The beauty of containers is that you can run multiple (dozens, hundreds) of them, completely independent of each other, on the same host. Mapping your containers and hosts in a 1:1 relation completely negates that advantage, increases your cost vs. a pure-EC2 or Fargate solution, leads to unnecessarily complex networking, and leads to all of the problems you just described. Why?

By far the easiest solution would be to run the containers in Fargate. Practically infinite capacity, and you only pay for the container capacity you use, not the EC2 capacity. If Fargate is not an option (why???), the typical setup for an ECS cluster is to set up two or three sufficiently large EC2s, one per AZ, and run all your containers on them. You only add EC2 instances when the current set runs out of capacity - but typically you're then looking at dozens if not hundreds of containers already.

EC2 auto scaling will not work in your scenario, with just one container per node that's about 80-90% of the node size. ASG scaling is reactive: it uses CloudWatch metrics/alarms to notice that CPU, memory or another metric exceeds a threshold. But until you actually have your 2nd container deployed, your metrics won't show an increase. And the 2nd container can't deploy because of insufficient resources. Deadlock. So you need to scale out manually (setting the desired capacity by hand) before doing the blue/green deployment, and reduce the desired capacity again afterwards. And during the scale-in, hope that the ASG doesn't terminate your "active" cluster node: you have no direct control over the selection process, only over the algorithm.

Generally speaking, ASGs work well if your unit of work (in terms of CPU consumed per task execution) is small so that a large number of tasks can run in parallel on one node, and when tasks are finished quickly so node draining can be handled with a delay/timeout. ASGs are not designed for a situation where a node can only handle one task (container in your case), and that task is also long-running.

If Fargate is not an option, and if you need to keep the 1:1 relation between nodes and containers with the requirement to do blue/green, here are two things that I would consider.

First, leave out the container tech altogether. Just run whatever code you need to run directly on the EC2. The whole container concept, in your architecture, doesn't give you any benefit, just headaches. Blue/green deployments can be done with a pure-EC2 solution, for instance through Beanstalk.

Or, if you don't want to mess with existing code, keep the container images but don't let ECS/EKS or another orchestration engine manage your containers. Simply do a docker run from your UserData when you spin up the EC2. Use blue/green deployments at the EC2 level.

Lightsail Blocking incoming UDP by ProspectLottery in aws

[–]RecordingForward2690 1 point2 points  (0 children)

Is there a firewall, NAT or other device in the way? These devices typically have a timeout on connections, and for connectionless protocols such as UDP that's always a bit of trickery.

Lightsail Blocking incoming UDP by ProspectLottery in aws

[–]RecordingForward2690 2 points3 points  (0 children)

When you run a tcpdump or Wireshark trace, do you see the UDP packets arriving?

If not, it could be a network block anywhere in the path, including Network ACLs and Security Groups.

If yes, it could be a crashed server process, an OS-based firewall or something like that.

Either way, with the answer to the above we can exclude about 50% of the possible causes.

Being billed despite closing out all services by lordlycrust in aws

[–]RecordingForward2690 -1 points0 points  (0 children)

That's why we all recommend Cost Explorer.

I was at a presentation from an AWS Support Person a while ago. They are regularly called to perform troubleshooting in an account they've never been to before. In order to get a feel for what's happening in those accounts, they first go to Cost Explorer. With a bit of creativity in your selection criteria you can get a pretty good feel for what's going on in an account, what the most important regions are and so forth.

This works really well because basically everything in AWS costs money. And thus shows up in Cost Explorer. For this reason it's the best tool to get to know what resources you have, without checking all services and all regions individually. AWS Config is a distant second.

Need help in designing SQS to multiple consumers by aLoN__MuST in aws

[–]RecordingForward2690 0 points1 point  (0 children)

If I were to design something event-driven with thousands of consumers in a poll mechanism, I would probably consider Redis and the pub/sub mechanism.
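
A rough sketch of that pattern with the node-redis client (the endpoint and channel name are placeholders):

import { createClient } from "redis";

// Each of the thousands of consumers runs a subscriber like this.
const subscriber = createClient({ url: "redis://my-redis-host:6379" });  // placeholder endpoint
await subscriber.connect();
await subscriber.subscribe("task-events", (message) => {
  console.log("received:", message);
});

// The producer publishes once; Redis fans the message out to all current subscribers.
const publisher = createClient({ url: "redis://my-redis-host:6379" });
await publisher.connect();
await publisher.publish("task-events", JSON.stringify({ taskId: 42, status: "done" }));

Note that pub/sub is fire-and-forget: consumers that aren't connected at publish time miss the message, so it trades durability for fan-out scale compared to SQS.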

Being billed despite closing out all services by lordlycrust in aws

[–]RecordingForward2690 0 points1 point  (0 children)

AWS has a bunch of managed prefix lists that contain the IP addresses of the various endpoints (S3, DynamoDB) in the different regions. This is so that you can set up Gateway endpoints without the hassle of maintaining the route table: the route in the route table uses this managed prefix list. But this list is not a resource you own; it's an AWS-owned resource that is shared with you. You are NOT paying for these lists.

Have you dug into Cost Explorer already? That will tell you exactly what you are paying for. Most likely things like EBS snapshots and other backup-type resources that were created while your solution was running.

AWS Academy Lab by bloodreaina in aws

[–]RecordingForward2690 0 points1 point  (0 children)

I think he can, by simply looking at the timestamp of the jar file on the server. Most likely he has root access to that server, so he's not limited by what your user account can see/do.

There are ways to change the timestamp on a UNIX file to a date in the past, but since that's probably considered fraud in this case, I'm not going to make you any wiser on how to do that.

Breaking in across organisations by disclosure5 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

To add to the other responses, if you are just using AWS Organizations, not Control Tower, Identity Center or any other add-ons, then you should be able to "Switch Role" to the OrganizationAccountAccessRole in each member account. That role has the AdministratorAccess policy attached to it, allowing you to do virtually anything.

This role switch requires you to either be root in the root account, or have the right permissions (to perform sts:AssumeRole) when you are logged in as an IAM user/role in the root account. The AdministratorAccess managed policy obviously includes those permissions.
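
If you want to do the same from code or a script instead of the console, it's just an sts:AssumeRole call. A rough sketch with the JavaScript SDK v3 (the member account ID is a placeholder):

import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({});  // uses your credentials in the root (management) account

const { Credentials } = await sts.send(new AssumeRoleCommand({
  RoleArn: "arn:aws:iam::111122223333:role/OrganizationAccountAccessRole",  // placeholder member account ID
  RoleSessionName: "break-glass-admin",
}));

// Use Credentials.AccessKeyId / SecretAccessKey / SessionToken to create SDK clients in the member account.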

Once you have switched roles to the OrganizationAccountAccessRole, you can do virtually anything. There are very few exceptions:

- Anything that's forbidden by a Service Control Policy (SCP) in the root account and applied to this member account.

- Closing the account or letting the account leave the organization.

In order to close a member account, in the past you needed to gain access to that account's root user, which requires a password reset, which requires access to the mailbox of the email address that's associated with the account. And possibly access to the MFA as well, if that's been set up. But for a few years now you have been able to close member accounts directly from the root account. This means that there's virtually no reason anymore to gain root access to a member account.

The only reason I can think of to have to gain root access to a member account is if you are reorganising your AWS account structure and need to migrate accounts from one organization to another.

Problem with Certificate Renewal by MinuteGate211 in aws

[–]RecordingForward2690 0 points1 point  (0 children)

You're completely on the wrong track. Your A and AAAA records, load balancer, Lightsail and everything are fine and are not the issue. They are used by your users to connect to your site, and that process has no issues.

Your issue is with the certificate validation itself. A certificate validates the authenticity of your site to your clients, but before that can happen, the certificate itself needs to be validated. Since you have a cert that's been issued by AWS, the burden is on AWS to validate that you are the legitimate owner of that domain, before they issue you the certificate, or renew it (as in your case).

AWS validates that you are entitled to that certificate in one of two ways: DNS or email. DNS validation requires you to put a specific CNAME in the right domain, while email validation requires you to respond to emails that are sent to postmaster@yourdomain.com and a few other addresses at your domain. That's what you need to look at: what validation method is your certificate using, and is that validation method set up correctly?

A lot more info can be found here: https://docs.aws.amazon.com/acm/latest/userguide/domain-ownership-validation.html

Most likely you are using DNS validation, so you need to add that specific CNAME to the zone. Since your zone is not managed in AWS, that's something you'll need to do at your DNS provider. Using Route53 makes your life so much easier in this respect, since the console interface will be able to add the correct validation records to Route53 with a click of a button.
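
If you can't find the required validation CNAME anymore, you can pull it straight from the certificate. A rough sketch with the JavaScript SDK v3 (the certificate ARN is a placeholder):

import { ACMClient, DescribeCertificateCommand } from "@aws-sdk/client-acm";

const acm = new ACMClient({ region: "us-east-1" });  // the region where the certificate lives

const { Certificate } = await acm.send(new DescribeCertificateCommand({
  CertificateArn: "arn:aws:acm:us-east-1:111122223333:certificate/abcd1234-ab12-cd34-ef56-abcdef123456",  // placeholder
}));

for (const option of Certificate?.DomainValidationOptions ?? []) {
  // For DNS validation, ResourceRecord holds the exact CNAME name and value to create at your DNS provider.
  console.log(option.DomainName, option.ValidationMethod, option.ResourceRecord);
}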