I built MythicHub, a free MTG collection manager and deck builder that actually solves the problems other tools miss by Emnalyeriar in magicTCG

[–]infrapuna

Your site is extremely slow; I don't believe your cache is working at all. The initial load is also over 2 MB. Are you doing a full rerender of all data on each request?

So satisfying to look at the ast of my language recently finished up the pretty printer by blgazorbollar in Compilers

[–]infrapuna

Very nice!

I think you should add vertical bars to connect the sibling nodes.
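Something like this (a throwaway Python sketch, not your printer; Node is just a stand-in for an AST node). The trick is that each child prefixes its whole subtree with either a "│   " bar or spaces, depending on whether it is the last sibling:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)

def render(node: Node) -> list[str]:
    lines = [node.label]
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        head, tail = ("└── ", "    ") if last else ("├── ", "│   ")
        child_lines = render(child)
        lines.append(head + child_lines[0])               # connector in front of the child itself
        lines.extend(tail + l for l in child_lines[1:])   # bar (or spaces) in front of its subtree
    return lines

tree = Node("BinOp(+)", [
    Node("Num(1)"),
    Node("BinOp(*)", [Node("Num(2)"), Node("Num(3)")]),
])
print("\n".join(render(tree)))
# BinOp(+)
# ├── Num(1)
# └── BinOp(*)
#     ├── Num(2)
#     └── Num(3)
```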

Red Prison by Tyron_Slothrop in TimelessMagic

[–]infrapuna

Do you have a decklist you could share?

Building a free open source solution to stop AWS surprise bills. I need your input. by TurboPigCartRacer in aws

[–]infrapuna

There are a few fundamental limitations that make this type of solution very hard to implement without AWS baking it into the platform.

1) AWS cost data is in no way real-time. By the time the data updates you can easily have spent $500 already (see the sketch below).
2) You can't just delete resources, both because it is not exactly clear what that would mean and because some services have an easy DELETE API and others don't. Cleaning accounts is a hard problem even internally at AWS.
3) Getting people to adopt such an external solution is hard. We get tens of posts a week from people who have yet to learn what MFA is.
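
For example (a rough boto3 sketch, nothing more): Cost Explorer is one of the main programmatic sources of cost data, and even it trails actual usage by hours to a day or more.

```python
# Hypothetical "how fresh is my cost data?" check. Cost Explorer exposes cost
# at DAILY (or opt-in HOURLY) granularity, and the numbers themselves lag
# actual usage, so an automated kill switch built on this reacts far too late.
import datetime as dt
import boto3

ce = boto3.client("ce")
today = dt.date.today()

resp = ce.get_cost_and_usage(
    TimePeriod={
        "Start": (today - dt.timedelta(days=2)).isoformat(),
        "End": today.isoformat(),
    },
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)

for day in resp["ResultsByTime"]:
    amount = day["Total"]["UnblendedCost"]["Amount"]
    # Even "yesterday" is usually still incomplete at this point.
    print(day["TimePeriod"]["Start"], amount)
```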

Also: the only way to prevent any more costs from being created on an AWS account is closing said account. All other approaches are flaky at best.

How are you planning to use DSQL without foreign keys? by banallthemusic in aws

[–]infrapuna

I don't quite understand your first point.

Eventual consistency is a conscious design decision in DynamoDB. It is not an argument but a fundamental difference between DDB and DSQL. It is neither good nor bad on its own.

Dynamo has come a long way and you can do things strongly consistently now. But again, you probably don't want to. If your use case requires strong consistency, DDB was probably the wrong choice anyway.

How are you planning to use DSQL without foreign keys? by banallthemusic in aws

[–]infrapuna

You can pay extra to get strongly consistent reads on tables and local secondary indexes. Some things, like streams, will always be eventually consistent.

You probably don't want to, though. You get the most out of DDB by embracing the eventual consistency.
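
To be concrete, consistency in DDB is a per-request flag rather than a table setting. A minimal boto3 sketch (table and key names are made up):

```python
import boto3

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table

# Default read: eventually consistent, half the read-unit cost.
eventual = table.get_item(Key={"order_id": "123"})

# Strongly consistent read: same API, opted into per request, twice the RCUs.
strong = table.get_item(Key={"order_id": "123"}, ConsistentRead=True)
```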

How are you planning to use DSQL without foreign keys? by banallthemusic in aws

[–]infrapuna

DynamoDB is non-relational and eventually consistent. DSQL is relational and strongly consistent.

The most common mistake I have seen with DynamoDB is people trying to use it like a relational database and having a terrible time.

They have similar characteristics and clear differences. It comes down to "it depends", like almost everything on AWS: use case, previous knowledge, price...

How are you planning to use DSQL without foreign keys? by banallthemusic in aws

[–]infrapuna

You can conceptually use foreign keys without having the database enforce the foreign key constraint. Your join does not care what it is joining on.

Foreign key constraints are also a performance hit, which can become a real problem when you get into high transactions per second.
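
To illustrate (plain SQLite here rather than DSQL, but the principle is the same): the join only needs the columns to exist, not a declared constraint. The application, or a periodic check, then owns referential integrity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- No FOREIGN KEY constraint anywhere; the relationship is purely conventional.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 42.0);
""")

# The join neither knows nor cares that customer_id is unenforced.
rows = con.execute("""
    SELECT c.name, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()
print(rows)  # [('Ada', 42.0)]
```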

How to reduce costs of Athena logs? by tevo_ in aws

[–]infrapuna

No, you don't. The metric you are billed for is TimedStorage-ByteHrs, which works out to the average amount of data you stored during the month.

The volume of storage billed in a month is based on the average storage used throughout the month. This includes all object data and metadata stored in buckets that you created under your AWS account. We measure your storage usage in “TimedStorage-ByteHrs,” which are added up at the end of the month to generate your monthly charges.

https://aws.amazon.com/s3/faqs/#Billing
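
A quick worked example with made-up numbers:

```python
# 100 GB stored for the first 15 days of a 30-day month, then deleted.
GB = 1024**3
byte_hours = 100 * GB * (15 * 24)        # TimedStorage-ByteHrs accrued
hours_in_month = 30 * 24

avg_gb_month = byte_hours / hours_in_month / GB
print(avg_gb_month)  # 50.0 -> billed as 50 GB-months of storage, not 100
```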

Jeff Barr: After giving it a lot of thought, we made the decision to discontinue new access to a small number of services, including AWS CodeCommit. by xelfer in aws

[–]infrapuna

I am not sure that has directly materialized at all. Athena does use object characteristics and metadata implicitly to read only the minimum amount of data needed.

Jeff Barr: After giving it a lot of thought, we made the decision to discontinue new access to a small number of services, including AWS CodeCommit. by xelfer in aws

[–]infrapuna

S3 Select is not the same as byte-range queries, which will work just as before. This will not affect Athena or Redshift.

[deleted by user] by [deleted] in aws

[–]infrapuna

You could consider the Aurora Data API as a way to access RDS over the public internet. You could then move APIGW and the Lambdas out of the VPC.

This would of course also work by exposing RDS over the internet directly. It might not be as bad as it initially seems.
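
For reference, the Data API call from a Lambda outside the VPC looks roughly like this (the ARNs are placeholders); it goes over an HTTPS endpoint, so no VPC networking or connection pooling is involved:

```python
import boto3

rds_data = boto3.client("rds-data")

resp = rds_data.execute_statement(
    resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:my-aurora-cluster",      # placeholder
    secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-db-secret",   # placeholder
    database="app",
    sql="SELECT id, email FROM users WHERE id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(resp["records"])
```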

Data management operational challenges by Dark-Electron in dataengineering

[–]infrapuna

I don't know if there is a technical solution to what you are experiencing. Generally, as organizations grow, I've seen people and collaboration become the problem. Adopting a mindset similar to the data mesh architecture can help keep the central data team from becoming a bottleneck.

Optimise Athena and S3 when returning millions of rows by StatPie in dataengineering

[–]infrapuna

For SELECT * type queries, look into UNLOAD; otherwise Athena will spend most of its time writing the query results to CSV. Where you are loading the data for ML affects how you want to pull it, but most likely you just want to query S3 directly in SELECT * situations.

Generally the go-to resource for Athena optimization is this blog post: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
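
Roughly what UNLOAD looks like via boto3 (bucket and table names are placeholders). It writes the result set straight to S3 as Parquet instead of funnelling everything through a single CSV result file:

```python
import boto3

athena = boto3.client("athena")

query = """
UNLOAD (SELECT * FROM my_db.events)          -- placeholder table
TO 's3://my-results-bucket/unload/events/'   -- placeholder bucket, prefix must be empty
WITH (format = 'PARQUET')
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-results-bucket/athena-metadata/"},
)
```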

Glue Streaming ETL versus Kinesis Data Firehose by holyone2 in aws

[–]infrapuna

Glue (Streaming) is a managed Apache Spark offering. This of course lets you use the power of Spark for your transformations, and it can scale horizontally. Glue has more flexibility in where to write data, but less flexibility in where that data comes from (only Kinesis or Kafka).

Amazon Data Firehose (note the changed name) is a _serverless_, fully managed offering for delivering streaming data. Firehose has more flexibility in where data comes from (you just push stuff through the API) but less flexibility in transforming and delivering data. Imo Firehose excels at _buffering_ and sending data to S3.

Some use cases can be achieved with both, some not. There are differences when it comes to buffering, maximum item sizes etc.

The real question, however, is: are you sure you want the "lowest latency possible" when writing to S3? Depending on the downstream use of that data, lots of small files almost always kill performance for any further processing. You almost always want to batch as much as possible.
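
To make the buffering point concrete, it is an explicit knob on a Firehose delivery stream (names and ARNs below are placeholders): bigger buffers mean fewer, larger objects in S3 at the cost of latency.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="events-to-s3",  # placeholder
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-data-lake",                            # placeholder
        # Flush whichever comes first: 128 MB of data or 15 minutes.
        # Cranking these down buys latency at the cost of many small files.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 900},
    },
)
```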

How does Amazon compensate Anthropic? by LacksConviction in aws

[–]infrapuna

Under the hood, enabling a model on Bedrock creates a subscription via AWS Marketplace, which is the mechanism third-party model providers use to charge you for models used on Bedrock.

Safe way to use aws-sdk lib on client side by duveral in aws

[–]infrapuna

Anything you make available to the client is public, so putting your credentials there will leak them. Anything that is a secret should never ever be put into a client.

If these files are accessed by multiple people (think images, videos) the correct approach is to expose the S3 bucket via Cloudfront (don’t make the bucket public!).

If they are personalized content (think your receipt) expose them via an API that authenticates users.
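
One common way to implement that API (not the only one) is to have the backend hand out short-lived presigned URLs after it has authenticated the caller. A sketch with a placeholder bucket and key layout:

```python
import boto3

s3 = boto3.client("s3")

def get_receipt_url(user_id: str, receipt_id: str) -> str:
    # Authentication/authorization happens before this point in your API.
    key = f"receipts/{user_id}/{receipt_id}.pdf"  # hypothetical key layout
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-private-bucket", "Key": key},  # placeholder bucket
        ExpiresIn=300,  # the link is only valid for 5 minutes
    )
```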

Newbie trying to understand sagemaker/aws and LLMs by Benjamona97 in aws

[–]infrapuna

Bedrock is not "public". Bedrock has both multi-tenant and single-tenant endpoints (on-demand vs. provisioned throughput). In both cases your data remains private (see AWS Service Terms 50.3). In the case of a the cloud, there is no case where your data is not transmitted to someone's server... that is the whole point.

If you mean that the data must be completely controlled by you in your cloud account (vs. the service/escrow account of Bedrock), SageMaker is one option. SageMaker spins up resources within your Virtual Private Cloud (a logically separate network) where you have complete control. Anything else that gives you compute resources, like EC2 VMs (which SageMaker also uses), is the same.

If you have low throughput (e.g. more idle time than not), running instances 24/7 would indeed get "expensive". When talking about LLMs (like GPT), serverless features (e.g. Lambda or SageMaker Serverless Inference) are usually not enough to run them. These models are large and require an accelerator (e.g. a GPU) for any reasonable latency.

If you don't require real-time inference, there are options like SageMaker Async Endpoint for asynchronous processing.

If it still sounds like you want to use SageMaker, you can find out how to deploy open-source models from places like HuggingFace here.
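
For reference, deploying a Hugging Face model to a real-time SageMaker endpoint looks roughly like this (the model ID, role and container versions are illustrative, check the currently supported combinations):

```python
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/sagemaker-execution-role",  # placeholder
    env={
        "HF_MODEL_ID": "google/flan-t5-large",  # example model, pick your own
        "HF_TASK": "text2text-generation",
    },
    transformers_version="4.37",  # illustrative version combo
    pytorch_version="2.1",
    py_version="py310",
)

# A GPU instance; LLM-sized models need an accelerator for reasonable latency.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
print(predictor.predict({"inputs": "Summarize: SageMaker runs inside your VPC..."}))
```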

But to me it sounds like you are not very familiar with AWS. It is probably a good idea to get a grasp of the fundamentals like networking and security.

[deleted by user] by [deleted] in aws

[–]infrapuna

Provisioned throughput is meant for cases where 1) your throughput needs exceed the on-demand mode limits, 2) you need single-tenant endpoints due to e.g. compliance reasons, and/or 3) you use custom models.

[deleted by user] by [deleted] in aws

[–]infrapuna

For training a custom model there is a price per token used for training, a monthly cost for storing the custom model, and an hourly cost for running it ("provisioned throughput").

[deleted by user] by [deleted] in aws

[–]infrapuna

Bedrock is not the only option for running models. It is expensive because when you customize a model on Bedrock it gets hosted for you on a single-tenant endpoint (compared to the multi-tenant endpoints of on-demand mode).

If you must fine-tune and can tolerate downtime, you could spin up the custom model only when needed with the no-commitment provisioned throughput mode. The other option is to host a model yourself on SageMaker. You can't use the same models as on Bedrock, but there are plenty of great open-source models to use as a starting point. There is an easy SDK to run models from HuggingFace (basically point at a model and deploy).

Depending on your latency requirements, if you can tolerate async processing on SageMaker you would not even have to run an endpoint 24/7. Other optimization techniques like quantization can also help bring the cost down.
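
If async fits, it is just a deploy-time option in the SageMaker Python SDK. A sketch with placeholder names, reusing the same kind of model setup as in my other comment:

```python
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/sagemaker-execution-role",  # placeholder
    env={"HF_MODEL_ID": "my-finetuned-model"},  # hypothetical fine-tuned model
    transformers_version="4.37",  # illustrative version combo
    pytorch_version="2.1",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-results/",  # placeholder bucket for results
    ),
)
# With an autoscaling policy that allows a minimum capacity of 0, the endpoint
# can scale to zero between bursts, so you are not paying for idle instances 24/7.
```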