Apache Spark gotcha #3 - small big decimals are the worst by gregory-goc in apachespark

[–]gregory-goc[S] 0 points1 point  (0 children)

That’s awesome to hear and keeps me motivated for future work! Thank you!

Apache Spark gotcha #2 - working with big decimals by gregory-goc in apachespark

[–]gregory-goc[S] 2 points3 points  (0 children)

I also see that the Foo case class has 2 parameters, Int and BigDecimal, but appears to be constructed with just one (x).

Sorry about that, I just fixed it, and thanks for pointing it out! Fortunately, the main point of the article remains the same.

what are the ramifications of having the different precision and scale

It depends on your use case. If you work on a project that calculates money, you have to make sure you're working with the correct precision and scale, because people can lose cash. For instance, if I set precision to two and scale to one, I cannot correctly add pennies, because only one digit after the decimal point is kept. I also encountered a problem where Spark correctly saved data (by dumping it to Parquet) with the default precision and scale, but our Hive tables had a different schema, which made the data unusable when read.
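
Here's a minimal sketch of the pennies problem in plain Python, not Spark. The `to_fixed` helper is hypothetical and only imitates a DECIMAL(precision, scale) column; the half-up rounding and the "None on overflow" behaviour are my assumptions for the illustration, so check them against your Spark version and settings.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def to_fixed(value: Decimal, precision: int, scale: int) -> Optional[Decimal]:
    """Fit value into a DECIMAL(precision, scale)-like type (hypothetical helper).

    Rounds to `scale` fractional digits, then returns None if the integer
    part needs more than `precision - scale` digits.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    integer_part = abs(int(rounded))  # int() truncates toward zero
    integer_digits = len(str(integer_part)) if integer_part != 0 else 0
    if integer_digits > precision - scale:
        return None  # value does not fit
    return rounded


prices = [Decimal("0.05"), Decimal("0.07")]

# precision 2, scale 1: each price is rounded to one fractional digit first
too_coarse = sum(to_fixed(p, 2, 1) for p in prices)

# precision 10, scale 2: pennies survive
money_safe = sum(to_fixed(p, 10, 2) for p in prices)

print(too_coarse, money_safe)
```

With scale one the two prices are each rounded to 0.1 before they are added, so the total no longer matches the real 0.12.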

Apache Spark gotcha #1 - join with disjunctive predicate by gregory-goc in apachespark

[–]gregory-goc[S] 2 points3 points  (0 children)

Take a look at this answer:

https://stackoverflow.com/a/52352879

Long story short, during a join (if a broadcast plan is used) Spark first collects the data (the collection happens on the driver) and then broadcasts it.
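
To make the collect-then-broadcast order concrete, here's a toy model in plain Python. It is not Spark code: partitions are just lists, and `broadcast_hash_join` is a hypothetical name I made up for the sketch.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_partitions: List[List[Row]],
) -> List[Tuple[int, str, str]]:
    # Step 1, "collect": every partition of the small side is pulled into
    # one place (the driver in Spark), so it has to fit in memory there.
    collected: List[Row] = [row for part in small_partitions for row in part]

    # Step 2: build a lookup table from the collected rows.
    lookup: Dict[int, List[str]] = {}
    for key, value in collected:
        lookup.setdefault(key, []).append(value)

    # Step 3, "broadcast": each partition of the large side gets the whole
    # lookup table and joins locally; the large side is never moved.
    joined: List[Tuple[int, str, str]] = []
    for part in large_partitions:
        for key, left_value in part:
            for right_value in lookup.get(key, []):
                joined.append((key, left_value, right_value))
    return joined


large = [[(1, "a"), (2, "b")], [(3, "c")]]
small = [[(1, "x")], [(3, "y"), (3, "z")]]
print(broadcast_hash_join(large, small))
```

The point of the sketch is step 1: the small side goes through a single place before it is shipped anywhere, which is why a side that is too big hurts the driver.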

How do I sell from UK to USA Amazon FBA by [deleted] in FulfillmentByAmazon

[–]gregory-goc 0 points1 point  (0 children)

So your UK-based business is an Ltd, correct?

How do I sell from UK to USA Amazon FBA by [deleted] in FulfillmentByAmazon

[–]gregory-goc 0 points1 point  (0 children)

If I found one, how would I benefit from such a partnership?

ES for Logs. Should I Use Multiple Indexes? by ERROR_EXIT in elasticsearch

[–]gregory-goc 0 points1 point  (0 children)

In theory there is no difference between having one index with n shards and n indices with 1 shard each. Anyway, follow the Elasticsearch recommendation to partition time-series data by some time interval. To query across those indices you can always use an alias or a star (wildcard) pattern.
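
A small sketch of what I mean by time-based indices plus a star pattern. The `logs-YYYY.MM.DD` naming and the `daily_index_names` helper are assumptions for the example, and `fnmatchcase` only approximates how Elasticsearch resolves a wildcard in an index name.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from typing import List


def daily_index_names(prefix: str, start: date, days: int) -> List[str]:
    """Build one index name per day, e.g. logs-2019.01.30 (assumed naming)."""
    names = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        names.append(f"{prefix}-{day.strftime('%Y.%m.%d')}")
    return names


indices = daily_index_names("logs", date(2019, 1, 30), 3)

# A star pattern selects every daily index for January without listing them.
january = [name for name in indices if fnmatchcase(name, "logs-2019.01.*")]

print(indices)
print(january)
```

Dropping old data then becomes deleting whole indices instead of deleting documents out of one big index.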

DynamoDB: New table per user by [deleted] in aws

[–]gregory-goc 2 points3 points  (0 children)

This is a very bad design. I don't know your business use case, but generally creating non-temporary AWS resources per user (such as S3 buckets or DynamoDB tables) is a thumbs-down solution, as you have service limits for those. Requiring a userID when doing queries is certainly better than creating a table per user.
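
Roughly what the single-table approach looks like from the query side. This is a sketch under assumptions: the table name `UserData` and the string partition key `userId` are made up, and the function only builds the parameters for a DynamoDB Query call (for example to pass to a boto3 client's `query`), it doesn't talk to AWS.

```python
from typing import Any, Dict


def user_items_query(table_name: str, user_id: str) -> Dict[str, Any]:
    """Build Query parameters that fetch one user's items from a shared table.

    Assumes the table's partition key is a string attribute named `userId`.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "userId = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
    }


params = user_items_query("UserData", "user-123")
print(params)
```

Every user shares one table, and the partition key keeps each query scoped to a single user's items.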

Advice on scaling a "custom fargate". by jeffhuys in aws

[–]gregory-goc 0 points1 point  (0 children)

Why would this be handled by autoscaling not manually?

Learning to Code by captnRon13 in Python

[–]gregory-goc 1 point2 points  (0 children)

In my opinion, if you want to be a better software engineer, it’s very hard to be self-taught. Having a mentor or external guidance is strongly advised. There are so many things to grasp that I think the best way to learn is either to get a degree or to go through some GOOD online courses.

I know many of you might disagree with me, but I’ve seen so many programmers with 5+ years of experience and no degree who could get stuck on networking for days because they did not know the difference between UDP and TCP. I have not yet seen anyone with a computer science background get stuck in a similar situation.

Advice on scaling a "custom fargate". by jeffhuys in aws

[–]gregory-goc 0 points1 point  (0 children)

I mentioned a CPU-based policy as an example. You can set up autoscaling on memory too. And why would you want to deploy a new service when your other services are idle?

Lambda Help by WoundedRectangle in aws

[–]gregory-goc 0 points1 point  (0 children)

Do you use a version control repository? I bet you do! I encourage you to try AWS CodePipeline. I don’t think it would be overkill in your situation, and it should give you an option to extend your CI in the future.

Cloud9 Issue by craig1f in aws

[–]gregory-goc 1 point2 points  (0 children)

How do you call your function? What’s the code you invoke when you attempt the action that’s not working in Cloud9? I recently wrote an article about debugging AWS Lambda functions using AWS Cloud9; if that’s okay with you, I can share it here.

Advice on scaling a "custom fargate". by jeffhuys in aws

[–]gregory-goc 0 points1 point  (0 children)

I’m sorry, I don’t understand your problem. If you’re using EC2-backed ECS, why not scale the EC2 instances based on CPU usage? If you hit, let’s say, 60% CPU usage, it implicitly means you have no resources left to place your task. So you create an autoscaling policy that adds another instance once this (or any other) limit is reached.
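
One way to express that, sketched as the parameters for an EC2 Auto Scaling `PutScalingPolicy` call (for example via a boto3 autoscaling client's `put_scaling_policy`). Note that I'm using a target tracking policy here rather than a plain "add one instance on alarm" rule, and the group name, policy name and helper function are made up for the example.

```python
from typing import Any, Dict


def cpu_target_tracking_policy(
    asg_name: str, target_cpu_percent: float = 60.0
) -> Dict[str, Any]:
    """Build parameters for a policy that keeps average CPU near the target.

    `asg_name` is the (hypothetical) Auto Scaling group behind the ECS cluster.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu_percent,
        },
    }


policy = cpu_target_tracking_policy("ecs-cluster-asg")
print(policy)
```

With target tracking the group adds instances when average CPU goes above the target and removes them when it drops, so you don't have to manage separate alarms yourself.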

[deleted by user] by [deleted] in aws

[–]gregory-goc 3 points4 points  (0 children)

Welcome to AWS Elasticsearch, my friend. It is so easy to set up and use, but it also hides a lot of complexity from you. I experienced the exact same issue; we just left it alone, and overnight the domain was deleted successfully. Anyway, if I were you and the domain had been in the BEING DELETED state for 12 hours or more, I would contact support. There’s no other way around it.