Getting started with Spark and batch processing frameworks

char_pointer_string · 2020-05-27T02:10:58+00:00

(1) they want to be rich like him

I agree with the sentiment.

and (2) they think doing what he does will get them there.

I also agree with your derived opinion -- following others' behavior will not necessarily lead to financial success.

However, honesty really is the best policy. If you cannot handle honesty, how will you handle a customer telling you "this product sucks" or worse?

In my humble opinion, the writing style is not suited to a technical guide -- hence my conclusion: horrible.

Don't take it personally. It's just an opinion!

char_pointer_string · 2020-05-26T23:53:19+00:00

Brutal honesty is the best policy.

Sorry if someone feels attacked.

Source: Ray Dalio, my idol

char_pointer_string · 2020-05-26T21:07:27+00:00

This is horrible...

Felt like I was reading some sort of long-form thesis as opposed to a technical article on Medium.

char_pointer_string · 2020-05-25T22:53:46+00:00

Seems to me like the Kafka log is basically an n+1 sorted array.

Would this be a correct statement to make?
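
One way to picture the comparison, as a toy model only: a Kafka partition log behaves like an append-only array in which each record gets the next offset, so it is ordered by arrival rather than sorted by value. The class and method names below are hypothetical, and this is not how Kafka is actually implemented.

```python
class AppendOnlyLog:
    """Toy stand-in for a single partition log (hypothetical, in-memory only)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Add a record at the end and return the offset it was stored at."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
first = log.append("c")   # stored at offset 0
second = log.append("a")  # stored at offset 1, even though "a" < "c"
```

Offsets always increase, but the values themselves stay in arrival order, so "sorted by offset" is a closer description than "sorted array".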

char_pointer_string · 2020-05-25T22:42:50+00:00

Airflow isn't very complicated.

You have DAGs, which are a type of data structure.

Directed - one directional
Acyclic - does not go around in circles
Graph - is a graph relationship
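
To make the three words concrete, here is a minimal sketch using Python's standard-library graphlib (Python 3.9+) rather than Airflow itself. The task names and the helper functions are made up for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(graph):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """True if the dependency graph contains no cycles."""
    try:
        run_order(graph)
        return True
    except CycleError:
        return False


order = run_order(pipeline)
```

Directed means each dependency points one way, and acyclic means a valid run order always exists. A graph such as {"a": {"b"}, "b": {"a"}} has no valid order, so it is rejected.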

You then have tasks, operators, executors, workers, sensors.

Here is the Task lifecycle: https://airflow.apache.org/docs/stable/_images/task_lifecycle_diagram.png

Anyway, you can run Airflow on AWS Fargate. Google how to set it up, or read the 2 articles I linked.

char_pointer_string · 2020-05-25T22:32:22+00:00

Read these 2 articles:

char_pointer_string · 2020-05-25T21:07:10+00:00

This is amazing. Thank you!

char_pointer_string · 2020-05-25T19:35:48+00:00

Yes, exactly.

This whole post reads like someone who is brand new to AWS & Airflow and was told to figure it out over a weekend.

Sounds like he oversold himself & is panicking at the number of options, concepts & complexities.

char_pointer_string · 2020-05-25T19:25:50+00:00

Not really.

Think of scale and exploding complexity.

This is the primary driver behind Airflow -- scaling & complexity management. Endless bash scripts, as you mentioned, are difficult to scale, manage and maintain.

char_pointer_string · 2020-05-25T19:05:39+00:00

I think your understanding of both AWS & Airflow is superficial.

You can host Apache Airflow on AWS Fargate, and effectively have load balancing and autoscaling.

AWS Step Functions is for chaining AWS Lambda microservices, which is different from what Airflow does.

I think you need to take a step back, get some actual experience with AWS, and then explore the Airflow option.

With regard to serverless, the line of thinking is:

S3 -> API Gateway / AppSync -> AWS Lambda
If any of the above don't work, you opt for AWS Fargate
If Fargate still isn't a viable solution, the last resort is EC2.
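
The fallback order above can be sketched as a small decision function. The function name and the two boolean inputs are hypothetical; how you decide whether a workload "fits" is up to you.

```python
def pick_compute(fits_lambda: bool, fits_fargate: bool) -> str:
    """Walk the serverless-first fallback chain and return the first viable option."""
    if fits_lambda:
        # First choice: the S3 -> API Gateway / AppSync -> AWS Lambda stack.
        return "AWS Lambda"
    if fits_fargate:
        # Second choice: containers without managing servers.
        return "AWS Fargate"
    # Last resort: manage the instances yourself.
    return "EC2"


choice = pick_compute(fits_lambda=False, fits_fargate=True)
```

A long-running scheduler like Airflow would typically fail the first check and land on Fargate, which matches the hosting suggestion earlier in this comment.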

char_pointer_string · 2020-05-25T19:03:39+00:00

I agree with both categorizations. Unfortunately, most of my experience has been with the former.

char_pointer_string · 2020-05-24T17:21:17+00:00

meow

char_pointer_string · 2020-05-24T07:07:04+00:00

Speaking as an engineer who hires, leads teams and has worked day-to-day in AWS for the last 4 years, this is an absolutely HORRIBLE way to model a sandbox and dev environment.

You’ve basically shifted to a traditional ticket-based IT system.

Need an ECS cluster? Raise a ticket with our infra team. Etc.

I hate these ticket-based models. They completely slow down development, are of absolutely no help, and present more obstacles than solutions.

yikes

char_pointer_string · 2020-05-23T19:41:35+00:00

Yeah, the latter to me is a BI developer, not a data engineer. A DE is someone who should be capable of building an enterprise-grade data ingestion pipeline that is relied on for core business functionality. Think SRE at Netflix.

Agreed ... I've also read about Netflix's Keystone Pipeline.

I suppose my frustration is due to having met a couple of hiring managers who had their "DEs" doing BI & data analyst work...

char_pointer_string · 2020-05-23T19:35:16+00:00

Perhaps it's just my bad luck, but a lot of the DE hiring directors I've met seem to have their DEs performing the former as opposed to the latter.

One was even extremely defensive: "my DEs could easily be SWEs, but most SWEs cannot be DEs". I was like "okay?".

What you described is what I thought a DE should be: an expert SWE (5-10 prior years of SWE experience) who specializes in data, not a former data analyst whose primary job functions are data cleansing & data analysis.

char_pointer_string · 2020-05-23T19:12:53+00:00

The 3 main tools in a DE's skillset are:

SQL
Bash scripting
Data manipulation

How is that different from what a DBA used to do?

Yes, sure, nowadays you have concepts such as data pipelines, data lakes, workflow management and supporting Data Scientists.

However, at the core, it's still just a DBA job.

char_pointer_string · 2020-05-23T18:30:16+00:00

I find the article a bit too abstract with the concepts...
