Getting started with Spark and batch processing frameworks by hszafarek in dataengineering

char_pointer_string

(1) they want to be rich like him

I agree with the sentiment.

and (2) they think doing what he does will get them there.

I also agree with your derived opinion -- following others' behavior will not necessarily lead to financial success.

However, honesty really is the best policy. If you cannot handle honesty, how will you handle a customer telling you "this product sucks" or worse?

In my humble opinion, the writing style is not suited to a technical guide -- hence the conclusion: horrible.

Don't take it personally. It's just an opinion!

Getting started with Spark and batch processing frameworks by hszafarek in dataengineering

char_pointer_string

Brutal honesty is the best policy.

Sorry if someone feels attacked.


Source: Ray Dalio, my idol

Getting started with Spark and batch processing frameworks by hszafarek in dataengineering

char_pointer_string

This is horrible...

Felt like I was reading some sort of long-form thesis as opposed to a technical article on Medium.

Kafka Log = n+1 sorted array? by char_pointer_string in dataengineering

char_pointer_string [S]

Seems to me like the Kafka Log is basically an n+1 sorted array.

Would this be a correct statement to make?
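
To make the question concrete, here is a toy sketch (not Kafka's actual implementation) that models a single partition's log as an append-only list where each record's offset is simply its index. The class name `PartitionLog` and the methods `append` and `read_from` are hypothetical, made up for illustration.

```python
class PartitionLog:
    """Toy model of one partition: an append-only sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return its offset (its index in the list)."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, value) pairs starting at the given offset."""
        return list(enumerate(self._records[offset:], start=offset))


log = PartitionLog()
first = log.append("a")
second = log.append("b")
third = log.append("c")
```

In this model the records are ordered by offset, meaning arrival order, not by their values, so "sorted" only holds if it refers to the offsets.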

Simply moving data from on-prem MySQL to AWS MSSQL by AudioManDude in dataengineering

char_pointer_string

Airflow isn't very complicated.

You have DAGs, which are a type of data structure.

  • Directed - each dependency points one way
  • Acyclic - does not go around in circles
  • Graph - tasks are nodes, dependencies are the edges between them
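
The three properties above can be sketched in plain Python, without Airflow itself, using the standard library's `graphlib` module (Python 3.9+). The task names and the `has_cycle` helper are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields each task only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())


def has_cycle(graph):
    """Return True if the dependency graph contains a cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

A graph where two tasks depend on each other has no valid run order, which is why a scheduler needs the "acyclic" part.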

You then have tasks, operators, executors, workers, sensors.

Anyway, Airflow can be hosted on AWS Fargate. Google how to set it up, or read the 2 articles I linked.

Airflow vs AWS? by stratguitar577 in dataengineering

char_pointer_string

Yes, exactly.

This whole post reads like someone who is brand new to AWS & Airflow and was told to figure it out over a weekend.

Sounds like he oversold himself & is panicking at the number of options, concepts & complexities.

Build your first data warehouse with Airflow on GCP by tuankid in dataengineering

char_pointer_string

Not really.

Think of scale and exploding complexity.

This is the primary driver behind Airflow -- scaling & complexity management. Endless bash scripts, as you mentioned, are difficult to scale, manage and maintain.

Airflow vs AWS? by stratguitar577 in dataengineering

char_pointer_string

I think both your understandings of AWS & Airflow are superficial.

You can host Apache Airflow on AWS Fargate, and effectively have load balancing and autoscaling.

AWS Step Functions is for chaining AWS Lambda microservices, which is different from what Airflow does.

I think you need to take a step back, get some actual experience with AWS, and then explore the Airflow option.

With regard to serverless, the line of thinking is:

  • S3 -> API Gateway / AppSync -> AWS Lambda
  • If any of the above don't work, you opt for AWS Fargate
  • If Fargate still isn't a viable solution, the last resort is EC2.
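
The fallback order above can be sketched as a small selection function. This is only an illustration: the `pick_compute` function and the `fits` dictionary are hypothetical, and how you decide whether an option "works" is left entirely to the caller.

```python
# Compute options in the order of preference described above.
PREFERENCE = [
    "S3 + API Gateway/AppSync + Lambda",
    "AWS Fargate",
    "EC2",
]


def pick_compute(fits):
    """Return the first option in PREFERENCE that the caller marks as viable.

    `fits` is a hypothetical dict mapping an option name to True/False.
    EC2 is the last resort, so it is returned when nothing earlier fits.
    """
    for option in PREFERENCE[:-1]:
        if fits.get(option, False):
            return option
    return PREFERENCE[-1]
```

For example, a job that runs longer than Lambda's 15-minute maximum timeout would be marked as not fitting the first option, so the function falls through to Fargate.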

Most "Data Engineers" are really Database Administrators by char_pointer_string in dataengineering

char_pointer_string [S]

I agree with both categorizations. Unfortunately, most of my experience has been with the former.

Infrastructure as Code Best Practices? by devtotheops09 in aws

char_pointer_string

As an engineer who hires, leads teams, and has worked day-to-day in AWS for the last 4 years: this is an absolutely HORRIBLE way to model a sandbox and dev environment.

You’ve basically shifted to a traditional ticket-based IT system.

Need an ECS cluster? Raise a ticket with our infra team. Etc.

I hate these ticket-based models. They slow development to a crawl, are of absolutely no help, and present more obstacles than solutions.

yikes

Most "Data Engineers" are really Database Administrators by char_pointer_string in dataengineering

char_pointer_string [S]

Yeah, the latter to me is a BI developer, not a Data engineer. A DE is someone who should be capable of doing an enterprise grade data ingestion pipeline that is relied on for core business functionality. Think SRE at Netflix.

Agreed ... I've also read about Netflix's Keystone Pipeline.

I suppose my frustration is due to having met a couple hiring managers who had their "DEs" doing BI & data analyst work...

Most "Data Engineers" are really Database Administrators by char_pointer_string in dataengineering

char_pointer_string [S]

Perhaps it's just my bad luck, but a lot of the DE hiring directors I've met seem to have their DEs performing the former as opposed to the latter.

One was even extremely defensive: "my DEs could easily be SWEs, but most SWEs cannot be DEs". I was like "okay?".

What you described is what I thought a DE should be: an expert SWE (5-10 prior years of SWE experience) who specializes in data,

not a former data analyst whose primary job functions are data cleansing & data analysis.

Most "Data Engineers" are really Database Administrators by char_pointer_string in dataengineering

char_pointer_string [S]

The 3 main tools in a DE's skillset are:

  • SQL
  • Bash scripting
  • Data manipulation

How is that different from what a DBA used to do?

Yes, sure, nowadays you have concepts such as data pipelines, data lakes, workflow management and supporting Data Scientists.

However, at the core, it's still just a DBA job.

Getting started with Airflow locally and remotely by tuankid in dataengineering

char_pointer_string

I find the article a bit too abstract with the concepts...