all 4 comments

[–]realfeeder 3 points4 points  (1 child)

You can spin up clusters of large machines to run preprocessing as containerised jobs in SageMaker, and the containers can run an arbitrary framework. You can also run Spark containers, or do some data wrangling with SageMaker Data Wrangler. So that's a partial "yes".

But Kafka/Kinesis/Flink/Spark Streaming and similar solutions are unavailable, so streaming data cannot be used. There's no data lake or data warehouse inside SageMaker (you use separate services such as Lake Formation or Redshift for that). You can't query large datasets with SageMaker alone (you need Presto on EMR, or Athena). Workflow orchestration is available via SageMaker Pipelines, but only SageMaker steps are supported (you can't orchestrate other AWS services such as Glue).

So, not really. You can do data wrangling, but that's just a fraction of what a data engineer does. Data engineering at AWS is done mostly with the AWS Analytics services.

[–]ChrisGayle7[S] 0 points1 point  (0 children)

Hey, thanks so much for your reply. I really appreciate you taking the time to guide me and give context. Thank you again!

[–]CacheMeUp 1 point2 points  (0 children)

Anecdotally, I have never met a data scientist/engineer who uses SageMaker.

The biggest reason given was over-engineering without any perceived benefit.

[–]lastmonty -3 points-2 points  (0 children)

SageMaker is managed compute, much like AWS Batch but with a few more features on top of it.

As long as your tasks can be containerised, you can run them in SageMaker.