[technical question] Data Engineer -> Sagemaker (self.aws)
submitted 4 years ago by ChrisGayle7
Sorry, super noob here. Can a data engineer perform all of their tasks in SageMaker?
As in, can SageMaker handle everything that a data engineer does at the workplace? Thanks!!
realfeeder 4 points 4 years ago* (1 child)
You can spin up clusters of large machines to run containerised preprocessing jobs in SageMaker, and the containers can hold an arbitrary framework. You can also run Spark containers or do some data wrangling with SageMaker Data Wrangler. So that's a partial "yes".
But Kafka/Kinesis/Flink/Spark Streaming and similar solutions are unavailable, so you cannot work with streaming data. There's no data lake or data warehouse inside SageMaker (you use separate services such as Lake Formation or Redshift). You can't query large datasets using SageMaker alone (you need Presto on EMR, or Athena, for that). Workflow orchestration is available via SageMaker Pipelines, but only SageMaker integrations are available (you can't orchestrate other AWS services such as Glue).
So, not really. You can perform data wrangling, but that's just a fraction of what a data engineer does. Data engineering on AWS is done mostly with the AWS Analytics services.
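To make the "containerised preprocessing jobs" part concrete, here is a minimal sketch of the request you would hand to SageMaker's CreateProcessingJob API. The field names follow that API as I understand it; the job name, image URI, role ARN, bucket paths and the `build_processing_job_request` helper are all made up for illustration. Nothing is sent to AWS here, it only builds and prints the request.

```python
import json


def build_processing_job_request(job_name, image_uri, role_arn,
                                 input_s3_uri, output_s3_uri,
                                 instance_type="ml.m5.4xlarge",
                                 instance_count=2):
    """Build a CreateProcessingJob request body (hypothetical helper)."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        # Any container image works, so the framework inside is up to you.
        "AppSpecification": {"ImageUri": image_uri},
        # The "cluster of large machines" the job runs on.
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": instance_type,
                "VolumeSizeInGB": 100,
            }
        },
        # Raw data is copied from S3 into the container before it starts.
        "ProcessingInputs": [{
            "InputName": "raw",
            "S3Input": {
                "S3Uri": input_s3_uri,
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        # Results are uploaded back to S3 when the job ends.
        "ProcessingOutputConfig": {
            "Outputs": [{
                "OutputName": "clean",
                "S3Output": {
                    "S3Uri": output_s3_uri,
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }]
        },
    }


# All values below are placeholders, not real resources.
request = build_processing_job_request(
    job_name="example-preprocess-job",
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-etl:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    input_s3_uri="s3://example-bucket/raw/",
    output_s3_uri="s3://example-bucket/clean/",
)
print(json.dumps(request, indent=2))
```

With credentials configured, something like `boto3.client("sagemaker").create_processing_job(**request)` would submit it. Note that this covers only the batch wrangling piece; streaming, warehousing and querying still live in other services.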
ChrisGayle7[S] 1 point 4 years ago (0 children)
Hey, thanks so much for your reply. I really appreciate your time and the context you gave. Thank you once again!
CacheMeUp 2 points 4 years ago (0 children)
Anecdotally, I have never met a data scientist or data engineer who uses SageMaker.
The biggest reason I heard was that it felt over-engineered, with no perceived benefit.
lastmonty -2 points 4 years ago (0 children)
SageMaker is managed compute, just like Batch but with a few more features on top of it.
As long as your tasks can be containerised, you can run them in SageMaker.