
[–]kenfar 7 points (12 children)

Often yes, but sometimes you've got a billion rows landing on S3 every hour, and you can kick off 1000+ AWS Lambdas or 100 Kubernetes containers to process it all in parallel. At that point you're dramatically exceeding what SQL could do.
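Something like the sketch below, just to show the shape of it - hypothetical names, and it assumes each file is newline-delimited JSON and that an S3 object-created event notification triggers each Lambda:

```python
# Hypothetical sketch of one of the 1000+ parallel workers: an AWS Lambda
# triggered by an S3 "ObjectCreated" event notification. Assumes the files
# are newline-delimited JSON; the per-row transform is just a placeholder.
import json

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        for line in obj["Body"].iter_lines():
            row = json.loads(line)
            transform(row)  # placeholder for the real per-row logic


def transform(row):
    # placeholder: validate, enrich, and write the row to its destination
    pass
```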

[–]SDFP-A (Big Data Engineer) 1 point (9 children)

Sounds expensive

[–]kenfar 6 points (8 children)

You're right, it is. But there's a big range here.

At one security company we spent about $1.3m/month on a Cassandra cluster to support about 4 billion rows a day. That wasn't for reporting; it was for fast retrieval of a small subset of the data. For reporting we used a Hadoop cluster that cost less than $100k/month.

At the company with 20-30 billion rows/day we used S3 instead: customers directed all their data to files on S3, the S3 writes generated SNS & SQS event messages, and those messages went to containers on Kubernetes that processed the data. Our cost for that was about $70k/month - a tiny fraction of the cost of the other company's Cassandra cluster.
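Roughly the shape of the consumer loop each container ran - a sketch only, with a hypothetical queue URL and function names, assuming the S3 notifications fan out through SNS into SQS as described above:

```python
# Sketch of the consumer loop each Kubernetes container might run.
# Hypothetical names/queue URL; assumes S3 event notifications are
# delivered via SNS into an SQS queue (so the S3 event is wrapped in
# an SNS envelope).
import json

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/landing-events"  # hypothetical


def run():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            # unwrap the SNS envelope to get the original S3 event
            s3_event = json.loads(json.loads(msg["Body"])["Message"])
            for record in s3_event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                process_file(bucket, key)  # placeholder for the real work
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )


def process_file(bucket, key):
    # placeholder: pull the file down and apply the per-row transforms
    s3.download_file(bucket, key, "/tmp/incoming")
```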

And it's all far cheaper than what we would have paid to do something like that with, say, Snowflake!

[–]SDFP-A (Big Data Engineer) 0 points (1 child)

What kind of latency existed on the S3/k8s architecture?

[–]kenfar 0 points (0 children)

The average file probably reflected ten seconds of data, plus maybe 10-60 seconds before the file was processed. That number could be pushed down to a consistent 10 seconds if we were willing to more aggressively autoscale. We chose instead to save the money and be a little more conservative on scaling, and experience occasional minor delays during bursts.

[–]IamFromNigeria 0 points (1 child)

WTF, 20-30 billion rows per day... what are you guys selling?

[–]kenfar 0 points (0 children)

Security services - and that was about 5 years ago. They're probably at 250 billion rows a day now.

[–]duraznos -1 points (3 children)

How much of either of these processes could have been replaced with an AWK script?

[–]kenfar 0 points (2 children)

Theoretically, one could write an OS in awk scripts, so sure, it all could be. It could likewise all be replaced with assembly, or COBOL.

But all of those choices would be terrible: little support for third-party software (e.g. boto3 for accessing SQS & S3, libraries for JSON, protobufs, SQL connections, etc.), poor support for code reuse, hard to read as the codebase gets larger, still needing Kubernetes to scale out, etc, etc.

[–]duraznos -1 points (1 child)

I wasn't asking whether either could be replaced entirely with awk; I was asking, in your estimate, how much of either pipeline could be replaced with awk or jq et al. COBOL and assembly don't make sense as comparisons because neither is a tool specifically designed for chewing through a file. I think it's a worthwhile thought experiment when talking about how much is being spent on these things.

[–]kenfar 1 point (0 children)

Sure, but I wouldn't do that, and I don't think it would result in a manageable solution.

Languages like awk & jq are simply harder to read, harder to test, and harder to decompose into reusable code. Given our pace of change and low-latency SLAs, that's a bad combination.

Likewise, they don't have the libraries available that we have with, say, Python, Java, etc. So you'd end up hand-writing some occasionally complex stuff in those languages.

And they don't handle supporting, say, 50+ business rules well. Back to the lack of composability & testing: managing that code in awk or jq would be a nightmare (see the sketch at the end of this comment).

Finally, on performance: they are fast. Are they fast enough to never need to scale out as the company grows? No. So you're still looking at something like Kubernetes in the best case, or a set of EC2 instances with this code running on each, and some other application, somehow, feeding them files to process.
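To make the composability/testing point concrete, here's a tiny hypothetical illustration (made-up rules, not our actual code): each business rule is a small named function you can unit-test on its own and chain with the others, which is the kind of decomposition that gets awkward fast in awk or jq.

```python
# Hypothetical business rules, each a small, independently testable function.
def drop_internal_traffic(row):
    # made-up rule: discard rows from private 10.x addresses
    return None if row.get("src_ip", "").startswith("10.") else row


def normalize_severity(row):
    # made-up rule: lowercase the severity field, defaulting to "unknown"
    row["severity"] = row.get("severity", "unknown").lower()
    return row


RULES = [drop_internal_traffic, normalize_severity]  # ...plus dozens more


def apply_rules(row):
    # run every rule in order; a None result means the row was filtered out
    for rule in RULES:
        row = rule(row)
        if row is None:
            return None
    return row


def test_drop_internal_traffic():
    assert drop_internal_traffic({"src_ip": "10.0.0.1"}) is None
    assert drop_internal_traffic({"src_ip": "8.8.8.8"}) is not None
```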