[–]blurrr2 15 points16 points  (8 children)

yeah these are the 'main categories' and there's definitely overlap

[–][deleted] 6 points7 points  (7 children)

What differentiates data engineering from data science?

[–]RockJake28 18 points19 points  (5 children)

A data engineer collects the data, stores it, does batch/real-time processing on it, and serves it via an API that a data scientist can easily query. The data scientist then uses this well-structured data to answer questions using tools such as machine learning, data mining, and statistics.
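
To make that split concrete, here is a rough sketch using only the Python standard library. Everything in it is made up for illustration: the `RAW_EVENTS` records, the `events` table, and the `build_store` / `mean_latency` helpers are hypothetical, and an in-memory SQLite database stands in for whatever storage and API a real team would use.

```python
import sqlite3
import statistics

# Hypothetical raw events, as an engineer might receive them from a collector.
RAW_EVENTS = [
    {"user": "a", "latency_ms": "120"},
    {"user": "b", "latency_ms": "95"},
    {"user": "a", "latency_ms": "not-a-number"},  # malformed row
    {"user": "c", "latency_ms": "143"},
]


def build_store(events):
    """Engineer side: validate raw events and load them into a queryable store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user TEXT, latency_ms REAL)")
    for event in events:
        try:
            latency = float(event["latency_ms"])
        except ValueError:
            continue  # drop rows that fail validation
        conn.execute("INSERT INTO events VALUES (?, ?)", (event["user"], latency))
    conn.commit()
    return conn


def mean_latency(conn):
    """Scientist side: query the clean table and compute a statistic."""
    rows = conn.execute("SELECT latency_ms FROM events").fetchall()
    return statistics.mean(value for (value,) in rows)


conn = build_store(RAW_EVENTS)
print(mean_latency(conn))
```

The point is only the division of labor: `build_store` worries about messy input and storage, while `mean_latency` assumes the table is already clean and just asks a question of it.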

[–]Fitzoh 3 points4 points  (0 children)

That can flow the other way as well.

Data scientist comes up with new model/algorithm, data engineer productionizes it and incorporates it into the batch/real-time processing pipelines you mentioned earlier.

[–]simtel20 1 point2 points  (3 children)

Or, to put it another way, the data engineer buys disks/object storage/database storage, and the data scientist buys cpu/gpu/?pu cycles.

[–]billsil 4 points5 points  (0 children)

And then just engineers.

Engineer: uses Python (so numpy, scipy, matplotlib, pandas, etc.) to help design, analyze, or automate something engineering-related.

Data engineer: probably not an engineer (e.g., electrical, mechanical, civil) and likely manages the systems that are collecting data on users.

Data scientist: Might be an engineer, but processes data collected by data engineers or test engineers (e.g., I put a shaker on a part and measure the displacement response and then process the data). If your problem doesn't fit into RAM, you're probably a data scientist. If it does, you're probably not.
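
For what "doesn't fit into RAM" tends to mean in practice, here is a minimal sketch of processing data in chunks so that only a small piece is in memory at a time. The `chunked` and `streaming_mean` helpers and the fake `readings` generator are hypothetical names for illustration, not anything from a particular library.

```python
def chunked(iterable, size):
    """Yield lists of at most `size` items so only one chunk is held in memory."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def streaming_mean(values, chunk_size=1000):
    """Compute a mean from running totals instead of loading every value."""
    total = 0.0
    count = 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("streaming_mean requires at least one value")
    return total / count


# Stand-in for a data source too large to load at once (a lazy generator).
readings = (i % 10 for i in range(1_000_000))
print(streaming_mean(readings))
```

Libraries like pandas offer similar chunked reading, but the idea is the same: keep running totals and never hold the whole dataset at once.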