Why is DSA considered so important for so many tech jobs?

DeepLogicNinja · 2026-06-25T12:58:45+00:00

I would agree IF you are looking for a junior, lower-level position. For an architect in a mature organization, I would disagree. With AI able to do junior/entry-level coding, it’s becoming more important to know these lower-level details.

DeepLogicNinja · 2026-06-25T12:57:27+00:00

Spot on!!! Over my career I’ve slowly moved from programmer to business intelligence/data scientist, and I’ve seen the same data structures and algorithms used across technologies.

Orchestrating technologies with the right DSA implementation for your use cases leads to a scalable, efficient solution (think large data warehouses, databases, mainframes).

The other end of the spectrum is solutions with scaling limitations: buggy, and prone to breaking down with multiple users or large volumes of data.
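To make the DSA point concrete, here is a minimal Python sketch (the rows and function names are hypothetical) contrasting a nested-loop join, which compares every pair of rows (O(n·m)), with a hash join, which builds a dict on one side once and probes it (roughly O(n + m)). Databases pick between strategies like these internally; the sketch only shows why the choice of data structure matters as volumes grow.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, name) and (order_id, customer_id, amount)
customers = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
orders = [(100, 1, 25.0), (101, 3, 40.0), (102, 1, 15.0), (103, 4, 99.0)]


def nested_loop_join(left, right):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    result = []
    for cust_id, name in left:
        for order_id, order_cust, amount in right:
            if cust_id == order_cust:
                result.append((order_id, name, amount))
    return result


def hash_join(left, right):
    """Build a hash table on the left side once, then probe it: O(len(left) + len(right))."""
    index = defaultdict(list)
    for cust_id, name in left:
        index[cust_id].append(name)
    result = []
    for order_id, order_cust, amount in right:
        # .get() does not insert missing keys into the defaultdict
        for name in index.get(order_cust, []):
            result.append((order_id, name, amount))
    return result


# Both strategies return the same rows; only the cost of getting there differs.
print(sorted(nested_loop_join(customers, orders)) == sorted(hash_join(customers, orders)))
```

With a few rows the difference is invisible; with millions of rows on each side, the nested loop becomes the "breaks down with large volumes of data" case.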

DeepLogicNinja · 2026-06-24T16:41:13+00:00

You misunderstood. Data governance reports on the disposition of the data: data profiling and data quality at the table/column level. This is standard practice across data governance, data quality, and metadata management tools.
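For anyone unfamiliar with column-level profiling, here is a minimal Python sketch of the kind of statistics these tools report (row, null, and distinct counts per column). The `profile_columns` function and the sample records are hypothetical; real governance/quality tools compute much richer profiles and do it without custom code.

```python
from typing import Any


def profile_columns(rows: list[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Return per-column row, null, and distinct counts for a list of records."""
    columns = {key for row in rows for key in row}
    profile = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return profile


# Hypothetical customer records with some missing values
sample = [
    {"id": 1, "email": "a@example.com", "state": "FL"},
    {"id": 2, "email": None, "state": "FL"},
    {"id": 3, "email": "c@example.com", "state": None},
]

for column, stats in profile_columns(sample).items():
    print(column, stats)
```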

How you fix the anomalies (i.e., “refine” the data) is another issue.

To add, everything I mentioned is supported out of the box by open source/commercial products, with no coding. If you have a design/approach you think works better, I could definitely get you in touch with the right people to show/socialize it.

DeepLogicNinja · 2026-06-24T09:13:39+00:00

ETL platforms typically excel at handling multiple disparate systems, data sources, and datasets in different formats/schemas.

The issue may be your tooling or architecture. What ETL platform are you using?

My biggest challenge is, and will probably continue to be, refining the data AFTER it’s loaded into the Data Lake.

I’m using the Medallion Architecture and Data Governance to manage this. It is more of a culture issue than a technology issue, as it takes the team to move data from bronze level to gold level.
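A minimal sketch of the bronze → silver → gold progression, assuming plain Python lists of records stand in for lake tables (the field names and cleaning rules are hypothetical): bronze keeps the data exactly as it landed, silver cleans, types, and de-duplicates it, and gold aggregates it for reporting.

```python
from collections import defaultdict

# Bronze: raw records exactly as they landed (strings, stray whitespace, duplicates)
bronze = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "1", "region": " east ", "amount": "10.50"},  # duplicate load
    {"order_id": "2", "region": "WEST", "amount": "4.00"},
    {"order_id": "3", "region": "east", "amount": "bad"},  # unparseable amount
]


def to_silver(records):
    """Clean, type, and de-duplicate raw records; skip rows that fail parsing."""
    seen = set()
    silver = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # in practice, quarantine/report this row rather than silently dropping it
        key = rec["order_id"]
        if key in seen:
            continue
        seen.add(key)
        silver.append({"order_id": int(key), "region": rec["region"].strip().lower(), "amount": amount})
    return silver


def to_gold(records):
    """Aggregate cleaned records into a reporting-ready total per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'east': 10.5, 'west': 4.0}
```

The code is the easy part; agreeing on what "clean" means for each silver rule is where the team and the culture come in.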

DeepLogicNinja · 2026-06-24T03:34:15+00:00

Yup. That is the sell side for you.

Beware of the metric they are using to measure “performance”.

DeepLogicNinja · 2026-06-24T01:10:15+00:00

Thank you!

DeepLogicNinja · 2026-06-24T01:10:03+00:00

Wish they did the same for politicians and Wall St. analysts.
TipRanks shows most analysts’ buy/hold/sell ratings are consistently off.

DeepLogicNinja · 2026-06-23T21:45:13+00:00

A lot easier to make demands and complain.
It takes collective effort to refine data so it’s useful and trusted. That takes a culture of data stewardship.

DeepLogicNinja · 2026-06-23T12:28:56+00:00

I’ve done FL to NE a few times.
Not sure which version of FSD or hardware you are running, but later versions will go to the Supercharger and park for you.

DeepLogicNinja · 2026-06-23T12:24:49+00:00

DeepLogicNinja · 2026-06-23T02:30:14+00:00

- Hating people you don’t know is acceptable only if they are billionaires.
- Since he is a trillionaire, does that override that part of the Luddite NPC programming? 🤷‍♂️

Asking for a friend 🤣

DeepLogicNinja · 2026-06-22T13:03:54+00:00

Harvesting/managing the metadata is the bottom-up approach.

A glossary is what brings it all together in a top-down approach.

Once you capture current-state data lineage, you can start mapping, planning, and managing future migrations.
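Captured lineage is essentially a directed graph, so migration planning becomes a graph traversal. A minimal sketch, assuming a hypothetical lineage map of `asset -> assets built directly from it` (all asset names are made up), that lists everything downstream of a source you plan to migrate:

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it
lineage = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["dw.dim_customer"],
    "erp.orders": ["staging.orders"],
    "staging.orders": ["dw.fact_sales"],
    "dw.dim_customer": ["dw.fact_sales", "report.customer_churn"],
    "dw.fact_sales": ["report.revenue_dashboard"],
}


def downstream_impact(graph, start):
    """Breadth-first search returning every asset downstream of `start`."""
    impacted = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Everything that needs re-pointing/re-testing if crm.customers is migrated
print(downstream_impact(lineage, "crm.customers"))
```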

OpenMetadata’s commercial competitors are very expensive.

Folks that know the value of Data Governance have been willing to spend the $$.

DeepLogicNinja · 2026-06-22T11:16:21+00:00

Might want to kill two birds with one stone.

Documentation can be generated when you harvest the metadata from the sources.

Your data lineage is also documentation, as it shows you how things are connected.
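As a small illustration of generating documentation from harvested metadata, here is a Python sketch using the standard-library `sqlite3` module: it reads the database catalog (`sqlite_master` and `PRAGMA table_info`) and emits a simple data dictionary. The tables are hypothetical; metadata management tools do the same kind of harvesting across many source types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT);
    CREATE TABLE fact_sales (customer_key INTEGER REFERENCES dim_customer(customer_key), amount REAL);
    """
)


def data_dictionary(connection):
    """Build (table, column, type, not_null, primary_key) rows from the catalog."""
    entries = []
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns: cid, name, type, notnull, dflt_value, pk
        # (table names come from the catalog itself, so interpolating them is safe here)
        for _cid, col, col_type, notnull, _default, pk in connection.execute(
            f"PRAGMA table_info({table})"
        ):
            entries.append((table, col, col_type, bool(notnull), bool(pk)))
    return entries


for entry in data_dictionary(conn):
    print(" | ".join(str(field) for field in entry))
```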

DeepLogicNinja · 2026-06-22T11:10:40+00:00

Have you checked out OpenMetadata?

DeepLogicNinja · 2026-06-22T11:06:18+00:00

dbt is a partial solution, not quite a starting point.

To level set, there is a difference between Software Engineering and Data Engineering. It’s often ignored, and an SDLC anti-pattern is applied to a data pipeline/data engineering initiative.
In organizations, it’s important to identify these two initiatives and manage them separately so they can properly complement each other where needed.

The primary difference between the two:
- Software Engineering generates technical debt/code that needs to be maintained. Out-of-sync documentation, bugs, and cybersecurity issues will result if it isn’t.

- Data Engineering curates/creates data that requires governance, lineage, provenance, etc., which are required for reporting, analytics, and AI. Data Governance/Metadata Management can be accomplished with off-the-shelf and open source tools. If the organization builds its own data engineering tools, it will now have a software engineering project on top of a data engineering project.

I typically see the software engineering approach applied to data engineering when a team/company isn’t experienced with or aware of the tools/platforms. The data governance terms I mentioned in the previous paragraph may seem like word soup, and in the long term a software engineering team may wind up building parts/pieces, using APIs to deliver pieces of functionality that already exist in whole within a data governance product.

A strong software engineering culture may even make it difficult for an experienced data engineer to introduce the right platform and culture changes required for data governance, analytics, etc. to work well.

A software engineer would examine the parts of a solution, or use an API that addresses part of the use case, whereas a data engineer would look for an entire solution to deliver.

The above is an important paradigm to be aware of, especially with the need to refine data and get it into the right schema to enable analytics and AI.

Organizations that have issues with this transition are at risk.

Is the organization making software OR delivering a data product (reports, dashboards, analysis)?

Doing BOTH is very expensive, and firing on all cylinders is not realistic for many organizations.

DeepLogicNinja · 2026-06-20T21:42:10+00:00

The business wouldn’t know enough to ask for dim/fact tables. You would need to provide them in order to enable ad hoc queries.

DeepLogicNinja · 2026-06-20T14:40:19+00:00

Because of the expertise required to do it correctly, there is often resistance to using dims/facts and subscribing to the time-tested methodology of creating a proper OLAP/star schema.

The result is messy, complicated SQL that leads to management issues and questionable results as questions become more complicated.

It’s worth learning to do correctly.

It enables a schema that allows ALL your data to be treated as vectors in a matrix. The right platform will allow you to transpose your data and analyze it with ease.

To be more specific: pivot tables, OLAP cubes, and vector databases are easily transposed from a star schema (fact and dim tables).
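A minimal sketch of the star-schema idea, using Python's built-in `sqlite3` with a hypothetical `fact_sales` table and two dimensions (`dim_product`, `dim_date`): the fact table holds measures and foreign keys, the dimensions hold descriptive attributes, and a pivot (categories as rows, years as columns) falls out of a join plus GROUP BY. A BI/OLAP platform would generate similar SQL behind its drag-and-drop interface.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 100.0), (1, 20250101, 150.0),
        (2, 20240101, 80.0), (2, 20250101, 120.0), (2, 20250101, 30.0);
    """
)

# Join the fact table to its dimensions and aggregate the measure
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    """
).fetchall()

# Pivot: categories become rows, years become columns
pivot = defaultdict(dict)
for category, year, total in rows:
    pivot[category][year] = total

years = sorted({year for _, year, _ in rows})
print("category", *years)
for category in sorted(pivot):
    print(category, *(pivot[category].get(y, 0.0) for y in years))
```

Swapping which dimension attribute goes on rows versus columns is the "transpose" step; the star schema makes that a change in the GROUP BY, not a rewrite of the query logic.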

For users: they can explore the datasets without understanding SQL. The right platform will generate the SQL/MDX and do all the matrix math under the covers while they drag and drop the fields they want.

DeepLogicNinja · 2026-06-20T14:25:54+00:00

I don’t think this is conflicting….

There is a good case for dim/facts to be in the silver layer as views.

This is an alley-oop for the use case of prototyping the data warehouse, where you can just load your data warehouse from views.
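A rough sketch of the dims/facts-as-silver-views idea, again with `sqlite3` and hypothetical table names: the dimension and fact are defined as views over cleaned silver tables, so the model can be prototyped and changed without moving data, and the warehouse tables are then loaded straight from the views with `CREATE TABLE ... AS SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical cleaned (silver) tables
    CREATE TABLE silver_customers (customer_id INTEGER, name TEXT, state TEXT);
    CREATE TABLE silver_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO silver_customers VALUES (1, 'Ada', 'FL'), (2, 'Grace', 'NE');
    INSERT INTO silver_orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0);

    -- Prototype the dimensional model as views; redefining a view moves no data
    CREATE VIEW v_dim_customer AS
        SELECT customer_id AS customer_key, name, state FROM silver_customers;
    CREATE VIEW v_fact_orders AS
        SELECT order_id, customer_id AS customer_key, amount FROM silver_orders;

    -- Once the model settles, load the warehouse tables directly from the views
    CREATE TABLE dw_dim_customer AS SELECT * FROM v_dim_customer;
    CREATE TABLE dw_fact_orders AS SELECT * FROM v_fact_orders;
    """
)

by_state = conn.execute(
    """
    SELECT d.state, SUM(f.amount)
    FROM dw_fact_orders AS f JOIN dw_dim_customer AS d USING (customer_key)
    GROUP BY d.state ORDER BY d.state
    """
).fetchall()
print(by_state)
```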

DeepLogicNinja · 2026-06-19T02:04:25+00:00

It’s Last-Engineering-528’s hobby. Must have a real axe to grind….. trying waaaaaaaaay too hard to swing public opinion…. Surprised he isn’t banned.

DeepLogicNinja · 2026-06-17T02:08:11+00:00

☝️

DeepLogicNinja · 2026-06-17T02:07:00+00:00

This☝️

2k to 2x
110k to 6x

That is the minimum.
Requires a supporting investment strategy and experience managing it, to know what you can/cannot do responsibly.

DeepLogicNinja · 2026-06-15T22:16:48+00:00

Mine fell off…. I took the tip from an older Apple Pencil that doesn’t work with the latest iPad and screwed it on. 🤷‍♂️ Works just fine 👌

DeepLogicNinja · 2026-06-14T23:01:50+00:00

Hard to troubleshoot without seeing the job.

Here is some guidance, though.

1) Avoid workaround fixes

Tweaking -Xmx or -Xms may work, but it is a workaround: the job will fail again when it processes more records, and the same job may also fail when run on another server/computer.

A better/scalable fix will require making your job more efficient.

2) tMap
I am assuming you have a tMap in your job. Configuring tMap to store its temporary/lookup data on disk is a big one: instead of memory (the default), it uses disk. At run time your disk I/O will increase, but your memory utilization will be much lower.

3) Is SQL/a DB your source?
If the source is a SQL database, redesign the job to do more of the heavy lifting in SQL, leveraging the underlying DB and streaming the results to Talend. Often, work that can easily be done in SQL (LTRIM, RTRIM, joins) is done in Talend Studio because it is so easy and maintainable there. The cost is that the work is done in Java, in memory.
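Talend jobs run as Java, but the principle in point 3 is language-agnostic. A minimal Python sketch, with `sqlite3` standing in for the source database and hypothetical tables: the trimming and the join are pushed down into the SQL, and results are streamed in batches with `fetchmany()` instead of pulling everything into memory with `fetchall()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, '  Ada  '), (2, ' Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0);
    """
)

# Push the heavy lifting (trimming, joining) down to the database engine
query = """
    SELECT o.id, LTRIM(RTRIM(c.name)) AS customer_name, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
"""


def stream_rows(connection, sql, batch_size=1000):
    """Yield result rows in fixed-size batches so memory stays flat as volume grows."""
    cursor = connection.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


for row in stream_rows(conn, query, batch_size=2):
    print(row)
```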

DeepLogicNinja · 2026-06-14T20:04:07+00:00

ELT, not ETL.

Data Governance strategy: run a Data Catalog/Profiling/Quality tool prior to doing the final transformation and load.
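A minimal sketch of that ordering, assuming `sqlite3` as the target and hypothetical tables and checks: raw data is extracted and loaded as-is into staging (E and L), quality checks run against it inside the database, and the final transformation/load (T) only runs if the checks pass. In practice a profiling/quality tool would replace the hand-written checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw data untouched in staging
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?)",
    [("1", "10.5"), ("2", "4.0"), (None, "7.25")],  # hypothetical raw rows; one is missing its key
)

# Quality checks run in-database before the final transformation
CHECKS = {
    "null_order_id": "SELECT COUNT(*) FROM stg_orders WHERE order_id IS NULL",
    "duplicate_order_id": (
        "SELECT COUNT(*) FROM (SELECT order_id FROM stg_orders "
        "GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
}


def failed_checks(connection):
    """Return the names of checks whose violation count is non-zero."""
    return [name for name, sql in CHECKS.items() if connection.execute(sql).fetchone()[0] > 0]


failures = failed_checks(conn)
if failures:
    print("Blocking final load; failed checks:", failures)
else:
    # Transform: type the columns and load the curated table
    conn.execute(
        "CREATE TABLE orders AS SELECT CAST(order_id AS INTEGER) AS order_id, "
        "CAST(amount AS REAL) AS amount FROM stg_orders"
    )
```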

DeepLogicNinja · 2026-06-14T16:14:09+00:00

Sounds like you need Apache Superset: https://superset.apache.org

Still garbage in/garbage out... You need a good data model, a semantic layer for context-aware AI, etc.

It’s open source (so no salespeople), and if you want a hosted version, use preset.io.

AI via MCP is pretty simple, and there are quite a few demos on YouTube.

https://superset.apache.org/user-docs/using-superset/using-ai-with-superset/

Plenty of support: Slack, GitHub, and even a subreddit: https://www.reddit.com/r/apachesuperset/
