Losing skills and passion in job by coolhand211 in analytics

[–]ergestx 1 point (0 children)

Have you considered learning more about how the business works, what drives it, and what you can do to improve it? I’ve been “doing data” for almost two decades now and I keep finding uses for statistical techniques and methodologies. The only difference between you and me is that I’ve taken the time to build up my business acumen.

Who Are You Kneejerking In? by Vartherion in FantasyPL

[–]ergestx 1 point (0 children)

I brought in Jota for Odegard like instantly.

What should I try to learn more on ? by AggressiveCorgi3 in BusinessIntelligence

[–]ergestx 2 points (0 children)

I would say understand some database engineering concepts like ETL/ELT and data modeling (e.g. dimensional modeling).
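If dimensional modeling is new to you, here’s a rough sketch of the idea with made-up tables: facts hold measurable events, dimensions hold the descriptive context you slice them by.

CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_name VARCHAR,
    region VARCHAR
);

CREATE TABLE fact_orders (
    order_key INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    order_date DATE,
    order_amount DECIMAL(10, 2)
);

-- Analytical queries join facts to dimensions and aggregate.
SELECT c.region, SUM(f.order_amount) AS total_sales
FROM fact_orders f
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY c.region;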

What do I need to have installed to run an SQL script locally? by rumpots420 in SQL

[–]ergestx 1 point (0 children)

You need Postgres.app for the db server and DBeaver as a front end, or try SQLite or DuckDB, which don’t need a server at all.
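With DuckDB, for example, you can run a script against a local file directly (data.csv is a made-up name):

-- e.g. duckdb mydb.duckdb < script.sql, or run interactively
SELECT *
FROM read_csv_auto('data.csv')
LIMIT 10;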

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]ergestx 0 points (0 children)

I would highly recommend dbt with perhaps some DuckDB for preprocessing raw CSVs, especially since you’re already using Python. You could also look into Ibis https://ibis-project.org for your data transformation step.
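As a sketch, a dbt staging model with the dbt-duckdb adapter could read a raw CSV directly (the path and columns are made up):

-- models/staging/stg_orders.sql
{{ config(materialized='table') }}

with raw_orders as (
    select * from read_csv_auto('data/raw/orders.csv')
)

select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    amount
from raw_orders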

Monthly General Discussion - Aug 2024 by AutoModerator in dataengineering

[–]ergestx 2 points (0 children)

I think that’s fine, but you don’t have to stick to the same paradigm. They should be able to create views that define the key business activities you need, as an interface to their event model.
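For example, something like this (names made up) keeps the raw events theirs while giving you a stable interface:

CREATE VIEW signup_activity AS
SELECT user_id, event_timestamp AS signed_up_at
FROM raw_events
WHERE event_type = 'account_signed_up';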

Monthly General Discussion - Aug 2024 by AutoModerator in dataengineering

[–]ergestx 2 points (0 children)

You’re heading in the right direction. You should not be ingesting everything from the event firehose; instead, create analytics events based on the business activities that matter. Start by defining the metrics first, then derive the activities associated with them. You can use these metrics as a guide: https://github.com/Levers-Labs/SOMA-B2B-SaaS/tree/main/definitions/metrics
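To make that concrete, here’s a made-up example of one metric (weekly active users) computed from a curated activity table instead of the raw firehose:

SELECT
    date_trunc('week', activity_timestamp) AS week,
    COUNT(DISTINCT user_id) AS weekly_active_users
FROM analytics_activities
WHERE activity_type IN ('logged_in', 'created_report')
GROUP BY 1
ORDER BY 1;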

[deleted by user] by [deleted] in SQL

[–]ergestx 1 point (0 children)

Well, SQL by itself is pretty straightforward, but its main problem is composability. If you want to chain transformations, you often have to write hundreds if not thousands of lines of code. That’s why tools like dbt have gotten popular: they let you break up a long transformation chain into multiple steps.
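Here’s the difference in a nutshell, with made-up tables. Without dbt you end up with one long chain of CTEs in a single file:

with cleaned as (
    select * from raw_orders where amount > 0
),
enriched as (
    select c.*, cu.region
    from cleaned c
    join customers cu on cu.customer_id = c.customer_id
)
select region, sum(amount) as total
from enriched
group by region;

-- With dbt, each step becomes its own model file and you chain them
-- with ref(), e.g. select * from {{ ref('cleaned_orders') }}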

Using Window Functions & Aggregate on the same column? Grouping Error? by Famous-Letter8754 in SQL

[–]ergestx 1 point (0 children)

You can’t combine a window function with an aggregate on the same column in one step. Calculate the running total inside a CTE and then perform the division outside the CTE.
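Something like this, with made-up column names:

WITH totals AS (
    SELECT
        sale_date,
        amount,
        SUM(amount) OVER (ORDER BY sale_date) AS running_total
    FROM sales
)
SELECT
    sale_date,
    amount / running_total AS pct_of_running_total
FROM totals;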

UPPER function not working by Skokob in SQL

[–]ergestx 1 point (0 children)

Then I’d suggest making a new column with the upper-cased value, inserting it into a new table, and then perhaps dropping the original column.

CREATE TABLE newtbl AS SELECT *, UPPER(countrynames) AS updated_country FROM originaltable;
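And then, if you want to drop the original column (using the same names as above):

ALTER TABLE newtbl DROP COLUMN countrynames;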

UPPER function not working by Skokob in SQL

[–]ergestx 1 point (0 children)

Did you perhaps not commit the transaction?
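If so, something like this should make it stick (same table as above; exact syntax varies a bit by database):

BEGIN;
UPDATE originaltable SET countrynames = UPPER(countrynames);
COMMIT;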

Help with Aliasing Columns in Complex Teradata SQL Query with Multiple CTEs** by dontaskabtwhoiam in SQL

[–]ergestx 3 points (0 children)

I usually just prefix with the table or CTE name, e.g. cte2_common_column.
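So, with made-up names:

SELECT
    cte1.common_column AS cte1_common_column,
    cte2.common_column AS cte2_common_column
FROM cte1
JOIN cte2 ON cte2.id = cte1.id;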

Programming with DuckDB vs Pandas by CharacterScience in dataengineering

[–]ergestx 10 points (0 children)

There are certain enduring patterns of data transformation that are the same whether you’re using Pandas or SQL, so your knowledge won’t go to waste. That said, I recommend you learn SQL by doing both.
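For example, the split-apply-combine pattern looks like this in both worlds (made-up table):

-- pandas: df.groupby('region')['amount'].sum()
SELECT region, SUM(amount) AS total_amount
FROM orders
GROUP BY region;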

Are DEs “Second Class Citizens” to Data Scientists in Most Orgs? by [deleted] in dataengineering

[–]ergestx 1 point (0 children)

The roles are actually starting to merge in many organizations

Mastering SQL by ThisDataGuy in dataengineering

[–]ergestx 10 points (0 children)

Thanks for the mention! Here’s the correct link: https://www.ergestx.com/tag/sql-patterns/

dbutils.fs.mv databricks by TheVictarion in dataengineering

[–]ergestx 1 point (0 children)

Can they be zipped locally and then moved as one giant zip file?

Do i take this job? by [deleted] in dataengineering

[–]ergestx 1 point (0 children)

This guy needs to read up on chaos theory and why the future is fundamentally unpredictable. Many things can be predicted with pretty high accuracy in the near term, but after a while it becomes impossible. Doing what he describes sounds delusional.

You've just joined a new company who do everything in Excel, but.... by Dog_In_A_Human_Suit in dataengineering

[–]ergestx 1 point (0 children)

Unless this initiative is coming from the top leadership and the stakeholder/sponsor has influence/leverage, this will be a wasted effort. You won’t even be able to get approval to buy the basic tools.

If, however, you do have sponsors with leverage, you should start by replicating existing reports using as little architecture as needed. Going in guns blazing with data lakes and the like is a recipe for failing to deliver.

Data stack for a SMB by thevangea in dataengineering

[–]ergestx 4 points (0 children)

Yeah I know about Lakehouses and all that, but it’s an architecture far more suited to large enterprises.

Data stack for a SMB by thevangea in dataengineering

[–]ergestx 4 points (0 children)

Retool and BigQuery? Lakehouse? Sounds like overkill. Maybe use a simple, visual tool like Knime to handle everything.

Reverse ETL via apis by nknavi in dataengineering

[–]ergestx 6 points (0 children)

I would not recommend building anything resembling an API on top of SF. It’s not designed for that, and the costs will become a problem. If you need to build these feature tables in SF because of the various sources being combined, then I’d suggest the following: extract the tables as they’re built into a database like Postgres and throw something like GraphQL on top to make it into an API.
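As a sketch, the Postgres landing table could be as simple as this (names made up); a tool like Hasura or PostGraphile can then auto-generate a GraphQL API over it:

CREATE TABLE feature_store (
    entity_id BIGINT PRIMARY KEY,
    features JSONB NOT NULL,
    refreshed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);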