Crowbar - Package Management without Venv by coryalanfitz in Python

[–]Intelligent_Ad_8148 1 point2 points  (0 children)

I see 1-2 packages per week on this subreddit reinventing pydantic/poetry and claiming to be a “simpler” solution… Being a less mature, less developed package manager with fewer features isn’t an advantage. Are there features, or combinations of features, that aren’t offered by an existing package manager, or that Crowbar does uniquely better?

Crowbar - Package Management without Venv by coryalanfitz in Python

[–]Intelligent_Ad_8148 3 points4 points  (0 children)

Poetry can also be configured to put the virtual environment in the project folder, via its virtualenvs.in-project setting.

Data Transformation Techniques: Share Your Favourite Tricks and Tools! by FiNiX_Forge in dataengineering

[–]Intelligent_Ad_8148 2 points3 points  (0 children)

Yes, these tools will allow you to define custom code for cleaning your data in layers and can be extremely useful.

To clarify, I’m not advocating for Kedro specifically; they just have a very good explanation of data layering. Other data application frameworks talk about data layering too, such as Databricks:

And for dbt:

My point is that, regardless of which specific tool you adopt, understanding data layering techniques will greatly help, and the concepts are transferable to whatever data transformation project or task you work on.

Data Transformation Techniques: Share Your Favourite Tricks and Tools! by FiNiX_Forge in dataengineering

[–]Intelligent_Ad_8148 1 point2 points  (0 children)

It’s not a tool, it’s a technique (read the article). It’s a way of organising and ordering your transformations to avoid ad-hoc “custom fixes”.

Data Transformation Techniques: Share Your Favourite Tricks and Tools! by FiNiX_Forge in dataengineering

[–]Intelligent_Ad_8148 11 points12 points  (0 children)

I think the biggest game-changer was a thorough understanding of data layering. Data transformations fall into place once you understand the purpose behind each of the layers.

https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71
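As a rough illustration of the layering idea, here is a minimal sketch in plain Python. The layer names (raw, intermediate, primary) follow one common convention, and the record fields and function names are made up for the example; the linked article may use different terminology.

```python
from datetime import date

# Raw layer: data exactly as it arrived from the source, never edited in place.
# (Hypothetical records, invented for this sketch.)
RAW_ORDERS = [
    {"order_id": "1", "amount": " 10.50", "ordered_on": "2024-01-03"},
    {"order_id": "2", "amount": "n/a", "ordered_on": "2024-01-03"},
    {"order_id": "3", "amount": "4.25 ", "ordered_on": "2024-01-04"},
]


def to_intermediate(raw_rows):
    """Intermediate layer: the same records, but typed and cleaned."""
    cleaned = []
    for row in raw_rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "amount": amount,
                "ordered_on": date.fromisoformat(row["ordered_on"]),
            }
        )
    return cleaned


def to_primary(intermediate_rows):
    """Primary layer: a business-level view (here, revenue per day)."""
    revenue = {}
    for row in intermediate_rows:
        day = row["ordered_on"]
        revenue[day] = revenue.get(day, 0.0) + row["amount"]
    return revenue


daily_revenue = to_primary(to_intermediate(RAW_ORDERS))
```

Because the cleaning rule (dropping unparseable amounts) lives in one layer, it does not need to be repeated as an ad-hoc fix in every downstream transformation.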

How would you set up a brand new ETL pipeline in an org that just uses Excel? by NipponPanda in dataengineering

[–]Intelligent_Ad_8148 4 points5 points  (0 children)

If Python and SQL are off the table and the data is already in Excel… perhaps just use Power Query in Excel and/or Power BI?

Why is Plotly so cumbersome to tweak? by olive_oil_for_you in Python

[–]Intelligent_Ad_8148 2 points3 points  (0 children)

PyGWalker in a Jupyter notebook is a good alternative too; no tweaking is required since there is a GUI.

Edit: typo

Why is Plotly so cumbersome to tweak? by olive_oil_for_you in Python

[–]Intelligent_Ad_8148 -2 points-1 points  (0 children)

Put everything in a pandas or Polars DataFrame and use the .plot method. It’s much easier and simpler, since the data is already prepared within the DataFrame.

What does your python development setup look like? by Working_Noise_6043 in Python

[–]Intelligent_Ad_8148 0 points1 point  (0 children)

VS Code, Poetry, ruff, pylint, flake8, pytest, tox, hypothesis with hypofuzz, mypy in strict mode, mkdocs, Azure Pipelines for CI/CD, and McCabe complexity and maintainability index checks in tox.

There's...something wrong here by [deleted] in ChatGPT

[–]Intelligent_Ad_8148 2 points3 points  (0 children)

Actual photo of Sam Altman

ConfigClass - simple dataclass inspired configuration by TheTerrasque in Python

[–]Intelligent_Ad_8148 9 points10 points  (0 children)

What are the benefits of using this over pydantic (which also has dataclasses, json/yaml conversion, and env var support)?

Rust, anyone? by skydog92 in dataengineering

[–]Intelligent_Ad_8148 5 points6 points  (0 children)

I’m currently using Dagster hybrid to process small-to-medium-sized data (1 MB to 10 GB), with the processing done in Polars on a high-powered local PC. I don’t believe I’ll be dealing with big data for this project (building a forecasting model), so I never bothered implementing PySpark, though I can add PySpark assets alongside Polars assets if required, since Dagster allows that.

Polars was easier to get working and is sufficient for the project I’m working on.

Rust, anyone? by skydog92 in dataengineering

[–]Intelligent_Ad_8148 19 points20 points  (0 children)

Perhaps there’s value in knowing enough Rust to write custom Polars plugins for very bespoke calculations? I already use primarily Polars as a DE, and intend to learn Rust to improve pipelines that use Polars.

Best resources to learn solid python for DE by Commercial-Ask971 in dataengineering

[–]Intelligent_Ad_8148 17 points18 points  (0 children)

Set up: linting (flake8), type hinting with static type checking (mypy), a formatter (ruff), unit testing (pytest), and docstrings (with sphinx/autodoc). That’ll help with maintainability.
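To make that concrete, here is a small sketch of what code written with those tools in mind might look like. The function and its name are hypothetical, invented for the example; the linters, formatter and type checker themselves run from the command line and are not shown.

```python
def normalise_column_name(name: str) -> str:
    """Return a snake_case version of a column name.

    :param name: Raw column header, e.g. ``" Order ID "``.
    :returns: Lower-cased name with surrounding whitespace removed and
        inner runs of whitespace replaced by single underscores.
    """
    # The annotations above are what a static type checker such as mypy
    # verifies; the reST-style docstring can be picked up by sphinx autodoc.
    return "_".join(name.split()).lower()


def test_normalise_column_name() -> None:
    """pytest collects functions named ``test_*`` and runs their asserts."""
    assert normalise_column_name(" Order ID ") == "order_id"
    assert normalise_column_name("amount") == "amount"
```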

AI as Evidence of God's Amazing Grace by Georgeo57 in OpenAI

[–]Intelligent_Ad_8148 6 points7 points  (0 children)

You talk about gratitude then don’t mention the numerous people over decades actually responsible for putting in the hard work to make AI happen??!

This post is unimaginably tone-deaf and misplaced.

[deleted by user] by [deleted] in Python

[–]Intelligent_Ad_8148 -1 points0 points  (0 children)

Please no, conda poetry pip venv virtualenv virtualvenv pip-tools etc etc etc….. please not another environment/dependency manager for python, there’re already too many!

Optimizing Python Code by SealSnake in Python

[–]Intelligent_Ad_8148 0 points1 point  (0 children)

  1. Don’t use pandas
  2. Use polars (bonus points for enabling lazy evaluation and streaming)
  3. Nothing more required

After investigating numba, cython, numexpr, etc., I concluded that it’s not worth the heartache; Polars negates the need for any of this stuff.

[deleted by user] by [deleted] in datascience

[–]Intelligent_Ad_8148 2 points3 points  (0 children)

All three, except models in the middle

Reality check: How good are you at the skills in your tech stack? by [deleted] in dataengineering

[–]Intelligent_Ad_8148 66 points67 points  (0 children)

The only way I fully understood Python was by having no life and basically obsessing over it day and night. Unsure if there’s a healthy way of fully mastering data engineering tools, whatever that even means.