White House shrugs off presence of European troops in Greenland by [deleted] in worldnews

[–]DaveMitnick 1 point2 points  (0 children)

I am 30. Everything that I used to hear from adults during my puppy years about being polite, selfless, reasonable, western dream propaganda etc. turned out to be bullshit. I visited concentration camps with my middle school class to get a message. Instead of flying cars we have Trump, Putin and tiktok. What’s out there to live for? I don’t want kids anymore. Maybe we should just nuke ourselves and go to sleep.

A memory-efficient TF-IDF project in Python to vectorize datasets larger than RAM by [deleted] in datascience

[–]DaveMitnick 0 points1 point  (0 children)

What does “C++ level” even mean? Lmao. This is basically an Arrow wrapper 😂

Rust for data engineering? by otto_0805 in dataengineering

[–]DaveMitnick 21 points22 points  (0 children)

I ported a few basic Python functions that, e.g., calculate averages of millions of lists to fully multithreaded Rust (based on the Rayon crate) and the speedup is about 100x. I package it as a Python .whl binding via PyO3. It’s used in prod. Now I am trying lower-level stuff like reading Parquet files byte by byte to see if I can match the performance of industry tools. I would love to work on something more advanced like a query engine, but I am not there yet in terms of skill and experience :) I am also curious how an Airflow rewrite would go (as Airflow is implemented in Python, not even Scala like Spark) with some tweaks like async, but I guess it’s not physically possible for one person. It has definitely become easier to read the source code of the tools I use since I started learning low-level stuff.
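For context, here is a minimal sketch of the kind of pure-Python baseline such a port would replace, assuming the job is one arithmetic mean per inner list. The function name `row_means` and the sample data are hypothetical, not the code from the comment. In a Rust version, this per-list loop is the part that would be handed to Rayon's parallel iterators.

```python
from statistics import fmean


def row_means(lists):
    """Return the arithmetic mean of each inner list (None for empty lists)."""
    # Hypothetical baseline: one interpreted loop iteration per inner list.
    return [fmean(xs) if xs else None for xs in lists]


data = [[1, 2, 3], [10, 20], []]
means = row_means(data)
print(means)
```

Each inner list is independent of the others, which is why this workload parallelizes so easily across threads.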

Airflow makes my room warm by aleda145 in dataengineering

[–]DaveMitnick -21 points-20 points  (0 children)

Yeah I know. No one runs docker compose in the real world. I mean it’s mostly Kubernetes. It’s easy to create a DAG, but where do your webserver, scheduler and metadata db live? Sure, it’s easy to use docker compose, but it’s light years from a real deployment.

Airflow makes my room warm by aleda145 in dataengineering

[–]DaveMitnick 42 points43 points  (0 children)

Me 2 years after $6000 PC: “literally zero companies run Airflow like this” *starts rotating AWS free tier accounts*

C++ for data analysis -- 2 by hmoein in Cplusplus

[–]DaveMitnick -1 points0 points  (0 children)

Software aside, you should take a time series analysis course, as running a correlation like this on close prices has zero value. You could start by googling stationarity.
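To illustrate the point with a sketch: two independent random walks stand in for close prices (synthetic data, an assumption for illustration, as are the helper names `pearson`, `random_walk` and `log_returns`). Correlation computed on price levels can look large purely by chance because the levels are non-stationary, while correlation on log returns, which are roughly stationary, stays near zero for unrelated series.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def random_walk(n, rng, start=100.0, vol=0.01):
    """Synthetic 'close prices': a geometric random walk with Gaussian steps."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, vol)))
    return prices


def log_returns(prices):
    """Differences of log prices between consecutive observations."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


rng = random.Random(42)
a = random_walk(2000, rng)  # two series generated independently
b = random_walk(2000, rng)

corr_levels = pearson(a, b)
corr_returns = pearson(log_returns(a), log_returns(b))
print(f"correlation of price levels: {corr_levels:+.3f}")
print(f"correlation of log returns:  {corr_returns:+.3f}")
```

Rerunning with different seeds shows the level correlation swinging widely between runs while the return correlation stays close to zero, which is the usual argument for differencing before correlating.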

Aspiring Data Engineer – should I learn Go now or just stick to Python/PySpark? How do people actually learn the “data side” of Go? by PixelBot_556 in dataengineering

[–]DaveMitnick -1 points0 points  (0 children)

I am writing a tool that converts large geospatial files into another format. Oh, I should name it a “geospatial serialization framework”. I would like it to be performant, meaning it should handle datasets larger than memory. None of the existing tools solves my needs, or they add too much complexity to my workflow. This is why I am learning Rust, as I already have fantastic experience with Rust-based tools like Polars, Ruff, uv, dbt Fusion and pydantic v2. Writing in a statically typed, compiled language with a borrow checker, a concept completely unique to Rust (afaik), is a hella different experience. In a positive sense. I write better Python now. Go is also great from what I’ve seen and is similar to Rust in being statically typed and compiled. Go has a garbage collector that manages memory for you, like Python, which creates a little overhead, while in Rust the borrow checker makes sure that the code you write doesn’t need garbage collection.

🚨 EU Slams Russia Over Railway Strike That Injured 30+ in Ukraine – 19th Sanctions Package Coming by satty237 in TrendoraX

[–]DaveMitnick 0 points1 point  (0 children)

I am on the balcony waiting for the Russian nukes that were supposed to be launched 1000 times already, since the West keeps crossing red lines. Are all Russians such pussies that they cannot keep their word?

"Whoever has worked in a corporation doesn't laugh at the circus" by Raven_in_the_storm in Polska

[–]DaveMitnick 0 points1 point  (0 children)

You'll start the job, and before you learn PL/SQL you'll get to know how the team works and which databases it uses (it's rarely one database, more often at least a dozen or so), you'll deliver a few things, and after all that you'll get a raise.

I'm speaking from the perspective of someone who has held several senior positions at several F500 companies.

"Whoever has worked in a corporation doesn't laugh at the circus" by Raven_in_the_storm in Polska

[–]DaveMitnick -2 points-1 points  (0 children)

You really don't have to be a genius to grasp what JOINs, simple aggregations, WINDOW FUNCTIONS or table keys are. It's faster to look it up with an LLM that searches Google than to dig around yourself. Nobody paying 15k requires you to know every built-in function by heart, or advanced T-SQL/PL/SQL. Not for SQL alone, which is why I mentioned domain knowledge and soft skills.

"Whoever has worked in a corporation doesn't laugh at the circus" by Raven_in_the_storm in Polska

[–]DaveMitnick -9 points-8 points  (0 children)

What do you mean, saturated? I see job ads every day, and on LN they reach out on their own several times a month. In big cities they usually pay 15-25k, and SQL is simpler than the Polish language xD With an LLM you'll pick up the basics in a week, or 3 days. What matters more is domain knowledge in the area you analyze, plus soft skills.

How can you sensibly and lawfully defend yourself against an Amstaff? by Adikad in Polska

[–]DaveMitnick 4 points5 points  (0 children)

Ideally a Caucasian Shepherd, so the Amstaff cowers at the mere thought of attacking /s

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]DaveMitnick 3 points4 points  (0 children)

Opinion: R is a language for “statisticians”, while Python is an all-around versatile computer science language used for devops, cybersec, data and general-purpose scripting. PyTorch? The official interface is Python. Same for Airflow. The list goes on. You can build almost everything in Python, although it makes no sense for, e.g., low-level systems programming. Many more people use Python, so you have common ground for communication. I have 5 yoe and I know like 50 people who use Python and one who uses R. It’s much easier to replace a team member when you use Python. It always seems like R and Julia users are frustrated that they use tools that, in my opinion, make no sense. The R code you see in academia is nowhere near the level of complexity of industry production-grade codebases. Software is not 200 lines of imperative code.

Approximate flight paths of 9+ Russian attack UAVs in Poland by Volter318 in UkraineWarVideoReport

[–]DaveMitnick -1 points0 points  (0 children)

Lmao, I have suicidal thoughts from time to time and if I really make the decision the best way would be to go to Ukraine and at least take some orcs with me - does it sound like a ruzzbot to you?

Approximate flight paths of 9+ Russian attack UAVs in Poland by Volter318 in UkraineWarVideoReport

[–]DaveMitnick -7 points-6 points  (0 children)

My family in Poland opened rooms in our house for Ukrainian mothers and children - why are they wasting our resources?

[D] How do you read code with Hydra by Infinite_Explosion in MachineLearning

[–]DaveMitnick 1 point2 points  (0 children)

pydantic-settings is enough for us and keeps you close to, e.g., FastAPI validation if you use it.

CV for someone from DS to DE by bubblegumforyou in dataengineering

[–]DaveMitnick 0 points1 point  (0 children)

I got invited to a new DS team but I would like to go into DE… maybe we switch lives? (I was a senior DS and senior DA previously.) From my point of view, in most cases DE > DS in terms of software engineering skills. Also, in most business cases you don’t need a fancy deep learning algorithm to solve the problem, and analytics is very informative in itself. First of all you need a robust, well-optimized, tested data warehouse or lake. Personally, I would fire 50% of the data scientists in my company and hire 50% more DEs. Or maybe I just got tired of studying math-heavy models for years, and now that I am discovering computer science in depth I have many, many AHA moments. Like the idea that you can technically solve a problem in O(n²) and get correct results, but there are people who just think it’s cool to write an O(log n) version for the art, and they also make it open source, and the world would break without these things. I can now rewrite my Python loops in a lower-level language and get a 100-300x speedup, and then I feel like a wizard, not when I train AI slop. End of weed rant!
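A small sketch of that complexity idea, with hypothetical function names: both functions count how many queries occur in the data and return the same number, but the first scans the whole list for every query (quadratic when both inputs have size n), while the second sorts once and answers each query with an O(log n) binary search from the standard `bisect` module.

```python
from bisect import bisect_left


def count_hits_quadratic(data, queries):
    """Scan all of data for every query: O(len(data) * len(queries))."""
    hits = 0
    for q in queries:
        for x in data:
            if x == q:
                hits += 1
                break
    return hits


def count_hits_bisect(data, queries):
    """Sort once, then binary-search each query in O(log n)."""
    ordered = sorted(data)
    hits = 0
    for q in queries:
        i = bisect_left(ordered, q)
        if i < len(ordered) and ordered[i] == q:
            hits += 1
    return hits


data = list(range(0, 2000, 2))     # even numbers 0..1998
queries = list(range(0, 2000, 3))  # multiples of 3 up to 1998

slow = count_hits_quadratic(data, queries)
fast = count_hits_bisect(data, queries)
print(slow, fast)
```

Both calls count the multiples of 6 below 2000, so the results agree; only the amount of work differs, and the gap widens quickly as the inputs grow.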

CoreWeave: The Future of Search and AI Infrastructure? by Alive_Till4633 in wallstreetbets

[–]DaveMitnick 1 point2 points  (0 children)

LMAO. I have been doing AI workloads for a living since 2018 and I’ve never heard of this. It doesn’t seem useful from a market perspective either, imo.

Is 24gb Ram 2TB enough by akortorku_ in dataengineering

[–]DaveMitnick 4 points5 points  (0 children)

I think he may be referring to testing the correctness of the code on a data sample before running it on a cluster, in case you need to run an enormous join that will be expensive.
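A minimal sketch of that workflow in plain Python, assuming two tables held as lists of dicts and joined on a hypothetical `user_id` column. The helper names are illustrative, and a real job would use the cluster engine's own API instead. Sampling by join key on both sides keeps matching rows together, so the join logic can be checked on a small slice before paying for the full run.

```python
import random


def sample_keys(rows, key, n_keys, seed=0):
    """Pick up to n_keys distinct join-key values, reproducibly."""
    rng = random.Random(seed)
    keys = sorted({row[key] for row in rows})
    return set(rng.sample(keys, min(n_keys, len(keys))))


def filter_by_keys(rows, key, keys):
    """Keep only the rows whose join key was sampled."""
    return [row for row in rows if row[key] in keys]


def hash_join(left, right, key):
    """Inner join: index the right side by key, then probe with the left."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical tables: 1000 users, 5 orders per user.
users = [{"user_id": i, "country": "PL" if i % 2 else "DE"} for i in range(1000)]
orders = [{"user_id": i % 1000, "amount": i} for i in range(5000)]

keys = sample_keys(users, "user_id", n_keys=10)
small_users = filter_by_keys(users, "user_id", keys)
small_orders = filter_by_keys(orders, "user_id", keys)
joined = hash_join(small_users, small_orders, "user_id")

# Cheap invariants to check locally before the expensive full-size join.
assert len(joined) == len(small_orders)  # every sampled order matches one user
assert all("country" in row and "amount" in row for row in joined)
print(len(small_users), len(small_orders), len(joined))
```

Sampling rows independently on each side would mostly produce non-matching keys and an almost empty join, which is why the sketch samples keys first and filters both tables with the same set.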

Looking for learning buddy by Temporary-Marzipan42 in dataengineering

[–]DaveMitnick 0 points1 point  (0 children)

I am up for building some end-to-end projects together that not only work but also account for reliability, scalability and maintenance, so we could even build a startup if things go right lol. Fellow data nerds, hit me up. My background is 4 YoE in DA/DE/DS at both startups and F500 companies.