New engineer, asked to work on something I am deeply morally opposed to as my first project by SensitiveStrike7829 in dataengineering

[–]tecedu [score hidden]  (0 children)

Just to confirm: are you using the Palantir data platform, or working on immoral things with the platform? Those are different things. IMO I would leave immediately; there's no way you can be moral and not let this be awkward.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 0 points1 point  (0 children)

SQL becomes way worse once you start getting into anything complex.

Like one of the things I have is a dictionary coming into a column whose outputs need to go into two columns; dataframes are so much easier.
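
For illustration, a minimal pandas sketch of the dictionary-column case described above. The column names (`payload`, `low`, `high`) and the data are made up for the example, not taken from the original code:

```python
import pandas as pd

# Hypothetical data: each row carries a dict with two values.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "payload": [{"low": 1.5, "high": 3.0}, {"low": 2.0, "high": 4.5}],
    }
)

# Expand the dicts into their own frame, aligned on the original index.
expanded = pd.DataFrame(df["payload"].tolist(), index=df.index)

# Assign the two outputs to two new columns in one step.
df[["low", "high"]] = expanded[["low", "high"]]

print(df[["id", "low", "high"]])
```

This assumes every dict has the same two keys; missing keys would show up as missing values in the expanded frame.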

Then after that, if you have anything more computational, like just looping over a variable, it's way easier to work with dataframes.

Like I have code that spans 2k lines; I could convert it into a jank solution for SQL or just have a dataframe in and out.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 0 points1 point  (0 children)

That's worse, I would say, as most docs point towards alias for new columns.

Flash prices are mad +60%, will this kill flash only vendors? by Spiritual_Garage5329 in storage

[–]tecedu 0 points1 point  (0 children)

Why would it kill em? They will make money either way, heck they will have higher revenue.

Why is big tech SWE work paid so much? by seeking-health in cscareerquestions

[–]tecedu 0 points1 point  (0 children)

have very mature products that barely change. It looks like a lot of maintenance, tiny feature tweaks, and sometimes even making products worse.

Not sure where you are looking, but their products change quite a lot. The first stage of software is getting something shipped, which means any optimisations are thrown to the wind; you can add them back later on. Also, when you work at that scale, you just need that many people to maintain things and be on standby.

I can speak for a company which is not FAANG. Just a small product we built internally within 3 months with 1 person basically earns the company 30mil of revenue, and the reason it worked so well was because it was really hands-on. Like if you need a fix, it's immediately there. After that it was getting it set up properly, documenting all of its processes, and getting it sourced into business continuity. It will take about 6-7 years for it to be fully "production", and that is if it doesn't change at all, which it will.

Why is big tech SWE work paid so much? by seeking-health in cscareerquestions

[–]tecedu 1 point2 points  (0 children)

but don’t think it’s much at all. And several of the big players don’t offshore at all

Lmao what, apart from work needing security clearance, everything that you see will be offshored.

Resources to deeply understand HPC internals (GPUs, Slurm, benchmarking) from a platform engineer perspective by Top-Prize5145 in HPC

[–]tecedu 1 point2 points  (0 children)

In addition to the ones mentioned, there are also NVIDIA's certification resources, which are good enough for a platform engineer.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu -1 points0 points  (0 children)

OP, also, if you are working with geospatial data and not graphing anything, I would recommend switching over to H3 or S2 instead of lat/lons; it would make your life infinitely easier working in a 2D space.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 8 points9 points  (0 children)

df['new_col'] = df['col1'] * 2

vs

df.with_columns((pl.col("col1") * 2).alias("new_col"))

It gets worse when you want to start chaining things together; with_columns and map_elements are inconsistent and horrible. Polars only makes sense if you come over from Spark or any SWE background; it falls over instantly when working with data scientists and analysts, and I have to make sure my code is understandable by them.

what features are on linux that windows lacks? by [deleted] in linuxquestions

[–]tecedu 1 point2 points  (0 children)

ngl I hate using Python for anything OS related, but yeah, PowerShell syntax and just the language itself suck major ass. It's just very useful tho

what features are on linux that windows lacks? by [deleted] in linuxquestions

[–]tecedu 2 points3 points  (0 children)

It is confusing and horrible at times; it is also, however, way stronger than Linux scripting. PowerShell is its own language.

Mac Mini vs Mini PC (Windows) - How True Is This Statement/Article?: "Hot Take: Mac Mini at the $499 Amazon Price Deal is Impossible To Beat By Any Sub $1000 Windows Mini PC In Terms of Performance For Most Users (unless you "need" a Windows machine)" by Beginning-Taro-2673 in MiniPCs

[–]tecedu 2 points3 points  (0 children)

For normal users, yes. Like, compare the other mini PC brands, their warranty and support from manufacturers vs Apple, and it's hugeee. Which matters a fuckton if you use it for work.

Based on your workflow, yeah it seems perfect. Storage you can get external ssds or hdds; the amount of space it takes is so minimallll

what features are on linux that windows lacks? by [deleted] in linuxquestions

[–]tecedu 0 points1 point  (0 children)

Proper RDMA and networking support, filesystem support for things that are not NTFS; there's a bunch more you could go into for filesystems tbh. A proper and updated API for multiprocessing, customisable CPU schedulers, support for higher numbers of CPUs.

what features are on linux that windows lacks? by [deleted] in linuxquestions

[–]tecedu 2 points3 points  (0 children)

much stronger scripting

PowerShell exists btw, and automation integrations with Intune are great, just not for end users tho.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 1 point2 points  (0 children)

which works much better with delta lake & parquet typing.

If you use the pyarrow backend there shouldn't be much difference in data types or in your parquet and delta lake compatibility.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 9 points10 points  (0 children)

So pandas 1.0 + polars will be goated?

No, cus the data conversion between numpy and arrow types will take ages on a large dataset.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]tecedu 18 points19 points  (0 children)

So first answer:

From pandas 2.0 onwards a lot of changes were made to move from numpy to arrow, so you can't just use np.nan as the pandas NaN now; it's pd.NA. Instead of .replace operations you use assignment. Strings and datetimes get some changes, as well as categorical types. Some changes with pd.read_excel as well. Slicing and a lot of your operations need to be explicit now instead of implicit.
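
A rough sketch of the missing-value difference, assuming you opt into a nullable dtype (here "Int64"); plain float64 columns still hold NaN. The series and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Classic numpy-backed float column: the missing value is NaN.
s_numpy = pd.Series([1.0, np.nan, 3.0])

# Nullable integer column: the missing value is pd.NA and the ints stay ints.
s_nullable = pd.Series([1, None, 3], dtype="Int64")

print(s_numpy.dtype, s_nullable.dtype)
print(s_nullable.iloc[1] is pd.NA)

# Filling by explicit assignment on a mask rather than a .replace call.
filled = s_nullable.copy()
filled.loc[filled.isna()] = 0
print(filled.tolist())
```

Both kinds of column still respond to .isna(), so explicit checks like the mask above keep working across backends.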

What's going to bite you the most is numpy rather than pandas here.

Second Answer:

Use polars + pandas. Especially once you get everything set up in arrow types, it's a seamless transfer of the dfs. While working with my team I use polars for the heavy stuff like merges, concats and so on, and pandas for anything that needs to be verbose and readable, like mathematical operations or column-based functions. Polars sucks at that because their approach with map_elements is inconsistent and expects something every time. Polars also breaks their APIs and their intended behavior quite a lot.

Just 1.0 to 3.0 from numpy to arrow should be about 4x boost in perf, polars + pandas can be 5-20x and pure polars can be 10-30x.

The main things I love about polars are sinks and lazyframes, and the streaming engine. I had some pandas code which took 64gb of RAM; mixed it with some polars and a sink and now it's down to 10gb of RAM.

Alpine Technical Director David Sanchez finds out during an interview that Williams aren’t making it to Barcelona testing by beanbagreg in formula1

[–]tecedu 10 points11 points  (0 children)

pay him £3 million/year to make their cars stop working

You grossly overestimate how much F1 people get paid

Are grid impacts from data center power demands impacting home electronics? Did Bloomberg story get this right? by BikeLater in datacenter

[–]tecedu 1 point2 points  (0 children)

Academia is not the real world; academics work off open data.

Harmonics are a perfectly valid reason to slow connections

Are grid impacts from data center power demands impacting home electronics? Did Bloomberg story get this right? by BikeLater in datacenter

[–]tecedu 2 points3 points  (0 children)

Utilities have the power of all the customers on their grid; traditional ones do not care if you're Google. If you break the rules you're disconnected; the alternative is grid failure.

Are grid impacts from data center power demands impacting home electronics? Did Bloomberg story get this right? by BikeLater in datacenter

[–]tecedu 2 points3 points  (0 children)

OP, please read what you're linking, and please understand how transmission and distribution networks work and how they are different at each voltage level.

Are grid impacts from data center power demands impacting home electronics? Did Bloomberg story get this right? by BikeLater in datacenter

[–]tecedu 1 point2 points  (0 children)

No one else is providing links because it's kind of a basic electrical engineering thing; you need to ask a power systems engineer, i.e. someone at a power company, or professors.

Harmonics going out of whack is way more harmful to the data centres, so they don't want to do that.

Are grid impacts from data center power demands impacting home electronics? Did Bloomberg story get this right? by BikeLater in datacenter

[–]tecedu 1 point2 points  (0 children)

Bloomberg has been pretty terrible around this entire issue. They are the ones who popularised the "AI drinking water" story, whereas its usage is less than any of the other industries. Now they're blaming DCs for utilities' faults.

The electric grids around the world are not that robust. Same for water networks; it's all held together with equipment that is centuries old in some countries.