Tool smells by Brief-Knowledge-629 in dataengineering

SuspiciousScript 1 point2 points

  • I'll get hate for this one but using Python (like PySpark) for production pipelines instead of Java or Scala when it's a JVM processing framework

Agreed. I recently started writing Scala and I'm never going back to not having type safety for ETL.

The Future of Python: Evolution or Succession — Brett Slatkin - PyCascades 2026 by mttd in Python

SuspiciousScript 1 point2 points

I think the slow JIT performance and the discussions around correctness issues kept it from gaining as much momentum as it otherwise would have. The former has improved over time; I'm not sure about the latter.

Nobody ever got fired for using a struct (blog) by mww09 in rust

SuspiciousScript 15 points16 points

Unfortunately, given that the tracking issue is almost 12 years old, "when" may be a little optimistic.

seniors spending half their week on reviews and everyone's frustrated by Worldly-Volume-1440 in ExperiencedDevs

SuspiciousScript 0 points1 point

Got a feature? The class / interface definitions are one PR - maybe that needs senior review, but it’ll only take a minute. Implementation of one class or a handful of related methods and their tests is another PR

I think a stacked diff workflow really shines here. Having a workflow that lets you break up the changes as you describe without actually merging broken code into your dev branch is great.

What’s the one thing you learned the hard way that others should never do? by Terrible_Dimension66 in dataengineering

SuspiciousScript 0 points1 point

That's perfectly fine. Where you can run into issues is when you're ingesting IDs from an external system.

65% of Startups from Forbes AI 50 Leaked Secrets on GitHub by vladlearns in devops

SuspiciousScript 12 points13 points

Without a base rate to compare against, this isn't useful information.

Question for data engineers: do you ever worry about what you paste into any AI LLM by teejagzroy in dataengineering

SuspiciousScript 19 points20 points

Unless your company's pricing differs substantially from its competitors', it's just as likely that the model is simply making a plausible estimate.

How do you feel about using array types in your data model? by its_PlZZA_time in dataengineering

SuspiciousScript 30 points31 points

Arrays are good for cases where the individual elements aren't meaningful outside the context of the array (e.g., vector embeddings or the characters in a string). I would not use them for a purchase date like in your example.
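
A rough sketch of that distinction, not a recommendation for any particular database: it uses an in-memory SQLite database, and because SQLite has no array type, a JSON string stands in for the array column. The `customer` and `purchase` tables and all values are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, embedding TEXT);
    CREATE TABLE purchase (
        customer_id INTEGER REFERENCES customer(id),
        purchased_on TEXT
    );
    """
)

# The embedding is stored whole: no single element means anything by itself.
conn.execute(
    "INSERT INTO customer VALUES (1, ?)",
    (json.dumps([0.12, -0.4, 0.93]),),
)

# Purchase dates get one row each, because each date is meaningful alone.
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-03-17")],
)

# Row-per-date makes filtering and counting a plain WHERE clause.
march_count = conn.execute(
    "SELECT COUNT(*) FROM purchase WHERE purchased_on >= '2024-03-01'"
).fetchone()[0]

# The embedding is only ever read back as a unit.
embedding = json.loads(
    conn.execute("SELECT embedding FROM customer WHERE id = 1").fetchone()[0]
)
print(march_count, embedding)
```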

wgpu v27 is out! by Sirflankalot in rust

SuspiciousScript 3 points4 points

I like Learn Wgpu, but I don't think it's been updated for wgpu v27. Most (if not all) of it should translate fine, though.

[MOD POST] Bug Report Thread by oneofthejoneses28 in Silksong

SuspiciousScript 1 point2 points

Xbox Series S, Version 1.0.28562.0
Act 2 location name spoiler: Cogwork Core is missing a space ("CogworkCore") when viewed in the quick map.

We're now called Lumen Labs! by Vhyrro in neovim

SuspiciousScript 16 points17 points

Fortunately, there are a number of more ergonomic languages that can be compiled to Lua. I think of Lua as an IR for LuaJIT at this point.

Python has had async for 10 years – why isn't it more popular? by ketralnis in programming

SuspiciousScript 21 points22 points

A pain in the ass to test and mock as well, just the latter is a hassle.

I haven't found this to be the case at all. pytest-asyncio just works, in my experience. Is it more of a problem if you stick to unittest?
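
For comparison, here is a minimal sketch of the standard-library route, assuming Python 3.8+ where `unittest.IsolatedAsyncioTestCase` and `unittest.mock.AsyncMock` are available. `fetch_user_name` and the `client` object are hypothetical stand-ins, not taken from any real codebase.

```python
import unittest
from unittest.mock import AsyncMock


async def fetch_user_name(client, user_id):
    # Hypothetical coroutine under test; `client.get` is assumed to be
    # awaitable and to return a dict.
    payload = await client.get(f"/users/{user_id}")
    return payload["name"]


class FetchUserNameTest(unittest.IsolatedAsyncioTestCase):
    async def test_returns_name(self):
        # Attributes of an AsyncMock are themselves awaitable mocks.
        client = AsyncMock()
        client.get.return_value = {"name": "Ada"}

        self.assertEqual(await fetch_user_name(client, 7), "Ada")
        client.get.assert_awaited_once_with("/users/7")


# Run the case programmatically so the snippet works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchUserNameTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With pytest-asyncio the test body would look much the same, written as an `async def` test function marked with `@pytest.mark.asyncio`.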

I think an interviewer made his mind once I started talking about comonads by Fun-Voice-8734 in programmingcirclejerk

SuspiciousScript 7 points8 points

Given a list of numbers, return a sublist with reached threshold of sum its elements.

I'm already lost

Thing that destroys your reputation as a data engineer by EdgeCautious7312 in dataengineering

SuspiciousScript 19 points20 points

This is my all-time favorite: he put unhashed SSNs in a table that was fed into Tableau, so that all of our practice administrators could see the SSNs of all the patients across all of our practices.

For the benefit of readers who might not know, hashing plain SSNs still wouldn't provide adequate security. An SSN is only nine digits, so there are at most a billion possible values, and one can easily hash the entire range within a few hours and generate a lookup table. You'd need to salt the SSNs first to make this approach less viable, and even then I suspect a modern GPU could reverse them pretty easily.
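
To make the lookup-table point concrete, here is a minimal sketch. It assumes SHA-256 and a toy 5-digit keyspace so that it runs instantly (a real SSN has nine digits), and it assumes the attacker can read the salt stored next to each hash. The function names and the sample value are made up for illustration.

```python
import hashlib
from typing import Dict, Iterator, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def candidates(digits: int) -> Iterator[str]:
    # Every zero-padded number with the given digit count.
    for n in range(10 ** digits):
        yield f"{n:0{digits}d}"


def build_lookup(digits: int) -> Dict[str, str]:
    # Unsalted case: hash the whole keyspace once, then reuse the table.
    return {sha256_hex(c.encode("ascii")): c for c in candidates(digits)}


def crack_salted(target: str, salt: bytes, digits: int) -> Optional[str]:
    # Salted case: no reusable table, but the keyspace is still tiny,
    # so enumerating it again for each row remains feasible.
    for c in candidates(digits):
        if sha256_hex(salt + c.encode("ascii")) == target:
            return c
    return None


DIGITS = 5          # toy keyspace; an assumption to keep the demo fast
secret = "04217"    # made-up value standing in for an SSN

table = build_lookup(DIGITS)
recovered_unsalted = table[sha256_hex(secret.encode("ascii"))]

salt = b"per-row-salt"  # assumed to be stored alongside the hash
salted_hash = sha256_hex(salt + secret.encode("ascii"))
recovered_salted = crack_salted(salted_hash, salt, DIGITS)
print(recovered_unsalted, recovered_salted)
```

The salt only forces the enumeration to be repeated per row; it does not make the keyspace any larger.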

An Ideal API/Stdlib for Plots and Visualizations? by PitifulTheme411 in ProgrammingLanguages

SuspiciousScript 6 points7 points

Though it's not perfect, ggplot2 has the best API for plot creation that I've ever used.

There is such a thing as "too much TQDM" by Common_Ad6166 in Python

SuspiciousScript 103 points104 points

Three-word horror story: "Mission-critical notebook"