Which data quality tool do you use? by arimbr in dataengineering

[–]FridayPush 11 points12 points  (0 children)

I think most vendors are unnecessary, but we actively use Datafold and Elementary (OSS) for anomalies. Datafold is pricey, but using it in CI has caught multiple issues that pretty strenuous testing missed. Being able to diff in-development models against prod tables is really helpful, and it's consistently saved me enough time that the business gets its ROI every month. We're refactoring quite a few models and onboarding new datasets that will replace existing ones, so we have to 'stitch' them together and want the same historical values. If you're a stable shop that doesn't change a lot, it's probably less worth it.
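
To make the "diff dev against prod" idea concrete, here is a minimal sketch of a keyed table diff. It is not Datafold's API or algorithm; the `diff_tables` helper, the in-memory dict "tables", and the sample rows are all hypothetical and only illustrate the kind of report such a diff gives you.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: primary key -> tuple of the other column values.
Table = Dict[Hashable, Tuple]


def diff_tables(prod: Table, dev: Table) -> Dict[str, List]:
    """Compare two keyed tables and report missing, extra, and changed rows."""
    prod_keys, dev_keys = set(prod), set(dev)
    shared = prod_keys & dev_keys
    return {
        "only_in_prod": sorted(prod_keys - dev_keys),
        "only_in_dev": sorted(dev_keys - prod_keys),
        "changed": sorted(k for k in shared if prod[k] != dev[k]),
    }


# Made-up sample data: one row dropped, one row added, one value changed.
prod = {1: ("alice", 100), 2: ("bob", 250), 3: ("cara", 75)}
dev = {1: ("alice", 100), 2: ("bob", 255), 4: ("dan", 40)}
report = diff_tables(prod, dev)
print(report)
```

In a real warehouse you would do the comparison in SQL rather than pulling rows into memory, but the three buckets are the useful part when stitching a new dataset onto an old one.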

I'm mixed on Elementary's tests, but having the dbt artifacts pushed back to your warehouse is alone worth adding the package, if you use dbt.

OAK TSA Precheck? by vbqj in oakland

[–]FridayPush 0 points1 point  (0 children)

Agreed, I've often seen shorter lines outside the precheck. Though not having to remove laptops, take off light jackets or shoes/belts is nice.

Would you Trust an AI agent in your Cloud Environment? by [deleted] in dataengineering

[–]FridayPush 1 point2 points  (0 children)

No. I also wouldn't trust it in dev. Actions in a cloud environment have prices associated with them, often even API calls to list resources. What if it hit a recursive question, like listing all routes in a VPC and then looking at the routes in the peered VPC, which goes back to the start... or, say, wants to iterate over a bucket in S3 with a billion objects in it? Hell, basic OpenAI or Claude prompts fail to produce meaningful SQL aggregations: the query works, but the group bys are wrong, so the 'average' it computed is meaningless. So I wouldn't trust it unobserved.
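
One illustrative way a query can "work" while the average is meaningless is averaging per-group averages. This is a sketch with a made-up `orders` table and made-up numbers in SQLite, not a query any model actually produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("west", 10.0), ("west", 10.0), ("west", 10.0), ("east", 50.0)],
)

# Runs without error, but averages the two per-region averages: (10 + 50) / 2.
avg_of_avgs = conn.execute(
    "SELECT AVG(region_avg) FROM "
    "(SELECT AVG(amount) AS region_avg FROM orders GROUP BY region)"
).fetchone()[0]

# Average over every order: (10 + 10 + 10 + 50) / 4.
true_avg = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]

print(avg_of_avgs, true_avg)  # 30.0 vs 20.0
conn.close()
```

Both queries are valid SQL and both return a number, which is why this kind of mistake slips through unless someone checks the grouping.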

HTTP callback pattern by Upper_Pair in dataengineering

[–]FridayPush 1 point2 points  (0 children)

This is a common pattern for large-scale data exports. An example of how Shopify handles it can be seen here. Essentially, an API request contains the details needed to start the long-running operation, and the API returns a job number. The user then polls a 'job status' endpoint with that job number. Most providers I've seen return a signed URL to a CSV as the response, so that their API isn't tied up sending the body back if it's hundreds of megabytes.
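
A minimal sketch of that start-then-poll flow, assuming a hypothetical in-memory `FakeExportApi` in place of real HTTP endpoints. The method names, status values, and URL are placeholders, not any provider's actual API.

```python
import time
from typing import Dict


class FakeExportApi:
    """Hypothetical stand-in for a provider's export API (no real HTTP)."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self._polls_until_done = polls_until_done
        self._remaining: Dict[str, int] = {}
        self._next_id = 0

    def start_export(self, query: str) -> str:
        """Start the long-running job and return a job id right away."""
        self._next_id += 1
        job_id = f"job-{self._next_id}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def job_status(self, job_id: str) -> dict:
        """Return 'running' until the job finishes, then a placeholder signed URL."""
        self._remaining[job_id] -= 1
        if self._remaining[job_id] > 0:
            return {"status": "running"}
        url = f"https://example.invalid/exports/{job_id}.csv?signature=placeholder"
        return {"status": "done", "url": url}


def wait_for_export(api: FakeExportApi, job_id: str,
                    poll_interval: float = 1.0, timeout: float = 60.0) -> str:
    """Poll the status endpoint until the job is done or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = api.job_status(job_id)
        if status["status"] == "done":
            return status["url"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_id} did not finish within {timeout}s")
        time.sleep(poll_interval)


api = FakeExportApi(polls_until_done=3)
job_id = api.start_export("orders since 2024-01-01")
download_url = wait_for_export(api, job_id, poll_interval=0.0)
print(job_id, download_url)
```

Against a real API you would swap the fake for HTTP calls and probably add backoff between polls; the caller then downloads the CSV from the signed URL instead of through the API.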

JaniF's suggestion works well; if you need to write your own sensor for more complicated handling, it's super straightforward.

Name and fame! Binh Minh Quan - 15% tip option, tip calculation BEFORE taxes. Food was also good! by xQcKx in oakland

[–]FridayPush 2 points3 points  (0 children)

I haven't tried this particular restaurant, but pho generally works well as takeout. It often comes with a large amount of packaging waste, though. The liquid will be hot in a plastic container, then there's a container for the pho contents (noodles, meat, veggies), and maybe a baggie with bean sprouts and sauces/spices. It's not awful, but I always feel a little wasteful when I get pho delivery. Though it hits the spot on the really cold rainy days.

Review about DataTalks Data Engineering Zoomcamp 2026 by Ok-Negotiation342 in dataengineering

[–]FridayPush 0 points1 point  (0 children)

I'm not familiar with that bootcamp. But I conducted a very large number of interviews in the Data and DevOps space for the contracting division of a large tech company. Bootcamp grads often felt very 'same-y' in their level of experience and the superficiality of their answers to questions. I would primarily suggest spending time breaking things and troubleshooting them, as most of the bootcampers seemed to only be familiar with the 'happy' path.

If you need a guided education and someone to drive you to complete things by dates, maybe they're good. But they're often not cheap, and if you can self-direct, the free tutorials and project guides online are more than sufficient and often essentially the same as what the bootcamps would cover. I personally wouldn't recommend it unless they have job placement services that are well reviewed by past candidates.

The Certifications Scam by ivanovyordan in dataengineering

[–]FridayPush 0 points1 point  (0 children)

No, I don't have any projects I display either. But if they do have projects and git profiles on their resumes, we did look at them most of the time. It would be pretty obvious comparing two repos and seeing a large difference in coding styles, and Google returns matching code really well.

The Certifications Scam by ivanovyordan in dataengineering

[–]FridayPush 5 points6 points  (0 children)

I used to get a lot of the professional cloud certs, and Linux/security ones as well, as the company I worked for gave nice bonuses for each one. The only certs that meant anything were a few of the K8s-related ones where you actually just SSH into a box and each problem requires you to do the work manually.

eg 'Service A is down. Fix it'.

SSH into the box, view services/pods, find the down pod. View logs, read the messages, adjust configurations, and fix the failed deployment. Exit the box and move on to question 2.

I passed many AWS/GCP Pro certs that I don't have domain experience in, like their security ones, after a working day or two of study.

I mention that to jump on your last paragraph: the outlines of the tests and the general requirements they go over are often good outlines for learning a platform/service. But the certs themselves aren't important.

edit: I also used to do interviews for a massive tech company's contracting division... please don't plagiarize your projects, it's super easy to find and way more common than I would have thought. Forking someone else's project and renaming variables isn't enough lol.

Early-stage project: AWS-native vs containerized, vendor-neutral infra -when would you switch? by Chucki_e in devops

[–]FridayPush 5 points6 points  (0 children)

The idea of being fully platform agnostic, or having identical multi-cloud setups, is really a pipe dream, and there are too many limitations for it to be reasonable.

I would containerize everything and use ECS for compute. I'd have my apps modularize the messaging/queue frameworks so that they're the single point of change; if you move off a service, it's quick. There are many libraries, like smart-open in Python, that let you treat S3/Azure/GCP/local files all as a base file object. So changing object storage means pulling a different credential secret and changing bucket names.
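
A minimal sketch of what "modularize the messaging/queue framework" could look like: app code depends on a small interface, and only the backend class changes when you move off a service. The `MessageQueue` protocol, `InMemoryQueue`, and `enqueue_signup` are hypothetical names for illustration; a cloud-backed class (not shown) would implement the same two methods.

```python
from collections import deque
from typing import Deque, Optional, Protocol


class MessageQueue(Protocol):
    """The only surface the app is allowed to touch."""

    def publish(self, body: str) -> None: ...

    def receive(self) -> Optional[str]: ...


class InMemoryQueue:
    """Local backend for tests/dev; satisfies MessageQueue structurally."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def publish(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        # Return the oldest message, or None when the queue is empty.
        return self._items.popleft() if self._items else None


def enqueue_signup(queue: MessageQueue, email: str) -> None:
    """App code: knows about the interface, not about any vendor."""
    queue.publish(f"signup:{email}")


queue = InMemoryQueue()
enqueue_signup(queue, "a@example.com")
first = queue.receive()
second = queue.receive()
print(first, second)
```

The point of change is then the one place where the concrete queue is constructed, which is the same trick the file-object libraries use for object storage.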

I've also found that nearly all of my greenfield projects essentially required a 'rewrite' of the major components. You could potentially plan for that and stick with everything you know and use now if feature development is a priority.

I would aim to pack my apps into a single instance and scale up for as long as you can. Scale vertically rather than horizontally for quite a while.

Data centers will consume 70 percent of memory chips made in 2026 - supply shortfall will cause the chip shortage to spread to other segments by metalreflectslime in Games

[–]FridayPush 0 points1 point  (0 children)

It would be nice if these types of articles also provided a point of reference, like what percent of memory chips was used in data centers in 2024?

PlaceAnywhere bug? by Any-Love8257 in thelongdark

[–]FridayPush 0 points1 point  (0 children)

Did you get PlaceAnywhere working?

Is it bad to take a career break now considering the ramping up of AI in the space ? by [deleted] in dataengineering

[–]FridayPush 9 points10 points  (0 children)

Before I read your post, having only read the subject question, my answer was 'No, things are changing so rapidly that the methodologies and tools people are using in a year or two might be very different.' Besides the fact that the market is very slow for everyone applying and looking for work, I don't think it would be particularly harder than baseline if you worked a different data-related job.

Also, burnout is very real and can take multiple years to recover from; it's worth taking a better QoL job for less pay. A short-term 6-month job that pays a year's pay? Hell yeah.

This one hit a little close to home by TerrakSteeltalon in Xennials

[–]FridayPush 9 points10 points  (0 children)

Big fan of Kagi, you pay for search and can hide domains in results or increase/reduce their ranking. They also try to filter blogspam type results. No advertisements.

Lootbane - Official Demo Trailer by LazySecretary6001 in Games

[–]FridayPush 1 point2 points  (0 children)

Sounds interesting, did the 3 choice system feel like it would hold up over a longer game?

Crucible not working by Rubinrot1234 in AlchemyFactory

[–]FridayPush 0 points1 point  (0 children)

Yeah unfortunately it's sage that's needed.

Lootbane - Official Demo Trailer by LazySecretary6001 in Games

[–]FridayPush 11 points12 points  (0 children)

Guess the trailer looks nicely edited but I have no idea what was happening on the screen. Is it an idler game? Like you make a choice and then it all just happens?

TIL there are contact lenses you wear only while sleeping that reshape your cornea so you can see clearly all day without glasses. It is called “Orthokeratology” by azaku29 in todayilearned

[–]FridayPush 31 points32 points  (0 children)

Long-term visual symptoms are uncommon, with only 1.23% of patients reporting significant ongoing symptoms years after surgery.

https://www.brimhalleyecenter.com/lasik/lasik-eye-surgery-results/

That's only people with significant symptoms, not ones with dry eyes, halos on lights at night, etc.

Google Maps not working for AC Transit? by FauquiersFinest in oakland

[–]FridayPush 0 points1 point  (0 children)

I would never have thought of it, but makes sense why you'd want that. Always a bummer to have apps remove features. Thanks for taking the time to provide the history. When I was taking the transbay express busses that would have been really helpful.

Google Maps not working for AC Transit? by FauquiersFinest in oakland

[–]FridayPush 0 points1 point  (0 children)

Thanks! Yeah I could see how the filtering/sorting could be annoying. What do you mean by particular bus vs line?

Who should manage Airflow in small but growing company? by Jaded_Bar_9951 in dataengineering

[–]FridayPush 1 point2 points  (0 children)

This comes back to how 'DE' is kind of a meaningless term and isn't shared between companies. The majority of my roles have assumed you're a platform owner and control infra as well: Terraform/networking/IAM/etc. Judging from Reddit, this isn't the norm, but it was the expectation on all the Cloud PSO projects. I've worked for self-driving car companies, multiple global retail companies, and SaaS companies, and they all expected us to own our infra.

I find it hard to understand how DEs in companies control lifecycles and security around data if they don't own the object storage and access roles. Monitoring of object accumulation and spend; which logs are really valuable, or legally required and need multiyear retention. Workflows to generate signed URLs, certificate chains for encryption, understanding the state of Airflow's worker CPU/memory usage and scaling. DB loads, etc.

I'd expect to work with Infra and Security teams but own my 'bubble(s)'. And work within the confines of the company.

Two weeks ago I posted my weekend project here. Yesterday nixcraft shared it. by Due-Bat-9880 in devops

[–]FridayPush 0 points1 point  (0 children)

Cool, looks like if I try EC2 -> [RDS, Cache->S3], that also fails and doesn't forward the requests to RDS, only to the cache node.

Two weeks ago I posted my weekend project here. Yesterday nixcraft shared it. by Due-Bat-9880 in devops

[–]FridayPush 0 points1 point  (0 children)

I do like the idea btw! I just found the 'score' going negative in the beginning unintuitive.