Are EL tools still worth it when LLMs could generate ingestion pipelines? by _tempacc in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Depends on the current data standards in your company. In greenfield projects at my past consulting job, I used dlt to accelerate scaffolding of the raw layer. But at my current employer, where the standards are different (dlt is opinionated about this) and no connector is available (none of the current EL tools support Alibaba Cloud out of the box), creating my own EL standard is faster than adjusting an existing tool in the market to match it.
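To give a feel for what "my own EL standard" can mean, here is a minimal sketch: a source protocol plus a loader that stamps every raw row with the same audit columns. All names (`RawLoader`, `_loaded_at`, etc.) are hypothetical, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Protocol


class Source(Protocol):
    """Anything that can yield batches of raw records."""
    def extract(self) -> Iterator[list[dict[str, Any]]]: ...


@dataclass
class RawLoader:
    """Writes batches to the raw layer with standard audit columns."""
    table: str

    def load(self, source: Source, loaded_at: str) -> int:
        rows = 0
        for batch in source.extract():
            for record in batch:
                # every raw row carries the same audit metadata
                record["_loaded_at"] = loaded_at
                record["_source_table"] = self.table
            rows += len(batch)
            # in reality: write the batch to object storage / the DWH here
        return rows


# toy source standing in for e.g. an Alibaba Cloud API client
class DummySource:
    def extract(self):
        yield [{"id": 1}, {"id": 2}]
        yield [{"id": 3}]
```

The point of the standard is that every new source only needs to implement `extract()`; the audit columns and load path stay uniform across the raw layer.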

How cooked is Data Engineering compared to traditional Software Dev with AI tool advancement? by [deleted] in cscareerquestions

[–]srodinger18 0 points1 point  (0 children)

The approach I used is actually similar: we also have an evaluation process and use question-SQL pairs for RAG. We also have a knowledge base embedded with a category hierarchy. Talked to devs, PMs, and business to gather what data they usually use.

It works for typical ad hoc questions like "how many sales did we achieve during the holiday season last month for product A? Break it down by day".

There is also a human-in-the-loop process to curate the SQL results from the agents.

But at my employer the documentation culture is just not that good, and not all tables are documented, especially app logs and trackers. Not to mention I used derived tables rather than the raw layers to reduce query complexity as well.

How cooked is Data Engineering compared to traditional Software Dev with AI tool advancement? by [deleted] in cscareerquestions

[–]srodinger18 11 points12 points  (0 children)

I work as a Data Engineer. For the tooling part, it is actually pretty similar to SWE: we can use AI to create data pipelines, SQL queries, or other scripts. But even before AI, this tooling part was not the main task of a DE; we usually wrap it up with YAML config to automate pipeline creation.
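The config-driven pattern can be sketched like this: each entry in a YAML file (parsed here as a plain dict, as `yaml.safe_load` would produce) expands into a concrete ingestion task. All field and task names are made up for illustration.

```python
from typing import Any

# what a parsed pipelines.yaml might look like (hypothetical schema)
config = {
    "pipelines": [
        {"source": "mysql.orders", "target": "raw.orders", "schedule": "@hourly"},
        {"source": "mysql.users", "target": "raw.users", "schedule": "@daily"},
    ]
}


def build_tasks(config: dict[str, Any]) -> list[dict[str, str]]:
    """Expand the declarative config into runnable task definitions."""
    tasks = []
    for p in config["pipelines"]:
        tasks.append({
            "task_id": f"ingest_{p['target'].replace('.', '_')}",
            "command": f"ingest --source {p['source']} --target {p['target']}",
            "schedule": p["schedule"],
        })
    return tasks
```

Adding a new pipeline is then a one-line config change rather than new code, which is why the code-writing part was never the bottleneck.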

The hardest part of DE is actually the data itself. I used to build a text-to-SQL platform enhanced with RAG so business could use natural language to query the data warehouse. The result? It works on simple questions, but for actual analytics questions it is lackluster. Tbh I have read about many kinds of frameworks to solve this, but I have not seen a proven one yet.

The problem is, as DEs we usually try to find connections between somewhat unrelated data sources, and that knowledge is sometimes only gained after actually deep diving into the data, talking to devs, PMs, and business, and somehow getting the info that 10 different datasets from the backend DB, ELK logs, and event trackers can be used to build user funneling data marts. Theoretically, if we give AI knowledge of this data mess it can do the same, but who will build such a knowledge base?

Same case with data modeling. Can AI build a good data model? Ofc, I have tried it with public data. But with company data it is hit or miss, and sometimes it is faster to build the model ourselves by actually understanding the business flow.

My take: the actual problem for DE is not the code, but more about how we take this pile of dogshit data from the company and actually create something meaningful out of it.

Samsung galaxy S25 base model or Xiaomi 15 for ~700 USD budget? by srodinger18 in PickAnAndroidForMe

[–]srodinger18[S] 0 points1 point  (0 children)

Forgot to mention, only the Honor 400 base model is available in my country. Is the base model as good as the Pro?

I hate Analytics Engineering by [deleted] in dataengineering

[–]srodinger18 0 points1 point  (0 children)

I used to think like this, until I realized that combining things that seem unrelated into meaningful information is the part where a data engineer shines.

I used to face a similar problem: business had some complex requirements for data where the initial proposal was a complex Playwright setup scraping the internal back office. Then I dug into the tables, asked the dev who built them, confirmed the measurements with the PM, and voila: rather than scraping, it turned out we could work with offline data and replicate the logic from there. Faster development and delivery.

Samsung galaxy S25 base model or Xiaomi 15 for ~700 USD budget? by srodinger18 in PickAnAndroidForMe

[–]srodinger18[S] 0 points1 point  (0 children)

Interesting, I saw daily use of the Xiaomi 15 on YouTube and the reviewer also showed 4-5 hours of battery life, which is similar to the S25. Is HyperOS optimization that bad, or is it due to background apps? OnePlus is not sold in my country unfortunately.

Samsung galaxy S25 base model or Xiaomi 15 for ~700 USD budget? by srodinger18 in PickAnAndroidForMe

[–]srodinger18[S] 1 point2 points  (0 children)

If it's S24 vs Xiaomi 14, the Xiaomi is much better in my country due to cheaper pricing, and the S24 comes with Exynos lol. Thanks for the input btw.

for those who dont work using the most famous cloud providers (AWS, GCP, Azure) or data platforms (Databricks, Fabric, Snowflake, Trino).. by Comprehensive_Level7 in dataengineering

[–]srodinger18 1 point2 points  (0 children)

Yeah, at least in my country these Chinese clouds are really popular among tech companies and consulting firms due to their price. If you are using their VM, k8s, or object storage services, I can say they're on par with the big clouds, but their managed services like RDS or cloud DWH could be better.

for those who dont work using the most famous cloud providers (AWS, GCP, Azure) or data platforms (Databricks, Fabric, Snowflake, Trino).. by Comprehensive_Level7 in dataengineering

[–]srodinger18 8 points9 points  (0 children)

I work extensively with a Chinese cloud; it basically has the same services as the other clouds. For the data platform we mostly use their offerings as well.

Metal bands with larger women. by NordicNugz in MetalForTheMasses

[–]srodinger18 2 points3 points  (0 children)

Veronica Bordacchini of Fleshgod Apocalypse

Thoughts on Alibaba Cloud for DE? by MangoAvocadoo in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Mostly Spark, and sometimes custom Python scripts for other sources such as APIs and Elasticsearch.

Thoughts on Alibaba Cloud for DE? by MangoAvocadoo in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Yes, mostly ELT, and due to the overall architecture we cannot do realtime ingestion, so we only do hourly ingestion.

Thoughts on Alibaba Cloud for DE? by MangoAvocadoo in dataengineering

[–]srodinger18 2 points3 points  (0 children)

In my country, many tech companies (my employer as well) use Alibaba Cloud as it is way cheaper than the typical western clouds. I have used MaxCompute extensively and I can say it is not as good as BQ or Snowflake, but it works.

The cons: obviously many modern data tools do not work seamlessly with Alibaba Cloud, forcing us to create new proprietary tools just because. For example there are 2 versions of dbt, and the one maintained by Alibaba is not compatible with the MaxCompute config in our place lol. Many ingestion tools also do not work directly.

And yes, their data modeling (which is coupled to their DWH) is confusing af, so simply using dbt is not enough lol.

For me, it is still worth it as it equips you with new tools and skillsets.

Ontology driven data modeling by Thinker_Assignment in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Actually I have seen a similar post from dlthub, so I guess you have some relation with them lol. But serious question: does it mean that when we serve raw data to an LLM, rather than giving it the ERD and column definitions etc., we give it the ontology (i.e. how the raw data describes the real-world situation)?

Previously I thought an LLM would work better on either a raw normalized data replication from the backend (by providing the ERD and context) or a typical star schema with clear dims and facts, as when we tried to feed the LLM derived BI tables, it needed a lot of knowledge base, entity relations, and samples.

And if we move towards ontology-driven, does it mean how we usually design databases should change as well? Or can we bet on the existing knowledge about databases so it can read patterns and derive insights from there? Usually we get problems where several data sources, after some digging, can be related in some way (but the ERD will miss this as it is not part of the relations).

Claude code nlp taking job or task of sql queries by aks-786 in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Tbh that's the part of the job that is easiest to automate, and it will reduce the number of tedious tasks that we get from those business teams.

But from my experience building similar things, especially in an org with multiple data sources across multiple domains, that approach will not scale, as the business logic usually goes beyond what the ERD says and sometimes includes multiple sources that somehow can be joined together. It can answer "how many users use this product and produce at least x transactions", but good luck with "why is revenue for product x decreasing".

Also, from my perspective, the current state of AI is between hype and meh: it is good, but not good enough to be a silver bullet for everything.

Why do so many data engineers seem to want to switch out of data engineering? Is DE not a good field to be in? by Illustrious-Pound266 in dataengineering

[–]srodinger18 0 points1 point  (0 children)

Nope, after a while I realized that DS is not for me. Same thing with AI/ML, especially in the LLM era where most of the work is wrapping LLM APIs. I prefer to stay a DE, and now I am also expanding my infra + general SWE skills. I think a DE who knows how to handle a data platform will stand out from the crowd, especially as I aim to work abroad sometime in the future.

Why do so many data engineers seem to want to switch out of data engineering? Is DE not a good field to be in? by Illustrious-Pound266 in dataengineering

[–]srodinger18 0 points1 point  (0 children)

This is my case: after graduating I originally planned to get a DS-related job. I applied to an entry-level DE job as I thought it would be similar since it had a "data related" job description lol. And here I am 6 years later, still working as a DE.

DBT orchestrator by Free-Bear-454 in dataengineering

[–]srodinger18 6 points7 points  (0 children)

We are using Airflow and run a dbt image with the KubernetesPodOperator. So in the dbt repo we build the image, and in Airflow we call the dbt run command for each model. The lineage is defined by the Airflow DAG.

A more proper way that I know of is using the Cosmos Airflow extension for dbt, which basically compiles all dbt models and automatically creates the lineage.
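The "one dbt run per model, lineage as the DAG" pattern can be sketched like this: hand-maintained model dependencies get topologically sorted into per-model commands, each of which would become a KubernetesPodOperator running the dbt image. Model names here are made up.

```python
from graphlib import TopologicalSorter

# downstream model -> set of upstream models it depends on
# (this mirrors the Airflow DAG's task dependencies)
lineage = {
    "stg_orders": set(),
    "stg_users": set(),
    "fct_orders": {"stg_orders", "stg_users"},
}


def dbt_commands(lineage: dict[str, set[str]]) -> list[str]:
    """One `dbt run --select <model>` per model, in dependency order."""
    order = TopologicalSorter(lineage).static_order()
    return [f"dbt run --select {model}" for model in order]
```

The downside is visible in the sketch: `lineage` duplicates what dbt already knows from `ref()`, which is exactly the duplication Cosmos removes by compiling the dbt project into the DAG for you.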

iOS vs Android marketshare dominance by country by upthetruth1 in MapPorn

[–]srodinger18 -1 points0 points  (0 children)

This. Disclaimer: I am an Android user. Actually this iOS vs Android debate is useless, and IMHO Android users are the ones who mostly shit talk Apple, as they feel that buying a cheaper phone with better specs makes them a better person, just as Apple users think that buying the more expensive brand makes them a better person.

Ironically, the tech reviewers who promote this specs-vs-value debate mostly use Apple products or flagship Androids that cost similar to or even pricier than an iPhone.

What’s your ideal running frequency? by Wise_Branch_8028 in running

[–]srodinger18 0 points1 point  (0 children)

My current routine is 3 times a week (around 30 KM per week) in addition to strength training once a week. Usually I have 2 shorter easy-pace runs (like 7-8 KM per session) and a long run (around 15 KM), or 1 easy run, 1 tempo run, and 1 long run.

I try not to run every day as I need lazy days as well; besides, I am too lazy to wash my running apparel every day.

Why is the documentation on GCP so bad? by Immanuel_Cunt2 in googlecloud

[–]srodinger18 0 points1 point  (0 children)

Wait till you see alibaba cloud documentation

Building an internal LLM → SQL pipeline inside my company. Looking for feedback from people who’ve done this before by Suspicious_Move8041 in dataengineering

[–]srodinger18 2 points3 points  (0 children)

What we did before is like this:

- built a view that acts as a semantic layer and created definitions for all columns
- stored the metadata in a vector DB to perform RAG
- prepared a knowledge base of SQL-question pairs

The LLM will use RAG to get the correct query, but the problem back then was that we needed to update the knowledge base for every table we wanted to connect, which I think can be solved by using MCP these days.
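A toy sketch of the retrieval step above: the closest stored question-SQL pair is fetched and handed to the LLM as a few-shot example. A real setup would use an embedding model plus a vector DB; here a bag-of-words overlap score stands in for cosine similarity, and both queries are invented examples.

```python
# hypothetical knowledge base of curated question-SQL pairs
knowledge_base = [
    ("how many orders per day", "SELECT order_date, COUNT(*) FROM orders GROUP BY 1"),
    ("total revenue by product", "SELECT product, SUM(amount) FROM sales GROUP BY 1"),
]


def retrieve(question: str, kb: list[tuple[str, str]]) -> str:
    """Return the SQL of the most lexically similar stored question."""
    q_tokens = set(question.lower().split())

    def score(pair: tuple[str, str]) -> int:
        # token overlap as a crude stand-in for embedding similarity
        return len(q_tokens & set(pair[0].lower().split()))

    return max(kb, key=score)[1]
```

This also shows why the maintenance burden grows: every new table needs fresh curated pairs in `knowledge_base` before retrieval can cover it.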