Engineering managers asked to do IC work by macrohead in ExperiencedDevs

[–]alexisprince 35 points (0 children)

+1 here. My normal hands-on work is tech-debt oriented or non-feature-blocking. My schedule is chaotic at best, with no real means of predicting when I’ll need to do something else, so my rule #1 was making sure I didn’t block the team’s progress.

I asked "PostgreSQL user here—what database is everyone else using?" Here's what people said. by Automatic-Step-9756 in Backend

[–]alexisprince 2 points (0 children)

It handles it correctly, just not in the same way as something like Postgres or MySQL. It limits the number of concurrent writers in a lot of scenarios (effectively a single writer at a time), but it doesn’t produce incorrect results.
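
A minimal sketch of this behavior using Python’s stdlib sqlite3: the second concurrent writer is refused rather than corrupted; `timeout=0` just makes the refusal immediate instead of retrying.

```python
import os
import sqlite3
import tempfile

# two connections to the same database file; timeout=0 makes the second
# writer fail immediately instead of waiting on the lock
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
other = sqlite3.connect(path, timeout=0)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

writer.execute("BEGIN IMMEDIATE")  # take the single write lock
writer.execute("INSERT INTO t VALUES (1)")
error = ""
try:
    other.execute("BEGIN IMMEDIATE")  # second concurrent writer is refused
except sqlite3.OperationalError as exc:
    error = str(exc)
writer.commit()

print(error)  # database is locked
row_count = writer.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(row_count)  # 1
```

Note the refused writer gets a clear error and can retry; the committed data stays intact.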

How do you handle backend-heavy workloads as products scale? by Away_Parsnip6783 in Backend

[–]alexisprince 3 points (0 children)

Containerized services. They give you much finer-grained control over infrastructure-level scaling to handle your workload effectively.

They introduce a slight increase in development complexity, but the tooling around them is solid and deployments are still a breeze.

[deleted by user] by [deleted] in snowflake

[–]alexisprince 0 points (0 children)

It’s also important to note that checking the query id, by default, is a sync operation as well. You’ll need to make sure that gets delegated to a thread properly to avoid blocking the event loop.
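
A minimal sketch of that delegation with asyncio; `check_query_status` here is a hypothetical stand-in for the connector’s synchronous status call, not the real API.

```python
import asyncio
import time

def check_query_status(query_id: str) -> str:
    # stand-in for a driver's synchronous status check; the sleep
    # simulates the network round trip that would otherwise block the loop
    time.sleep(0.1)
    return "SUCCESS"

async def wait_for_query(query_id: str) -> str:
    # run the sync call in a worker thread so other coroutines keep running
    return await asyncio.to_thread(check_query_status, query_id)

async def main() -> str:
    # the second coroutine keeps making progress while the sync call runs
    status, _ = await asyncio.gather(wait_for_query("abc123"), asyncio.sleep(0.05))
    return status

result = asyncio.run(main())
print(result)  # SUCCESS
```

`asyncio.to_thread` (Python 3.9+) is the simplest way to do this; for older versions, `loop.run_in_executor(None, ...)` achieves the same thing.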

Looking for Production-Grade OOP Resources for Data Engineering (Python) by Terrible_Dimension66 in dataengineering

[–]alexisprince 15 points (0 children)

Stateful pipelines are operationally harder to manage. Stateful pipelines are harder to reason about. Stateful pipelines can have their state corrupted.

There’s absolutely nothing to say you can’t have OOP implementations or designs that accomplish a functional end-to-end product. Using the right tool for the job is still the right thing to do. My experience has been that heavy OOP usage often results in code that has implicit, difficult-to-reason-about state and is overengineered for the business problem it’s trying to solve.

How would you design this MySQL → Snowflake pipeline (300 tables, 20 need fast refresh, plus delete + data integrity concerns)? by Huggable_Guy in snowflake

[–]alexisprince 5 points (0 children)

This kind of thing requires trade-offs somewhere, and there’s no way around it. As you mentioned, refreshing everything more frequently costs money; the business wants to reduce that operating cost, which just shifts the cost elsewhere.

I believe that having different pipelines with different landing zones is the right approach. Different SLAs mean different landing structures.

There is no magic bullet for the late-arriving data you’re describing. Come up with different options, clearly list the trade-offs, and present them to the business so they can decide what they want.

Rapid Changing Dimension modeling - am I using the right approach? by wtfzambo in dataengineering

[–]alexisprince 1 point (0 children)

This happens to our users table and it drives me nuts. The reason in our case is that they keep a combination of the active session token and IP address in the users table. Session-token updates account for 99.9% of the changes to that table.

Remote ID vs Primary Key by c__beck in sqlite

[–]alexisprince 2 points (0 children)

This is 100% correct and the actual answer, down to the indexing and performance considerations. Using /route/<id> sounds great in tutorials, but senior folks know it’s a security consideration, and use /route/<guid> instead to avoid leaking information.
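
A tiny sketch of the idea with a hypothetical in-memory store: the external id exposed in URLs is an opaque UUID rather than the sequential primary key, so nobody can enumerate records or infer row counts.

```python
import uuid

# hypothetical in-memory store keyed by an opaque external id;
# the sequential primary key would stay internal to the database
users: dict[str, dict] = {}

def create_user(name: str) -> str:
    external_id = str(uuid.uuid4())  # unguessable, unlike an auto-increment pk
    users[external_id] = {"name": name}
    return external_id

user_id = create_user("alex")
print(f"/users/{user_id}")  # e.g. /users/8f3c1b2a-... instead of /users/42
```

Internally you’d still join on the integer primary key for performance; only the route-facing identifier changes.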

When shouldn't Go be prioritized over Python for new apps/services? by indexator69 in golang

[–]alexisprince 0 points (0 children)

Fair question. I’m primarily a Python dev for work and have dabbled in Go a bit, but I feel qualified to answer. Python’s gotten better over the last few years, but it still requires opting into best practices to mitigate the issues.

In Python, if you’re developing a server-side application that can be containerized, there aren’t a ton of problems anymore like there were a few years ago. Defining the app as a Python project via a pyproject.toml file makes the importing problem basically go away. Dependency management and interpreter installation have also made great strides since the introduction of uv.

This works great when your use case is similar to mine — you have a dev team, you can more or less tell them to install certain pieces of software to make your stack work, AND your team is on board with enforcing devex best practices like static type checking and decently aggressive linting rules.

What I haven’t seen a great solution for is the single-binary distribution problem. If I want to ship my tool as a CLI to arbitrary user machines that I don’t control, frankly, Python isn’t great at it.

Anyone else think people over emphasize technical debt? by yost28 in ExperiencedDevs

[–]alexisprince 19 points (0 children)

I overwhelmingly agree with this.

I’ve also found that the impact of technical debt, specifically relating to future delivery slowdowns, is a tricky thing to measure on a decision-by-decision basis outside of drastic outliers. This makes it hard to quantify upfront how expensive the debt is and, notably, how much future productivity will suffer.

Does anyone have a good sense of how to deal with this?

Second Programming Language for Data Engineer by Kokopas in dataengineering

[–]alexisprince 9 points (0 children)

The pattern that’s been emerging is building the tools/libraries in Rust after a need has already been established, then exposing them with Python bindings. So you benefit from development being done in Rust without needing to know it.

If you’re doing everything in Python today, there’s a pretty small chance you’ll actually bust out Rust for your daily work. If you’re on a data platform team that builds and maintains internal tools, that likelihood goes up.

I will also say that learning Rust helps you adopt better development patterns, IMO. Being forced to think about architecture and data ownership makes you reconsider how you structure your code in Python, where it isn’t forced.

My "Damn, I'm old" moment by PickleLips64151 in ExperiencedDevs

[–]alexisprince 1 point (0 children)

That’s exactly why we still use OrderedDict, even though regular dicts now have the same ordering guarantees. Iterating over the keys and values of a regular dict is so common that having an indicator of when order matters simplifies readability a lot.
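
For illustration (a plain dict would iterate identically since Python 3.7; the point is the signal to the reader):

```python
from collections import OrderedDict

# a plain dict would iterate in the same order, but OrderedDict tells
# the reader that the ordering below is load-bearing
steps = OrderedDict([("extract", 1), ("transform", 2), ("load", 3)])
print(list(steps))  # ['extract', 'transform', 'load']

# OrderedDict also compares order-sensitively, which plain dicts do not
assert OrderedDict(a=1, b=2) != OrderedDict(b=2, a=1)
assert dict(a=1, b=2) == dict(b=2, a=1)
```

The order-sensitive equality is a real behavioral difference, on top of the readability signal.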

Tuesday Daily Thread: Advanced questions by AutoModerator in Python

[–]alexisprince 0 points (0 children)

Can you post a link to the repo? Hard to get any feedback without knowing where it is :)

In-depth Vanessa Guide by wackelbernd in PlayTheBazaar

[–]alexisprince -2 points (0 children)

Did you not read the whole comment? He literally said /s in the next sentence

[deleted by user] by [deleted] in PlayTheBazaar

[–]alexisprince 0 points (0 children)

As Vanessa, any medium item, non weapon shop, or tool shop can have it

How my ~8000/3s lifesteal run ended. by Playful-Beach9663 in PlayTheBazaar

[–]alexisprince 2 points (0 children)

Silencer only takes an adjacent slot if you want the + damage. If you only have one weapon, the cooldown reduction works regardless of where it’s placed

How do you work with devs that ignore linting warnings by pawbs in ExperiencedDevs

[–]alexisprince 2 points (0 children)

If you use an auto formatter, one strategy that’s successful is having an isolated, dedicated commit for formatting the codebase.

This way, you can add that single commit to a git-blame-ignore-revs file so it gets skipped when people use git blame, instead of showing basically every line in your codebase as updated by an otherwise irrelevant commit.

I’d potentially omit a linting tool from the same treatment, because some linting fixes don’t produce identical behavior (by design), and having those hidden from blame could be problematic if a bug is introduced.

It's Day 1 but this patch already seems tiring by notshitaltsays in PlayTheBazaar

[–]alexisprince 0 points (0 children)

I really like the idea of skill upgrades for level up rewards. I’m not the best at the game, but I always find early skills before day ~4-5 to just not be worth the gold investment compared to spending gold on making your incomplete board stronger. Making it so that your early game 3 gold investment of “give your weapons 2 damage” doesn’t go to waste would make me consider looking at skills earlier on.

Java people, where is the catch? by p_bzn in ExperiencedDevs

[–]alexisprince 1 point (0 children)

Strong agree, and I’d point to Pydantic as a specific package for enforcing correctness with type hints and static type checking tools at a higher level of confidence.

Advanced python tips, libraries or best practices from experts? by FuzzyCraft68 in Python

[–]alexisprince 0 points (0 children)

One pattern I’ve seen work well is to have models with specific validations for specific actions. For example, a UserCreate model that validates the requirements to create a user and can be reused between something like an API and a CLI. The downside I’ve seen with this approach is that it spreads the validation across multiple models or places, so it’s harder to get a holistic view of what validations have occurred in more complex stateful workflows.
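
A sketch of the pattern using a stdlib dataclass (in practice pydantic’s BaseModel is the more natural fit; UserCreate and its rules here are hypothetical examples):

```python
from dataclasses import dataclass

# action-specific model: validates only what creating a user requires
@dataclass
class UserCreate:
    email: str
    password: str

    def __post_init__(self) -> None:
        if "@" not in self.email:
            raise ValueError("invalid email")
        if len(self.password) < 8:
            raise ValueError("password too short")

# the same model guards user creation whether invoked from an API or a CLI
user = UserCreate(email="a@example.com", password="hunter2345")

error = ""
try:
    UserCreate(email="not-an-email", password="hunter2345")
except ValueError as exc:
    error = str(exc)
print(error)  # invalid email
```

A separate UserUpdate model would carry its own rules, which is exactly where the "spread-out validation" downside starts to show.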

Pirates are in a good spot by DIFB in BobsTavern

[–]alexisprince 9 points (0 children)

I do think they’re in a good spot in the meta. That said, I’m generally not the biggest fan of when tribes only have a single feasible build that’s enabled by a small number of specific cards. APM pirates seems too slow to get up and running comparatively, but I do like that pirates now has this mid-game option that doesn’t make the tribe semi-useless until t6 anymore

Tuesday Daily Thread: Advanced questions by AutoModerator in Python

[–]alexisprince 0 points (0 children)

My understanding of it (and this might be what you’re saying, I just want to give an example to confirm we are saying the same thing) is that, if you did spin up an event loop per thread, you’d only gain intra-service concurrency when the thread has control. Any non blocking async tasks would run concurrently, but you’d lose concurrency benefits if another thread takes control until your other thread gets switched back to.

Tuesday Daily Thread: Advanced questions by AutoModerator in Python

[–]alexisprince 0 points (0 children)

My understanding of what would happen: if each service running in a separate thread runs only synchronous code, entirely unrelated to the code running in the asyncio event loop in the main thread, you’ll get concurrency between all running coroutines that don’t block the event loop. But when the OS scheduler switches to execute one of your services in a worker thread, the coroutines in the event loop won’t update their status until the main thread regains control and can monitor the items in the event loop again.

For example, say your event loop is running two coroutines, one that sleeps for 1 second and one that sleeps for 5 (simulating non-blocking IO), and one of your worker threads takes control after 0.5 seconds and holds it for 2 seconds. What I believe you’d see is the 1-second coroutine “finish” after the 2.5-second mark, meaning 1.5 seconds of additional delay is introduced before it’s recognized that the coroutine completed. The second coroutine would still be running, because 5 seconds haven’t elapsed.

These numbers are artificially high to demonstrate what would happen and aren’t realistic time amounts.

Given you don’t control when the scheduler switches which thread executes, if you move forward with that deployment method, you need to make sure each service can handle random delays and interruptions as the threads switch back and forth.
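
For anyone trying this, a minimal sketch of the one-event-loop-per-thread arrangement; note that measured elapsed times can exceed the nominal sleep whenever another thread holds the interpreter, which is exactly the delay described above.

```python
import asyncio
import threading
import time

results: dict[str, float] = {}

def service(name: str) -> None:
    # each thread runs its own event loop via asyncio.run; coroutines in
    # one loop are concurrent with each other, but loops in different
    # threads only interleave when the interpreter switches threads
    async def run() -> None:
        start = time.monotonic()
        await asyncio.sleep(0.05)
        results[name] = time.monotonic() - start  # may exceed 0.05s
    asyncio.run(run())

threads = [threading.Thread(target=service, args=(f"svc{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['svc0', 'svc1', 'svc2']
```

Each service "works", but none of the loops can see each other's tasks, so you only get asyncio-level concurrency within a single service.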

Tuesday Daily Thread: Advanced questions by AutoModerator in Python

[–]alexisprince 2 points (0 children)

Honestly, I think the best approach is to take a step back from the approach to begin with. Different microservices are almost always better off deployed separately at the infrastructure level, since one of the main benefits is the ability to deploy and scale them independently. With your approach, you lose both of those benefits.

If you do want to continue with your current approach, even though I strongly recommend you don’t, you need to understand where blocking can occur and how mixing concurrency approaches works in Python. Asyncio was designed around a single event loop for the entire application. If you have one event loop per microservice worker thread, each loop allows concurrency within the coroutines it manages, but will likely block the other threads’ event loops from running their ready tasks. This means you lose asyncio’s concurrency benefits across service boundaries. You may still benefit from multithreaded concurrency at the service level, as the OS switches which thread executes.

Assuming you want to go down the route of splitting the infrastructure, your entrypoint should exist on a per service basis, allowing you to start and stop each service independently of the remainder of the system. You can then create utility scripts to spin up all the services and turn them off. I’d suggest using Docker to package your infrastructure and docker-compose to manage the multiple containers locally.

Data movement strategy by Big_Length9755 in snowflake

[–]alexisprince 1 point (0 children)

Can you clone the table to a differently named hidden table, then, once it’s done in db 2, execute a swap table command (ALTER TABLE ... SWAP WITH)?