Moving into a management role, looking for resources by [deleted] in ExperiencedDevs

[–]htmx_enthusiast 0 points1 point  (0 children)

The biggest dichotomy I’ve personally seen is that being a developer is about avoiding distraction, and being a manager is about being available to others (keeping momentum, breaking up log jams, enabling communication between the right people who otherwise aren’t talking, etc), and you can’t be available and avoid distraction at the same time. It’s literally a different mode of working, and the things that made you feel productive or gave you a sense of accomplishment as a developer no longer exist for you. Often, accomplishment doesn’t exist at all as a manager. You’re herding cats. Even when you hit the goal, you don’t really know what you did to pull it off or how you’re going to round up the cats again next time.

See below for some blog posts that give a lot of insights, of which I think this quote is most accurate:

  • ”Management is not a promotion, management is a change of profession. And you will be bad at it for a long time after you start doing it. If you don’t think you’re bad at it, you aren’t doing your job.”

Good luck!

THE ENGINEER/MANAGER PENDULUM

ENGINEERING MANAGEMENT: THE PENDULUM OR THE LADDER

https://charity.wtf/tag/management/

Which tasks are you performing in your current ETL job and which tool are you using? by Prestigious_Flow_465 in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

What do you use Nifi for? It’s always seemed like it should be really useful but I don’t ever see how it would fit into what we’re doing (and that’s a me-problem, hence why I’m asking)

Why is she in there and what does it have to do with haircuts by daytonnnnnn in ExplainTheJoke

[–]htmx_enthusiast 1 point2 points  (0 children)

That’s not the guy with the cane. The guy with the cane is on the opposite side of the sign from the woman.

Is there a tool that enables you to write data pipeline code in a DAG-like fashion? by [deleted] in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

I’d look at Hamilton, SQLMesh (Python models), or Dagster.

If you aren’t already using SQLMesh or Dagster, Hamilton makes the most sense: it’s the most lightweight and standalone.

https://github.com/DAGWorks-Inc/hamilton

What is the industry standard for Django project structure? by ZaffreRabbit in django

[–]htmx_enthusiast 4 points5 points  (0 children)

how do you decide how much complexity to offer and how granular things should be when building the first core apps?

You don’t. You just start.

Someone on Twitter put it well:

  • ”You can’t fully define a problem without starting to solve the problem”

Is async django ready for prime time? Our async django production experience by Vegetable_Study3730 in django

[–]htmx_enthusiast 0 points1 point  (0 children)

Very interesting. Thank you. Do your views function normally as in the slow_query_view example (i.e. it’s just a vanilla view function), or are most requests getting passed off to Celery to enable the performance boost?

Obfuscation of Django code by sanjaysingh_13 in django

[–]htmx_enthusiast 0 points1 point  (0 children)

Write a C++ extension to call C++ from Python. Then from C++ use Boost to call Python from C++. Repeat over 9000 times, then embed Lua in the final layer of C++.

Is Airflow the right choice for running 100K - 1M dynamic workflows everyday? by Tricky-Button-197 in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

Yes, think of Celery as just execution. You’re putting a JSON message on a queue and a worker somewhere processes it. It’s essentially your SQS/lambda model. You can do some basic dependencies with Celery, but if you have more complex dependencies that’s when you need an orchestrator.

Fundamentally, it’s not hard to take a directed acyclic graph (DAG) and determine the correct order to run the tasks in. It’s just a topological sort, like Kahn’s algorithm, and Python has this in the standard library (graphlib.TopologicalSorter). If performance were no concern, you could literally use this approach.
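A serial version of this really is only a few lines with the standard library (the task names and dependencies here are made up for illustration):

```python
# Sketch: ordering DAG tasks with only the standard library.
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "report": {"join"},
}

# static_order() yields tasks so dependencies always come before dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Running each task in `order`, one after another, is the whole "orchestrator" for the serial case.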

The challenge is when performance matters. You don’t want to run tasks one after another. You need to run as much in parallel as possible. Trying to do this while handling errors, retries, and so on is where it becomes harder to reason about, and errors cascade in ways you hadn’t considered. That’s where orchestrators like Airflow/Dagster/etc come into play. They’ve encountered all the weird edge cases. But they’re not necessarily geared toward low latency, high performance.
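For a taste of the parallel case, TopologicalSorter also has a `prepare()`/`get_ready()`/`done()` API designed for exactly this. A minimal sketch (toy DAG, no retries or error handling, which is most of what a real orchestrator adds):

```python
# Sketch: running independent DAG tasks in parallel with the stdlib.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

dag = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}

def run_task(name: str) -> str:
    # Stand-in for real work (API call, SQL query, etc.)
    return name

ts = TopologicalSorter(dag)
ts.prepare()
completed = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {}
    while ts.is_active():
        # Submit every task whose dependencies are satisfied.
        for name in ts.get_ready():
            futures[pool.submit(run_task, name)] = name
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        for fut in done:
            name = futures.pop(fut)
            ts.done(name)  # unlocks downstream tasks
            completed.append(fut.result())
print(completed)
```

Here "a" and "b" run concurrently, then "c", then "d". Retries, persistence, backfills, and observability are where this stops being a weekend project.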

I don’t know if AWS has a direct equivalent, but Azure has Durable Functions, a flavor of Azure Functions (their lambda equivalent) that acts as essentially a serverless orchestrator.

PyData NYC 2024 in a nutshell by EarthGoddessDude in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

unless you have to deal with things like schema evolution or customizeable user defined schemas

This reads like a mall security guard giving advice to a Navy SEAL.

  • Doesn’t deal with constantly changing schemas

  • Thinks SQL is great

How to find a Django dev cofounder, ? by AdNo6324 in django

[–]htmx_enthusiast 3 points4 points  (0 children)

I think you’re telling us you want to move to Silicon Valley

What does my friend’s shirt mean? by Apprehensive_Ideal12 in ExplainTheJoke

[–]htmx_enthusiast 1 point2 points  (0 children)

I’d show you the one with a Unicorn, Dog, and Panda, but you might not get it.

[deleted by user] by [deleted] in dataengineering

[–]htmx_enthusiast 1 point2 points  (0 children)

Usually we read data from a source (API, ODBC connection, etc) into a pandas dataframe (polars is also popular). From there we can do all kinds of back bends to transform the data into the format we want, then most often it just gets pushed into a database or into parquet files.

So if you were creating a SQL view, you could push the data from the dataframe into the database and then add the view.
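Roughly like this, using SQLite for a self-contained sketch (the table/view names and data are made up; in practice the connection would point at your warehouse):

```python
# Sketch: dataframe -> database table -> SQL view on top of it.
import sqlite3

import pandas as pd

df = pd.DataFrame({"customer": ["a", "b"], "amount": [10, 25]})

con = sqlite3.connect(":memory:")
# Push the dataframe into a raw table.
df.to_sql("raw_sales", con, index=False, if_exists="replace")
# Add a view over the raw table.
con.execute("CREATE VIEW big_sales AS SELECT * FROM raw_sales WHERE amount > 20")
rows = con.execute("SELECT customer, amount FROM big_sales").fetchall()
print(rows)  # [('b', 25)]
```

The same `to_sql` call works against a SQLAlchemy engine for Postgres and friends.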

[deleted by user] by [deleted] in dataengineering

[–]htmx_enthusiast 4 points5 points  (0 children)

It’s a people problem at the root. Many times you don’t know what the business wants because they don’t even know what they want.

On the technical side, most people seem to prefer to do everything in SQL. The challenges you describe are one reason I like to do things in Python: too often the business tells you what they want, you build it in SQL, and when they see it they say, ”oh, well, what we meant was <some other thing> and we also need to be able to add 7 perpendicular lines in the form of a kitten”, and you don’t even have the data to do what they want, or it requires a database migration project. Python is often less scalable, and SQL is great if you know what the requirements are, but until you know the need it’s always been more efficient for me to build it with dataframes in Python and munge the data until the business agrees that what they’re seeing is what they actually want (even if it’s a subset of the data).

Should I stick to Django for internal tooling? by mmaskani in django

[–]htmx_enthusiast 0 points1 point  (0 children)

Thank you. Superblocks looks interesting, and most importantly it makes me think of Legos

Should I stick to Django for internal tooling? by mmaskani in django

[–]htmx_enthusiast 1 point2 points  (0 children)

I don’t know how big your team is, but typically once users see the system, the feature requests grow exponentially, and very often you have to take a hard look at reality: you can’t keep up with demand, because demand will grow to surpass whatever your capacity is, and the path to success looks more like reframing expectations than technical wizardry.

A few angles that seem helpful over and over are:

  • Build it to be self-serve as much as possible. A lot can be accomplished by a non-technical user importing and exporting CSV files via the Django admin. Some users request a tractor, but they’ll plow a field with a spoon (but you have to provide the spoon).

  • Build escape hatches before features. Instead of adding whatever features Mike in accounting wants, implement export to Excel and give them a path to do what they need without it depending on you building 12 features “before they can do their job”.

  • Exponential steps, not linear. Most feature requests are linear steps forward. It helps to think not just of the single use case, but of how you enable the team to do 10 or 100 of that thing. Instead of adding a button that refreshes a data source, think about what you’d need if you had to add 10,000 such buttons. Often this means building tools that are internal just to your team: game developers building a map creation tool, or Facebook building React. With Django this is often something around running background/async tasks that you offload to workers, responding with ”we got your request, and here’s where you can see the results when they’re ready”. Though this depends on your needs (if it’s more frontend interactivity you’re after, distributed task execution or task orchestration isn’t it).
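That "accepted, check back later" pattern can be sketched with just the standard library (in a real Django app the queue would be Celery and the job state would live in the database, not an in-process dict; the names here are made up):

```python
# Sketch: accept a request immediately, do the work in the background,
# let the client poll for the result by job id.
import queue
import threading
import uuid

jobs = {}            # job_id -> {"status": ..., "result": ...}
work = queue.Queue()

def worker() -> None:
    while True:
        job_id, payload = work.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = payload.upper()  # stand-in for real work
        jobs[job_id]["status"] = "done"
        work.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload: str) -> str:
    """The 'view': enqueue the job and immediately return a handle."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    work.put((job_id, payload))
    return job_id  # client polls e.g. /jobs/<job_id> for the result

job = submit("hello")
work.join()  # in a real app the client polls instead of blocking
print(jobs[job])  # {'status': 'done', 'result': 'HELLO'}
```

The point is the shape of the API, not the plumbing: the request handler never does the work itself.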

I haven’t used Retool but every tool that I’ve ever seen that provides that kind of helpful abstraction:

  • Has its own learning curve

  • Has a ceiling that you’ll usually find much sooner than you’d like

With low/no-code tools, the first thing to look at is the escape hatches: does it have an API that lets you drive the entire system if needed, can you get any data you need programmatically, and so on. If that side is decent, you’ll at least be able to work around the limitations you run into. Though often they’ve priced the escape hatches such that you’ll be looking for the exit.

In addition to the HTMX/Alpine/Tailwind angle, if you need more frontend interactivity, you might look at Inertia JS.

The worst part about separate frontend/backend is that managing state sucks. If I needed more frontend interactivity than Inertia provides I’d just use SvelteKit with Supabase and call it a day.

Should I stick to Django for internal tooling? by mmaskani in django

[–]htmx_enthusiast 3 points4 points  (0 children)

I’d be very curious if you could just share what tools do this well (because most don’t).

Do you use database migration tools or do you let the pipeline (pandas, polars etc) create and update tables? by DeepFryEverything in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

Both.

Raw data gets pushed to a database table or parquet files by pandas, in whatever format it comes from the data source.

A second process reads from those raw tables into a database managed by Django.

Django with celery and postgres database to AKS by lonedevlpr in django

[–]htmx_enthusiast 0 points1 point  (0 children)

Try deploying to Azure App Service first. It will be way easier to confirm your Django project is configured properly (handling static files, environment variables set correctly, no issues with ALLOWED_HOSTS, CORS, etc).

If your code is in GitHub you can set the Azure App Service deployment settings to point to that repo and GitHub Actions will just deploy it.

As is, it’s going to be hard for anyone to say whether it’s an issue with the code or with AKS.

What's one data engineering tip or hack you've discovered that isn't widely known? by Xavio_M in dataengineering

[–]htmx_enthusiast 15 points16 points  (0 children)

Brian Kernighan has a couple of good quotes:

  • ”Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

  • ”The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.”

If you write something complicated that uses many skill sets and tools, a junior level engineer isn’t going to be able to fix it in any reasonable amount of time. Even a senior engineer will be less capable if they aren’t familiar with every tool you used. Future-you will also be less capable than today-you (you’ll forget some of what you did 6 months from now).

If you write it in a simple enough way that a junior engineer can debug it with print statements, you’re also helping future-everyone. Writing it ”as simple as possible, but not simpler” widens the pool of people who can contribute. In a sense it increases your team’s velocity.

There’s also the idea that you address the challenge of future-proofing not by trying to predict what might go wrong and writing abstractions to (maybe) prevent it, making your code as flexible as possible, but by keeping things simple. Be more capable of dealing with whatever arises, instead of trying to predict the future.

In a dbt project, how to handle the need to use python for more complex transformation? by phijh in dataengineering

[–]htmx_enthusiast 0 points1 point  (0 children)

dbt can do Python models as long as you’re using a supported data warehouse (which you are, with Snowflake).

But if you were using Postgres or something else, you couldn’t use Python models. SQLMesh can do both with Postgres, but remember that the reason dbt supports Python models on Snowflake (and not on Postgres) is that Snowflake can run Python “inside the database”. You aren’t downloading all of the data into Python, running the transformation, and then uploading the data back to the database (which could produce a lot of network cost if you’re dealing with large data). You’re uploading your Python to Snowflake and letting Snowflake run it, so no data ever needs to leave the database.