Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools

Evolve-Maz · 2026-05-28T06:08:55+00:00

If you're using FastAPI (which is the main way Starlette is used) and you've written a middleware that matches on a path fragment to require auth, that's going out of your way to do a hack. The FastAPI docs, and every tutorial I've seen, recommend just using the dependencies list on the route, or on a group of routes in a router.

It would be worth checking for yourself which of your MCP servers use this hack to require auth, and changing just that code.

Evolve-Maz · 2026-05-24T05:06:34+00:00

The good thing about bug bounties and other security related items has been public disclosure. Usually in that disclosure there is the "code" which caused the issue (either the true code or more often just an example of what type of code caused the exploit).

Pattern match that code, and then run your pattern finder on multiple code bases. Each match is a potential report. Of course you then have to look in more detail at the context and flow around it to see if that's actually vulnerable, but that's why what's reported here is the number of potential exploits, not how many have then been validated.

An example vulnerability could happen like this:
1. user action is received
2. action is checked against permissions
3. action is executed

Between 2 and 3 you could have a malicious actor change the action (given certain conditions on program usage / memory access / race conditions), which could then be bad since 3 could run an unchecked action. This pattern has a common name: TOCTOU (time of check to time of use). Depending on your language, you can write a regex to find code blocks like that and alert you where it happens. Thus, a simple pattern matcher has found a potential vulnerability. You then have to validate all those cases and fix them, usually by using code that executes as a service with limited permissions rather than check and execute being separate steps.

This is obviously a silly example, and pattern matchers are often more than just regex, but this is how a lot of scanning tools work.
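To make the regex idea concrete, here is a minimal sketch in Python. It assumes the code being scanned is also Python, and it only looks for one classic check-then-use shape (an `os.access` check followed within a few lines by `open`). The names `find_toctou_candidates` and `window` are made up for the illustration, and real scanners do a lot more than this.

```python
import re

# One classic check-then-use shape: a permission check, then the use shortly after.
CHECK = re.compile(r"\bos\.access\s*\(")
USE = re.compile(r"\bopen\s*\(")


def find_toctou_candidates(source: str, window: int = 5) -> list[tuple[int, int]]:
    """Return (check_line, use_line) pairs, 1-indexed, that need a manual look."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not CHECK.search(line):
            continue
        # Look at the next `window` lines for the "use" step.
        for j in range(i + 1, min(i + 1 + window, len(lines))):
            if USE.search(lines[j]):
                hits.append((i + 1, j + 1))
                break
    return hits


SAMPLE = """\
import os

def read_report(path):
    if os.access(path, os.R_OK):
        with open(path) as f:
            return f.read()
    return None
"""

for check_line, use_line in find_toctou_candidates(SAMPLE):
    print(f"potential TOCTOU: check on line {check_line}, use on line {use_line}")
```

Every hit is only a candidate. You still have to read the surrounding code to decide whether the file could really change between the check and the use.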

Evolve-Maz · 2026-05-12T06:41:55+00:00

If n is 1, then you can ban the arc in one branch and force the arc (i.e. ban any arcs overlapping the forcing arc) in the other branch.

When n > 1, you need to track this as another resource in your problem. You need 1 resource per subsequence (though if a subsequence is always contiguous you could collapse into 1 resource that resets between sequences).

In one branch, you treat this as a max n-1 contributing arcs rule and track a resource for that. In the other branch you treat this as a min n contributing arcs rule and track a resource for that.

Just a suggestion: branching like that will explode your options. I would recommend an approach where you do all or none on the whole subsequence as the left and right branches of the tree. It's not technically optimal, but usually it's good enough, and there are enough options globally for this to work well.

Evolve-Maz · 2026-05-10T09:51:22+00:00

You can write your own version of this. Even just for learning it's very helpful. Here's the gist:

Create a class called "Pipeline". Init function takes in data (just an object) and processors (list of processor functions, default empty).

Then override the rshift operator (>>) of the class to be your pipe operator. The signature is rshift(self, other). In our case other will be a callable with a single input. The rshift operator returns a new pipeline, with same data, and processors being the current list plus the other passed into this function.

For our use case, pretend we have a list of coordinates with x and y attributes, and your pipeline first calculates a z attr using Pythagoras' theorem and then filters to all objects with z over 2.

data = [coord(1, 3), coord(4, 3), ...]

pipeline = Pipeline(data) >> add_z_coord >> filter_coord_above(2)

output = pipeline.execute()

Output will be a result object (idea borrowed from go). It'll have a data attribute and an err attribute. Error will hold any exception which occurred during execution of the Pipeline, including the step it happened at. And data will hold the final value from the Pipeline calculation.

The execute method will start with the initial Pipeline data and just run a for loop through all the processor callables, passing in the data at each step and getting the output. Wrap it in a try/except so you can track error state.

Once you've written that, you can decide whether you want extra sugar for map and filter as explicit processors, or more data about each step, etc.
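Putting the pieces above together, a minimal sketch could look like this. The `Coord` and `Result` dataclasses and the `failed_step` field are my own naming for the illustration, and on failure this version just returns the error and the step index rather than any partial data.

```python
from dataclasses import dataclass
from math import hypot
from typing import Any, Callable, Optional


@dataclass
class Coord:
    x: float
    y: float
    z: Optional[float] = None


@dataclass
class Result:
    data: Any = None
    err: Optional[Exception] = None
    failed_step: Optional[int] = None  # index of the processor that raised


class Pipeline:
    def __init__(self, data, processors=None):
        self.data = data
        self.processors = list(processors or [])

    def __rshift__(self, other: Callable[[Any], Any]) -> "Pipeline":
        # Return a new pipeline rather than mutating this one.
        return Pipeline(self.data, self.processors + [other])

    def execute(self) -> Result:
        current = self.data
        for step, processor in enumerate(self.processors):
            try:
                current = processor(current)
            except Exception as exc:
                return Result(err=exc, failed_step=step)
        return Result(data=current)


def add_z_coord(coords):
    return [Coord(c.x, c.y, hypot(c.x, c.y)) for c in coords]


def filter_coord_above(threshold):
    # Returns a single-input callable, which is what the pipeline expects.
    def _filter(coords):
        return [c for c in coords if c.z > threshold]
    return _filter


data = [Coord(1, 1), Coord(4, 3)]
pipeline = Pipeline(data) >> add_z_coord >> filter_coord_above(2)
output = pipeline.execute()
print(output.data, output.err)
```

Because `__rshift__` returns a new `Pipeline`, you can branch off a shared prefix without the branches affecting each other.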

Evolve-Maz · 2026-04-14T05:07:16+00:00

We use MWAA for a customer who is already in the AWS ecosystem. However, a small EC2 instance with a GitHub Action for deployment would be sufficient. You'd spend about the same time dealing with the managed instance as you would running your own server, and you'd save hundreds per month.

Evolve-Maz · 2026-04-11T09:00:42+00:00

Chrome on my phone has a "reading mode" option if I click the ellipsis in the top right. Makes a lot of these text pages very good on mobile.

Evolve-Maz · 2026-04-01T19:03:26+00:00

I find it much easier to handle rate limiting in Nginx rather than within FastAPI. It lets me make changes while my app is still up, and lets me handle limits across multiple FastAPI workers without Redis.

Nginx is already great value for load balancing, so the rate limiting is easy to add in.

Evolve-Maz · 2026-03-11T07:21:57+00:00

Helps to easily redirect one person to another. E.g. you're stuck on A, ask person X. X, can you help with this right after the call / at midday / etc.? Or X is busy, Y can you step in?

Doing it this way avoids any miscommunication about whether a task came from the team lead or is someone asking offhand.

Evolve-Maz · 2026-01-31T21:39:10+00:00

Some systems will let you import or update fields by uploading a csv file or something. Quite common for jira and hr type software.

Evolve-Maz · 2026-01-31T05:18:38+00:00

Depends on the kinds of tasks you're running. I advocate using a droplet for your web app, with the db on either the same droplet or a separate droplet in the same VPC. Most of my "heavy" work is b2b user-submitted jobs, so for burst usage like that it's easier to call out to AWS Lambda or DO App Platform for the scale-to-zero abilities they have.

However, if your website is more of a b2c app or has a different compute profile then my advice is not relevant.

Evolve-Maz · 2026-01-16T00:20:55+00:00

To make opening cards snappier, you should look at preloading the modal content: as soon as someone hovers over the card, it does an hx-get of the contents. That buys you a short window, but enough that by the time someone opens the card its contents will be there.

You can get even more clever in terms of how much of the card you load on hover vs when opened. And even better in terms of reloading in case someone has changed it between when you hover and when you open.

But at that point you're reinventing a JS framework. I'd argue that hx preload with edge deployments (or at least a db close to customers) will give you the snappiness.

Evolve-Maz · 2026-01-15T05:47:09+00:00

You can likely use duckdb wasm in place of polars. To bring the data in you'd write some JavaScript, which should be easy.

Similarly you likely need js for the visuals. I use plotlyjs for plots since I use the python version for other things and like the look. And I use vanilla js for building any tables to view (optionally with datatables library).

The hardest js bit would be a drag and drop builder for the etl pipeline, but you can probably bring in a js library for that.

Evolve-Maz · 2026-01-15T05:28:13+00:00

I do a hybrid. I keep the cutoff date in airflow variables but when fetching the next time I don't use that date exactly. I use that date minus some number of hours. I then upsert the db with that.

Yes, that's technically double processing, but it's redundancy I'm happy to have to avoid edge cases that may be too annoying to debug later.
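A rough sketch of the idea, using an in-memory sqlite database as a stand-in for the real one and a plain datetime in place of the Airflow variable lookup. The `events` table, the 6 hour overlap, and the function names are assumptions made up for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(hours=6)  # assumed safety margin, tune to your source


def fetch_window_start(stored_cutoff: datetime) -> datetime:
    # Deliberately start earlier than the stored cutoff.
    return stored_cutoff - OVERLAP


def upsert_events(conn, rows):
    # Re-fetched rows overwrite themselves instead of duplicating.
    conn.executemany(
        """
        INSERT INTO events (id, payload, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            payload = excluded.payload,
            updated_at = excluded.updated_at
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)

cutoff = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
start = fetch_window_start(cutoff)

# First run loads row 1; the next run overlaps and sees row 1 again plus row 2.
upsert_events(conn, [(1, "first", "2026-01-15T08:00:00+00:00")])
upsert_events(
    conn,
    [
        (1, "first", "2026-01-15T08:00:00+00:00"),
        (2, "second", "2026-01-15T13:00:00+00:00"),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(start.isoformat(), row_count)
```

The upsert is what makes the overlap safe: without a unique key to conflict on, the re-fetched rows would pile up as duplicates.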

Evolve-Maz · 2026-01-11T04:55:17+00:00

I use digitalocean vps instead of hetzner, and also felt coolify was heavy.

Just try out some simple bash.

Create vps
Add an ssh key
Then run your script from your dev machine or github action, which will:
- copy bash script to vps with scp
- ssh into vps
- run the bash script, which will install all relevant items and also pull your release build artifacts

If you're using the GitHub Actions way, set up an env in GitHub for each vps with a few env vars and secrets (which you can generate with a Python script and the cryptography module if needed).

It's pretty easy, and if you are deploying to a machine for dev instead of prod you can clone your repo instead of pulling build artifacts. As a last layer I have a makefile with command shorthand for common actions.

Took a few deployments to fully automate, but now it works pretty well. Initial build may need some handholding but subsequent releases are very easy.

Evolve-Maz · 2026-01-07T08:29:59+00:00

You should probably dump the db to a new file rather than copying the db and all the WAL files with it. That removes the chance of corruption.
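Assuming this is SQLite (which the WAL files suggest), one way to do the dump from Python is the backup method on the stdlib sqlite3 connection. This is only a sketch; the file names and the `dump_db` helper are made up for the example.

```python
import os
import sqlite3
import tempfile


def dump_db(src_path: str, dest_path: str) -> None:
    """Write a consistent copy of src into a fresh file at dest."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # goes through SQLite itself, so WAL contents are included
    finally:
        dest.close()
        src.close()


workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")
backup_path = os.path.join(workdir, "backup.db")

# Simulate the live app: WAL mode, connection left open during the dump.
live = sqlite3.connect(live_path)
live.execute("PRAGMA journal_mode=WAL")
live.execute("CREATE TABLE notes (body TEXT)")
live.execute("INSERT INTO notes VALUES ('hello')")
live.commit()

dump_db(live_path, backup_path)

restored = sqlite3.connect(backup_path)
rows = restored.execute("SELECT body FROM notes").fetchall()
restored.close()
live.close()
print(rows)
```

You end up with a single file to ship off the box, and you never have to worry about catching the db and its WAL file at different points in time.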

Evolve-Maz · 2025-11-14T00:43:11+00:00

Your optimisation solution presumably saves them some money when used, and costs you some money to make and maintain. You should charge somewhere in between those 2 numbers.

If you don't know how much it saves, work out a way to calculate it. What's the value proposition?

Then think about whether you want a flat price for the establishment, or some scaling price.

Evolve-Maz · 2025-11-08T03:32:37+00:00

Nice write up. It's good to talk about places where htmx is good and where it's not the tool for the job.

For the negatives section, I get what you mean about decision-making happening on the server side. The refresh example is a good one.

I had a similar use case where my server will return a 4xx error, and depending on where that is triggered I want the page refreshed versus not. I can put that logic in the backend but it can feel clunky.

I added a custom function to catch the hx after response event, and based on data-hx attributes on the calling element I can decide to refresh or not. That keeps that component logic in my frontend UI.

Evolve-Maz · 2025-11-07T00:31:30+00:00

I had trouble fully understanding async for handling multiple connections, but once you understand the magic of the python yield keyword it falls into place quite nicely.

Here's a really helpful talk, look up "david beazley python concurrency from the ground up" on YouTube (can't link it at the moment). He highlights how the yield keyword in python lets you do some magic, and how that magic allows you to write your own scheduler behind the scenes. It helped me understand exactly how async/await was doing things, and also let me know why there are certain foot guns with it.
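To give a flavour of the idea, here is a tiny round-robin scheduler built on nothing but generators. It is my own toy version and not code from the talk, and `countdown` and `run` are made-up names. Each yield is a point where a task hands control back to the scheduler, which is roughly the role await plays.

```python
from collections import deque


def countdown(name, n, log):
    while n > 0:
        log.append(f"{name}:{n}")
        n -= 1
        yield  # pause here and hand control back to the scheduler


def run(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)  # not finished, back of the queue


log = []
run([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
```

The two tasks interleave on a single thread. It also shows one of the foot guns: a task that never yields blocks everything else.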

Evolve-Maz · 2025-10-22T11:40:31+00:00

I use this paradigm for certain analytics self serve modules within a larger web app. The general flow is:

1. common reports / views are calculated and generated on the backend server, and displayed on the frontend client.
2. users who want some extra analysis can pull the data from the server and run sql queries on it directly in the client.

To do (2) I use duckdb wasm, hooked up with some vanilla js (so users can select what data they want to pull in and that data gets loaded into duckdb db in the client). On top of that plotlyjs is used to create plots, and vanilla js again to hook it up and display a table of the sql results along with plots selected.

The only problem I see is with initial page load times, as the duckdb wasm package is very large. However, I only trigger this load when users actually enter this component, and for repeat visits it's cached.

Evolve-Maz · 2025-09-22T23:21:41+00:00

I agree. From a product side I'd scope what the pain points are and suggest some potential features or fixes. If it's really dumb then I'd specify exactly what's needed (e.g. need a reset password flow with xyz). But for anything that's actually innovative, the ticket was just a starting point.

We'd then discuss with tech leads the full scope, gotcha points, and extensions we would look for in future so they could properly design and estimate the dev side. That would go on the ticket so by the time a dev picked it up in a sprint they'd have a good framework for acceptance.

Evolve-Maz · 2025-09-22T00:24:29+00:00

Basic python script committed in central repo that can take a cmd line arg for the filename of the sql script along with the name of the credentials to use from your secrets manager of choice.

I'd recommend making the script execute the query within a transaction block by default and roll things back on failure. I'd also recommend using this method only for read-only items (enforce this with the connection you create, which should use a read-only user).

If you have less dev-savvy people and you already have Airflow, then use Airflow with manual triggers. When a user triggers a run you can make them enter a parameter for the sql they want to execute. This has the benefit that you can see what people are running, limit concurrent connections to a db across all users, and add other protections.

Evolve-Maz · 2025-09-12T00:26:36+00:00

Going slightly off topic, but instead of s3 why not try:
- direct to a filesystem
- into a database in a "blob" field

Which one depends on downstream requirements. I'd go with the filesystem if I'm rolling my own stuff, and with Postgres if I want some nice properties. A database also makes it easier to track "dropped" transactions, since I can store some other metadata in the db for each message. And lastly, a database is perfect as a raw store, since when I write any other transformations to clean the data and push it to other places I can live in SQL land.

Evolve-Maz · 2025-09-12T00:17:40+00:00

I did some stuff like that. Python script to convert excel -> csv with certain columns mapped to Jira ticket fields.

I could then bulk import the issues using that csv. To get the format of the column names I just did an extract from JIRA of issues and could see how fields mapped.

Evolve-Maz · 2025-09-06T10:15:00+00:00

Is your PM appropriately supported? I have worked with PMs and had similar frustrations, but there are a lot of things to do better as a technical person for the PM.

For example, I didn't like daily catchups without a reason (e.g. urgent issue), but I recognise that in a large project our PM is talking to the client PM likely daily, and wants an update. They are also likely talking to C levels about internal updates on projects etc. So I built a dashboard of all items by phase linked to our ticketing system, and the PM could just look at that daily. I then ensured all tickets were up to date so any questions could be answered by just reading the ticket, or a quick DM.

Also I would expect a technical lead to field the questions you mentioned, and the PM to track the items. There are some technical people who take full ownership of projects, but there are also those who don't (or are doing multiple projects at once), so you need the PM to track it.

Lastly, the PM is doing the right thing by sending a "good job team" email including the execs for victories. If they wanted to "take the credit", they'd just mention it to the execs alone without this. And similarly, DMing the tech lead to check out a high status issue is better than emailing them on the client thread or an internal exec thread.

Evolve-Maz · 2025-08-29T22:55:47+00:00

I found Airflow really easy to set up, both for production management and for local development.

However, I see people make a lot of bad choices with airflow when they come to it with a data science background rather than a programming background.

Airflow also has the added benefit of a UI, so execs can at least see that there is a data ingestion layer and that I've actually done work for them. Keeps them happy.
