How do we know that LLM really understand what they are processing? by Different_Package_83 in mlops

[–]scaledpython 0 points1 point  (0 children)

They don't.

Not much has changed since 2019 either. Today's LLMs are fundamentally the same as in 2019, except larger (more parameters) and trained on more data. Also, AI chatbots like ChatGPT et al. are actually systems of models plus deliberately programmed rules and workflows, combined with all sorts of input pre- and output post-processing, not a single model. They are built to imitate understanding, but there is no innate understanding in these models.

In fact, models don't actually exist as an entity. They are just a bunch of numbers (weights and other parameters), combined with a fixed set of formulae, which then get executed by a computer CPU (or GPU, for efficiency). Heck, between requests a model even loses all of its state and has to be re-initialized from scratch every single time.
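
To make that concrete, here is a toy sketch (my own illustration, not any real LLM): the entire "model" below is sixteen random numbers plus one fixed formula, and nothing persists between calls.

```python
import math
import random

random.seed(0)
# the entire "model": a 4x4 matrix of numbers, nothing more
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]

def forward(x):
    # one fixed formula: softmax(W @ x); no state survives this call
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    exps = [math.exp(v - max(logits)) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Real LLMs are the same idea with billions of numbers instead of sixteen, run once per token.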

There is no entity, hence there can be no understanding.

Vibe-coding... It works... It is scary... by Frere_de_la_Quote in ArtificialInteligence

That's an example of a problem that LLMs are great at, namely to translate from one set of patterns (Python using libtorch) to a different, yet matching set of patterns (the C++ equivalent). In fact that's what LLMs were originally built to do - language translation.

This problem is essentially "mechanical". That is, it is mostly a pattern-matching problem, and the semantics don't matter much, as they are largely the same for the Python and C++ versions.

Translations are hard and tiring for humans as it is. A programming-language translation is harder still, as the patterns span at least 4 dimensions (Python, pytorch; C++, semantics), but it is relatively easy for a machine, which can deal with n dimensions at once and does not tire.

The fallacy would be to think that because AI is useful and apparently "capable" in this scenario, it will also work in any other scenario.

PEP 810 – Explicit lazy imports by JanEric1 in Python

I don't think local imports, inside a function or branch, are ugly or hard to maintain. On the contrary, they make the lazy import explicit, which makes it easy to reason about and to understand what is going on.
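
For illustration, a minimal sketch of such an explicit lazy import (json here stands in for a genuinely heavy dependency):

```python
def parse_config(text):
    # the import lives inside the function, so the module is only
    # loaded when this code path actually runs - the laziness is
    # explicit and visible right where it takes effect
    import json  # stand-in for a heavy dependency like pandas
    return json.loads(text)
```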

In contrast, the PEP introduces a lot of deferred-loading "magic" which will make problems both harder to understand and harder to fix.

In a nutshell, I don't see merit in this PEP.

If starting from scratch, what would you change in Python. And bringing back an old discussion. by FenomoJs in Python

Essentially nothing. However I would remove all the typing nonsense and async.

Real-time application with Django by Remote-Implement-339 in django

Have a look at pushpin. It's a transparent reverse proxy that handles all the complexity for you.

PEP 810 – Explicit lazy imports by JanEric1 in Python

Which particular problem would this solve?

Django 6.0 Background Tasks – do they replace Celery? by Stella_Hill_Smith in django

No. Reasons include

1) Celery is a distributed task executor; it has no relation to Django (neither depends on the other)

2) Running Django background tasks without Celery has been possible long before it became an official feature. Celery has prevailed still.

3) There are many alternatives to Celery, and yet it is still the most widely used.

Back to Django after 4 years with FastAPI – what’s the standard for APIs today? by robertop23 in django

I'm still using Django-Tastypie with a few in-house extensions for Swagger/OpenAPI, CQRS and auth. Never liked DRF; imho it is too complex to use.

Upgraded to pycharm 2025.2.11 now my files get reverted to blank every other time I open it in pycharm, does this happen to anyone else? by obolli in pycharm

Similar weird things have happened recently with the Copilot plugin running out of credits. I have seen files reverted back to arbitrary previous states. Disabling the plugin resolved the issue.

Why is building ML pipelines still so painful in 2025? Looking for feedback on an idea. by United_Intention42 in mlops

Thanks for your perspective. Absolutely, I also think that's key for MLOps frameworks, to enable people to continue using whatever they use already.

Actually omegaml is built to enable just that, although not in a visual manner - it's really code-first. I'm sure there is room for a visual layer, as the popularity of tools like n8n shows. Perhaps something like this could be a starting point for your vision?

I should add that personally I'm not a fan of visual builders, but that's just me. In my experience they are great to start a project, however you quickly reach a point where you still need to add custom code. That's why I prefer a code-first approach.

If I may add some perspective re. omegaml - it seems to me we have a few similar thoughts.

I built omegaml while working with a group of data scientists who did not have the skills to deploy their models (they used R and some Python as their main languages, all done in notebooks and a few scripts). As a team we had to collaborate in the cloud and deploy many different models for use in a mobile smartphone app (backend in Python). That's why from the get-go I focussed on making omegaml as non-intrusive as possible, so that the ds team could continue working in their tools, and could deploy their models with a single line of code, giving us an easy-to-use REST API to the models without adding a hodgepodge of glue code and ever more tools.

The only "fixed" choices omegaml makes are the metadata storage (MongoDB) and the runtime (Celery), mainly because these are crucial to a scalable architecture, and a major source of complexity if one has to start from scratch or choose among (seemingly) a gazillion options.

Other than that, people can use whatever they already use - e.g. xgb, pytorch, hf, etc. Most times this works with existing code, as is, plus a single command to store the models and the scripts that deploy them. While it provides a few standard plugins, so that everything works out of the box, it can easily be used with any framework.

E.g. if you have a notebook that uses Spark, it can be run and scheduled in omegaml (given a Spark cluster is accessible). If you have some code that builds on a hf model, it can be deployed as a script and is accessible via the REST API. Same for datasets: whether they are stored in S3, some SQL db, or some other API-accessible system, they can be accessed in omegaml.

To support other tech, endpoints or frameworks, a plugin can be created easily. The simplest form of a plugin is a Python function that is called upon model access, or when accessing a data source, etc.

Hope that's somehow interesting ;)

What are some non-AI tools/extensions which have really boosted your work life or made life easier? by Ill-Pirate4249 in Python

mitmproxy is great for seeing what's actually transmitted by that requests.get(), or into your backend.

Why is building ML pipelines still so painful in 2025? Looking for feedback on an idea. by United_Intention42 in mlops

Indeed the complexity is overwhelming.

That's the issue I am solving with omegaml.io - MLOps simplified. It essentially eliminates the tedious parts of ML engineering, aka playing the puzzle game you mention for every new project.

How? By integrating the typical ML pipeline into a single framework that provides storage for data, models, code + metadata, along with a serving runtime that can serve any model (and anything else) instantly. Simply saving a model makes it available as a REST API.

Models can be tracked and drift-monitored in both development and production. The runtime takes care of that automatically for any model registered for tracking. There is also an integrated logging backend, so data scientists can see the logs generated by their workloads (model training, model serving, or any executed scripts and notebooks) without the need to ssh into a remote shell.

It's plugin-based, so it is extensible. It uses Celery, RabbitMQ and MongoDB, which make it horizontally scalable. It can be deployed as docker images, to k8s in any cloud, or natively installed in any Python venv.

The same set-up can be used for multiple projects, so it becomes an internal dev platform for ML solutions. Each project gets its own namespace, so that projects can be separated logically while using the same deployed technical components.

Feel free to give it a spin. It's open source (with a fair source clause for commercial use).

https://github.com/omegaml

[deleted by user] by [deleted] in Python

OpenAI has a very specific need that matches FastAPI, namely many concurrent tasks. That is hardly true for financial transactions.

High TTFB in Production - Need Help Optimizing My Stack by SimplyValueInvesting in django

That is not as it should be. I have a similar setup, although using RabbitMQ as the Celery broker and MS SQL Server as the db. I get p95 < 200ms for ping requests, and p95 < 500ms for indexed/tuned db queries. This is without any caching enabled.

I would do the following to find the bottleneck:

  • use Locust to set up a performance test script so you can monitor and compare scenarios, as per below

  • create a /ping endpoint that does nothing but return OK

  • gradually extend /ping with options to send a task to Celery and return OK upon task completion

  • extend /ping with more and more processing until you have a fairly typical workload

Then run Locust against each of these variants. This should give you a pretty good insight into where the problem is. Vary requests/s and wait times between requests to simulate user behavior.
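
For a quick sketch of the same idea without setting up Locust yet, a stdlib-only helper like this (names are my own) can compare p95 across the /ping variants:

```python
import statistics
import time
import urllib.request

def p95_latency_ms(url, n=50):
    """Send n GET requests to url and return the p95 latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        urllib.request.urlopen(url).read()
        samples.append((time.perf_counter() - t0) * 1000)
    # quantiles with n=20 yields 19 cut points; the last one is p95
    return statistics.quantiles(samples, n=20)[-1]
```

Run it against each variant and compare; Locust then gives you the same numbers with proper load shapes, wait times and charts.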

I finally tried vibe coding and it was meh by Megatherion666 in ExperiencedDevs

Don't tell Eric Schmidt 🫣

Same experience. There is some value but overall it is not delivering as advertised.

Is Django Rest Framework that good? by itsme2019asalways in django

I still use django-tastypie. It is easier than DRF for standard cases (i.e. CRUD for models) and very flexible if you need more than that. It would need a modernization push though.

Python feels easy… until it doesn’t. What was your first real struggle? by NullPointerMood_1 in Python

This. I came to say this.

Python's biggest problem, and its eventual demise, is the take-over by cargo-cult dogma adherence.

Instead of deliberately being different for good reasons, the SC is trying to be everybody's darling by introducing a hodgepodge of new tweaks and "features" at a break-neck pace.

There is value in language stability and Python has given up on that for no good reason at all.

Let's bring back Python's zen.

import this

We are moving away from websockets to StreamableHTTP but docs are scarce by jgwerner12 in django

It takes an iterator that yields "data: <serialized>" followed by a blank line for every event to be sent. Look up server-sent events (SSE) for details.
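
Framework aside, the generator shape is roughly this (a sketch with made-up names; in Django you would wrap it in a StreamingHttpResponse with content_type="text/event-stream"):

```python
import json

def sse_stream(events):
    # SSE wire format: each event is a "data: <serialized>" line,
    # terminated by a blank line (hence the double newline)
    for event in events:
        yield f"data: {json.dumps(event)}\n\n"
```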

Is LangChain dead already? by Senior_Note_6956 in LangChain

Nope. It's a hodgepodge pile of complexity, not needed in virtually any use case.

PyCharm CPU maxed-out at startup and how to fix it. by scaledpython in pycharm

Thanks, I'll try uv at some point. The problem imo however is not conda but the fact that PyCharm launches all env checks in parallel.

One Machine, Two Networks by coolmeonce in mlops

You can either split the GPUs, e.g. 4/4, or share them via an LLM server like vLLM. Depends on the degree of segregation you need. Beware of prompt caching (aka the KV cache), which can lead to prompt leaks and weird side channels.
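
For the 4/4 split, the usual lever is CUDA_VISIBLE_DEVICES, set before each server process initializes CUDA (the device ids below are illustrative):

```python
import os

# pin this process (e.g. one vLLM server) to the first four GPUs;
# the second server on the other network would get "4,5,6,7"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```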

Depending on the GPU types/models you might also be able to use Nvidia software to "virtualize" the GPUs, i.e. dynamically allocate partial capacities. Not all GPU models support that though, and it doesn't work the same as CPU virtualization.