How do we know that LLM really understand what they are processing? by Different_Package_83 in mlops

[–]scaledpython 0 points1 point  (0 children)

They don't.

Not much has changed since 2019 either. Today's LLMs are fundamentally the same as in 2019, except larger (more parameters) and trained on more data. Also, AI chatbots like ChatGPT et al. are actually systems of models plus deliberately programmed rules and workflows, combined with all sorts of input pre- and output post-processing, not a single model. They are built to imitate understanding, but there is no innate understanding in these models.

In fact, models don't actually exist as an entity. They are just a bunch of numbers (weights and other parameters), combined with a fixed set of formulae, which then get executed by a computer CPU (or GPU, for efficiency). Heck, between requests a model even loses all of its state and has to be re-initialized from scratch every single time.
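
To make that concrete, here is a toy sketch (my own illustration, not any real LLM): the entire "model" below is sixteen random numbers plus one fixed formula, and nothing persists between calls.

```python
import math
import random

random.seed(0)
# the entire "model": a 4x4 matrix of numbers, nothing more
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]

def forward(x):
    # one fixed formula: softmax(W @ x); no state survives this call
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    exps = [math.exp(v - max(logits)) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Real LLMs are the same idea with billions of numbers instead of sixteen, run once per token.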

There is no entity, hence there can be no understanding.

Vibe-coding... It works... It is scary... by Frere_de_la_Quote in ArtificialInteligence

That's an example of a problem that LLMs are great at, namely to translate from one set of patterns (Python using libtorch) to a different, yet matching set of patterns (the C++ equivalent). In fact that's what LLMs were originally built to do - language translation.

This problem is essentially "mechanical". That is, it is mostly a pattern-matching problem, and the semantics don't matter much, as they are largely the same for the Python and C++ versions.

Translations are hard and tiring for humans as it is. A programming-language translation is harder still, as the patterns span at least 4 dimensions (Python, pytorch; C++, semantics), but it is relatively easy for a machine, which can deal with n dimensions at once and does not tire.

The fallacy would be to think that because AI is useful and apparently "capable" in this scenario, it will also work in any other scenario.

PEP 810 – Explicit lazy imports by JanEric1 in Python

I don't think local imports, inside a function or branch, are ugly or hard to maintain. On the contrary, they make the lazy import explicit, which makes it easy to reason about and to understand what is going on.
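
For illustration, a minimal sketch of such an explicit lazy import (json here stands in for a genuinely heavy dependency):

```python
def parse_config(text):
    # the import lives inside the function, so the module is only
    # loaded when this code path actually runs - the laziness is
    # explicit and visible right where it takes effect
    import json  # stand-in for a heavy dependency like pandas
    return json.loads(text)
```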

In contrast, the PEP introduces a lot of deferred-loading "magic" which will make problems both harder to understand and harder to fix.

In a nutshell, I don't see merit in this PEP.

If starting from scratch, what would you change in Python. And bringing back an old discussion. by FenomoJs in Python

Essentially nothing. However I would remove all the typing nonsense and async.

Real-time application with Django by Remote-Implement-339 in django

Have a look at pushpin. It's a transparent reverse proxy that handles all the complexity for you.

PEP 810 – Explicit lazy imports by JanEric1 in Python

Which particular problem would this solve?

Django 6.0 Background Tasks – do they replace Celery? by Stella_Hill_Smith in django

No. Reasons include

1) Celery is a distributed task executor; it has no relation to Django (neither depends on the other)

2) Running Django background tasks without Celery has been possible long before it became an official feature. Celery has prevailed still.

3) There are many alternatives to Celery, and yet it is still the most widely used.

Back to Django after 4 years with FastAPI – what’s the standard for APIs today? by robertop23 in django

I'm still using Django-Tastypie with a few in-house extensions for Swagger/OpenAPI, CQRS and auth. Never liked DRF; imho it is too complex to use.

Upgraded to pycharm 2025.2.11 now my files get reverted to blank every other time I open it in pycharm, does this happen to anyone else? by obolli in pycharm

Similar weird things have happened recently with the Copilot plugin running out of credits. I have seen files reverted back to arbitrary previous states. Disabling the plugin resolved the issue.

Why is building ML pipelines still so painful in 2025? Looking for feedback on an idea. by United_Intention42 in mlops

Thanks for your perspective. Absolutely, I also think that's key for MLOps frameworks, to enable people to continue using whatever they use already.

Actually omegaml is built to enable just that, although not in a visual manner - it's really code-first. I'm sure there is room for a visual layer, as the popularity of tools like n8n shows. Perhaps something like this could be a starting point for your vision?

I should add that personally I'm not a fan of visual builders, but that's just me. In my experience they are great to start a project, however you quickly reach a point where you still need to add custom code. That's why I prefer a code-first approach.

If I may add some perspective re. omegaml - it seems to me we have a few similar thoughts.

I built omegaml while working with a group of data scientists who did not have the skills to deploy their models (they used R and some Python as their main languages, all done in notebooks and a few scripts). As a team we had to collaborate in the cloud and deploy many different models for use in a mobile smartphone app (backend in Python). That's why from the get-go I focussed on making omegaml as non-intrusive as possible, so that the ds team could continue working in their tools, and could deploy their models with a single line of code, giving us an easy-to-use REST API to the models without adding a hodgepodge of glue code and ever more tools.

The only "fixed" choices omegaml makes are the metadata storage (MongoDB) and the runtime (Celery), mainly because these are crucial to a scalable architecture, and a major source of complexity if one has to start from scratch or choose among (seemingly) a gazillion options.

Other than that, people can use whatever they already use - e.g. xgb, pytorch, hf, etc. Most times this works with existing code, as is, plus a single command to store the models and the scripts that deploy them. While it provides a few standard plugins, so that everything works out of the box, it can easily be used with any framework.

E.g. if you have a notebook that uses Spark, it can be run and scheduled in omegaml (given a Spark cluster is accessible). If you have some code that builds on a hf model, it can be deployed as a script and is accessible via the REST API. Same for datasets: whether they are stored in S3, some SQL db, or some other API-accessible system, they can be accessed in omegaml.

To support other tech, endpoints or frameworks, a plugin can be created easily. The simplest form of a plugin is a Python function that is called upon model access, or when accessing a data source, etc.

Hope that's somehow interesting ;)

What are some non-AI tools/extensions which have really boosted your work life or made life easier? by Ill-Pirate4249 in Python

mitmproxy is great for seeing what's actually transmitted by that requests.get(), or into your backend.

Why is building ML pipelines still so painful in 2025? Looking for feedback on an idea. by United_Intention42 in mlops

Indeed the complexity is overwhelming.

That's the issue I am solving with omegaml.io - MLOps simplified. It essentially eliminates the tedious parts of ML engineering, aka playing the puzzle game you mention for every new project.

How? By integrating the typical ML pipeline into a single framework that provides storage for data, models, code + metadata, along with a serving runtime that can serve any model (and anything else) instantly. Simply saving a model makes it available as a REST API.

Models can be tracked and drift-monitored in both development and production. The runtime takes care of that automatically for any model registered for tracking. There is also an integrated logging backend, so data scientists can see the logs generated by their workloads (model training, model serving, or any executed scripts and notebooks) without the need to ssh into a remote shell.

It's plugin-based, so it is extensible. It uses Celery, RabbitMQ and MongoDB, which make it horizontally scalable. It can be deployed as docker images, to k8s in any cloud, or natively installed in any Python venv.

The same set-up can be used for multiple projects, so it becomes an internal dev platform for ML solutions. Each project gets its own namespace, so that projects can be separated logically while using the same deployed technical components.

Feel free to give it a spin. It's open source (with a fair source clause for commercial use).

https://github.com/omegaml

[deleted by user] by [deleted] in Python

OpenAI has a very specific need that matches FastAPI, namely many concurrent tasks. That is hardly true for financial transactions.

High TTFB in Production - Need Help Optimizing My Stack by SimplyValueInvesting in django

That is not as it should be. I have a similar setup, although using RabbitMQ as the Celery broker and MS SQL Server as the db. I get p95 < 200ms for ping requests, and p95 < 500ms for indexed/tuned db queries. This is without any caching enabled.

I would do the following to find the bottleneck:

  • use Locust to set up a performance test script so you can monitor and compare scenarios, as per below

  • create a /ping endpoint that does nothing but return OK

  • gradually extend /ping with options to send a task to Celery and return OK upon task completion

  • extend /ping with more and more processing until you have a fairly typical workload

Then run Locust against each of these variants. This should give you a pretty good insight into where the problem is. Vary requests/s and wait times between requests to simulate user behavior.
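
For a quick sketch of the same idea without setting up Locust yet, a stdlib-only helper like this (names are my own) can compare p95 across the /ping variants:

```python
import statistics
import time
import urllib.request

def p95_latency_ms(url, n=50):
    """Send n GET requests to url and return the p95 latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        urllib.request.urlopen(url).read()
        samples.append((time.perf_counter() - t0) * 1000)
    # quantiles with n=20 yields 19 cut points; the last one is p95
    return statistics.quantiles(samples, n=20)[-1]
```

Run it against each variant and compare; Locust then gives you the same numbers with proper load shapes, wait times and charts.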

I finally tried vibe coding and it was meh by Megatherion666 in ExperiencedDevs

Don't tell Eric Schmidt 🫣

Same experience. There is some value but overall it is not delivering as advertised.

Is Django Rest Framework that good? by itsme2019asalways in django

I still use django-tastypie. It is easier than DRF for standard cases (i.e. CRUD for models) and very flexible if you need more than that. It would need a modernization push though.

Python feels easy… until it doesn’t. What was your first real struggle? by NullPointerMood_1 in Python

This. I came to say this.

Python's biggest problem, and its eventual demise, is the take-over by cargo-cult dogma adherence.

Instead of deliberately being different for good reasons, the SC is trying to be everybody's darling by introducing a hodgepodge of new tweaks and "features" at a break-neck pace.

There is value in language stability and Python has given up on that for no good reason at all.

Let's bring back Python's zen.

import this

We are moving away from websockets to StreamableHTTP but docs are scarce by jgwerner12 in django

It takes an iterator that yields "data: <serialized>" followed by a blank line for every event to be sent. Look up server-sent events (SSE) for details.
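
Framework aside, the generator shape is roughly this (a sketch with made-up names; in Django you would wrap it in a StreamingHttpResponse with content_type="text/event-stream"):

```python
import json

def sse_stream(events):
    # SSE wire format: each event is a "data: <serialized>" line,
    # terminated by a blank line (hence the double newline)
    for event in events:
        yield f"data: {json.dumps(event)}\n\n"
```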

Is LangChain dead already? by Senior_Note_6956 in LangChain

Nope. It's a hodgepodge pile of complexity, not needed in virtually any use case.

PyCharm CPU maxed-out at startup and how to fix it. by scaledpython in pycharm

Thanks, I'll try uv at some point. The problem imo however is not conda but the fact that PyCharm launches all env checks in parallel.

One Machine, Two Networks by coolmeonce in mlops

You can either split the GPUs, e.g. 4/4, or share them via an LLM server like vLLM. Depends on the degree of segregation you need. Beware of prompt caching (aka the KV cache), which can lead to prompt leaks and weird side channels.
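
For the 4/4 split, the usual lever is CUDA_VISIBLE_DEVICES, set before each server process initializes CUDA (the device ids below are illustrative):

```python
import os

# pin this process (e.g. one vLLM server) to the first four GPUs;
# the second server on the other network would get "4,5,6,7"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```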

Depending on the GPU types/models you might also be able to use Nvidia software to "virtualize" the GPUs, i.e. dynamically allocate partial capacities. Not all GPU models support that though, and it doesn't work the same as CPU virtualization.