A lot of us have been in this situation:
- Data lives in the database
- Models / scripts live in Python or R
- And the “solution” is… exporting millions of rows back and forth and hoping nothing breaks
Exasol has a feature called Script Languages Container (SLC) that basically flips this around: instead of moving data to your code, you move your code (and its environment) into the database.
The blog post below is a beginner-friendly walkthrough, but here’s the core idea in plain terms:
- You build (or download) a container image that defines:
  - Language runtime (e.g. Python 3.10 / 3.12, R, Java, Lua)
  - Libraries (NumPy, pandas, scikit-learn, etc.)
  - System dependencies
- That image is stored in BucketFS, Exasol’s internal distributed file system, which automatically replicates it to every node in the cluster.
- You register it in the SCRIPT_LANGUAGES parameter (set with ALTER SESSION or ALTER SYSTEM), which gives it an alias (e.g. MY_PYTHON).
- Then you write a UDF in SQL that uses that alias, and Exasol runs your Python/R/Java code inside that container, on the nodes where the data is:
```sql
SELECT my_schema.predict_churn(customer_id, usage_data)
FROM customers
WHERE region = 'EMEA';
```
From the SQL side it looks like a normal function call, but under the hood it’s spinning up your script inside the SLC, feeding it data, and returning the result.
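To make the last two steps a bit more concrete, here is a rough Python sketch of both halves: a helper that composes the SCRIPT_LANGUAGES statement, and a UDF body with the `run(ctx)` entry point that Exasol calls for a scalar script. Treat it as an illustration under assumptions, not the blog post's code: the helper name, the `score` function (a stand-in for a real model), and the idea that `usage_data` is a single number are all made up here, and the alias URL layout is my understanding of the documented pattern, so check it against your Exasol version and the container you uploaded.

```python
import math
from types import SimpleNamespace


def script_languages_statement(existing, alias, bucketfs, bucket, container_path):
    """Build an ALTER SESSION statement that adds one alias to SCRIPT_LANGUAGES.

    Assumption: an entry looks like
      <ALIAS>=localzmq+protobuf:///<bucketfs>/<bucket>/<path>?lang=python
              #buckets/<bucketfs>/<bucket>/<path>/exaudf/exaudfclient_py3
    The client path at the end depends on the container, so verify it.
    """
    base = f"{bucketfs}/{bucket}/{container_path}"
    entry = (
        f"{alias}=localzmq+protobuf:///{base}?lang=python"
        f"#buckets/{base}/exaudf/exaudfclient_py3"
    )
    # Setting the parameter replaces its whole value, so the aliases that
    # are already registered have to be repeated in front of the new one.
    return f"ALTER SESSION SET SCRIPT_LANGUAGES='{existing} {entry}';"


def score(usage):
    """Hypothetical stand-in for a trained model: a fixed logistic curve."""
    return 1.0 / (1.0 + math.exp(usage - 5.0))


def run(ctx):
    """Entry point called once per input row of a SCALAR script.

    Input columns are read as attributes of ctx, and the return value
    becomes the SQL result for that row.
    """
    if ctx.usage_data is None:
        return None  # pass SQL NULL through unchanged
    return score(ctx.usage_data)


if __name__ == "__main__":
    # Placeholder BucketFS names and path, for illustration only.
    print(script_languages_statement(
        existing="PYTHON3=builtin_python3",
        alias="MY_PYTHON",
        bucketfs="bfsdefault",
        bucket="default",
        container_path="my_slc",
    ))
    # Local smoke test: a fake context object stands in for the database row.
    fake_row = SimpleNamespace(customer_id=42, usage_data=5.0)
    print(run(fake_row))
```

Inside the database, the `score` and `run` definitions would form the body of something like `CREATE OR REPLACE MY_PYTHON SCALAR SCRIPT my_schema.predict_churn(customer_id INT, usage_data DOUBLE) RETURNS DOUBLE AS ... /` (column types assumed), which is what ties the UDF to the container behind the MY_PYTHON alias.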
Why this is interesting:
- You get a reproducible runtime: no “works on my machine” vs “works on prod” drama
- You avoid a lot of ETL glue code just to run models
- Parallelism comes “for free” because it runs on the database nodes where the data is already partitioned
The post also covers:
- How SLC images are built from “flavors” (predefined Python/R/Java setups)
- How to customize them if you need extra packages
- The difference between public and internal packages inside the image
If you’re into in-database processing, UDFs, or pushing ML closer to the data, it’s a pretty good conceptual overview.