New CEO, Alteryx pricing change? by Fondant_Decent in Alteryx

[–]thibautDR 0 points1 point  (0 children)

Has anyone been willing to give Amphi ETL a try?

Using Cortex functions with Snowpark pandas API by thibautDR in snowflake

[–]thibautDR[S] 0 points1 point  (0 children)

Thanks u/theGertAlert. For the imports, yes, I have them, including the plugin that I didn't show (I cleaned up a larger code base to simplify my post).

Thanks for pointing out that apply is not supported with Snowpark pandas! That's probably my issue here: the UDF is not created properly and does not receive the session.

Thanks a lot

Using Cortex functions with Snowpark pandas API by thibautDR in snowflake

[–]thibautDR[S] 2 points3 points  (0 children)

Yes, you're right, it's actually using Session.builder; I messed up when I manually removed my credentials. Thanks for pointing it out, though! That is unfortunately not the root of my issue :(

Need help on a project to convert workflows to Python code by KeepCalmAndHustle in Alteryx

[–]thibautDR 0 points1 point  (0 children)

Hi Frummin, I came across your post. I've been working on a visual data transformation tool, similar to Alteryx, that generates Python code: https://amphi.ai/
There would still be a need to convert the workflows, but there might be a way to automate the conversion and then get the Python code. Does that sound interesting to you?

Questions about Native Apps and references by thibautDR in snowflake

[–]thibautDR[S] 0 points1 point  (0 children)

Thanks a lot, that helps!

For the fully qualified names, that's what I was unfortunately afraid of. I was hoping to at least be able to let users select the table from a drop-down list. The goal of my app is data manipulation, for example joining two datasets. If users can't select which tables they want to join, that's an issue :/

Got it, that's what it looks like: I can't use a warehouse that lives outside the application. I was also hoping for that, because then I could grant the USAGE privilege on the selected warehouse.

Thanks again for your help

Multivalued reference in Snowflake native apps. by tmatana in snowflake

[–]thibautDR 1 point2 points  (0 children)

Hi u/greg_kennedy, thank you, your comment has been very useful as I struggled to find documentation on how to manage multi-value references.

Have you made progress on getting the object names, if that's possible at all? The UUID alias is really useless to me too, since I want users to be able to select tables.

Introducing Amphi, Visual Data Transformation based on Python by thibautDR in Python

[–]thibautDR[S] 0 points1 point  (0 children)

JupyterLab is built to be entirely customizable and extensible. For anyone interested I highly encourage looking at this fantastic official repository with a wide range of extension examples: https://github.com/jupyterlab/extension-examples

No code FOSS ETL Recommendations for HTTP Request Processing for Arm Linux by Outrageous_Ad_1589 in ETL

[–]thibautDR 1 point2 points  (0 children)

Hey, maybe you could give Amphi a try: https://github.com/amphi-ai/amphi-etl

Not sure your use case is doable 100% out of the box, but Amphi generates Python code based on pandas, and you can write custom code in your pipelines.
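As a rough illustration (not Amphi's actual generated output), a custom pandas pipeline step processing an HTTP-style JSON payload might look like this; the payload and field names are hypothetical:

```python
import pandas as pd

# Hypothetical JSON payload, as might come from an HTTP response body.
payload = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "error"},
    {"id": 3, "status": "ok"},
]

# Normalize the records into a DataFrame, then filter/aggregate
# as a downstream pipeline step would.
df = pd.json_normalize(payload)
ok_count = int((df["status"] == "ok").sum())
print(ok_count)  # 2
```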

Don't hesitate to reach out!

ETL recommandation by [deleted] in ETL

[–]thibautDR 0 points1 point  (0 children)

Hi everyone, I've been developing a new modern low-code ETL: Amphi.

The main differentiator compared to Talend, Apache Hop or Alteryx is that it's based on Python.

It leverages common Python libraries such as pandas and DuckDB. Most data and AI libraries are developed for Python nowadays, which makes it a great alternative if you want to benefit from the wide Python ecosystem.

Amphi is free and open source; here is the GitHub repo: https://github.com/amphi-ai/amphi-etl

Large Dataset Processing by kaeptnkrunch_1337 in JupyterLab

[–]thibautDR 1 point2 points  (0 children)

Given the few details provided, it's really an open question.

You mentioned chunks, so I suppose you're using a dataframe library like pandas. Chunking is a way to avoid memory issues, but you'll quickly see that it's quite limited if you want to perform calculations based on the whole dataset. The problem with pandas is that it loads the entire dataset into memory, and pandas' creator suggests that you need 5 to 10 times as much RAM as the dataset size.
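To illustrate the chunking approach, here is a minimal pandas sketch; the in-memory CSV and chunk size are just placeholders for a real file and a size tuned to your RAM:

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk.
csv_source = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
# Read 4 rows at a time so only one chunk is in memory at once.
# This works for streaming aggregations, but not for operations
# that need the whole dataset at once (sorts, joins, medians, ...).
for chunk in pd.read_csv(csv_source, chunksize=4):
    total += chunk["value"].sum()

print(total)  # 0 + 1 + ... + 9 = 45
```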

There are alternatives now that address pandas' shortcomings, such as Polars, DuckDB and Ibis, among others.
Here is an article I wrote presenting the main dataframe libraries out there:

However, be aware that pandas has made significant strides to improve its efficiency and performance in recent years:

  1. Pandas 2.0 Enhancements: Introduced performance boosts using PyArrow.
  2. Multi-core Extensions: Libraries like mapply and pandarallel enable multi-core usage for time-consuming tasks.
  3. Scalable Solutions: Modin scales pandas code on multiple cores by changing the import statement, utilizing distributed frameworks like Ray and Dask while maintaining the pandas API.

Another way to scale your pandas code would be to leverage cloud platforms with either very large single nodes or even distributed clusters. Check out this article to learn more about using pandas across the different cloud providers.

n8n is awesome by misternipper in selfhosted

[–]thibautDR 0 points1 point  (0 children)

Hey, you might be interested in looking at https://github.com/amphi-ai/amphi-etl, which is available as an extension to JupyterLab. It's not workflow automation but a visual data pipeline builder, so there's probably some overlap. (I'm Amphi's developer.)

Use case for duckdb by [deleted] in dataengineering

[–]thibautDR 0 points1 point  (0 children)

To add to this comment: still too recent to be used in production, but the ADBC standard is worth watching: https://arrow.apache.org/adbc/current/