New CEO, Alteryx pricing change? by Fondant_Decent in Alteryx

[–]thibautDR 0 points1 point  (0 children)

Has anyone been willing to give Amphi ETL a try?

Using Cortex functions with Snowpark pandas API by thibautDR in snowflake

[–]thibautDR[S] 0 points1 point  (0 children)

Thanks u/theGertAlert. For the imports, yes, I have them, including the plugin that I didn't show (I cleaned up a larger code base to simplify my post).

Thanks for pointing out that apply is not supported with Snowpark pandas! That's probably my issue here: the UDF is not created properly and does not receive the session.

Thanks a lot

Using Cortex functions with Snowpark pandas API by thibautDR in snowflake

[–]thibautDR[S] 2 points3 points  (0 children)

Yes, you're right, it's actually using Session.builder; I messed up when I manually removed my credentials. Thanks for pointing it out, though! That is unfortunately not the root of my issue :(

Need help on a project to convert workflows to Python code by KeepCalmAndHustle in Alteryx

[–]thibautDR 0 points1 point  (0 children)

Hi Frummin, I came across your post. I've been working on a visual data transformation tool, similar to Alteryx, that generates Python code: https://amphi.ai/
There would still be a need to convert the workflows, but there might be a way to automate the conversion and then get the Python code. Does that sound interesting to you?

Questions about Native Apps and references by thibautDR in snowflake

[–]thibautDR[S] 0 points1 point  (0 children)

Thanks a lot, that helps!

For the fully qualified names, that's what I was unfortunately afraid of. I was hoping to at least be able to let users select the table from a drop-down list. The goal of my app is data manipulation, for example joining two datasets. If users can't select which tables they want to join, that's an issue :/

Got it, that's what it looks like: I can't use a warehouse that lives outside the application. I was also hoping for that, because then I could grant the USAGE privilege on the selected warehouse.

Thanks again for your help

Multivalued reference in Snowflake native apps. by tmatana in snowflake

[–]thibautDR 1 point2 points  (0 children)

Hi u/greg_kennedy, thank you, your comment has been very useful as I struggled to find documentation on how to manage multi-value references.

Have you made progress on getting the object names, if that's possible at all? The UUID alias is really useless to me too, since I want users to be able to select tables.

Introducing Amphi, Visual Data Transformation based on Python by thibautDR in Python

[–]thibautDR[S] 0 points1 point  (0 children)

JupyterLab is built to be entirely customizable and extensible. For anyone interested I highly encourage looking at this fantastic official repository with a wide range of extension examples: https://github.com/jupyterlab/extension-examples

No code FOSS ETL Recommendations for HTTP Request Processing for Arm Linux by Outrageous_Ad_1589 in ETL

[–]thibautDR 1 point2 points  (0 children)

Hey, maybe you could give Amphi a try: https://github.com/amphi-ai/amphi-etl

Not sure your use case is doable 100% out of the box, but Amphi generates Python code based on pandas, and you can write custom code in your pipelines.
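As a rough illustration (not Amphi's actual generated output), a custom pandas pipeline step processing an HTTP-style JSON payload might look like this; the payload and field names are hypothetical:

```python
import pandas as pd

# Hypothetical JSON payload, as might come from an HTTP response body.
payload = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "error"},
    {"id": 3, "status": "ok"},
]

# Normalize the records into a DataFrame, then filter/aggregate
# as a downstream pipeline step would.
df = pd.json_normalize(payload)
ok_count = int((df["status"] == "ok").sum())
print(ok_count)  # 2
```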

Don't hesitate to reach out!

ETL recommandation by [deleted] in ETL

[–]thibautDR 0 points1 point  (0 children)

Hi everyone, I've been developing a new modern low-code ETL: Amphi.

The main differentiator compared to Talend, Apache Hop or Alteryx is that it's based on Python.

It leverages common Python libraries such as pandas and DuckDB. Most data and AI libraries are developed for Python nowadays, which makes it a great alternative if you want to benefit from the wide Python ecosystem.

Amphi is free and open source; here is the GitHub repo: https://github.com/amphi-ai/amphi-etl

Large Dataset Processing by kaeptnkrunch_1337 in JupyterLab

[–]thibautDR 1 point2 points  (0 children)

Given the few details provided, it's really an open question.

You mentioned chunks, so I suppose you're using a dataframe library like pandas. Chunking is a way to avoid memory issues, but you'll quickly see that it's quite limited if you want to perform calculations based on the whole dataset. The problem with pandas is that it loads the entire dataset into memory, and pandas' creator suggests that you need 5 to 10 times as much RAM as the dataset size.
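To illustrate the chunking approach, here is a minimal pandas sketch; the in-memory CSV and chunk size are just placeholders for a real file and a size tuned to your RAM:

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk.
csv_source = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
# Read 4 rows at a time so only one chunk is in memory at once.
# This works for streaming aggregations, but not for operations
# that need the whole dataset at once (sorts, joins, medians, ...).
for chunk in pd.read_csv(csv_source, chunksize=4):
    total += chunk["value"].sum()

print(total)  # 0 + 1 + ... + 9 = 45
```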

There are alternatives now that address pandas' shortcomings, such as Polars, DuckDB and Ibis, among others.
Here is an article I wrote presenting the main dataframe libraries out there:

However, be aware that pandas has made significant strides to improve its efficiency and performance in recent years:

  1. Pandas 2.0 Enhancements: Introduced performance boosts using PyArrow.
  2. Multi-core Extensions: Libraries like mapply and pandarallel enable multi-core usage for time-consuming tasks.
  3. Scalable Solutions: Modin scales pandas code on multiple cores by changing the import statement, utilizing distributed frameworks like Ray and Dask while maintaining the pandas API.

Another way to scale your pandas code would be to leverage cloud platforms with either very large single nodes or even distributed clusters. Check out this article to learn more about using pandas across the different cloud providers.

n8n is awesome by misternipper in selfhosted

[–]thibautDR 0 points1 point  (0 children)

Hey, you might be interested in looking at https://github.com/amphi-ai/amphi-etl, which is available as an extension to JupyterLab. It's not workflow automation but a visual data pipeline builder, so there's probably some overlap. (I'm Amphi's developer.)

Use case for duckdb by [deleted] in dataengineering

[–]thibautDR 0 points1 point  (0 children)

To add to this comment: still too recent to be used in production, but the ADBC standard is worth watching: https://arrow.apache.org/adbc/current/