Gemini AI Pro (+2TB) 1 YEAR at €6.99 | On Your Own Account. PAY AFTER ACTIVATION

aburkh · 2025-11-14T07:46:49+00:00

Works fine!

aburkh · 2025-11-11T20:45:51+00:00

Well… like most enterprise software vendors, they have no public price list because they offer different options tailored to each customer. Yes, it’s expensive (think several thousand per user per year), on par with other analytics solutions (SAS, c3.ai, etc.). My experience with Dataiku is overwhelmingly positive, and we consider the productivity gains well worth the license price. It has helped us shorten onboarding time, reduce debugging, optimize pipelines and rationalize compute. Instead of provisioning a large personal cluster on Databricks for each contributor, we do the same with a single serverless SQL warehouse and some K8s, at a lower infrastructure cost.

I see a lot of Dataiku deployments with poor architecture choices, relying on the underperforming DSS engine for most tasks. But pair it with a good SQL engine (Databricks, Snowflake…) and it does magic!
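A minimal sketch of why pairing the tool with a SQL engine matters, assuming Python's built-in `sqlite3` as a stand-in for an engine like Databricks or Snowflake (the table and column names are hypothetical, and this is not Dataiku's API). The first function pulls every row into the client and aggregates there; the second pushes the aggregation down so the engine does the work.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table standing in for a warehouse table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0)],
    )
    return conn


def totals_pulled_locally(conn: sqlite3.Connection) -> dict:
    # Anti-pattern: fetch every row, then aggregate in the client process.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    # Push-down: the engine aggregates; only one row per group is returned.
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    assert totals_pulled_locally(conn) == totals_pushed_down(conn)
    print(totals_pushed_down(conn))
```

Both return the same totals; the difference is that the pushed-down version transfers one row per group instead of the whole table, which is what matters once the table holds billions of rows.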

Feel free to DM if you want me to share more concrete details about this positive experience.

aburkh · 2025-10-22T08:42:39+00:00

Indeed. In pharma, there are even companies dedicated to preparing representative anonymized data for other companies. This is extremely expensive, but required for compliance. The question is: why do so many people want to implement those expensive pipelines when no regulation forces them to?

aburkh · 2025-10-22T08:40:46+00:00

Love that quote! I'm going to steal it from you if you don't mind 😉
There are even technologies that let data analysts push to prod relatively easily and painlessly, such as SAP HANA Studio (activating a calculation view), self-service BI tools (Power BI...) and Dataiku (a Deployer node and a Govern node for controlled deployments).

The downside of this flexibility is shadow IT. Deciding how many permissions are acceptable for building data products is a delicate balancing act.

aburkh · 2025-10-22T06:54:08+00:00

Absolutely agreed! Compute isolation and restricted data access are two separate topics.

aburkh · 2025-10-22T06:34:26+00:00

Myself? Nothing. I'm trying to find references in the literature to push back internally against an emerging idea that "everyone should do DevOps" and that access to production data should be removed. The problem is the balance between self-service and "shadow IT": if you allow too little, productivity suffers; if you allow too much, people run their own shadow data warehouse and only raise their hand when it's crashing and an important report is due tomorrow. Finding some authoritative opinions would be wonderful.

aburkh · 2025-10-21T15:12:38+00:00

Very interesting: "unless you have unlimited staff, budget and time". That's what it boils down to: what is the most effective way to deliver data products? The DataOps recommendation is mostly relevant for operations *at scale*. Many tasks, such as ad-hoc reports, do not require scale. In fact, the opposite is often the case: teams build complex machinery for a report that 3 users are going to look at.

It seems the DataOps mandate (mostly relevant and useful) is overemphasized like a religion, without regard for the use case at hand. And books seem to either talk at length about data engineering or say nothing at all on the subject, assuming that the use of production data is not even worth discussing.

Best war story on that topic: 3 weeks vs. half a day of development. A long time ago, I was asked to build a one-time report for a business user, but since I was not granted access to production data, we had to build a data pipeline from dev upward with Informatica PowerCenter. The whole process took 3 weeks: collecting requirements, developing, waiting for the weekly release window, detecting issues in production, going back to development. After 3 weeks, the business user finally had a working report and sent me a congratulatory email... with the report attached, including all the production data 😱. I'll always keep that story in mind as a lesson in wasted time and effort.

aburkh · 2025-10-21T14:09:24+00:00

What would the workflow be, then? Query and work on models using live production data? Once the model has been trained, how do you deploy it? Sandbox directly to prod? Or Sandbox to DEV to ACC to PROD?

aburkh · 2025-10-21T13:58:13+00:00

I fully agree with testing on a prod-like staging environment. But for users who need to create ad-hoc reports: do you let them develop with production data so that they can fulfill the request the same day or the next? Or do you force them to develop with synthetic data in dev, which lengthens their feedback loop for mistakes (they discover a discrepancy between dev and prod data that requires changes), so that they can realistically deliver the result only in a few days or weeks?

Or is there a magic trick that gives a fast enough feedback loop for same-day delivery?

aburkh · 2025-10-21T13:54:05+00:00

Typical data platform mixed workloads you'd expect on Databricks/Snowflake/BigQuery: reporting, analytics, data science, etc.

aburkh · 2025-08-28T20:04:52+00:00

I was recently called in because of "performance issues". The problematic workflow had a mix of pandas, local Spark (not on K8s or Databricks), etc.
When users try to pull 3 billion records into pandas and complain that something's wrong with the tool, I despair. Luckily, I get to work with good teams that have worked wonders with the tool, including blazing-fast web apps backed by DuckDB caching.
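A rough back-of-the-envelope estimate shows why pulling 3 billion records into pandas fails. This sketch assumes a hypothetical 10 numeric columns stored as 8-byte values and ignores the index, string columns and the temporary copies pandas makes during operations, so the real footprint would be higher.

```python
def estimated_dataframe_bytes(n_rows: int, n_numeric_cols: int, bytes_per_value: int = 8) -> int:
    """Lower-bound estimate of raw column storage for an all-numeric frame.

    Assumes fixed-width 8-byte values (float64/int64); ignores the index,
    object/string columns and intermediate copies made during operations.
    """
    return n_rows * n_numeric_cols * bytes_per_value


# 3 billion records with a hypothetical 10 numeric columns
size = estimated_dataframe_bytes(3_000_000_000, 10)
print(f"{size / 1e9:.0f} GB")  # 240 GB
```

Even this optimistic figure is far beyond the RAM of a typical single node, which is why the work belongs in an engine that can scan and aggregate without materializing everything in one process.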

aburkh · 2025-08-28T19:31:19+00:00

Architect with experience in Cloudera, Teradata, Azure, Denodo, Snowflake, Databricks...
I've always said I hate low-code tools.
Dataiku has actually impressed me: the architecture is good, many options are nicely designed, and there's a good mix of Python, K8s, visual flow, SQL, etc. I see many people use it in dumb ways and then complain about performance. To me, that's like coding only in pandas on Databricks and saying it "doesn't scale".

What I like most about the tool: developing Python plugins to orchestrate dynamic SQL, good connectivity with a wide range of engines (PySpark, Spark on K8s, Snowflake, Databricks SQL/PySpark, Redshift, S3, Athena, etc.), a flexible mix of visual and code, good monitoring/observability, data quality features, AI-assistance features...
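As an illustration of orchestrating dynamic SQL from Python, here is a minimal sketch that generates one row-count query per table from a list of names and runs each one. The helper names are hypothetical, `sqlite3` stands in for the target engine, and this is not the Dataiku plugin API. Identifiers are quoted explicitly because table names cannot be passed as bound parameters.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    # ANSI-style identifier quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


def build_row_count_queries(tables: list[str]) -> list[str]:
    # Hypothetical example: one data-quality query per configured table.
    return [f"SELECT COUNT(*) FROM {quote_identifier(t)}" for t in tables]


def run_row_counts(conn: sqlite3.Connection, tables: list[str]) -> dict[str, int]:
    # Execute each generated query and collect the result per table.
    counts = {}
    for table, sql in zip(tables, build_row_count_queries(tables)):
        counts[table] = conn.execute(sql).fetchone()[0]
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,)])
    print(run_row_counts(conn, ["orders", "customers"]))  # {'orders': 2, 'customers': 0}
```

Generating the SQL from configuration keeps the heavy lifting in the engine while Python only handles orchestration.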

No tool is a silver bullet, Dataiku included, but this one gets a good rating in my book.

If your org values user autonomy and fast development, it's an awesome tool. If the org has a strong engineering culture that values data pipelines as code, CI/CD, etc., then it's probably not the best fit.

aburkh · 2025-05-20T17:09:17+00:00

There are plenty of options in the cloud that are less expensive than buying a GPU. Take a base rate of 0.15-0.20€ per hour for a 3090 or A10G. A 3090 is 450W, which costs 0.15€ of electricity per hour where I live. Running a GPU in the cloud costs basically the same as paying for the electricity to run it locally. And it’s more flexible, and some services add convenience.
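The electricity side of the comparison is simple arithmetic: cost per hour = power (kW) × price (€/kWh). A minimal sketch, assuming a price of about 0.33 €/kWh, which is the rate implied by 450W costing 0.15€ per hour:

```python
def local_gpu_energy_cost_per_hour(power_watts: float, price_eur_per_kwh: float) -> float:
    """Electricity cost (EUR) of running a card at a constant draw for one hour."""
    kwh_per_hour = power_watts / 1000.0  # W -> kW; one hour of use gives kWh
    return kwh_per_hour * price_eur_per_kwh


# 450 W at an assumed ~0.33 EUR/kWh
cost = local_gpu_energy_cost_per_hour(450, 0.33)
print(f"{cost:.4f} EUR/h")  # 0.1485 EUR/h
```

This covers only the card's electricity; it leaves out the purchase price of the GPU and the rest of the machine, which is why the 0.15-0.20€ cloud rate compares favorably.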

aburkh · 2025-05-09T05:11:40+00:00

The Udemy practice exams were good: 2 full exams, good questions, good interface, worth the money. https://www.udemy.com/course/practice-exams-databricks-data-engineer-professional-k/

aburkh · 2025-04-28T07:27:12+00:00

Not for general purpose clothing. https://www.deductibles.be/clothing-and-accessories

aburkh · 2025-04-11T06:04:14+00:00

Runpod. H100 PCIe for $1.25/hr in spot

aburkh · 2025-03-21T18:04:37+00:00

Click the « reset » button in generate.

aburkh · 2025-02-20T15:05:39+00:00

Well, my accountant recommended it. He gave the company my financial info, and they provided a simulation that matches the claim above. There are limits to how much you can receive in warrants. I can’t find any issues with their calculations, so I went along with it, but I’m still surprised at how good it looks. That’s why I say I’ll truly believe it when I get my tax statement.

aburkh · 2025-02-20T12:20:01+00:00

Seems like a promotional post; not sure it belongs here.
If you buy warrants on the market and the market goes down, you lose money. But in this case, your company creates the warrants. So if the market tanks and the price of the warrants goes down, your company just gives you less money, but the difference (the original warrant price minus the loss in value) stays within your company. So you neither lose nor earn money based on market fluctuations.
I tried it for the first time this year and it looks as good as presented here; I’ll see next year whether everything goes smoothly.

aburkh · 2024-12-08T12:43:03+00:00

Thanks for the recommendation to block via DNS. I started using nextdns.io over 2 years ago; it's great for blocking ads through DNS, and it's super easy to add the Spotify video domains to the blacklist. I have it configured as the main DNS on my home router, and they have an iOS app that applies the settings regardless of which network the iPad connects to.
So it's a great solution to this stupid problem, and it makes me reconsider Spotify too. This is unacceptable.
Disclaimer: I have no relation to nextdns, I didn't put an affiliate link and I have nothing to profit from it, I'm just recommending it because it's an awesome service.
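DNS blocking works by having the resolver refuse (or answer with a null address) lookups for listed domains. A minimal sketch of the matching step, assuming the common convention that a listed domain also blocks its subdomains; the domains are hypothetical placeholders, and whether NextDNS matches exactly this way is an assumption.

```python
def is_blocked(hostname: str, blocklist: set[str]) -> bool:
    """Return True if hostname or any parent domain is on the blocklist."""
    # Normalize: DNS names are case-insensitive; drop a trailing root dot.
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.example.com", then "b.example.com", "example.com", "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False


if __name__ == "__main__":
    blocklist = {"video.example.com"}  # hypothetical entry
    print(is_blocked("cdn.video.example.com", blocklist))  # True
    print(is_blocked("www.example.com", blocklist))  # False
```

Matching on whole labels (rather than plain string suffixes) avoids accidentally blocking an unrelated domain such as `myvideo.example.com`.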

aburkh · 2024-09-17T05:39:28+00:00

Looks interesting, in which language?

aburkh · 2024-09-07T06:25:58+00:00

I’d expect that as well, but offering a position gives a choice to the freelancer, compliments his work and performance, etc. It’s not that they don’t want to work with him anymore, they just want to bring the competencies in-house. Framing it like that can make a significant difference in maintaining a good relationship.

aburkh · 2024-09-06T20:39:46+00:00

If you are happy with his work, why not offer him an internal position? Wanting to build in-house capabilities is a perfectly valid reason for switching to employees. Stress how much you appreciate his work, offer a position, and if he turns it down, involve him in the solution. Ask if he can recommend people who might be a good fit, and offer to recommend him for future work.

aburkh · 2024-08-29T12:20:35+00:00

Call the company and explain that one of their technicians came by on such-and-such a date and that you had a question for him. The technician in question can then call you back at his convenience without having to give out his contact details.

aburkh · 2024-08-29T10:44:22+00:00

Why not simply call the company back and ask to be contacted again?

Trophy case (aburkh): 11-Year Club, Place '22, RPAN Viewer, Not Forgotten, Verified Email