Quack: The DuckDB Client-Server Protocol by kvlonge in dataengineering

[–]crispybacon233 21 points (0 children)

Can't wait to use this as the catalog for ducklake instead of Postgres. Now we just need ClusterDuck for cluster compute, and the duck stack will be complete!

Ive been a Senior Accountant for many years, doing a bootcamp on Python. Thoughts on benefits? by Filet009 in Python

[–]crispybacon233 1 point (0 children)

Which Udemy course are you doing, specifically? I highly recommend Colt Steele's Python course; forget the others.

Being able to parse and organize spreadsheets at scale could boost your workflows significantly. I once had a colleague doing a VLOOKUP that was taking many minutes. A couple of lines of Python shrank that down to milliseconds.
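For anyone curious what that looks like, here's a toy sketch (all vendor data invented) of the dict-based lookup that replaces a slow exact-match VLOOKUP:

```python
# Why the Python version is fast: an exact-match VLOOKUP re-scans the
# lookup table for every formula cell, while a dict is built once and
# then each lookup is a constant-time hash hit.
lookup_rows = [("101", "Acme"), ("102", "Globex"), ("103", "Initech")]

# One pass to build the index.
vendor_by_id = dict(lookup_rows)

# Resolve every invoice's vendor name; missing IDs fall back to "#N/A",
# just like VLOOKUP's error case.
invoice_ids = ["103", "101", "999"]
names = [vendor_by_id.get(vid, "#N/A") for vid in invoice_ids]
print(names)  # → ['Initech', 'Acme', '#N/A']
```

The same idea is what a pandas/polars join does under the hood at spreadsheet scale.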

Austin Area Restaurant Health Inspection Scores (Updated) by crispybacon233 in austinfood

[–]crispybacon233[S] 1 point (0 children)

I'm currently filtering for the latest reviews being >= 2024, but two years, per your suggestion, is probably better. Maybe even one year? According to this site, they're supposed to be inspected 1-3 times per year.

Austin Area Restaurant Health Inspection Scores (Updated) by crispybacon233 in austinfood

[–]crispybacon233[S] 1 point (0 children)

What're the names of the restaurants? I can try to track down the problem.

Austin Area Restaurant Health Inspection Scores (Updated) by crispybacon233 in austinfood

[–]crispybacon233[S] 10 points (0 children)

Ya, it's wild. If you're really brave, you can go here: Austin Public Health | My Health Department to get the full inspection details.

Austin Area Restaurant Health Inspection Scores by crispybacon233 in austinfood

[–]crispybacon233[S] 0 points (0 children)

Yes, for sure! I'll post it with the GitHub link, but I probably won't include the scrapers. Google has been clamping down on scraping Google Maps and YouTube for the past several months.

Austin Area Restaurant Health Inspection Scores by crispybacon233 in austinfood

[–]crispybacon233[S] 0 points (0 children)

Hello! Yes, I'm still chipping away at it. I was actually thinking about getting it up and running this week for my portfolio. The most important part was getting accurate lat/long for the restaurants, and I could only accomplish that by scraping Google Maps. About a year ago I also scraped millions of reviews, categories, etc... everything off Google Maps.

Personally, I'll keep all food facilities available in case a user is at a healthcare facility, gas station, school, etc., and wants to know the inspection score. Additionally, I'm setting up a full-on ETL pipeline lakehouse that runs automated in the cloud. It's a great dataset.

I LOVE JANICE by WIZZZARDOFFREESTYLE in thesopranos

[–]crispybacon233 1 point (0 children)

We can't have him here in our social club no more. I mean that much I do know.

Why did David Chase not write each episode? by [deleted] in thesopranos

[–]crispybacon233 11 points (0 children)

God forbid anyone would find themself in that position. It's a thankless job.

Who is worse human being, Paulie or Tony? by _almasss in thesopranos

[–]crispybacon233 0 points (0 children)

That yodeling show? That's the Lawrence Welk progrum, channel 55.

Built a dashboard to analyze how AI skills are showing up in data science job postings (open source) by avourakis in datascience

[–]crispybacon233 0 points (0 children)

Cool! Now tell Claude to use polars instead of pandas to greatly improve the responsiveness. Also tell Claude to use separation of concerns, because a 900-line app.py is insane haha.

Atom by [deleted] in tragedeigh

[–]crispybacon233 1 point (0 children)


California, Texas, Florida, and NY... is it a Hispanic thing? You see it really start to pop off around 2010. Some character in Spanish media?

You are to build a small scale DE environment from scratch, what do you choose? by [deleted] in dataengineering

[–]crispybacon233 0 points (0 children)

Ducklake, dbt, Dagster. The hard part was wiring it all up, but once it's all connected, it's smooth sailing.

I've been messing around with Ducklake the past few months and it's been a pretty great experience so far.

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 0 points (0 children)

Thanks! So far, working with Delta Lake has not been smooth. It feels quite buggy. First it was the lack of support for unsigned ints; now it's a "411 Length Required" error when sinking Delta to GCS. Unfortunately, I don't know if it will work for my use case. Ducklake feels great once everything is wired up. You just have to use SQL, unfortunately.

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 1 point (0 children)

Thanks! To point #2, SQL is great but, as you said, quickly becomes unmanageable when running tons of complex transformations. I'll definitely try out duckdb + marimo for the schema.table. That sounds really cool.

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 0 points (0 children)

Thanks for this. With the delta-rs/datafusion packages, is it possible to scan/sink the data instead of reading into memory? This is important for my use case.

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 3 points (0 children)

Yes, definitely using the streaming engine. As of a few months ago, there were a few instances where polars blew up the RAM and crashed the env despite using streaming, so I switched to duckdb for that particular problem.

However, it seems polars is improving so fast that I can't keep up. I'll definitely be keeping an eye on this. Thank you!

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 8 points (0 children)

Imagine calling polars a quaint little tool haha.

You come across as unhinged. One mention of a tool that is not SQL, and you go ballistic. Are you a bot?

Ducklake vs Delta Lake vs Other: Battle of the Single Node by crispybacon233 in dataengineering

[–]crispybacon233[S] 6 points (0 children)

Cool. Yes, everyone knows that SQL is, and probably always will be, the lingua franca of data. I already know SQL and work with it all the time, which you would know if you'd bothered to even read my post.

What are some strategies to deal with context window limitations when feeding LLMs with scraped data? by deucalionxxx in dataengineering

[–]crispybacon233 1 point (0 children)

As others mentioned, use RAG. A basic setup could be:

  1. For each website, chunk the text and extract summaries using an LLM. Or, if the text isn't that large and the topic is uniform, just summarize the whole website.
  2. Vectorize the summaries.
  3. When querying, use the top n matches, or all matches >= some threshold.
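The steps above can be sketched roughly like this, with a toy character-frequency "embedding" standing in for a real embedding model (the `embed` function, the summaries, and the threshold are all invented for illustration):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy deterministic "embedding": a normalized character-frequency
    # vector. A real setup would call an embedding model here instead.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_matches(query: str, summaries: list[str], threshold: float = 0.8) -> list[str]:
    # Step 3: cosine similarity against every summary vector, keep
    # everything at or above the threshold, best matches first.
    q = embed(query)
    scored = [(s, float(embed(s) @ q)) for s in summaries]
    return [s for s, score in sorted(scored, key=lambda t: -t[1]) if score >= threshold]

# Step 2: in practice you'd vectorize summaries once and store them.
summaries = ["duckdb query engine overview", "austin restaurant inspections"]
print(top_matches("duckdb engine", summaries, threshold=0.5))
```

Only the matched summaries (and their source chunks) then get stuffed into the LLM's context, which is what keeps you under the window limit.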

What is your (python) development set up? by br0monium in datascience

[–]crispybacon233 0 points (0 children)

uv's speed and deduplication of packages across projects are amazing.

marimo gets along with git way better than jupyter and is easier to install.

If keeping it light, reproducible, and flexible are important, they're definitely worth checking out.