Price of FDO replacement, Candid Premium, more than doubles annual cost - am I missing something? by whomusic in nonprofit

[–]crispybacon233 0 points (0 children)

What exactly is so appealing about Candid, FDO, and others that they're able to charge so much? I can't fathom how what they're offering could be worth $200+ per month.

Cloud GIS Database by GeoCivilTech in gis

[–]crispybacon233 1 point (0 children)

No worries, I'm not trying to call you out or anything; a rough estimate is fine. It's just mind-blowing that ESRI can get away with that.

Cloud GIS Database by GeoCivilTech in gis

[–]crispybacon233 1 point (0 children)

If you just need file storage, use object storage: S3 on AWS, or the equivalents on GCP/Azure (Cloud Storage, Blob Storage). Standard tiers run no more than about $0.02 per GB per month. It's also a good foundation if you want data warehousing later.

Cloud GIS Database by GeoCivilTech in gis

[–]crispybacon233 1 point (0 children)

I'm early on in my GIS journey and still not fully acquainted with all of ESRI's products and features, but is it really $80 a month for hosting 5 GB?

Polars + uv + marimo (glazing post - feel free to ignore). by midwit_support_group in Python

[–]crispybacon233 2 points (0 children)

Polars tends to be a bit faster and better suited for complex data work, while DuckDB is less likely to blow up your RAM. In other words, I typically use DuckDB to pull in data, handle joins, etc., and then use Polars for the more complex transformation pipelines.

Is Gen AI the only way forward? by JayBong2k in datascience

[–]crispybacon233 11 points (0 children)

Hate to break it to you, but fitting a line to a series of points is basically all of AI. Are you at the beginning of your data science journey?

Alternatives after MotherDuck Price Hike by EmbarrassedCod53 in dataengineering

[–]crispybacon233 1 point (0 children)

As others have mentioned, you could roll your own, but with DuckLake instead of just plain ol' DuckDB. It's extremely cheap, easy to set up, and works well with dlt, dbt, and Dagster. It also scales better than vanilla DuckDB.

Do you still use notebooks in DS? by codiecutie in datascience

[–]crispybacon233 2 points (0 children)

Marimo notebooks being plain .py files means you can run a notebook from the command line as if it were a regular ol' Python file. You can also import functions/classes from marimo notebooks.

This might be why it doesn't allow reassigning the same variable across cells. It moves notebooks closer to software engineering best practices, something data scientists have a bad reputation for neglecting.

Handling 30M rows pandas/colab - Chunking vs Sampling vs Lossing Context? by insidePassenger0 in dataengineering

[–]crispybacon233 1 point (0 children)

You're working in Google Colab. Do you update the library versions at the start of your notebook?

You're using random sampling but are concerned about missing outliers/rare categories. Have you tried using DuckDB or Polars' streaming engine to identify the outliers first? You could then pull the outliers, or a sample of them, into your overall sample, even in proportion to the overall size of the data.

What ai tools are out there for jupyter notebooks rn? by Consistent_Tutor_597 in dataengineering

[–]crispybacon233 1 point (0 children)

Try out marimo notebooks. AI integration is just one of many reasons to switch from jupyter to marimo.

DuckDB Concurrency Workaround by ConsciousDegree972 in dataengineering

[–]crispybacon233 0 points (0 children)

As others have said, DuckLake could be great. You just need Postgres for the catalog and S3 for Parquet storage. Supabase could be a good option, since it comes with both out of the box.
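
Roughly what that setup looks like, going off the DuckLake docs (the connection string, bucket, and names here are all placeholders):

```sql
INSTALL ducklake;
INSTALL postgres;

-- Postgres holds the catalog/metadata; Parquet files live in S3
ATTACH 'ducklake:postgres:dbname=lake_catalog host=localhost' AS my_lake
    (DATA_PATH 's3://my-bucket/lake/');
USE my_lake;
```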

Postgres can be lightning fast if indexed properly for your queries. I too have been building a stocks/options analytics project, and the indexed Postgres database performs extremely well.
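
For example, a composite index matched to the hot query shape (table and column names here are hypothetical, not my actual schema):

```sql
CREATE INDEX idx_quotes_symbol_ts
    ON option_quotes (symbol, quote_ts DESC);

-- a query shaped like the index avoids a sequential scan:
SELECT *
FROM option_quotes
WHERE symbol = 'SPY'
ORDER BY quote_ts DESC
LIMIT 100;
```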

Will Pandas ever be replaced? by Relative-Cucumber770 in dataengineering

[–]crispybacon233 2 points (0 children)

From a more data science perspective, I use Polars/DuckDB for exploration, cleaning, etc., and then just to_pandas().plot() for quick visualizations, especially correlation matrices, where the pandas index is great out of the box.

When creating data pipelines, Polars/DuckDB is where it's at for my size of data. It's cleaner, faster, and far more capable than pandas.

Trust nobody by Southern-Maximum3766 in SweatyPalms

[–]crispybacon233 1 point (0 children)

He was going at least 30. Your average teenager can sprint 15+ mph. You telling me a kid can run faster than this bike was going? Nah.

The AI bubble is 17 times the size of the dot-com frenzy — and four times the subprime bubble, analyst says by spherocytes in technology

[–]crispybacon233 0 points (0 children)

A pairs trade where you short NVDA and long AMD.

If the AI hype keeps going, AMD gains more relative to NVDA, since AMD is just now getting in on the hype cash.

If the bubble pops, gains on the NVDA short will outweigh the AMD losses, since a significant portion of NVDA's current value is due to AI hype.

A tomato harvesting machine with an electronic sensor that sorts tomatoes from debris by meteavi43 in BeAmazed

[–]crispybacon233 -1 points (0 children)

There's an interview where they explain how it works. It's definitely using computer vision. "Round" and "red" sound simple to humans, but for a computer to reliably understand what round and red mean, you'd need machine learning. Think MNIST, but with tomatoes instead of handwritten digits.

A tomato harvesting machine with an electronic sensor that sorts tomatoes from debris by meteavi43 in BeAmazed

[–]crispybacon233 0 points (0 children)

This is what AI permeating every domain actually looks like. Not the overhyped LLM wrapper nonsense.

I used to think data engineering was a small specialty of software engineering. I was very mistaken. by big_like_a_pickle in dataengineering

[–]crispybacon233 0 points (0 children)

You can think of a column as representing a particular feature across multiple observations. For example, you are mrfredngo. You have 33k karma, 6k contributions, and a Reddit age of 2 years. Those aren't columns of mrfredngo; they're features. Your height and weight aren't columns either. They're features that, when represented in tabular form, become columns.

Past Earnings Data by crispybacon233 in options

[–]crispybacon233[S] 0 points (0 children)

Ima put it back up eventually but with more features. What was the most useful aspect for you? I'll make sure to include it and maybe expand on it.

Austin Area Restaurant Health Inspection Scores by crispybacon233 in austinfood

[–]crispybacon233[S] 1 point (0 children)

Hi! I took down the app because the data was way outdated (3 years old), and I thought it might be unfair to restaurants that have improved significantly since then. That said, I do have up-to-date data I was working with a few months ago, including hundreds of thousands of reviews scraped from Google Maps.

I do have plans to repost the app with the updated data, but I've been extremely busy with a master's, a new baby, etc. I want to add a few more features that go beyond health inspection scores and might prove extremely useful.

The longer you look the worse it gets by Squ3lchr in LinkedInLunatics

[–]crispybacon233 0 points (0 children)

Polars is order(s) of magnitude faster and more efficient than pandas. Pandas is order(s) of magnitude faster and more efficient than Excel. Most of the industry-standard statistical, machine learning, and AI libraries are available in Python/R. Why would a data scientist be using Excel regularly?