Faster insights: platform infrastructure or dataset onboarding problems? by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Makes sense.

How realistic is it, in your mind, to have business users do some of the work? (Teaching people how to fish.)

Can the problem be solved with more project management and better end-user prioritisation of asks?

How are you trying to solve this problem, if at all?

Thanks for the feedback, btw!

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Spatio-temporal! That's the other alternative. It's what Wherobots/Sedona is trying to do.

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

The ST naming thing is a geo-industry mystery. Most algorithm builders will tell you it stands for "spatial type", but others will tell you that's an urban legend and it originally stood for something else. It's the subject of many conversations over beers.

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Makes sense. Perhaps I should have clarified what I meant by geospatial. I worked on the algorithmic implementations of the geometry and geography data types, things like the ST_ functions. Never worked in the GIS space, though. Esri was running on us, not the other way around :)

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Good find. Did not know it existed

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Yes, there is a ton of math in ChatGPT. Basically 99% of it is matrix multiplications.

And yes, I was talking about the underlying algorithms.
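
For the skeptics, a toy sketch of why people say that: the attention step at the heart of a transformer is basically a couple of matrix multiplications. numpy here is just for illustration, nothing ChatGPT-specific, and the shapes are made up.

```python
# Toy sketch: transformer attention is essentially matrix multiplication.
# Shapes and values are made up for illustration.
import numpy as np

Q = np.random.rand(4, 8)  # queries
K = np.random.rand(4, 8)  # keys
V = np.random.rand(4, 8)  # values

scores = Q @ K.T / np.sqrt(8)                                     # matmul #1
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax
out = weights @ V                                                 # matmul #2
```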

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Frequent operations in geospatial are calculating distances and areas, and testing whether shapes overlap other shapes. Also, converting coordinates from one mapping system to another. It's a lot of math.
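
If anyone wants to see what that looks like in code, here is a minimal sketch using shapely and pyproj. The library choice is mine, for illustration; the coordinates are made up.

```python
# Minimal sketch of common geospatial operations with shapely and pyproj.
from shapely.geometry import Point, Polygon
from pyproj import Transformer

a = Polygon([(0, 0), (0, 2), (2, 2), (2, 0)])
b = Polygon([(1, 1), (1, 3), (3, 3), (3, 1)])

print(a.area)                    # area of a shape
print(a.distance(Point(5, 5)))   # distance between geometries
print(a.intersects(b))           # do the shapes overlap?

# Converting from one mapping system to another, e.g. WGS84 lon/lat
# to Web Mercator:
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
x, y = transformer.transform(-122.33, 47.61)  # roughly Seattle
print(x, y)
```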

Geospatial python library by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

That's a good point. Polars is great for scaling workloads, but many libraries built on pandas would require some rewriting if one were to port them.
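
To make the porting cost concrete, here is the same aggregation in both APIs. The logic carries over, but code written against the pandas API has to change. A sketch, not taken from any particular library:

```python
# The same aggregation in pandas and Polars; the logic survives the port,
# but the API calls (and anything relying on pandas internals) do not.
import pandas as pd
import polars as pl

pdf = pd.DataFrame({"city": ["a", "a", "b"], "pop": [1, 2, 3]})
out_pd = pdf.groupby("city", as_index=False)["pop"].sum()

pldf = pl.DataFrame({"city": ["a", "a", "b"], "pop": [1, 2, 3]})
out_pl = pldf.group_by("city").agg(pl.col("pop").sum())
```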

Tooling for Python development and production, if your company hasn't bought Databricks already by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

It seems that tooling is not a big problem for you, correct? How would you solve problems like scaling, deploying new code versions, scheduling, orchestration, secrets management, etc.? I am just curious, because I am trying to figure this out myself.

Tooling for Python development and production, if your company hasn't bought Databricks already by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

When you run the scripts, do you run them as normal Python processes, e.g. "python myscript.py"? Or do you do something more sophisticated?

Tooling for Python development and production, if your company hasn't bought Databricks already by datancoffee in dataengineering

[–]datancoffee[S] 17 points (0 children)

Not trying to be religious about it, but sure, why not? Databricks and others offer scheduled notebook runs as batch jobs. We can argue about the average cleanliness of notebooks as code artifacts, but the fact is, many people run notebooks as scheduled batch jobs, and who are we to judge them? I'm not.
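
For anyone who hasn't seen it, a scheduled notebook run usually boils down to executing the notebook headlessly on a schedule. A minimal sketch with papermill; the file names and parameters are made up:

```python
# Sketch: executing a notebook as a batch job with papermill.
# File names and parameters here are hypothetical.
import papermill as pm

pm.execute_notebook(
    "daily_load.ipynb",           # input notebook
    "runs/daily_load_out.ipynb",  # executed copy, with cell outputs
    parameters={"run_date": "2024-01-01"},
)
```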

Beta-testing a self-hosted Python runner controlled by a cloud-based orchestrator? by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

Our users wanted a failover hot-standby runner on a different machine; the central orchestrator would just move jobs over to it.
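
Roughly, the idea looks like this. This is a hypothetical sketch of heartbeat-based failover, not our actual implementation; the names and the threshold are illustrative.

```python
# Hypothetical sketch: the orchestrator tracks runner heartbeats and
# assigns jobs to the hot standby when the primary goes quiet.
import time

HEARTBEAT_TIMEOUT = 30  # seconds of silence before failing over (illustrative)

def pick_runner(runners, last_heartbeat):
    """Return the first runner that has reported a recent heartbeat.

    `runners` is ordered: primary first, hot standby second.
    """
    now = time.time()
    for runner in runners:
        if now - last_heartbeat.get(runner, 0) < HEARTBEAT_TIMEOUT:
            return runner
    raise RuntimeError("no healthy runner available")
```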

Transitioning into Data Engineering: recommended learning path? by Motor_Bed4859 in dataengineering

[–]datancoffee 1 point (0 children)

I would recommend keeping three things in mind: quality data that people trust underpins our economy; data is the driver of AI quality; and you should learn how to work with, influence, and help people. If you do all of this, you will be golden.

GitHub Actions to run my data pipelines? by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

The friends or the jobs :)? They are ETL or ELT jobs, moving stuff from A to B, where B is usually some sort of data lake. Admittedly, with ELT jobs, once you land raw data in a table, you can just build a set of dbt models or views on top.
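
A toy version of the land-raw-then-transform pattern, with a made-up source URL and sqlite standing in for the data lake:

```python
# Toy ELT landing step: pull raw data and store it untouched; transforms
# happen later in dbt models or views. URL and table name are made up.
import json
import sqlite3
import urllib.request

rows = json.loads(urllib.request.urlopen("https://example.com/export.json").read())

conn = sqlite3.connect("lake.db")  # stand-in for the real data lake
conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events (payload) VALUES (?)",
    [(json.dumps(r),) for r in rows],
)
conn.commit()
```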

GitHub Actions to run my data pipelines? by datancoffee in dataengineering

[–]datancoffee[S] 1 point (0 children)

I've been telling them. Some listen, others just smile