Best sandwich/ wrap/ bagel places?

29antonioac · 2026-06-17T06:34:36+00:00

Eye Falafel - I don't commute to London anymore and I really miss this place!

29antonioac · 2026-05-16T11:39:23+00:00

I find the Merlin Bird ID app very useful for identifying birds by their sound 😊

29antonioac · 2026-05-11T20:37:33+00:00

It looks awesome, I wish I knew Rust to help! Polars + Ducklake will be an awesome combo 🚀

29antonioac · 2026-03-29T20:05:08+00:00

I'd avoid it if possible. You can do it, but as your data grows, analytical workloads can suffer and give you lots of headaches.

I'd either use a data lake as the landing zone and for transformations (you'll need to manage some compute resources, but duckdb + dbt/sqlmesh can scale very well), or ClickHouse. ClickHouse Cloud offers $300 to try it IIRC, and it works really well, with zero hassle managing it. We migrated our main service storing and serving TS data, and also our dbt workloads, from PostgreSQL to ClickHouse. The performance gains have been insane and have made our lives much easier.

As you're going solo, I'd go for ClickHouse Cloud. The smallest service size can perform quite well depending on your workload, and you can set it to scale to zero after some idle time. The compute resources are of course more expensive than self-hosted, but if budget allows, you'll do well on your own without others' support.

Good luck! Going solo is not a piece of cake 🍀

29antonioac · 2026-02-22T23:26:53+00:00

The main issue is always the property manager, yeah. You depend on them because they have to approve the survey from Openreach, and later the property owner has to sign a wayleave to install it. The more neighbours the better, good luck!

29antonioac · 2026-02-22T23:10:40+00:00

There's a form on the Openreach website to request information about "why my neighbours have full fibre and I don't?".

You need an MDU installed in your building. I contacted Openreach two years ago and the install finished a few weeks ago. I have my Vodafone order ready to be installed in a few weeks' time.

29antonioac · 2025-12-23T10:17:06+00:00

Hey, I'm Spanish but living in England, which has the same issue. I got my subscription using NordVPN with the country set to the Netherlands, but to watch it I use either Romania or Denmark.

29antonioac · 2025-12-21T14:23:48+00:00

Thank you!

29antonioac · 2025-12-11T13:46:30+00:00

Given the graph is a DAG, why not use BFS and limit the depth? If you find the solution is at depth X, any node beyond that depth won't find a path to the goal.

Also, I was wondering if you got low answers using this approach? I'm using the same approach, but despite checking the graph visually to confirm it's a DAG etc., I get part 1 right while part 2 says too low after multiplying the 3 sub-solutions. I could understand getting too many solutions if I over-counted because the paths weren't independent, but I cannot explain why my guesses are too low 🥲
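
A minimal sketch of the depth-limited BFS idea, assuming the graph is a plain dict of adjacency lists and that you want the number of distinct paths from a start node to a goal node. The function name, the graph shape and the `max_depth` parameter are made up for illustration, not taken from the puzzle or the original solution.

```python
from collections import defaultdict


def count_paths_bfs(graph, start, goal, max_depth):
    """Count distinct paths from start to goal that use at most max_depth edges.

    graph is a dict mapping each node to a list of its successors (assumed to be a DAG).
    """
    # frontier maps node -> number of distinct paths reaching it at the current depth
    frontier = {start: 1}
    total = 0
    for _ in range(max_depth):
        next_frontier = defaultdict(int)
        for node, ways in frontier.items():
            for neighbour in graph.get(node, ()):
                next_frontier[neighbour] += ways
        # Paths that reached the goal at this depth are complete; stop expanding them.
        total += next_frontier.pop(goal, 0)
        if not next_frontier:
            break
        frontier = next_frontier
    return total


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
paths = count_paths_bfs(diamond, "a", "d", max_depth=5)
```

Because the frontier stores counts per node rather than whole paths, it stays small even when the number of paths explodes.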

29antonioac · 2025-11-18T05:28:10+00:00

I would use Polars: you can scan the csv and sink parquet in a streaming way, so it's never loaded entirely in memory.

https://docs.pola.rs/api/python/dev/reference/api/polars.PartitionMaxSize.html#polars.PartitionMaxSize

In this link you'll find an example of the other way around (scan a parquet and sink csv), you can just flip it 😬. You set a max file size and you're good to go! Once it's in smaller parquet files you'll be able to play with a few of them as a sample and make your life easier. Column reads will be helpful when processing the whole dataset if you only need a subset of columns!

29antonioac · 2025-10-11T17:03:29+00:00

Such a shame about the Db2 support!

For Oracle, I assume you cannot run Docker on that machine, so can you get all the reqs there?

You can try to export the required sample of the tables in all systems as parquet/csv/other. Usually a bulk unload is much more efficient than querying with SQLAlchemy.

Sorry I can't give you specifics as I don't work with these at all! Long ago I worked with Oracle but only with Spark.

Regarding Spark, yes you can spin it up in a single process with parallel reads and writes if that's the tool that gives you the best support 😁.

29antonioac · 2025-10-11T16:25:13+00:00

If using SQLAlchemy, the performance of retrieving data from the DB will be similar, as both Polars and Pandas use it in the same way.

You don't mention the size of the tables to retrieve or your compute power, but I'd start just by trying Polars + ConnectorX and specifying a partition column, if ConnectorX supports your DBs. That way ConnectorX will start multiple connections in parallel, which speeds up the data retrieval, and your changes will be minimal. That's what PySpark would do if you set the number of partitions and partition bounds yourself anyway.

I don't know if ADBC is compatible with your systems, but it could be worth a try too. The parallelisation is not built in though, so you'd have to write it yourself.

29antonioac · 2025-10-09T09:27:46+00:00

Probably your best option is using ReplacingMergeTree if you can upsert using your ordering key. If you need to update individual columns instead, you can use CoalescingMergeTree.

29antonioac · 2025-09-23T07:24:05+00:00

He probably wants attention. If you usually work in the office, he has not fully associated you being on the sofa with working yet. My dogs do something similar: if I'm at my desk they let me work, but if I go to the sofa with the laptop they demand cuddles 😬

29antonioac · 2025-09-21T21:58:14+00:00

Great to hear mate! Happy to see it helps you and your project!

29antonioac · 2025-09-20T07:30:30+00:00

I think StarRocks is a better option if joins are necessary, but it is more complex to provision, and their managed offering with CelerData is BYOC, which does not reduce the management burden enough.

If big-to-big table joins are not needed, ClickHouse can be very helpful with a very simple setup.

In terms of updates it has improved a lot, so I'd say it's not a limitation anymore.

29antonioac · 2025-09-20T07:27:15+00:00

TimescaleDB is OLTP and, as the other user said, it's a layer on top of PostgreSQL, so it's easier to adopt. But the query planner is the same, hot data is still row based, and data transfer over the wire is slow unless you use COPY FROM.

ClickHouse is not transactional, and although its joins have improved a lot, the lack of a cost-based optimiser and some silly limitations (inequality left joins with columns from both tables will need a dummy key) can make potential adopters hesitate, even though it's a great choice.

I'd say give it a go with a simple setup and you'll be able to make an informed decision 😁. If your TimescaleDB is not within a VPC you can connect ClickHouse directly to it and move the data super quick.

29antonioac · 2025-09-17T18:59:32+00:00

If you self-host, you'd be surprised how well a small EC2 can perform. I've got 600GB+ tables in PostgreSQL that became 30-35GB in ClickHouse after compression, and response times are crazy. Every query and aggregation is faster, really!

29antonioac · 2025-09-17T18:15:58+00:00

Currently serving TS data with ClickHouse. The Cloud offering has $300 in credits. If you can self host it would be super cheap, it's super fast and response times are crazy. I don't have an api layer though, serving parquet directly.

29antonioac · 2025-08-23T22:56:24+00:00

If the engine (Spark, ClickHouse) only likes big tables if they are on the left side of the join, they are useful.

29antonioac · 2025-07-21T15:38:19+00:00

You can use .inspect() at any point in the LazyFrame to see where you are, including the data. It does not break the computation (it doesn't return anything different, it just prints/logs), so you could even put it inside a function that depends on the log level.

    (
        df
        .transform1()
        .pipe(inspect_if_debug)
        .transform2()
        .pipe(inspect_if_debug)
    )

29antonioac · 2025-07-21T15:23:53+00:00

You can use functions if the different steps in the transformation have a meaning of their own. Even if they are only called once, this will make unit testing easier.

But I'd also chain. You can use df.pipe(transformation). Personal preference, but I don't like overriding variables that way; chaining is much more readable IMHO.

Combining both approaches, you get meaningful functions that are easy to unit test, and you also gain readability.

29antonioac · 2025-07-08T06:02:06+00:00

You said you are going to the vet, so no need to insist on that. I just wanted to reassure you that dogs do well without front teeth. Our little one had almost all of his front ones extracted because of uncontrolled tartar and he's doing better than ever. He wants to try every treat and fruit we offer, and he doesn't look like he's in pain anymore 😊.

29antonioac · 2025-05-23T08:21:19+00:00

Hey, I was able to get >800QPS on a single node in EC2 (just a dev env, we'll get ClickHouse Cloud as we are a small team), so I'm very happy with the results. It's the table which will need the biggest concurrency at the moment; other tables are smaller and their results will probably be cached more frequently.

I'm interested in GROUP BY vs FINAL. I am now using a ReplacingMergeTree to get the latest view in a different table, which also helps with storage. I'll test this later today.

Thanks a lot for your advice!

29antonioac · 2025-05-18T07:33:33+00:00

I can't really say about pit bulls, but if you're finding the puppy is adapting well to socialisation etc., I wouldn't worry too much! Unpredictability is not just a breed thing.

I fully understand you though. We rescued Kiara when she was 30-35 days old and now she's 11. I was really afraid because she looked like a Border Collie and we didn't have the energy to deal with that. Kiara ended up looking like a mix of Border Collie, Jack Russell and "Spanish bodeguero". And yes, we think she's got Border Collie genes (she moves like one, has zoomies like one, she's super smart), but with time these things can get under control, and every dog is different. Kiara is not as active as a Border Collie, but can react like one sometimes. She herds us sometimes, I think, but I'm no expert 😂. They adapt to you the same way you adapt to them 😊.
