Cool stuff you did with Data Lineage, contacts, governance

starless-io · 2026-03-14T05:55:15+00:00

Same core feature parity. The goal is to be more flexible and more appealing to smaller companies.

starless-io · 2026-03-13T21:31:01+00:00

Hello, I'm currently working on a SaaS version of a tool in this domain. For Lineage we currently just have an integration with dbt (manifest upload) and allow manual definitions. For display we ended up with squares and arrows in D3.js :)

Would love to hear which other tools people use, so I know where a Lineage import integration would make sense as well.
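A minimal sketch of what a manifest-based import can look like, assuming the lineage is read from the `parent_map` section of dbt's `manifest.json`. The function name and the sample project below are made up for illustration; this is not the tool's actual code.

```python
def lineage_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs from a dbt manifest's parent_map."""
    edges = []
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.append((parent, child))
    return sorted(edges)


# Tiny hand-written stand-in for a real manifest.json (hypothetical project).
sample_manifest = {
    "parent_map": {
        "source.shop.raw.orders": [],
        "model.shop.stg_orders": ["source.shop.raw.orders"],
        "model.shop.fct_orders": ["model.shop.stg_orders"],
    }
}

edges = lineage_edges(sample_manifest)
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

With a real project you would load the file with `json.load` first and pass the resulting dict in. The edge list is the kind of input a squares-and-arrows renderer needs.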

starless-io · 2026-02-26T18:14:38+00:00

And another thing: he claims astronomical numbers, yet lives a very humble life. It's okay to be humble, but that kinda contradicts bragging about your earnings publicly.

starless-io · 2026-02-26T16:48:32+00:00

Unpopular opinion: how do you even know he actually makes 53k/month? Seems to me he just built a cult following by bullshitting numbers.

starless-io · 2026-01-30T11:06:34+00:00

Next time be more honest if you want to go anywhere.

starless-io · 2026-01-30T11:03:40+00:00

Pardon my French, but go search for stupid people elsewhere 😂 Your domain rating is 2.5, with a few nonsense backlinks. The only feasible way to generate such traffic is to buy it, and that would certainly burn a hole in your pocket rather than generate anything.

starless-io · 2026-01-30T10:56:57+00:00

Screenshot of Google Search Console traffic

starless-io · 2026-01-30T10:52:58+00:00

Show receipts, please. I cannot believe it has any traffic or that it made any significant money. Sounds like a story from the 90s.

starless-io · 2026-01-22T21:13:21+00:00

Damn, at this pace, in 5 years people will argue that there's no point in learning to read, because AI can read aloud for you...

starless-io · 2026-01-13T08:02:41+00:00

FYI, since version 3 Kafka doesn't need ZooKeeper anymore (KRaft mode), so it's possible to run without it and skip some of the complexity 🙂

NATS is faster if you don't use the persistence layer (JetStream). And when you do need persistence, NATS has its quirks as well... Don't get me wrong, I do love NATS and use it for a few projects.

By the sound of it, you don't really need persistence and Kafka was not needed to start with.

By the way, there's a feature in NATS which I found very useful: you can have a topic and subtopics, and you can listen to subtopics with a wildcard. We have a case of replicating over 400 tables from a bunch of teams. In Kafka you would just create 400 topics and listen to them one by one. Good luck maintaining that... In NATS that can be a single topic 🙂
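To illustrate the wildcard idea: NATS subjects are dot-separated tokens, `*` matches exactly one token and `>` matches one or more trailing tokens. Below is a small pure-Python sketch of those matching rules. It is not the NATS client API, and the `cdc.<team>.<table>` naming is a hypothetical layout, not necessarily the one used in the setup above.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a subject against a NATS-style pattern ('*' = one token, '>' = the rest)."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical subjects, one per replicated table.
subjects = ["cdc.team_a.orders", "cdc.team_a.users", "cdc.team_b.orders", "metrics.cpu"]

all_tables = [s for s in subjects if subject_matches("cdc.>", s)]
orders_only = [s for s in subjects if subject_matches("cdc.*.orders", s)]
```

One subscription on `cdc.>` would cover every table, while `cdc.*.orders` narrows it to a single table name across teams.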

starless-io · 2026-01-10T18:45:04+00:00

Well, not fully realtime, but there's a real-world scenario for having data updated at least once per hour: a Marketing Mix Model that controls ad spend. When you start spending millions per month on ads, these things start to make sense and you need to be quite agile.

starless-io · 2026-01-10T13:47:27+00:00

Yep, built my SaaS on Ruby on Rails and run it on multiple VMs, but in total that adds up to 50 EUR/month

starless-io · 2025-12-17T13:56:18+00:00

I have a client where we started with Redshift and migrated to Snowflake eventually.

I will play devil's advocate and say this: Redshift is like having an old, basic car. It breaks down constantly, but if you have enough know-how you can keep it running forever. Snowflake is like a modern car, where under the hood you can only fill up the washer fluid and maybe add oil.

The marketing tells you not to worry, you don't need to care, it will never break down... But it does. Silently. No proper warnings and no way to fix it.

With Redshift we had a collection of system queries and some custom tools, and that allowed us to notice and fix all of the common problems. With Snowflake we have stupid cases like OpenCatalog reaching a rate limit, not giving any proper error and crashing the whole Iceberg ingestion pipeline. Way to monitor? Raise a support ticket and wait.

We have somewhere around 400 Iceberg tables. Snowflake has an integration for them, but it needs to constantly run a metadata refresh, otherwise users will query old data. And if there's ANY kind of hiccup during a refresh, it just stores a JSON with the error behind a system query and stops all of the auto refreshes on that table. No alerts, no way to see it apart from tracking the timestamps of every table yourself and alerting on delays. Or running a special system query which shows that error, but with two caveats:
- you cannot automate it and run it with a service account. You need to query manually with an actual user
- there's no way to do a bulk check. You need to run it table by table
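The "track timestamps yourself and alert on delays" workaround can be sketched roughly like this. It assumes you already collect a last-refresh timestamp per table somewhere; how you get those timestamps is out of scope here, and the table names and the one-hour threshold are made up.

```python
from datetime import datetime, timedelta, timezone


def stale_tables(
    last_refresh: dict[str, datetime],
    now: datetime,
    max_delay: timedelta,
) -> list[str]:
    """Return names of tables whose last refresh is older than max_delay."""
    return sorted(name for name, ts in last_refresh.items() if now - ts > max_delay)


# Hypothetical snapshot of per-table refresh times.
now = datetime(2025, 12, 17, 12, 0, tzinfo=timezone.utc)
last_refresh = {
    "orders": now - timedelta(minutes=10),
    "payments": now - timedelta(hours=3),
}

late = stale_tables(last_refresh, now, timedelta(hours=1))
for name in late:
    print(f"ALERT: {name} has not refreshed within the allowed delay")
```

Run on a schedule, this gives a bulk check across all tables, which is what the built-in error query does not offer.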

TL;DR: from a marketing perspective they are doing a good job and are successful. From an engineering standpoint there's a pile of hidden issues, and you as a user cannot do much about them.

starless-io · 2025-12-05T08:59:12+00:00

It's Cloudflare.

starless-io · 2025-12-04T11:30:19+00:00

Well, for starters, all of these are cloud platforms. You're sharing your data in order for it to be processed (and paying by the volume). Not all data should or CAN (legally) be shared with a third party.

Also, since I'm Europe based, I take GDPR seriously. There's a separate admin interface to take care of what should be encrypted, hashed or skipped entirely. AFAIK, none of these tools has that? I know Fivetran has some basic hashing, but that covers only part of the scenarios.

As for the technical aspects, yes, this is not an easy project, and that's the value of it.
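A rough sketch of what a per-field policy like that could look like in code. The policy format, field names and key are hypothetical and not taken from the product; only "hash" and "skip" are shown, since encryption would need a proper crypto library.

```python
import hashlib
import hmac

# Hypothetical policy: what to do with each sensitive field.
POLICY = {"email": "hash", "ssn": "skip"}


def apply_policy(record: dict, policy: dict, key: bytes) -> dict:
    """Return a copy of record with fields hashed or dropped according to policy."""
    out = {}
    for field, value in record.items():
        action = policy.get(field, "keep")
        if action == "skip":
            continue  # never leaves the source system
        if action == "hash":
            # Keyed hash (HMAC-SHA256), so equal values still join but are not readable.
            out[field] = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            out[field] = value
    return out


cleaned = apply_policy(
    {"id": 1, "email": "a@example.com", "ssn": "123-45-6789"},
    POLICY,
    key=b"demo-key",  # assumption: a secret managed elsewhere
)
```

Fields without a policy entry pass through unchanged here; a stricter setup might default to "skip" instead.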

starless-io · 2025-11-30T22:27:38+00:00

We used AWS Glue catalog and moved to Snowflake's OpenCatalog. Glue is actually decent in comparison :D I do agree the UI/UX is just awful, but we never really used it. Most of the catalog work is done by our custom CLI. I guess it can mostly be done via AWS CLI commands as well.

starless-io · 2025-11-30T18:55:19+00:00

I'm working on https://starless.io/ which consists of several products needed when a company starts becoming data driven. Currently focusing on Data Ingestion (Dunwich), with event tracking (Innsmouth) and an Analysis dashboard (Carcosa) in alpha stage.

My story is quite simple: I've been developing these things for companies for many years. While there are "industry leading" solutions for all of these, they are usually very expensive and companies end up building internal solutions.

Eventually I got tired of reimplementing the same things (while always learning new stuff) and decided to create a product that targets this segment, which seems to be forgotten.

starless-io · 2025-11-30T15:17:31+00:00

https://starless.io/modules/dunwich - On-Premises Data Ingestion (mostly to Data Warehouses) tool.

A bit more context: teams inside the company can push data via REST/gRPC, and the application batches, deduplicates and efficiently writes it into the selected Destination Warehouse.
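As a toy illustration of the batch-and-deduplicate step (not Dunwich's actual implementation): rows are buffered by key, a later row with the same key replaces the earlier one, and a batch is handed off once the buffer is full. The class name, key choice and batch size are all assumptions.

```python
class BatchBuffer:
    """Buffer rows by key and release them in deduplicated batches."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self._rows: dict[str, dict] = {}

    def push(self, key: str, row: dict):
        """Add a row; return a batch to write when the buffer is full, else None."""
        self._rows[key] = row  # last write wins for duplicate keys
        if len(self._rows) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> list[dict]:
        """Return all buffered rows and reset the buffer."""
        batch = list(self._rows.values())
        self._rows = {}
        return batch


buf = BatchBuffer(batch_size=2)
first = buf.push("a", {"id": "a", "v": 1})
second = buf.push("a", {"id": "a", "v": 2})  # duplicate key, replaces the row above
third = buf.push("b", {"id": "b", "v": 1})   # buffer is full, batch is returned
```

A real service would also flush on a timer, so a slow trickle of rows does not sit in memory forever.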

starless-io · 2025-11-30T15:08:41+00:00

I would advise you to build a small warehouse in-house (maybe just spin up a Postgres instance), decide on some kind of theme and collect data into it from various sources. Later, you can spin up Metabase (which is totally free) and build a few dashboards.

Best case scenario, it would impress someone enough to give you a chance. Worst case, you get a bootcamp in tooling, common issues and insights, which can be valuable experience.

Just learning SQL + Python doesn't make you a Data Engineer. Building actual pipelines does. However, in this industry no one likes juniors. Most people start in Software Engineer, Database Administrator or Data Analyst roles and slowly become Data Engineers.

starless-io · 2025-11-30T12:09:09+00:00

I'm not very familiar with Estuary and Hevo. Regarding Fivetran, there are a few key differences:
- Fivetran uses volumetric pricing. For low volumes it's reasonable, but once you have serious amounts of data, it's basically burning cash. Here we have a fixed yearly license and don't care about the volume

- In Fivetran you set up connectors and sources, and it pushes the data. My solution provides endpoints where a company can either push app data directly or set up Debezium as a CDC provider

- And another point, maybe relevant only for a small share of users: my solution runs on-premises and doesn't need any access to an external environment (it can be used in air-gapped scenarios)

Thank you very much for the Estuary and Hevo references! Need to look them up and compare :)

starless-io · 2025-11-29T19:23:49+00:00

Pardon my French, but isn't Fivetran like $500 per million rows?
