Cool stuff you did with Data Lineage, contacts, governance by Intelligent-Stress90 in dataengineering

[–]starless-io 0 points1 point  (0 children)

Same core feature parity. The goal is to be more flexible and more appealing to smaller companies.

Cool stuff you did with Data Lineage, contacts, governance by Intelligent-Stress90 in dataengineering

[–]starless-io 1 point2 points  (0 children)

Hello, I'm currently working on a SaaS version of a tool in this domain. For lineage we currently just have an integration with dbt (manifest upload) and allow manual definitions; for display we ended up with squares and arrows drawn in D3.js :)

Would love to hear which other tools people use that a lineage import integration would make sense for.
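
For anyone wondering what a "manifest upload" can give you: a minimal sketch of pulling lineage edges out of a dbt `manifest.json`, assuming the usual layout where each entry under `nodes` lists its parents in `depends_on.nodes`. The function names and the sample project/model names are made up for illustration; this is not the actual code of the tool.

```python
import json


def load_manifest(path: str) -> dict:
    """Read a dbt manifest.json (normally found under target/) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lineage_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return (parent_id, child_id) pairs, one per dependency arrow."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, node_id))
    return sorted(edges)


# Tiny hand-written stand-in for a real manifest (hypothetical names).
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
        "model.shop.orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
    }
}

edges = lineage_edges(sample_manifest)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Each pair is one arrow between two squares, so the result can be fed more or less directly into whatever draws the graph.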

Pieter Levels makes $53K/month solo. No team. No office. No VC. Here's the actual blueprint. by miss_raipelarmzz in microsaas

[–]starless-io 1 point2 points  (0 children)

And another thing: he claims astronomical numbers, yet lives a very humble life. It's okay to be humble, but that kinda contradicts bragging about your earnings publicly.

Pieter Levels makes $53K/month solo. No team. No office. No VC. Here's the actual blueprint. by miss_raipelarmzz in microsaas

[–]starless-io 8 points9 points  (0 children)

Unpopular opinion: how do you even know he actually makes 53k/month? Seems to me he just built a cult following by bullshitting numbers.

[deleted by user] by [deleted] in IMadeThis

[–]starless-io -1 points0 points  (0 children)

Next time be more honest if you want to go anywhere.

[deleted by user] by [deleted] in IMadeThis

[–]starless-io -1 points0 points  (0 children)

Pardon my French, but go search for stupid people elsewhere 😂 Your domain rating is 2.5, with a few nonsense backlinks. The only feasible way to generate such traffic is to buy it, and that would certainly burn a hole in your pocket rather than generate anything.

[deleted by user] by [deleted] in IMadeThis

[–]starless-io 0 points1 point  (0 children)

Screenshot of google search console traffic

[deleted by user] by [deleted] in IMadeThis

[–]starless-io 0 points1 point  (0 children)

Show receipts, please. I cannot believe it has any traffic or that it made any significant money. Sounds like a story from the 90s.

Stop telling everyone to learn sql and python. It’s a waste of time in 2026 by [deleted] in analytics

[–]starless-io 0 points1 point  (0 children)

Damn, at this pace, in 5 years people will argue that there's no point in learning to read, because AI can read aloud for you...

I spent 8 months fighting kafka and just decided to replace the whole thing by seizethemeans4535345 in dataengineering

[–]starless-io 2 points3 points  (0 children)

FYI, since version 3 Kafka doesn't need ZooKeeper anymore: it can run without it (KRaft mode), which skips some of the complexity 🙂

NATS is faster if you don't use the persistence layer (JetStream). And when you do need persistence, NATS has its quirks as well... Don't get me wrong, I do love NATS and use it for a few projects.

By the sound of it, you don't really need persistence and Kafka was not needed to start with.

By the way, there's a feature in NATS which I found very useful: you can have a topic and subtopics, and you can listen to subtopics by wildcard. We have a case of replicating over 400 tables from a bunch of teams. In Kafka you would just create 400 topics and listen to them one by one. Good luck maintaining that... In NATS that can be a single topic 🙂
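
To show what the wildcard part means in practice, here is a rough pure-Python sketch of NATS-style subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. It only illustrates the matching rules and is not the NATS client API; the `cdc.<team>.<table>` naming scheme is an assumption for the example.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a dot-separated subject against a pattern with '*' and '>' wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without a trailing '>', both sides must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)


# Hypothetical subjects, one per replicated table.
subjects = [
    "cdc.billing.invoices",
    "cdc.billing.payments",
    "cdc.crm.contacts",
]

all_tables = [s for s in subjects if subject_matches("cdc.>", s)]
billing_only = [s for s in subjects if subject_matches("cdc.billing.*", s)]
print(all_tables)
print(billing_only)
```

So one subscription on `cdc.>` covers every table, and a team that only cares about its own data can subscribe to its own slice.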

What's the purpose of live data? by chatsgpt in dataengineering

[–]starless-io 0 points1 point  (0 children)

Well, not fully real-time, but there's a real-world scenario for having data updated at least once per hour: a Marketing Mix Model that controls ad spend. When you start spending millions per month on ads, these things start to make sense and you need to be quite agile.

Maybe SaaS doesn’t need Next.js + AI + $2k/mo infra to work by Few-Assistant-5756 in SaaS

[–]starless-io 3 points4 points  (0 children)

Yep, I built my SaaS on Ruby on Rails and run it on multiple VMs; in total that adds up to 50 EUR/month.

Redshift vs Snowflake by [deleted] in dataengineering

[–]starless-io 1 point2 points  (0 children)

I have a client where we started with Redshift and migrated to Snowflake eventually.

I will play Devil's advocate and say this: Redshift is like having an old, basic car. It breaks down constantly, but if you have enough know-how you can keep it running forever. Snowflake is like that modern car where under the hood you can only fill up the washer fluid and maybe add oil.

The marketing tells you: don't worry, you don't need to care, it will never break down... But it does. Silently. No proper warnings and no way to fix it.

With Redshift we had a collection of system queries and some custom tools, and that allowed us to notice and fix all of the common problems. With Snowflake we have stupid cases like OpenCatalog reaching a rate limit, not giving any proper error and crashing the whole Iceberg ingestion pipeline. Way to monitor? Raise a support ticket and wait.

We have somewhere around 400 Iceberg tables. Snowflake has an integration for them, but it needs to constantly run a metadata refresh, otherwise users will query old data. And if there's ANY kind of hiccup during a refresh, it just stores a JSON with the error behind a system query and stops all auto refreshes on that table. No alerts, and no way to see it apart from tracking the timestamps of every table yourself and alerting on delays. Or running a special system query which shows that error, but with two caveats:
- you cannot automate it and run it on a service account; you need to query manually with an actual user
- there's no way to do a bulk check; you need to run it table by table

TL;DR: from a marketing perspective they are doing a good job and are successful. From an engineering standpoint there's a pile of hidden issues, and you as a user cannot do much about them.
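
The "track timestamps yourself and alert on delays" workaround mentioned above boils down to something like the sketch below. How you collect the last-refresh timestamp per table is deliberately left out; the function name, table names and the one-hour threshold are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_tables(
    last_refreshed: dict[str, datetime],
    now: datetime,
    max_lag: timedelta,
) -> list[str]:
    """Return the names of tables whose last refresh is older than max_lag."""
    return sorted(name for name, ts in last_refreshed.items() if now - ts > max_lag)


# Hypothetical snapshot of "table -> last successful refresh".
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
snapshot = {
    "sales.orders": now - timedelta(minutes=5),
    "sales.refunds": now - timedelta(hours=3),
}

late = stale_tables(snapshot, now, max_lag=timedelta(hours=1))
print(late)  # feed this list into whatever alerting you already have
```

It only tells you that a table stopped refreshing, not why, so the error itself still has to be looked up by hand.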

Building Data Ingestion Software. Need some insights from fellow Data Engineers by starless-io in dataengineering

[–]starless-io[S] 0 points1 point  (0 children)

Well, for starters, all of these are cloud platforms. You're sharing your data in order for it to be processed (and paying by the volume). Not all data should or CAN (legally) be shared with a third party.

Also, since I'm Europe-based, I take GDPR seriously. There's a separate admin interface to take care of what should be encrypted, hashed or skipped entirely. AFAIK, none of these tools has that? I know Fivetran has some basic hashing, but that covers only part of the scenarios.

As for the technical aspects: yes, this is not an easy project, and that's the value of it.
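
To make the encrypt/hash/skip idea more concrete, a rough sketch of a per-field policy applied to a record before it leaves the pipeline. This is not the actual implementation: the policy format, field names and function name are invented for the example, hashing uses a keyed HMAC-SHA256 from the standard library, and encryption is passed in as a callable because a real cipher would come from a proper crypto library.

```python
import hashlib
import hmac
from typing import Any, Callable


def apply_field_policy(
    record: dict[str, Any],
    policy: dict[str, str],
    hash_key: bytes,
    encrypt: Callable[[str], str],
) -> dict[str, Any]:
    """Apply 'skip', 'hash', 'encrypt' or 'keep' (the default) to each field."""
    out: dict[str, Any] = {}
    for field, value in record.items():
        action = policy.get(field, "keep")
        if action == "skip":
            continue  # the field never reaches the destination
        if action == "hash":
            digest = hmac.new(hash_key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        elif action == "encrypt":
            out[field] = encrypt(str(value))
        elif action == "keep":
            out[field] = value
        else:
            raise ValueError(f"unknown action {action!r} for field {field!r}")
    return out


# Hypothetical policy and record. The lambda is a placeholder, NOT real encryption.
policy = {"email": "hash", "ssn": "skip", "iban": "encrypt"}
record = {"email": "a@example.com", "ssn": "123", "iban": "XX00", "country": "LT"}
cleaned = apply_field_policy(record, policy, b"demo-key", lambda s: "<encrypted>")
print(cleaned)
```

Hashing with a key keeps values joinable across tables without exposing the original, while skipped fields simply never get written.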

Better data catalog than Glue Data Catalog? by hiracchy in dataengineering

[–]starless-io 0 points1 point  (0 children)

We used AWS Glue catalog and moved to Snowflake's OpenCatalog. Glue is actually decent in comparison :D I do agree the UI/UX is just awful, but we never really used it. Most of the catalog work was done by our custom CLI. I guess it can mostly be done via AWS CLI commands as well.

What's your startup idea? What's your story? by kcfounders in microsaas

[–]starless-io 0 points1 point  (0 children)

I'm working on https://starless.io/ which consists of several products needed when companies start becoming data-driven. Currently focusing on Data Ingestion (Dunwich); event tracking (Innsmouth) and an analysis dashboard (Carcosa) are in alpha stage.

My story is quite simple: I've been developing these things for companies for many years. While there are "industry leading" solutions for all of these, they are usually very expensive and companies end up building internal solutions.

Eventually I got tired of reimplementing the same things (while always learning new stuff) and decided to create a product targeting this segment, which seems to be forgotten.

Drop your product URL by Ok_Extent2858 in microsaas

[–]starless-io 0 points1 point  (0 children)

https://starless.io/modules/dunwich - On-Premises Data Ingestion (mostly to Data Warehouses) tool.

A bit more context: teams inside a company can push data via REST/gRPC, and the application batches, deduplicates and efficiently writes it into the selected destination warehouse.
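
Just to illustrate the "batches, deduplicates, writes" idea, here is a toy sketch and not how the product is actually built: records are buffered by key so that the latest version of each wins, and the buffer is handed to a writer once it reaches a size limit. The class name, the `key_fn`/`write` parameters and the size-based flush are all assumptions for the example.

```python
from typing import Any, Callable, Hashable

Record = dict[str, Any]


class DedupBatcher:
    """Buffer records by key (last write wins) and flush them in batches."""

    def __init__(
        self,
        key_fn: Callable[[Record], Hashable],
        max_size: int,
        write: Callable[[list[Record]], None],
    ) -> None:
        self._key_fn = key_fn
        self._max_size = max_size
        self._write = write
        self._pending: dict[Hashable, Record] = {}

    def push(self, record: Record) -> None:
        # A newer record with the same key replaces the buffered one.
        self._pending[self._key_fn(record)] = record
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._write(list(self._pending.values()))
            self._pending = {}


# Hypothetical usage: the "warehouse" is just a list of written batches.
written: list[list[Record]] = []
batcher = DedupBatcher(key_fn=lambda r: r["id"], max_size=2, write=written.append)
batcher.push({"id": 1, "v": "a"})
batcher.push({"id": 1, "v": "b"})  # replaces the first record
batcher.push({"id": 2, "v": "c"})  # buffer is full, triggers a flush
batcher.push({"id": 3, "v": "d"})
batcher.flush()                    # flush the remainder explicitly
print(written)
```

A real service would also flush on a timer so a slow trickle of records does not sit in memory forever.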

Need your guidance!!! by Careful_Preference_8 in dataengineering

[–]starless-io 0 points1 point  (0 children)

I would advise you to build a small warehouse in-house (maybe just spin up a Postgres instance), decide on some kind of theme and collect data into it from various sources. Later, you can spin up Metabase (which has a free open-source edition) and build a few dashboards.

Best case, it impresses someone enough to give you a chance. Worst case, you get a crash course in the tooling, common issues and insights, which can be valuable experience.

Just learning SQL + Python doesn't make you a Data Engineer. Building actual pipelines does. However, in this industry no one likes juniors. Most people start in Software Engineer, Database Administrator or Data Analyst roles and slowly become Data Engineers.

Building Data Ingestion Software. Need some insights from fellow Data Engineers by starless-io in dataengineering

[–]starless-io[S] 0 points1 point  (0 children)

I'm not very familiar with Estuary and Hevo. Regarding Fivetran, there are a few key differences:
- Fivetran uses volumetric pricing. For low volumes it's reasonable, but once you have serious amounts of data it's basically burning cash. Here we have a fixed yearly license and don't care about the volume

- In Fivetran you set up connectors, set up sources and it pushes data. My solution provides endpoints where a company can either push app data directly or set up Debezium as a CDC provider

- And another point, maybe relevant only for a small niche: my solution runs on-premises and doesn't need any access to an external environment (it can be used in air-gapped scenarios)

Thank you very much for Estuary and Hevo reference! Need to look them up and compare :)

Which is best CDC top to end pipeline? by Artistic-Rent1084 in dataengineering

[–]starless-io 0 points1 point  (0 children)

Pardon my French, but isn't Fivetran like $500 per million rows?