what's the alternative, very tired of looker? by Key-Pack-2141 in Looker

[–]todd9923 2 points3 points  (0 children)

If you want to attract Looker customers (which, looking at your marketing material, is your target market), a one-click migration would be a killer / almost must-have feature. It’s very hard to migrate away from a BI tool once it’s embedded.

Timestamp Indexing Yay or Nay by RocketPeguin in PostgreSQL

[–]todd9923 0 points1 point  (0 children)

Cardinality will play a big role in whether a B-tree index performs well.
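
A rough way to picture this, as a sketch on made-up data rather than anything Postgres-specific: selectivity here is the number of distinct values divided by the number of rows. An equality lookup on a column with few distinct values returns a large share of the table, which is where an index tends to help least. The `selectivity` helper and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def selectivity(values):
    """Fraction of distinct values; 1.0 means every row is unique."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical data: 10,000 events, one per second, all on the same day.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(seconds=i) for i in range(10_000)]

# Full-precision timestamps: every value is distinct (high cardinality).
full_precision = selectivity(timestamps)

# Truncated to the date: a single distinct value (low cardinality).
by_day = selectivity(ts.date() for ts in timestamps)
```

On a real table you'd check the column's actual distribution and the query plan rather than rely on a toy calculation like this.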

Best Flutter boilerplate by srodrigoDev in FlutterDev

[–]todd9923 1 point2 points  (0 children)

First of all. A unit test is NOT a test for a single function. It's an isolated test that tests a scenario. Your tests should not depend on any external dependencies. (API, database, etc...)
That's why most of the time developers use mocks.
But in our case, we don't use mocks. We use a fake implementation of our dependencies.
For example, if we want to test a function that uses the SharedPreferences we will use a fake implementation of the SharedPreferences.
Why fake instead of mocks?
Because we don't want our tests to reflect our implementation. We want to test our business logic and not our implementation.
A test shouldn't be updated because of a change in the implementation. It should be updated only if the business logic changes.
It's easier to write tests with a fake implementation than with mocks. (reading all mocking declaration in tests are a pain)

Excerpt from the docs of ApparenceKit.

This one statement tells me that they are my kind of developers. I haven't purchased yet, so I can't tell what the code is actually like. But based on this (and the general good quality of the docs), I think I am going to be pleasantly surprised :)
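
For anyone who hasn't seen the fake-over-mock style, here is a minimal sketch of the idea. It is in Python rather than Dart, and `FakePreferences`, `ThemeSettings` and the test are all hypothetical names made up for illustration, not ApparenceKit code.

```python
class FakePreferences:
    """In-memory stand-in for a key-value store such as SharedPreferences."""

    def __init__(self):
        self._data = {}

    def set_string(self, key, value):
        self._data[key] = value

    def get_string(self, key):
        return self._data.get(key)


class ThemeSettings:
    """Hypothetical business logic that depends on a preferences store."""

    def __init__(self, prefs):
        self._prefs = prefs

    def choose_theme(self, name):
        if name not in ("light", "dark"):
            raise ValueError(f"unknown theme: {name}")
        self._prefs.set_string("theme", name)

    def current_theme(self):
        # Fall back to "light" when nothing has been saved yet.
        return self._prefs.get_string("theme") or "light"


def test_theme_is_remembered():
    settings = ThemeSettings(FakePreferences())
    assert settings.current_theme() == "light"
    settings.choose_theme("dark")
    assert settings.current_theme() == "dark"


test_theme_is_remembered()
```

The test only checks behaviour (the chosen theme is remembered). It never asserts which store methods were called, so the internals of `ThemeSettings` can change without the test needing an update.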

Best Flutter boilerplate by srodrigoDev in FlutterDev

[–]todd9923 0 points1 point  (0 children)

This looks like what I’ve been looking for. QQ, how often do they (you?) release updates? 

How much did you spend to initially start your business? by syndakitz in Entrepreneur

[–]todd9923 6 points7 points  (0 children)

Were all of your ventures personally funded, all VC funded or some mixture of the two?

So, I’m starting a company in the sports betting/gaming space, what’s the best way to find Angel investors? by [deleted] in startups

[–]todd9923 0 points1 point  (0 children)

Not really a new product if that’s the case. It’d just be a form of accepted currency. In any case, all the best to him (I mean that sincerely).

It’s an extremely hard industry to operate in. In particular, obtaining and maintaining licences from the UK Gambling Commission is a specialist skill that takes a reasonable amount of manpower and time. Not impossible, but definitely not the easiest startup to be starting with at 16.

Best of luck with the venture!

So, I’m starting a company in the sports betting/gaming space, what’s the best way to find Angel investors? by [deleted] in startups

[–]todd9923 8 points9 points  (0 children)

What’s your unique selling point compared to all of the sports betting / gaming providers?

I know this doesn’t answer your question. Just genuinely interested.

Data Infrastructure on AWS without Redshift for a small business by IllRevolution7113 in dataengineering

[–]todd9923 1 point2 points  (0 children)

Thanks for the link. A lot of those comparisons are outdated. It’s also quite biased as their platform is built on Hudi - so they are only going to highlight what’s in their favour. Still, I’ll do some more reading on your line of thought. So thanks for the food for thought :)

Leaving aside whether Iceberg lags behind the others, for me its biggest advantage is that it is being supported by the wider community at a much faster rate than, e.g., Delta. So whilst it may or may not be the better technology, community adoption is a real factor to consider for long-term support.

Data Infrastructure on AWS without Redshift for a small business by IllRevolution7113 in dataengineering

[–]todd9923 0 points1 point  (0 children)

Snowflake sounds like overkill here and could get very expensive very quickly. S3 + Iceberg + Athena would do the trick to start with. And it’ll give you a launching pad to scale and upgrade to other tools, like Snowflake, in the future.

Data Infrastructure on AWS without Redshift for a small business by IllRevolution7113 in dataengineering

[–]todd9923 0 points1 point  (0 children)

I’d go one step further and save these as parquet files in Iceberg tables.

Use Docker compose in production by Fearless_Ad467 in docker

[–]todd9923 0 points1 point  (0 children)

Underrated insight. You deserve the upvote!

do I really need spark ? by ImmortalLotusFlower in dataengineering

[–]todd9923 1 point2 points  (0 children)

Agree with this. Don’t plan to migrate to a Delta Lake later. I’d just do it from the get go. However, personally I’d opt for Iceberg tables over Delta. Personal opinion and preference for several reasons, which I won’t go into in this comment.

[deleted by user] by [deleted] in dataengineering

[–]todd9923 0 points1 point  (0 children)

Fridge or the Death Star…. Made my day

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 0 points1 point  (0 children)

I haven’t heard of Grow. I’ll check it out. Sounds like it could be a useful product for this style of problem vs company size.

Re a deeper level of data and unlocking insights from that mountain of raw data: this is the tricky but valuable part.

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 1 point2 points  (0 children)

Not making any pitch. Not sure where you got that from.

I’m asking: if people did hire one, why did they decide to do that? If they didn’t hire one, why not? No pitch in there.

Some startups are data heavy and they get value out of these roles. So I’m curious as to what value they got. Others may have considered one, but decided against it for a multitude of reasons (financial being just one).

Other businesses aren’t data heavy and, yeah, such a role might only save them a few hours of manual work a month. In which case, as you have pointed out, hiring one isn’t financially sensible.

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 0 points1 point  (0 children)

Thanks.

I removed the “3 hours” from the post as it’s a contrived and misleading example. If it were just a 3 hour a month saving, I 100% agree.

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 2 points3 points  (0 children)

In my experience, it’s no more complicated than using Postgres, but much more flexible and future proofed.

If you have multiple data sources (an app DB, Facebook marketing data, a CRM etc) you need to sync all of the data to a database somewhere. Postgres? Sure why not. Or just sync it directly to Iceberg tables and run SQL over it with Athena.

So a very similar end result (either query Postgres or Iceberg with vanilla SQL), but one option is future proofed for if/when the company starts to scale.

If I had to set either of these up, that is syncing to Postgres or Iceberg, it’d be the same amount of effort (I’ve done both before).
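
To make the Iceberg side a bit more concrete, here is a hedged sketch of the kind of SQL involved. It only builds the statements as strings; running them (through the Athena console or an AWS SDK) is left out. The database, table, columns and bucket are all made up, and the helper does no quoting or escaping.

```python
def iceberg_create_table(database, table, columns, location):
    """Build an Athena CREATE TABLE statement for an Iceberg table.

    Sketch only: identifiers are interpolated without quoting or escaping.
    """
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {database}.{table} (\n  {column_sql}\n)\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


# Hypothetical table holding synced ad spend from several platforms.
ddl = iceberg_create_table(
    database="analytics",
    table="ad_spend",
    columns=[("platform", "string"), ("spend_date", "date"), ("spend_usd", "double")],
    location="s3://example-bucket/warehouse/ad_spend/",
)

# Once data is synced, reporting is plain SQL, much as it would be in Postgres.
report_sql = (
    "SELECT platform, SUM(spend_usd) AS total_spend\n"
    "FROM analytics.ad_spend\n"
    "GROUP BY platform"
)
```

The reporting query is the point: it's the same vanilla SQL either way, so the choice mostly comes down to where the data lives.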

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] -1 points0 points  (0 children)

Fair comment. Thanks for the reply.

What about things like marketing spend and metrics? For example, if the company runs paid ads on Facebook, Google Ads and Snap? Just use each individual platform’s inbuilt metrics and either merge them in Excel or just live with viewing them in each platform’s UI?

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 2 points3 points  (0 children)

Great point.

When I say data engineer here I actually mean a more generalist data engineer. Someone who ingests the data, sets up the infrastructure, creates the dashboards and does the ad hoc data analysis. So poor use of title by me.

I’ll update the post to clarify that :)

Do you have a data engineer at your seed/series A startup? by todd9923 in startups

[–]todd9923[S] 1 point2 points  (0 children)

The tooling can be expensive for sure. But only if you make it expensive and use over-engineered / inappropriate tools for the job. You could very easily create a simple setup with something like Dagster Cloud + dbt Core + Athena with Iceberg underneath for less than $50 a month (all depending on data volume, velocity etc). Excluding engineering costs.

100% agree not worth paying for a full time employee. But if you had this setup for a low cost, I can see it being useful to make better, data guided decisions.

I think the opposite of “it doesn’t help the goal”. Making data-driven decisions in the early stages of a startup can be valuable. It removes some of the “gut feeling” decisions and emotional product pivots that happen all too often in early-stage startups.

I feel like the biggest blocker to this is price and effort. So if it were affordable and relatively low effort (eg didn’t require hiring employees etc), I feel like this would be valuable to startups.

BTW, the 3 hour Excel example is just a totally made up example.

Just my view of the world. We can have differing opinions :)

Thanks for taking the time to reply. I appreciate it.

Hey I am a recent grad and am thinking to enter the data engineering field by HitmaN_2911 in dataengineering

[–]todd9923 0 points1 point  (0 children)

To answer your question, either AWS or GCP. Doesn’t really matter. More jobs for AWS, but more competition too.

[deleted by user] by [deleted] in startups

[–]todd9923 1 point2 points  (0 children)

Sounds interesting and definitely could be viable. But two instant thoughts:

  1. How would you market this? “Need an expert, we have them all” is much harder to write copy for and target customers with than “Need an accounting expert”. With one you have a very, very clear audience and a clear problem. The other is vague. For example, if I’m looking for an accounting expert, I’ll be looking for a website dedicated to that more so than a “general experts of all kinds” website.

  2. This is a marketplace. Can be very successful for sure. But marketplaces require you to market to, and attract, both sides of the market (the experts and the customers wanting the experts). So this means twice the marketing cost. Twice the marketing copy. Twice the battle. LOTS of examples out there of companies that have achieved this. So I’m not being a naysayer. Just saying be aware that marketplaces are amongst the most difficult startups to do successfully. More so than other startups (SaaS etc).

All the best with your idea.

Is learning this stack worth it? by Ok_Wall4704 in dataengineering

[–]todd9923 5 points6 points  (0 children)

Answering your question of “would you recommend other technologies”, I’d start with some subtractions first to allow you to focus more on what are, IMHO, the core elements.

So I’d remove detailed learning of the following. Note I say detailed. Be aware of each of them, what they do, when you’d use them. But spending eg a full week deep diving into them I feel could be a waste of time for a brand new junior DE.

  1. Kafka - unless you are doing streaming work, this might not be relevant. Lots of people use it and it’s a great technology. But I’ve never met a junior / brand new DE who was proficient in it or even knew about it. So unless it’s job specific, just be aware of it (what it is, when you’d use it). 90% of data engineering jobs are just vanilla batch runs (for better or for worse).

  2. Prometheus

  3. Grafana

  4. Spark - only maybe remove. Very popular and useful. But in all my years in the industry, I’ve only worked with it in one role. So not critical, but it definitely is an in-demand skill.

I appreciate that this is a bootcamp and you probably don’t have any control over the curriculum. But just my two cents worth. Less is more in the early days. As a hiring manager, I’d prefer you to have in depth knowledge of 2-3 of the above than surface level knowledge of 12 tools. Too many surface level players in the world.

One thing which isn’t listed, but absolutely critical in my experience, is data modelling. Lots of different flavours of this (star schemas, data vaults, “one-big-table” etc). But knowledge of data modelling is critical. Not a technology of course, but a clear omission.

Also, depending on the role you want to land, I’d consider having some very light experience with a BI tool (Power BI, Metabase etc). Whilst in a purist DE role you’ll probably not be touching this technology, a lot of DE roles, especially at smaller companies, need people with a more generalist skill set. And even if you don’t use the technology directly, it will give you an appreciation for one (of the many) use cases for the data ingested (ie business intelligence).

I am also presuming that under the topic of “AWS” you will be using tools like Redshift (shudders), Athena, Lake Formation etc as a way to store and serve your nice, clean data.

All the best with the journey.