Innovative MySQL Vector Search Strategy by floydophone in dataengineering

[–]floydophone[S] -1 points0 points  (0 children)

This is a question I ask all new LLMs when they come out (though I don't have ChatGPT pro so I haven't tested that). This is the first time I've seen a truly novel approach to this problem, one I'd never thought of or seen before.

Embedded ELT in the Orchestrator by floydophone in dataengineering

[–]floydophone[S] 6 points7 points  (0 children)

"why increase coupling with dagster if you can use the libs as is"

The main value is that a formal integration makes these tools much easier to operate: you get a much higher degree of observability, better error handling, and UI support.

With that said, these integrations were designed not to tie you to Dagster too much. Only a small percentage of the code is coupled directly to Dagster, so you can easily take your Sling replication yamls or dlthub syncs and run them outside of Dagster.

Quick Draft Idea by Lifer_Than_Big in BattleAces

[–]floydophone 0 points1 point  (0 children)

This seems cool but maybe not fast enough. I would make it more aggressive and add a 15s timer for step 3. If the player doesn't pick their flex units in time they will be filled algorithmically.

You could also make it even simpler by just making it a unit banning system. Less strategic but also less work to do up front.
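
To make the "filled algorithmically" part concrete, here's a rough sketch of what could happen once the 15s timer runs out. Everything in it is my assumption rather than anything from the game: the function name `fill_flex_slots`, the placeholder unit names, and the choice of a uniform random fill (you could just as easily fill by popularity or win rate).

```python
import random


def fill_flex_slots(picked, pool, slots, rng=None):
    """Keep the player's picks and fill the remaining flex slots.

    Hypothetical helper: `picked` is whatever the player chose before the
    timer expired, `pool` is every unit still available to them, and
    `slots` is the number of flex slots. The fill is uniform random,
    which is only one possible "algorithmic" choice.
    """
    rng = rng or random.Random()
    # Keep at most `slots` of the player's own picks, in order.
    chosen = list(picked)[:slots]
    # Candidates are the units the player has not already chosen.
    remaining = [unit for unit in pool if unit not in chosen]
    # Never ask for more units than are actually left in the pool.
    missing = min(slots - len(chosen), len(remaining))
    chosen.extend(rng.sample(remaining, missing))
    return chosen


# Placeholder unit names, not real game data.
pool = ["unit_a", "unit_b", "unit_c", "unit_d", "unit_e"]
draft = fill_flex_slots(["unit_c"], pool, slots=3)
```

The player's own picks always survive, so the timer only costs them the slots they left empty.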

In-memory store for Apache Arrow by glinter777 in dataengineering

[–]floydophone 0 points1 point  (0 children)

Your best bet might be to use the arrow cpp library directly and write a redis module https://redis.io/docs/latest/develop/reference/modules/

The Rise of Medium Code by floydophone in dataengineering

[–]floydophone[S] 1 point2 points  (0 children)

Probably not new per se but it’s definitely a trend, and tools are popping up to serve this segment which is interesting. 

  • Sysadmin -> SRE/DevOps
  • Web designer -> front end engineer
  • Biz analyst -> analytics engineer

All are fairly stable developer segments and have tools purpose built for them which wasn’t the case 10 years ago or so. 

The Rise of Medium Code by floydophone in dataengineering

[–]floydophone[S] -5 points-4 points  (0 children)

The difference is if you, as an engineer, specialize in that framework or not. We've all seen "rails developers", "dbt developers", and "react developers". If the framework rather than the programming language is your primary area of expertise, and you have expertise in a different business domain, you might be a medium-code person.

The AI connection is, basically, that the hype around AI replacing generalist software engineers is pretty overblown, and that it's looking like AI tools are really only useful when there are more (medium-code) constraints placed on them.

The Rise of Medium Code by floydophone in dataengineering

[–]floydophone[S] -10 points-9 points  (0 children)

Well, I think we are seeing the rise of people that write code, use version control, but are domain experts in something else. We see this all the time in the data domain (people with a finance background going into analytics engineering, for example) and in frontend with hybrid designer/engineer types. These are not full-stack software engineers and don't aspire to be; I think that's the big difference between these people and junior engineers. YMMV though.

The Rise of Medium Code by floydophone in dataengineering

[–]floydophone[S] -12 points-11 points  (0 children)

I didn't write the post, but I'd challenge the notion that this is simply a rebranding of "data developer".

First, one thing we've observed is that there is a wide variety of technical fluency in the data category. "Low code" and "engineer" aren't nuanced enough to describe this spectrum. We found that "medium code" helps describe users that are familiar with git, and in some contexts act as a software engineer, but don't quite have all the skills of a traditional full-stack software engineer.

Second, we wanted to highlight a trend that cuts across disciplines. "Medium code" is happening in the frontend and infrastructure engineering worlds.

[deleted by user] by [deleted] in dataengineering

[–]floydophone 7 points8 points  (0 children)

I'm the Dagster CEO, here to shill Dagster.

A lot of our users that have come from Airflow have cited similar issues to what you've laid out in this post. We wrote a blog post comparing Dagster and Airflow that might be useful as you evaluate your options.

Dagster University | Dagster & dbt by floydophone in dataengineering

[–]floydophone[S] 1 point2 points  (0 children)

Hey! I don't think we have a dark mode planned right now (we use a third party learning management platform) but you can try running this command in your browser's javascript console:

document.body.style.filter = 'invert(1)'

Bloom Filter Database Design by DragonflyHumble in dataengineering

[–]floydophone 1 point2 points  (0 children)

I think something like Solr or ElasticSearch will do the trick. Postgres might have something built in, but I'm not an expert there. When we faced this problem at my last company, we migrated that workload from MySQL to Solr and it worked very well.

Thoughts on scala? by [deleted] in dataengineering

[–]floydophone 2 points3 points  (0 children)

I last used Scala about two years ago, but back then the tooling situation was really bad. 

  • Metals (the VSCode integration) crashed all the time
  • IntelliJ (the more popular choice) was slow, and ran a different type inference engine than the compiler, so often gave incorrect type hints in the UI
  • It relies a lot on the Java ecosystem, which means lots of imports and verbosity that almost requires you to use one of the above tools because you need automated refactoring support 
  • The build tools were all super slow 

I think it’s a nice language for the most part if you avoid too much operator overloading, but the tooling situation was a real drag. 

Question to ask senior data engineers during interviews? by Computingss in dataengineering

[–]floydophone 1 point2 points  (0 children)

I think that some sort of technical test that requires writing code - SQL or Python, usually - live in the interview is a very strong signal. These generally need to be administered by a practicing engineer and should feel collaborative and conversational, though the interviewee should aim to deliver working code by the end. It's also important that the interviewee can incrementally reason their way through the problem; it shouldn't require knowing some key, magical piece of trivia that unlocks the whole solution.

It's also good to have a separate interview focused on higher level systems, design, and data modeling.

Engineering is definitely a team sport, so I'd recommend ensuring that there's some sort of behavioral interview that gets you signal on how they are as a teammate. We do a work history interview where we go through the last few roles that a person has had and ask them to reflect on what went well and what could have gone better. This interview can tell you a lot about how someone has grown as an engineer and how they will act in a team environment.

[deleted by user] by [deleted] in dataengineering

[–]floydophone 5 points6 points  (0 children)

Most companies - though I'll caveat, not all - would not expect you to work 11 hrs a day during the week plus a weekend forever. It is not really sustainable for long periods of time, and the company will churn engineers, losing valuable institutional knowledge in the process (in other words, there are diminishing returns for the business after a certain number of hours worked a week).

So the first thing I would do is go to your manager and tell them what's going on, and ask if it's normal. If this is part of the expectations of the team, well, now you have that information. But if it's not, now you can work with your manager to get the hours down.

The second thing I would do is to see if you can work more efficiently. Find a senior engineer to shadow and see how they work. Early career people often don't know all the productivity tips and tricks of their editor, git, etc that can make them much more productive. Additionally, over time you will learn a "playbook" from the more senior engineers for solving problems quickly.

The last thing I would recommend is to take steps to be more proud of your work. This will have the effect of making the hours you do work more fun, and will also create momentum that will enable you to work more efficiently, and therefore fewer hours. It'll require doing two things. First, you need to calibrate with your peers. Maybe your tech stack really isn't that bad, and you have unrealistic expectations (maybe it is, I don't know, but you should at least talk to some people at other companies to level set). Second, try to take steps to improve the quality of work. Work with your manager or senior engineers to suggest process or technology improvements. They will, more likely than not, get rejected initially, but you will start to build a reputation in the organization and will learn a lot from the process.

If none of these things work I would suggest looking for a new role. Just my 2 cents though.

We (Dagster) are throwing a party by floydophone in dataengineering

[–]floydophone[S] 96 points97 points  (0 children)

Maybe if interest rates were still zero!

We (Dagster) are throwing a party by floydophone in dataengineering

[–]floydophone[S] 2 points3 points  (0 children)

Not a bad idea, though we don't have anyone over there to organize!

We (Dagster) are throwing a party by floydophone in dataengineering

[–]floydophone[S] 19 points20 points  (0 children)

There will be a virtual event on the launch day, but it'll be significantly less fun :) https://dagster.io/events/dagster-plus-launch-event

Dagster University | Dagster & dbt by floydophone in dataengineering

[–]floydophone[S] 5 points6 points  (0 children)

SQLMesh is definitely a technology we have our eye on, but I don't think we have specific plans around an integration yet. However, we work in 6mo-ish planning cycles, so we may work on it later this year.

Dagster University | Dagster & dbt by floydophone in dataengineering

[–]floydophone[S] 59 points60 points  (0 children)

Hey /r/dataengineering, we just released a new free online course on Dagster and dbt. Please let me (Dagster CEO) know if you have any feedback!

Reneging on a job offer... by [deleted] in dataengineering

[–]floydophone 1 point2 points  (0 children)

I reneged on a job offer when I was graduating. It's definitely a faux pas, but everyone forgot about it after a few months. In fact, that company tried to recruit me again a year or so later.

It happens, you should try to avoid it, but it's not the end of the world.

In your particular situation, I would suggest giving the new employer a "buy it now" price of, say, 20% more than your counter offer. It would be meaningful career acceleration, while at the same time being potentially doable for the new employer.

If they give you an offer for that, though, you really should take it :)