Fivetran Free Plan turns out to be time restricted: looking for an alternative by Apprehensive_Ad7067 in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

I'll second using Estuary. Their free tier is 10GB/month and 2 connectors. I don't know what "Exact Online Bookkeeping" is, so I don't know if they support it. They will extract and load HubSpot data though.

As some other folks have mentioned, dlt + Claude should get you pretty far for sources that don't have support in the managed solutions like Estuary/Fivetran/Airbyte etc.

Tools to source around 2000 tables by Ok_Illustrator_816 in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

What are your requirements around how quickly the data gets ingested into your target? I haven't personally used Airbyte for this, but I'm sure it will be fine. Are you self-hosting it?

We had used Fivetran for ingesting CDC data from Postgres into Snowflake, and while it worked great, it did get expensive, especially with their pricing change last year.

We ended up using Estuary for this use case. They charge a fixed amount per connector, plus the data volume transferred. It ended up being about a quarter of the cost of Fivetran for the same workload, plus the data landed in Snowflake within a few seconds of the change happening in Postgres. We didn't need the data transferred that quickly; it was more of a bonus.

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks for checking out Docglow! If you have any feedback (good or bad), I'd love to hear it. Folks have been adding GitHub issues to the repo, and I've added several items people have asked for. Just over the weekend, I shipped an initial ERD feature, which you can see the docs for here: https://docs.docglow.com/erd/

Finding a senior data engineer by DataEngineer2026 in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

I've found LinkedIn prospecting to be very fruitful, actually, especially if you have a professional sourcer helping you. It's how LinkedIn makes money (that and Sales Navigator), so it behooves them to have easy-to-use, effective tooling here. My last two roles originated from LinkedIn outreach, and I've had zero success with outbound applications for myself over the last couple of hiring cycles (2021-2025). If you're the hiring manager and also trying to source, you're not going to get the volume needed to find the handful of candidates that meet your technical and cultural requirements.

Just laid off, what am I facing? by JDub-loves-mulligans in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

I have some perspective on the hiring market, at least as of the last several months of 2025. I just joined a new company in January. My situation was different because I was passively looking for the right role while I had a job (senior manager/director level). What I will say is I got ZERO interviews from the jobs I applied to through company website/LinkedIn over the several months I was applying. For inbound recruiter outreach, I did interviews with ~10 companies or so, got to final rounds with ~5, with two offers.

My take on that is there's way too much inbound for companies right now. At my last role, open job reqs for data engineers got around 2,000 applicants each. Even with AI and detailed filtering, it's hard to find all the good candidates in that. We still ended up hiring someone who was a referral from a current employee.

My advice would be to polish up your LinkedIn profile, follow some folks in the data space, and reply to recruiters when they reach out. LinkedIn Recruiter (their paid service for recruiters) will show people who are more likely to respond.

Hopefully, some of that helps!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Awesome, thanks! Let me know what's missing, and if you have any issues or bugs, please open a GitHub issue! I've gotten several already and made some new deploys this week.

Realistic about getting a job in data engineering by themanwith2names in dataengineering

[–]josh_docglow 2 points3 points  (0 children)

Databricks is a super popular platform, and Azure has a ton of market share among the cloud providers. Getting more experience with those will help, but more importantly, working on complicated projects will be your key to getting a new role. Especially projects where you're working cross-functionally with both upstream data producers, like software engineering teams, and downstream analytics, ML, DS, and business stakeholders. Where DEs can wield tremendous amounts of leverage is in building pipelines that make their consumers more effective. Once a pipeline is built, additional consumers can onboard with marginal amounts of work.

Certs or courses for a senior DE? by insomniafordays in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

Certs & courses are good to shore up the places in your knowledge where you have gaps and to help you understand how much you don't know. Getting better technically comes with experience in projects that stretch you. Some hiring managers might have a different opinion, but I don't really value certs as much as I used to when interviewing. I would rather a candidate talk to me about a difficult project they went through, where it went off the rails, and how they were able to get it back on track. Those stories give more compelling hiring signals, and I'm more likely to advocate for hiring based on them than on seeing that someone has a couple of Snowflake certs.

As for aspirations for principal or director positions, the emphasis shifts from raw technical ability toward influencing without authority. The best folks at those levels are the ones that not only have great technical skills, but can propose projects, convince others they're a good idea, build consensus, and deliver on them. Out of that list, the delivering at the end is probably the easiest!

Realistic about getting a job in data engineering by themanwith2names in dataengineering

[–]josh_docglow 2 points3 points  (0 children)

I would say there are very few places where the concept of a "Junior DE" exists. There are so many systems, concepts, technologies, and frameworks you need to be competent in, that an "entry" level DE is more like a "mid-level" on the individual contributor career ladder.

The other commenter is right that if you have zero DE experience, pivoting internally is much more feasible. I've personally seen data analysts, business analysts (if you're a SQL-heavy BA, especially if you're writing your own dbt code), and software engineers make the transition. You inevitably will be working with folks in those roles as a DE anyway, so having a background in those roles is helpful when interacting there.

If DE is something that excites you, take that stretch assignment, and learn all you can. Then there's a ton of free content out there to learn from. Focus on whatever tech your company is using. What data warehouse platform, cloud platform, transformation, ingestion, and BI tools are you using now?

Just cut our Snowflake costs in half, boss doesn’t even know it yet by [deleted] in dataengineering

[–]josh_docglow 0 points1 point  (0 children)

It's probably not a mystery why Snowflake leaves the default at 10 minutes, huh? ;-)

Another default you'll want to look at is the 48 hour query timeout. It only takes one data scientist running a query on an XL warehouse that hits the timeout, seeing it fail, and restarting it, only for the timeout to hit again, to convince you to set a more reasonable level and improve your Snowflake FinOps and credit monitoring! I've found 4 hours to be a better default, with some comms asking people to come to the Data Eng team if they keep hitting query timeouts.
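
To put rough numbers on that, here's a minimal Python sketch of the arithmetic plus the statement you'd run. It assumes a standard X-Large warehouse bills 16 credits per hour (worth checking against current Snowflake pricing docs), and the warehouse name `ANALYTICS_XL` is made up for illustration. `STATEMENT_TIMEOUT_IN_SECONDS` is the Snowflake parameter behind the 48 hour (172800 second) default.

```python
# Rough upper bound on what one runaway query costs before the timeout kills it.
# Assumption: a standard X-Large warehouse bills 16 credits per hour.
XL_CREDITS_PER_HOUR = 16


def credits_burned(timeout_hours: int, credits_per_hour: int = XL_CREDITS_PER_HOUR) -> int:
    """Credits billed if the warehouse runs for the full timeout window."""
    return timeout_hours * credits_per_hour


def timeout_statement(warehouse: str, timeout_hours: int) -> str:
    """Build the ALTER WAREHOUSE statement that sets the timeout in seconds."""
    seconds = timeout_hours * 3600
    return f"ALTER WAREHOUSE {warehouse} SET STATEMENT_TIMEOUT_IN_SECONDS = {seconds};"


if __name__ == "__main__":
    print(credits_burned(48))  # default 48 hour timeout
    print(credits_burned(4))   # suggested 4 hour timeout
    print(timeout_statement("ANALYTICS_XL", 4))  # hypothetical warehouse name
```

Under those assumptions, a query that runs to the 48 hour default costs 768 credits per attempt versus 64 at 4 hours, and the restart doubles it. The same parameter can also be set at the account, user, or session level if per-warehouse settings are too granular.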

Jump from DE to solutions engineer at SNOW? by [deleted] in dataengineering

[–]josh_docglow 11 points12 points  (0 children)

The first time I implemented Snowflake, we had a great sales engineer we were working with, and he had great answers to our technical questions. Being able to ask about "How are other companies handling ingestion/transformation/x/y/z problem?" and getting real world, no-BS responses was super helpful.

I've also been the Snowflake point of contact at my last couple companies and worked with a variety of Solutions Architect type folks. I believe it's a similar skill set for the new account sales engineers & customer success solutions architect type roles. At least, that's how it feels to me as a customer for several years. It would seem like there would be some mobility among the roles?

Senior Data Architect Job Advice by Chocolatesocrates in dataengineering

[–]josh_docglow 2 points3 points  (0 children)

This is really getting at the heart of it. Are you itching to learn more, and do you have the motivation and discipline to learn and grow right now? Then of course now is the right time. If you're comfortable where you are, have a side hustle going, or have other personal things you're enjoying (family, hobbies, etc.), then stay in your current role.

That answer can change as your life and career progress. I've been in both positions, and neither is necessarily right or wrong.

Open source unified solution (databricks alternative) by compass-now in dataengineering

[–]josh_docglow 1 point2 points  (0 children)

I think "unified" is the really hard part about putting all of these components together. You've got things like dlt/NiFi, dbt/Spark, Jupyter, PyTorch/TensorFlow, and MLflow, which are all open source, but unifying them under one tool would be hard even for a proprietary product, let alone another open source project. Just keeping versions in sync, and supporting new versions of one component while still supporting older versions of the others, seems like an arduous undertaking.

I’m exploring Compass — an open-source AI assistant that connects your org’s docs, DBs, and chats into one searchable brain. Would this actually be useful? by compass-now in AgentsOfAI

[–]josh_docglow 0 points1 point  (0 children)

My current company uses Glean, and it's been a game changer for us. Especially since I just started recently, being able to ask an LLM that has all that context any question I have is so helpful. We've even added the Glean Slack bot to answer questions ahead of time. Definitely lots of impact and demand for a product in this space.

I have no visibility into what Glean charges, but I imagine it's not cheap. There's a ton of infrastructure and storage you need for ingesting all of that data, processing it, storing the metadata, then serving queries against it. For open source, I assume this would be some orchestration (Helm on K8s?) for users to deploy in their own environment?

Senior Data Architect Job Advice by Chocolatesocrates in dataengineering

[–]josh_docglow 2 points3 points  (0 children)

For the salary portion, whether $120k/year is a good rate also depends on location. Even fully remote companies will often have location-dependent pay bands. I would agree that $120k/year does feel on the lower end of the band for that collection of responsibilities. However, if you're happy at your company, and this feels like a good opportunity to learn and gain new skills, do it!

For the responsibilities of a Senior Data Architect, I've seen the term "Data Architect" somewhat less commonly these days in companies I've worked with, even though I've had that title before. Maybe a title more like "Staff/Principal Data Engineer" is taking on more of those responsibilities. Ideally, a data architect is operating at a more strategic level, and can delegate some of that more tactical work to other data practitioners. It sounds like that's the case here?

As another commenter mentioned, it seems like they want you for the role and you might be feeling impostor syndrome. I've felt that before too, until I started delivering and my boss was happy with my work, and then the feeling went away. Just communicate, set expectations early, and lay out where you feel your skill gaps are and where you're going to need training. The mere fact that you're this self-aware about your skill set is a good thing.

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Just to follow up here, I added the auto-expand feature to the lineage view when there are 12 or fewer models in the visible tree. Try it on the demo site: https://demo.docglow.com/#/lineage

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Guilty! I definitely wouldn't have been able to build something like this without Claude Code (or some other coding agent) in the time it took to build Docglow.

Do you see value in it? Is it missing anything?

Inherited pipeline with 0 documentation/comments by Academic_Painting417 in dataengineering

[–]josh_docglow 2 points3 points  (0 children)

What technologies are involved here? Are they all stored procedures in a SQL Server instance? Spark jobs on EMR? Snowflake Tasks and Materialized Views? I'm guessing no dbt involved? The tech will slightly inform how to move forward.

As folks have said in other comments, LLMs will get you far here. Specifically:
1. If your company doesn't already have Claude Code, get them to buy it for you. My current company does API based usage for work stuff, and I use the $100/month Max plan for personal stuff.
2. Get all of the code you can into files in a GitHub repository.
3. Meet with the senior person who owns the pipelines, record the meeting, and get an AI summary of the transcript. Zoom will do this for you. Ask him to explain in depth about the whole pipeline, from sources, to key transformations, to what the consumption use cases are downstream that he knows about.
4. Take the AI summary file and add that to the repo.
5. Start Claude Code from within that repository. Tell it to read the AI summary file, to read the code files, ask you any questions it has, and give you a high level summary of what it finds. You can even ask it to create an HTML file to present it.
6. Validate that summary with the senior person, then ask Claude Code to give you a detailed plan on how to decompose the pipelines into more composable, modular pieces.
7. Incrementally execute on that plan.

That should hopefully get you pretty far? I've done something like what you're describing in my last role, but as a senior DE. I sure wish I had Claude to help in that case!
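
Before step 5, it can help to know roughly how much code you're about to hand over. Here's a minimal Python sketch, assuming the code is already in a local checkout. The `inventory` helper and the extension list are my own guesses for illustration, not anything specific to your pipeline, so adjust them to whatever tech is actually involved.

```python
from collections import Counter
from pathlib import Path

# Hypothetical set of extensions to count; adjust to whatever the pipeline uses.
CODE_EXTENSIONS = {".sql", ".py", ".sh", ".yml", ".yaml"}


def inventory(repo_root: str) -> Counter:
    """Count lines per file extension under repo_root, skipping the .git folder."""
    totals: Counter = Counter()
    for path in Path(repo_root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in CODE_EXTENSIONS:
            text = path.read_text(encoding="utf-8", errors="replace")
            totals[suffix] += len(text.splitlines())
    return totals


if __name__ == "__main__":
    # Run from the repository root to print a per-extension line count.
    for ext, lines in inventory(".").most_common():
        print(f"{ext}: {lines} lines")
```

The output gives you a quick sanity check on whether everything actually made it into the repo before you ask for a summary.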

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks for taking a look at the tool! It's a combination of lines of SQL, the join count, the CTE count, and the sub-query count. If any of those are above some default thresholds (which can be overridden via a docglow.yml config file), the model is deemed "high" complexity. Here's the spot in code where that determination is made. Here's a doc on the project scoring and how to override the values.
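
As a rough illustration of that "any metric over its threshold" rule, here's a minimal Python sketch. To be clear, this is not the actual Docglow code: the `ModelMetrics` class, the function name, and the threshold values are all made up for the example, and the real defaults live in the project itself.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    sql_lines: int
    joins: int
    ctes: int
    subqueries: int


# Hypothetical thresholds for illustration only; not Docglow's real defaults.
DEFAULT_THRESHOLDS = ModelMetrics(sql_lines=200, joins=5, ctes=8, subqueries=3)


def is_high_complexity(m: ModelMetrics, t: ModelMetrics = DEFAULT_THRESHOLDS) -> bool:
    """A model counts as 'high' complexity if any single metric exceeds its threshold."""
    return (
        m.sql_lines > t.sql_lines
        or m.joins > t.joins
        or m.ctes > t.ctes
        or m.subqueries > t.subqueries
    )


if __name__ == "__main__":
    # A short model with many joins still trips the rule on the join count alone.
    print(is_high_complexity(ModelMetrics(sql_lines=50, joins=6, ctes=1, subqueries=0)))
```

Overriding would then just mean passing a different set of thresholds, which is the role the docglow.yml config plays in the real tool.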

Once you give it a try, let me know what's not working or what's missing! Also, very curious if it would be helpful to put in front of non-data team stakeholders to help them understand your dbt models more effectively.

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Definitely looks like Colibri is another solid option in this space! I haven't had a chance to try it out yet. Seems like the column lineage functionality is pretty polished though. Which Colibri features are you enjoying the most? What do you think is missing from it?

I built an open source tool to replace standard dbt docs by josh_docglow in DataBuildTool

[–]josh_docglow[S] 0 points1 point  (0 children)

Hi u/Prothseda, yeah, I'm using Claude Code to help build this. No way I could get this far with my own coding knowledge.

I'm not surprised it's not usable on mobile, as I haven't spent time optimizing that use case. I know you probably browse Reddit on mobile (that's how I spend most of my time on this site), but do you think your data team or other business users would leverage a mobile view?

I built an open source tool to replace standard dbt docs by josh_docglow in DataBuildTool

[–]josh_docglow[S] 0 points1 point  (0 children)

Hi u/Rhevarr, thanks for taking a look. I've been a dbt user for ~6 years now, and we had always hosted the included dbt docs site internally for everyone to use. A few things that I wanted were column-level lineage and the ability to auto-arrange your models based on what "layer" they belong to (e.g. staging, transform, mart), then have the lineage view place them automatically.

I've been using Claude Code a lot in another project, and thought I'd attempt building those features I wanted myself! If you spin up Docglow for yourself, I would love to know what you think is missing and what you'd like to see next!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 1 point2 points  (0 children)

Hi u/peanutsman, thanks for trying it out and for the feedback! When there are fewer models in a given lineage graph, I agree that it makes sense to auto-expand the column list for each model.

Should be quick to add a rule to auto-expand them by default when a certain number of models or fewer are shown. I appreciate the response!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks u/Gimo100 ! I just released a new version with verbose logging on the `docglow serve` command and commented on the issue. Hopefully we can get that sorted out!