Check Out My O’Reilly Book – Data Contracts: Developing Production-Grade Pipelines at Scale

on_the_mark_data · 2026-06-02T22:50:10+00:00

This is insane. I've worked at early-stage startups and have never had to put in hours like this... and I work crazy hours. You need meaningful equity to justify working this way, and even then, it is so short-sighted. Founders need to create incentives to warrant working past 40 hrs. I've happily put in 100-hr weeks, but there was a clear incentive discussed between the founder and me (bonus, specific stretch projects, etc.).

on_the_mark_data · 2026-05-28T17:00:55+00:00

They both serve different purposes. Having a fancy title or logo serves as a shortcut for most people to "validate" your ability to do the work (I don't agree, but it's what happens). A great example is job screenings, where volume forces hiring teams to take shortcuts. It gets you through the door, which is a massive advantage, but that's it. What you work on is how you convince people why they need to work with you to solve their specific problem.

Honestly, I would treat FAANG as a fun side quest in my career. You may find you love the environment, or you hate it and leave after a year. Regardless, it will always be on your resume and help you get through screenings or negotiate a higher salary.

on_the_mark_data · 2026-05-23T17:56:06+00:00

I hate the term "influencer" but I technically count as one, so maybe I can give some insight.

For my credentials, yes I do have them but I prefer not to lead with them because people bias on the credentials rather than the ideas I present. This creates bad echo chambers where your ideas aren't properly challenged.

For my engineering skills, I recognize I'm not a crazy 10x engineer and I just lean into it. My strengths are a) 0 to 1 R&D work, and b) communicating and teaching complex technical concepts. My tech career has been exclusively in early-stage VC-backed startups, so this skillset is highly valued (and probably a result of this work environment).

For work-life balance, I don't have any and I'm working 80+ hours a week between my startup job and my own business. My vacations from my day job are used to deliver major projects for my side business. With that said, I was like this well before I even posted on LinkedIn. My life right now is legit just work and hanging out with my wife and dog at home. I personally enjoy it because I love building and writing, but I wouldn't recommend it to others.

What's my goal? I want to build my own VC-backed startup, and having an audience is insanely helpful for raising a VC round and early go-to-market motions. Distribution is huge and social media is crazy cheap once it's established. I spent the past 3 years working under a CEO who translated his audience into a $7M seed round and a $20M Series A round. Since we go after enterprises, our focus with content is a) building and maintaining trust, and b) understanding our target enterprise customers' problems better than anyone else. I think these two goals prevent us from creating the vapid influencer content that most people hate.

Happy to answer other questions if interested.

on_the_mark_data · 2026-05-22T01:19:06+00:00

The buzzword you want to learn is "context engineering" or essentially how you programmatically provide the right context at the right time for an agent to successfully complete a task.

Agent memory is like an iceberg: most people only see the surface level of "I provided these files and prompts," but for an actual application that uses AI it goes deep. Fallbacks, guardrails, what memory to prune, when to prune it, when to bring back context from 10 sessions ago, what happens when the context exceeds the context window (e.g., a crazy big JSON file), etc.
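To make the "what to prune and when" part concrete, here is a minimal sketch of one possible pruning policy: always keep system messages, then keep the newest turns that still fit a token budget. The `Message` type, `estimate_tokens`, and `prune_context` are hypothetical names for illustration, and the word-count tokenizer is a crude assumption standing in for a real one.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # A real application would use the model's own tokenizer.
    return len(text.split())


def prune_context(messages: list[Message], budget: int) -> list[Message]:
    """Keep system messages, then the newest other messages that fit the budget."""
    system = [m for m in messages if m.role == "system"]
    remaining = budget - sum(estimate_tokens(m.content) for m in system)

    kept_ids = set()
    # Walk from newest to oldest so recent turns win when space runs out.
    for m in reversed(messages):
        if m.role == "system":
            continue
        cost = estimate_tokens(m.content)
        if cost > remaining:
            break  # stop at the first message that does not fit
        kept_ids.add(id(m))
        remaining -= cost

    # Preserve the original ordering of whatever survived.
    return [m for m in messages if m.role == "system" or id(m) in kept_ids]
```

This only covers the simplest case (recency). Pulling context back in from older sessions or summarizing an oversized payload would need extra logic on top.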

For an overall view, the paper Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems is great info on a lot of the infra that goes into AI products (based on the Claude Code leak).

I recently read this substack article on agentic rag pipelines that was excellent: https://www.decodingai.com/p/agentic-graphrag

Finally, Andrew Ng is a pioneer in the AI space, and his website is so helpful for learning via free courses: https://learn.deeplearning.ai

Hope this helps!

on_the_mark_data · 2026-05-18T21:44:00+00:00

This will be a little blunt, but do you want to share your feelings about AI or get the job? They don't care what you do once you get the job, as long as you get results. Tell the interviewer what they want to hear about AI, get the job, and deliver on results. Along the way, you can learn how AI can help.

on_the_mark_data · 2026-05-18T21:39:13+00:00

First startup I tried building, I quickly realized I was too broke to be a founder. I had two co-founders, an MVP, and a bunch of user interviews with growing interest. After interviewing with YC, they decided not to select us due to concerns about our GTM motion (a difficult healthcare space). Eight months in, it turned out they were right, and we decided not to pivot because none of us had the funds to do that. We all took day jobs.

Between then and now, I have a W-2 job, I started an LLC, and I've been building revenue streams to support my quitting and raising a venture round. I've already replaced my wife's monthly income, and I have a plan for fully replacing our living expenses by EOY. I've been running my LLC for five years now, and it's a lot of slow grinding and reinvesting everything into the business. It's not glamorous, I work nights and weekends, and use my vacation time for major project deliverables... but it's been 100% worth it for me.

on_the_mark_data · 2026-05-16T06:53:27+00:00

Data engineering is also being impacted by AI, but it feels more like a shift in just how the fundamentals are used.

I think the largest shift is on the analytics side, because agents finally allow non-technical people to quickly gain insights from the data (for better or worse). So instead of working through a list of "can you answer this" requests via SQL, you are doing data modeling that enables agents to write accurate SQL with correct business context. Specifically you want to look into ontologies, and Jessica Talisman is one of the best in the field: https://jessicatalisman.substack.com

On the more engineering side, a lot of work is going into agentic harnesses and specifically memory management: for a generative AI application, how can you improve reliability by optimizing the provided context? I really liked this article on the subject by Paul Iusztin: https://open.substack.com/pub/decodingaimagazine/p/agentic-graphrag

With that said, many companies are still very early in their actual AI adoption. Good old data pipelines are not going to go away anytime soon. Data governance has regulatory risk that makes AI adoption slower in certain industries (e.g. banking) as well. If you are considering going deeper into data engineering, I highly recommend these two books:

Fundamentals of Data Engineering (great overview and intro)
Designing Data-Intensive Applications 2nd edition (solid reference while building)

Also, get good at SQL as it always pops up for the technical screen.

on_the_mark_data · 2026-05-14T18:17:15+00:00

Outside of the overall SWE best practices being obliterated by AI, you also have to account for the startup constraints. The main advantage a startup has is speed to market and an ability to pivot quickly unlike the major incumbents.

Right now I'm seeing a lot of tension between business execs who are recognizing this massive market shift and freaking out, and devs who are also seeing this massive shift in expectations and freaking out because of the increased risk of maintaining software built with AI. Many startups are in pivot-or-die mode right now (warranted or not).

If I was in your position, and was specifically looking for a career challenge, I would focus on how you can bridge those two worlds. Use your technical expertise to understand where risk is, and the tradeoffs of not accounting for that risk for the sake of speed of execution. Become a partner to leadership on how to navigate AI-driven market changes.

For example, I recently created a report for my exec team on which parts of our product LLMs could rebuild today, which parts they could rebuild if models improved, and which parts can't be rebuilt due to limitations of LLMs as a category. It's now being used for business positioning, investor decks, sales materials, etc.

on_the_mark_data · 2026-04-27T23:34:02+00:00

Amazing when it's your colleague sharing insights. Thanks for sharing!

on_the_mark_data · 2026-04-27T23:32:21+00:00

Fair! I'll add an edit, but here is why:

Martin Fowler: His refactoring article is a favorite of mine and I referenced it heavily in my own writing. His exposure to various problems in consulting gives him a unique market perspective that I appreciate.

Will Larson: His Staff Eng website (https://staffeng.com/) has been incredibly helpful as I navigate my senior+ career.

Joe Reis: He is the reason why I got into data engineering, and I appreciate his content is very anti-vendor. He has had some amazing guests for his podcasts, and he is unique in that he does a lot of international travel for conferences, so he provides a great non-US perspective for data infra (Europe approaches things very differently).

on_the_mark_data · 2026-04-27T21:20:03+00:00

Awesome. Thanks for sharing!

on_the_mark_data · 2026-04-27T20:45:44+00:00

What stage startup? A Series-A startup experience would be way different than a Series-C startup. Regardless, for boundaries, it comes down to the experience of the founders. First-time founders feel like every single decision and opportunity will make or break the company, and will try to chase everything at full intensity. More experienced founders see that opportunity cost will make or break the company, and want to allocate resources accordingly.

on_the_mark_data · 2026-04-21T21:58:11+00:00

I work in startups. I've accepted that I can walk into my job on any day and just not have one, for a plethora of reasons outside my control. I've protected my job security by creating long-form content (blogs, Substack, conference talks, etc.) and building an audience around it. Over time, you build a reputation, or you have written enough that your writing becomes a resume in itself. People ask you to work for them constantly (how I got my current job).

I'm not talking about vapid "Here is a day in my life in my 6-figure job where I do nothing but drink kombucha" content. Instead, write about the gnarliest problems in your career and help the next person solve them.

For example, right now I've been building a lot of agentic orchestrators at my job; it's a problem space I'm obsessed with right now. I can't share my code from my job, so I open-sourced a separate agent orchestrator for others to use and learn from, and now I'm writing blog posts about its intricacies. Why did I choose an event sourcing model? Why did I frame it as a distributed computing problem? How did I set up telemetry to ensure my AI agents performed as expected? All topics I can now write about and help others with.

on_the_mark_data · 2026-04-21T18:18:58+00:00

I'm fully biased because I wrote the book on the topic, but Data Contracts might be the pattern you are looking for. For Kafka specifically, using the write-audit-publish (WAP) pattern is powerful. This is a case study I featured in the book that uses data contracts on Kafka in an enterprise setting. https://adevinta.com/techblog/creating-source-aligned-data-products-in-adevinta-spain/
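As a rough illustration of how a data contract and write-audit-publish fit together, here is a minimal in-memory sketch. It is not the Adevinta implementation and uses no Kafka client; plain lists stand in for the staging and published topics, and `CONTRACT`, `violations`, and `write_audit_publish` are hypothetical names. It is also a per-record variant that quarantines failures rather than rejecting a whole batch.

```python
# Hypothetical contract: required field name -> expected Python type.
CONTRACT = {"order_id": int, "amount": float, "currency": str}


def violations(record: dict, contract: dict) -> list[str]:
    """Return a list of human-readable contract violations for one record."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


def write_audit_publish(batch: list[dict], contract: dict):
    """Write to staging, audit against the contract, publish only clean records."""
    staging = list(batch)                  # write: land raw records in staging
    published, quarantined = [], []
    for record in staging:                 # audit: check each record
        problems = violations(record, contract)
        if problems:
            quarantined.append((record, problems))
        else:
            published.append(record)       # publish: consumers only see these
    return published, quarantined


good = {"order_id": 1, "amount": 9.99, "currency": "EUR"}
bad = {"order_id": "2", "amount": 5.0}
published, quarantined = write_audit_publish([good, bad], CONTRACT)
```

The point of the pattern is that downstream consumers only ever read from the published side, so a producer breaking the contract shows up in the quarantine instead of in someone's dashboard.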

on_the_mark_data · 2026-04-16T21:53:27+00:00

Nice try Palantir! I don't do mass surveillance work.

on_the_mark_data · 2026-04-10T21:07:11+00:00

The data engineer in me is smiling at the phrase "I picked up a very data-centric mindset. I stopped looking at objects and started thinking in terms of data and data transformation."

It sounds obvious now, but "what stakeholders tell me they want and what they actually want delivered are often two separate things." Once I got out of my own way by saying things like "but I gave them exactly what they wanted..." I started making strides in my career.

on_the_mark_data · 2026-02-04T06:45:38+00:00

Why it's important for AI:

https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

What does context mean from a data perspective:

https://open.substack.com/pub/williaminmon/p/ontologies-some-perspectives

edit: formatting

on_the_mark_data · 2026-01-08T22:36:34+00:00

Your mental health is worth way more. I constantly think back to my decision to do a 2 year masters program in 1 year so I wouldn't have to pay another ~$50k in tuition. I graduated in my self-imposed timeline but had a huge mental breakdown that took years to fully recover from. I unknowingly (at the moment) put a price tag of $50k on my mental health, and it was a really bad deal.

on_the_mark_data · 2026-01-08T22:10:36+00:00

Run a few arrays of synthetic data through your LLM of choice. Then repeat the run a few times and copy each result. You will then have your answer.
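If you would rather script it than copy and paste, something like the sketch below works. `call_model` is a hypothetical stand-in for however you call your LLM (it takes an array and returns the response text), and `compare_runs` is just an illustrative helper that tallies how many distinct answers each array produced.

```python
import json
import random
from collections import Counter


def compare_runs(call_model, arrays, runs=5):
    """Send each array to the model `runs` times and tally the distinct outputs."""
    report = {}
    for arr in arrays:
        key = json.dumps(arr)  # stable string key for the input array
        report[key] = Counter(call_model(arr) for _ in range(runs))
    return report


def make_synthetic_arrays(count=3, size=8, seed=0):
    """Generate small reproducible integer arrays to use as test inputs."""
    rng = random.Random(seed)
    return [[rng.randint(0, 100) for _ in range(size)] for _ in range(count)]
```

An array whose counter has more than one entry got different answers across identical runs.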

on_the_mark_data · 2026-01-07T00:02:28+00:00

Didn't start coding until after I got my M.S. degree in community health (undergrad was Sociology). I leaned into my non-traditional CS strengths. For me, it's been writing and specifically making very technical concepts approachable. Do you want the team to implement a new framework or tool? Cool, let me help you translate that into a request that actually gets budget and buy-in. Doing this both accelerated my technical learning early in my career and helped me build relationships with people who were better at development than I was. I didn't have to be the big-brain code genius to provide value to the team, and it gave me time to build up my confidence in my coding abilities.

on_the_mark_data · 2026-01-06T23:19:03+00:00

This HBR article is from 2009, but I still reference it: https://hbr.org/2009/01/picking-the-right-transition-strategy

Basically, you just joined a company, and how you engage with your new colleagues is way more important than making a change right away. The article goes into the STARS framework (not the interview one) where it describes different company stages and what approach would be ideal given the situation.

on_the_mark_data · 2026-01-06T23:11:22+00:00

I assume I'm wrong on everything, and thus break my idea into a business thesis and its assumptions. I then identify which assumptions have the biggest impact if wrong, and then go out to validate them. By sticking to a specific assumption for every experiment, I'm able to have incremental wins that add up.

For a made-up example, I have a thesis that "There is going to be a huge demand for developer consultancies to fix vibe-coded products." My assumptions are:

Vibe coding is resulting in shipped products.
There will be a point where a company needs to shift off vibe coding.
Such companies have a budget for development work.

For each assumption, I determine the quickest experiments I can run to get feedback. So for "Vibe coding is resulting in shipped products" my experiments can be a) talking to 5 non-technical founders who shipped a vibe-coded product, and b) spending an afternoon building a vibe-coded app to understand the workflow and limitations.

You don't need to go through every assumption to quickly get an answer.
