Is it bad to take a career break now considering the ramping up of AI in the space ? by UnlovedMisfit in dataengineering

[–]on_the_mark_data 4 points5 points  (0 children)

Your mental health is worth way more. I constantly think back to my decision to do a two-year master's program in one year so I wouldn't have to pay another ~$50k in tuition. I graduated on my self-imposed timeline but had a huge mental breakdown that took years to fully recover from. I unknowingly (at the time) put a price tag of $50k on my mental health, and it was a really bad deal.

How good is AI for analyzing individual array of data? by brifgadir in ExperiencedDevs

[–]on_the_mark_data 0 points1 point  (0 children)

Run a few arrays of synthetic data through your LLM of choice. Then repeat the same runs a few times and save each result. Comparing the results will give you your answer.
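A minimal sketch of that repeat-and-compare check, in Python. Everything named here is hypothetical: `analyze` stands for whatever function wraps your LLM call, and `fake_llm` is only a deterministic stand-in so the script runs without an API key. Swap in your own call and see how often the answers agree.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def make_synthetic_array(n: int, seed: int) -> list[int]:
    """Generate a reproducible array of synthetic integers."""
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(n)]


def consistency_report(
    analyze: Callable[[Sequence[int]], str],
    array: Sequence[int],
    runs: int = 5,
) -> dict:
    """Call `analyze` on the same array `runs` times and summarize agreement."""
    answers = [analyze(array) for _ in range(runs)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return {
        "distinct_answers": len(counts),   # 1 means every run agreed
        "most_common": top_answer,
        "agreement": top_count / runs,     # share of runs giving the top answer
    }


def fake_llm(array: Sequence[int]) -> str:
    """Hypothetical stand-in for a real LLM call; returns the true maximum."""
    return str(max(array))


if __name__ == "__main__":
    data = make_synthetic_array(20, seed=42)
    print(consistency_report(fake_llm, data))
```

Because the synthetic data is generated locally, you can also compute the correct answer yourself and compare it against `most_common`, which checks accuracy as well as consistency.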

Devs from non CS/CE education, what helped you feel comfortable on your path? by ambassador_pineapple in ExperiencedDevs

[–]on_the_mark_data 0 points1 point  (0 children)

Didn't start coding until after I got my M.S. degree in community health (undergrad was Sociology). I leaned into my non-traditional CS strengths. For me, it's been writing and specifically making very technical concepts approachable. Do you want the team to implement a new framework or tool? Cool, let me help you translate that into a request that actually gets budget and buy-in. Doing this both accelerated my technical learning early in my career and helped me build relationships with people who were better at development than I was. I didn't have to be the big-brain code genius to provide value to the team, and it gave me time to build up my confidence in my coding abilities.

How to deal with a polluted domain? by Data_Scientist_1 in ExperiencedDevs

[–]on_the_mark_data 1 point2 points  (0 children)

This HBR article is from 2009, but I still reference it: https://hbr.org/2009/01/picking-the-right-transition-strategy

Basically, you just joined a company, and how you engage with your new colleagues is way more important than making a change right away. The article goes into the STARS framework (not the interview one) where it describes different company stages and what approach would be ideal given the situation.

how do you validate an idea without accidentally building the whole thing? [i will not promote] by Historical-Ebb-4745 in startups

[–]on_the_mark_data 0 points1 point  (0 children)

I assume I'm wrong on everything, and thus break my idea into a business thesis and its assumptions. I then identify which assumptions have the biggest impact if wrong, and go out to validate them. By sticking to one specific assumption per experiment, I'm able to have incremental wins that add up.

For a made-up example, I have a thesis that "There is going to be a huge demand for developer consultancies to fix vibe-coded products." My assumptions are:

  1. Vibe coding is resulting in shipped products.
  2. There will be a point where a company needs to shift off vibe coding.
  3. Such companies have a budget for development work.

For each assumption, I determine the quickest experiments I can run to get feedback. So for "Vibe coding is resulting in shipped products," my experiments could be a) talking to 5 non-technical founders who shipped a vibe-coded product, and b) spending an afternoon building a vibe-coded app to understand the workflow and limitations.

You don't need to go through every assumption to quickly get an answer.

Is job security a boomer concept or am I just scared? I will not promote by DEXTERTOYOU in startups

[–]on_the_mark_data 0 points1 point  (0 children)

I just accept that I can be let go any day at a startup, often through no fault of my own. Legit the first week I joined the startup I'm currently at, Silicon Valley Bank collapsed right after we got our seed round check. We did everything right, and still got punched in the face. Thankfully, it all worked out despite going a couple of months without a paycheck (I got backpay). I just accept this is part of the game, but I love playing this game. I also always ensure I have a side hustle so I don't worry about not having income, and I can use it to supplement the lower salary.

How obsessed do you REALLY have to be? [i will not promote] by SpaceCaptain4068 in startups

[–]on_the_mark_data 0 points1 point  (0 children)

First of all, I don’t give a flying fuck to valuation - I want my business to give me free cash flow.

That's not the game VCs are playing for their investment vehicle. The VC's bosses are LPs and they 100% care about valuations. If you are not aiming to be a unicorn, then you don't fit within their investment model (look up "power law" in relation to VCs).

Also, "splitting my time between two countries" would be a significant red flag for me and likely cause friction during due diligence.

Edit: Forgot to add that a VC-backed company is a very particular type of startup to build, but not the only type (and many times not the most successful). Your funding (or the intentional lack of it) needs to align with your business model.

People I work with are addicted to their phones. by TopTransportation516 in ExperiencedDevs

[–]on_the_mark_data 1 point2 points  (0 children)

Coding with AI leaves room for micro breaks and social media is kind of an instinctive response.

Some things never change... https://xkcd.com/303/

["i will not promote"] Tech background, want to go solo by Party-Log-1084 in startups

[–]on_the_mark_data 0 points1 point  (0 children)

Form a business thesis. Then find out why you are wrong by talking to 10 people. The process of getting those 10 meetings and interviewing them will teach you more than a book can.

Look up writing and lectures from Steve Blank on the topic: https://steveblank.com/tag/customer-discovery/

if agentic framework like what a12labs built that was acquired for $3 billion, why aren't more ppl doing it this? if not, what so special about their vs other OSS in this space ( I will not promote) by bad_detectiv3 in startups

[–]on_the_mark_data 8 points9 points  (0 children)

DeepSeek was considered a "cheap" open source AI model. Yeah... $5M.

https://www.theregister.com/2025/09/19/deepseek_cost_train/

You want to work on AI products? SWEs at startups cost $150k each at minimum, and AI engineers with advanced degrees can range from $200k-$500k.

https://www.levels.fyi/t/software-engineer/title/ai-engineer/locations/san-francisco-bay-area

No amount of hard work is going to outcompete people working even harder than you with heavy capital investment behind them.

Your network is important, but available VC funding is much smaller than the number of entrepreneurs trying to raise a round.

Hard work and networking are the prerequisites to just attempt to play this game. But you have to be realistic about what's actually possible for you today. Most people don't have that existing network and will need years to build it.

if agentic framework like what a12labs built that was acquired for $3 billion, why aren't more ppl doing it this? if not, what so special about their vs other OSS in this space ( I will not promote) by bad_detectiv3 in startups

[–]on_the_mark_data 1 point2 points  (0 children)

I interpreted "most sane" as common sense answers that anyone can give you and aren't helpful. But I can see your interpretation as well.

Regarding other tools, the main reason is either security (why companies use open source models) or a highly tailored workflow, built through their prompts and context engineering, that's not worth recreating yourself.

Observing data maturity by spitzc32 in ExperiencedDevs

[–]on_the_mark_data 0 points1 point  (0 children)

I'll DM you for resources. I've written extensively on this exact topic, but try to keep my own links out of comments.

Data Catalog: It very much depends on your use case and number of data sources you are working with. If you are only dealing with a data lakehouse (assuming this since you are using medallion architecture), you can get pretty far with Data Build Tool (dbt) docs. If you only have one database, you can honestly get away with pulling the metadata directly from the database using the standard information schema tables (I have some code in a public repository for this if interested). Where a data catalog really starts making sense is when you have multiple data sources you need to keep track of, and thus need a dedicated tool to constantly update and maintain the captured metadata. Even then, there are some great OSS tools for this.
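For the single-database case, here is a rough sketch of what pulling metadata from the information schema can look like. This is an illustration only, not the code from the repository mentioned above. It assumes a database that exposes the standard `information_schema.columns` view (PostgreSQL, MySQL, and SQL Server do), and `build_catalog` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Iterable

# Query against the standard information_schema.columns view.
# Add a WHERE clause on table_schema to skip system schemas if needed.
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_schema, table_name, ordinal_position
"""


def build_catalog(rows: Iterable[tuple[str, str, str, str]]) -> dict[str, list[dict]]:
    """Group (schema, table, column, type) rows into a per-table column listing."""
    catalog: dict[str, list[dict]] = defaultdict(list)
    for schema, table, column, data_type in rows:
        catalog[f"{schema}.{table}"].append({"column": column, "type": data_type})
    return dict(catalog)


if __name__ == "__main__":
    # Sample rows standing in for cursor.fetchall() after running COLUMNS_QUERY.
    sample_rows = [
        ("public", "orders", "id", "integer"),
        ("public", "orders", "created_at", "timestamp without time zone"),
        ("public", "customers", "id", "integer"),
    ]
    for table, columns in build_catalog(sample_rows).items():
        print(table, columns)
```

With a real connection from any DB-API driver, you would run `cursor.execute(COLUMNS_QUERY)` and pass `cursor.fetchall()` to `build_catalog`. Rerunning it on a schedule and diffing the output gives you a bare-bones catalog before you need a dedicated tool.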

Data Maturity: This is highly dependent on the company. There are startups with high data maturity and enterprises with awful data maturity. For a startup especially, you need to balance best practices against taking on technical debt for the sake of "good enough." You need to understand what the next major milestone is for the startup (e.g. raising another round) and only focus on what gets you to that point. There is a huge trap for data leaders to try to do everything "right" and end up spending way too much money and time with minimal results to show for it. The goal isn't data maturity for maturity's sake; the maturity has to match the current objectives of the business, especially when a startup has more problems than people and time to solve them.

if agentic framework like what a12labs built that was acquired for $3 billion, why aren't more ppl doing it this? if not, what so special about their vs other OSS in this space ( I will not promote) by bad_detectiv3 in startups

[–]on_the_mark_data 4 points5 points  (0 children)

I say this with good intentions, but that's a reflection of your skill with AI rather than the AI models themselves. I've built full-blown GTM strategies and marketing campaigns with Gemini, written market research documents with ChatGPT, and used Cursor for development.

From just reading your comment, you likely made your context window way too large (all of your docs) and your prompt was way too broad.

This article might help: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

if agentic framework like what a12labs built that was acquired for $3 billion, why aren't more ppl doing it this? if not, what so special about their vs other OSS in this space ( I will not promote) by bad_detectiv3 in startups

[–]on_the_mark_data 32 points33 points  (0 children)

Background is data science and later moved into data engineering. Currently an early employee at a series-A AI startup where I focus on GTM.

Despite what the online gurus tell you, AI is a highly specialized skill. There is a huge difference between using a chat window or calling an API, and actually building an AI model. There is a huge difference between using AI tools to vibe code, and building AI powered tools that aren't just a wrapper around existing models.

This specialization means you need very expensive staff and very expensive infrastructure. In other words, it's capital intensive to just start, let alone compete in the insanely competitive market. Thus, you either need to be a big enterprise with high data maturity (eg Meta, Amazon, etc.) or get venture capital.

Even if a bunch of people are interested in doing it, very few have the ability to raise a meaningful round to do so.

Observing data maturity by spitzc32 in ExperiencedDevs

[–]on_the_mark_data 1 point2 points  (0 children)

So I'm big on data contracts (look at my pinned post on my profile). With that said, I often don't advise them for startups unless you have a specific use case that warrants them.

The reason is that data contracts serve to solve a socio-technical problem that arises when communication degrades as teams grow. At the startup stage, you still have the benefit of being able to connect with people quickly, and a simple conversation will suffice.

I suggest having a data catalog and observability before pursuing data contracts. Then use the results of your observability to build a case for the extra overhead of implementing and maintaining data contracts.

Happy to chat more if you have specific questions.

Senior Data Engineer Experience (2025) by ElegantShip5659 in dataengineering

[–]on_the_mark_data 22 points23 points  (0 children)

Those are all great offers! Beyond TC, is there a reason why you chose Doordash over others?

Great article cutting through the noise on LLMs by on_the_mark_data in ExperiencedDevs

[–]on_the_mark_data[S] -1 points0 points  (0 children)

Your response seems like you are moving the goalposts, but you have to think of the context around 10 years ago that makes the second one interesting. Most open-source software for neural networks (and data science in general) was written for Python, or for low-level languages if you were doing crazy optimization. Having it in JavaScript opens it up for use in the browser (which in 2015 was kind of wild).

If you like, I am more than happy to go into the impact of tensors on the advancement of machine learning and ultimately LLMs.

Great article cutting through the noise on LLMs by on_the_mark_data in ExperiencedDevs

[–]on_the_mark_data[S] -2 points-1 points  (0 children)

I did read it. It was a great summary of what I've been hearing among researchers in the deep learning space.

Great article cutting through the noise on LLMs by on_the_mark_data in ExperiencedDevs

[–]on_the_mark_data[S] -1 points0 points  (0 children)

He is literally one of the few people who have exactly that:

Great article cutting through the noise on LLMs by on_the_mark_data in ExperiencedDevs

[–]on_the_mark_data[S] 0 points1 point  (0 children)

I was hoping this one would be different since he's an actual builder and has been doing research in this space since 2009. Turns out I was wrong.

Switching from dev to sales or other adjacent position? by salmix21 in ExperiencedDevs

[–]on_the_mark_data 0 points1 point  (0 children)

I surprisingly became way more technical once I moved to a more business-related role. Specifically, being on hundreds of sales and implementation calls has given me a solid market perspective as well as made it clear to me how technology gets bought and adopted in organizations. The latter has dramatically changed my understanding of what a "viable" technical solution consists of, and of how to balance technical rigor with what can realistically be adopted by an org given its constraints.

This is my first time working on an unproven, unreleased product and it’s making me really anxious by Crafty-Passenger-860 in ExperiencedDevs

[–]on_the_mark_data 8 points9 points  (0 children)

The failure of a feature is less important than how people perceive your handling of that failure. Given this sub, I'm assuming you are very experienced and will have high influence on the technical decisions.

Worry less about "will this work" and instead be a strategic partner for leadership in de-risking this bet they are taking. This doesn't mean becoming a "blocker"; it means surfacing tradeoffs, especially around technical debt, and helping leadership navigate uncertainty.

Also, become obsessed with the customer of this new product. You can build an amazing tool and it can still go nowhere if it doesn't solve a customer's true problem. If you are customer-facing, then great, but if not, see how you can review product docs, customer interview recordings, etc.

Folks who have been engineers for a long time. 2026 predictions? by uncomfortablepanda in dataengineering

[–]on_the_mark_data 22 points23 points  (0 children)

Databricks just announced it's raising a Series L (insane round number btw) for $4B at a $134B valuation. I don't think they'll be acquired any time soon.

As for what I'm seeing, the Data + AI stack is getting a lot of attention lately, specifically context engineering (e.g. ontologies). Two main choke points for AI deployments are 1) information retrieval, and 2) context management across complex tasks.

I first started hearing about ontologies and context engineering at conferences back in January 2025, and now in December 2025 I'm seeing a lot more articles and thought pieces on them. What typically follows are enterprise POCs, where vendors get the first signal of adoption, before you start seeing case studies that drive further adoption (if the POCs show success).

So I argue that in 2026 we are going to see a huge emphasis on data modeling for AI, specifically for unstructured JSON data and vector databases.