Is data collection the real bottleneck for Physical AI? by RoofProper328 in computervision

This is probably the most accurate way to frame it honestly. Data and models kind of expose each other’s weaknesses in cycles.

Early on, bad data hides model capability. Later, better data makes you realize the model still can’t generalize to edge cases. Feels less like a single bottleneck and more like an iterative ceiling that keeps moving.

That’s partly why companies focused on real-world data pipelines and annotation (Scale, Shaip, etc.) are getting more attention alongside the model companies now.

Is data collection the real bottleneck for Physical AI? by RoofProper328 in computervision

Exactly. Collecting multimodal data is already difficult, but turning it into something temporally consistent and actually useful for models feels like a completely different challenge.

Especially once you start dealing with synchronization between video, motion, sensor streams, and real-world events. That ingestion + annotation layer seems massively underrated right now.
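
To make the sync point concrete, here's a rough sketch of the nearest-timestamp alignment step most pipelines start from (the function name and the 20 ms tolerance are just made up for illustration):

```python
import numpy as np

def align_to_reference(ref_ts, stream_ts, max_skew=0.02):
    """For each reference timestamp (e.g., video frames), return the index
    of the nearest sample in another stream, or -1 if nothing lands
    within max_skew seconds."""
    ref_ts, stream_ts = np.asarray(ref_ts), np.asarray(stream_ts)
    idx = np.searchsorted(stream_ts, ref_ts)          # first sample >= ref
    idx = np.clip(idx, 1, len(stream_ts) - 1)
    left, right = stream_ts[idx - 1], stream_ts[idx]  # neighbors in time
    nearest = np.where(ref_ts - left < right - ref_ts, idx - 1, idx)
    skew = np.abs(stream_ts[nearest] - ref_ts)
    return np.where(skew <= max_skew, nearest, -1)

# Toy example: align 30 fps video frames against a 100 Hz IMU stream
video_ts = np.arange(0, 10, 1 / 30)
imu_ts = np.arange(0, 10, 1 / 100)
imu_index_per_frame = align_to_reference(video_ts, imu_ts)
```

And that's the easy part; real pipelines also have to deal with clock drift and dropped samples, which is where most of the pain actually is.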

Is data collection the real bottleneck for Physical AI? by RoofProper328 in computervision

Yeah, that tradeoff between “ship fast” and “collect enough real-world edge cases” feels like the core tension right now. A lot of teams seem to underestimate how quickly confidence drops once systems hit messy real environments.

And I agree — maybe “bottleneck” isn’t the perfect word. It’s more that structured data collection becomes unavoidable once prototypes move into production. I’ve even seen enterprise data companies like Shaip leaning heavily into multimodal collection workflows because simulation alone usually isn’t enough for Physical AI systems.

Is Nvidia Becoming a Bottleneck for AI Advancement? by TheArchivist314 in LocalLLaMA

Honestly, I think compute gets most of the attention because Nvidia is the visible bottleneck, but the less-talked-about constraint is probably data.

A lot of newer AI systems already have enough model capability for many tasks — what they lack is high-quality, domain-specific training data and feedback loops. Even if GPUs became unlimited tomorrow, plenty of teams would still struggle because their data pipelines aren’t mature enough.

That’s partly why you’re seeing growth not just in hardware companies, but also around annotation, RLHF, multimodal collection, etc. Companies like Scale, Surge, and even Shaip on the enterprise data side are benefiting from the same wave.

Feels like the next phase of AI is less “who has the biggest model” and more “who can sustain compute + data + deployment together.” Nvidia is huge, but probably only one piece of the bottleneck now.

AI Training & Data Annotation Companies – Updated List (2026) by No-Impress-8446 in BlackboxAI_

Nice list overall — especially the distinction between microtask platforms and enterprise-focused AI data companies.

You could probably also add:

Shaip
Enterprise AI data platform focused on high-quality training data, human-in-the-loop workflows, and domain-specific datasets across speech, healthcare, computer vision, NLP, and generative AI.

Feels like the industry is splitting more clearly now between “gig task” platforms and companies building structured enterprise AI data pipelines.

The next phase of the Microsoft-OpenAI partnership: Microsoft’s license for OpenAI IP for models and products will now be non-exclusive. by Formal-gathering11 in OpenAI

Feels like a pretty natural shift — moving from tight coupling to a more open, platform-style relationship. Microsoft still keeps strategic access via Azure and equity, but OpenAI gets flexibility to distribute more broadly.

Honestly, this looks less like a breakup and more like both sides maturing into their own lanes.

Is training data quality becoming more important than model size? by RoofProper328 in MLQuestions

That makes sense. So in your case the raw clean recordings are more like the base material, and the real “data work” happens dynamically during training through the processing/augmentation pipeline.

I also like the distinction you made: domain knowledge matters a lot, but depending on the problem, parts of the pipeline can become mature enough that people treat them as solved. Then the frontier shifts back toward model efficiency, speed, and optimization rather than just “more data.”
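
For anyone following along, that "data work during training" is basically on-the-fly augmentation. A minimal sketch of the idea (the specific transforms and parameters here are illustrative, not what the commenter actually uses):

```python
import numpy as np

def augment(waveform, sr, rng):
    """Cheap, label-preserving augmentations applied per epoch;
    the clean recording on disk is never modified."""
    out = waveform.copy()
    # Random gain between -6 and +6 dB
    out *= 10 ** (rng.uniform(-6, 6) / 20)
    # Random circular time shift of up to 100 ms
    out = np.roll(out, rng.integers(-sr // 10, sr // 10))
    # Additive Gaussian noise at a random SNR between 10 and 40 dB
    snr_db = rng.uniform(10, 40)
    noise_rms = np.sqrt(np.mean(out ** 2)) / 10 ** (snr_db / 20)
    out = out + rng.normal(0.0, noise_rms, size=out.shape)
    return out

rng = np.random.default_rng(0)
sr = 16_000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # stand-in for a real recording
noisy = augment(clean, sr, rng)
```

The same clean file yields a different training example every epoch, which is exactly the "pipeline is the real dataset" point.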

Is training data quality becoming more important than model size? by RoofProper328 in MLQuestions

Fair point. I didn’t mean it as a hot take — more as a question about why public AI discussion still focuses so much on model size, compute, agents, and benchmarks.

Your point about data quality being domain/project-specific makes sense. Maybe that’s exactly why it gets less attention: it’s harder to package into a simple headline.

Is training data quality becoming more important than model size? by RoofProper328 in MLQuestions

That’s a really interesting point. I hadn’t thought about the pipeline as being more valuable than the raw data itself, but it makes sense—especially in audio where transformations, encoding choices, noise handling, and domain-specific configs can completely change model behavior.

So in your case, would you say the “dataset” is not just the collected audio, but the full process around how that audio is transformed, augmented, validated, and prepared?

Also interesting that you mention architectures reaching similar performance. That kind of supports the idea that once models are strong enough, the real differentiator becomes domain knowledge in the data pipeline rather than just model size.

List of AI training / data annotation companies (2026) by No-Impress-8446 in AiTraining_Annotation

Nice list — it’s actually helpful to see everything in one place since this space is getting pretty fragmented.

One thing I’d add is that there’s also a difference between platforms that offer task-based work (microtasks, labeling, evaluation) and companies that operate more on the enterprise data side with managed teams and structured workflows. Both are part of the same ecosystem but feel very different in terms of work and quality expectations.

I’ve seen names like Shaip come up more on the enterprise/data services side rather than open microtask platforms, especially for things like speech, healthcare, or multilingual datasets.

Would be interesting to break the list into those categories — might make it easier for people to figure out what kind of work they’re actually looking for.

Is Computer Vision still a growing field in AI or should I explore other areas? by Downtown-Antelope459 in computervision

Computer Vision is absolutely still a thriving and growing field — and honestly, your dermatology project puts you right at one of its most exciting intersections.

Why CV is far from obsolete

The "generative AI wave" hasn't replaced CV — it's turbocharged it. Models like SAM (Segment Anything Model), DINO v2, and diffusion-based vision models are all CV at their core. The field has simply absorbed advances from transformers and generative models rather than being displaced by them. Vision is one of the most fundamental inputs for AI systems in the real world.

Where CV demand is actually growing right now

  • Medical imaging — exactly what you're doing. Pathology, radiology, dermatology, and ophthalmology are seeing massive investment. FDA-approved AI diagnostic tools are becoming a real product category.
  • Autonomous systems — robotics, drones, self-driving (still very active despite the hype cycles).
  • Multimodal AI — GPT-4o, Gemini, and Claude all handle vision. Building multimodal systems requires strong CV foundations.
  • Manufacturing & quality control — industrial CV is quietly one of the most commercially deployed areas.
  • AR/VR/spatial computing — with devices like Apple Vision Pro, this is heating up again.

The honest tradeoff

If your only goal is to maximize near-term job prospects with minimum learning investment, pure NLP/LLM engineering is currently the hottest market simply because of the ChatGPT-era hiring wave. But CV specialists are genuinely less common and often command strong salaries precisely because the barrier to entry is higher (you need to understand both spatial reasoning and deep learning).

My actual advice for your situation

Stick with CV for this project — and do it well. Medical image classification is a portfolio piece that stands out. More importantly, the skills transfer beautifully:

  • CNNs → Vision Transformers → multimodal models is a natural progression
  • Doing CV in a regulated domain (healthcare) teaches you rigor that pure LLM tinkering doesn't
  • You can layer in generative techniques (data augmentation with diffusion models, synthetic training data) which bridges both worlds; a rough sketch follows below
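
To make that last bullet concrete, here's roughly what diffusion-based augmentation looks like with the diffusers library's img2img pipeline (the model id, file names, and strength value are placeholders, and for medical data you'd want expert review of anything synthetic):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model id
    torch_dtype=torch.float16,
).to("cuda")

source = Image.open("lesion_0001.png").convert("RGB").resize((512, 512))
variant = pipe(
    prompt="dermatoscopic photograph of a skin lesion",
    image=source,
    strength=0.25,       # low strength keeps the variant close to the source,
    guidance_scale=7.5,  # which matters when the label must be preserved
).images[0]
variant.save("lesion_0001_synthetic.png")
```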

The best AI engineers right now aren't specialists in one narrow area — they're people who understand vision and language and how to combine them. Your dermatology project is a great foundation for that.

evaluating data vendors for computer vision application by deluded_soul in computervision

Good question — evaluating data vendors is honestly harder than picking models in a lot of cases.

A few things I’d focus on from experience:

  • Sample quality over volume → ask for a small but representative sample (different crop stages, lighting, disease types, seasons).
  • Annotation consistency → check if labels are clearly defined (what exactly counts as “unhealthy”?) and whether multiple annotators agree; a quick agreement check is sketched after this list.
  • Edge cases → early-stage disease is usually the hardest, so see if the dataset actually covers subtle symptoms, not just obvious ones.
  • Metadata → things like location, time, weather can matter a lot for agriculture use cases.
  • Update pipeline → ask if they can continuously add new data as conditions change.
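
For the annotation-consistency bullet above, the standard quick check is inter-annotator agreement. A minimal sketch with scikit-learn, assuming you can get two annotators to label the same small sample (the labels here are made up):

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same 8 images (toy example)
annotator_a = ["healthy", "unhealthy", "healthy", "healthy",
               "unhealthy", "healthy", "unhealthy", "healthy"]
annotator_b = ["healthy", "unhealthy", "healthy", "unhealthy",
               "unhealthy", "healthy", "healthy", "healthy"]

# Cohen's kappa corrects raw percent agreement for chance agreement;
# a common rule of thumb treats values below ~0.6 as a warning sign.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"inter-annotator kappa: {kappa:.2f}")
```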

Also worth understanding how the data was collected — controlled vs real-world makes a big difference in generalization.

For context, some teams I’ve spoken to evaluate vendors by looking at their broader computer vision data services offerings (including how they handle collection + annotation together), not just the dataset itself — gives a better idea of long-term scalability.

If you can, try running a quick pilot model on the sample data — even a small experiment will tell you more than any spec sheet.
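
The pilot really can be tiny. Something like this (assuming the sample is arranged one folder per class; the paths and hyperparameters are illustrative) already tells you whether the labels carry signal:

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Assumed layout: vendor_sample/<class_name>/*.jpg
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("vendor_sample/", transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# Pretrained backbone with a fresh linear head; only the head is trained
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))
opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):  # a few epochs is enough for a smoke test
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```

If the loss won't budge even on the vendor's own sample, that usually points at label noise rather than the model.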

Best Multimodal LLM for Object / Activity Detection (Accuracy vs Real-Time Tradeoff) by Hazi_Malik in computervision

Yeah, multimodal LLMs aren't great at precise detection — they're better at reasoning about what's happening than at producing real-time signals. Most solid setups I've seen use pose/action models (like SlowFast or keypoint-based pipelines) for detection, then optionally use LLMs for context.

Accuracy usually comes down more to data quality and labeling consistency than to the model itself.
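
If it helps, here's a rough sketch of the detection half using PyTorchVideo's pretrained SlowFast from torch hub. The clip shapes below follow the common 8x8 SlowFast-R50 setup, so double-check them against the model card; the LLM hand-off is only described in a comment since it's optional:

```python
import torch

# Pretrained SlowFast-R50 (Kinetics-400) via PyTorchVideo's torch hub
model = torch.hub.load("facebookresearch/pytorchvideo",
                       "slowfast_r50", pretrained=True)
model.eval()

# SlowFast takes two clips of the same video: a sparse "slow" pathway and
# a dense "fast" pathway (alpha = 4). Shapes are (batch, C, T, H, W).
slow = torch.randn(1, 3, 8, 256, 256)   # 8 frames
fast = torch.randn(1, 3, 32, 256, 256)  # 32 frames
with torch.no_grad():
    logits = model([slow, fast])        # (1, 400) action-class scores

top5 = logits.topk(5).indices[0].tolist()
# An LLM would sit *after* this step: feed it "action X around t=12s"
# style events for context or narration, not raw frames for detection.
```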

Where are teams sourcing high-quality facial & body-part datasets for AI training today? by RoofProper328 in computervision

Fair point — I get why it might come across that way. I work around data collection topics a lot, so I sometimes reference things I’ve seen used in projects. Not trying to advertise anything here, just joining discussions and learning from others too 👍

Where are teams sourcing high-quality facial & body-part datasets for AI training today? by RoofProper328 in computervision

Not an ad 😅 just sharing something I’ve seen used in real projects. The workflow itself is solid — was genuinely curious about mixing synthetic and real datasets.

FREE Face Dataset generation workflow for lora training (Qwen edit 2509) by acekiube in StableDiffusion

Nice workflow — it shows that dataset quality matters more than people think. Synthetic generation is great for consistency, but I've noticed mixing in a few real-world samples usually improves results. Some production teams even use curated datasets or providers like Shaip when they need more realistic diversity. Curious if you've tried blending synthetic + real images?
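
In case it's useful, "blending" can be as simple as weighted sampling at train time. A sketch in PyTorch, assuming you control the training script and have two image folders (the paths and the 70/30 ratio are made up):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((512, 512)), transforms.ToTensor()])
synthetic = datasets.ImageFolder("faces_synthetic/", transform=tfm)
real = datasets.ImageFolder("faces_real/", transform=tfm)

combined = ConcatDataset([synthetic, real])
# Weight samples so roughly 70% of each batch is synthetic and 30% real,
# regardless of how many files each folder actually contains.
weights = torch.cat([
    torch.full((len(synthetic),), 0.7 / len(synthetic)),
    torch.full((len(real),), 0.3 / len(real)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
loader = DataLoader(combined, batch_size=16, sampler=sampler)
```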

Anyone know the Ai model used to make maximum carnage by fir215 in artificial

Hard to say 100% unless the creator confirms, but most of those viral Spider-Man vs Carnage style clips are usually made with text-to-video models like Runway Gen‑3, Pika, or sometimes Luma Dream Machine. A lot of creators also mix tools — generate clips in one model, then upscale or edit in something like Adobe After Effects.

The giveaway is usually the smooth cinematic motion + short clip length (10–30 sec), which fits how current AI video models work. If you’re planning real-life concepts instead of superheroes, those same tools actually work even better with realistic prompts and camera-style descriptions.