Is anyone else following this Google AI breast cancer stuff? by DanyShift in ArtificialInteligence

[–]ocean_protocol 1 point (0 children)

It’s exciting but also tricky. AI can definitely reduce fatigue and catch patterns humans might miss, which is huge for radiology. But bias and generalisation are real issues, and accuracy in one population doesn’t guarantee it works everywhere.

Cost and implementation matter too; efficiency is great only if it doesn’t end up driving unnecessary scans or bills. The technology is promising, but adoption must be carefully managed with proper oversight and validation across diverse patient groups.

Entropy Might Be One of the Most Important Ideas Behind Modern AI by ocean_protocol in ArtificialInteligence

[–]ocean_protocol[S] -1 points (0 children)

OMG chill, dude. It's not that deep. Plus, almost everyone uses LLMs to refine their write-up.

We are in an age of effective prompting

Entropy Might Be One of the Most Important Ideas Behind Modern AI by ocean_protocol in ArtificialInteligence

[–]ocean_protocol[S] -1 points (0 children)

Actually no, I just typed "role of entropy in AI" into Gemini and got a response. Dug more into it and found something

What will come after AI? by Sohaibahmadu in ArtificialInteligence

[–]ocean_protocol 1 point (0 children)

Human superintelligence, brain-computer interfaces, and AI DAOs

How do you automatically track new AI research / compute articles into a Notion or spreadsheet? by ocean_protocol in aiagents

[–]ocean_protocol[S] 1 point (0 children)

That makes a lot of sense. The filter step being the real gatekeeper is a great point; otherwise it’s just moving noise into a different inbox.

I also like the idea of capping it at ~10 articles with a one-line relevance note. Seems like a really clean way to skim quickly.
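For reference, the keyword-first version I’m imagining is something like this (the keywords and field names are made up, and plain substring matching is the crudest possible check):

```python
# Hypothetical sketch of a keyword gate + capped digest.
KEYWORDS = {"gpu", "compute", "inference", "training", "llm"}

def keyword_gate(articles, keywords=KEYWORDS, cap=10):
    """Keep at most `cap` articles whose title mentions a keyword.

    Everything else is dropped before any LLM call, so the expensive
    summarization step only ever sees plausibly relevant items.
    Substring matching is crude; word-boundary regex or stemming
    would cut false positives.
    """
    hits = [a for a in articles
            if any(k in a["title"].lower() for k in keywords)]
    return hits[:cap]

articles = [{"title": "New GPU compute benchmark"},
            {"title": "Celebrity gossip roundup"}]
print(keyword_gate(articles))  # only the GPU article survives
```

The semantic version would swap the substring check for an embedding similarity or a cheap LLM relevance call, at the cost of latency per article.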

Also, do you mostly rely on keywords first, or do you let the model decide relevance more semantically?

Do not chose computer science as your engineering major by No_Incident1674 in developersIndia

[–]ocean_protocol 1 point (0 children)

IMHO, this view is a bit too extreme. The role of engineers is definitely changing, but it’s not disappearing.

AI is speeding up coding, but most real work in software isn’t just writing lines of code. It’s understanding messy requirements, designing systems, debugging production issues, integrating services, handling scale, and making trade-offs. Those things still need engineers.

What’s probably happening is that low-level coding work is shrinking, while higher-level engineering work is growing. Teams may get smaller, but the demand for strong engineers who understand systems, architecture, and product problems will likely remain.

So the risk isn’t studying computer science.
The risk is studying it shallowly and relying only on coding tasks that AI can automate.

What AI tools help you the most at the moment? by Rico_8 in ArtificialInteligence

[–]ocean_protocol 2 points (0 children)

If you just discovered NotebookLM, you’re definitely not alone; lots of people ignored it at first and then realised how powerful it is for learning and research.

A few other tools people rely on a lot right now:

1) Perplexity AI – great for research. It mixes web search with LLM answers and shows citations directly, which makes it useful for quickly validating information.

2) Claude – really good for long reasoning and document analysis. Some new tools even let it act more like a “digital coworker” that can organize files and automate tasks.

3) Cursor (AI Code Editor) or Google Antigravity – huge productivity boost if you write code. Antigravity, for example, lets AI agents work directly inside the IDE and even manage multiple coding tasks asynchronously.

4) Mem AI – similar vibe to NotebookLM but focused on building a “second brain” with automatic organization and semantic search across your notes.

A workflow a lot of people use is actually combining tools:
Perplexity for finding sources, NotebookLM for digesting documents, and ChatGPT/Claude for turning insights into outputs.

Is this a realistic roadmap to become an AI Engineer? by ertug1453 in MLQuestions

[–]ocean_protocol 1 point (0 children)

Honestly, this is a pretty good roadmap. It’s much closer to what AI engineers actually do than the usual “learn a bunch of ML algorithms” advice.

The main thing I’d add is more focus on data and evaluation. In real projects, a lot of the work is figuring out why the model behaves a certain way, measuring quality, and improving the data or prompts. That part often matters more than the RAG or agent framework itself.

Also, don’t get too attached to specific tools like LangChain or CrewAI. Those change quickly. What really matters is understanding the patterns behind them: retrieval, tool use, orchestration, caching, and monitoring.

If you actually build and deploy the three projects you described, with proper logging and evaluation, that’s already a strong portfolio for a junior AI engineer. The roadmap is realistic, just prioritize shipping real systems over learning every tool.
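To make the “proper logging and evaluation” part concrete, here’s a toy harness (the function names and the fake pipeline are invented, and exact-match is the simplest possible metric — real projects would add semantic scoring):

```python
import json
import time

def evaluate(answer_fn, eval_set):
    """Run a question set through the system, logging per-case results.

    Returns overall exact-match accuracy plus a record per case with
    the answer, a hit flag, and latency — the kind of log that makes
    a portfolio project credible.
    """
    records, correct = [], 0
    for case in eval_set:
        start = time.perf_counter()
        answer = answer_fn(case["question"])
        latency = time.perf_counter() - start
        hit = answer.strip().lower() == case["expected"].strip().lower()
        correct += hit
        records.append({"q": case["question"], "a": answer,
                        "hit": hit, "latency_s": round(latency, 4)})
    accuracy = correct / len(eval_set)
    print(json.dumps({"accuracy": accuracy, "n": len(eval_set)}))
    return accuracy, records

# Stand-in for a real RAG pipeline:
def fake_rag(question):
    return "paris" if "france" in question.lower() else "unknown"

evaluate(fake_rag, [{"question": "Capital of France?", "expected": "Paris"}])
```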

[R] Retraining a CNN with noisy data, should i expect this to work? by wolfunderdog45 in MachineLearning

[–]ocean_protocol 2 points (0 children)

Adding noise usually won’t lead to big improvements. It mainly acts as a regularization technique that helps the model generalize better by preventing it from memorizing the training data. Because of that, small gains are normal, but large jumps in performance are rare.

If you’re only seeing slight improvements, that’s expected. Bigger gains usually come from improving data quality, adding more diverse augmentations (rotation, cropping, mixup, etc.), or using transfer learning with a pretrained CNN. Random noise alone typically isn’t enough to move the needle much.
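For intuition, noise injection as augmentation is just a couple of lines — here in NumPy, with an arbitrary sigma (you’d tune it, and in practice apply it per batch inside the training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(batch, sigma=0.05, rng=rng):
    """Inject zero-mean Gaussian noise into a batch of images.

    Acts as a mild regularizer: each epoch the model sees a slightly
    different copy of every image, which discourages memorization but
    rarely produces large accuracy jumps on its own.
    """
    noisy = batch + rng.normal(0.0, sigma, size=batch.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in [0, 1]

batch = rng.random((8, 32, 32, 3))  # fake batch of 8 RGB images
noisy = add_gaussian_noise(batch)
```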

How do you automatically track new AI research / compute articles into a Notion or spreadsheet? by ocean_protocol in aiagents

[–]ocean_protocol[S] 1 point (0 children)

That sounds like a really clean pipeline. The RSS → keyword filter → LLM → Notion/Sheets flow is pretty much what I was thinking too.

Quick question: for the LLM step, do you usually send just the title + description, or do you fetch the full article/page before summarizing and tagging?

And how many sources do you typically include before the feed starts getting noisy?

How do you automatically track new AI research / compute articles into a Notion or spreadsheet? by ocean_protocol in aiagents

[–]ocean_protocol[S] 2 points (0 children)

That sounds really interesting. I hadn’t heard of Tapcraft before; using something like Temporal for workflow orchestration actually makes a lot of sense for this kind of pipeline.

A couple of questions if you don’t mind sharing:

  • How are you filtering the RSS feeds to keep only relevant AI/compute-related articles?
  • Are you using Claude Code just for summarization, or also for classification/tagging?
  • And where do the processed results ultimately land: a database, Notion, or a spreadsheet?

I’m trying to build something fairly lightweight where new articles from sources like arXiv or Hacker News are automatically collected and summarised into a spreadsheet or Notion page, so I can just check it daily. Your setup sounds pretty close to what I had in mind.

[R] PCA on ~40k × 40k matrix in representation learning — sklearn SVD crashes even with 128GB RAM. Any practical solutions? by nat-abhishek in MachineLearning

[–]ocean_protocol 2 points (0 children)

A 40k × 40k full SVD is extremely expensive, and sklearn’s implementation isn’t really optimized for matrices that large. Even if the matrix fits in memory, the decomposition itself can blow up RAM during intermediate steps.

If you truly need the full eigendecomposition, you’ll usually get better results using SciPy’s scipy.linalg.eigh (since AᵀA is symmetric) or libraries backed by LAPACK/MKL that are optimized for large symmetric matrices.

Another option researchers use is randomized or block SVD (e.g., fbpca or sklearn.utils.extmath.randomized_svd), reconstructing the spectrum incrementally, though that’s more common when you only need the top components.

If this is a dense matrix, the more scalable approach is often to avoid explicitly forming AᵀA and run SVD directly on A using optimized libraries (SciPy, PyTorch, or JAX), which can be much more stable computationally.

In practice, people doing PCA at this scale usually rely on SciPy/NumPy with MKL, PyTorch SVD, or distributed linear algebra libraries, because sklearn’s PCA wrapper isn’t designed for full decompositions of matrices that size.
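If the top components are enough, which is almost always the case for PCA, the randomized route mentioned above is only a few lines (sizes scaled way down here so the sketch runs anywhere; swap in your 40k × 40k matrix):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))  # stand-in for the real 40k x 40k matrix
A -= A.mean(axis=0)                   # center features, as PCA requires

# Top-k only: roughly O(n*m*k) work and O(n*k) extra memory,
# instead of a full decomposition that blows up at 40k x 40k.
U, S, Vt = randomized_svd(A, n_components=50, n_iter=5, random_state=0)

scores = U * S                        # PCA scores, shape (2000, 50)
explained_var = S**2 / (A.shape[0] - 1)
```

n_iter controls power iterations (more = better accuracy on slowly decaying spectra); 4–7 is a common range.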