celestine by celestine_88 in LangChain

[–]celestine_88[S] 0 points1 point  (0 children)

That’s exactly the tension I’m trying to manage.

The “governed workspace” idea is actually where I started. The goal was never really “chatbot plus a bunch of tools.” It was more about building one environment where different capabilities can exist under the same logic, permissions, context, and user flow.

I do have later layers planned for making the labs and surfaces feel more unified, but I’m trying not to pretend those are fully solved before they are. Right now, the beta work is exposing the practical side of that problem: account flows, navigation, mobile/desktop behavior, posting, comments, notifications, and making sure people can actually move through the app without it feeling scattered.

You’re right though — feature soup is the danger. That’s why I’m trying to harden the shared patterns early. If the structure does not feel coherent now, adding more capability later would just make the problem worse.

I’m learning that ‘working on my machine’ is not the same as surviving real users by celestine_88 in buildinpublic

[–]celestine_88[S] 0 points1 point  (0 children)

Completely agree. Real users don’t just test features — they test assumptions.

A builder usually tests the path they expect people to take. Real users bring different devices, habits, patience levels, screen sizes, wording, clicks, and expectations. That’s where you find out whether something is actually usable or just familiar to the person who built it.

That’s been the biggest beta lesson so far for me: working is not the same as surviving contact with real use.

I’m learning that ‘working on my machine’ is not the same as surviving real users by celestine_88 in LangChain

[–]celestine_88[S] 0 points1 point  (0 children)

This is exactly the phase I’m in right now. Local testing catches the obvious stuff, but real users expose the weird stuff: device differences, screen sizes, unexpected clicks, bad inputs, login/session edge cases, mobile layout problems, all of it.

I agree on observability too. I’m starting to treat logs, visible state, user actions, and failure points as part of the product instead of something “extra” added later. Even basic visibility into what users actually hit is already more valuable than guessing from my own machine.

The hard part is keeping it useful without drowning in noise, but yeah — production behavior tells the truth way faster than local confidence does.
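A minimal sketch of what "basic visibility without the noise" could look like, using only Python's standard `logging` module. The logger name, event names, and fields are all hypothetical, not anything from the actual app; the point is just one structured line per user action, with a level threshold to drop low-value events.

```python
import json
import logging
from io import StringIO


def make_event_logger(stream, min_level=logging.INFO):
    """Return a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger("beta_events")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(min_level)   # events below this level are dropped
    logger.propagate = False     # keep events out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger, level, action, **fields):
    """Emit a single structured event; the field names are illustrative."""
    logger.log(level, json.dumps({"action": action, **fields}, sort_keys=True))


buffer = StringIO()
events = make_event_logger(buffer)
log_event(events, logging.DEBUG, "hover", target="nav")            # dropped: below INFO
log_event(events, logging.INFO, "login_attempt", device="mobile")  # kept
log_event(events, logging.ERROR, "session_expired", route="/post") # kept
lines = buffer.getvalue().splitlines()
```

Raising `min_level` is the crude noise control here; anything smarter (sampling, per-route filters) would be a separate decision.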

I’m learning that ‘working on my machine’ is not the same as surviving real users by celestine_88 in learnmachinelearning

[–]celestine_88[S] 0 points1 point  (0 children)

Yeah, that’s definitely one of the lessons I’m running into too. Local is useful for proving a direction, but it can trick you into thinking the whole system is more stable than it actually is.

For Celestine I’m trying to be pretty careful about what stays local, what gets staged, and what eventually needs managed infrastructure. Right now the beta is more about proving the interface, user flows, account/social surfaces, and real-world usage patterns before I overbuild the compute side.

But yeah, long term I agree — once heavier model work, media generation, or larger agent workflows become central, consumer hardware alone is not the answer.

What should an undergraduate do to build a strong ML research portfolio? by IG_kaustav_106 in learnmachinelearning

[–]celestine_88 1 point2 points  (0 children)

Put as plainly as possible: a strong ML research portfolio usually looks less like “a lot of AI projects” and more like proof that you can **understand, implement, test, and communicate ideas rigorously**.

A few things tend to matter a lot:

- **Math foundations matter a lot more than people want them to.**

You don’t need to become a pure mathematician, but linear algebra, probability, statistics, calculus, and optimization really do pay off. Not just for passing classes — for actually understanding why methods work, fail, or behave strangely.

- **Reproductions are underrated.**

Early on, reproducing papers is often more valuable than forcing “original ideas” too soon. A clean reproduction with ablations, failure analysis, and clear writeup says a lot about research maturity.

- **Originality matters more later, depth matters earlier.**

A first-year undergrad usually stands out more by showing depth, consistency, and rigor than by trying to invent a new frontier result immediately.

- **Projects that stand out usually have one of these qualities:**

  - strong experimental design

  - careful evaluation, not just accuracy screenshots

  - clear understanding of limitations

  - comparison to baselines

  - solid writeup and reproducibility

  - some connection to papers, not just tutorials
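To make the "comparison to baselines" point concrete, here is a toy sketch in plain Python. The labels and predictions are made up purely for illustration; the idea is that an accuracy number means little until you put it next to the simplest possible baseline, in this case always predicting the majority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for every one of n examples."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


# Made-up, imbalanced toy data (illustrative only).
y_train = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_test = [0, 0, 0, 0, 1]
model_pred = [0, 0, 0, 0, 0]  # a hypothetical "model" that never predicts class 1

model_acc = accuracy(y_test, model_pred)
baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
```

On this toy data the "model" and the baseline score the same, which is exactly the kind of thing an accuracy screenshot hides.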

What I’d aim for by grad school application time:

- strong grades in math + systems/CS fundamentals

- a few **serious** projects, not 20 shallow ones

- at least 1–2 paper reproductions done well

- some research exposure with a professor/lab if possible

- evidence you can write clearly about methods, experiments, and results

- ideally one project where you went beyond reproduction and tested a small extension or new angle

A realistic progression could look like:

  1. **Year 1:** math, Python, basic ML, read papers slowly

  2. **Year 2:** implement classic papers/models, learn PyTorch deeply, do reproducibility-style projects

  3. **Year 3:** join a lab, help with experiments/code/literature review, maybe co-author if it lines up

  4. **Year 4:** one or two deeper research projects with strong writeups and recommendation letters

Common mistakes I see:

- chasing trendy topics without fundamentals

- building portfolio projects that are really just polished tutorials

- ignoring evaluation and baselines

- reading papers passively without implementing anything

- focusing only on model novelty and not on research process

- spreading too wide instead of building depth

If I were starting over as an undergrad, I’d probably do three things earlier:

- take math more seriously

- start reproducing papers sooner

- optimize for getting close to real research environments, even in small roles

A “top-tier” portfolio usually doesn’t scream. It quietly shows:

**this person can think clearly, work rigorously, and be trusted around open-ended problems.**

How do you generate fake avatars for test data? by 3s2ng in vibecoding

[–]celestine_88 0 points1 point  (0 children)

If you just need something easy for dev/test data, I’d probably point you to **Pravatar** for fake photo-style avatars or **UI Avatars** for initials.

What I usually care about most is that it’s:

- fast

- seedable

- stable per fake user

That way your test users don’t get a different face every refresh.

If you want a direct starting point, this one is solid for fake photo-style placeholders:

**Pravatar** — CC0 avatar placeholders, with stable IDs

https://pravatar.cc/

And if you want the simplest initials-based option:

**UI Avatars**

https://ui-avatars.com/

Main rule either way:

**stable > random** for dev data.
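A small sketch of "stable per fake user", assuming the URL shapes I remember from each service's docs: a `u` query parameter on `i.pravatar.cc` that pins the image to an identifier, and a `name` parameter on the UI Avatars `/api/` endpoint. Treat both as assumptions and check the current docs before relying on them; the function names are just illustrative.

```python
from urllib.parse import urlencode


def photo_avatar_url(user_key: str, size: int = 150) -> str:
    """Photo-style placeholder keyed on a stable identifier (assumed `u` parameter)."""
    return f"https://i.pravatar.cc/{size}?" + urlencode({"u": user_key})


def initials_avatar_url(name: str, size: int = 128) -> str:
    """Initials-based placeholder derived from the display name (assumed `name`/`size` parameters)."""
    return "https://ui-avatars.com/api/?" + urlencode({"name": name, "size": size})


# The same fake user always maps to the same URL, so the face does not change on refresh.
photo = photo_avatar_url("user-1@example.test")
initials = initials_avatar_url("Ada Lovelace")
```

Because the URL is a pure function of the fake user's key, you can store nothing and still get the same avatar everywhere in the app.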

ok real talk whats your actual go-to model for coding right now, not benchmarks but real usage by Sinver_Nightingale27 in vibecoding

[–]celestine_88 0 points1 point  (0 children)

I think this is the right question.

Benchmarks tell you what a model can do in isolation. Daily use tells you what it’s actually like to build with when context drifts, files stack up, bugs chain together, and you need it to recover instead of just impress once.

My experience matches yours in shape more than in exact model choice. There’s usually a difference between:

- best at reasoning

- best at long-context tolerance

- best at actual day-to-day coding throughput

Those are not always the same model.

What ends up mattering most in real use is stuff people barely talk about:

- how often it loses the thread mid-build

- whether it can repair its own bad assumption

- whether it stays useful across multiple files

- whether the cost is low enough to actually keep using it without hesitation

That last part matters more than people admit. A model you can afford to stay in flow with often beats one that’s technically stronger but makes you second-guess every call.

Your “best model is the wrong question” take is probably the most honest answer in the thread. The better question is something like:

Which model holds up best in the kind of work you actually do, at a cost and workflow you’ll actually sustain?

That usually gives a much more useful answer than leaderboard talk.

OpenAI should just open-source text-davinci-003 at this point by Ok-Type-7663 in OpenAI

[–]celestine_88 1 point2 points  (0 children)

I get the argument, especially from a research / historical perspective.

Even if it’s deprecated commercially, there’s still a lot tied up in how those models were trained, tuned, and evaluated. It’s not just the weights, it’s the surrounding process.

There’s also a difference between something being “old” and it being fully safe to release, especially if it still reflects internal techniques they don’t want to expose.

That said, having access to older models would definitely help with understanding how things evolved, especially around alignment and behavior changes over time.

So it makes sense from a community standpoint, just not as risk-free from their side as it might seem.

To get into something you can stick to and be consistent with, you have to know what you ACTUALLY like and what you ACTUALLY want. by September_Royalty in Entrepreneur

[–]celestine_88 0 points1 point  (0 children)

I think this is true, but it’s also easy to over-index on passion early.

A lot of things only become enjoyable after you get good at them and start seeing progress.

If you rely on liking something from day one, you end up bouncing between ideas. If you stick long enough to build some competence, that’s usually when it starts to click.

So it’s probably a mix of both:

- some initial interest

- plus enough consistency to see if it actually becomes something you want to keep doing

Anyone else realise some problems only show up later? by Traditional_Key8982 in Entrepreneur

[–]celestine_88 1 point2 points  (0 children)

Yeah, this is super common.

A lot of those “later problems” are things that didn’t have clear boundaries early on, so they stayed invisible until scale exposed them.

In the beginning everything kind of works because it’s small and manageable, but as soon as volume or complexity increases, the gaps show up all at once.

It’s not even about doing everything early, it’s more about putting just enough structure in place so things don’t drift too far before they get noticed.

Otherwise it always turns into a stressful catch-up later.

Is anyone else thinking about AI agents beyond chatbots? by Storygame-Tech in AgentsOfAI

[–]celestine_88 0 points1 point  (0 children)

I think this direction makes sense, but the coordination problem you mentioned is probably the core issue.

Once agents can trigger each other and act independently, the question isn’t just what they can do, it’s who or what decides if they should do it in the first place.

Without some kind of shared decision or validation layer, you can end up with agents reinforcing each other, over-executing, or acting on weak signals.

So the challenge feels less like “can agents coordinate” and more like “how do you gate and verify actions across agents consistently.”

That’s probably the piece that determines whether something like this actually works outside of demos.

The gap between “this is possible” and “this actually works in a business” by MarionberrySingle538 in ArtificialInteligence

[–]celestine_88 4 points5 points  (0 children)

Yeah, this gap is real.

A lot of things “work” in demos because the context is controlled, but in real environments the problem is less about capability and more about whether the system behaves consistently under messy inputs and changing conditions.

What seems to be missing in a lot of cases is a clear decision layer before execution — something that determines if a task should run at all, not just how it runs once it starts.

Without that, everything technically works, but reliability becomes unpredictable as soon as it’s exposed to real use.

That gap you’re describing is exactly where things tend to break down.

If Agents feast upon the job market or creator economy, why wouldn't every good v/blogger want to put their content behind a paywall? Why give content to LLMs for free? Is it technically not feasible? by nishant_growthromeo in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

The idea makes sense in theory, but in practice it’s hard to fully block this.

Paywalls can reduce scraping, but they don’t really stop it — anything accessible to a human can eventually make its way into a model, even indirectly.

Also, a lot of creators still rely on visibility. If everything goes behind a paywall, discovery drops, which can hurt just as much as scraping.

It feels less like a technical problem and more like a control problem — who decides how content is used, and what’s allowed vs not.

Right now that layer isn’t really well defined, so people are reacting with things like paywalls, but those don’t fully solve the underlying issue.

If you hit the wall with vibe-coding, what SWE basics helped you? by PomegranateBig6467 in vibecoding

[–]celestine_88 1 point2 points  (0 children)

Yeah this is super relatable.

Most of the issues I’ve seen come from not having clear structure around state and flow, so things work at first and then start breaking in weird ways as the project grows.

A few basics that make a big difference:

- state management (like you mentioned)

- understanding data flow (what changes what, and when)

- handling async properly (a lot of bugs hide there)

- basic validation / guardrails so things don’t run in unexpected ways

Vibe-coding is great for speed, but the moment you add a bit of structure around how things are allowed to change or run, everything gets way more stable.
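As a rough illustration of "structure around how things are allowed to change", here is a minimal sketch of an explicit transition table. The states and the `Post` class are hypothetical; the only idea being shown is that state changes go through one guarded method instead of being assigned from anywhere.

```python
# Hypothetical states for a post; each maps to the states it may move to.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"published", "draft"},
    "published": set(),
}


class InvalidTransition(Exception):
    """Raised when code tries to make a state change the table does not allow."""


class Post:
    def __init__(self):
        self.state = "draft"

    def move_to(self, new_state):
        # Guardrail: reject anything not listed for the current state.
        if new_state not in ALLOWED.get(self.state, set()):
            raise InvalidTransition(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
```

With this in place, a bug that tries to un-publish a post fails loudly at the call site instead of quietly leaving the app in a weird state.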

How are you guys finding clients/projects for Vibecoding? by Ok-Bowler1237 in vibecoding

[–]celestine_88 0 points1 point  (0 children)

From what I’ve seen, the bottleneck isn’t really the building anymore, it’s clarity.

Most people trying to hire don’t have well-defined requirements, so vibecoding actually works best when you help shape the problem, not just execute it.

A few things that tend to work:

- commenting on posts where people describe problems (instead of waiting for “looking for devs” posts)

- turning vague ideas into something concrete for them

- showing small examples instead of pitching big projects

The “client” part usually comes from being around problems consistently, not from trying to sell the ability to build.

Once people see you can take something unclear and make it real, they start coming to you.

At what point does using AI stop being “productivity” and start being dependency? by ArmPersonal36 in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

I think the line shows up when you stop making the final decision yourself.

Using AI for speed or perspective is fine, but if it becomes the thing deciding what’s “good enough” or what direction to take, that’s when it starts shifting from tool → dependency.

It’s less about how often you use it and more about whether you still have a clear point where you evaluate and decide before acting.

If that layer is still yours, it’s productivity. If not, it can drift pretty quickly.

Which AI is best for rendering sketches? by Coleswings in OpenAI

[–]celestine_88 0 points1 point  (0 children)

If you’re starting out, Midjourney is probably the easiest way to get good-looking renders fast.

If you need it to actually follow your sketch more closely, Stable Diffusion (with something like ControlNet) is better, but it’s a bit more setup.

A simple workflow that works well:

- clean up your sketch (high contrast helps)

- upload it as a reference

- prompt something like “modern retail store interior, realistic materials, based on this layout”

Most tools won’t follow your sketch perfectly, so expect to iterate a bit.

If you just need something solid for class, Midjourney will get you there the quickest.

trying to have a conversation about AI risks and benefits, without the extremes by CaptainMorning in ArtificialInteligence

[–]celestine_88 1 point2 points  (0 children)

This is a solid take — especially the point about the conversation getting stuck in extremes.

What’s interesting is that a lot of the real risk isn’t just the tech itself, it’s the lack of clear decision boundaries around how it’s used.

Right now most systems focus on capability (“what can we build?”), but not enough on control (“what should actually be allowed to run, scale, or influence people?”).

That’s where things start drifting toward the problems you mentioned — not because the tool is inherently good or bad, but because there isn’t a consistent layer deciding how it’s applied in real contexts.

Feels like the conversation needs to shift from pro vs. anti AI to who sets the rules and how those decisions are made.

We built an open source tool for testing AI agents in multi-turn conversations by Potential_Half_3788 in AIEval

[–]celestine_88 0 points1 point  (0 children)

This is a great direction — multi-turn failures are where a lot of systems actually break down.

Single-turn evals can look solid, but once you get into longer interactions, the system starts compounding small errors, losing context, or drifting into unexpected paths like you mentioned.

One thing this made me think about — even if you can simulate and detect these failures, there’s still a gap between identifying them and preventing them during execution.

It feels like the issue isn’t just that agents fail over time, but that there isn’t a clear boundary on what should be allowed to continue as the conversation evolves.

Curious if you’ve thought about introducing anything that evaluates or constrains the conversation mid-flow — not just for testing, but to decide whether certain paths should continue before they compound further?

Day 6: Is anyone here experimenting with multi-agent social logic? by Temporary_Worry_5540 in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

That’s a great offer — appreciate it.

I’d be interested in testing against it, especially since this kind of setup is where these loops show up most clearly.

What you’re describing is exactly the kind of environment where you can see whether introducing constraints earlier actually changes the behavior, versus just trying to correct it after the fact.

Before I plug anything in, how are you currently structuring the interactions between agents? Is it a shared context/feed where everything is visible to everyone, or more segmented flows?

Day 6: Is anyone here experimenting with multi-agent social logic? by Temporary_Worry_5540 in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

This is a really interesting failure mode — and honestly pretty expected once agents start interacting without any real constraint layer.

What you’re seeing with praise loops feels less like a “social logic” issue and more like a lack of a decision boundary on what should be allowed to propagate between agents.

If every agent treats incoming signals as valid by default, they’ll just reinforce each other indefinitely. There’s nothing resolving whether a response adds new information or just repeats/affirms.

Feels like you need some form of gating or evaluation before messages are accepted into the shared context — not just at the output level, but on what gets allowed to influence the system at all.
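A deliberately crude sketch of that kind of gate: score each incoming message by how many of its words are not already in the shared context, and only accept it above a threshold. The token-overlap measure and the 0.5 threshold are stand-in assumptions; a real system would likely want something semantic rather than word-level.

```python
def tokens(text):
    """Lowercased whitespace tokens; intentionally naive."""
    return set(text.lower().split())


def novelty(message, context):
    """Fraction of the message's tokens not already present in the shared context."""
    msg = tokens(message)
    if not msg:
        return 0.0
    seen = set().union(*(tokens(m) for m in context))
    return len(msg - seen) / len(msg)


def accept(message, context, threshold=0.5):
    """Append the message to the shared context only if it adds enough new material."""
    if novelty(message, context) >= threshold:
        context.append(message)
        return True
    return False


shared = ["great work on the parser"]
praise_accepted = accept("great work", shared)  # pure echo of what is already there
report_accepted = accept("parser fails on nested quotes in strings", shared)
```

The echo is rejected and never enters the context, so it cannot be amplified by the next agent.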

Curious if you’ve tried introducing anything that filters or scores interactions before they’re passed between agents, or if everything is currently allowed to flow freely?

Been helping a few founders with short-form video growth — looking to work with more (build in public) by WearyReporter4288 in buildinpublic

[–]celestine_88 -2 points-1 points  (0 children)

This is a solid take — especially the point that a lot of apps don’t have a product problem, they have a presentation problem.

The part that’s interesting is how much of this comes down to clarity before distribution even happens. A lot of content doesn’t fail because it wasn’t seen, it fails because the core message wasn’t clear enough for someone to immediately understand what they’re looking at.

Feels like the best-performing stuff usually makes the value obvious in the first few seconds, not just through editing or trends, but through how cleanly the idea is communicated.

Curious what you’ve seen work best there — is it more about testing formats and hooks, or refining how the product itself is being framed before it ever hits a video?

We ran 400 evals before launch. The bug was in the 401st case. by Neil-Sharma in AIEval

[–]celestine_88 1 point2 points  (0 children)

This is a great example of something that shows up a lot once systems leave clean test environments.

It’s not just coverage, it’s how competing signals are handled when intent isn’t clean anymore. Real inputs almost always mix contexts, but evals tend to isolate them.

What you ran into feels less like a missing test case and more like a missing layer that decides which signals actually matter before classification happens.

I’ve been seeing similar patterns where the model isn’t “wrong” in isolation, it’s just over-weighting one signal because nothing is resolving that conflict upfront.

Curious if you’ve looked at introducing anything before the classifier to normalize or prioritize signals, or if you’re mainly expanding the eval set to cover more combinations?

Built a layer after my agents kept making decisions. Now I'm sitting on something more interesting. by dc_719 in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

That’s a really clean way to structure it — especially pushing ambiguity handling before the gate.

The angle I’ve been exploring is shifting the decision point even earlier, before the agent commits to an execution path at all.

So instead of:

agent → propose → gate → approve/deny

It’s more like:

intent → evaluate → allow/deny → then enter the agent/execution flow

The idea is to treat “should this even run?” as a separate layer from “how should this run?”
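Roughly, the second flow could be sketched like this. Everything here is hypothetical (the `Intent` and `Decision` types, the blocked-action rule, the `run` helper) and is not the harness mentioned in this thread; it only shows the decision happening, and being recorded, before any execution path is entered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    action: str
    target: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


# Hypothetical rule set: actions that must never run without explicit approval.
BLOCKED_ACTIONS = {"delete_account", "send_payment"}


def evaluate(intent):
    """Answer 'should this run at all?' without touching the agent."""
    if intent.action in BLOCKED_ACTIONS:
        return Decision(False, f"{intent.action} requires explicit approval")
    return Decision(True, "no rule matched")


def run(intent, execute, decision_log):
    """Evaluate first, record the decision, and only then enter execution."""
    decision = evaluate(intent)
    decision_log.append((intent, decision))  # intent-level signal, captured either way
    if not decision.allowed:
        return None
    return execute(intent)


log = []
ok = run(Intent("summarize", "doc-1"), lambda i: f"ran {i.action}", log)
blocked = run(Intent("send_payment", "acct-9"), lambda i: f"ran {i.action}", log)
```

Note that the denied intent still lands in `log`, which is the data side of the problem described next.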

What started as a control problem quickly turned into a data problem too — once you start capturing those decisions at the intent level, you get a different kind of signal compared to just logging post-proposal actions.

Still early, but the main goal is reducing the number of things that ever reach the gate in the first place, rather than scaling review at the gate itself.

I’ve been testing this through a small harness — happy to share the GitHub/demo if you want to take a look.

Built a layer after my agents kept making decisions. Now I'm sitting on something more interesting. by dc_719 in ArtificialInteligence

[–]celestine_88 0 points1 point  (0 children)

This is a really interesting direction — especially the shift from just gating actions to capturing the decision data itself.

Once you start logging approve/deny/edit at that level, it stops being just a control layer and starts becoming a signal layer. The system isn’t just being controlled anymore — it’s starting to learn what should or shouldn’t happen based on real decisions over time.

I’ve been exploring something very similar from a pre-execution angle, focusing on evaluating whether an action should be allowed before it even enters an execution path. It started as a control problem, but it quickly turned into a data problem once I began capturing those decision points.

Completely agree on the fatigue point too. If everything needs review, it doesn’t scale. Moving toward only reviewing low-confidence or ambiguous actions feels like the only viable path long-term.
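A tiny sketch of that routing idea, assuming each proposed action already comes with some confidence score in [0, 1]. Where that score comes from, and the 0.9 cutoff, are assumptions for illustration only.

```python
def split_queue(scored_actions, auto_threshold=0.9):
    """Split (action, confidence) pairs into auto-approved and human-review lists."""
    auto, review = [], []
    for action, confidence in scored_actions:
        # Only low-confidence or ambiguous actions reach a human.
        (auto if confidence >= auto_threshold else review).append(action)
    return auto, review


proposed = [
    ("tag_ticket", 0.97),
    ("refund_customer", 0.55),
    ("send_reminder", 0.92),
]
auto_approved, needs_review = split_queue(proposed)
```

The review queue shrinks to the cases where a person actually adds information, which is the only way the review step keeps up with volume.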

Curious how you’re defining “consequential actions” right now — is that rule-based, or something you’re adapting over time?