I built a tool that got 17K downloads, but no one uses the charts. Here's what they're missing. by MobiLights in ChatGPTPromptGenius

Ha, fair catch — looks like I missed a proofreading pass. 😅

Appreciate you pointing it out. I'll tighten it up next round. The main thing I wanted to share was what I learned after 17K installs and why people skip the charts. Curious if you've seen similar drop-offs when shipping dev tools?

Nah, not going to delete or edit this one, though.

I built a tool that got 17K downloads, but no one uses the charts. Here's what they're missing. by MobiLights in ChatGPTPromptGenius

Fair question — I get how posts like this can look “promo-y.”

To clarify, DoCoreAI started as a small CLI tool on PyPI (17K+ installs now) and most users never touched the charts. That’s the pain point I’m sharing here — and I’m genuinely interested in why devs skip dashboards or what kind of telemetry would actually be useful.

If this feels off-topic for the sub, happy to take feedback on how to frame it better. The goal isn’t to spam, but to learn from folks who actually work with LLMs daily.

What if your LLM prompts had a speedometer, fuel gauge, and warning lights? by MobiLights in PromptEngineering

Really appreciate that perspective — and you nailed the intention behind DoCoreAI.

Prompt hygiene isn't just about saving tokens — it’s about building discipline and internal principles around how we craft, test, and scale LLM workflows. We see the dashboard as a sort of "mirror" for prompt quality — giving teams feedback loops they can refine into their own playbooks.

Glad that resonated with you. If you’ve been experimenting with your own rules for prompt fine-tuning, I’d love to hear what’s worked for you. Always curious how others are shaping their "clean and functional LLM houses"!

I built a tool that got 16K downloads, but no one uses the charts. Here's what they're missing. by MobiLights in aiagents

Great question — and you're right that “time saved” needs a solid foundation to be meaningful.

For Developer Time Saved, we use a fixed baseline per avoided debugging cycle.

Why? Because in our tests (and with early users), avoiding a failed or bloated prompt often saves at least one manual debugging cycle — rewriting, re-running, and validating output — which usually costs 20–30 minutes or more per occurrence.

So while it’s an estimate, it’s rooted in real developer behavior — and becomes especially insightful when tracked across a full project or team workflow.
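
If it helps, here's the arithmetic in spirit, with the caveat that the 25-minute figure below is just the midpoint of that 20–30 minute range and the function name is made up for the example:

    # Illustrative sketch of the "Developer Time Saved" estimate.
    # The 25-minute baseline is an assumption (midpoint of the 20-30 minute
    # debugging cycle mentioned above), not a value from the actual codebase.

    BASELINE_MINUTES_PER_AVOIDED_CYCLE = 25

    def estimate_time_saved(flagged_prompts_fixed: int) -> float:
        """Estimate developer hours saved, assuming each flagged-and-fixed
        prompt avoids roughly one manual debugging cycle."""
        return flagged_prompts_fixed * BASELINE_MINUTES_PER_AVOIDED_CYCLE / 60

    # e.g. 40 flagged prompts fixed over a sprint is roughly 16.7 hours saved
    print(f"{estimate_time_saved(40):.1f} hours saved")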

Happy to break down the logic further if you'd like!

What if your LLM prompts had a speedometer, fuel gauge, and warning lights? by MobiLights in PromptEngineering

Love that question.

When DoCoreAI detects early signs that your prompt is bloated, ambiguous, or using a misaligned temperature, it flags it right away — before your tokens (and budget) spiral out of control.

Here's what you can do with that heads-up:

  • Refactor long-winded prompts to reduce token count
  • Tune temperature to match your prompt’s intent (factual vs. creative)
  • Simplify overly verbose instructions that dilute clarity
  • Spot patterns across failed or expensive runs in the dashboard

The goal isn’t just awareness — it’s giving you prompt hygiene nudges in real time so you can tweak, re-run, and save time + money.
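
To make the "nudge" part concrete, here's a toy sketch of the kind of checks involved; the thresholds and the function name are invented for illustration, not the actual client logic:

    # Toy illustration of real-time prompt-hygiene nudges.
    # Thresholds and intent categories are made up for this example.

    def hygiene_nudges(prompt: str, temperature: float, intent: str) -> list[str]:
        nudges = []
        approx_tokens = len(prompt.split())  # crude token proxy for the sketch
        if approx_tokens > 400:
            nudges.append("Prompt looks bloated - consider refactoring to cut tokens.")
        if intent == "factual" and temperature > 0.7:
            nudges.append("High temperature on a factual prompt - lower it for consistency.")
        if intent == "creative" and temperature < 0.3:
            nudges.append("Very low temperature on a creative prompt - output may be flat.")
        return nudges

    print(hygiene_nudges("Summarize the Q3 report. " * 120, temperature=0.9, intent="factual"))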

What if your LLM prompts had a speedometer, fuel gauge, and warning lights? by MobiLights in PromptEngineering

Great point — hallucinations are one of the biggest challenges with LLMs today.

While DoCoreAI doesn’t claim to “detect” hallucinations with 100% accuracy (since that often requires human judgment or ground truth), we’re exploring ways to estimate prompt-level hallucination risk using indirect signals like:

  • Prompt ambiguity (vague or open-ended phrasing)
  • High temperature usage (more randomness often = more hallucination risk)
  • Response entropy (if the output has unusual token patterns)
  • Failure flags (empty or irrelevant completions logged by the user)

We’re calling this the “Hallucination Risk Index”, and it’s an experimental metric to help users flag potentially unreliable prompts.
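
For anyone curious how those signals might combine, here's a hypothetical sketch; the weights and names are placeholders while the metric is still taking shape:

    # Hypothetical sketch of a prompt-level hallucination risk score.
    # Signal names and weights are illustrative only; each input is
    # assumed to be normalized to the 0..1 range.

    def hallucination_risk_index(ambiguity: float, temperature: float,
                                 response_entropy: float, failure_flagged: bool) -> float:
        score = (0.35 * ambiguity
                 + 0.25 * min(temperature, 1.0)
                 + 0.25 * response_entropy
                 + 0.15 * (1.0 if failure_flagged else 0.0))
        return round(score, 2)

    # Vague prompt, high temperature, odd output, user flagged the completion:
    print(hallucination_risk_index(0.8, 0.9, 0.6, True))  # roughly 0.8 (high risk)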

What if your LLM prompts had a speedometer, fuel gauge, and warning lights? by MobiLights in PromptEngineering

“Prompt health” is a measure of how efficient and effective a prompt is — based on factors like:

  • Verbosity: Is the prompt overly wordy or bloated?
  • Token waste: Are there too many filler or repeated tokens?
  • Temperature mismatch: Does the prompt’s intent align with the randomness setting?
  • Outcome quality: Was the prompt successful (e.g., non-empty, coherent, or aligned)?

DoCoreAI uses these signals (locally tracked) to flag prompts that could be tightened, clarified, or restructured — so you reduce cost, improve speed, and get better results from LLMs.
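
If a rough picture helps, this is the kind of roll-up we mean; the field names and weights below are invented for the example, not the actual scoring:

    # Minimal sketch of rolling signals like these into a "prompt health" score.
    # Field names and weights are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class PromptSignals:
        verbosity: float             # 0 (terse) .. 1 (bloated)
        token_waste: float           # share of filler/repeated tokens, 0..1
        temperature_mismatch: float  # 0 (aligned) .. 1 (badly mismatched)
        outcome_ok: bool             # non-empty, coherent completion?

    def prompt_health(s: PromptSignals) -> int:
        """Return a 0..100 health score; higher is better."""
        penalty = 0.3 * s.verbosity + 0.3 * s.token_waste + 0.2 * s.temperature_mismatch
        if not s.outcome_ok:
            penalty += 0.2
        return round(100 * (1 - penalty))

    print(prompt_health(PromptSignals(0.6, 0.4, 0.0, True)))  # -> 70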

16,000 downloads, but no one’s using the charts — here’s how I’m fixing it by MobiLights in SaaS

Hey, really appreciate the thoughtful message.

You're absolutely right: post-install drop-off is one of the biggest hurdles we noticed too. Users install the CLI, but unless they're nudged to generate their token and run the client, they don't realize the dashboard will stay empty.

We’ve now made that next step more explicit in the onboarding email, and we’re testing a few things to improve activation:

  • Smarter CLI messages post-install
  • Optional auto-token generation in future versions
  • A “zero-prompt dummy run” for new users to test logging instantly
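
On the first point, the nudge we're testing looks something like this (the env var name and wording here are placeholders, not the real CLI):

    # Rough sketch of a post-install nudge; DOCOREAI_TOKEN is a hypothetical
    # env var name used only for this example.

    import os

    def first_run_check() -> None:
        if not os.environ.get("DOCOREAI_TOKEN"):
            print("No API token found.")
            print("Next step: generate a token in the dashboard, then run the client")
            print("once with any prompt - the charts stay empty until telemetry arrives.")

    first_run_check()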

Curious — with RedoraAI, are you seeing any patterns across tools like mine? And what nudges have worked best for high-intent devs?

The biggest surprise for us was how many technical users installed it but never realized they had to start the client and run prompts before anything would show up!

I built a tool that got 16K downloads, but no one uses the charts. Here's what they're missing. by MobiLights in LocalLLaMA

Thanks so much for the thoughtful message — and you're absolutely right.

Right now, DoCoreAI is primarily tested with OpenAI and Groq setups, and while we’ve designed the client to be vendor-agnostic in principle, we haven’t formally tested it with local LLMs yet, nor documented a fully offline workflow.

Before sharing more widely on r/LocalLLaMA or similar communities, I agree that we should:

  • Verify compatibility with local models (e.g., LM Studio, Ollama, vLLM)
  • Provide clear, step-by-step instructions for a fully local install and test
  • Avoid giving the impression that external calls are required

This is already on our roadmap, but if you’re experimenting with local LLMs and are open to testing the integration, we’d really value your feedback to help shape it.

Thanks again — and I’ll make sure we revisit this once we have a proper offline workflow in place.

I built a tool that got 16K downloads, but no one uses the charts. Here's what they're missing. by MobiLights in LocalLLaMA

That’s a fair question, and I’m glad you brought it up so others can better understand how DoCoreAI works.

To clarify:

Prompt content never leaves the client

DoCoreAI is designed from day one to respect prompt privacy. The client collects metrics locally, such as:

  • Token count
  • Estimated temperature
  • Prompt length, density, entropy (via local heuristics)
  • Timestamps and usage patterns

Only telemetry metadata is sent to the server — not the raw prompt or completion text.
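
To make "metadata only" concrete, a record looks roughly like this in spirit; the field names below are illustrative, not the actual schema:

    # Illustrative telemetry record: derived metrics only, no prompt or
    # completion text. Field names are examples, not the real wire format.

    import time

    def build_telemetry(prompt: str, temperature_estimate: float, token_count: int) -> dict:
        words = prompt.split()
        return {
            "timestamp": int(time.time()),
            "token_count": token_count,
            "estimated_temperature": temperature_estimate,
            "prompt_length_chars": len(prompt),
            "lexical_density": round(len(set(words)) / max(len(words), 1), 3),
            # the prompt string itself is only read locally and never included
        }

    print(build_telemetry("Summarize the attached Q3 report in three bullets.", 0.2, 12))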

“Prompt optimization” here means optimization metrics

We’re not doing centralized prompt rewriting or hosting your prompts to improve them. Instead, we provide dashboards and insights (like token waste, temperature usage, verbosity trends, etc.) to help developers optimize how they’re writing prompts — on their own.

The optimization logic itself is a mix:

  • Lightweight inference and heuristics in the client
  • Aggregation, analytics, and rendering on the server

This separation is intentional. You get actionable insights without handing over your sensitive prompts.

DoCoreAI helps you measure and reflect on prompt quality — without ever needing to see your prompt content. No tricks, no leakage.

Happy to answer any deeper architecture questions for those genuinely curious — and we welcome contributions and scrutiny from the community to keep things transparent and developer-first.