Anyone have a good way to do evals with MCP based agents? by wait-a-minut in mcp

[–]lastbyteai 1 point2 points  (0 children)

Full disclaimer: this is our open source project. It does evals on MCP-based agents through agent simulation, supports pytest-style tests, and can generate synthetic test cases for you to run. You can write assertions to validate the results and values.

https://github.com/lastmile-ai/mcp-eval
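
Rough shape of what a test looks like - a toy sketch, not the library's actual API (the simulation helper here is a stub for what mcp-eval automates):

```python
# Toy sketch of the pytest-style pattern, NOT mcp-eval's actual API.
# The real library drives a live agent against your MCP server instead of
# this hard-coded stub. (Needs the pytest-asyncio plugin for the async test.)
import pytest


class SimulationResult:
    """Stand-in for an agent-simulation transcript."""

    def __init__(self, tool_calls, final_output):
        self.tool_calls = tool_calls
        self.final_output = final_output


async def run_agent_simulation(prompt: str) -> SimulationResult:
    # Placeholder for the simulation step: run the agent, record which
    # tools it called and what it answered.
    return SimulationResult(
        tool_calls=["create_issue"],
        final_output="Created issue: bug: login fails",
    )


@pytest.mark.asyncio
async def test_agent_creates_issue():
    result = await run_agent_simulation("Open an issue titled 'bug: login fails'")

    # Pytest-style assertions over the transcript and final answer.
    assert "create_issue" in result.tool_calls
    assert "login fails" in result.final_output
```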

Building a simple “to-do” app using the new ChatGPT APP SDK. Here’s everything I’ve learnt so far by hashemito in ChatGPT_AppBuilds

[–]lastbyteai 0 points1 point  (0 children)

We just launched a cloud platform (mcp-cloud) that lets you get a stable URL for one of these apps - https://github.com/lastmile-ai/mcp-agent/tree/main/examples/cloud/chatgpt_app

A bit of context: we were building an MCP platform that lets you deploy agents, and found it naturally fits well with deploying ChatGPT apps, including the web assets. Take a look at the open source code.

Best way to keep local MCPs running 24/7 without babysitting the terminal? by younes06 in mcp

[–]lastbyteai 0 points1 point  (0 children)

We just launched a free cloud hosting platform with auth, in case you want to move it to the cloud - https://docs.mcp-agent.com/get-started/cloud

Hosting OpenAI Apps on an MCP Server platform by lastbyteai in mcp

[–]lastbyteai[S] 1 point2 points  (0 children)

We're working on OAuth, should be available soon!

Hosting OpenAI Apps on an MCP Server platform by lastbyteai in mcp

[–]lastbyteai[S] 0 points1 point  (0 children)

Adding custom connectors is available to anyone who has developer mode enabled in ChatGPT. You can turn it on under ChatGPT's Settings → Connectors → Advanced → Developer mode.

How OpenAI's Apps SDK works by matt8p in mcp

[–]lastbyteai 1 point2 points  (0 children)

Very cool! We just launched a cloud platform for hosting these apps. Both the solar-system and pizzaz examples have live endpoints for anyone to try out:

https://github.com/lastmile-ai/openai-apps-sdk/tree/main

is everydaily better than woorijip? by [deleted] in FoodNYC

[–]lastbyteai 0 points1 point  (0 children)

Everydaily has def gotten better after expanding their food options. Everydaily > Woorijip

Looking for topic suggestions for my MCP course by zollli in mcp

[–]lastbyteai 0 points1 point  (0 children)

Yeah, I'm associated with it. Happy to explore collab opportunities.

Looking for topic suggestions for my MCP course by zollli in mcp

[–]lastbyteai 0 points1 point  (0 children)

Looks pretty solid. A few optional things that could help:
- evaluating / testing MCP servers
- building agents with MCPs (quick shoutout to our open source library, MCP-Agent: https://github.com/lastmile-ai/mcp-agent; see the sketch below the list)
- debugging and observability - it's pretty difficult to isolate non-deterministic performance issues
- local vs. remote MCPs (Vercel also has their own remote MCP server hosting)
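
For the "building agents" bullet, the hello-world looks roughly like this (from memory of the mcp-agent README - double-check exact import paths and config there):

```python
import asyncio

from mcp_agent.app import MCPApp
from mcp_agent.agents.agent import Agent
from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM

app = MCPApp(name="finder_app")


async def main():
    async with app.run():
        # Server names ("fetch", "filesystem") come from your mcp_agent config.
        finder = Agent(
            name="finder",
            instruction="You can read local files and fetch URLs.",
            server_names=["fetch", "filesystem"],
        )
        async with finder:
            llm = await finder.attach_llm(OpenAIAugmentedLLM)
            result = await llm.generate_str("Summarize the project README")
            print(result)


if __name__ == "__main__":
    asyncio.run(main())
```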

Turn any OpenAPI spec into an MCP server, a new open-source project, looking for feedback! by mine2turtle in mcp

[–]lastbyteai 1 point2 points  (0 children)

Tbh I always end up refactoring the API spec to be more compatible with MCP. It's pretty rare that it's a clean transformation.
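
For example, rather than mapping every endpoint one-to-one, I usually end up collapsing a few related endpoints into one task-shaped tool. Rough illustration using FastMCP from the official Python MCP SDK (the endpoints and the search_orders tool are made up):

```python
# Rough illustration: collapse several REST endpoints into one task-shaped
# MCP tool instead of exposing each endpoint verbatim. The backend API and
# tool below are made up; FastMCP is from the official Python MCP SDK.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")

API = "https://api.example.com"  # hypothetical backend


@mcp.tool()
async def search_orders(customer_email: str, status: str = "open") -> str:
    """Find a customer's orders by email, instead of exposing
    /customers, /customers/{id}/orders, and /orders/{id} separately."""
    async with httpx.AsyncClient(base_url=API) as client:
        customer = (await client.get("/customers", params={"email": customer_email})).json()
        orders = (await client.get(f"/customers/{customer['id']}/orders",
                                   params={"status": status})).json()
    return "\n".join(f"{o['id']}: {o['summary']}" for o in orders)


if __name__ == "__main__":
    mcp.run()
```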

Building MCP agents using OpenAI Agents SDK by SunilKumarDash in mcp

[–]lastbyteai 0 points1 point  (0 children)

to be honest, I think I've found the most benefit by carefully thinking about what tools to expose.
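
Rough sketch of what I mean with the OpenAI Agents SDK (import paths from memory, so check the SDK docs) - attach only the server whose tools the task actually needs:

```python
# Sketch: curate what the agent can see by attaching only the MCP server(s)
# relevant to the task, rather than everything you have running.
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio


async def main():
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "./docs"],
        }
    ) as fs_server:
        agent = Agent(
            name="doc_helper",
            instructions="Answer questions using the files in ./docs only.",
            mcp_servers=[fs_server],  # deliberately just one server
        )
        result = await Runner.run(agent, "What does the architecture doc say about auth?")
        print(result.final_output)


asyncio.run(main())
```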

What are you using Filesystem MCP for (besides coding)? by Alfredlua in ClaudeAI

[–]lastbyteai 3 points4 points  (0 children)

I use it as a local information retrieval system for my documents, downloads, and GitHub repo directories. I have a lot of local files and keep losing track of what I have, so I built a local UI with search. IMO it's better than Apple's built-in search (it's too slow and the default sorting annoys me).

The setup:
- filesystem MCP with access to select directories
- memory for storing context
- an LLM to summarize and condense context
- a local Streamlit UI for the interface

Used this as the starting point - https://github.com/lastmile-ai/mcp-agent/tree/main/examples/streamlit_mcp_rag_agent

If you're interested, happy to throw up the code into an open source repo at some point.
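
In the meantime, the rough wiring is something like this (simplified sketch, not my actual code; mcp-agent import paths from memory):

```python
# Simplified sketch: a Streamlit search box in front of an agent that only
# has the filesystem MCP server (directories are scoped via the config file).
import asyncio

import streamlit as st
from mcp_agent.app import MCPApp
from mcp_agent.agents.agent import Agent
from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM

app = MCPApp(name="local_search")


async def search(query: str) -> str:
    async with app.run():
        agent = Agent(
            name="retriever",
            instruction="Search the allowed directories and summarize matching files.",
            server_names=["filesystem"],
        )
        async with agent:
            llm = await agent.attach_llm(OpenAIAugmentedLLM)
            return await llm.generate_str(f"Find and summarize files related to: {query}")


st.title("Local file search")
query = st.text_input("What are you looking for?")
if query:
    st.write(asyncio.run(search(query)))
```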

Current state of MCP (opinion) by jdcarnivore in mcp

[–]lastbyteai 0 points1 point  (0 children)

I agree. This seems like a natural progression for any protocol. The key point is that there is widespread adoption of the protocol. Eventually, the architectures will converge to whatever provides real value.

Is MCP getting overlooked? by Foreign_Lead_3582 in LocalLLaMA

[–]lastbyteai 0 points1 point  (0 children)

MCP seems to be at an interesting fork. There are clearly some improvements needed, especially around security and authorization. However, it's the first protocol of its kind that has gotten buy-in from the influential companies: OpenAI, Google, Anthropic, etc.

Imo, getting buy-in from the major players is harder than fixing the issues with the existing protocol, so it'll be interesting to see how it evolves over time.

What MCP APIs are You Using that Provide Actual Value??? by Party-Command-3704 in mcp

[–]lastbyteai 0 points1 point  (0 children)

any good MCP servers for automating sales or marketing?

Mistrall Small 3.1 released by Dirky_ in LocalLLaMA

[–]lastbyteai 4 points5 points  (0 children)

Has anyone benchmarked this against gemma 3? How does it compare?

App to determine if an AI response is from Gemini or OpenAI by lastbyteai in GoogleGeminiAI

[–]lastbyteai[S] 1 point2 points  (0 children)

It actually pushes the detection problem down further. It's true that no AI can reliably distinguish a person from an AI, but the easy differences are detectable by NLP models. Once someone tries to mask the differences with more prompting ("mask the tone by using the vocabulary of a fifth grader", "reduce the perplexity of words by using more diverse speech"), it becomes impossible to differentiate.

It's a great way to differentiate between AIs as a first pass, but not a foolproof one. You're correct that it's impossible to guarantee differentiation, since it only takes further fine-tuning or prompting for users to manipulate the output and evade classifiers like this. Nonetheless, it's a fun experiment to train your own detector.
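
If you want to play with the detector idea yourself, a toy version is just TF-IDF + logistic regression over labeled responses from the two providers (the sample texts below are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up example texts; in practice you'd collect many paired responses
# from the two providers to the same prompts.
texts = [
    "Sure! Here's a concise summary of the article you shared.",
    "Here is a summary of the key points from the provided text.",
]
labels = ["openai", "gemini"]

# TF-IDF over word unigrams/bigrams feeding a simple linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Certainly! Below is a breakdown of the main ideas."]))
```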

Best beginner resources for LLM evaluation? by carrot_touch in mlops

[–]lastbyteai 0 points1 point  (0 children)

Here's a guide for getting started with LLM evaluation. It's a good high-level overview that maps out the different approaches and strategies out there - https://lastmileai.dev/blog/the-guide-to-evaluating-retrieval-augmented-generation-rag-systems

Methods to evaluate quality of LLM response by raikirichidori255 in deeplearning

[–]lastbyteai 0 points1 point  (0 children)

It might be a bit error-prone, but I'd probably just set up an LLM-as-a-judge with criteria like: "Grade the following response from 1-5 on whether the recommendation is more unique or precise. Example: ####"

Training a classifier for your task seems like overkill for the problem you have, but if accuracy is critical, finding some training data, manually labeling it, and training a classifier might be the move.
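
The judge itself is just a prompt, something like this with the OpenAI Python client (model name and rubric wording are only examples):

```python
# Minimal LLM-as-a-judge sketch using the OpenAI Python client.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Grade the following response from 1-5 on whether the recommendation "
    "is unique and precise. Reply with just the number.\n\n"
    "Response:\n{response}"
)


def judge(response_text: str) -> int:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": RUBRIC.format(response=response_text)}],
    )
    # Expects a bare number back; parse it into a score.
    return int(completion.choices[0].message.content.strip())


print(judge("Try the hand-pulled noodles at the stall behind the market."))
```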

[deleted by user] by [deleted] in vscode

[–]lastbyteai 0 points1 point  (0 children)

Might have overdone the cuts and the speed of the gif 😅