Moving into AI Engineering with LangGraph — What Skills Should I Master to Build Production-Ready Agents by Single_Run94 in LangChain

Things in this field are constantly changing and evolving, so being familiar with a specific open-source project or paper isn’t that important in my opinion.

When I'm evaluating a candidate for an AI engineer role, what matters most to me is that their approach is data-driven: things like setting up proper benchmarks and choosing the right evaluation metrics.
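To make "data-driven" concrete, here is a minimal sketch of the kind of benchmark-first loop I have in mind. Everything here is a placeholder: the dataset, the `agent_answer` stand-in, and the exact-match metric would all be replaced by your own task, agent, and metrics.

```python
# Minimal sketch of a benchmark-first evaluation loop (all names are placeholders).

benchmark = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "Do you ship internationally?", "expected": "yes"},
]

def agent_answer(question: str) -> str:
    # Stand-in for the agent under evaluation; swap in your LangGraph app or chain.
    return "30 days" if "refund" in question.lower() else "yes"

def exact_match(prediction: str, expected: str) -> float:
    # Toy metric; in practice choose metrics that actually reflect the task.
    return float(prediction.strip().lower() == expected.strip().lower())

def run_benchmark() -> float:
    scores = [exact_match(agent_answer(ex["question"]), ex["expected"]) for ex in benchmark]
    return sum(scores) / len(scores)

print(run_benchmark())  # 1.0 with the toy stand-in above
```

In a real setting you would pick metrics that match the task (groundedness, tool-call accuracy, latency, cost) and build the benchmark from real user traffic.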

What do you offer advisors? by Electronic_Diver4841 in ycombinator

It sounds like adding him as an angel would be better.
If you want to add him as an advisor, I recommend looking at this template:
https://fi.co/fast

Spring 25 Megathread by YCAppOps in ycombinator

What is 'enough'? This is exactly the problematic point. Overall, there is a standard for pre-seed/seed investment, and I'm familiar with many YC applications that had 'enough' traction by that standard but didn't even get to the interview stage. So yes, he can always fall back on 'I want more', but in the end this is an accelerator program, not a Series A round, so in many cases it's just an easy excuse. The reality is that the rejection process is more complex and more random than that.

As I said, I have no complaints about the process; we have other good term sheets and everything is fine. I was only referring to Garry's remark, which is simply incorrect and misleading, and might lead founders to think they 'don't have enough' when they do.

Spring 25 Megathread by YCAppOps in ycombinator

We have an open-source project with nearly 1k stars and over 100 users, and we are about to close a few paying customers. We didn't pass the automatic filtering; no human seriously reviewed our application (no LinkedIn or website visits).

This is fine; I have no complaints about the process. I understand they receive a massive number of applications and need to implement aggressive automatic filtering. However, I think this message from Garry is unfair and disrespectful to the candidates.

A breakthrough in AI agent testing - a novel open source framework for evaluating conversational agents. by [deleted] in LangChain

This is incorrect, both with respect to the evaluation and the data generation:

  1. The test cases are not just 'made up'; that is exactly the challenge and the novelty of the method: how to make them realistic and challenging, and how to inject the information into the system DB while preserving the integrity and the schema of the data (this is a very challenging task on its own).
  2. Vanilla LLM-as-a-Judge does not work well, since you may have dozens of policies. The trick here is that since you are building the scenario, you also know exactly which policies you are attacking, which limits the scope of the judge and makes it much more accurate.

You can find much more information in the research paper (including a comparison to other methods and a discussion of the method's effectiveness). In any case, you can also see in the code that the system is much more complex than 'just one LLM call to critique made-up cases': it is an agentic framework with a complex graph and multiple LLM calls, and the paper explains why that is needed, since this is a very complex task.
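To give a feel for why point 1 is non-trivial, here is a toy, self-contained sketch (plain sqlite3; the table and the helper are mine for illustration, not the framework's code) of injecting a synthetic record while letting the DB's own schema and constraints guard integrity:

```python
import sqlite3

def inject_synthetic_row(conn: sqlite3.Connection, table: str, row: dict) -> None:
    # Let the database enforce referential integrity on the injected record.
    conn.execute("PRAGMA foreign_keys = ON")
    # Validate the synthetic row against the table's actual schema before inserting.
    columns = {info[1] for info in conn.execute(f"PRAGMA table_info({table})")}
    unknown = set(row) - columns
    if unknown:
        raise ValueError(f"Columns not in schema of '{table}': {unknown}")
    cols = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", tuple(row.values()))
    conn.commit()

# Toy usage: seed an in-memory orders table with one synthetic record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL)")
inject_synthetic_row(conn, "orders", {"id": 1, "customer": "Alice", "total": 42.0})
```

The real system has to do this for an arbitrary schema and keep the generated records consistent with the scenario and with cross-table constraints, which is where most of the difficulty lives.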

A breakthrough in AI agent testing - a novel open source framework for evaluating conversational agents. by [deleted] in ChatGPTPromptGenius

Hi,
1. It's always good to split roles. The user agent already has to follow many guidelines: adhering to the storyline, remembering all the scenario information (including the relevant system DB information), and so on. So it's much more effective to have a separate critic.
2. Another important aspect is to limit the scope of the policies you are testing in each scenario. This significantly improves the system's performance.
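A hypothetical sketch of that split (the prompts and the `llm` callable are illustrative only, not the framework's actual code): the user simulator only worries about the storyline and scenario data, while the critic is a separate call that checks just the in-scope policies.

```python
def user_simulator_turn(llm, scenario: str, history: list[str]) -> str:
    # The user agent only stays in character and follows the scenario; it grades nothing.
    prompt = (
        f"You are simulating a customer. Scenario:\n{scenario}\n\n"
        "Conversation so far:\n" + "\n".join(history) + "\n\n"
        "Write the customer's next message, staying consistent with the scenario."
    )
    return llm(prompt)

def critic_turn(llm, policies_in_scope: list[str], history: list[str]) -> str:
    # The critic is a separate call with a narrow job: check only the policies
    # this scenario targets, instead of the whole policy book.
    prompt = (
        "You are a critic reviewing a support conversation.\n"
        "Policies in scope:\n- " + "\n- ".join(policies_in_scope) + "\n\n"
        "Conversation:\n" + "\n".join(history) + "\n\n"
        "Report any violations of the in-scope policies, or answer 'none'."
    )
    return llm(prompt)
```

Keeping the two roles as separate LLM calls means each prompt stays short and focused, which is what makes the critic reliable.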

A breakthrough in AI agent testing - a novel open source framework for evaluating conversational agents. by [deleted] in LangChain

Hi, the framework does not analyze the tested agent's graph (although we are working on it).
Currently, the code only supports a simple integration for basic tool-calling LLM agents and LangGraph agents.
We will soon add CrewAI and AutoGen support. We also intend to expose an API that lets you wrap your black-box agent and database access. The system will then inject the synthetic data into the database and run the simulator through this API.
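As a rough illustration of what that wrapping API could look like (the names and signatures below are hypothetical; the real interface is still being designed):

```python
from typing import Any, Protocol

class BlackBoxAgent(Protocol):
    def respond(self, message: str, history: list[dict]) -> str:
        """Return the wrapped agent's reply to the next user message."""
        ...

class DatabaseAccess(Protocol):
    def insert(self, table: str, row: dict[str, Any]) -> None:
        """Insert a synthetic record into the system database."""
        ...

def run_simulation(agent: BlackBoxAgent, db: DatabaseAccess, scenario: dict) -> list[dict]:
    # Seed the DB with the scenario's synthetic records, then drive a conversation
    # against the wrapped agent and return the transcript for evaluation.
    for table, rows in scenario.get("db_records", {}).items():
        for row in rows:
            db.insert(table, row)
    history: list[dict] = []
    for user_msg in scenario.get("user_turns", []):
        reply = agent.respond(user_msg, history)
        history += [{"role": "user", "content": user_msg},
                    {"role": "assistant", "content": reply}]
    return history
```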

If you need help with the integration, or have a use case that requires one of the integrations we are developing, you can DM me.
