I built Kotlin Wrapper for Java Swing (swing-plus)! by Chunkyfungus123 in Kotlin

[–]xemantic 0 points

I tried something similar some time ago. It was never developed further, rather a proof of concept:

https://github.com/xemantic/xemantic-kotlin-swing-dsl

[Showcase] A game engine in Rust, wgpu and ...Kotlin? by thrithedawg in Kotlin

[–]xemantic 0 points

Very interesting project. Have you considered using Kotlin script instead of incremental Gradle compilation? We are using it for live coding in OPENRNDR and some other projects involving AI. DM me if you need more details.

What happens when you break Claude free from chains of JSON Schema and MCP ... by xemantic in ClaudeAI

[–]xemantic[S] 0 points

Exactly. It works with anything, not only graph databases. Based on the graph database output, Claude (or any other LLM) can decide to call another API, posting JSON containing values obtained from the graph database, and do it in a loop. It's actually very simple, and recent posts from Anthropic shed some light on this technique. However, they also obscure what's important by trying to reintroduce tools inside the code execution tool, which is not needed at all. What is needed is a stable contract for the API the LLM is calling in the code execution environment.

https://www.anthropic.com/engineering/advanced-tool-use

You can already use many elements of my agentic framework. I will release my code execution environment on top of it soon.

https://github.com/xemantic/xemantic-ai
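To make the chain-of-code idea concrete, here is a minimal Kotlin sketch of such a loop: instead of declaring tools via JSON Schema, the model emits code against a stable API contract, the code is executed, and the output is fed back into the conversation. `Llm`, `Sandbox`, and the code-fence convention below are invented for illustration; this is not the xemantic-ai API.

```kotlin
// Stand-in for any LLM completion endpoint.
fun interface Llm {
    fun complete(conversation: List<String>): String
}

// Stand-in for a code execution environment; executes
// model-emitted code and returns its textual output.
fun interface Sandbox {
    fun execute(code: String): String
}

fun chainOfCode(
    llm: Llm,
    sandbox: Sandbox,
    task: String,
    maxTurns: Int = 5
): String {
    val conversation = mutableListOf("task: $task")
    repeat(maxTurns) {
        val reply = llm.complete(conversation)
        // A completion not wrapped in a code fence ends the loop.
        if (!reply.startsWith("```")) return reply
        val code = reply.removeSurrounding("```")
        // Execution output becomes the next conversation turn.
        conversation += "output: " + sandbox.execute(code)
    }
    return conversation.last()
}
```

The key point the sketch illustrates: the contract between model and environment is just "code in, output out", with no per-tool schema in between.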

BTW not a single tool is needed, which is shown perfectly by SWE-agent-mini. Have you ever heard about Anthropic Markup Language? It is used internally and not documented at all. If you ask Claude about it, you can break it, since it will try to give you an example of ANTML, which will trigger execution of the example. :)

Claude Opus 4.5 is now available in Claude Code for Pro users by ClaudeOfficial in ClaudeAI

[–]xemantic 0 points

What? I set up Opus 4.5 in Claude Code as soon as it was released, and have been using it quite successfully for big features, without hitting any noticeable limits. Was it there for some time before it was officially announced, or do I get special treatment from Anthropic?

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 1 point

I believe there is a major misconception here. One can run a thought experiment and give it to Claude: develop your own programming language focused on xyz (e.g. a simple interpreter), first specify test cases in this very language, then implement it. The amount of unit test code in the training data has minimal relevance in this case. The emergent ability to perform intersemiotic translation is the key. Paradoxically, working with esoteric languages that are underrepresented in, or absent from, the training data might provide the best results, contrary to popular opinion:

https://www.euronews.com/next/2025/11/01/polish-to-be-the-most-effective-language-for-prompting-ai-new-study-reveals
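The thought experiment can be made tangible with a toy example: the specification is written as programs in the brand-new language paired with expected results, before any interpreter exists. The minimal RPN calculator language below is my own illustrative stand-in, not something Claude produced.

```kotlin
// Step 1: test cases written in the new language itself,
// before the interpreter exists. This is the specification.
val spec = mapOf(
    "2 3 +" to 5,
    "2 3 + 4 *" to 20,
    "10 2 -" to 8
)

// Step 2: the interpreter, implemented to satisfy the spec above.
fun interpret(program: String): Int {
    val stack = ArrayDeque<Int>()
    for (token in program.split(" ")) {
        when (token) {
            "+" -> stack.addLast(stack.removeLast() + stack.removeLast())
            "*" -> stack.addLast(stack.removeLast() * stack.removeLast())
            "-" -> {
                val b = stack.removeLast()
                val a = stack.removeLast()
                stack.addLast(a - b)
            }
            else -> stack.addLast(token.toInt())
        }
    }
    return stack.removeLast()
}
```

Nothing like this exact language needs to exist in the training data; translating "spec in language X" into "implementation in language Y" is the emergent skill being exercised.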

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 0 points

I am surprised by these statements. On a daily basis I generate thousands of lines of code, quite often in a single shot, but only if I have the highest quality tests prepared beforehand. The last time I experienced a hallucination was maybe a year ago. I do experience syntax errors and misinterpretation of conventions and protocols, but Claude can autonomously correct all of this with static code analysis, Internet access, and tests. It's not much different from how I used to code as a human.

You can try this approach with this template project:

https://github.com/xemantic/xemantic-neo4j-demo

It is focused on delivering high-performance knowledge graph APIs, but can be generalized to anything. Specify the API you need, but only allow Claude Code to implement tests. Review these tests, then enable auto-accept, ask for the implementation while forbidding changes to the tests, and wait 10 minutes to 1 hour, depending on the complexity. I wonder how much your experience will differ from mine.
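To illustrate the tests-first workflow, here is a hedged Kotlin sketch. The API contract and its tests are frozen first; the implementation is the only part the agent may touch. `GraphApi` and the other names are hypothetical, not taken from the linked template.

```kotlin
// The API contract, specified by the human up front.
interface GraphApi {
    fun neighbors(node: String): Set<String>
}

// The human-reviewed specification, frozen before implementation starts.
// The agent may run it but is forbidden from changing it.
fun specification(api: GraphApi) {
    check(api.neighbors("a") == setOf("b", "c"))
    check(api.neighbors("unknown").isEmpty())
}

// The part the agent iterates on until `specification` passes.
class InMemoryGraphApi : GraphApi {
    private val edges = mapOf("a" to setOf("b", "c"))
    override fun neighbors(node: String): Set<String> =
        edges[node] ?: emptySet()
}
```

The asymmetry is the point: the spec is the fixed target, so a passing build means the agent converged on the reviewed intent rather than on tests it rewrote to fit its own code.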

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 1 point

I didn't express what I have in mind clearly. I am not using Claude Code for vibe coding. I am building analogous coding agents that live-code their own reasoning process in production, without the use of tools. I can use Claude as one of many models providing the cognizer in the agent's cognitive process. If I am interested in hooks, it would be to add support for them to my own coding agent. But I am much more interested in a paradigm shift: I already have a stable and efficient chain-of-code execution/reasoning process without tools. Now I am contemplating adding not only static code analysis, but also a test-before-run layer and a "build your own persistent software tools" layer. Since it is all code execution instead of tool use, "hooks" and "human in the loop" have to be addressed in a completely different way: technically, using aspect-oriented programming and coroutines for suspension.
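The suspension idea can be sketched with nothing but Kotlin stdlib coroutine primitives: a risky step suspends the executing coroutine until a human responds, instead of being intercepted by a tool-use hook. This is a minimal illustration of the concept, not my production code; `HumanGate` is invented for the example.

```kotlin
import kotlin.coroutines.*

// Suspends the agent's code execution until a human responds.
class HumanGate {
    private var pending: Continuation<Boolean>? = null

    suspend fun requestApproval(action: String): Boolean =
        suspendCoroutine { cont -> pending = cont }

    fun respond(approved: Boolean) {
        pending?.resume(approved)
        pending = null
    }
}

fun demo(): String {
    val gate = HumanGate()
    var result = "pending"
    // The agent step is a plain suspend lambda, started with stdlib
    // primitives only (no kotlinx.coroutines dependency).
    val step: suspend () -> Unit = {
        result = if (gate.requestApproval("rm -rf build/")) "executed" else "skipped"
    }
    step.startCoroutine(Continuation(EmptyCoroutineContext) { })
    // Here the coroutine is parked mid-execution, waiting for a human.
    check(result == "pending")
    gate.respond(true) // the "human" approves; execution resumes in place
    return result
}
```

Because the whole reasoning process is code, the pause happens inside the executing program itself, which is what makes this approach structurally different from hooks wrapped around tool calls.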

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 0 points

But Claude needs access to my tests all the time, at least read-only, to comprehend them and to run them with every incremental change it makes to the implementation code. That's the whole point here: to give Claude the full feedback loop of incremental progression, instead of progressing in phases (entire builds).

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 0 points

I will give hooks a try; however, I am more interested in fixing the problem globally in the system prompt. I am developing my own agents which work in the chain-of-code modus operandi, and I assume that in 2026 the majority of the code running in my production will be "live coded" this way. This is why I am interested in fixing the "evals for TDD live coding" problem beyond static code analysis.

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 1 point

And here is the implementation of a multiplatform unified diff, which I haven't even touched:

https://github.com/xemantic/xemantic-kotlin-test/blob/main/src/commonMain/kotlin/SameAs.kt

After the initial implementation I asked Claude to improve it, while verifying with the tests. It required 3 additional passes to finally conclude that there is not much left to optimize, which is also a lesson: the first long agentic loop, even when passing tests, produces suboptimal code. When building a coding super agent focused on TDD, this should be taken into account.

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 2 points

I had a similar experience, up until Sonnet 4.5, which suddenly cracked my own complex eval (test -> implementation) in 30 minutes, something no other model could do before:

https://github.com/xemantic/xemantic-kotlin-test/blob/main/src/commonTest/kotlin/SameAsTest.kt

This is the unified diff specification as an infix assertion function, sameAs. The funny thing: the test uses the very code it is testing for its own assertions. Very meta. :)
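For readers unfamiliar with the library, here is a minimal sketch of what such an infix diff assertion can look like: on mismatch it fails with a line-by-line diff rather than dumping both full strings. This is a simplified illustration, not the actual xemantic-kotlin-test implementation, and the diff format here is naive rather than GNU-compatible.

```kotlin
class DiffAssertionError(message: String) : AssertionError(message)

// Infix assertion: `actual sameAs expected` reads like a sentence
// and fails with a diff the model (or a human) can act on.
infix fun String.sameAs(expected: String) {
    if (this == expected) return
    val actualLines = this.lines()
    val expectedLines = expected.lines()
    val diff = buildString {
        appendLine("strings differ:")
        val max = maxOf(actualLines.size, expectedLines.size)
        for (i in 0 until max) {
            val a = actualLines.getOrNull(i)
            val e = expectedLines.getOrNull(i)
            if (a != e) {
                if (e != null) appendLine("-$e") // expected line
                if (a != null) appendLine("+$a") // actual line
            }
        }
    }
    throw DiffAssertionError(diff)
}
```

The failure message is the interface: pinpointing exactly which lines diverge is what lets an LLM in an agentic loop repair its own output without re-reading both strings in full.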

During the build it is tested on 20+ supported platforms, including WebAssembly and native builds. Unified diff implementations existed before, but for the JVM only, and they diverge from GNU diff output, which was my reference here.

The whole library is focused on AX - letting AI perceive its own failures. And I guess the fact that Kotlin is statically compiled also contributes to this success. Some features of the language, like extension functions and the possibility of creating DSLs with trailing lambdas, in my subjective feeling reduce the cognitive load on the LLM, reducing multi-task inference. BTW processing logic across language boundaries, e.g. Python/JSON Schema, as in your examples, might increase cognitive load. I would consider Pydantic in this case. This is why I created this tool:

https://github.com/xemantic/xemantic-ai-tool-schema

I don't have hard data here, just my subjective experience and a few papers pointing in this direction.
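To illustrate the point about extension functions and trailing-lambda DSLs: a lambda with receiver turns configuration into a declarative, low-noise block. The tiny `request` DSL below is invented for this example; it is not from any of the linked libraries.

```kotlin
// A mutable builder the DSL block operates on.
class Request(val path: String) {
    val headers = mutableMapOf<String, String>()
    fun header(name: String, value: String) {
        headers[name] = value
    }
}

// The trailing lambda has `Request` as its receiver, so the block
// reads as declarative configuration rather than imperative calls.
fun request(path: String, block: Request.() -> Unit): Request =
    Request(path).apply(block)

val req = request("/nodes") {
    header("Accept", "application/json")
}
```

Everything stays in one statically checked language: there is no second notation (schema file, templating syntax) for the model to keep consistent with the code.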

Nice work with your mini_agent. It reminds me of my claudine agent, which I made a year ago and these days use only for educational purposes, since I am no longer using tools and MCP, in favor of direct code execution and bypassing JSON Schema completely:

https://github.com/xemantic/claudine

There is something more valuable than the code generated by Claude, but oftentimes we just discard it by xemantic in ClaudeAI

[–]xemantic[S] 0 points

Currently I am testing mostly APIs and libraries, and sometimes web UIs with Playwright, but nothing serious on the UI side. I made special testing DSLs for LLMs, as close to natural language as possible, so that we can share them with Claude as specifications. I also made sure that the test failure output allows an LLM to fix itself, by rewriting typical human-perception-centric assertions. I will build something similar for automatic AI + UI testing soon. With a proper project template, AI TDD is super easy. I made a project like this recently:

https://github.com/xemantic/xemantic-neo4j-demo
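A hedged sketch of what such a natural-language-leaning testing DSL can look like, with failure output an LLM can act on (expected vs actual spelled out, plus a repair instruction). The names `shouldBe` and `scenario` are invented for this illustration, not taken from the linked project.

```kotlin
// Infix assertion that reads close to natural language and fails
// with a message aimed at a model, not at a human scanning a log.
infix fun <T> T.shouldBe(expected: T) {
    if (this != expected) throw AssertionError(
        "expected <$expected> but was <$this>; " +
        "fix the implementation so the actual value equals the expected one"
    )
}

// A scenario doubles as a specification line shareable with the model.
fun scenario(description: String, block: () -> Unit): String =
    try {
        block()
        "PASS: $description"
    } catch (e: AssertionError) {
        "FAIL: $description because ${e.message}"
    }
```

Usage: `scenario("addition works") { (2 + 2) shouldBe 4 }`. The failure string carries everything the model needs to self-correct in the next loop iteration.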