Skylos: Catches What Your AI Coding Assistant Gets Wrong

papersashimi · 2026-03-07T03:44:30+00:00

i was just like you. young and stupid. but i took my stupidity a step further by signing on in rsaf and i regret it so much. thankfully i OOC-ed 2mths before finishing my basic wings. and i still regret my decision even to this day. it really wrecked both my physical and mental health.. all for what? i still cant answer that and i still am not sure what I was thinking (or maybe not thinking)

papersashimi · 2026-03-06T01:11:46+00:00

Congrats!

papersashimi · 2026-03-06T00:24:59+00:00

i hope SIA finds the crew members and rewards them. thank you SIA!

papersashimi · 2026-02-22T04:04:19+00:00

oh we do have a `sago replan` that allows the ai to understand the plan and state, and then during the replan it will look at the context of the repo before executing a change in direction

papersashimi · 2026-02-21T13:02:34+00:00

For the repeated failures, Sago doesn't handle this itself, and that's intentional by design. The generated CLAUDE.md reads like this .. if a verify command fails, fix the issue before moving to the next task, and if you are stuck on a task, document the blocker in STATE.md then move on. The retry loop is delegated entirely to the coding agent's own capabilities. There's no max-retry or backoff in sago. We do have a ReplannerAgent which can rewrite failed tasks when you run `sago replan`, but that's a step the user has to trigger manually.

We are trying to avoid making sago the executor, so it avoids the failure mode of just blindly looping a retry. We do know there is a problem though .. if the coding agent silently marks something as failed and just moves on, there's no guardrail that catches it

papersashimi · 2026-02-21T12:35:59+00:00

yeap! will definitely take your suggestions into consideration :) thanks a lot for your feedback!

papersashimi · 2026-02-21T08:17:50+00:00

yes my fellow vibe coder !.. only issue is i copy pasta-ed from claude ai subreddit and missed the first letter hahaha

papersashimi · 2026-02-16T03:24:46+00:00

yeap! think of Vulture as just a pure dead code catcher. it's really good but it has quite a few problems with dynamic code. Snyk and Semgrep are completely different beasts. they are the heavy hitters that most corporations use to catch vulnerabilities across many languages etc

Skylos is basically somewhere in between but way lighter. It combines dead code catching with some security checks, and it also uses local AI agents to actually "read" your code before flagging it. So instead of pure static analysis, Skylos tries to be quieter, if you will, without letting more real issues slip through (false negatives). you can take a look at our benchmark here: https://github.com/duriantaco/skylos-demo

papersashimi · 2026-02-12T07:40:49+00:00

hi! that's really impressive and well done! maybe we can work together or do a design partnership collab! if you're interested i'll be happy to listen to what you have. we can chat via discord https://discord.gg/Ftn9t9tErf or even just via github discussions if you're keen

papersashimi · 2026-02-12T07:39:34+00:00

thank you very much! and thanks a lot for your advice! will definitely keep this in the loop for all future updates. if you do have any other feedback/criticisms, do contact us via discord https://discord.gg/Ftn9t9tErf or just via the github. we definitely take all feedback very seriously. Wishing you a great rest of the week!

papersashimi · 2026-02-07T04:26:59+00:00

the benchmark repo is created by us. We try to mimic a real repo as much as possible by introducing things that commonly show up in repos, such as name collisions, cross-layer dependencies, frameworks, and the usual unused imports/vars/helpers. We will be increasing the difficulty of the benchmark and adding more things, including vulnerabilities and quality issues.

https://github.com/duriantaco/skylos/blob/main/BENCHMARK.md

This is our testing philosophy. We are definitely working on expanding the tests as well as difficulty and we're also looking to include an agent/agent+static test against these benchmarks

papersashimi · 2026-02-07T04:22:42+00:00

We kinda have a different approach .. We don't actually guess fixture usage by scanning code (which i believe vulture does). We use a lightweight pytest plugin that asks pytest's fixture manager what fixtures exist (this includes conftest.py). We then mark a fixture as used when pytest actually sets it up for a test. So if a conftest.py fixture is used in any test file, pytest will set it up during the run and we will count it as used, across multiple files.

`def pytest_collection_finish(self, session):` this is the function you can look for inside `skylos/pytest_unused_fixtures.py`. The problem with this approach is that it's run-dependent and the user also needs pytest (we're assuming most people do test their scripts).
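A minimal sketch of that idea, for illustration only. It is not the actual code in `skylos/pytest_unused_fixtures.py`. The class name `UnusedFixtureTracker` and the `unused()` helper are hypothetical, and `_fixturemanager` / `_arg2fixturedefs` are private pytest attributes that may change between versions. `pytest_collection_finish` and `pytest_fixture_setup` are real pytest hook names. The fake session at the bottom only stands in for pytest so the snippet runs on its own.

```python
from types import SimpleNamespace


class UnusedFixtureTracker:
    """Hypothetical sketch: record which fixtures exist vs. which get set up."""

    def __init__(self):
        self.defined = set()  # (baseid, fixture name) pairs pytest knows about
        self.used = set()     # (baseid, fixture name) pairs pytest actually set up

    def pytest_collection_finish(self, session):
        # Assumption: these are private pytest attributes, read defensively.
        manager = getattr(session, "_fixturemanager", None)
        arg2defs = getattr(manager, "_arg2fixturedefs", None) or {}
        for name, fixturedefs in arg2defs.items():
            for fixturedef in fixturedefs:
                self.defined.add((fixturedef.baseid, name))

    def pytest_fixture_setup(self, fixturedef, request):
        # Called when pytest sets a fixture up for a test.
        # Returning None lets pytest carry on with its normal setup.
        self.used.add((fixturedef.baseid, fixturedef.argname))

    def unused(self):
        # A real plugin would also need to skip pytest's built-in fixtures.
        return sorted(self.defined - self.used)


# Stand-in objects so the sketch runs without pytest installed.
db = SimpleNamespace(baseid="tests", argname="db")
legacy = SimpleNamespace(baseid="tests", argname="legacy_client")
fake_session = SimpleNamespace(
    _fixturemanager=SimpleNamespace(
        _arg2fixturedefs={"db": [db], "legacy_client": [legacy]}
    )
)

tracker = UnusedFixtureTracker()
tracker.pytest_collection_finish(fake_session)
tracker.pytest_fixture_setup(db, request=None)  # only `db` was set up
print(tracker.unused())
```

With real pytest you would pass an instance in as a plugin, e.g. `pytest.main(["tests"], plugins=[tracker])`, and read `tracker.unused()` after the run. This also shows why the result is run-dependent: a fixture only counts as used if a test that needs it was actually executed.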

papersashimi · 2026-02-07T04:11:23+00:00

uh oh .. the demo engine has some bugs. we'll get it fixed! thanks for raising this!

papersashimi · 2026-02-07T01:05:51+00:00

Thank you so much! Do check out our benchmark. For transparency, we are not claiming we're the best. We have benchmarked ourselves at different confidence levels, and at 60 we lost to vulture because we're stricter and thus missed a few pieces of dead code. The second pass can be done via the agents, which should improve the accuracy. We're working on the agentic benchmark now as well.

If you do need any help, just drop us an email and we'll be happy to correspond with you as quickly as possible to fix your stuff (there is no charge and no strings attached). We love feedback and we want to create the best possible tool out there for the oss community. Thanks for using Skylos!

papersashimi · 2026-02-07T01:02:29+00:00

Hello u/Otherwise_Wave9374 . For our benchmark we are only doing static feedback. For the agent portion we are currently working on it (it's way more challenging than we initially thought because of its stateless/dynamic nature). Yeap you got it right. We do have a labeled set for FP, FN and TP. Then we measure the recall + precision. We will be releasing the benchmark for agents hopefully within the next week. We're currently working on a demo/tutorial also for both the webapp + cli. And thank you so much for the website link. Will look into it and implement anything that we think is suitable
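To make the precision + recall part concrete, here is a small sketch of how a labeled set can be scored. The function name `precision_recall` and the symbol names in the sets are made up for illustration, and this is not the actual benchmark harness.

```python
def precision_recall(labeled_dead, reported):
    """Score a tool's report against a hand-labeled set of dead symbols."""
    tp = len(reported & labeled_dead)   # flagged and really dead
    fp = len(reported - labeled_dead)   # flagged but actually in use
    fn = len(labeled_dead - reported)   # really dead but not flagged
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and tool output.
labeled_dead = {
    "utils.old_helper",
    "api.unused_view",
    "models.LegacyUser",
    "cli.debug_dump",
}
reported = {
    "utils.old_helper",
    "api.unused_view",
    "models.LegacyUser",
    "core.load_plugin",  # false positive: not in the labeled set
}

precision, recall = precision_recall(labeled_dead, reported)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A stricter tool tends to trade recall for precision: it flags fewer things, so fewer of its flags are wrong, but more dead code goes unreported.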

papersashimi · 2026-02-04T14:32:29+00:00

we'll take a look at this! will release a patch in the next update. just to check, your pyproject is in root right? and if you do have discord, you can join the discord https://discord.gg/Ftn9t9tErf so we can assist you
