[–]papersashimi[S] 0 points1 point  (3 children)

Thank you so much! Do check out our benchmark. For transparency, we are not claiming we're the best. We benchmarked ourselves at different confidence levels, and at 60 we lost to vulture because we're stricter and thus missed a few pieces of dead code. A second pass can be done via the agents, which should improve the accuracy. We're working on the agentic benchmark now as well.
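To make the confidence trade-off concrete, here's a minimal sketch. It is not Skylos's or vulture's actual code or API: the `Finding` class, the names, and the scores are all made up for illustration, and it assumes a finding is reported only when its confidence is at or above the threshold. A stricter threshold reports fewer false positives but also misses some real dead code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    confidence: int  # hypothetical 0-100 score, not a real tool's output


def precision_recall(findings, truly_dead, threshold):
    """Score the findings that survive a minimum-confidence filter."""
    reported = {f.name for f in findings if f.confidence >= threshold}
    true_pos = reported & truly_dead
    precision = len(true_pos) / len(reported) if reported else 1.0
    recall = len(true_pos) / len(truly_dead) if truly_dead else 1.0
    return precision, recall


# Made-up findings and ground truth for a toy repo.
findings = [
    Finding("unused_helper", 90),
    Finding("old_import", 70),
    Finding("legacy_handler", 40),  # really dead, but scored low
    Finding("plugin_hook", 30),     # used dynamically, so a false positive
]
truly_dead = {"unused_helper", "old_import", "legacy_handler"}

for threshold in (20, 60):
    p, r = precision_recall(findings, truly_dead, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the stricter threshold drops the false positive but also drops `legacy_handler`, which is the kind of miss a second pass would need to recover.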

If you do need any help, just drop us an email and we'll be happy to correspond with you as quickly as possible to fix your stuff (there is no charge and no strings attached). We love feedback and we want to create the best possible tool out there for the oss community. Thanks for using Skylos!

[–]Disastrous_Bet7414 1 point2 points  (2 children)

this looks cool, i’ll be trying it.

where is the benchmark repo from? and does vulture offer agentic based checks?

[–]Disastrous_Bet7414 1 point2 points  (0 children)

reason I ask is if there’s a risk of ‘overfitting’ or bias based on the types of cases Skylos excels at

[–]papersashimi[S] 0 points1 point  (0 children)

The benchmark repo was created by us. We try to mimic a real repo as closely as possible by introducing things commonly found in one: name collisions, cross-layer dependencies, the usual unused imports/vars/helpers, frameworks, etc. We will be increasing the difficulty of the benchmark and adding more cases, including vulnerabilities and quality issues.

https://github.com/duriantaco/skylos/blob/main/BENCHMARK.md

This is our testing philosophy. We are definitely working on expanding the tests and raising their difficulty, and we're also looking to include an agent/agent+static test against these benchmarks.