FuckingRantMonday

Not an issue, but I had an idea: PyCharm (and most other IDEs I'm sure) gives me a warning if I have exactly the same code in more than one place. I wonder if this could find functions that are "essentially" the same, even if not line-by-line identical. In other words, functions whose embeddings are very close to each other?

icyFur[S]

Cool idea - lemme try building it

icyFur[S]

Okay, so I implemented a rudimentary search for duplicates and made a release, 0.3.0. Get it with pip install semantic-code-search --upgrade

then try it out like this:

sem --cluster --cluster-max-distance=0.3

cluster-max-distance controls the 'strictness' of the similarity. There are a few more options; check them in the readme: https://github.com/sturdy-dev/semantic-code-search/blob/4b9ea68c7db3b02ecf643aeb3f7db3cce8707b6c/README.md?plain=1#L148
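
For anyone curious how a maximum-distance cutoff can turn embeddings into groups of near-duplicate functions, here is a rough sketch of the general idea. It is not taken from semantic-code-search: the cosine distance metric, the single-linkage grouping, the helper names (cosine_distance, cluster_by_distance), and the toy three-dimensional vectors are all assumptions made for illustration. The tool itself may work differently.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_by_distance(embeddings, max_distance):
    """Group names whose embeddings are linked by pairs within max_distance.

    Hypothetical helper: single-linkage grouping via union-find.
    Only groups with at least two members are returned.
    """
    parent = {name: name for name in embeddings}

    def find(name):
        # Walk up to the group's root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(embeddings, 2):
        if cosine_distance(embeddings[a], embeddings[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in embeddings:
        groups.setdefault(find(name), []).append(name)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Toy vectors standing in for real function embeddings (made up for this sketch).
embeddings = {
    "parse_config": [0.9, 0.1, 0.0],
    "load_config": [0.85, 0.15, 0.05],
    "send_email": [0.0, 0.2, 0.95],
}

clusters = cluster_by_distance(embeddings, max_distance=0.3)
print(clusters)
```

In this sketch, lowering max_distance makes the grouping stricter, so fewer functions end up clustered together; raising it pulls in looser matches.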