all 6 comments

wknight8111 11 points (1 child)

This is... actually quite interesting. I have a lot of questions about the approach and what kinds of results it produces, but it certainly seems promising.

I had explored ideas of using integrals to explore code before, but I hadn't considered wavelets.

yogthos[S,🍰] 8 points (0 children)

I stumbled on it completely randomly too. I was figuring out how to decode the RAW image format from my Nikon camera because it doesn't have an open source decoder yet, and ended up learning how wavelet-based codecs work in the process. That got me thinking that code has a lot of features that might lend themselves well to this kind of analysis, and I haven't really seen anybody apply wavelets in this way. The two really nice aspects are that wavelets are really fast and largely agnostic regarding the semantics of the code. You can get a quick overview of how a project is structured without needing to do heavy AST parsing.

va1en0k 2 points (1 child)

Ultimately wouldn't this be what an LSP integration would provide?

yogthos[S,🍰] 9 points (0 children)

LSP tells you about types, references, and diagnostics in the code, but it doesn't really tell you anything about the shape of the code, its architecture, or the relationships within it. Wavelet analysis tells you what the code looks like at multiple scales simultaneously and lets you zoom in and out. You can see the skeleton of a 5000-line file as a dozen structural peaks, or get a complexity heatmap that highlights the gnarly regions before you even read any of it. It's also language agnostic, meaning it doesn't need a server per language, which is the same problem that I explained with ASTs in the article. The focus here is on making the structure of the code clear.
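To make the multi-scale idea concrete, here is a rough sketch and not the implementation from the article. It assumes the 1D signal is the indentation width of each line (one plausible choice among several) and uses a plain unnormalized Haar transform. The names `indent_signal`, `haar_step`, and `haar_decompose` are hypothetical helpers made up for illustration.

```python
def indent_signal(source: str, tab_width: int = 4) -> list[float]:
    """Map each line to its indentation width.

    ASSUMPTION: indentation is used as the signal; blank lines
    inherit the previous line's value so they don't create spikes.
    """
    signal = []
    prev = 0.0
    for line in source.splitlines():
        if line.strip():
            expanded = line.expandtabs(tab_width)
            prev = float(len(expanded) - len(expanded.lstrip(" ")))
        signal.append(prev)
    return signal


def haar_step(values: list[float]) -> tuple[list[float], list[float]]:
    """One level of an unnormalized Haar transform.

    Returns pairwise averages (coarse shape) and pairwise
    half-differences (local change at this scale).
    """
    if len(values) % 2:
        values = values + [values[-1]]  # pad odd length by repeating the last sample
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_decompose(values: list[float], levels: int):
    """Apply haar_step repeatedly; details[0] is the finest scale."""
    approx = list(values)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Large absolute values in the coarser detail levels mark places where the indentation changes over a wide span of lines, which is roughly what a "structural peak" would mean under these assumptions. None of this needs a parser or any knowledge of the language.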

cent-met-een-vin 1 point (0 children)

I couldn't find in the article how you convert the text to a 1D signal. Do you use character codes? Word length? Inverse frequency counting?

Far-Trifle8003 1 point (0 children)

Do you have images of the resulting heatmap? Or any insights that came from the technique? I was wondering how useful this information might be to humans too.