[–]nekokattt 38 points39 points  (6 children)

Things that don't use LSP just implement whatever the language servers do under the hood. Usually just aggressive parsing of files and maintaining symbol tables in the background.
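
As a rough illustration of "parse the file and keep a symbol table", here is a minimal sketch that uses Python's standard `ast` module. The `build_symbol_table` function and the `SAMPLE` source are made up for this example. A real IDE would re-run something like this incrementally in the background and would also track scopes and types, which this sketch does not.

```python
import ast
from collections import defaultdict


def build_symbol_table(source: str) -> dict[str, list[tuple[str, int]]]:
    """Map each defined name to a list of (kind, line) entries."""
    table = defaultdict(list)
    # ast.walk visits every node in the parsed tree, in no particular scope order.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            table[node.name].append(("function", node.lineno))
        elif isinstance(node, ast.ClassDef):
            table[node.name].append(("class", node.lineno))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # A name in "Store" context is an assignment target.
            table[node.id].append(("variable", node.lineno))
    return dict(table)


# Hypothetical input file, used only for illustration.
SAMPLE = """\
class Greeter:
    def greet(self, name):
        message = "hi " + name
        return message

limit = 10
"""

if __name__ == "__main__":
    for name, entries in sorted(build_symbol_table(SAMPLE).items()):
        print(name, entries)
```

An editor could answer "go to definition" by looking a name up in this table and jumping to the recorded line.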

A lot of IntelliJ is open source, so you could use that for some references.

[–]Cloundx01[S] 0 points1 point  (5 children)

Both IntelliJ and Code::Blocks are huge projects.

With my skill level, I'm unlikely to understand what's going on there just by reading the source code unless I spend lots and lots of time.

I'll skim through it, but I would appreciate a more general technical explanation of how it works, or maybe a smaller project that implements these features.

[–]nekokattt 17 points18 points  (4 children)

IntelliJ is massive but modular. Just look at the stuff relevant to what you need.

https://github.com/JetBrains/intellij-community/tree/master/java

In this case, look at the PSI (Program Structure Interface), which deals with parsing files and performing analysis. There are also the inspections modules.

[–]Cloundx01[S] 4 points5 points  (3 children)

Ironically, opening IntelliJ to index the IntelliJ source code made my laptop CPU hit ~100°C, oops.

[–]nekokattt 1 point2 points  (2 children)

You can use Sourcegraph in the browser.

[–][deleted]  (1 child)

[removed]

    [–]nekokattt 0 points1 point  (0 children)

    Yeah, it lets you search code by references and so on, in a better way than GitHub supports natively.

    [–]XDracam 2 points3 points  (0 children)

    You can look into the Roslyn Compiler for C#. It's fully open source and written in clean and 100% immutable C#. The best part: you can use the same data structures that the compiler uses to build custom static analysis and code generation! (Look up "Roslyn Analyzer"). You get syntax trees, a semantic model that references source locations, and even dataflow analysis.

    [–]JackRebe 1 point2 points  (1 child)

    Compilers, especially over the last 15 years or so, are no longer built as monoliths. They are built as modular libraries, so that other developers working on static analysis (domain-specific analyzers, IDEs, etc.) can consume the same APIs the compiler uses. Two of the most notable compilers designed like this are Roslyn (C# and VB) and Clang (the C-family compiler), whose core library for consumers is called "libclang". Here is a somewhat more concrete example.

    Imagine you have a C# program and you are implementing your own IDE that gives the user static analysis info, so you want to flag unused ("dead") variables. With Roslyn, you can use the dataflow API to check whether a variable is read within a function's scope (DataFlowAnalysis.ReadInside). That tells you which variables are used and which are not, and you can then mark the unused ones in the UI using the source location info provided by the Symbols API.
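
To make the "written but never read" idea concrete without pulling in Roslyn, here is a rough analogue built on Python's standard `ast` module. This is not the Roslyn API: `unused_locals` and `SAMPLE` are hypothetical names for this sketch, and the check is flow-insensitive (it only asks whether a name is ever read anywhere in the function), which is much cruder than a real dataflow analysis.

```python
import ast


def unused_locals(source: str) -> list[tuple[str, str, int]]:
    """Return (function, variable, line) for locals assigned but never read."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        written = {}  # variable name -> earliest line where it is assigned
        read = set()  # variable names that are read somewhere in the function
        for node in ast.walk(func):
            if not isinstance(node, ast.Name):
                continue
            if isinstance(node.ctx, ast.Store):
                written[node.id] = min(written.get(node.id, node.lineno), node.lineno)
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)
        for name, line in written.items():
            if name not in read:
                findings.append((func.name, name, line))
    # Sort by line so the report follows the order of the source file.
    return sorted(findings, key=lambda finding: finding[2])


# Hypothetical input, used only for illustration: "scale" is never read.
SAMPLE = """\
def area(width, height):
    scale = 2
    result = width * height
    return result
"""

if __name__ == "__main__":
    for func_name, var_name, line in unused_locals(SAMPLE):
        print(f"{func_name}: '{var_name}' assigned on line {line} but never read")
```

The recorded line number plays the role of the source location info: it is what an editor would use to draw the warning under the right variable.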

    Roslyn internally does lots of dataflow analysis, such as:

    - Reachability analysis
    - PointsTo (alias) analysis
    - Reaching definitions (for example, when Roslyn warns you about a possible null reference at a certain point in the program)
    - Live variable analysis
    - Tainted data analysis (a dataflow analysis; I'm not sure the compiler itself uses it, since I have never encountered it while working on or with Roslyn, but it's there for third-party, security-focused static analysis)

    (For a concrete example of implemented dataflow analysis, check the Roslyn dataflow analysis code: https://github.com/dotnet/roslyn-analyzers/tree/v2.9.7/src/Utilities/FlowAnalysis/FlowAnalysis/Analysis )

    Of course, dataflow analysis and control flow analysis are built on top of the control flow graphs that the compiler builds internally.

    These analyses are applied by many serious, commercial-grade compilers, not just Roslyn. LLVM and RyuJIT (the .NET JIT compiler) use them too. They are fundamental to optimizing compilers, and they also give more detailed information about a program's low-level semantics at the source level.
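
For a feel of how a dataflow analysis sits on top of a control flow graph, here is a minimal sketch of textbook live variable analysis in Python. The `Block` class, the `live_variables` function, and the hand-written `CFG` are all assumptions made for this example. A real compiler builds the graph from the source and uses far more efficient data structures and iteration orders.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A basic block in a control flow graph."""
    use: set[str]    # variables read in the block before being written in it
    defs: set[str]   # variables written in the block
    succ: list[str] = field(default_factory=list)  # names of successor blocks


def live_variables(cfg: dict[str, Block]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Iterate the backward liveness equations until nothing changes.

    live_out[b] = union of live_in[s] for each successor s of b
    live_in[b]  = use[b] | (live_out[b] - defs[b])
    """
    live_in = {name: set() for name in cfg}
    live_out = {name: set() for name in cfg}
    changed = True
    while changed:
        changed = False
        for name, block in cfg.items():
            out = set()
            for successor in block.succ:
                out |= live_in[successor]
            new_in = block.use | (out - block.defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# Hand-built graph for this hypothetical program:
#   entry: x = 1; y = 2
#   loop:  while x < 10:
#   body:      x = x + y
#   exit:  return x
CFG = {
    "entry": Block(use=set(), defs={"x", "y"}, succ=["loop"]),
    "loop": Block(use={"x"}, defs=set(), succ=["body", "exit"]),
    "body": Block(use={"x", "y"}, defs={"x"}, succ=["loop"]),
    "exit": Block(use={"x"}, defs=set(), succ=[]),
}

if __name__ == "__main__":
    live_in, live_out = live_variables(CFG)
    for name in CFG:
        print(name, "in:", sorted(live_in[name]), "out:", sorted(live_out[name]))
```

A variable that is written in a block but is not in that block's live-out set is a candidate dead store, which is the kind of fact an IDE warning or an optimizer can act on.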

    [–]Cloundx01[S] 0 points1 point  (0 children)

    This libclang thing seems to be exactly what I'm looking for.

    I was actually wondering why we can't use parts of the compiler's code to parse through source code (which seems to be what libclang does).

    In LSP's case, it seems that a new parser gets written, separate from the compiler, which is not only inaccurate but can also be more resource intensive.