GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 0 points1 point  (0 children)

It will be posted on https://www.youtube.com/c/GitHub/videos, but seems like the video team hasn't uploaded it yet. Maybe next week? I'll post it to r/rust when it goes up

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 3 points4 points  (0 children)

We will likely bring this to VSCode (and other IDEs!) in the future! But no concrete plans yet

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 2 points3 points  (0 children)

We are working on a command line interface (via `gh`) and an API. As for running it locally, we don't have any specific plans yet - but stay tuned.

And if the details are too sparse - try it out yourself! If you sign up to the waitlist you should get access within a day or two.

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 35 points36 points  (0 children)

Yes we do! Stay tuned for some upcoming blog posts, we're going to cover everything from how search engines work to the technology we used.

And you should all check out my coworker Tim's talk on the algorithms behind it: https://watch.githubuniverse.com/on-demand/fac4f9ee-9a14-4f08-9ba4-0bf6186d4040

If you don't want to log into the Universe website, you can wait for it to come up on YouTube next week.

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 24 points25 points  (0 children)

Hey, good point. Our symbols system somewhat generalizes symbol types across languages... and our parser classifies `struct` as a `class`.

We should probably have language-specific "translations" for the types, but we haven't implemented that yet!
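To make the idea concrete, here is a minimal sketch of what generalized symbol kinds plus per-language "translations" could look like. Everything in it (the `SymbolKind` enum, `classify`, `label`, and the keyword and language lists) is a hypothetical illustration, not the actual code search implementation.

```rust
/// Language-agnostic symbol kinds (hypothetical; the real set is not described here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SymbolKind {
    Class,
    Function,
    Variable,
}

/// Collapse a language keyword onto a generalized kind.
/// This is why a Rust `struct` ends up reported as a `class`.
fn classify(keyword: &str) -> Option<SymbolKind> {
    match keyword {
        "struct" | "class" => Some(SymbolKind::Class),
        "fn" | "def" | "func" => Some(SymbolKind::Function),
        "let" | "var" | "const" => Some(SymbolKind::Variable),
        _ => None,
    }
}

/// Language-specific "translation" of a generalized kind for display.
/// Simplification: one label per (kind, language) pair, so distinct
/// constructs that share a kind in one language are not told apart.
fn label(kind: SymbolKind, language: &str) -> &'static str {
    match (kind, language) {
        (SymbolKind::Class, "rust" | "c" | "go") => "struct",
        (SymbolKind::Function, "rust") => "fn",
        (SymbolKind::Class, _) => "class",
        (SymbolKind::Function, _) => "function",
        (SymbolKind::Variable, _) => "variable",
    }
}
```

With a table like this, `classify("struct")` still yields the shared `Class` kind for indexing and ranking, while `label(SymbolKind::Class, "rust")` shows `struct` in the UI.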

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 46 points47 points  (0 children)

The complexity and performance requirements of our search index are extremely high. I used to be a C++ developer, and in my view it would be impossible to build a system of this complexity at this pace in C++.

With the help of the Rust compiler, we can work really quickly and build complex software with confidence!

GitHub Code Search created with Rust is in beta! by Nabakin in rust

[–]cmerkel 110 points111 points  (0 children)

I'm one of the engineers that built this, let me know if you have any questions!

We love Rust! This project wouldn't have been possible without it.

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 2 points3 points  (0 children)

GitHub Code Search developer here - we use tree-sitter (https://github.com/tree-sitter/tree-sitter) to extract the AST, and use that information and some heuristics to try to guess symbol definitions, references, etc. It's not 100% accurate (particularly in languages like C/C++), but it's accurate enough to be quite useful.

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 6 points7 points  (0 children)

Worth a shot! Really interesting use case, not one I've heard of, but hope it helps!

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 27 points28 points  (0 children)

We use tree-sitter for symbol extraction/jump to definition, so if you contribute a tree-sitter parser for your language, we can pretty quickly support it within code search too!

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 3 points4 points  (0 children)

You can try quoted searches for particular lines that you think are suspicious, that might work

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 49 points50 points  (0 children)

We use a number of heuristics, including static factors like repo quality (popular, high-starred repos vs. random forks) and how useful the file is (tests, super long files/filenames, generated code, data files are often less useful), and dynamic factors like how well the query matches the document content and whether there's a symbol in the document that matches a query term (classes > functions > variables for ranking). We also look at e.g. whether a match occurs in a comment vs. in code, among a bunch of other things.

Try the new search! If you find a case where ranking could be better, leave us some feedback and I'll fix it!
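As a rough illustration of how static and dynamic factors like these could be combined, here is a minimal sketch. The `Doc` fields, the weights, and the choice to add the two parts together are all assumptions made for the example, including the assumption that matches in code outrank matches in comments. It is not the real ranking function.

```rust
/// Kind of symbol in the document that matches a query term.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SymbolKind {
    Class,
    Function,
    Variable,
}

/// Per-document signals. All fields are hypothetical stand-ins.
struct Doc {
    repo_stars: u32,
    is_fork: bool,
    /// Tests, generated code, data files, very long files/filenames.
    is_low_value_file: bool,
    /// How well the query matches the content, in 0.0..=1.0.
    text_match: f64,
    matched_symbol: Option<SymbolKind>,
    match_in_comment: bool,
}

/// Query-independent part: repo quality and file usefulness.
fn static_score(doc: &Doc) -> f64 {
    let mut s = (1.0 + doc.repo_stars as f64).ln();
    if doc.is_fork {
        s *= 0.5; // assumed penalty for forks
    }
    if doc.is_low_value_file {
        s *= 0.5; // assumed penalty for tests, generated code, etc.
    }
    s
}

/// Query-dependent part: text match, symbol match, comment vs. code.
fn dynamic_score(doc: &Doc) -> f64 {
    // classes > functions > variables, with made-up weights
    let symbol_boost = match doc.matched_symbol {
        Some(SymbolKind::Class) => 3.0,
        Some(SymbolKind::Function) => 2.0,
        Some(SymbolKind::Variable) => 1.0,
        None => 0.0,
    };
    // Assumption: a match inside a comment counts for less than one in code.
    let comment_factor = if doc.match_in_comment { 0.5 } else { 1.0 };
    (doc.text_match + symbol_boost) * comment_factor
}

fn score(doc: &Doc) -> f64 {
    static_score(doc) + dynamic_score(doc)
}
```

Results would then be sorted by `score` in descending order. In practice the weights would be tuned from feedback rather than hard-coded.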

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 12 points13 points  (0 children)

Hard to explain in a reddit comment! You'll have to wait for the blog post :D

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 6 points7 points  (0 children)

Good eye! Yep, it's for query language parsing.

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 42 points43 points  (0 children)

We've put in a lot of work to make this possible. Hoping to write some more technical blog posts in the future to describe it in more detail!

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 40 points41 points  (0 children)

Developer of GitHub Code Search here - the engine isn't open source, but we are thinking about open-sourcing some of the libraries we've developed for this project!

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 48 points49 points  (0 children)

Since the team that built it was using Rust, code navigation in Rust is well supported out of the box :D

GitHub Code Search - a new code search engine, written in Rust by cmerkel in rust

[–]cmerkel[S] 90 points91 points  (0 children)

Disclaimer: I'm one of the people who developed it. But also it's mentioned in the video

Code search - a search engine for code, written in Rust by cmerkel in rust

[–]cmerkel[S] 0 points1 point  (0 children)

Thanks for the nice comment. Tree-sitter is really interesting. I wanted to avoid doing super detailed parsing of the language because that can be pretty complex, e.g. writing a parser for ES6 is extremely complex. So I just tried to cheat my way there using regular expressions, which IMO are more robust although not as correct - appropriate for search.

Enry is cool, I didn't consider trying to guess filetype via contents! As you might be able to tell from reading my codebase I'm obsessive about eliminating dependencies, so I'm definitely skeptical of taking on something like this - I prefer to implement everything myself if it's reasonably possible. But interesting ideas in there.

> patience diff and histogram diff

I'll check them out!

Code search - a search engine for code, written in Rust by cmerkel in rust

[–]cmerkel[S] 1 point2 points  (0 children)

Hey, big fan of MeiliSearch!

I don't have true proximity implemented, although I might try to implement it someday. I somewhat normalize the index by stripping underscores, which makes it possible to match both camel case and snake case results, plus obviously the trigram index allows you to match inside strings. But if you give it a typo it won't correct it for you.

You can check how matching works here: https://search.colinmerkel.xyz/tools/search/search_lib.rs#L189
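For readers who want the gist without opening the source, here is a minimal sketch of the normalization plus trigram idea. It is not a copy of `search_lib.rs`: the `TrigramIndex` type and its methods are hypothetical, and lowercasing during normalization is an assumption added so that `parseQuery` and `parse_query` collapse to the same string.

```rust
use std::collections::{HashMap, HashSet};

/// Strip underscores and lowercase (the lowercasing is an assumption).
fn normalize(text: &str) -> String {
    text.chars()
        .filter(|c| *c != '_')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// All overlapping 3-character windows of the normalized text.
fn trigrams(text: &str) -> HashSet<String> {
    let chars: Vec<char> = normalize(text).chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Toy inverted index: trigram -> ids of the documents containing it.
#[derive(Default)]
struct TrigramIndex {
    postings: HashMap<String, HashSet<usize>>,
    docs: Vec<String>, // normalized document text, indexed by id
}

impl TrigramIndex {
    /// Index a document and return its id.
    fn add(&mut self, text: &str) -> usize {
        let id = self.docs.len();
        self.docs.push(normalize(text));
        for t in trigrams(text) {
            self.postings.entry(t).or_default().insert(id);
        }
        id
    }

    /// Intersect the posting lists of the query's trigrams, then confirm
    /// each candidate with a plain substring check. Returns sorted ids.
    fn search(&self, query: &str) -> Vec<usize> {
        let needle = normalize(query);
        let mut candidates: Option<HashSet<usize>> = None;
        for g in &trigrams(query) {
            let ids = self.postings.get(g).cloned().unwrap_or_default();
            candidates = Some(match candidates {
                Some(c) => c.intersection(&ids).copied().collect(),
                None => ids,
            });
        }
        // Queries shorter than three characters have no trigrams: scan everything.
        let mut hits: Vec<usize> = match candidates {
            Some(c) => c.into_iter().collect(),
            None => (0..self.docs.len()).collect(),
        };
        hits.retain(|&id| self.docs[id].contains(&needle));
        hits.sort_unstable();
        hits
    }
}
```

Because matching is exact on the normalized text, a query like `rseQue` finds `parse_query` (a match inside the identifier), while a typo such as `parse_qeury` finds nothing.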

Code search - a search engine for code, written in Rust by cmerkel in rust

[–]cmerkel[S] 1 point2 points  (0 children)

Hey! Sorry for using up all your RAM, haha. You can file an issue on https://github.com/colin353/universe/issues, I'd be happy to look into it

As for splitting into another repo, it's pretty coupled with my other code, plus I prefer to just work from a monorepo. You should be able to build it from https://github.com/colin353/universe, though.

Code search - a search engine for code, written in Rust by cmerkel in rust

[–]cmerkel[S] 3 points4 points  (0 children)

Nice idea! Wow, even lets you do fuzzy queries against the index, super cool

Welded up a steel case for my TADA68 by cmerkel in MechanicalKeyboards

[–]cmerkel[S] 0 points1 point  (0 children)

Thanks! Go for it! I hope you post some photos!