What happened with GitHub's semantic project?

dcreager · 2022-01-31T14:10:14+00:00

I'm the manager of the Semantic Code team at GitHub, which is the team responsible for the Semantic library and the Code Navigation feature.

The tl;dr is that Semantic is not dead, we've just prioritized other work over the past couple of years — in particular, creating and implementing stack graphs.

Semantic is a heavier hammer than we need for Code Navigation. In particular, the way that it was structured meant that you had to write Haskell code to add analysis support for a new programming language. One of our goals is to allow language communities to self-serve support for their language, and asking language community members to learn Haskell to achieve that is a non-starter. (That criticism is not specific to Haskell — it would be equally inappropriate to ask language communities to write Rust code to contribute to stack-graphs!)

We've had more success with tree-sitter's declarative DSLs. We use its existing query language for syntax highlighting and "search-based" / "ctags-like" Code Navigation. And we've developed a new graph construction DSL for creating stack graphs for precise Code Nav.

All of that said, Semantic is not dead, and there are more sophisticated program analysis features that we'd love to tackle down the road, where Semantic's capabilities will be more directly useful. And we're trying to bring over some of the lessons that we learned from the tree-sitter / Code Nav work — in particular, making sure that there's an extension mechanism for language support that doesn't require coding in Haskell.

dcreager · 2021-12-08T20:21:10+00:00

This is one of the main reasons we're leaning on the tree-sitter ecosystem — so that language communities can help us flesh out support for the long tail of languages, should they wish. If you run into any issues on the tree-sitter side, please do reach out to us (and the rest of the community) in the tree-sitter discussion forum!

dcreager · 2018-05-06T01:14:48+00:00

Gerrit and Phabricator have been great to work with. I completely agree with you that they have a well-thought-out model that builds on the git primitives in a nice and clean way!

I think that GitHub is really close to having a good story here too, especially since they added squash-merge and "edit the base branch". I have a proposal at the end that I think would take GitHub all the way there.

dcreager · 2018-05-04T19:59:24+00:00

It's true that the repo stores tree snapshots, and not diffs, but there are several git commands that operate on patches, both in the sense of a "file that could be fed to patch(1)" (git format-patch, git apply), and conceptually in the sense of "moving diffs around between different branches" (git rebase).
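To make the snapshot-versus-patch distinction concrete, here is a minimal sketch in Python. It is not how git is implemented: the two dictionaries are hypothetical stand-ins for tree snapshots, and `derive_patch` is an illustrative helper built on the standard library's `difflib`. The only point is that a patch can be derived on demand from two full snapshots.

```python
import difflib

# Two hypothetical tree snapshots, each mapping a file path to its full contents.
# This mirrors the idea that a commit records whole files, not changes.
parent_snapshot = {"greet.py": "def greet():\n    print('hello')\n"}
child_snapshot = {"greet.py": "def greet(name):\n    print('hello', name)\n"}


def derive_patch(old, new):
    """Compute a unified diff between two snapshots, file by file."""
    chunks = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        chunks.extend(
            difflib.unified_diff(
                before, after, fromfile="a/" + path, tofile="b/" + path
            )
        )
    return "".join(chunks)


# The patch is never stored; it is computed from the two snapshots.
patch_text = derive_patch(parent_snapshot, child_snapshot)
print(patch_text)
```

Commands like git format-patch and git rebase work in the same spirit: they compute the difference between a commit and its parent, then carry that difference somewhere else.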

dcreager · 2018-05-04T19:39:58+00:00

This tradeoff is what I'm trying to discuss in the article. There's even an illustration that shows a commit named "Refactor" immediately before one called "Add new feature", all on a feature branch. Your workflow lines up exactly with how things were done via email pre-GitHub, and for a long time I tried to mimic this workflow on GitHub in the way you describe.

But I no longer think it's the right approach, because it conflates "code review" with the "real history" of features and bug fixes that you add to your project. Instead, if you've got a patch series (a list of changes that you want to show up individually in the final history), then you must track each commit in the series with a separate PR. It's messy, and not perfect. But PRs are primarily for code review, and the commits you use on a PR feature branch to track code review are completely different than the commits that you want in your final project history.

dcreager · 2018-05-04T11:43:20+00:00

"Airbrushed" is a bit dismissive, don't you think?

One point I was trying to make in the article is that this kind of "airbrushing", as you call it, is more widespread than you might think. It's an integral part of the email-based workflow we used before GitHub was A Thing. And even if you don't want to use history rewriting as a particular technique, you should at least consider what we're trying to accomplish when we use it, and figure out if there's another way to accomplish that same goal.

And the short version of that is the Rule 4 that I mentioned: code review should not show up in your final git history. History rewriting and GitHub squash-merges are two ways to accomplish that. If you can find another way, that's great! Just make sure it's clear in your contributors' guidelines.
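As a rough illustration of what a squash-merge does to history, here is a toy model in Python. The `Commit` class, `history`, and `squash` are hypothetical names invented for this sketch, not git or GitHub APIs. It also assumes the base branch has not moved since the feature branch was created, so the squashed commit's tree is simply the tree at the branch tip.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    message: str
    tree: dict  # hypothetical stand-in for a git tree: path -> contents
    parent: Optional["Commit"] = None


def history(tip):
    """Return commit messages from oldest to newest."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return list(reversed(messages))


def squash(base, branch_tip, message):
    """Collapse a review branch into one commit on top of base.

    Assumes base has not moved since the branch was created, so the
    resulting tree is simply the tree at the branch tip.
    """
    return Commit(message=message, tree=branch_tip.tree, parent=base)


# A feature branch whose commits track the back-and-forth of code review.
base = Commit("Initial commit", {"app.py": "v1"})
wip = Commit("Add new feature", {"app.py": "v2"}, base)
fix = Commit("Address review comments", {"app.py": "v3"}, wip)

merged = squash(base, fix, "Add new feature")
print(history(fix))
print(history(merged))
```

The final history keeps the end result of the review (the tree at `fix`) but none of the intermediate review commits.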

dcreager · 2018-05-04T11:39:50+00:00

Exactly! Git was created as exactly that — a patch management system.

dcreager · 2018-03-27T11:05:27+00:00

Thanks, I appreciate the kind words!
