LSP is pretty ok. Better than the before times I suppose. Although C++ has long had good IDE support so it hasn't affected me too much.
I have a maybe wrong and bad opinion that LSP is actually at the wrong level. Right now every language needs to implement its own LSP server from scratch. These implementations are HUGE and take YEARS to develop. rust-analyzer is over 365,000 lines of code. And every language has its own massive, independent implementation.
When it comes to debugging, all native languages support common debug symbol formats: PDB for Windows and DWARF for *nix-y things. Any compiled language that uses LLVM gets debug symbols and rich debugging "for free".
I think there should be a common Intellisense Database file format for providing LSP or LSP-like capabilities. Ok sure there will still be per-language work to be done to implement the IDB format. But you'd get like 95% of the implementation for free for any LLVM language. And generating a common IDB format should be a lot simpler than implementing a kajillion LSP protocols.
My dream world has a support file that contains: full debug symbols, full source code, and full intellisense data. It should be trivial to debug old binaries with full debugging, source, and intellisense. This world could exist and is within reach!
1. I use zero languages that use either PDB or DWARF that are not named "C".
2. You are either overestimating the level of detail available in PDB/DWARF or underestimating the massive amount of language-specific work needed for even basic features (e.g. methods, which lack any cross-language ABI) given just what PDB/DWARF give you.
3. What LSP provides and what PDB/DWARF offer are only very loosely related. Consider the case of writing function1, then (without compiling) writing function2 that calls function1. It is typical for an LSP to offer completion and argument information when writing out the call for function1. That's not something you get "for free" with PDB/DWARF.
> You are either overestimating the level of detail available in PDB/DWARF
Uhhh. I didn’t say PDB/DWARF already have the necessary information. In fact I even proposed a new file format! I suggest you re-read what I said.
> Consider the case of writing function1, then (without compiling) writing function2 that calls function1. It is typical for an LSP to offer completion and argument information when writing out the call for function1. That's not something you get "for free" with PDB/DWARF.
What do you think LSP servers do in the background? They’re effectively compilers that are CONSTANTLY compiling the code.
Amusingly, rust-analyzer takes longer to bootstrap than a full clean build. Maybe it’s not as parallel, I’m not sure.
I disagree. A compiler for batch building programs and a compiler for providing as much semantic information about incomplete/incorrect/constantly changing programs are completely different tasks that require completely different architectures and design considerations.
No. Actually, "interactive" frontends generally produce better error messages even in batch compilation mode. Yes, it may make the frontend part of batch compilation slightly slower, but it won't turn Go into Rust (or Haskell or C++).
And there is always the option of stopping batch mode at the first error.
First of all, a compiler for a 100% correct program definitely has all the necessary information for robust intellisense. They don’t currently save all the data, but it should exist.
So the only real question is whether they can support the 0.01% of files that are incomplete and changing?
I’ll readily admit I am not a compiler expert. So I’m open to being wrong. But I certainly don’t see why not. Compilers already need to support incorrect code so they can print helpful error messages, including different errors spread throughout a single file.
It may be that current compilers are badly architected for incremental intellisense generation. But I don’t think that’s an intrinsic difference. I see no reason that the tasks require “completely different architectures”.
> First of all, a compiler for a 100% correct program definitely has all the necessary information for robust intellisense.
It doesn't. Intellisense is supposed to work on 100% incorrect and incomplete programs. To the point that it should work in syntactically invalid code.
> Intellisense is supposed to work on 100% incorrect and incomplete programs.
Correct. I literally discussed this scenario in my comment!
If the program compiles successfully then the compiler has all the information it needs for intellisense. If the program does NOT fully compile then the compiler may or may not be able to emit sufficient intellisense information. I assert that compilers should be able to support this common scenario. It is not particularly different from needing to support good, clear error messages in the face of syntactically invalid code.
> These are two very different tasks quite at odds with each other
Are they? I feel like intellisense is largely a subset of what a compiler already has to do.
I’d say the key features of an LSP are knowing the exact type of all symbols, goto definition, and auto-complete. The compiler has all of that information.
Compilers produce debug symbols which include some of the information you need for intellisense. I wrote a PDB-based LSP server that can goto definition on any function call for any language. Worked surprisingly well.
If you wanted to argue that intellisense is a subset of compiling and it can be done faster and more efficiently I could buy that argument. But if you’re going to declare the tasks are at odds with one another I’d love to hear specific details!
On the efficiency angle, I think a big difficulty here that isn’t often discussed is that many optimization strategies relevant to incremental compilation slow down batch compilation, and vice versa.
For example, arena allocation strategies (i.e internment of identifiers and strings, as well as for allocating AST nodes, etc) is a very effective optimization in batch compilers, as the arenas can live until the end of execution and therefore don’t need “hands on” memory management.
However, this doesn’t work in an incremental environment, as you would quickly fill up the arenas with intermediary data and never be deleting anything from them. This is one reason rust-analyzer reimplements such a vast amount of the rust compiler, which makes heavy use of arenas throughout.
As essentially every programming language developer writes their batch compiler first without worrying about incremental compilation, they can wind up stuck in a situation where there’s simply no way to reuse their existing compiler code for an IDE efficiently. This effect tends to scale with how clever/well-optimized the batch compiler implementation is.
I think the future definitely lies in compilers written to be “incremental first,” but this requires a major shift in mindset, as well as accepting significantly worse performance for batch compilation. It also further complicates the already very complicated task of writing compilers, especially for first-time language designers.
That's a great point about allocation/memory management. As an example, rust-analyzer needs to free memory, but rustc's `free` is simply `std::process::exit`.
> I think the future definitely lies in compilers written to be “incremental first,” but this requires a major shift in mindset, as well as accepting significantly worse performance for batch compilation. It also further complicates the already very complicated task of writing compilers, especially for first-time language designers.
I'm in strong agreement with you, but I will say: I've really grown to love query-based approaches to compiler-shaped problems. Makes some really tricky cache/state issues go away.
> Are they? I feel like intellisense is largely a subset of what a compiler already has to do.
They are distinct! Well, not just intellisense, but pretty much everything. I'll paraphrase this blog post, but the best way to think about the difference between a traditional compiler and an IDE is that compilers are top-down (e.g., you start compiling a program from a compilation unit's entrypoint, a `lib.rs` or `main.rs` in Rust), but IDEs are cursor-centric—they're trying to compile/analyze the minimal amount of code necessary to understand the program. After all, the best way to go fast is to avoid unnecessary work!
> If you wanted to argue that intellisense is a subset of compiling and it can be done faster and more efficiently I could buy that argument. But if you’re going to declare the tasks are at odds with one another I’d love to hear specific details!
Beyond the philosophical/architectural difference I mentioned above, compilers typically have a one-way mapping from syntax to semantics, but to support things like refactors or assists, you often need to do the opposite: go from semantics to syntax. For instance, if you want to refactor from a struct to an enum, you often need to find all instances of said struct, make the semantic change, then construct the new syntax tree from the semantics. For simple transformations like a struct to an enum, a purely syntax-based approach might work (albeit, at the cost of accuracy if you have two structs with the same name), but you start to run into issues when you consider traits, interfaces (for example: think about how a type implements an interface in Go!), or generics.
It doesn't really make sense for a compiler to support the above use cases, but they're _foundational_ to an IDE. However, if a compiler is query-centric (as rustc is), then it's pretty feasible for rustc and rust-analyzer to share, for instance, the trait solver or the borrow checker (we're planning/scoping work on the former right now).
Other comments have addressed many aspects of your comment. The "constantly changing" part is also an important feature for making recompilation more efficient than recompiling from scratch each time. You can read about it here: https://rustc-dev-guide.rust-lang.org/queries/query-evaluati...
There is a recording of a talk on YouTube from Niko Matsakis that goes into the motivation.
In conclusion, you don't really want to optimize for the batch use case, even outside of IDE support.
Nonsense. Given that the end user of both is a human, you want the compiler that builds program to know as much about semantics to aid in fixing buggy/incomplete/incorrect programs.
rust-analyzer is an example of what not to do, which is reimplementing a compiler frontend. Ideally it should be the same frontend the "real" compiler is using. Of course this has its own problems, as the Haskell LSP this post is about shows, since compilers are not written to be used "interactively".
> Any compiled language that uses LLVM gets debug symbols and rich debugging "for free".
That doesn't hold for C++ and much less for any language even "less C" than C++. Like languages using a GC, e.g. Roc https://www.roc-lang.org/
You need support for each language in the debugger, as symbols do not contain semantics. As users we know, for example, that `foo::bar` and `foo::baz` are methods of the same class `foo`; the debugger doesn't.
The problem with GCs is that pointers must contain some additional information (like an extra bit for mark and sweep); they are not "just" pointing to some memory. Without "knowing" that, the debugger cannot follow the pointer to its target. Or consider tricks with unboxed ints, like making them 1 bit smaller and using the low bit as a tag for "this is not a pointer, but an integer".
> I have a maybe wrong and bad opinion that LSP is actually at the wrong level. Right now every language needs to implement a from scratch implementation of their LSP server. These implementations are HUGE and take YEARS to develop. rust-analyzer is over 365,000 lines of code. And every language has their own massive, independent implementation.
rust-analyzer is a big codebase, but it's also less problematic than the raw numbers would make you think. rust-analyzer has a bunch of advanced functionality (term search https://github.com/rust-lang/rust-analyzer/pull/16092 and refactors), assists (nearly 20% of rust-analyzer!), and tests.
> I think there should be a common Intellisense Database file format for providing LSP or LSP-like capabilities. Ok sure there will still be per-language work to be done to implement the IDB format.
> But you'd get like 95% of the implementation for free for any LLVM language. And generating a common IDB format should be a lot simpler than implementing a kajillion LSP protocols.
I wouldn't say 95%. SCIP/LSIF can do the job for navigation, but that's only a subset of what you want from an IDE. For example:
- Intellisense/autocomplete is extremely latency sensitive where milliseconds count. If you have features like Rust/Haskell's traits/typeclasses that allow writing blanket implementations like `impl<T> SomeTrait for T`, it's often faster to try to solve that trait bound on-the-fly than storing/persisting that data.
- It'd be nice to handle features like refactors/assists/lightbulbs. That's going to result in a bunch of de novo code that needs to exist outside of a standard compiler, not counting all the supporting infrastructure.
> My dream world has a support file that contains: full debug symbols, full source code, and full intellisense data.
Rust tried something similar in 2017 with the Rust Language Server (RLS, https://github.com/rust-lang/rls). It worked, but most people found it too slow because it was invoking a batch compiler on every keystroke.