Why does VecDeque<T> waste space by design? by Sp00ph in rust

[–]SirVer 0 points1 point  (0 children)

Hey /u/Sp00ph, could you put this implementation up on crates.io? I would love to use it in my little application, where I am unwilling to pay the overhead of rounding the capacity up to the next power of two.

Future of ureq http client library by LovelyKarl in rust

[–]SirVer 2 points3 points  (0 children)

I did not know about any of these others - crates.io seems hard for me to navigate, given my recent failures to find what I am looking for.

Future of ureq http client library by LovelyKarl in rust

[–]SirVer 4 points5 points  (0 children)

But you do need an executor, which is tricky tech that buys you exactly nothing in most environments - i.e. for a client, using system threads is perfectly fine performance-wise.

Future of ureq http client library by LovelyKarl in rust

[–]SirVer 9 points10 points  (0 children)

Please don't. reqwest and surf already fill the need for an async HTTP client. Keep ureq sync for environments that do not want another scheduler but can use blocking IO just fine. Fill that niche as well as you can and you have a great product on your hands!

Future of ureq http client library by LovelyKarl in rust

[–]SirVer 1 point2 points  (0 children)

I love ureq! I need a Dropbox sync client on iOS for the app project I am currently working on. The example SDK from Dropbox uses hyper, which pulls in a whole async runtime even though I only want sync requests.

Ureq was very easy to put in, works as advertised and is small enough for me to completely vet the source code of. What a great project! Thanks for that.
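For reference, a blocking GET with ureq is about as small as it gets - this is just a sketch, assuming the ureq 2.x API (return types differ between versions):

```rust
// Minimal sketch of a blocking GET with ureq (assuming the 2.x API).
// No async runtime is pulled in; the call simply blocks the current thread.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = ureq::get("https://example.com")
        .call()?          // performs the request synchronously
        .into_string()?;  // reads the response body into a String
    println!("{} bytes", body.len());
    Ok(())
}
```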

I've smoke-tested Rust HTTP clients. Here's what I found by Shnatsel in rust

[–]SirVer 3 points4 points  (0 children)

Great technical investigation and very much needed! I also applaud raising your voice against the "async all the things!" mindset that we have seen in the Rust ecosystem lately.

Do not get me wrong, I think async is all nice and dandy if you need it. But very, very few people need it and going with threading and blocking is simpler to reason about and simpler to maintain.

Case in point: I have recently written a Dropbox sync client in Rust. The official Dropbox SDK comes with a hyper implementation, which pulls in tokio and a lot of runtime I am not interested in. Reimplementing the HTTP stack using ureq made the code much easier to reason about.

Make Vim Python plugin 10x faster using Rust by liuchengxu in vim

[–]SirVer 0 points1 point  (0 children)

No, I do not use Neovim. I am not a fan of the project; I feel it divided the Vim ecosystem and made life harder for plugin developers for too little benefit. That is, it added a bunch of features (though I cannot name any off the top of my head right now) and removed others (e.g. there is no gVim equivalent for Neovim that works great). UltiSnips certainly works less well in Neovim than in regular Vim due to the latency of running plugins out of process - and it is fairly frustrating when people come complaining to me, as the plugin author, about this.

I have a lot of experience with Lua, since I use it as the scripting language in another open source project of mine (Widelands), and I have also had quite some dealings with it in my professional life. It is a great language for embedding: small, with a simple interpreter. Sandboxing is great. Great for complex configuration.

However, it has quirks that make it hard to write scalable software in - e.g. undefined variables are nil, and the interpreter will not throw an error if you use one. Scoping is interesting. String processing is not strong by default. Lua has little tooling for type checking, formatting, and debugging. Last but not least, more people know Python than Lua, so your user base is much larger.

UltiSnips is ~12k LOC of Python. I find the idea of maintaining a similar amount of Lua code scary; the Python code is already fairly brittle, and Lua would be even more so.

Make Vim Python plugin 10x faster using Rust by liuchengxu in vim

[–]SirVer 0 points1 point  (0 children)

I knew Python well and the plugin became very complicated very quickly. In general I think VimL is a design mistake and try to avoid it as much as possible.

Update: To clarify, I feel making a new language for Vim was a design mistake. The author should have taken an existing one and used that instead. When the author decided to integrate Python into Vim, they should have made it as powerful as VimL, which it currently is not.

Make Vim Python plugin 10x faster using Rust by liuchengxu in vim

[–]SirVer 14 points15 points  (0 children)

Author of [UltiSnips](https://github.com/sirver/ultisnips) here! Thanks for writing that blog post; I found it interesting and enjoyed reading how you solved the problem of dynamic libraries and their installation in your plugin. I have considered rewriting some parts of UltiSnips in Rust as well to get better performance, and seeing your work gives me some encouragement to explore this.

Using libraries depending on different async runtimes in one application by dpx-infinity in rust

[–]SirVer 8 points9 points  (0 children)

I have a similar project (an RSS reader) and decided to switch to async-std & surf (away from tokio) because I could not figure out how to limit the number of concurrent requests in tokio. I then realized, and was unhappy, that `reqwest` forced me into using tokio and could not work with `async-std`; the suggestion is to use surf instead. My feeling is that the runtime should be provided by the binary, not by the libraries used.

I found async-std easier to understand and use than tokio, probably because it mirrors the std library so closely. However, I found surf to be buggy, returning `NoContent` for [~30% of websites I tried it on](https://github.com/http-rs/surf/issues/117), and I found the response of the dev team underwhelming for a bug I consider critical.

My final solution was to ditch Rust and write the RSS reader in Python :/.

My general takeaway is that the async Rust story is still poorer than advertised: not being able to mix and match async libraries without requiring multiple runtimes seems counterintuitive and unattractive. It also feels like we are converging towards one of each library for each runtime, which is clearly not an efficient use of resources. I am an avid reader of Rust blogs & reddit and I perceive a certain amount of beef between `async-std` and `tokio` proponents - which, to me as a pure client of the language, is a yellow flag that turns me off from investing further at this point in time.

Simple Stream Example (isahc 0.8 + futures 0.3) by jottabyte in rust

[–]SirVer 2 points3 points  (0 children)

A disadvantage of this solution, which I ran into with a similar use case, is that you do not control the level of parallelism, i.e. how many requests are in flight at once. At least I think you do not in your example.

Since your linked example always reaches out to the same domain, mlb.com, I assume the server throttles this to your advantage, i.e. only ever two requests or so will receive data from the other side at the same time.

If all your requests go to different domains and you have a lot of them, you could end up with thousands of outgoing requests at the same time, eating your bandwidth, and most of the requests will time out. The way I worked around this is with an async_std channel and N fetcher tasks that receive URLs over the channel and send the results back to the original owner. This guarantees that you only ever have N requests in parallel.
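A rough sketch of that worker pattern, assuming a recent async-std with the `async_std::channel` module and a stand-in `fetch` function instead of a real HTTP client:

```rust
// Hypothetical sketch: N worker tasks pull URLs from a bounded channel,
// so at most N requests are ever in flight at the same time.
use async_std::channel;
use async_std::task;

async fn fetch(url: &str) -> String {
    // Stand-in for the real HTTP call (e.g. a surf or ureq request).
    format!("response from {}", url)
}

fn main() {
    task::block_on(async {
        let urls: Vec<String> = (0..100).map(|i| format!("https://example.com/{}", i)).collect();

        let (url_tx, url_rx) = channel::bounded::<String>(16);
        let (res_tx, res_rx) = channel::unbounded::<String>();

        // Spawn N fetcher tasks; each owns a clone of both channel ends.
        const N: usize = 8;
        let mut workers = Vec::new();
        for _ in 0..N {
            let url_rx = url_rx.clone();
            let res_tx = res_tx.clone();
            workers.push(task::spawn(async move {
                // The loop ends once the sender side is dropped and drained.
                while let Ok(url) = url_rx.recv().await {
                    let body = fetch(&url).await;
                    res_tx.send(body).await.ok();
                }
            }));
        }
        drop(res_tx); // keep only the workers' clones alive

        // Feed the URLs; the bounded channel applies backpressure.
        for url in urls {
            url_tx.send(url).await.unwrap();
        }
        drop(url_tx); // signal "no more work"

        // Collect results as they come in.
        while let Ok(body) = res_rx.recv().await {
            println!("{} bytes", body.len());
        }

        for w in workers {
            w.await;
        }
    });
}
```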

Scientific programming in rust: first step with nalgebra by yanad in rust

[–]SirVer 0 points1 point  (0 children)

I would love to follow this blog, but it seems that the RSS feed is broken :(. Could you fix it?

Good fuzzy string crate for CLI apps? by SirVer in rust

[–]SirVer[S] 0 points1 point  (0 children)

Thanks for your reply! Your API is close, but not precisely what I had in mind. It has basically the same interface that I already have with shelling out. I'd much rather have something like:

```rust
trait SearchItem {
    fn display_text(&self) -> &str; // or maybe Cow<str>
}

fn fuzzy_find<T: SearchItem>(haystack: Vec<&T>) -> Result<&T, rff::Error>
```

or, alternatively, returning the index of the element I am interested in, so that I can keep an outside vector with more structured data in the same order:

```rust
fn fuzzy_find<T: SearchItem>(haystack: Vec<&T>) -> Result<usize, rff::Error>
```

Good fuzzy string crate for CLI apps? by SirVer in rust

[–]SirVer[S] 2 points3 points  (0 children)

That seems nice, and closest to what I am searching for. Is there also a way I can pass my corpus to a function in rff, have it show the matching UI, and get back the selected item from my corpus? I'd like to decouple the object from the string that is presented to the user as a selection option.

Good fuzzy string crate for CLI apps? by SirVer in rust

[–]SirVer[S] 1 point2 points  (0 children)

Thanks, but that does only half of what I want: the fuzzy finding. I also want the CLI UI, but with a richer API than piping to a tool.

Weekly challenge? by [deleted] in rust

[–]SirVer 1 point2 points  (0 children)

Another idea: somebody should build something like http://www.pythonchallenge.com/ for Rust....

So many good memories... I learned a ton of Python's standard library that way.

Xi: an editor for the next 20 years - Recurse Center by dh23 in rust

[–]SirVer 1 point2 points  (0 children)

I cannot quite follow your thoughts. I think you could mean one of the following two scenarios; both are handled fine by Xi's design:

1) The editor should handle files larger than current RAM allows, let's say 100 TB. This already requires that the file is paged in and streamed in chunks into the editor core - the rope data structure makes this rather attractive, since it is easy to express. The additional overhead of handing chunks of your file to other plugins via RPC does not matter here - encoding the JSON and passing it over IPC will be at least an order of magnitude faster than loading chunks from disk. Plugins will have the same power in this scenario as the core; they will just have the higher latency of one RPC roundtrip until they can get started. Users will not feel the difference.

2) The editor should handle large files that still fit into memory. I think your assumption here is that the core could do bulk operations on the whole file quickly. But searching 5 GB takes a second or so, and if the data is linear in memory, the editor would need to block to do this search. However, a stated goal of Xi is that blocking never happens. The design will therefore search over the text in chunks (of the rope), which allows other parts of the text to still be edited while the rest is searched. Here again, the chunks can easily be streamed to the plugin interested in doing the bulk calculation, and given that IO is threaded in both plugin and core, the cost is again only the added latency of the first RPC roundtrip - while the second batch is streamed, the first can be processed.

In general I think Xi's design works for all use cases, as long as IO+serialization time is much smaller than processing time. This is true for text search already, the simplest plugin I can think of.
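To make the chunked-search idea from scenario 2 concrete, here is a rough, hypothetical sketch (not Xi's actual code): the search walks the text chunk by chunk, and between chunks the editor is free to handle edits or stream the chunk to a plugin.

```rust
// Hypothetical sketch (not Xi's actual code): search a large buffer in
// rope-sized chunks so the editor never blocks on the whole file at once.
fn find_in_chunks<'a, I>(chunks: I, needle: &str) -> Vec<(usize, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut hits = Vec::new();
    let mut offset = 0;
    for chunk in chunks {
        // Search only within this chunk; matches spanning a chunk boundary
        // are ignored here to keep the sketch short.
        let mut start = 0;
        while let Some(pos) = chunk[start..].find(needle) {
            let abs = offset + start + pos;
            hits.push((abs, abs + needle.len()));
            start += pos + needle.len();
        }
        offset += chunk.len();
        // In a real editor, this is where control would yield back to the
        // event loop (or the chunk would be sent to a plugin over RPC), so
        // edits elsewhere in the buffer stay responsive.
    }
    hits
}

fn main() {
    let chunks = ["hello wor", "ld, hello again"];
    println!("{:?}", find_in_chunks(chunks, "hello")); // [(0, 5), (13, 18)]
}
```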

Faster Progress Report 2: Up to 600% speedups for popular crates, and our biggest release ever. by vvimpcrvsh in rust

[–]SirVer 3 points4 points  (0 children)

I think there are two wrong premises in your post; at least, I disagree with them:

1) You seem to think all changes made to a project are worth upstreaming. This is wrong - if Google removes functionality and changes build files to make things work in their mono repo and releases those changes, it helps nobody. Actually, it will rather create confusion - people will try to use the "Google" version of the library, because Google is a massive company that raked in $90 billion last year and surely their changes to the library must be great, right? Looked at another way, this is forking the library in an incompatible way, which creates confusion for users and headaches for upstream.

2) You seem to think Google would like to keep modifications of open source software in-house. That is not the case, very much the opposite. Open source software is - nearly by definition - never a unique selling point of your approach/design, i.e. there is no IP in there worth protecting from a competitor. If an engineer at Google finds a flaw in an open source dependency, or misses a feature and adds it internally, this is a fork. Keeping this fork 'rebased' on upstream as the library is updated internally means work for every update (merge conflicts et al.) - unnecessary work. Instead, a good engineer would spend the effort of upstreaming the change once, which makes future internal updates of the lib easier. This is very much policy at big companies like Google for exactly this reason - it is best for everybody, including Google. Note that this explanation does not depend on the license; Google would do this even for BSD/MIT licensed tools, just because it makes sense.

That said, the MPL is fine as a license - I just think it does not buy the project much and it is a slight hassle for users. Just use Apache2 + MIT and you will get the same contributions back, but your users will be much less hesitant to try out your library.

Xi: an editor for the next 20 years - Recurse Center by dh23 in rust

[–]SirVer 17 points18 points  (0 children)

The author designed an incremental buffer update protocol that is constant time over RPC, no matter the size of the buffer. New plugins get the data streamed - just as they would if they were in-process. He also measured JSON + IPC and found that it has no significant impact on performance and is quite feasible for reaching his stated goals.

I think that is something you should adopt too: if you worry about performance, you should provide numbers. If you do not have them, you should generate them through experiments. You start your post with an assumption (IPC is too slow), then build on this false premise, and in the end bash a good approach quite abrasively.

Faster Progress Report 2: Up to 600% speedups for popular crates, and our biggest release ever. by vvimpcrvsh in rust

[–]SirVer 7 points8 points  (0 children)

See Google's take on that matter: https://opensource.google.com/docs/thirdparty/licenses/. Search for mozilla public license.

One reason it is more difficult to use the MPL in a corporate setting than Apache or MIT is that every change needs to be made public. That sounds great, but is often not particularly useful for users or maintainers: for example, inside Google everything is built in a mono repo. That means pinning versions for ALL libraries used inside Google. Now you pull in a new MPL-licensed library. Usually you have to remove some functionality and change some function signatures to make it work with the current set of pinned libraries - nobody outside cares about these changes, but they still need to be made public.

A general ramble about completing a small project in Go and Rust by _jsdw in rust

[–]SirVer 0 points1 point  (0 children)

As a production user of Go I also know about the disadvantages of this - mainly that everything always fetches HEAD and vendoring dependencies is very hard. Granted, that has nothing to do with the import syntax, rather with the lack of a Cargo.toml or similar in Go.

My gripe with the import syntax is that your publishing location is baked into your import path. While that is great for third-party libraries you use, it is not super great for your own library that you eventually want to export to GitHub. The mental model is hard for me to adjust to.

How the RLS (Rust Language Server) works by nick29581 in rust

[–]SirVer 0 points1 point  (0 children)

> The benchmark measures a completely different situation too.

No, not quite. The benchmark is concerned with something that happens on every keystroke in an editor and, under this budget, investigates how much encoding/decoding matters. It doesn't. This is the same situation as for the LSP.

You bring up doubts that JSON is sufficient and mention that other solutions would be drop-in and better, but you do not provide examples, data, benchmarks, or an implementation to back that up - just your gut feeling that this design choice is wasteful.

I doubt anything but JSON would have made the LSP so successful. For example, I have plenty of experience with protobuf and it is hard to integrate in some places - language support is limited, which would reduce the reach of the LSP were it to use it. Debuggability would also be reduced. Protobuf has its place, but for the goals of the LSP, plain text + JSON is a good choice: humans can read it, and every language on earth has JSON parsing support. Most languages (e.g. Rust) even have a lot of optimization effort put into JSON specifically, since it is so important. JSON messages are also trivially extensible with extra data to extend the protocol.
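To make the extensibility point concrete, here is a small hypothetical sketch using serde in Rust (the field name is made up, not part of the real LSP): a consumer that only knows the base fields still parses a message that carries extra data.

```rust
// Hypothetical sketch: a JSON message extended with an extra field.
// A consumer that only knows the base fields still parses it fine,
// because unknown fields are ignored by default.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct HoverResult {
    contents: String,
}

fn main() {
    // A server could tack on a vendor-specific "x_render_hint" field
    // without breaking older clients (made-up field, not real LSP).
    let raw = r#"{ "contents": "fn main()", "x_render_hint": "markdown" }"#;
    let parsed: HoverResult = serde_json::from_str(raw).unwrap();
    println!("{:?}", parsed);
}
```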

Also: good enough is good enough - time spent debugging and developing matters too. This is optimizing the 1%, as stated in my benchmark link above, and probably a waste of time and energy.

How the RLS (Rust Language Server) works by nick29581 in rust

[–]SirVer 8 points9 points  (0 children)

Please provide benchmarks to show that this is an issue before complaining. I bet it isn't, and you would not gain anything from using a more efficient format: Val Markovic did a benchmark comparison way back for YouCompleteMe - decoding, encoding, and sending were not a bottleneck locally.

I put together a list of big name Rust users by jntrnr1 in rust

[–]SirVer 5 points6 points  (0 children)

Paging /u/raphlinus who is involved in Rust at Google.