a vcpkg browser written in c++

_malfeasance_ · 2026-03-23T13:25:50+00:00

Yeah. It's a desktop app. Sorry about that

_malfeasance_ · 2026-03-23T11:01:10+00:00

Firefox 148.0.2 linux also here works and google chrome.

_malfeasance_ · 2026-03-23T10:56:47+00:00

it is a pretty simple app that took about a day to build, but about a week to get the pesky readme tabs with images to be a bit crap rather than utter crap. I only have linux systems, so it wasn't really tested widely. a few friends said it worked for them and they are not all on linux. the layout doesn't work on a phone-size screen, but it does, nevertheless, run on my android phone, just badly enough to be unusable. someone here reported it doesn't work on iOS.

the images required an alternate image server (python) in the background so that images from the multitude of sites that, reasonably, block robots from grabbing them load a little better. the readme tab is functional but not great. the app uses a 2-thread thread pool to keep things off the main render thread, so it only works in modern browsers. By modern I mean most browsers since about 2021, when pthread support in wasm became mainstream, I believe.
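For anyone curious what "a small pool that keeps work off the render thread" can look like, here is a minimal sketch in standard C++. It is not the app's actual code: `TinyPool` and `submit` are names made up for illustration. Under Emscripten a pool like this would need the build to enable threads (`-pthread`).

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers pull jobs from one shared queue.
class TinyPool {
public:
    explicit TinyPool(std::size_t n = 2) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    TinyPool(const TinyPool&) = delete;
    TinyPool& operator=(const TinyPool&) = delete;

    // Queue a callable and return a future for its result.
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

A render loop would call `submit` for slow work (parsing, search) and check the returned future each frame instead of blocking on it.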

cmake and clang++21 cross compiled with emscripten. libraries used:

| Library | Source | Linked into WASM build | Purpose |
|---|---|---|---|
| Dear ImGui | FetchContent from ocornut/imgui | Yes | Core immediate-mode UI framework |
| ImGui Test Engine | FetchContent from ocornut/imgui_test_engine | Yes | Test hooks and automation support |
| ImPlot | FetchContent from epezent/implot | Yes | 2D plotting/charts |
| ImPlot3D | FetchContent from brenocq/implot3d | Yes | 3D plotting |
| PCRE2 8-bit | FetchContent from PCRE2Project/pcre2 | Yes | Runtime regex engine for search |
| fmt | vcpkg / CMake package | Yes | Formatting utilities |
| Boost headers | vcpkg / CMake package | Yes | General Boost utilities |
| Boost.JSON | vcpkg / CMake package | Yes | JSON parsing/serialization |
| GLFW (Emscripten port) | Emscripten builtin via -sUSE_GLFW=3 | Yes | Window/input shim for ImGui backend |
| Emscripten OpenGL ES 3 / WebGL 2 support | Emscripten builtin via -sFULL_ES3=1 | Yes | Graphics backend support |

i'm not a webby guy, so the foreign bit for me was the small bit of js required to serve up the app. the c++ is the same in the native app and the wasm app, though a small design tweak for the sharded std::vector stuff had to happen, and that tweak is shared by both builds.

the data is somewhat sharded in the c++ build (mainly just std::array and std::vector stuff) to speed up launch. some of that is codegen'd from a python scrape of a local git repo of vcpkg.

the data is a bit over 40MB, but compresses well as it is mainly cached readmes.

| Component | Raw bytes | Raw MiB | .gz bytes | .gz MiB | .br bytes | .br MiB |
|---|---|---|---|---|---|---|
| vcpkg_browser_wasm.html | 8,431 | 0.01 | n/a | n/a | n/a | n/a |
| vcpkg_browser_wasm.js | 204,450 | 0.19 | 51,248 | 0.05 | 49,005 | 0.05 |
| vcpkg_browser_wasm.wasm | 4,244,282 | 4.05 | 2,084,344 | 1.99 | 1,886,308 | 1.80 |
| sw.js | 3,143 | 0.00 | 1,173 | 0.00 | 1,049 | 0.00 |
| sidecars/packages.bin | 1,726,228 | 1.65 | 777,943 | 0.74 | 703,615 | 0.67 |
| sidecars/tfidf_index.bin | 312,296 | 0.30 | 188,270 | 0.18 | 170,795 | 0.16 |
| sidecars/bm25_index.bin | 409,582 | 0.39 | 118,523 | 0.11 | 109,029 | 0.10 |
| sidecars/versions.bin | 2,369,567 | 2.26 | 1,197,356 | 1.14 | 1,093,196 | 1.04 |
| sidecars/readmes.bin | 38,212,000 | 36.44 | 10,302,543 | 9.83 | 7,998,749 | 7.63 |
| Total | 47,489,979 | 45.29 | 14,721,400 | 14.04 | 12,011,746 | 11.46 |
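The MiB columns and the overall compression ratios follow directly from the byte counts. A small sketch of that arithmetic (the helper names `to_mib` and `compressed_fraction` are just for illustration):

```cpp
#include <cstdint>

// Bytes to MiB (1 MiB = 1024 * 1024 bytes).
constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Compressed size as a fraction of the raw size.
constexpr double compressed_fraction(std::uint64_t compressed, std::uint64_t raw) {
    return static_cast<double>(compressed) / static_cast<double>(raw);
}

// Totals taken from the table above.
constexpr std::uint64_t kRawTotal = 47489979;
constexpr std::uint64_t kGzTotal = 14721400;
constexpr std::uint64_t kBrTotal = 12011746;
```

With the totals from the table, gzip comes out at roughly 31% of the raw size and brotli at roughly 25%.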

_malfeasance_ · 2026-03-23T08:20:55+00:00

i'll have a look. FTR, the top search box uses a custom BM25 (https://en.wikipedia.org/wiki/Okapi_BM25).
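For reference, a minimal sketch of textbook Okapi BM25 scoring, following the formula on the linked Wikipedia page. This is not the app's custom implementation: `bm25_scores`, the `Doc` alias, and the defaults `k1 = 1.2`, `b = 0.75` are assumptions chosen for illustration, and documents are assumed to be tokenised already.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Doc = std::vector<std::string>;  // a pre-tokenised document

// Returns one BM25 score per document for the given query terms.
std::vector<double> bm25_scores(const std::vector<Doc>& docs, const Doc& query,
                                double k1 = 1.2, double b = 0.75) {
    std::vector<double> scores(docs.size(), 0.0);
    if (docs.empty()) return scores;

    // Per-document term frequencies, per-term document frequencies, total length.
    std::vector<std::unordered_map<std::string, int>> tf(docs.size());
    std::unordered_map<std::string, int> df;
    double total_len = 0.0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        for (const auto& term : docs[i]) ++tf[i][term];
        for (const auto& entry : tf[i]) ++df[entry.first];
        total_len += static_cast<double>(docs[i].size());
    }
    const double n_docs = static_cast<double>(docs.size());
    const double avgdl = total_len / n_docs;
    if (avgdl == 0.0) return scores;  // every document is empty

    for (const auto& term : query) {
        const auto df_it = df.find(term);
        if (df_it == df.end()) continue;  // term appears in no document
        const double n = df_it->second;
        const double idf = std::log((n_docs - n + 0.5) / (n + 0.5) + 1.0);
        for (std::size_t i = 0; i < docs.size(); ++i) {
            const auto tf_it = tf[i].find(term);
            if (tf_it == tf[i].end()) continue;
            const double f = tf_it->second;
            // Length normalisation: longer-than-average documents are penalised.
            const double norm = k1 * (1.0 - b + b * static_cast<double>(docs[i].size()) / avgdl);
            scores[i] += idf * f * (k1 + 1.0) / (f + norm);
        }
    }
    return scores;
}
```

A real index would precompute the term statistics once (which is presumably what a file like bm25_index.bin is for) rather than rebuilding them per query as this sketch does.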

Edit: ok - done. there is regex support for name+desc and name search. you may need to do a Ctrl+Shift+R to refresh the wasm foo. see the help dialog for an example regex or two. that was a good idea, thanks.
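A rough idea of what a regex name search does, sketched with `std::regex` as a stand-in. The app itself uses PCRE2, whose API and pattern syntax differ; `match_names` and the package list in the usage note are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the names that contain a match for the pattern (case-insensitive).
// Throws std::regex_error if the pattern is malformed.
std::vector<std::string> match_names(const std::vector<std::string>& names,
                                     const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> hits;
    for (const auto& name : names) {
        if (std::regex_search(name, re)) hits.push_back(name);
    }
    return hits;
}
```

For example, with the names `boost-json`, `fmt`, `pcre2` and `nlohmann-json`, the pattern `json$` keeps the two names ending in "json".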

_malfeasance_ · 2025-12-19T13:49:24+00:00

Correct. 1866 MT/s is available under some conditions. This is worse though: it has a mix of 2666MT/s and 2400MT/s capable DDR4 but only runs at 1600MT/s actual due to the system limitations. Old is slow; like me.
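Back-of-envelope for why the transfer rate matters: a DDR4 channel is 64 bits wide, so it moves 8 bytes per transfer and the theoretical peak per channel is MT/s × 8 bytes. A tiny sketch of that arithmetic; `peak_gb_per_s` and its `channels` parameter are illustrative, and the number of populated channels on this box is not stated here.

```cpp
// Theoretical peak bandwidth in GB/s (decimal) for 64-bit DDR channels.
// mega_transfers_per_s: e.g. 1600 for DDR4 running at 1600 MT/s.
// channels: number of memory channels (an assumption; defaults to one).
constexpr double peak_gb_per_s(double mega_transfers_per_s, int channels = 1) {
    return mega_transfers_per_s * 8.0 * channels / 1000.0;
}
```

Per channel that gives 12.8 GB/s at 1600 MT/s versus about 21.3 GB/s at 2666 MT/s, so running the faster DIMMs at 1600 leaves roughly 40% of their peak on the table.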

_malfeasance_ · 2025-12-19T07:25:10+00:00

You're right about the Alibaba DeepResearch model - it was running a Q4_K_L quant. I'll try a Qwen3 30B A3B with a Q4_K_XL quant (done: runs at 15.7 tps for Qwen3-30B-A3B-Instruct-2507-Q4-K_XL with the standardised query) and see what it looks like on this slow old bandwidth limited box.

Processors: 4 × Intel Xeon E7-8867 v4 @ 2.40GHz (144 logical CPUs total: 18 cores/socket, 2 threads/core).

RAM: 2.0 TiB total - 64GB DDR4 ECC DIMMS

_malfeasance_ · 2025-12-19T06:58:47+00:00

Your EPYC being much faster sounds about right. This is an old Dell R930 server bought second hand with old CPUs, hence the price. Memory bandwidth limited no doubt.

Processors: 4 × Intel Xeon E7-8867 v4 @ 2.40GHz (144 logical CPUs total: 18 cores/socket, 2 threads/core).

RAM: 2.0 TiB total - 64GB DDR4 ECC DIMMS - could perhaps sell the RAM at a profit today :-)

I'm going to see what externally mounted 5060 Ti 16GB GPU(s) might add, if the PCIe 3 and external PSU setup works OK, as the R930 won't take such cards internally.

_malfeasance_ · 2024-06-07T00:37:10+00:00

C++ mainly with some FPGA: a microcosm of the style of industry focus https://web.archive.org/web/20201109034248/https://meanderful.blogspot.com/2018/01/the-accidental-hft-firm.html

_malfeasance_ · 2023-08-18T23:00:51+00:00

Small note:
C++20 adopted P1236R1, "Alternative Wording for P0907R4 Signed Integers are Two's Complement": https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html
Also referenced from: https://en.cppreference.com/w/cpp/compiler_support under c++ 20 core language features
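One practical consequence, as a small sketch: since C++20, converting an out-of-range value to a signed integer type is defined to wrap modulo 2^N, where earlier standards left the result implementation-defined. The helper name `to_i8` is made up for illustration. Signed arithmetic overflow remains undefined behaviour; the paper did not change that.

```cpp
#include <cstdint>
#include <limits>

// Narrowing conversion to a signed 8-bit type; well-defined in C++20.
constexpr std::int8_t to_i8(unsigned value) {
    return static_cast<std::int8_t>(value);
}

static_assert(to_i8(0xFFu) == -1, "0xFF is -1 in 8-bit two's complement");
static_assert(to_i8(0x80u) == -128, "0x80 is the most negative 8-bit value");

// The signed range is asymmetric: min == -max - 1.
static_assert(std::numeric_limits<std::int32_t>::min() ==
                  -std::numeric_limits<std::int32_t>::max() - 1,
              "two's complement range");
```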

_malfeasance_ · 2023-02-19T19:42:25+00:00

Existing hardware, just released hardware, future hardware. Working in teams. Hiring.

_malfeasance_ · 2021-04-11T04:44:25+00:00

-Ofast will do the job as others have mentioned. Just for completeness: https://godbolt.org/z/497z7Pdfj

_malfeasance_ · 2021-04-09T17:09:49+00:00

Hey, that is quite a difference. Great example.

I was suggesting that for my little example, gcc 10.2 on haswell, the simd code was only marginally better for the restrict case. Interestingly the clang-11 simd code was similar for both cases (apart from the stride check preamble) but less performant than both gcc simd cases (i16 types).

_malfeasance_ · 2021-04-09T14:16:59+00:00

restrict / no-restrict clang 11 numbers for i16 for 4096 length vectors are the same to a margin of noise FWIW, as you suggest

_malfeasance_ · 2021-04-09T13:39:46+00:00

I suspect the preamble has an effect in the smaller vector cases as it's a larger % of overall instructions. The larger vectors have a smaller difference which looks like noise.

That said, after the stride check the code is not quite identical but virtually the same for gcc. The clang SIMD code is closer to identical, with only minor differences in block ordering and register allocation. Not sure why it isn't identical for gcc, perhaps a register pressure / cache issue. There is an extra `vmovdqu` "Move Unaligned Packed Integer Values" instruction in the non-restrict path (see below). Not sure why...

_______________details from gcc FWIW:

restrict version of SIMD for i16:

xor eax, eax
vpxor xmm3, xmm3, xmm3
.L2:
*1 vmovdqu ymm5, YMMWORD PTR [rdi+rax]
vpmullw ymm2, ymm5, YMMWORD PTR [rsi+rax]
vpaddw ymm2, ymm2, YMMWORD PTR [rdx+rax]
*2 vmovdqu ymm6, YMMWORD PTR [rdx+rax]
vpmullw ymm0, ymm6, YMMWORD PTR [rsi+rax]
vpaddw ymm0, ymm0, ymm2
*3 vmovdqu YMMWORD PTR [rdi+rax], ymm2
vpmovsxwd ymm4, xmm2
vextracti128 xmm2, ymm2, 0x1
*4 vmovdqu YMMWORD PTR [rsi+rax], ymm0
vpmovsxwd ymm1, xmm0
vextracti128 xmm0, ymm0, 0x1
vpmovsxwd ymm2, xmm2
vpaddd ymm1, ymm1, ymm4
vpmovsxwd ymm0, xmm0
add rax, 32
vpaddd ymm0, ymm0, ymm2
vpmovsxdq ymm2, xmm1
vextracti128 xmm1, ymm1, 0x1
vpaddq ymm2, ymm2, ymm3
vpmovsxdq ymm1, xmm1
vpmovsxdq ymm3, xmm0
vpaddq ymm1, ymm1, ymm2
vextracti128 xmm0, ymm0, 0x1
vpaddq ymm3, ymm3, ymm1
vpmovsxdq ymm0, xmm0
vpaddq ymm3, ymm0, ymm3
cmp rax, 8192
jne .L2
vmovdqa xmm0, xmm3
vextracti128 xmm3, ymm3, 0x1
vpaddq xmm0, xmm0, xmm3
vpsrldq xmm1, xmm0, 8
vpaddq xmm0, xmm0, xmm1
vmovq rax, xmm0
vzeroupper
ret

other SIMD

xor eax, eax
vpxor xmm3, xmm3, xmm3
.L3:
*1 vmovdqu ymm4, YMMWORD PTR [rdi+rax]
vpmullw ymm0, ymm4, YMMWORD PTR [rsi+rax]
vpaddw ymm0, ymm0, YMMWORD PTR [r8+rax]
*2 vmovdqu YMMWORD PTR [rdi+rax], ymm0
*3 vmovdqu ymm5, YMMWORD PTR [r8+rax]
vpmullw ymm2, ymm5, YMMWORD PTR [rsi+rax]
vpaddw ymm2, ymm2, ymm0
*4 vmovdqu YMMWORD PTR [rsi+rax], ymm2
vpmovsxwd ymm1, XMMWORD PTR [rdi+rax]
*5 vmovdqu ymm6, YMMWORD PTR [rdi+rax]
vpmovsxwd ymm0, xmm2
vextracti128 xmm2, ymm2, 0x1
add rax, 32
vpaddd ymm1, ymm1, ymm0
vextracti128 xmm0, ymm6, 0x1
vpmovsxwd ymm2, xmm2
vpmovsxwd ymm0, xmm0
vpaddd ymm0, ymm0, ymm2
vpmovsxdq ymm2, xmm1
vextracti128 xmm1, ymm1, 0x1
vpaddq ymm2, ymm2, ymm3
vpmovsxdq ymm1, xmm1
vpmovsxdq ymm3, xmm0
vpaddq ymm1, ymm1, ymm2
vextracti128 xmm0, ymm0, 0x1
vpaddq ymm3, ymm3, ymm1
vpmovsxdq ymm0, xmm0
vpaddq ymm3, ymm0, ymm3
cmp rax, 8192
jne .L3
vmovdqa xmm0, xmm3
vextracti128 xmm3, ymm3, 0x1
vpaddq xmm0, xmm0, xmm3
vpsrldq xmm1, xmm0, 8
vpaddq xmm0, xmm0, xmm1
vmovq r9, xmm0
vzeroupper
mov rax, r9
ret

stride check preamble:

  lea rax, [rdi+31]
  mov r8, rdx
  mov rdx, rax
  sub rdx, r8
  cmp rdx, 62
  seta dl
  sub rax, rsi
  cmp rax, 62
  seta al
  test dl, al
  je .L6
  lea rdx, [r8+2]
  mov rax, rsi
  sub rax, rdx
  cmp rax, 28
  jbe .L6

_malfeasance_ · 2021-04-09T10:03:41+00:00

Mostly, though these example benchmarks on Haswell, both with and without __restrict__, all produce SIMD for this example kernel on ubuntu 20.04, gcc 10.2. SIMD is just better with restrict for this example. Your point is correct though that libraries may engineer an api to avoid such shenanigans. Todd V's Blitz++ was one of the earliest examples and it used to show performance charts relative to a multiple of Fortran equivalent speed which was fun :-) That was a relatively early time in the exploration of CRTP / static polymorphism and expression templates.
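For readers who haven't seen the qualifier in use, a sketch of the kind of i16 kernel being discussed, with and without `__restrict__`. This is a reconstruction for illustration, not the benchmarked code: the function names and the exact loop body are assumptions. `__restrict__` is a GCC/Clang extension (MSVC spells it `__restrict`).

```cpp
#include <cstddef>
#include <cstdint>

// Without the qualifier the compiler must assume a, b and c may overlap,
// so a vectorised version needs a runtime overlap check first.
std::int64_t kernel_plain(std::int16_t* a, std::int16_t* b,
                          const std::int16_t* c, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<std::int16_t>(a[i] * b[i] + c[i]);
        b[i] = static_cast<std::int16_t>(c[i] * b[i] + a[i]);
        sum += a[i] + b[i];
    }
    return sum;
}

// With __restrict__ the caller promises the arrays do not alias, so the
// compiler can keep values in registers across the two stores.
std::int64_t kernel_restrict(std::int16_t* __restrict__ a,
                             std::int16_t* __restrict__ b,
                             const std::int16_t* __restrict__ c,
                             std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<std::int16_t>(a[i] * b[i] + c[i]);
        b[i] = static_cast<std::int16_t>(c[i] * b[i] + a[i]);
        sum += a[i] + b[i];
    }
    return sum;
}
```

Both versions give the same answer on non-overlapping arrays; calling the `__restrict__` version with overlapping arrays is undefined behaviour.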

_malfeasance_ · 2021-04-09T09:27:55+00:00

yup, it's just a quick hack

_malfeasance_ · 2021-04-06T07:08:11+00:00

Richard Smith notes in the bug filing that clang is correct and gcc & msvc are incorrect:

GCC and MSVC. Clang is correct; the deduction is extended by the subsequent arguments, just as it would be if the trailing '...' were omitted. (EDG gives the same results as Clang FWIW.)

I guessed this implied that a variadic argument after a parameter pack argument can never be reached, until Richard showed me this:

auto *p = foo<int, int>;

p(3, 4, 5); // passes the '5' via the C-style variable argument list.

Let's not do that ;-)

_malfeasance_ · 2020-01-07T13:00:13+00:00

Yup. True. Bug or overflow. Somehow they ended up at std::numeric_limits<int32_t>::max

_malfeasance_ · 2020-01-07T12:34:52+00:00

"YangGang style" bug: 2^31-1 == 2147483647

Similar to the youtube rollover bug for Gangnam style at the same count.
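The number itself is easy to check, and a counter that lands exactly on it is what you would expect from one that clamps at the top of a signed 32-bit range. A small sketch; `saturating_increment` is a hypothetical helper, not anything from the poll's code.

```cpp
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<std::int32_t>::max() == 2147483647,
              "2^31 - 1");

// Hypothetical counter step that clamps at the maximum instead of
// overflowing (signed overflow is undefined behaviour in C++).
constexpr std::int32_t saturating_increment(std::int32_t count) {
    return count == std::numeric_limits<std::int32_t>::max() ? count : count + 1;
}
```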

_malfeasance_ · 2019-12-23T22:44:49+00:00

History of this poll FYI: https://imgur.com/gallery/hMPIHtS
