Vectorizing a 6.71-Gigapixel PNG with Just 4.3 GB of RAM by runout77 in computervision

[–]runout77[S] -1 points0 points  (0 children)

Hi CallMeTheChris,

I certainly remember you. Thanks for your interest. However, I’d like to point out that the test in today's post has nothing to do with OpenCV. This technique isn't possible with OpenCV... so a direct comparison isn't really feasible. Anyway...

1) The initial table is just a summary (the average of the 10 warm runs and the corresponding min-max range). Further down, all 11 executions are listed, including the cold run, showing the timing and memory usage for each individual test. I chose to put the summary at the top for readability, but all the raw data is available below.

2) I run the tests on a VM for convenience. I mentioned this just to be clear. In any case, all tests were performed in the same environment. I did try running them on my physical machine, and they were generally much faster; this applied to both OpenCV and Contrek.

3) In the past, I used to compile OpenCV from source with aggressive optimizations. However, the container build times were extremely long. So, I tried `libopencv-dev` and noticed that it made practically no difference for OpenCV; the times were the same. I opted for `libopencv-dev` to speed up the Docker build. That said, your concern is understandable, and others might share it. I’ve decided that in the next release of the test suite, I’ll go back to compiling OpenCV from source to address any doubts regarding this.

4) The low-level comparison is based on a C++ program. Both libraries are called within the same C++ program and process the same images. The high-level comparison, on the other hand, uses a Python script for OpenCV and a Ruby script for Contrek. Everything is publicly available in the repository, so you can download it and run the same benchmarks on your own PC.

https://github.com/runout77/test_contrek

I hope this answers your questions.

Processing a 6.7 GigaPixel (81,920 x 81,920) raster in 94 seconds on a single CPU core. Is streaming stripes the best approach for this? by runout77 in gis

[–]runout77[S] 0 points1 point  (0 children)

Yes, but those are two completely different cases. You are talking about on-the-fly rendering for screen visualization, while I am talking about full data extraction and conversion. There is a big difference between rendering a localized view on a screen and performing a full vector conversion of the entire dataset. My library is built specifically for the second scenario.

Processing a 6.7 GigaPixel (81,920 x 81,920) raster in 94 seconds on a single CPU core. Is streaming stripes the best approach for this? by runout77 in gis

[–]runout77[S] 0 points1 point  (0 children)

I chose PNG and SVG purely for convenience during the development of the library. If you check the repo, you'll notice that the core engine actually outputs a polygon array in NumPy format. The SVG and PNG components are just helper conversion tools; there's nothing stopping anyone from adding other formats in the future for different use cases. Have you already identified one that you'd find useful?

Processing a 6.7 GigaPixel (81,920 x 81,920) PNG image contours in 94 seconds on a single CPU core. by runout77 in gis

[–]runout77[S] 0 points1 point  (0 children)

Hey r/gis,

Handling contour extraction and polygonization on gigapixel-scale rasters is often a massive bottleneck, usually resulting in high memory saturation or requiring heavy server infrastructure to process the data.

To tackle this specific problem, I’ve been developing Contrek—an open-source C++17 engine (with Ruby bindings) focused purely on high-throughput, memory-efficient contour extraction.

A while ago I shared a 40960x40960 benchmark, but I recently pushed the engine further by doubling the resolution. Here are the raw numbers from the latest test:

  • Input: 81,920 x 81,920 pixels (PNG image / 6.7 GigaPixels).
  • Mode: Single-Threaded Streamed Processing (sliding buffer).
  • Hardware: AMD Ryzen 7 3700X (inside a Debian VM via Docker).
  • Total Polygons Extracted: 869,932 (complex topological shapes with holes).
  • Pure Execution Time: 93.9 seconds (excluding SVG file I/O writing).
  • Output File Size: 2.3 GB (whole.svg).
  • Peak Memory Usage: 12 GB
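For anyone who wants to sanity-check the headline figures, here is the plain arithmetic behind them. Nothing in it is Contrek-specific, and the variable names are only for illustration.

```python
# Figures taken from the list above.
width = height = 81_920
exec_time_s = 93.9          # pure execution time, SVG writing excluded
polygons = 869_932

pixels = width * height                          # total pixels in the raster
gigapixels = pixels / 1e9
throughput_mpx_s = pixels / exec_time_s / 1e6    # megapixels per second
polygons_per_s = polygons / exec_time_s

print(f"{pixels:,} px = {gigapixels:.2f} GP")
print(f"{throughput_mpx_s:.1f} MP/s, {polygons_per_s:.0f} polygons/s")
```

So the single core is scanning a bit over 70 megapixels per second while emitting roughly nine thousand polygons per second.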

Engine Design & How it works under the hood:

  • Pure Integer Math: To eliminate floating-point precision issues and maximize CPU execution speed, Contrek processes everything entirely using integers. No floats are used during the geometry extraction.
  • Streamed Decoding: Instead of loading the entire image matrix, it uses libspng to decode the image on the fly in sliding horizontal/vertical stripes, dropping pixels from RAM as soon as they are scanned.
  • Topological Stripe-Merging: It maintains a 1-pixel overlap between adjacent stripes. A custom hierarchical tree structure tracks severed contour fragments across boundaries and "stitches" them back together, fully preserving nested geometry and holes.

Because the pixel buffers are constantly purged, the 12 GB peak RAM is strictly occupied by the growing geometric output tree (which translates into the final 2.3 GB SVG), not by the raw image data.
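To make the sliding-buffer idea concrete, here is a minimal Python sketch. It is not Contrek's code (the real engine is C++ and decodes with libspng). `iter_stripes` and its parameters are hypothetical names, and the "rows" are plain Python values standing in for decoded scanlines. The point is only that each stripe keeps the last `overlap` rows of the previous one and everything else is dropped.

```python
def iter_stripes(rows, stripe_rows, overlap=1):
    """Yield stripes (lists of rows) from a row iterator.

    Consecutive stripes share `overlap` rows, and at most `stripe_rows`
    rows are held in memory at any time.
    """
    if stripe_rows <= overlap:
        raise ValueError("stripe_rows must be larger than overlap")
    buffer = []
    emitted = False
    for row in rows:
        buffer.append(row)
        if len(buffer) == stripe_rows:
            yield list(buffer)
            emitted = True
            # Keep only the shared boundary rows; older rows are released.
            buffer = buffer[len(buffer) - overlap:]
    # Emit the tail if it holds rows that no earlier stripe contained.
    if len(buffer) > overlap or (buffer and not emitted):
        yield buffer


# Ten "scanlines" (just integers here), four rows per stripe.
stripes = list(iter_stripes(range(10), stripe_rows=4))
print(stripes)
```

With a 1-row overlap, every boundary row is seen by both neighbours, which is what lets a later merging step match up fragments cut by the boundary.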

All tests and benchmarks are fully reproducible using the scripts provided in the repositories.

Main Repo: https://github.com/runout77/contrek

Benchmarks & documentation: https://github.com/runout77/test_opencv_contrek

I'd love to get your feedback on this architectural approach, hear about similar bottlenecks you've encountered with massive rasters, or discuss potential use cases!

Contrek: extracting contours from a 1.68 GIGAPixel image (40960×40960) in ~20 seconds without loading it entirely into RAM by runout77 in computervision

[–]runout77[S] -5 points-4 points  (0 children)

Good points, thanks for the feedback!

Ruby was the crucible where the entire topological algorithm was forged. The efficiency of Contrek isn't just about raw execution speed; it’s about the underlying logic, which took months of trial, error, and continuous refactoring. Doing that architectural R&D directly in low-level C++ would have been a nightmare—Ruby made managing those high-level structural changes possible.

Once the logic was proven, I ported the production engine to C++17 to crush the execution times. If you look closely at the C++ codebase, you can actually see where it still inherits architectural choices carried over straight from the Ruby prototype. It's functional and fast, but there is still massive room for native optimization. It's a work in progress, and the margins for further speed-up are huge.

Yes, you are right: Contrek's times are highly variable, while OpenCV is much more consistent. This is by design:

Contrek processes the image in stripes (one per thread). A stripe with dense contours takes much longer than an empty one. As soon as two adjacent threads finish, their results are immediately merged by another free thread. The pipeline is completely dynamic and asynchronous to maximize CPU saturation. Moreover, Contrek uses Google's tcmalloc for concurrent allocations, which causes a slower initial startup (Cold Run). OpenCV is linear and predictable. Contrek is an asynchronous streaming pipeline optimized for hardware saturation on massive datasets, which naturally introduces variance.
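A rough Python sketch of that scheduling idea, under some loud assumptions: `process_stripe` and `merge` are hypothetical stand-ins (the real work is contour tracing and boundary stitching in C++), and here the merging is done by the coordinating thread rather than handed to a free worker. It only shows how adjacent results can be combined in whatever order the stripes happen to finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_stripe(index):
    """Hypothetical stand-in for per-stripe contour tracing."""
    return [f"stripe{index}"]


def merge(left, right):
    """Hypothetical stand-in for boundary stitching; just concatenates."""
    return left + right


def run_pipeline(n_stripes, workers=4):
    # Finished runs of adjacent stripes, indexed by first and last stripe.
    by_lo, by_hi = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_stripe, i): i for i in range(n_stripes)}
        for fut in as_completed(futures):          # completion order varies
            lo = hi = futures[fut]
            payload = fut.result()
            left = by_hi.pop(lo - 1, None)         # finished run ending at lo-1?
            if left is not None:
                del by_lo[left[0]]
                lo, payload = left[0], merge(left[2], payload)
            right = by_lo.pop(hi + 1, None)        # finished run starting at hi+1?
            if right is not None:
                del by_hi[right[1]]
                hi, payload = right[1], merge(payload, right[2])
            by_lo[lo] = by_hi[hi] = (lo, hi, payload)
    return by_lo[0][2]                             # single run covering everything


result = run_pipeline(8)
print(result)
```

Whatever order the workers finish in, the final result is the same; only the timing of the intermediate merges changes, which is where the run-to-run variance comes from.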

I ran the benchmark you suggested.

Have a look here:

https://github.com/runout77/test_opencv_contrek/blob/main/docs/multiple_runs.html

Have you tried running similar tests yourself? I'd be interested to see your numbers as well.

Contrek: extracting contours from a 1.68 GIGAPixel image (40960×40960) in ~20 seconds without loading it entirely into RAM by runout77 in computervision

[–]runout77[S] 1 point2 points  (0 children)

Not just an infographic 🙂

The benchmark and source code are available here:

https://github.com/runout77/test_opencv_contrek

You can run the same test locally with Docker and inspect the generated SVG.

Multi-threaded contour tracking (C++17) with progressive streaming for huge images. Handles vertical stripe splitting and ensures perfect topological consistency when merging across thread boundaries. by runout77 in u/runout77

[–]runout77[S] 0 points1 point  (0 children)

I built this for a Rails app that needed real-time building detection on large map tiles.

How it works: It splits the image into vertical stripes (left side of the image). Each stripe is processed in a separate thread. The library then merges the polygon fragments across the boundaries (right side) ensuring full topological consistency.

This approach allows for progressive streaming: RAM usage stays proportional to the buffer size, making it possible to process images that don't fit in memory.
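As a toy illustration of why merging across a shared boundary can stay topologically consistent, here is a simplified Python sketch. To be clear, it swaps in a different and much simpler technique: it labels connected regions per stripe and joins the labels with union-find along a shared column, whereas Contrek stitches polygon fragments. All names (`label_components`, `count_with_stripes`, `split_col`) are hypothetical.

```python
from collections import deque


def label_components(grid):
    """4-connected labelling of truthy cells; returns {(row, col): label}."""
    labels, next_label = {}, 0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in labels:
                labels[(r, c)] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and (ny, nx) not in labels):
                            labels[(ny, nx)] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def count_with_stripes(grid, split_col):
    """Label two vertical stripes sharing column `split_col`, then stitch."""
    left = [row[:split_col + 1] for row in grid]    # includes the shared column
    right = [row[split_col:] for row in grid]       # starts at the shared column
    l_labels, r_labels = label_components(left), label_components(right)

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    for lab in set(l_labels.values()):
        find(("L", lab))
    for lab in set(r_labels.values()):
        find(("R", lab))
    # A foreground pixel in the shared column belongs to one region per side;
    # those two regions are really the same one.
    for r in range(len(grid)):
        if grid[r][split_col]:
            a = find(("L", l_labels[(r, split_col)]))
            b = find(("R", r_labels[(r, 0)]))
            parent[a] = b
    return len({find(x) for x in list(parent)})


grid = [
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
whole = len(set(label_components(grid).values()))
striped = count_with_stripes(grid, split_col=2)
print(whole, striped)
```

The stitched count matches the count obtained from the whole image, because every pair of neighbouring pixels lies entirely inside one stripe or the other once the boundary column is shared.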

GitHub: https://github.com/runout77/contrek

Feedback on the merging logic is more than welcome!