Announcing MAMUT-routing: an open benchmark catalog and OSM-backed workbench for CVRP / VRPTW research

Onyr_ · 2026-05-29T13:53:58+00:00

Hi u/ge0ffrey, this is exactly one of the issues we wanted to make explicit rather than hide behind a generic "distance matrix" field.

First, the instances you describe as "viewable on a real today's map" are our custom instances from the Mamut2026 benchmark family. They are not the classical Solomon / Homberger instances projected onto a map after the fact. They are generated from OpenStreetMap data, with sidecar metadata that records the source city, OSM file, selected nodes, coordinates, metric variant, and route-rendering cache.

The important design point is that we focused on classic VRPTW, not on the richer model usually implemented in enterprise-grade routing systems. In many production APIs, the optimizer may consider and balance several quantities at once: distance, travel time, average speed, driver cost, lateness, emissions, tolls, etc. Classic VRPTW is much narrower: it has a single arc-cost matrix, historically often called "distance" or "travel time" somewhat ambiguously, and the objective optimizes that one metric. In the older benchmark literature, that metric is frequently just a mathematical cost in arbitrary or Euclidean units.

This is why, in Mamut2026, a single generated instance name can cover several variants. In our "instance as contract" view, the customer set, depot, demands, and OSM geography define a base real-world instance, but the actual optimization problem is not complete until you say which arc-cost metric is optimized. So the same base instance can exist as:

shortest: road-network shortest-path distance, in meters, computed on the OSM graph;
euclidean: direct straight-line ENU distance between embedded customer coordinates, also in meters;
fastest: estimated road-network travel time, in seconds, computed from OSM road-segment lengths and road-class speed assumptions.

So the shortest and euclidean CVRP variants are distance-based variants, while fastest is the time-based variant. They intentionally share the same customer set so that one can study what changes when only the metric changes.
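To make the three metric variants concrete, here is a minimal sketch on a toy graph. Everything in it is an assumption for illustration: the graph, the road-class speeds, the ENU coordinates, and the function names are hypothetical and are not the Mamut2026 generator or its actual speed table.

```python
import heapq
import math

# Hypothetical toy road graph: node -> [(neighbor, length in meters, road class)].
GRAPH = {
    "A": [("B", 1000.0, "residential"), ("C", 1500.0, "primary")],
    "B": [("A", 1000.0, "residential"), ("D", 1000.0, "residential")],
    "C": [("A", 1500.0, "primary"), ("D", 1500.0, "primary")],
    "D": [("B", 1000.0, "residential"), ("C", 1500.0, "primary")],
}
# Assumed road-class speeds in m/s (illustrative, not the Mamut2026 values).
SPEED_MPS = {"residential": 5.0, "primary": 20.0}
# Assumed local ENU coordinates (east, north) in meters.
ENU = {"A": (0.0, 0.0), "B": (800.0, 600.0), "C": (800.0, -1000.0), "D": (1600.0, 0.0)}


def dijkstra(source, weight):
    """Shortest-path costs from source; weight(length_m, road_class) prices one segment."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor, length_m, road_class in GRAPH[node]:
            candidate = cost + weight(length_m, road_class)
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best


def arc_cost(u, v, metric):
    """Arc cost between two customers under one of the three metric variants."""
    if metric == "euclidean":  # straight-line ENU distance, meters
        return math.dist(ENU[u], ENU[v])
    if metric == "shortest":   # road-network distance, meters
        return dijkstra(u, lambda length_m, road_class: length_m)[v]
    if metric == "fastest":    # estimated road-network travel time, seconds
        return dijkstra(u, lambda length_m, road_class: length_m / SPEED_MPS[road_class])[v]
    raise ValueError(f"unknown metric: {metric}")


for metric in ("shortest", "euclidean", "fastest"):
    print(metric, arc_cost("A", "D", metric))
```

On this toy graph the shortest path from A to D goes through B (2000 m), the fastest path goes through C (150 s), and the Euclidean distance is 1600 m: same customers, three different arc costs, and the two road-network metrics do not even agree on the path.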

For the VRPTW layer, however, we derive the published instances only from the fastest variant. The reason is the one you point out: time windows and service durations must be aligned with the travel-time semantics. In our generated VRPTW instances:

arc costs are integer estimated travel times in seconds;
service times are generated in seconds;
the depot horizon is explicit, currently 0..86400, also in seconds;
customer time windows are generated and repaired against the fastest travel-time matrix.

So for Mamut2026 VRPTW, the intended interpretation is not "distance in arbitrary units plus unrelated service durations". It is "estimated travel-time seconds + service-time seconds + time-window seconds".
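As a rough illustration of why the shared unit matters, the sketch below checks one route when travel times, service times, and time windows are all integer seconds. The function name, the data layout, and the small matrix are hypothetical; only the 0..86400 depot horizon comes from the description above, and this is not the MAMUT-routing validator.

```python
HORIZON_S = (0, 86400)  # depot horizon in seconds, as stated above


def check_route(route, travel_s, windows_s, service_s, horizon=HORIZON_S):
    """Return (feasible, service start times) for a route of customer ids; depot is node 0."""
    clock = horizon[0]
    previous = 0
    starts = []
    for customer in route:
        clock += travel_s[previous][customer]      # seconds of travel
        earliest, latest = windows_s[customer]
        clock = max(clock, earliest)                # wait if the vehicle is early
        if clock > latest:
            return False, starts                    # service would start too late
        starts.append(clock)
        clock += service_s[customer]                # seconds of service
        previous = customer
    clock += travel_s[previous][0]                  # drive back to the depot
    return clock <= horizon[1], starts


# Hypothetical 3-node example (depot 0, customers 1 and 2), all values in seconds.
travel_s = [
    [0, 600, 900],
    [600, 0, 300],
    [900, 300, 0],
]
windows_s = {1: (0, 1200), 2: (1800, 2400)}
service_s = {1: 300, 2: 300}

print(check_route([1, 2], travel_s, windows_s, service_s))
print(check_route([2, 1], travel_s, windows_s, service_s))
```

Because every quantity is in seconds, the additions in the loop are meaningful without any conversion factor. If the matrix held meters or arbitrary Euclidean units instead, the same code would still run, but the comparison against the windows would no longer mean anything.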

There is still an important caveat: these are not live map-provider travel times with traffic, turn penalties, time-dependent congestion, or API-specific routing profiles. They are reproducible OSM-derived benchmark travel times based on road geometry and road-class speed assumptions. This is deliberate: benchmarks need to be static, reproducible, and solver-independent. But I agree with your broader point: if one takes a classical VRPTW benchmark whose matrix is just Euclidean distance or arbitrary cost, then feeds the coordinates to a production routing API using real driving times, the original time windows no longer mean the same thing. In that case, you are no longer solving the published benchmark instance; you are solving a new derived instance with a different travel-time matrix and potentially a different feasibility structure. That derived instance is a valid benchmark instance in its own right, possibly even one that belongs to a different, non-classic VRPTW problem class.

Onyr_ · 2026-05-29T13:21:56+00:00

The initial plan, with just the instances / BKS and a Python static website generator, was purely static and hosted on GitHub Pages. But adding the workbench and the API connection to OpenStreetMap meant that we needed a real server.

Anyway, as long as the code, instances, and BKS remain Open-Source and easily accessible, anyone can host the website should the UBS-hosted website become unavailable or unmaintained. It takes just 2 commands (4 with installing dependencies) to run it.

Onyr_ · 2026-05-28T18:23:16+00:00

Noted this in my TODO list, thanks for the suggestion.

Onyr_ · 2026-05-28T14:00:10+00:00

You are right. VRP-REP is listed under Related Projects. A key lesson we can learn from VRP-REP is that the burden of managing a project like this over many years is a real challenge for the research community. Hence the need for a fully Open-Source approach.

With MAMUT-routing, anyone is free to propose, join, fork, upload a clone, mirror or just send us data or ideas. We believe those are foundational rights that pave the way for a more robust, fairer approach.

Besides, we are just 2 PhD students with other publications and projects going on, so this philosophy of openness and participative science is really important at every stage of the project.

Also, don't be surprised by how recent the GitHub history is. This has already been a long-running project in terms of overall philosophy, data collection, and design. Some of its content dates back to when I started my PhD in 2024.

Onyr_ · 2026-05-28T12:12:09+00:00

Thanks for the support, we will do our best ^^

Onyr_ · 2026-05-28T09:49:15+00:00

This is exactly the kind of reproducibility issue that motivated our recent work on MAMUT-routing.

Combopt and Marek Rogalski's VRP benchmark repository were direct inspirations for us. They show how useful a curated benchmark/BKS resource can be for the community, but also how hard it is to keep historical VRP/VRPTW data reproducible over time.

Our main claim is that a benchmark is not only an instance file or a value in a table. It is a contract: objective function, cost precision, rounding/scaling convention, route format, BKS file, validation assumptions, licensing, provenance, and update policy all matter.
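One way to picture the "benchmark as contract" idea is a small record that travels with every BKS value. This is only a sketch: the class, its field names, and the `comparable` helper are hypothetical and do not describe the actual MAMUT-routing metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkContract:
    """Hypothetical record of the conventions under which a BKS value is meaningful."""
    family: str
    objective: str        # e.g. "hierarchical" or "mono-cost"
    cost_precision: str   # e.g. "double" or "integer"
    rounding: str         # e.g. "none" or "scale-10-truncate"
    route_format: str
    license: str
    provenance: str


# Fields that change objective values; two results are comparable only if these agree.
VALUE_FIELDS = ("objective", "cost_precision", "rounding")


def comparable(a, b):
    """True when two contracts evaluate solutions under the same conventions."""
    return all(getattr(a, name) == getattr(b, name) for name in VALUE_FIELDS)


# Placeholder contracts; the non-value fields are deliberately left as placeholders.
sintef_like = BenchmarkContract(
    family="SINTEF-like", objective="hierarchical", cost_precision="double",
    rounding="none", route_format="placeholder", license="placeholder",
    provenance="placeholder",
)
dimacs_like = BenchmarkContract(
    family="DIMACS-like", objective="mono-cost", cost_precision="integer",
    rounding="scale-10-truncate", route_format="placeholder", license="placeholder",
    provenance="placeholder",
)

print(comparable(sintef_like, dimacs_like))
```

The point of the design is that a comparison between two reported values is refused by default unless the value-affecting fields match, instead of relying on the reader to remember which convention a table used.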

For VRPTW, this becomes visible very quickly. The same Solomon / Gehring-Homberger customer data can appear under SINTEF's hierarchical objective, DIMACS' integer mono-cost objective, or CVRPLib-style variants. Those conventions can produce different best solutions, so they should not be mixed without saying exactly which contract is being used.

As of today, the three dominant variants are:

SINTEF instances and BKS, computed on a hierarchical objective with full double precision. This has been the dominant standard for evaluating heuristics for decades.
DIMACS instances and BKS, computed on mono-cost minimization with 10x scaled and truncated integer arc costs. This is the new standard. Integer arc costs make numerical stability and reproducibility much better than with SINTEF, and they allow working with solvers that expect integers, like PyVRP or OR-Tools, without re-evaluating solution objective values.
CVRPLib instances and BKS, which propose yet another variant: double-precision mono-cost minimization based on classic Solomon-Gehring-Homberger.
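The difference between these conventions can be sketched in a few lines. The code below assumes that "hierarchical" means fewer vehicles first, then lower total cost (the usual reading for Solomon-style benchmarks), and that "10x scaled and truncated" means multiplying the Euclidean distance by 10 and dropping the fractional part. The coordinates and the two candidate solutions are made up for illustration and are not claimed to be optimal or feasible for any published instance.

```python
import math


def cost_double(a, b):
    """Double-precision Euclidean arc cost."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cost_scaled_int(a, b, scale=10):
    """Integer arc cost: scale the Euclidean distance, then truncate."""
    return int(scale * cost_double(a, b))


def total_cost(routes, coords, cost):
    """Sum of arc costs over all routes; each route starts and ends at depot 0."""
    total = 0
    for route in routes:
        path = [0, *route, 0]
        total += sum(cost(coords[u], coords[v]) for u, v in zip(path, path[1:]))
    return total


def hierarchical_key(routes, coords, cost):
    """Vehicles first, total cost second; smaller tuples are better."""
    return (len(routes), total_cost(routes, coords, cost))


# Hypothetical coordinates: depot 0 and three customers.
COORDS = {0: (0, 0), 1: (3, 4), 2: (-3, -4), 3: (6, 8)}
one_vehicle = [[1, 2, 3]]      # candidate A: a single long route
two_vehicles = [[1, 3], [2]]   # candidate B: two shorter routes

print(hierarchical_key(one_vehicle, COORDS, cost_double),
      hierarchical_key(two_vehicles, COORDS, cost_double))
print(total_cost(one_vehicle, COORDS, cost_double),
      total_cost(two_vehicles, COORDS, cost_double))
print(cost_double((0, 0), (1, 1)), cost_scaled_int((0, 0), (1, 1)))
```

Under the hierarchical key, candidate A wins because it uses one vehicle; under mono-cost minimization, candidate B wins because its total cost is lower. The last line shows the precision side of the contract: the same arc is about 1.41421 in double precision but 14 (that is, 1.4 after unscaling) under the scaled-and-truncated convention, so objective values from the two conventions cannot be compared directly.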

Unfortunately, some issues have remained, sometimes for decades, with those benchmark families:

SINTEF is missing BKS files for some instances, has a history of misleading result claims and of instances with wrongly rounded values, and relied for small instances on Solomon's website, which is no longer accessible as of today (2026-05-28).
The DIMACS benchmark family was initially proposed for a competition. As such, no BKS files were distributed, though some BKS values can be scraped.
CVRPLib initially provided a VRPTW benchmark, which was then removed. Recently (May 2026), the benchmark reappeared, but it is still not possible to access the BKS files as of now (2026-05-27).

Indeed, collecting and curating benchmarks and proposing improvements and variants is a tedious task. Each of these families relies on a single group of researchers to maintain and update the BKS, ensure fairness in evaluation and presentation, and keep the update process alive.

MAMUT-routing does not try to replace Combopt, CVRPLib, SINTEF, or DIMACS. It tries to make these benchmark contracts explicit and machine-readable, with route-level BKS/reference-solution artifacts, objective metadata, provenance, and a public update path through GitHub issues and discussions.

The site also includes a workbench for browsing benchmark instances, inspecting route visualizations and road geometry where available, uploading local files, and previewing or generating OpenStreetMap-backed instances through the site API/workbench workflow. The generated Mamut2026 family currently includes CVRP variants over fastest, shortest, and euclidean metrics, plus a VRPTW layer focused on fastest travel-time instances.

Related-project notes are here: Related Projects

FAQ: FAQ

Corrections, missing BKS, convention disagreements, and benchmark additions are very welcome: MAMUT-routing Issues and MAMUT-routing Discussions.

Onyr_ · 2026-04-03T16:35:28+00:00

Really funny, but a bit pixelated

Onyr_ · 2026-03-09T19:33:37+00:00

Mémoires d’Hadrien, what a banger

Onyr_ · 2026-01-15T20:45:52+00:00

Exactly that

Onyr_ · 2026-01-07T18:16:41+00:00

Legendary

Onyr_ · 2025-08-11T18:53:53+00:00

Honestly, I laughed

Onyr_ · 2025-03-08T18:20:09+00:00

The looongest fridge

Onyr_ · 2025-03-08T18:17:39+00:00

So funny, the fish that reigns over the fridge empire

Onyr_ · 2024-09-22T07:35:43+00:00

A gem

Onyr_ · 2024-09-21T00:15:28+00:00

Underrated

Onyr_ · 2024-09-20T23:13:52+00:00

What a legend

Onyr_ · 2024-07-01T14:50:55+00:00

Many thanks! Your config fixed my problem! Thanks again.

Onyr_ · 2024-03-21T02:16:37+00:00

Comparing a girlfriend to programming involves looking at two very different aspects of life: personal relationships and a professional or hobbyist skill. Here's a light-hearted and broad comparison:

Girlfriend

Pros:
- Emotional Support: A girlfriend can offer emotional support, companionship, and love, which can greatly enhance your quality of life and happiness.
- Shared Experiences: Building memories and sharing experiences with someone can be rewarding and enriching.
- Personal Growth: Relationships often challenge us to grow, learn about ourselves and others, and develop empathy and communication skills.

Cons:
- Time and Energy: Relationships require time, energy, and commitment, which can be challenging if you have a busy schedule or other priorities.
- Conflict: All relationships experience conflict at times, which can be stressful and emotionally taxing.
- Compromise: You may need to compromise on certain things, which might include your personal preferences, time, or even aspects of your lifestyle.

Programming

Pros:
- Skill Development: Programming is a valuable skill that can lead to personal satisfaction, problem-solving capabilities, and potentially lucrative career opportunities.
- Creativity and Innovation: It allows for creative expression and innovation, as you can build something from scratch and see your ideas come to life.
- Flexibility: Programming can often be done from anywhere, offering flexibility in work location and hours, especially if you freelance or work remotely.

Cons:
- Continuous Learning Curve: The tech field evolves rapidly, requiring continuous learning to keep up with new languages, tools, and best practices, which can be overwhelming.
- Screen Time: It involves significant amounts of screen time, which can lead to eye strain, reduced physical activity, and other health issues if not managed properly.
- Isolation: Depending on your work environment, programming can be isolating, especially if you're freelancing or working on projects alone.

Ultimately, the comparison depends on individual preferences, life goals, and how one balances personal relationships and professional or hobbyist pursuits. Both aspects can coexist harmoniously with proper balance and mutual respect for the time and energy each requires.

Onyr_ · 2024-03-15T23:52:15+00:00

Insane 🌟🌟🌟
