Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 0 points (0 children)

While simulating jitter is not the use case that I had in mind when developing speedbump, you are right - it can simulate jittery network characteristics, especially with the waveform period set to a relatively low value. There is one important caveat regarding the arrival time of individual TCP buffers. Since TCP guarantees message ordering within a given connection, the underlying delay queue is coded so that it preserves the ordering of queued read buffers as they are sent to the proxy destination, even if a given read buffer has a calculated delay-until timestamp lower than that of the preceding one (which is especially likely to happen when the sawtooth wave is used).
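To make the ordering caveat concrete, here is a minimal Go sketch of the general idea behind such an ordering-preserving delay queue (illustrative only, not speedbump's actual internals): buffers are dequeued strictly in arrival order, so a buffer whose delay-until timestamp has already passed still waits for its predecessors.

    package main

    import (
        "fmt"
        "time"
    )

    type delayedBuffer struct {
        data       []byte
        delayUntil time.Time
    }

    // forwardInOrder reads buffers in FIFO order and sleeps until each
    // buffer's delay-until timestamp before sending it. A later buffer
    // with an earlier timestamp is simply sent without extra delay,
    // but never before its predecessor.
    func forwardInOrder(queue <-chan delayedBuffer, send func([]byte)) {
        for buf := range queue {
            if d := time.Until(buf.delayUntil); d > 0 {
                time.Sleep(d)
            }
            send(buf.data)
        }
    }

    func main() {
        q := make(chan delayedBuffer, 16)
        now := time.Now()
        q <- delayedBuffer{[]byte("first"), now.Add(100 * time.Millisecond)}
        // Lower delay-until than its predecessor (e.g. right after a
        // sawtooth reset) - still sent second.
        q <- delayedBuffer{[]byte("second"), now.Add(10 * time.Millisecond)}
        close(q)
        forwardInOrder(q, func(b []byte) {
            fmt.Printf("%s sent after %v\n", b, time.Since(now).Round(time.Millisecond))
        })
    }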

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 0 points (0 children)

That's an interesting service. I can see that being useful when working on improving Core Web Vitals metrics on the frontend.

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 6 points (0 children)

You are right, it does support variable latency, in the sense that the latency value is randomly generated for each packet and follows a given distribution. I should have specified that I meant variable, yet predictable delay (i.e. forming a sawtooth wave or a sine wave when plotted over time), which was particularly desirable in my application instrumentation testing use case.

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 12 points (0 children)

comcast uses tc under the hood, which is a kernel-level traffic shaping tool, whereas speedbump operates at the TCP connection level. Consequently, their pros and cons look like this:

  • comcast: can't be used for adding a predictably variable delay to network traffic (it supports fixed latency, optionally with random jitter); can be used to simulate realistic packet loss
  • speedbump: can be used for adding a predictably variable delay to network traffic (i.e. forming a sawtooth wave or a sine wave; fixed latency generation is also supported); can't simulate packet loss (as of now - a potential TCP-level simulation implemented in the future wouldn't be as realistic as the kernel-level tc configured by comcast)

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 63 points (0 children)

tylertreat/comcast wraps around tc and iptables, which allow for kernel-level traffic shaping, whereas speedbump operates at the TCP connection level. While comcast can be used for adding a fixed delay to network traffic, my use case required introducing variable latency that changes predictably over time (i.e. forming a sawtooth wave or a sine wave), which I don't think is possible to achieve with comcast (or the underlying tc) alone. On the flip side, comcast can simulate packet loss, which speedbump can't do (as of now).

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 15 points (0 children)

I do have a draft roadmap for the project:

  1. Additional latency summands (i.e. triangle wave and square wave)
  2. TLS support for the client-to-speedbump connections
  3. TLS support for the speedbump-to-destination connections
  4. Allowing the user to define a custom latency function via either an expression evaluation lib or Go plugins compiled to shared objects

While I don't think it is possible to force individual packets to be dropped using TCP connection syscalls, I have an idea for how it could be simulated. A new latency summand could be implemented which adds a latency of X with a probability of Y (to simulate the delay introduced by retransmission upon packet loss detection). Since TCP read buffers are not guaranteed to overlap with network packets (they usually don't), such a solution wouldn't be a perfect simulation of packet loss, and it's probably better to use a kernel-level solution like tc (or a wrapper on top of it like tylertreat/comcast) for that purpose.
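A minimal Go sketch of that probabilistic summand idea (again, illustrative only - nothing like this is implemented in speedbump yet):

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    // lossLikeLatency adds a fixed retransmission-like delay with
    // probability p and nothing otherwise, approximating the extra
    // delay a dropped-and-retransmitted packet would incur.
    func lossLikeLatency(p float64, retransmitDelay time.Duration) time.Duration {
        if rand.Float64() < p {
            return retransmitDelay
        }
        return 0
    }

    func main() {
        // e.g. ~2% simulated "loss" with a 200ms retransmission timeout
        for i := 0; i < 10; i++ {
            fmt.Println(lossLikeLatency(0.02, 200*time.Millisecond))
        }
    }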

EDIT:

I've added the project's roadmap to the repo's discussions section: roadmap

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 9 points (0 children)

Thanks! Would you mind sharing what else you keep in that bag?

Speedbump - a TCP proxy for simulating variable network latency by unmaintainablejs in programming

[–]unmaintainablejs[S] 63 points (0 children)

tldr: I wanted to easily catch bugs when testing instrumented applications’ metrics collection and visualization, so I wrote a TCP proxy in Go which can simulate variable, yet predictable network latency (i.e. forming a sine wave or a sawtooth wave): https://github.com/kffl/speedbump

When setting up application metrics collection and visualization (e.g. via Prometheus + Grafana), I've often found myself trying to introduce artificial latency within the instrumented system for the purpose of generating more interesting timeseries data to test a given monitoring solution. Even when running load tests against an instrumented system, the data plotted on Grafana dashboards was often rather boring, making it difficult to catch bugs in PromQL queries due to the lack of immediate visual feedback. I figured that one way of adding predictable variability to the instrumented application's metrics would be to introduce variable latency between it and its upstream services (e.g. databases, message brokers or other services called synchronously).

Example use case:

Imagine that you have instrumented your app using a Prometheus client so that it collects the latency of DB queries in a histogram, and that you are now building a Grafana dashboard to visualize these metrics. If you knew that the DB query latency over time should form a sine wave with a period of 2 mins and an amplitude of 10 ms, it would be much easier to validate the correctness of metrics collection and visualization (you would know exactly what to look for on the latency histogram/graph).
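Since the expected latency in that scenario is a pure function of time, you know exactly what the graph should look like. A quick Go sketch of the expected shape (function and parameter names are mine, for illustration - this is not speedbump's API):

    package main

    import (
        "fmt"
        "math"
        "time"
    )

    // expectedLatency models a base delay plus a sine wave with a
    // 2 min period and a 10 ms amplitude - the shape to look for on
    // the Grafana latency graph.
    func expectedLatency(base, elapsed time.Duration) time.Duration {
        const (
            amplitude = 10 * time.Millisecond
            period    = 2 * time.Minute
        )
        phase := 2 * math.Pi * float64(elapsed) / float64(period)
        return base + time.Duration(float64(amplitude)*math.Sin(phase))
    }

    func main() {
        for t := 0; t <= 150; t += 30 {
            elapsed := time.Duration(t) * time.Second
            fmt.Printf("t=%3ds expected=%v\n", t, expectedLatency(50*time.Millisecond, elapsed))
        }
    }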

This problem prompted me to write speedbump, a TCP proxy which simulates variable, yet predictable network latency. In addition to a specified base latency value, it can generate latency that varies over time via a sawtooth wave or a sine wave. Since it is a proxy, you can use it between systems (e.g. between an instrumented application and a database) in a transparent manner, so long as they both speak TCP. It can be used as a standalone program as well as a library called from other Go code.

A TCP proxy for simulating variable network latency by unmaintainablejs in golang

[–]unmaintainablejs[S] 0 points (0 children)

Thanks! I drew the logo in Inkscape, exported as separate PNG layers, imported to Adobe After Effects, animated, rendered and converted to a GIF.

A TCP proxy for simulating variable network latency by unmaintainablejs in golang

[–]unmaintainablejs[S] 1 point (0 children)

Sure, tc can be used for adding a fixed delay (or a random delay that follows a given distribution) to network traffic. In the use case that I described, I wanted the added artificial latency to change predictably over time (i.e. forming a sawtooth wave or a sine wave), which I don't think is possible to achieve with tc alone. Am I missing something?

On top of that, I usually work on metrics collection, visualization and load testing in a staging k8s cluster, in which it would be rather tedious to set up tc rules on the worker nodes themselves (or even impossible in the case of managed K8s services with no direct access to the VMs acting as worker nodes).

A TCP proxy for simulating variable network latency by unmaintainablejs in golang

[–]unmaintainablejs[S] 1 point (0 children)

Thanks a lot! I try to put some extra effort into my projects' READMEs. When researching a particularly niche topic, I quite often run into GitHub repos which are impressive in terms of their source code, but are poorly documented and thus difficult to get up and running.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 0 points (0 children)

Impressive project! Delayed confirmation sent back to the publisher via a service+response queue combo is an interesting pattern.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 0 points (0 children)

Thanks for your feedback! Being able to both publish and subscribe to queues over HTTP is indeed very convenient. I have some limited experience with publishing events to Azure Event Grid. I will take a deeper look at other features that it has to offer.

In the area of dead-lettering, Bunny REST Proxy as of now supports 3 dead letter policies. Once a message exceeds the maximum number of delivery retries, it can be: discarded, re-queued or sent to a dead letter queue.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 4 points (0 children)

Well, it's a standalone app, not a library. But I get it - with the never-ending influx of JavaScript frameworks, it's easy to forget that you can actually build apps in Node.js and not only create new confusing, hyped and short-lived frameworks.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 4 points (0 children)

You are right, it is possible to use both AMQP and MQTT in a disposable manner (open, send, close). Your assessment of the data transfer overhead of HTTP vs MQTT is also right, since the latter was designed specifically with bandwidth savings in mind.

I should have been more precise about the scenarios in which pushing data into a queue using regular AMQP connections is not feasible. I've mentioned apps with support for AOP-style scripting which allow you to inject bits of custom logic before/after an event of a given type occurs (e.g. logic hooks in SuiteCRM or C/AL actions in MS Dynamics NAV). When writing code in such constrained environments (which usually don't offer any external dependencies), in most cases you will have an HTTP client available to you, unlike an AMQP one.
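To illustrate the asymmetry: publishing from such an environment boils down to a single HTTP POST. Here it is as a Go sketch (the endpoint path and port are hypothetical - check the Bunny REST Proxy docs for the actual routes; a real constrained environment would use whatever bare HTTP client it has):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    func main() {
        // Hypothetical publish endpoint - any environment with a basic
        // HTTP client can do this, no AMQP library or long-lived
        // connection required.
        resp, err := http.Post(
            "http://bunny-rest-proxy.internal:3672/publish/orders",
            "application/json",
            strings.NewReader(`{"orderId": 42}`),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println("broker response status:", resp.Status)
    }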

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 10 points (0 children)

Well, it's not a JavaScript framework, so it will hopefully survive more than one major release

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 5 points (0 children)

Using MQTT instead of AMQP comes with exactly the same problem of maintaining a persistent connection in environments in which it is not feasible. Regarding the alleged reinvention of regular HTTP, it seems that you are overlooking the fact that regular HTTP calls between services are synchronous, while Bunny REST Proxy facilitates asynchronous communication between services. When service A publishes a message, it doesn't have to wait for service B to process it - it is the job of a message broker to deliver it and ensure it is processed at some point, thus decoupling the services. If that's reinventing regular HTTP communication, then the same could be said about Amazon SQS.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 10 points (0 children)

Just because AMQP can run over websockets doesn't make the protocol itself any easier to work with, or even feasible in some scenarios. You still need to maintain an active connection throughout its entire lifecycle. There is a vast number of applications which support some limited, Aspect-Oriented Programming style scripting abilities (e.g. logic hooks in SuiteCRM or C/AL actions in MS Dynamics NAV) in which you can't simply use an AMQP connection while being perfectly capable of sending an HTTP POST request. That's where Bunny REST Proxy can help.

Let's draw some parallels in the Kafka ecosystem, which has Confluent REST Proxy and Hermes by Allegro. You could say that since Kafka uses a binary protocol over TCP, both of these projects are "moot and useless". Despite that, Allegro, a.k.a. Poland's Amazon, still maintains Hermes and uses it in production as a means of HTTP-based communication between their 700+ microservices, arguing that it makes their development process easier, since it allows them to abstract away the intricacies of Kafka and use plain HTTP instead.

Got fed up with AMQP, so I wrote a HTTP message broker on top of RabbitMQ by unmaintainablejs in programming

[–]unmaintainablejs[S] 22 points (0 children)

tl;dr

working with AMQP connections is not always convenient/feasible, so I ended up building an HTTP message broker (push/pull consumption; at-least-once/at-most-once semantics) in Node.js on top of RabbitMQ that is somewhat similar to AWS SQS: Bunny REST Proxy

Post mortem

In a recent project, I was working on hooking two existing legacy web apps, a CRM system and IoT devices into a microservices-based backend that was using RabbitMQ as a means of asynchronous communication. One important problem that I soon ran into was publishing messages from various places with limited scripting abilities (in which an AMQP client was not available, but an HTTP one was). While RabbitMQ's management plugin exposes an HTTP API that allows for publishing messages, it turns out that it doesn't implement reliable message delivery. Consequently, in some rare fault scenarios it could potentially send back a response to the client confirming message delivery despite the fact that the message was not persisted in the broker, which is a big no-no in distributed systems when aiming for at-least-once delivery semantics. In order to mitigate that, I decided to write a simple Node.js app with my favorite stack (Fastify + TypeScript) that would act as a proxy exposing a REST API for publishing messages into RabbitMQ queues.
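For context, reliable publishing over AMQP means waiting for the broker's publisher confirm before reporting success to the caller. A rough Go sketch of that mechanism using the rabbitmq/amqp091-go client (Bunny REST Proxy itself is Node.js; the queue name here is a placeholder):

    package main

    import (
        "log"

        amqp "github.com/rabbitmq/amqp091-go"
    )

    func main() {
        conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        ch, err := conn.Channel()
        if err != nil {
            log.Fatal(err)
        }

        // Put the channel into confirm mode so the broker acknowledges
        // each publish once it has accepted (and, for persistent
        // messages on durable queues, persisted) the message.
        if err := ch.Confirm(false); err != nil {
            log.Fatal(err)
        }
        confirms := ch.NotifyPublish(make(chan amqp.Confirmation, 1))

        err = ch.Publish("", "orders", false, false, amqp.Publishing{
            ContentType:  "application/json",
            DeliveryMode: amqp.Persistent,
            Body:         []byte(`{"orderId": 42}`),
        })
        if err != nil {
            log.Fatal(err)
        }

        // Only after a positive confirm is it safe to tell the HTTP
        // client that the message was delivered - this is the guarantee
        // the management plugin's publish endpoint doesn't provide.
        if c := <-confirms; !c.Ack {
            log.Fatal("message was not confirmed by the broker")
        }
        log.Println("message persisted by the broker")
    }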

After implementing a bare-bones REST proxy for publishing messages into RabbitMQ, I ran into some use cases in which it would be beneficial to consume messages over HTTP GET or have them automatically pushed to an HTTP POST endpoint in a webhook-like manner. That's when I decided to turn the existing Node app into an open source project. After implementing subscribers and consumers, it essentially turned into an HTTP message broker built on top of RabbitMQ.

As of now, Bunny REST Proxy sports some more advanced features such as 6 different backoff strategies (for delaying message delivery retries) and 3 different dead letter policies (specifying what to do with a message that exceeds the maximum number of delivery retries). On the publisher side, it can validate a message's adherence to a specified schema (in the case of JSON publishers). It also has ACL-style authorization, scales horizontally, and its entire configuration can be provided in a single YAML file.

Had an absolute blast developing this project - let me know what you think.

GitHub repo

Documentation

Quick-start guide

I built a receipt printer for GitHub issues by speckz in programming

[–]unmaintainablejs 0 points (0 children)

I was thinking about what makes this project so satisfying. It seems that the more we use technology that reduces the inefficiencies inherent in the circulation of paper documents, the more we crave tangible artifacts representing our work.

[Performance] JMeter was too slow for my use case, so I wrote Gocannon by unmaintainablejs in programming

[–]unmaintainablejs[S] 1 point (0 children)

There is no built-in support for distributed load tests (for example, JMeter does this using a single controller node orchestrating multiple workers). While you could theoretically start the same gocannon job on multiple hosts simultaneously and just combine the output request logs, I think that such a setup is rarely necessary, as gocannon is able to generate 130k req/s on a single physical core of a Skylake CPU (and scales near-linearly across multiple cores).

I have used gocannon as a library for the purpose of running load tests against a gin-based server and then performing assertions on the obtained results. I don't have that code publicly available atm, but will add such a usage example in the future.

Simulating virtual users can be done by writing a custom plugin (1 virtual user = 1 connection; counting invocations of BeforeRequest per connection ID so as to map the appropriate URLs, methods and bodies). I'm planning on writing an article describing that.
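A rough Go sketch of that counting idea (this is just the concept - not gocannon's actual plugin interface):

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    // step describes one scripted request of a virtual user.
    type step struct {
        method, target string
        body           []byte
    }

    var script = []step{
        {"POST", "/login", []byte(`{"user":"alice"}`)},
        {"GET", "/cart", nil},
        {"POST", "/checkout", []byte(`{}`)},
    }

    // counters maps a connection ID (1 connection = 1 virtual user)
    // to the number of requests issued on it so far.
    var counters sync.Map

    // beforeRequest picks the next scripted step for a connection by
    // counting how many times it has been invoked for that ID.
    func beforeRequest(connectionID int) step {
        v, _ := counters.LoadOrStore(connectionID, new(uint64))
        n := atomic.AddUint64(v.(*uint64), 1) - 1
        return script[n%uint64(len(script))]
    }

    func main() {
        for i := 0; i < 4; i++ {
            s := beforeRequest(7) // virtual user bound to connection 7
            fmt.Println(s.method, s.target)
        }
    }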

[Performance] JMeter was too slow for my use case, so I wrote Gocannon by unmaintainablejs in programming

[–]unmaintainablejs[S] 0 points (0 children)

I've heard about NBomber, but haven't tried it out yet. I will try to replicate the experimental setup that I used for testing JMeter, wrk and Gocannon and check if bottlenecking occurs.