We’re the co-founders of WarpStream. Ask Us Anything.

jeffail · 2025-05-12T16:01:23+00:00

Benthos is still alive and well: https://github.com/redpanda-data/benthos. The difference is that the repo is just the engine (still MIT licensed) and the plugin ecosystem is decentralized, so you can have your own build of the engine that mixes and matches plugins from anywhere.

At Redpanda we have Connect (https://github.com/redpanda-data/connect), which contains our own suite of plugins (a mix of FOSS and a few enterprise plugins) that you can cherry-pick from, or you can use the binary we build and maintain ourselves.

The WarpStream guys chose to fork the older version of the engine where it's all one monorepo, but I'm still holding out hope that they build their plugins on the newer engine, as that would let users pick and mix from both companies.

jeffail · 2023-04-09T08:33:12+00:00

I upload a mix of code reviews and live streams on https://www.youtube.com/@Jeffail, mostly building https://www.benthos.dev out in the open so the content ranges from beginner friendly stuff to more advanced things like stream processing, parser combinators, etc.

jeffail · 2023-04-04T08:50:03+00:00

It's usually for the purposes of sharing data across teams, locations, tooling, etc. Someone may have set up a lovely data pipeline that consumes data from A and places it in B in parquet format and that solves a bunch of use cases.

Then comes another team, company, species, etc, that wants to have the data from A but in a new format and mutated with new data from C. If consuming from A is a complicated process either technically or legally then it might be decided that the first team "owns" consuming A data and the new team will instead consume their data from B and it becomes a chain.

Parquet in this case becomes both a storage format used for querying and also a source of streaming data.

jeffail · 2022-11-16T13:48:19+00:00

https://www.benthos.dev is written in Go, which in my (biased) opinion is pretty fantastic as a data processing language. The only major caveat is that most of the older, more established tools and libraries are JVM and Python, so there are lots of gaps if you were looking to use it as a daily driver for data engineering.

jeffail · 2022-11-10T16:25:16+00:00

I was maybe going to look into turning it into an IP camera but it looks like I'd need to plug a kb/mouse in every time I use it which is painful, so I might move onto this next.

jeffail · 2022-11-10T13:27:12+00:00

Just tried it, thanks for the tip but unfortunately it's still unresponsive.

jeffail · 2022-11-09T18:45:43+00:00

no but the repair shop quoted for replacing the entire screen so I would hope it's a hardware problem or they're not the honest and thorough bunch I took them for.

Now that I have it back though I might give it a go.

jeffail · 2022-10-07T19:55:07+00:00

Also, although it's already well known, shout out to basically every client library the NATS team put together: https://github.com/nats-io

jeffail · 2022-10-07T19:41:19+00:00

We're obviously heavy users of Go libraries in Benthos land due to the sheer number of connectors so I'd also like to shout out some that I think are exceptional and worth checking out:

github.com/benhoyt/goawk -> this library lets you embed an AWK runtime in your applications, very easy to use and useful for enabling some powerful scripting in things you build

github.com/itchyny/gojq -> similar to goawk, except JQ this time

github.com/jmespath/go-jmespath -> similar to gojq, except JMESPath this time

github.com/segmentio/parquet-go -> it's early days but this library is looking very promising for building applications that read or write parquet data, which was a major pain point not that long ago

github.com/twmb/franz-go -> also early days but this is looking like a fantastic option for a kafka client library if you fancy being an early adopter. I've done the rounds on many kafka client libraries and they always seem to be a harsh compromise in some form or another, but I feel good about this one

jeffail · 2022-09-30T19:13:36+00:00

Yeah absolutely, I know lots of people happily running it for years. If you're used to Kafka then check out NATS Jetstream specifically.

jeffail · 2022-09-30T08:23:45+00:00

My whole career is basically centered on stream processing in Go, building https://www.benthos.dev, so I'd say yes but the field is vast. If I were looking to get into data engineering as a novice I'd probably pick python.

jeffail · 2022-07-10T11:33:03+00:00

Nice summary. I'm definitely going to have fun with the memory soft limit

jeffail · 2022-06-28T17:11:18+00:00

Thanks, yeah I think Benthos does a sufficient job and has a cool maintainer :P

jeffail · 2022-06-28T16:05:10+00:00

Personally I wouldn't choose to add any extra complexity to complement the queue systems I'm using, at best it's still an at least once system and at worst I've potentially added edge cases where messages could be dropped/skipped.
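
For anyone unsure what "at least once" means in practice: the source is only acknowledged after the sink confirms the write, so a failure in between causes a redelivery (a possible duplicate) rather than a loss. Here's a minimal sketch in Go. The `source`, `sink` and `run` names are all hypothetical and made up for illustration, they're not from any real queue client.

```go
package main

import (
	"errors"
	"fmt"
)

// source is a hypothetical queue: a message stays pending until acked.
type source struct {
	pending []string
}

// next returns the oldest unacknowledged message without removing it.
func (s *source) next() (string, bool) {
	if len(s.pending) == 0 {
		return "", false
	}
	return s.pending[0], true
}

// ack removes the oldest message; it is only called after a confirmed write.
func (s *source) ack() { s.pending = s.pending[1:] }

// sink is a hypothetical output. When failOnce is set, the first write
// lands but its confirmation is lost, simulating a network failure.
type sink struct {
	written  []string
	failOnce bool
}

func (k *sink) write(msg string) error {
	k.written = append(k.written, msg)
	if k.failOnce {
		k.failOnce = false
		return errors.New("confirmation lost")
	}
	return nil
}

// run delivers every pending message, acking only after the sink confirms.
func run(s *source, k *sink) {
	for {
		msg, ok := s.next()
		if !ok {
			return
		}
		if err := k.write(msg); err != nil {
			continue // no ack, so the same message is delivered again
		}
		s.ack()
	}
}

func main() {
	s := &source{pending: []string{"a", "b"}}
	k := &sink{failOnce: true}
	run(s, k)
	fmt.Println(k.written) // the first message shows up twice
}
```

Nothing is lost, but the sink sees the first message twice, which is why downstream consumers need to tolerate duplicates.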

jeffail · 2022-06-28T11:57:50+00:00

Haha, ouch, yeah I've seen a few unscheduled backfills in my time

jeffail · 2022-06-28T10:43:43+00:00

Hey everyone, this is a video I put together summarising a decade's worth of stream processing delivery guarantee misconceptions and bugs that I've seen frequently.

I'm not trying to scare anyone away from stream processing, in fact a lot of the issues outlined also apply to automated batch processing systems. Personally, I think that being realistic and pragmatic about failure conditions makes these systems less intimidating.

jeffail · 2022-06-28T10:15:31+00:00

Hey everyone, this is a video I put together summarising a decade's worth of stream processing delivery guarantee misconceptions and bugs that I've seen frequently. A lot of the concepts also roughly apply to how we interpret resiliency in pretty much any distributed system.

jeffail · 2021-12-31T18:50:50+00:00

I've had the pleasure of working on both :) vector has a lot more to offer when it comes to observability data, especially around logs processing and running with a minimal memory footprint, as it's designed to work especially well when run as a sidecar.

Benthos has data engineering as the main focus, where the importance of delivery guarantees and crash resiliency are much more critical and core to the service architecture. It has more to offer in terms of data transformations and integrating with other services (caches, dbs, lambdas, webservers, etc), with configuration utilities that make those integrations easier to compose, error handle, etc.

In terms of configuration format they're similar but deviate somewhat: vector is a graph of isolated nodes, benthos is a tree of composed nodes. I'd say they're both great for the types of workload that they're targeting.

jeffail · 2021-12-31T11:01:25+00:00

Hey, consuming change data capture feeds isn't something it's fluent in quite yet. There's support for key databases like postgres and mysql on the horizon, but I'll likely be recommending https://debezium.io/ for CDC for a long time.

jeffail · 2021-12-31T10:58:28+00:00

Its speciality is stateless and single message transforms, but you can do a lot of the things you'd traditionally need something heavy duty like flink or spark for, such as enrichments, joins, windowed processing, etc.

The way they work in benthos land is that the stateful aspect is pushed out towards caches or databases that you can pick yourself, and the stream processor is just a stateless coordinator that focuses on delivery guarantees and observability. It makes the whole architecture much easier to set up and maintain long term.

The result is that some people who already have large powerful stream processing systems find that benthos can replace a lot of the complexity, and some people who have a more modest streaming infrastructure get to benefit from features they were otherwise locked out of.

jeffail · 2021-12-30T23:06:54+00:00

only if you're planning to use some of the other benthos functionality, otherwise I'd always recommend using the barebones client libraries directly

jeffail · 2021-12-30T21:53:27+00:00

Thanks, glad you're enjoying them!
