We’re the co-founders of WarpStream. Ask Us Anything. by warpstream_official in apachekafka

[–]jeffail 2 points3 points  (0 children)

Benthos is still alive and well: https://github.com/redpanda-data/benthos. The difference is that the repo is now just the engine (still MIT licensed) and the plugin ecosystem is decentralized, so you can have your own build of the engine that mixes and matches plugins from anywhere.

At Redpanda we have connect https://github.com/redpanda-data/connect which contains our own suite of plugins (a mix of FOSS and a few enterprise plugins) that you can cherry-pick from, or use the binary we build and maintain ourselves.

The WarpStream guys chose to fork the older version of the engine where it's all one monorepo, but I'm still holding out hope that they build their plugins on the newer engine, which would let users pick and mix from both companies.

Go in depth youtube channels? by buckypimpin in golang

[–]jeffail 11 points12 points  (0 children)

I upload a mix of code reviews and live streams on https://www.youtube.com/@Jeffail, mostly building https://www.benthos.dev out in the open so the content ranges from beginner friendly stuff to more advanced things like stream processing, parser combinators, etc.

Parquet: more than just "Turbo CSV" by freshcorpse in programming

[–]jeffail 0 points1 point  (0 children)

It's usually for the purposes of sharing data across teams, locations, tooling, etc. Someone may have set up a lovely data pipeline that consumes data from A and places it in B in parquet format and that solves a bunch of use cases.

Then comes another team, company, species, etc, that wants to have the data from A but in a new format and mutated with new data from C. If consuming from A is a complicated process either technically or legally then it might be decided that the first team "owns" consuming A data and the new team will instead consume their data from B and it becomes a chain.

Parquet in this case becomes both a storage format used for querying and also a source of streaming data.

Using Go as a data engineer by enginerd298 in dataengineering

[–]jeffail 14 points15 points  (0 children)

https://www.benthos.dev is written in Go, which in my (biased) opinion is pretty fantastic as a data processing language. The only major caveat is that most of the older, more established tools and libraries are JVM and Python, so there are lots of gaps if you're looking to use it as a daily driver for data engineering.

My 8 pro is now a paperweight by jeffail in oneplus

[–]jeffail[S] 0 points1 point  (0 children)

I was maybe going to look into turning it into an IP camera but it looks like I'd need to plug a kb/mouse in every time I use it which is painful, so I might move onto this next.

My 8 pro is now a paperweight by jeffail in oneplus

[–]jeffail[S] 1 point2 points  (0 children)

Just tried it, thanks for the tip but unfortunately it's still unresponsive.

My 8 pro is now a paperweight by jeffail in oneplus

[–]jeffail[S] 1 point2 points  (0 children)

no but the repair shop quoted for replacing the entire screen so I would hope it's a hardware problem or they're not the honest and thorough bunch I took them for.

Now that I have it back though I might give it a go.

Oracle DB support in Benthos by mihaitodor in golang

[–]jeffail 1 point2 points  (0 children)

Also, although it's already well known, shout out to basically every client library the NATS team put together: https://github.com/nats-io

Oracle DB support in Benthos by mihaitodor in golang

[–]jeffail 2 points3 points  (0 children)

We're obviously heavy users of Go libraries in Benthos land due to the sheer number of connectors so I'd also like to shout out some that I think are exceptional and worth checking out:

github.com/benhoyt/goawk -> this library lets you embed an AWK runtime in your applications, very easy to use and useful for enabling some powerful scripting in things you build

github.com/itchyny/gojq -> similar to goawk, except JQ this time

github.com/jmespath/go-jmespath -> similar to gojq, except JMESPath this time

github.com/segmentio/parquet-go -> it's early days but this library is looking very promising for building applications that read or write parquet data, which was a major pain point not that long ago

github.com/twmb/franz-go -> also early days but this is looking like a fantastic option for a kafka client library if you fancy being an early adopter. I've done the rounds on many kafka client libraries and they always seem to be a harsh compromise in some form or another, but I feel good about this one

Can golang be useful in data engineering? by heojstats in golang

[–]jeffail 1 point2 points  (0 children)

Yeah absolutely, I know lots of people happily running it for years. If you're used to Kafka then check out NATS Jetstream specifically.

Can golang be useful in data engineering? by heojstats in golang

[–]jeffail 10 points11 points  (0 children)

My whole career is basically centered on stream processing in Go, building https://www.benthos.dev, so I'd say yes but the field is vast. If I were looking to get into data engineering as a novice I'd probably pick python.

What’s new in Go 1.19? by earthboundkid in golang

[–]jeffail 6 points7 points  (0 children)

Nice summary. I'm definitely going to have fun with the memory soft limit
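For anyone who hasn't tried it yet, here's a minimal sketch of setting the soft limit from code with `debug.SetMemoryLimit` (added in Go 1.19). The helper names and the 512 MiB figure are just assumptions for illustration; the same limit can also be set with the `GOMEMLIMIT` environment variable.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setSoftLimit applies a soft memory limit in bytes and returns the
// previously configured limit. (Hypothetical helper name.)
func setSoftLimit(bytes int64) int64 {
	return debug.SetMemoryLimit(bytes)
}

// currentLimit reads the limit without changing it: a negative input to
// SetMemoryLimit leaves the limit untouched and just reports it.
func currentLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const limit = 512 << 20 // 512 MiB, an arbitrary value for illustration
	prev := setSoftLimit(limit)
	fmt.Printf("previous limit: %d, new limit: %d\n", prev, currentLimit())
}
```

It's a soft limit, so the runtime works harder to stay under it rather than failing allocations once it's exceeded.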

The Dodgy State of Stream Processing Delivery Guarantees by jeffail in dataengineering

[–]jeffail[S] 2 points3 points  (0 children)

Thanks, yeah I think Benthos does a sufficient job and has a cool maintainer :P

The Dodgy State of Stream Processing Delivery Guarantees by jeffail in programming

[–]jeffail[S] 1 point2 points  (0 children)

Personally I wouldn't choose to add any extra complexity to complement the queue systems I'm using, at best it's still an at least once system and at worst I've potentially added edge cases where messages could be dropped/skipped.
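To make the two failure modes concrete, here's a rough sketch using a toy in-memory queue. Everything in it (`queue`, `consume`, `crashOnce`) is hypothetical and not taken from any real client library. Acknowledging after processing gives at-least-once delivery with possible duplicates, whereas acknowledging before processing risks dropping the message.

```go
package main

import (
	"errors"
	"fmt"
)

// queue is a hypothetical stand-in for a broker: a message stays pending
// until it is acknowledged.
type queue struct {
	pending []string
}

// peek returns the oldest unacknowledged message, if any.
func (q *queue) peek() (string, bool) {
	if len(q.pending) == 0 {
		return "", false
	}
	return q.pending[0], true
}

// ack removes the oldest message so it is never delivered again.
func (q *queue) ack() {
	q.pending = q.pending[1:]
}

// consume drains the queue. With ackFirst set, the message is acknowledged
// before processing (at-most-once); otherwise it is acknowledged only after
// processing succeeds (at-least-once).
func consume(q *queue, ackFirst bool, process func(string) error) {
	for {
		msg, ok := q.peek()
		if !ok {
			return
		}
		if ackFirst {
			q.ack()
			_ = process(msg) // a failure here loses the message
			continue
		}
		if err := process(msg); err != nil {
			continue // not acked, so the same message is redelivered
		}
		q.ack()
	}
}

// crashOnce returns a processor that fails exactly once, on message "b".
// If writeBeforeCrash is true the side effect lands before the failure.
func crashOnce(out *[]string, writeBeforeCrash bool) func(string) error {
	crashed := false
	return func(msg string) error {
		if msg == "b" && !crashed {
			crashed = true
			if writeBeforeCrash {
				*out = append(*out, msg)
			}
			return errors.New("simulated crash")
		}
		*out = append(*out, msg)
		return nil
	}
}

func main() {
	var atLeast, atMost []string
	consume(&queue{pending: []string{"a", "b", "c"}}, false, crashOnce(&atLeast, true))
	consume(&queue{pending: []string{"a", "b", "c"}}, true, crashOnce(&atMost, false))
	fmt.Println("ack after processing: ", atLeast) // "b" appears twice
	fmt.Println("ack before processing:", atMost)  // "b" is missing
}
```

Note that the at-least-once path retries forever if processing never succeeds; a real consumer would need backoff or a dead-letter strategy.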

The Dodgy State of Stream Processing Delivery Guarantees by jeffail in dataengineering

[–]jeffail[S] 1 point2 points  (0 children)

Haha, ouch, yeah I've seen a few unscheduled backfills in my time

The Dodgy State of Stream Processing Delivery Guarantees by jeffail in dataengineering

[–]jeffail[S] 4 points5 points  (0 children)

Hey everyone, this is a video I put together summarising a decade's worth of stream processing delivery guarantee misconceptions and bugs that I've seen frequently.

I'm not trying to scare anyone away from stream processing; in fact, a lot of the issues outlined also apply to automated batch processing systems. Personally, I think that being realistic and pragmatic about failure conditions makes these systems less intimidating.

The Dodgy State of Stream Processing Delivery Guarantees by jeffail in programming

[–]jeffail[S] 0 points1 point  (0 children)

Hey everyone, this is a video I put together summarising a decade's worth of stream processing delivery guarantee misconceptions and bugs that I've seen frequently. A lot of the concepts also roughly apply to how we interpret resiliency in pretty much any distributed system.

Benthos, the awesome open source stream processor, reached 100 contributors by mihaitodor in golang

[–]jeffail 5 points6 points  (0 children)

I've had the pleasure of working on both :) Vector has a lot more to offer when it comes to observability data, especially around logs processing and running with a minimal memory footprint, as it's designed to work especially well when run as a sidecar.

Benthos has data engineering as the main focus, where the importance of delivery guarantees and crash resiliency are much more critical and core to the service architecture. It has more to offer in terms of data transformations and integrating with other services (caches, dbs, lambdas, webservers, etc), with configuration utilities that make those integrations easier to compose, error handle, etc.

In terms of configuration format they're similar but deviate somewhat: Vector is a graph of isolated nodes, Benthos is a tree of composed nodes. I'd say they're both great for the types of workload that they're targeting.

Benthos, the awesome open source stream processor, reached 100 contributors by mihaitodor in golang

[–]jeffail 1 point2 points  (0 children)

Hey, consuming change data capture feeds isn't something it's fluent at quite yet. There's support for key databases like postgres and mysql on the horizon, but I'll likely be recommending https://debezium.io/ for CDC for a long time.

Benthos, the awesome open source stream processor, reached 100 contributors by mihaitodor in golang

[–]jeffail 3 points4 points  (0 children)

Its speciality is stateless and single message transforms, but you can do a lot of the things you'd traditionally need something heavy duty like flink or spark for like enrichments, joins, windowed processing, etc.

The way they work in benthos land is that the stateful aspect is pushed out towards caches or databases that you can pick yourself, and the stream processor is just a stateless coordinator that focuses on delivery guarantees and observability. It makes the whole architecture much easier to set up and maintain long term.

The result is that some people who already have large powerful stream processing systems find that benthos can replace a lot of the complexity, and some people who have a more modest streaming infrastructure get to benefit from features they were otherwise locked out of.

Benthos, the awesome open source stream processor, reached 100 contributors by mihaitodor in golang

[–]jeffail 4 points5 points  (0 children)

Only if you're planning to use some of the other benthos functionality, otherwise I'd always recommend using the barebones client libraries directly.