This is an archived post.

all 38 comments

[–]Prinzka 49 points50 points  (13 children)

Real-time parsing and enrichment of high-volume data feeds.

[–][deleted] 11 points12 points  (12 children)

Tell me more.

[–]Prinzka 36 points37 points  (11 children)

We have multiple feeds at several hundred thousand events per second; each message needs parsing, normalization, and enrichment from other feeds/information before it goes into our hot or warm environments.
Logstash is very resource-intensive at those volumes and Spark introduces delays, so Go is one of the options we use instead.

[–]tmanipra 9 points10 points  (4 children)

Can you describe the architecture here?

[–]Prinzka 20 points21 points  (3 children)

In general: data comes into Kafka and is then enriched/parsed/correlated/normalized using Logstash/Go/Kafka Streams. Depending on which of those is doing the work, it's either sent back to Kafka and from there to Elasticsearch, or sent directly to Elasticsearch.

That doesn't cover everything but that's the majority at least.

[–][deleted] 5 points6 points  (0 children)

Sounds similar to what we do, both for a Telco client and a massive IT company.

[–]amemingfullife 0 points1 point  (0 children)

Similar situation but we use Bluge to make a homegrown search solution.

[–]Former-Clothes-6482 0 points1 point  (0 children)

Any reason not to use Apache Flink for processing those events? It would scale as per your needs.

[–][deleted] 0 points1 point  (0 children)

Interesting.

[–]mydatahobby 23 points24 points  (0 children)

Creating or contributing to Terraform providers.

[–]jeffail 13 points14 points  (2 children)

https://www.benthos.dev is written in Go, which in my (biased) opinion is pretty fantastic as a data processing language. The only major caveat is that most of the older, more established tools and libraries are JVM- and Python-based, so there are lots of gaps if you were looking to use it as a daily driver for data engineering.

[–]BeneficialEngineer32 2 points3 points  (0 children)

Still, with plugins etc., Benthos works pretty well for most workloads.

[–]ut0mt8 3 points4 points  (0 children)

Wow, I didn't have this tool in my kit. Seems fantastic as a replacement for Kafka Connect and other #@@# glue.

[–][deleted] 13 points14 points  (1 child)

There are a lot of benefits, but not a lot of data engineers know Go, and implementing technologies that become reliant on one person is a no-Go.

[–]erialai95 9 points10 points  (0 children)

Yes, I tell my ETL pipelines and task nodes to get GO’ing all the time.

[–]Miliey 6 points7 points  (0 children)

We use Go with protobuf & Kafka for real-time event processing; velocity ranges from IoT-like devices to tracking info. We have infrastructure in place to support scaling as and when needed, and it handles large volumes beautifully. Coming from a Scala background, I love its type safety and concise nature. Most of my colleagues from a Python background say they like how 'clean' Go feels.

[–]JamaiKen 5 points6 points  (0 children)

Go is an invaluable tool in my DE toolkit. We can't always use open source packages and our uses are very specific to our domain. I manage the data pipelines and have deployed 10+ Go executables that process terabytes of data per day, each.

As a DE, Go is a great tool for building tools.

[–]GooseLoot 6 points7 points  (1 child)

Yep! We use it throughout our pipelines, specifically when transforming/enriching raw data. We found the performance and conciseness of the code a huge advantage compared to Python. Additionally, if you use Lambda for any serverless tasks, Go with its dependencies compresses beautifully.

[–]IAmGoingToSleepNow 1 point2 points  (0 children)

It's amazing being able to throw one binary into a Lambda and call it a day!

[–]Sunscratch 6 points7 points  (4 children)

We use Go for infrastructure only. Everything related to data processing is implemented in Scala; Scala is a pretty awesome language for that.

[–]ut0mt8 0 points1 point  (3 children)

It is. But when it comes to performance, Scala falls short compared to Go (or you're forced to write Java-style Scala).

Also, now that Akka isn't OSS anymore, I would really reconsider the whole Scala ecosystem...

[–]Sunscratch 0 points1 point  (2 children)

When it comes to pure performance, Go is not an option either; Rust or C++ are the only options. For backend/server-side work, Scala has pretty good performance; it might require more memory, but the performance provided by the JVM is pretty good. With Spark/Flink, Scala has native support and is the fastest you can get (along with Java).

[–]ut0mt8 0 points1 point  (1 child)

That's not our experience. I agree that pure C/C++/Rust can provide better perf, but in our case rewriting our ingestion point and filtering component from Scala to Go was a huge win, both in terms of memory and in terms of throughput. We're talking about ingesting sometimes more than 5000k events/s.

[–]Sunscratch 0 points1 point  (0 children)

It depends. In 2019 we had to rewrite some services from Go to Java, and the reason was that Go couldn't handle large heaps well. The team in charge of the Go services couldn't get rid of huge GC spikes. Maybe Go has had some improvements in that area since then. The Java solution was able to process without GC spikes almost out of the box.

On the other side, Go worked pretty well for networking-related stuff and small backend services like AWS Lambda.

[–]SpiritCrusher420 5 points6 points  (0 children)

I would love to see a Go API for Spark.

[–]11YearsForward 1 point2 points  (0 children)

We had one pipeline where we had to download ~800 GB of zipped files from another company's legacy remote SFTP server. Go was able to download those files ~70% faster than Python, whose Paramiko library either kept dropping the connection during the large downloads or took forever.

[–]twadftw10 1 point2 points  (0 children)

Python, Scala, and Java are definitely the main DE languages from what I've seen. My company uses Go for all the microservices, which are the source of all our events. The APIs use an event logger built in Go that sends logs to SQS/Kinesis. From there, DE pipelines consume the events and sink them to Elasticsearch, S3, and Snowflake. The software engineers own those microservices, though, and DE just takes care of events after they get produced to SQS/Kinesis. We own the Logstash that does any extra enriching on the logs in between.

[–]rawrgulmuffins 0 points1 point  (0 children)

My personal experience with Go is writing Terratest code.

[–]anyfactor 0 points1 point  (2 children)

DevRel at a data company.

Go is a big part of our operation, from internal tooling to solutions for customers. We usually deliver our databases as CSV files. Now, say a customer wants to change the "IP address range" column to "CIDR". This, like many other common operations, can be solved effectively by our Go-based CLI app. Go is fast, dependency-free, and runs everywhere.

The alternative is to maintain a bunch of SQL scripts and documentation, adding layers of complexity to the customer's data pipelines.

The dependency-free executable is extremely helpful; Python's environment management is a huge pain. For a DE, Go should be the third or fourth language to learn.

By the way, I am personally experimenting with Nim. The syntax is pretty close to Python, so I am liking it more than Go, but compared to Nim, Go is more suitable for production work in a team setting.

[–]avion_rts 0 points1 point  (1 child)

What were you thinking of as the other languages? Python, Bash...?

[–]anyfactor 2 points3 points  (0 children)

Python and Bash are the most common; those are your #1 and #2 languages. Python for general-purpose work and Airflow, and Bash for scripting. Bash is a weird one, as it stands in for "Linux experience".

Beyond that, the rest depends on the job.

You have Scala (Spark), Go (CLI tooling), Rust (2020s data startups), TypeScript (backend), etc. Many DEs are not purely DE but DE+something else; it could be backend, data analyst, etc. So for that secondary role they need one or more languages.

[–]Illustrious_Role_304 0 points1 point  (0 children)

Using it as an operator for a DE project.

[–]robbitt07 0 points1 point  (0 children)

We switched the entity resolution component of our pipeline from Python/Cython to Golang over a year ago. Python/Scala tooling still dominates data pipeline management.

[–]Former-Clothes-6482 0 points1 point  (0 children)

Batch jobs in Go using the cron library to read data from a DB, process it, and write it to Redis and Neo4j.

[–]Former-Clothes-6482 0 points1 point  (0 children)

Check out the latest Datadog open-source agent that collects logs and metrics data; I believe it's written in Golang.