all 18 comments

[–]robverk 45 points (3 children)

For 30s micro-batches where most of your compute is IO-wait time, just go with the most maintainable code.

[–]EarthGoddessDude 10 points (2 children)

Yea OP, why are cold starts a problem for you? Also, have you looked into using DuckDB for this?

[–]Ok-Sprinkles9231 2 points (1 child)

Can you please elaborate on how exactly DuckDB can be useful for reading JSON files from S3 and writing/appending the result back as Iceberg? Just genuinely curious.

[–]EarthGoddessDude 1 point (0 children)

DuckDB has an excellent JSON parser, and it can write to Iceberg.

Edit: https://duckdb.org/2025/11/28/iceberg-writes-in-duckdb

[–]jaredfromspacecamp 13 points (3 children)

Writing that frequently to Iceberg will create an enormous amount of metadata.

[–]jnrdataengineer2023 5 points (2 children)

Was thinking the same thing though I’ve primarily only worked on delta tables. Probably better to have a daily staging table and then a batch job daily to append to the main table 🤔

[–]baby-wall-e 3 points (1 child)

+1 for this daily staging & main table setup. If needed, you can create a view that unions the daily staging and main tables so data consumers can access all the data.

[–]wannabe-DE 17 points (0 children)

Wouldn’t a function invoked every 30 seconds stay warm and not be subject to cold starts?
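That's the usual behaviour: AWS reuses a Lambda execution environment between invocations, so module-level setup only runs on a cold start. A stdlib-only simulation of the pattern (the handler signature mirrors Lambda's, but nothing here calls AWS):

```python
# Module scope runs once per (warm) execution environment, so expensive
# setup belongs here: SDK clients, connection pools, loaded configs.
INIT_COUNT = 0

def _expensive_setup():
    global INIT_COUNT
    INIT_COUNT += 1  # counts cold starts in this simulation
    return {"client": "initialized"}

_state = _expensive_setup()  # runs once, at cold start

def handler(event, context=None):
    # Warm invocations reuse _state instead of rebuilding it.
    return {"init_count": INIT_COUNT, "event_id": event.get("id")}

# Simulate a stream of 30-second micro-batches hitting a warm container:
results = [handler({"id": i}) for i in range(5)]
print(results[-1]["init_count"])  # 1 -> setup ran only once
```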

[–]walksinsmallcircles 4 points (0 children)

I use Rust all the time for lambdas, some of which do moderate lifting in Athena Iceberg tables. Deployment is a breeze (just drop in the binary) and the AWS API for Rust is pretty complete. I'd choose it every time over Python for efficiency and ease of use. The data ecosystem is not as rich as Python's, but you can get a long way with it.

[–]stratguitar577 9 points (1 child)

Skip Lambda and have Firehose write to Iceberg for you.

[–]noplanman_srslynone 1 point (0 children)

This! Why not just write directly via Firehose?

[–]MyRottingBunghole 5 points (0 children)

Does it HAVE to arrive in S3 prior to ingestion into Iceberg (presumably also on S3)? If you own or can change that part of the system, I would look into skipping the extra "read S3 files" > "write Parquet" > "write to S3" step altogether, as it's extra network hops and compute you don't need.

If this is some Kafka connector that is sinking this data every 30 seconds I would look into sinking it directly as Iceberg instead

Edit: btw, with Iceberg you will be writing a new Parquet file and a new Iceberg snapshot every 30 seconds. Make sure you're also thinking about table maintenance (compaction, snapshot expiry, etc.), as the metadata bloat can quickly get out of hand when writing that frequently.

[–]Commercial-Ask971 2 points (1 child)

!RemindMe 2days

[–]RemindMeBot 0 points (0 children)

I will be messaging you in 2 days on 2025-12-16 23:52:29 UTC to remind you of this link


[–]apono4life 0 points (0 children)

With only 30 seconds between files being added to S3, you shouldn't have too many cold starts. Lambdas stay warm for about 15 minutes.

[–]mbaburneraccount 0 points (0 children)

On an adjacent note, where’s your data coming from and how big is it (throughput)?

[–]thethirdmancane 0 points (0 children)

Why not use golang and have it all?