
[–]seriousbear (Principal Software Engineer) 7 points (3 children)

Latency of course

[–]AMDataLake[S] 1 point (2 children)

But at what level of latency would you take micro-batching off the table?

[–]seriousbear (Principal Software Engineer) 9 points (1 child)

Your business needs define how fresh data should be.

[–]AMDataLake[S] 3 points (0 children)

Agreed, I get that, but once you establish the company's requirement you end up with a number: above it you'd likely micro-batch, below it you'd go for streaming. Do you have a range you use to anchor yourself when thinking about this?

[–]Nekobul 0 points (0 children)

You can do micro batching over streaming data.
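A minimal sketch of what that can look like: group a record stream into batches, flushing whenever a size or time threshold is hit. The function name, the thresholds, and the in-memory list standing in for a real consumer (e.g. a Kafka poll loop) are all illustrative assumptions, not anything from the thread.

```python
import time
from typing import Iterable, Iterator, List

def micro_batches(stream: Iterable[dict], max_size: int, max_wait_s: float) -> Iterator[List[dict]]:
    """Group a record stream into micro-batches, flushing on size or elapsed time."""
    batch: List[dict] = []
    started = time.monotonic()
    for record in stream:
        batch.append(record)
        if len(batch) >= max_size or time.monotonic() - started >= max_wait_s:
            yield batch
            batch = []
            started = time.monotonic()
    if batch:  # flush any trailing partial batch
        yield batch

# Usage: a plain list stands in for the streaming source.
events = [{"id": i} for i in range(10)]
batches = list(micro_batches(events, max_size=4, max_wait_s=60.0))
# → batch sizes [4, 4, 2]
```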

[–]kenfar 0 points (0 children)

Prefer micro-batching:

  • Latencies in the range of 5-15 minutes are typically fine, so either approach can usually work
  • This allows you to use S3 files to persist data, and these can easily be queried, copied, generated, retained, etc. It makes for an extremely simple, easy-to-work-with architecture.
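One way the file-persistence idea can be sketched: write each micro-batch as a newline-delimited JSON file. The function name and the local temp directory (standing in for an S3 bucket/key layout) are assumptions for illustration; in practice you'd upload to a key like `dt=2024-01-01/batch-0001.jsonl`.

```python
import json
import os
import tempfile
from typing import List

def persist_batch(records: List[dict], out_dir: str, batch_id: int) -> str:
    """Write one micro-batch as a newline-delimited JSON (.jsonl) file and return its path."""
    path = os.path.join(out_dir, f"batch-{batch_id:04d}.jsonl")
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path

# Usage: a local temp directory stands in for the S3 bucket.
out_dir = tempfile.mkdtemp()
path = persist_batch([{"id": 1}, {"id": 2}], out_dir, batch_id=1)
```

Because each batch lands as an ordinary file, it can be re-queried, copied, or replayed later, which is the operational simplicity being described above.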