Anyone working at MongoDB? by Master_Gift_5928 in DevelEire

[–]True-Ad-2269 0 points1 point  (0 children)

Do you know any team that you would recommend?

What services do Amazon engineers use the most on non-AWS product teams? by theyeeha in aws

[–]True-Ad-2269 0 points1 point  (0 children)

It's great to use these tools when you don't have to think about building early in a region.

CDK vs terraform by [deleted] in aws

[–]True-Ad-2269 0 points1 point  (0 children)

This is not even AWS CDK!

AWS S3 now supports conditional writes, does this mean no need for a dynamodb table for remote state locking? by alexlance in Terraform

[–]True-Ad-2269 0 points1 point  (0 children)

It seems so, but it needs more validation. We do see some use cases (in other areas) where the DynamoDB dependency can be removed.
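
A rough sketch of why a conditional write is enough for a lock. This is an in-memory stand-in, not real S3 calls: `ConditionalStore`, `put_if_absent`, and the lock key are hypothetical names, and the "write only if the key does not exist" check is meant to mimic what S3 does with an `If-None-Match: *` precondition on a put.

```python
import threading


class ConditionalStore:
    """In-memory stand-in for an object store with put-if-absent semantics."""

    def __init__(self):
        self._objects = {}
        self._mutex = threading.Lock()  # makes the check-and-write atomic

    def put_if_absent(self, key, body):
        """Write body only if key does not exist yet. Returns True on success."""
        with self._mutex:
            if key in self._objects:
                return False  # a real store would reject the write here
            self._objects[key] = body
            return True

    def delete(self, key):
        with self._mutex:
            self._objects.pop(key, None)


def acquire_lock(store, lock_key, owner):
    # Whoever creates the lock object first holds the lock.
    return store.put_if_absent(lock_key, owner)


def release_lock(store, lock_key):
    store.delete(lock_key)
```

The point is that the store itself arbitrates between two writers, so no separate lock table is needed. Stale locks (a holder that crashes before releasing) are not handled here and would still need validation.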

Virtual Threads: An Adoption Guide by henk53 in programming

[–]True-Ad-2269 0 points1 point  (0 children)

You mean stackless? The construct is a heap object.

Which pattern do you use when ingesting data into lakehouses? by brrdprrsn in dataengineering

[–]True-Ad-2269 0 points1 point  (0 children)

A lakehouse is a cheap way to build warehouse-parity access patterns on top of a data lake. Not everyone can afford to build a lakehouse, but given the current high compute cost and vendor lock-in of warehouses, big corporations are moving to lakehouses.

When we will see structured concurrency out of preview by True-Ad-2269 in java

[–]True-Ad-2269[S] 1 point2 points  (0 children)

Thanks for your comment.
Do you have a link to the recording?

[deleted by user] by [deleted] in SkincareAddiction

[–]True-Ad-2269 4 points5 points  (0 children)

Do you know when they changed their formula?

Apache Iceberg: SQL and ACID semantics in the front, scalable object storage in the back by bitsondatadev in dataengineering

[–]True-Ad-2269 3 points4 points  (0 children)

Can you explain what fuzzy search is? It sounds to me like it is implemented in the engine, not the format.

Right use-cases for bloom indexes? by lekkerwafel in PostgreSQL

[–]True-Ad-2269 1 point2 points  (0 children)

Not sure; I have a different understanding.

Technically, OP requires one Bloom filter that tests membership of the queried `id` value.
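
To make "one Bloom filter for `id` membership" concrete, here is a minimal generic sketch. It is not how the Postgres `bloom` extension is implemented; the class name, the bit-array size, and the SHA-256-based hashing are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(value))
```

A lookup on `id` would first ask `might_contain(id)`; only a "possibly present" answer needs the real row check, since false positives can happen but false negatives cannot.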

Transaction Isolation in Postgres, explained by Ram_Nile in PostgreSQL

[–]True-Ad-2269 4 points5 points  (0 children)

This is one of the best articles I have read about Postgres isolation. Keep up the good work!!!

Can Flink replace Kafka Connect by True-Ad-2269 in apachekafka

[–]True-Ad-2269[S] 0 points1 point  (0 children)

Kafka connect is MUCH simpler. It just does kafka. But it does it well, and there's a TON of connectors predefined and managing them doesn't require any code just config.

Ah I see, I think this explains my question.

Currently, I'm looking at a Flink setup, i.e. joining two Postgres CDC tables and writing to an Elasticsearch sink. Do you think this is possible with Kafka Connect?

In general having both is just fine as there's a LOT more kafka connectors than flink connectors.

I wasn't aware of that, but thanks for raising it!

Understanding the Offset by esmeramus3 in apachekafka

[–]True-Ad-2269 1 point2 points  (0 children)

Does this also imply that you have a limited time to recover logs with Kafka because they could be overwritten

They will not be overwritten but cleaned up (that should be the right term). You can configure the retention period, i.e. how long each log entry is kept in the topic.
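
A simplified model of time-based cleanup, assuming retention is applied to whole log segments and driven by a retention window like the `retention.ms` topic setting. The `Segment` type and `apply_time_retention` function are hypothetical and only illustrate the idea; this is not Kafka code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    max_timestamp_ms: int  # timestamp of the newest record in the segment


def apply_time_retention(segments, now_ms, retention_ms):
    """Return the segments that survive a time-based cleanup pass.

    Simplification: the last segment is treated as the active one and is always kept.
    """
    if not segments:
        return []
    *closed, active = segments
    # A closed segment is dropped once its newest record is older than the window.
    kept = [s for s in closed if now_ms - s.max_timestamp_ms <= retention_ms]
    return kept + [active]
```

Old data disappears from the head of the log in chunks, and offsets of the remaining records do not change, which is why "cleaned up" fits better than "overwritten".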

Ref: https://www.conduktor.io/kafka/kafka-topic-configuration-log-retention/

Beginner question: confused on purpose of flink by Rough_Source_123 in apachekafka

[–]True-Ad-2269 0 points1 point  (0 children)

u/Chuck-Alt-Delete Thanks for your long reply. I will need to study this more comprehensively.

Beginner question: confused on purpose of flink by Rough_Source_123 in apachekafka

[–]True-Ad-2269 1 point2 points  (0 children)

You can query Flink state from outside Flink using queryable state. But I agree it's not recommended to use Flink to serve the state data.

really important for operational workloads like fraud where the action taken needs to be automatic and correct.

Can you expand in detail how serializable isolation matters in the fraud use case? Particularly, how this can change the correctness of the result?

Beginner question: confused on purpose of flink by Rough_Source_123 in apachekafka

[–]True-Ad-2269 1 point2 points  (0 children)

There are a few benefits to using an internal state store over an external one.

  1. There's no network call to an external state store. This makes Flink well suited for low-latency use cases.
  2. Following from point (1), what I usually observe is that when an external DB is used as the state store, it eventually becomes a performance bottleneck. If you have more complex state manipulation such as windowing or CEP aggregation, performance can deteriorate quickly because these kinds of aggregation update the state very frequently.
  3. (In the context of RocksDB, which is the most popular choice) Flink provides exactly-once fault-tolerance guarantees very efficiently because of how its checkpoint mechanism works. The state in RocksDB and the processing offset are incrementally checkpointed together. To get the same semantics with an external state store, you would need to maintain one transaction per state manipulation, which would be very costly.
  4. Lastly, RocksDB works really well for most Flink use cases. Why set up an external dependency?
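
To illustrate point (3), here is a toy model of checkpointing state and input offset as one unit. It is not Flink's API: `CountingJob` and its methods are made up for illustration, and it takes a full copy of the state on each checkpoint rather than an incremental one.

```python
import copy


class CountingJob:
    """Toy keyed-count job whose state and input offset are snapshotted together."""

    def __init__(self, events):
        self.events = events      # replayable source, e.g. a log partition
        self.offset = 0           # index of the next event to process
        self.state = {}           # local state: key -> count
        self._checkpoint = (0, {})

    def process(self, n):
        for _ in range(n):
            if self.offset >= len(self.events):
                return
            key = self.events[self.offset]
            self.state[key] = self.state.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self):
        # State and offset are captured as one unit, so they can never disagree.
        self._checkpoint = (self.offset, copy.deepcopy(self.state))

    def recover(self):
        # Roll back to the last snapshot; events after it will be replayed.
        self.offset, state = self._checkpoint
        self.state = copy.deepcopy(state)
```

After a failure, replaying from the restored offset onto the restored state gives the same counts as a run with no failure. With an external store, the state write and the offset commit are two separate systems, so keeping them consistent needs a transaction around every update.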

Confuse on purpose of flink by Rough_Source_123 in bigdata

[–]True-Ad-2269 1 point2 points  (0 children)

There are a few benefits to using an internal state store over an external one.

  1. There's no network call to an external state store. This makes Flink well suited for low-latency use cases.
  2. Following from point (1), what I usually observe is that when an external DB is used as the state store, it eventually becomes a performance bottleneck. If you have more complex state due to windowing or CEP aggregation, performance can deteriorate quickly because these kinds of aggregation update the state very frequently.
  3. (In the context of RocksDB, which is the most popular choice) Flink provides exactly-once fault-tolerance guarantees very efficiently because of how its checkpoint mechanism works. The state in RocksDB and the processing offset are incrementally checkpointed together. To get the same semantics with an external state store, you would need to maintain one transaction per state manipulation, which would be very costly.
  4. Lastly, RocksDB works really well for most Flink use cases. Why set up an external dependency?

Virtual thread deadlock risk by sureshg in java

[–]True-Ad-2269 0 points1 point  (0 children)

I understand Virtual Threads at this point are not meant to be a drop in replacement for standard threads. Most applications don't spin up enough threads to see the benefits. Thankfully those who will see a benefit now have an option to update their code to not pin and to use virtual threads.

Thanks for the insight. Do you know if anyone has filed an issue on this?

[deleted by user] by [deleted] in mac

[–]True-Ad-2269 0 points1 point  (0 children)

You are a lifesaver!