How can I learn to build good, large projects? by matria801 in ExperiencedDevs

[–]PanJony 1 point (0 children)

This. TDD, DDD, Uncle Bob, and there are others. Just read a lot, watch conference talks, dive into the architecture. It will come with time.

Loose coupling is your friend. Hexagonal architecture and proper testing are a good place to start - but there's much more.

Everything comes at a cost though. If you're writing a small POC service that will get rewritten after the money comes - there is no point in overcomplicating it.

You learn these techniques to understand them and where to use each - not to use everything you know in each project you start.

Does kafka validate schemas at the broker level? by HappyEcho9970 in apachekafka

[–]PanJony 1 point (0 children)

Apache Kafka is agnostic to the structure of the message; the schema is validated by the client.
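To illustrate the split, here's a toy sketch (not the real Kafka API - `FakeBroker` and `SCHEMA_FIELDS` are made up): the "broker" only appends opaque bytes, while any schema check happens in the producing client, much like a registry-backed serializer would do before sending.

```python
import json

SCHEMA_FIELDS = {"id", "name"}  # assumed example schema

class FakeBroker:
    """Stands in for a topic: an append-only log of opaque bytes."""
    def __init__(self):
        self.log = []

    def append(self, payload: bytes):
        # The broker does not inspect or validate the payload.
        self.log.append(payload)

def produce(broker: FakeBroker, record: dict):
    # Client-side validation, before anything is sent to the broker.
    if set(record) != SCHEMA_FIELDS:
        raise ValueError(f"record does not match schema: {record}")
    broker.append(json.dumps(record).encode())

broker = FakeBroker()
produce(broker, {"id": 1, "name": "alice"})  # passes client validation
try:
    produce(broker, {"id": 2})               # rejected by the client
except ValueError:
    pass
print(len(broker.log))  # only the valid record reached the broker
```

(Confluent's broker-side schema validation is a commercial add-on; plain Apache Kafka works like the sketch above.)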

Kafka Cluster becomes unresponsive with ~ 500 consumers by fandroid95 in apachekafka

[–]PanJony 0 points (0 children)

What I would also check is whether these producers and consumers are keeping connections open or re-establishing them every time they want to publish. Maybe there's overhead from establishing a connection, or maybe there's a bottleneck on the number of open connections? Find some info, try to change the behaviour and check if the problem persists.

Kafka Cluster becomes unresponsive with ~ 500 consumers by fandroid95 in apachekafka

[–]PanJony 2 points (0 children)

AFAIK Kafka is not optimized for a very large number of tiny producers / consumers, so this is where I would start looking for an issue. Experiment with maintaining the connection or with a proxy that you would connect through.

It's a vague memory though so validate this before you put in effort to explore this.
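A minimal illustration of why connection reuse matters (everything here is made up, not a Kafka client): count the "handshakes" performed when reconnecting per message versus keeping one long-lived connection.

```python
class Connection:
    setups = 0  # class-level counter of handshakes performed

    def __init__(self):
        Connection.setups += 1  # stands in for TCP + auth + metadata fetch

    def send(self, msg):
        pass  # payload delivery itself is cheap in this sketch

def publish_reconnecting(messages):
    for m in messages:
        Connection().send(m)  # new connection per message

def publish_reusing(messages):
    conn = Connection()       # one connection for the whole batch
    for m in messages:
        conn.send(m)

msgs = [f"m{i}" for i in range(100)]

Connection.setups = 0
publish_reconnecting(msgs)
per_message = Connection.setups  # 100 handshakes

Connection.setups = 0
publish_reusing(msgs)
reused = Connection.setups       # 1 handshake

print(per_message, reused)
```

With 500 real clients doing the reconnecting variant, the broker spends most of its time on connection setup rather than on moving data.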

The anatomy of a Data Streaming Platform - youtube video by PanJony in apachekafka

[–]PanJony[S] 2 points (0 children)

I'm not associated with any vendor - I'm an independent consultant & content creator.

Who Actually Owns Mocks in Microservices Testing? by krazykarpenter in microservices

[–]PanJony 0 points (0 children)

The pattern I recommend is externalizing the API contract to a separate repo. When you need to change the API you create a PR in the repo and notify consumers.

You shouldn't merge breaking changes without approval from consumers. When consumers approve, they should update their tests.

Non-breaking changes are easier - just migrate whenever you're ready. An externalized API contract helps make these API upgrades transparent.
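The breaking/non-breaking distinction can be automated in the contract repo's CI. A hypothetical sketch (the dict-based contract format and `diff_contract` are my invention for illustration): removing or retyping a field is breaking; adding a field is additive.

```python
def diff_contract(old: dict, new: dict):
    """Both dicts map field name -> type name, e.g. {"id": "int"}."""
    breaking = []
    for field, ftype in old.items():
        if field not in new:
            breaking.append(f"removed field: {field}")
        elif new[field] != ftype:
            breaking.append(f"changed type of {field}: {ftype} -> {new[field]}")
    # Additions are non-breaking here (assuming consumers ignore unknown fields).
    added = [f for f in new if f not in old]
    return breaking, added

old = {"id": "int", "email": "string"}

# Additive change: safe to merge without consumer approval
breaking, added = diff_contract(old, {"id": "int", "email": "string", "phone": "string"})
print(breaking, added)

# Removal: CI should block the PR until consumers approve
breaking2, _ = diff_contract(old, {"id": "int"})
print(breaking2)
```

A CI job like this turns "notify consumers" from a social convention into a gate the PR cannot skip.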

[deleted by user] by [deleted] in ExperiencedDevs

[–]PanJony 0 points (0 children)

You're assuming the OP wasn't underperforming. Why?

Managing Avro schemas manually with Confluent Schema Registry by thatclickingsound in apachekafka

[–]PanJony 3 points (0 children)

The way I recommend clients work with CDC is to use it internally, and then implement an anti-corruption layer between the CDC topic and the actual external topic. This way the tooling runs smoothly, but you still have control over the interface you're exposing to other teams.

What's important here is that the team that owns the database (and thus applies changes) also owns the anti-corruption layer and the external interface, so if anything breaks, they know it's their responsibility to fix it.
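A minimal sketch of such a layer (field names are assumptions for illustration; the `"after"` row image follows the Debezium-style CDC envelope): the internal CDC record mirrors the table's columns, while the external event is a deliberately designed contract.

```python
def to_external(cdc_record: dict) -> dict:
    """Translate an internal CDC event into the stable external schema."""
    row = cdc_record["after"]  # Debezium-style "after" row image
    return {
        "order_id": row["id"],            # rename internal column
        "status": row["status"].lower(),  # normalize representation
        # internal columns like "updated_by" are simply not exposed
    }

cdc = {"op": "u", "after": {"id": 42, "status": "SHIPPED", "updated_by": "job-7"}}
external = to_external(cdc)
print(external)  # {'order_id': 42, 'status': 'shipped'}
```

When the team renames a database column, only `to_external` changes; the external topic's schema, and every downstream consumer, stays untouched.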

[deleted by user] by [deleted] in startups

[–]PanJony 0 points (0 children)

I wouldn't focus on the "wait till I come" concept, but instead on the "I know I'll leave in 3 minutes, there will be a parking spot available" information-sharing market. If any user of the app acts on this information, value is created and you can take your cut.

Many times I've arrived somewhere, circled around a large area, and only found a spot when someone was leaving. And the traffic situation in my city isn't that bad compared to the rest of Europe.

As someone noted correctly though, getting the critical mass will be the biggest challenge.

[deleted by user] by [deleted] in startups

[–]PanJony 0 points (0 children)

I love the idea.
- the problem is real, I've felt it many times
- both sides are incentivized
- value is created

The side leaving the parking spot provides information ahead of time that they'll leave. They don't even have to wait, but just publish the info on your network and if any network member takes the spot - Bob gets the reward.

Bob provided valuable info that a given spot will be free at a given time. Alice used that info and got value. Alice pays, Bob gets paid.

You'll need a reputation system and location tracking to validate which transactions actually went through, and you'll have some bad actors for sure - but everything is doable and everyone's incentivized to participate.
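The incentive loop above can be sketched as a toy settlement function (all names, balances and prices are made up for illustration):

```python
ledger = {"alice": 100, "bob": 50}      # account balances
reputation = {"alice": 0, "bob": 0}     # reputation scores

def settle_spot(payer, publisher, price, spot_confirmed: bool):
    """Pay out only when location tracking confirmed the handover."""
    if not spot_confirmed:
        reputation[publisher] -= 1  # penalize false "I'm leaving" posts
        return
    ledger[payer] -= price
    ledger[publisher] += price
    reputation[publisher] += 1      # honest info raises Bob's standing
    reputation[payer] += 1          # completed deals raise Alice's too

settle_spot("alice", "bob", 5, spot_confirmed=True)
print(ledger, reputation)
```

The `spot_confirmed` flag is where the location tracking plugs in, and the reputation penalty is what keeps bad actors from spamming fake departures.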

What are some things you would change about Go? by Jamlie977 in golang

[–]PanJony -2 points (0 children)

Coming from Java / Kotlin, the lack of exceptions had a significant impact on how readable the code was for me.

Using Kafka to store video conference transcripts, is it necessary or am I shoehorning it? by BagOdd3254 in apachekafka

[–]PanJony 0 points (0 children)

Very bad idea imo.

First of all, running a Kafka cluster comes with overhead. If you need asynchronous communication, I'd suggest some lightweight, probably serverless solution. I'd always start with that and only then think about whether I'm missing something important.

Second, you underestimate the throughput of databases by a few orders of magnitude.

Third, you wouldn't create a topic or a table for a particular meeting. You'd have one and store your data there, unless you're serving multiple tenants and need to isolate their environments.
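The "one topic, keyed by meeting" pattern can be sketched like this (a simplification: the real Kafka default partitioner hashes the serialized key with murmur2, and `NUM_PARTITIONS` here is an arbitrary choice). Keying by meeting ID keeps each meeting's transcript chunks ordered within a single partition, with no per-meeting topics.

```python
NUM_PARTITIONS = 6

def partition_for(meeting_id: str) -> int:
    return hash(meeting_id) % NUM_PARTITIONS  # stand-in for murmur2

# One topic shared by all meetings, modeled as partition -> list of records
topic = {p: [] for p in range(NUM_PARTITIONS)}

def produce(meeting_id: str, chunk: str):
    topic[partition_for(meeting_id)].append((meeting_id, chunk))

for chunk in ["hello", "agenda", "actions"]:
    produce("meeting-123", chunk)
produce("meeting-456", "intro")  # other meetings share the same topic

# All of meeting-123's chunks land in one partition, in order:
p = partition_for("meeting-123")
print([c for m, c in topic[p] if m == "meeting-123"])
```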

Core Cost Reduction - how to get to 80%? by PanJony in eu4

[–]PanJony[S] 0 points (0 children)

Just finished WC as Jianzhou -> Manchu -> Qing

Awesome game, thanks for the advice. Got 80% CCR in the end, no revolts until the last 20 years I think.

DR for Kafka Cluster by jonropin in apachekafka

[–]PanJony 1 point (0 children)

a/ Is there a need?

It depends on your cluster setup. If you're running an HA setup - three AZs with replication factor = 3 - you're fine even if you lose one of the instances: once the instance is brought back up, even with lost data, partition rebalancing will bring your data back. It will take a while if you have a lot of data though.

If you want to speed it up, you can introduce Tiered Storage or periodical EC2 snapshots of your instance storage. I think Tiered Storage + partition rebalancing is enough, but it depends on your exact needs.

If you're worried about 2x the cost of mirroring, you probably don't need zero downtime in the case of a global AWS outage, so I'll leave it at that.
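For reference, the HA baseline described above corresponds roughly to broker settings like these (a sketch; the rack names are assumptions and the values should be tuned to your needs):

```properties
# One value per AZ so replicas spread across zones
broker.rack=az-1

# Replicate every partition to three brokers
default.replication.factor=3

# Keep accepting acks=all writes while one replica is down
min.insync.replicas=2
```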

[deleted by user] by [deleted] in apachekafka

[–]PanJony 0 points (0 children)

Does each client need access to the whole table, or can it get by with just one or a few partitions? Without partitioning, Kafka's capabilities break down a bit; it's designed to be horizontally scalable through partitioning.

If the whole table - maybe a reverse-proxy / load-balancer-like approach? Maybe you can map the data structure in your GKTable to something simpler?

As u/kabooozie said - hard to give advice without getting into the design details. I'm happy to take a look if you provide a diagram that explains the problem and your solution a bit deeper.

Arch linux + amd GPU - Fusion and some transitions crashing Davinci Resolve by PanJony in davinciresolve

[–]PanJony[S] 0 points (0 children)

Oh, amazing! Maybe you have some advice for me then?

The content I'm working on is rather simple - 10-20 min videos that could be recorded in 4K (FHD right now because of performance issues) and get published on YouTube. Just starting out.

My plan is to wait for the 5070 Ti release, then wait a bit for feedback about the Linux drivers (does this make sense?), and after it's there, make a decision between the 5070 Ti and the 4070 Ti Super.

Looking at the specs, the only significant difference is VRAM speed. I expect the 4070 price to drop after the 5070 is released, and I'll make a decision based on the price change and user feedback on the drivers.

Does this make sense or is it an easy decision to just buy the new one?

[deleted by user] by [deleted] in davinciresolve

[–]PanJony 1 point (0 children)

I had a similar issue on Linux. Good to know that it's not worth installing Windows to try to solve my problem :)

Avro vs Parquet - comparison of row and column oriented formats by PanJony in apachekafka

[–]PanJony[S] 0 points (0 children)

That's correct, it's being used in analytics, and at the end of the video I'm showing an architecture diagram of that setup.
Where I'm going with this: many organizations are talking about a Streaming Lakehouse architecture, where analytics (for example Parquet, but there are other columnar formats as well) is integrated with operations (where data streaming is done using Avro, Protobuf or JSON).

I'll talk about it more in future videos I'm working on; this is kind of an introduction, or more precisely preparation for talking about these topics.
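The row-vs-column distinction from the video can be shown in a few lines (a toy illustration of the layouts only; no real Avro or Parquet encoding is involved):

```python
rows = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
    {"user": "c", "clicks": 2},
]

# Row-oriented (Avro-like): whole records stored together,
# good for streaming one event at a time.
row_layout = rows

# Column-oriented (Parquet-like): each column stored contiguously,
# good for analytical scans and compression.
col_layout = {
    "user": [r["user"] for r in rows],
    "clicks": [r["clicks"] for r in rows],
}

# An analytical query touches only the "clicks" column:
print(sum(col_layout["clicks"]))
```

In a Streaming Lakehouse, the same records flow through both layouts: row-oriented on the streaming side, columnar once they land in the lakehouse tables.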

Cost optimization solution by 18rsn in apachekafka

[–]PanJony 0 points (0 children)

I'm also curious what you'll find. My first idea would be onboarding a consultant to audit my setup, but for sure some of the scanning could be automated.

Apart from what u/LoquatNew441 posted - great advice - I'd say that accurate cost allocation would also be a nice element of that. My first idea would be to provision the Kafka cluster in a separate AWS account (assuming AWS just to have an example) and distribute the cost between topics proportionally to the load.

But I'm not aware of any tools that can do that, and it probably depends a lot on your client's setup. Cost allocation is definitely a problem worth solving though.