Friday, October 6, 2023
Kafka - Topics, Tables and Stream - 10 min introduction
This is a condensed, 400-word introduction to Kafka.
Confluent Kafka is distributed storage and computing software which can provide high throughput and low latency.
Storage
Producer
Consumer
Computing
KStream is part of the Kafka Streams DSL, used to write higher-level code that performs stateful and stateless operations on topics. It usually runs as a distributed process across a cluster of application instances. Group membership is configured with the application.id setting for Kafka Streams apps.
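To make the stateless/stateful distinction concrete, here is a minimal sketch in plain Python rather than the Kafka Streams DSL itself. The (key, value) record layout and the function names stateless_filter and stateful_count are assumptions made for illustration. A real Kafka Streams app would keep the running counts in a state store.

```python
from collections import defaultdict

def stateless_filter(records, min_value):
    """Stateless: each record is judged on its own, with no memory of earlier records."""
    return [(key, value) for key, value in records if value >= min_value]

def stateful_count(records):
    """Stateful: the output for a record depends on the records seen before it."""
    counts = defaultdict(int)
    output = []
    for key, _ in records:
        counts[key] += 1
        output.append((key, counts[key]))  # emit the running count for this key
    return output

records = [("alice", 5), ("bob", 1), ("alice", 7)]
print(stateless_filter(records, 3))
print(stateful_count(records))
```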
KTable: likewise a distributed process that runs on a cluster and presents a table view of the data in a topic.
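The table view can be pictured as replaying a topic's records in order and keeping only the latest value per key. The sketch below is plain Python, not the Kafka Streams API. The function name table_view and the sample records are hypothetical, and a None value is treated as a delete, mirroring how a null value acts as a tombstone.

```python
def table_view(topic_records):
    """Fold an ordered list of (key, value) records into a latest-value-per-key table."""
    table = {}
    for key, value in topic_records:
        if value is None:
            table.pop(key, None)  # None is treated as a delete (tombstone)
        else:
            table[key] = value    # a later record overwrites the earlier value
    return table

topic = [("alice", "Paris"), ("bob", "Rome"), ("alice", "Berlin"), ("bob", None)]
print(table_view(topic))
```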
Stream table duality
State Stores:
Schema Registry
Wednesday, October 4, 2023
Kafka reliable delivery
On Producer Side:
If message A was written by the same producer before message B to the same partition, the offset of message A will be lower than the offset of message B.
Messages are considered committed once they are written to the page cache of all in-sync replicas.
Committed messages will not be lost as long as at least one in-sync replica survives.
Assuming acks=all and min.insync.replicas=2
1. Producer writes to leader
2. The leader responds only after it has written the message and all in-sync followers have fetched it from the leader. [With min.insync.replicas=2, at least 1 follower besides the leader must be in sync for the write to succeed, even though the topic replication factor is 3]
3. Consumers read only messages below the high watermark*
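The acks=all flow above can be sketched as a small decision function. This is a simulation in plain Python, not broker code. The function name leader_handle_produce and the broker ids are assumptions for illustration.

```python
def leader_handle_produce(isr, min_insync_replicas):
    """Decide how a leader answers an acks=all produce request.

    isr: set of broker ids currently in sync, leader included (hypothetical ids).
    """
    if len(isr) < min_insync_replicas:
        # Too few in-sync replicas: the write is refused rather than under-replicated.
        return "rejected: not enough in-sync replicas"
    # Otherwise the leader answers once every replica in the ISR has the message.
    return f"acked after {len(isr)} replicas have the message"

print(leader_handle_produce({1, 2, 3}, 2))  # full ISR
print(leader_handle_produce({1, 2}, 2))     # one follower has fallen out of sync
print(leader_handle_produce({1}, 2))        # only the leader is left
```

With a replication factor of 3, the topic keeps accepting writes after losing one in-sync follower, but not after losing two.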
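Assuming acks=1 and min.insync.replicas=2
1. Producer writes to leader
2. The leader responds as soon as it has written the message to its own log, without waiting for followers.
[Followers fetch the message afterward, and the high watermark is incremented only once the in-sync replicas have it. min.insync.replicas is enforced only for acks=all]
3. Consumers read only messages below the high watermark*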
On Consumer Side:
Refer: https://rongxinblog.wordpress.com/2016/07/29/kafka-high-watermark/
*The high watermark is calculated as the minimum LEO (log end offset) across all in-sync replicas (ISR) of a partition, and it grows monotonically.
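As a rough illustration of that calculation, the sketch below takes the minimum LEO over the ISR and never lets the result move backwards. It is plain Python under stated assumptions: the function name high_watermark, the broker ids, and the offsets are made up, and the previous_hw guard is just one simple way to model monotonic growth.

```python
def high_watermark(log_end_offsets, isr, previous_hw=0):
    """Minimum log end offset across the in-sync replicas, never decreasing."""
    candidate = min(log_end_offsets[broker] for broker in isr)
    return max(previous_hw, candidate)  # guard that keeps the value monotonic

# Hypothetical log end offsets per broker id for one partition.
leo = {1: 10, 2: 8, 3: 5}

hw_full_isr = high_watermark(leo, isr={1, 2, 3})
print(hw_full_isr)  # slowest in-sync replica decides

hw_after_shrink = high_watermark(leo, isr={1, 2}, previous_hw=hw_full_isr)
print(hw_after_shrink)  # broker 3 dropped out of the ISR, so the minimum rises
```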
Presto, you have a reliable system.