Wednesday, March 20, 2024

Using Aurora replicas for read-only queries

AWS Aurora in serverless mode can be configured with replicas to improve failover speed.

Unlike a traditional database replica, which exists mainly to take over when the primary fails, Aurora replicas are also available for read queries. If we connect to the reader URL, we can shift the read load to the replicas and improve overall performance.

Connecting to different URLs for reads and writes introduces some context switching in connection pools. Having two separate connection pools is the approach recommended by AWS when using @Transactional(readOnly = true). Reference: the AWS repo.

The gist of the approach is as follows:

  1. Create two datasources using the @ConfigurationProperties annotation.

  2. Use an AbstractRoutingDataSource to switch between the reader and the writer.

  3. Mark the routing datasource as primary so that Liquibase uses it.

  4. Create an annotation that can be weaved around methods and flips to the reader when enabled.

  5. Annotate your calls to use the read-only datasource instead of @Transactional(readOnly = true).
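The routing core of steps 2 and 4 can be sketched in plain Java. This is a minimal illustration, assuming Spring's AbstractRoutingDataSource delegates its lookup key to a ThreadLocal holder; DataSourceContext and the "reader"/"writer" keys are illustrative names, not the AWS sample code.

```java
import java.util.function.Supplier;

// Hypothetical ThreadLocal holder that decides which connection pool the routing
// datasource picks. AbstractRoutingDataSource#determineCurrentLookupKey() would
// delegate to currentLookupKey().
final class DataSourceContext {
    private static final ThreadLocal<Boolean> READ_ONLY = ThreadLocal.withInitial(() -> false);

    static String currentLookupKey() {
        return READ_ONLY.get() ? "reader" : "writer";
    }

    // An @Around aspect for the custom read-only annotation could wrap calls like this.
    static <T> T withReadOnly(Supplier<T> work) {
        boolean previous = READ_ONLY.get();
        READ_ONLY.set(true);
        try {
            return work.get();
        } finally {
            READ_ONLY.set(previous); // restore so subsequent calls hit the writer again
        }
    }
}
```

The aspect behind the annotation would then simply run the intercepted method inside withReadOnly(...).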




Friday, October 6, 2023

Akka And Kotlin

Kafka - Topics, Tables and Streams - a 10 minute introduction

This is a 400-word condensed introduction to Kafka.

Confluent Kafka is distributed storage and computing software that provides high throughput and low latency.

Storage

The basic unit of Kafka storage is a partitioned log called a topic, replicated across machines called brokers.

Topics are divided into partitions, and events/messages with the same key end up in the same partition, in order. For example, events from a single stock will all land on the same partition.

Partitions are replicated for durability. Messages can flow directly from the producer to the broker's page cache to the consumer, even before they are written to disk, hence the low latency.

Producer

The producer fetches metadata from the cluster. The metadata identifies the partition leader, and the producer sends messages directly to the leader node. It also supports batching via the linger.ms and batch.size properties, trading a little latency for fewer, larger I/O operations.
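A sketch of those batching properties (the broker address and values are assumptions to tune per workload):

```java
import java.util.Properties;

// Illustrative producer batching configuration.
class ProducerBatchingConfig {
    static Properties props() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("linger.ms", "5");       // wait up to 5 ms for more records before sending a batch
        p.put("batch.size", "65536");  // collect up to 64 KiB per partition batch
        // new KafkaProducer<>(p) would then issue fewer, larger requests to the partition leader.
        return p;
    }
}
```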

Consumer

The consumer reads messages from topics, and it also connects to the partition leader. It sends an offset and a fetch size, and can choose to commit offsets either before or after processing the messages. Consumers can also work as a distributed group in which each partition is owned by a single machine.
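The commit-before vs commit-after choice can be illustrated with a plain-Java simulation (not the Kafka API). Committing only after processing means a crash between processing and commit causes reprocessing, i.e. at-least-once delivery:

```java
import java.util.List;

// Simulation: process two messages, commit only the first offset, "crash", then restart
// from the last committed offset. The uncommitted message is processed twice.
class OffsetCommitDemo {
    static String runWithCrash(List<String> partition) {
        StringBuilder processed = new StringBuilder();
        long committed = 0;
        processed.append(partition.get(0));   // process m0
        committed = 1;                        // commit offset 1 after processing m0
        processed.append(partition.get(1));   // process m1, then "crash" before committing
        // restart: resume from the last committed offset, so m1 is seen again
        for (long off = committed; off < partition.size(); off++) {
            processed.append(partition.get((int) off));
        }
        return processed.toString();
    }
}
```

Committing before processing gives the opposite trade: a crash after commit but before processing loses the message (at-most-once).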

Computing

KStream is a DSL for writing higher-level code that performs stateful and stateless operations on topics. This usually runs as a distributed process across a cluster. Group membership is configured with the application.id setting for Kafka Streams apps, the ksql.service.id setting for ksqlDB servers in a ksqlDB cluster, and the group.id setting for applications that use one of the lower-level Kafka consumer clients.

KTable is again a distributed process that runs on a cluster and presents a table view of the data in a topic.

Stream-table duality
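Stream-table duality can be illustrated in plain Java (not the Kafka API): replaying a changelog stream of key/value events reproduces the table's latest state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A table is the fold of its changelog stream: each event upserts one row.
class StreamTableDuality {
    static Map<String, Integer> replay(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : changelog) {
            table.put(e.getKey(), e.getValue()); // a later event for a key overwrites its row
        }
        return table;
    }
}
```

Going the other way, emitting each row change as an event turns the table back into a stream.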


State Stores:

State stores can be in memory or in an embedded database called RocksDB. There is no concept of expiration for data in state stores; we need to use tombstone messages to clean the data up.

Kafka Streams saves state locally, and that state can be rebuilt on or moved between nodes; these stores can need a lot of memory. A store can be in-memory or in RocksDB (a persistent store).
Stream applications are split into tasks, which are balanced across instances like a consumer group keyed by the application id.


Schema Registry





Event example

Events have a key, a value, and a timestamp.

Example: Alice, came to UK, 2023-10-04

                solarpanel, 14kw, 10:10 (the solar panel generated 14kw at 10:10)




Wednesday, October 4, 2023

Kafka reliable delivery

Reliable delivery means that once a message is sent to Kafka and acknowledged, it will survive broker failures.

On Producer Side:

    We need to set acks=all and have at least 3 in-sync replicas.

If message A was written by the same producer before message B to the same partition, the offset of message A will be lower than that of message B.

Messages are considered committed once written to the page cache of all in-sync replicas.

Committed messages will not be lost as long as at least one in-sync replica survives.
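A sketch of the producer-side settings described above (values are assumptions; note that min.insync.replicas is configured on the topic or broker, not on the producer):

```java
import java.util.Properties;

// Illustrative producer reliability configuration.
class ReliableProducerConfig {
    static Properties props() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("acks", "all");                 // wait for every in-sync replica to acknowledge
        p.put("enable.idempotence", "true");  // dedupe retries so resends don't duplicate
        p.put("retries", Integer.toString(Integer.MAX_VALUE));
        return p;
    }
}
```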




Assuming acks=all and min.insync.replicas=2:

1. The producer writes to the leader.

2. The leader responds only when it has persisted the message and the required in-sync followers have fetched it. [With min.insync.replicas=2, only 1 more follower must confirm, even though the topic replication factor is 3.]

3. Consumers read only up to the high watermark*.

Assuming acks=1 and min.insync.replicas=2:

1. The producer writes to the leader.

2. The leader responds immediately.

[With min.insync.replicas=2, the message is still replicated to 1 more follower, and only then is the high watermark advanced.]

3. Consumers read only up to the high watermark*.

On Consumer Side:

      Kafka uses the term high watermark for the offset up to which messages have been fully replicated.


Refer: https://rongxinblog.wordpress.com/2016/07/29/kafka-high-watermark/

*High watermark is calculated as the minimum LEO across all the ISR of this partition, and it grows monotonically.
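That calculation can be written out directly (the offsets below are made up for illustration):

```java
import java.util.Arrays;

// High watermark = minimum log-end-offset (LEO) across the partition's in-sync replicas.
// Consumers may read only offsets below it.
class HighWatermark {
    static long of(long... leoOfIsr) {
        return Arrays.stream(leoOfIsr).min().orElse(0);
    }
}
```

With a leader at LEO 12 and followers at 10 and 11, the high watermark is 10, so offsets 0..9 are readable.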

Presto, you have a reliable system.

Tuesday, October 20, 2015

Java 8 Lambdas, Streams, Options vs @NotNull type annotation

I have been using Java 8 for 6 months now and I am not impressed.

The JVM is not any faster than its predecessor, Java 7, yet we still went ahead and upgraded.

Lambdas and Streams
Lambdas and method references let you pass around succinct code, but you lose the benefit of a readable stack trace.
I think I would stick with the strategy pattern and use lambdas only in streams.
Don't use lambdas if the code is complex: it hampers testing, and you also lose IDE support.

Optional.
Since I started relying on tests, I have not had many null pointer exceptions in production in the last 5 years.
When you also follow good practices, like using empty collections instead of null and putting the constant first in equality checks, i.e. constant.equals(variable) and not the other way around, the incidence of null pointers is greatly diminished.
Java 8 also brings type annotations such as @NotNull via JSR-308.
Using these techniques, I am confident that I will not be wrapping my logic in Optional to defend against null pointers.
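The constant-first equality idiom in a couple of lines (the class name is illustrative):

```java
// Putting the constant first means equals() is never invoked on a null reference.
class ConstantFirstEquals {
    static boolean matches(String variable) {
        return "EXPECTED".equals(variable); // variable.equals("EXPECTED") would NPE on null
    }
}
```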

Update 05/2023: I no longer hold the above views.

Saturday, October 17, 2015

A final word about final, finally, finalize

Class final variables have to be set when they are declared or in the constructor.
Local final variables have to be set at declaration or later, exactly once.

finalize() is called during garbage collection.


finally is part of try.
try-with-resources eliminates the need for a finally block just to close resources. If closing a resource throws, the exception is not silently lost: it is attached to the primary exception as a suppressed exception.
try-with-resources can still have a finally block.
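A small sketch of that suppression behaviour (FailingResource is a made-up example class):

```java
// A resource whose close() always fails, to show where that failure ends up.
class FailingResource implements AutoCloseable {
    @Override
    public void close() {
        throw new IllegalStateException("close failed");
    }
}

class SuppressedDemo {
    static String suppressedMessage() {
        try (FailingResource r = new FailingResource()) {
            throw new RuntimeException("primary");
        } catch (RuntimeException e) {
            // e is the "primary" exception; the close() failure rides along as suppressed
            return e.getSuppressed()[0].getMessage();
        }
    }
}
```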

Bad practice is to throw new exceptions from finally [we lose the context of the original exception].
try and finally are also used with locks.
Exceptions in finally are nasty:
lock.lock();
try{
  //do critical section code, which may throw exception
} finally {
  lock.unlock();
}


Final classes cannot be subclassed; since Java allows subclassing (with covariant return types), a class must be final to guarantee immutability.

Final methods cannot be overridden (a way of saying: don't override this).
Static methods are not called polymorphically.

Final class variables need to be set at declaration or in a constructor.
Final local variables can be set only once.
Final method arguments ensure that object references cannot be reassigned unintentionally.
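That last point in a couple of lines (the class name is illustrative):

```java
// A final parameter prevents reassigning the reference; the referenced object stays mutable.
class FinalParamDemo {
    static String process(final StringBuilder sb) {
        sb.append("x");               // allowed: mutates the object sb points to
        // sb = new StringBuilder();  // would not compile: sb cannot be reassigned
        return sb.toString();
    }
}
```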