CAP

Basic Concept

  • The CAP theorem states that, in a distributed system, it is impossible to build an implementation of read-write storage in an asynchronous network that satisfies all three of consistency, availability and partition tolerance at the same point in time

    • Consistency - every read receives the most recent write or an error… the question we ask here is: will all executions of reads and writes seen by all nodes be atomic or linearizably consistent?

      • Atomic or linearizable consistency

        • B:set(5), A:set(10), A:get() = 10, B:get() = 10
        • Atomic consistency ensures that external communication about the value of a register is respected. That is, if I read X and tell you about it, you can go to the register and read X for yourself. The following history therefore depicts a case where the serial history makes it look as if atomic consistency is assured, while in reality it is not
          • What it is not
            • B:set(5), B:get() = 5, A:set(10), A:get() = 10
    • Availability - every request receives a response, without a guarantee that it reflects the latest write… the question we ask here is: will a request made to the data store always eventually complete?

    • Partition Tolerance - the system continues to operate despite network partitions (caused by network failures)

    • Note

      • A read-write storage is a register which supports the following 2 operations
        • set(X) sets the value of the register to X
        • get() returns the last value set in the register
      • An asynchronous network is one in which there is no bound on how long a message may take to reach its destination
      • Availability - a data store is called available if ALL requests return a response and not an error message
      • Partition - a partition occurs when a network fails to deliver messages to one or more nodes by losing them completely
  • In real-world scenarios, because networks fail all the time, it is impossible to build a distributed system which guarantees both C & A

  • CP - in a consistent and partition tolerant system, requests may get responses or an error (because of the partition effect)

  • AP - in an available and partition tolerant system, requests may get stale data and writes may require some time to get across all nodes (and the nodes are said to become eventually consistent)

  • History

    • Dr. Eric Brewer gave a keynote speech at the Principles of Distributed Computing conference in 2000 called ‘Towards Robust Distributed Systems’. In it he posed his ‘CAP Theorem’ - at the time unproven - which illustrated the tensions between being correct and being always available in distributed systems.
    • Two years later, Seth Gilbert and Professor Nancy Lynch - researchers in distributed systems at MIT - formalised and proved the conjecture in their paper “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services”
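The register model from the note above can be sketched minimally in Java. This is a hypothetical single-node illustration of the set/get contract only; a real distributed register would be replicated across nodes, which is where the CAP tensions arise.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A minimal single-node read-write register illustrating the set/get
// contract from the note. It says nothing about replication; in the CAP
// setting this register would live on several nodes at once.
public class Register {
    private final AtomicInteger value = new AtomicInteger();

    public void set(int x) { value.set(x); } // set(X): store X
    public int get() { return value.get(); } // get(): return last value set

    public static void main(String[] args) {
        Register r = new Register();
        // The atomic history from the example: B:set(5), A:set(10), then both read
        r.set(5);  // B
        r.set(10); // A
        System.out.println(r.get()); // A reads 10
        System.out.println(r.get()); // B also reads 10: a linearizable history
    }
}
```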
Latency & Throughput

Basic Concept

  • Latency is the time taken to accomplish a task; it is measured in units of time.
  • Throughput is the number of tasks accomplished within a period of time; it is measured in tasks completed per unit of time.

Design example

  • Let’s assume the clock frequency is 100 MHz (which means 100,000,000 oscillations/pulses, i.e. clock periods, per second)
  • Let’s assume the time taken for a computation task is 1000 ns, i.e. 1.0E-6 seconds; note this is the latency of the task
  • Let’s assume the throughput is 640 Mbits/second (which means 6.4E+8 bits/second)
  • Let’s assume the word width of each output is 64 bits

Considering the above facts, the calculations below show that 1 operation has a latency of 100 clock periods, and that the system can produce at most 0.1 words per clock period; therefore 10 words can be produced within the latency period

Latency: 1000 ns = 1000 ns * (1 s / 10^9 ns ) * ( 100 * 10^6 clock periods/ 1s) = 10^11/10^9 = 100 clock periods.

Throughput = 640 Mbits / s = (640 * 10^6 bits/s) * (1 word / 64 bits) * ( 1 s / (100 * 10^6 clock periods)) = 640 * 10^6 / (64 * 100 * 10^6) = 10 / 100 = 0.1 words / clock period.
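The same arithmetic can be checked with a short sketch; all figures are the assumptions listed above.

```java
// Verifies the latency/throughput arithmetic from the design example.
public class LatencyThroughput {
    public static void main(String[] args) {
        double clockHz = 100e6;              // 100 MHz clock
        double latencySeconds = 1000e-9;     // 1000 ns task latency
        double throughputBitsPerSec = 640e6; // 640 Mbit/s
        double wordBits = 64;                // 64-bit output words

        // Latency expressed in clock periods: ~100
        double latencyPeriods = latencySeconds * clockHz;
        // Throughput expressed in words per clock period: ~0.1
        double wordsPerPeriod = (throughputBitsPerSec / wordBits) / clockHz;
        // Words produced within one latency window: ~10
        double wordsPerLatency = wordsPerPeriod * latencyPeriods;

        System.out.printf("%.0f clock periods%n", latencyPeriods);      // 100 clock periods
        System.out.printf("%.1f words/clock period%n", wordsPerPeriod); // 0.1 words/clock period
        System.out.printf("%.0f words per latency window%n", wordsPerLatency); // 10 words per latency window
    }
}
```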
References
  1. https://community.cadence.com/cadence_blogs_8/b/sd/posts/understanding-latency-vs-throughput

Consistent Hashing

Basic Concept

  • consistent hashing is a way in which keys and nodes are mapped to the same id space, through the use of the same hash function
  • the algorithm ensures that, moving clockwise, a key is mapped to the nearest node
  • the key benefit of the mechanism is that, if the hash function produces truly random output, keys are uniformly distributed across the nodes
  • another key benefit of consistent hashing over a hash table is that the impact of adding or removing a node is quite small. once a node is added, the algorithm routes to it the keys for which it is now the nearest node; once a node is removed, its keys are directed to the next reachable node
  • this specific behavior has the potential to create a cluster of keys on a specific node
  • the solution to this problem is creating virtual ids for a node, so that the node is found at multiple positions on the ring and keys reach it from several arcs
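The ring described above can be sketched with a sorted map. This is a minimal illustration assuming String.hashCode as a stand-in hash; real systems use stronger functions such as MD5 or MurmurHash.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring: keys and nodes share one id space (int
// hashes ordered in a TreeMap). Each physical node is placed at several
// virtual positions to spread keys more evenly.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Stand-in hash; production systems use stronger functions (e.g. MurmurHash).
    private int hash(String s) { return s.hashCode() & 0x7fffffff; }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Moving "clockwise": a key maps to the nearest node at or after its hash.
    public String nodeFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(3);
        r.addNode("nodeA"); r.addNode("nodeB"); r.addNode("nodeC");
        String before = r.nodeFor("user-42");
        r.removeNode("nodeB"); // only keys that lived on nodeB move
        String after = r.nodeFor("user-42");
        System.out.println(before.equals("nodeB") || before.equals(after)); // true
    }
}
```

Removing a node deletes only its virtual positions, so the ceiling entry (and therefore the mapping) of every other key is unchanged; this is the "small impact" property from the notes.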
References
  1. https://www.youtube.com/watch?v=zaRkONvyGr8

Load Balancer Basic

  • Basic Concept

    • Load balancing is all about aggregating multiple components in order to achieve a total processing capacity above each component’s individual capacity, without any intervention from the end user and in a scalable way. Load balancing improves application responsiveness and availability.
    • The mechanism or the component which performs load balancing operation is called a load balancer.
    • Load balancing allows more operations to be performed in the time it takes a component to perform only one. Metaphorically, this is the equivalent of adding lanes to a highway: more cars can pass during the same time frame without increasing their individual speed.
  • Load Balancing @

    • link level : balances load while interfacing between a WAN and one or more LANs, choosing which network link to send a packet over;
    • network level : balances load while routing traffic across multiple network routes, choosing which route a series of packets will follow; this sort of load balancing happens at Layer 4 of the OSI model
    • server level : balances load while routing traffic across multiple servers, deciding which server will process a connection or request.
  • Categories

    • Packet level load balancing processes packets more or less individually. There is a 1-to-1 relation between input and output packets, so it is possible to follow the traffic on both sides of the load balancer using a regular network sniffer. This technology can be very cheap and extremely fast. Usually stateless, it can also be stateful, may support DSR (direct server return, without passing through the LB again) if the packets were not modified, but provides almost no content awareness. This technology is very well suited to network-level load balancing, though it is sometimes used for very basic server load balancing at high speed.
      • Packet-based load balancers are generally deployed in cut-through mode, so they are installed on the normal path of the traffic and divert it according to the configuration. The return traffic doesn’t necessarily pass through the load balancer.
    • Session content based load balancing processes session contents. It requires that the input streams be reassembled and processed as a whole. The contents may be modified, and the output stream is segmented into new packets. For this reason it is generally performed by proxies, and they’re often called layer 7 load balancers or L7. This implies that there are two distinct connections on each side, and that there is no relation between input and output packet sizes or counts. Clients and servers are not required to use the same protocol (for example IPv4 vs IPv6, clear vs SSL). The operations are always stateful, and the return traffic must pass through the load balancer. The scheme offers wide possibilities and is generally achieved by pure software. This technology is very well suited for server load balancing.
      • Proxy-based load balancers are deployed as a server with their own IP addresses and ports, without architecture changes.
  • Traffic Routing Algorithm

    • A load balancer uses one of the following algorithms to route traffic to one of the target servers
      • Least Connection Method — directs traffic to the server with the fewest active connections. Most useful when there are a large number of persistent connections in the traffic unevenly distributed between the servers.
      • Least Response Time Method — directs traffic to the server with the fewest active connections and the lowest average response time.
      • Round Robin Method — rotates servers by directing traffic to the first available server and then moves that server to the bottom of the queue. Most useful when servers are of equal specification and there are not many persistent connections.
      • IP Hash — the IP address of the client determines which server receives the request.
  • Health Check

    • The presence of a number of servers that can serve a given request equally well increases the number of possible paths for the traffic. This in turn increases the risk of failure in very large environments. For this reason, a load balancer verifies that the components it intends to deliver traffic to are still alive and reachable, and it will stop delivering traffic to faulty ones. This is usually achieved using one of the following methods
      • In the first method, a probe is sent periodically to ensure the component is still operational. These probes are called “health checks”. They must be representative of the type of failure to address. For example, a ping-based check will not detect that a web server has crashed and no longer listens on a port, while a connection to the port will verify this. The period between checks must be small enough to ensure the faulty component is not used for too long after an error occurs.
      • In the second method, production traffic is sampled and sent to a destination to observe whether it is processed correctly, and to evict the components which return inappropriate responses. However, this requires sacrificing a part of the production traffic, which is not always acceptable.
    • A central monitoring agent shows the outcome of the health checks
  • Session Stickiness

    • Load balancers must often solve an important problem through a mechanism known as session stickiness. Session stickiness helps a load balancer direct multiple subsequent requests or connections from the same origin (such as an end user) to the same target. The best-known example is the shopping cart on an online store: if each click leads to a new connection, the user must always be sent to the server which holds his shopping cart. The solution consists in memorizing the chosen target, so that each time the same visitor is seen, he is directed to the same server regardless of the number of available servers. The information may be stored in the load balancer’s memory, in which case it may have to be replicated to other load balancers if it’s not alone, or it may be stored in the client’s memory.
  • Encryption & Decryption

    • Secure Sockets Layer (SSL) is the standard security technology for establishing an encrypted link between a web server and a browser. SSL traffic is often decrypted at the load balancer. When a load balancer decrypts traffic before passing the request on, it is called SSL termination. The load balancer saves the web servers from having to expend the extra CPU cycles required for decryption. This improves application performance.

    • However, SSL termination comes with a security concern. The traffic between the load balancers and the web servers is no longer encrypted. This can expose the application to possible attack. However, the risk is lessened when the load balancer is within the same data center as the web servers.

    • Another solution is the SSL pass-through. The load balancer merely passes an encrypted request to the web server. Then the web server does the decryption. This uses more CPU power on the web server. But organizations that require extra security may find the extra overhead worthwhile.
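The routing strategies described earlier (round robin and least connection) can be sketched as follows; the server names and the in-memory connection counters are illustrative only, not a production design.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy server-level load balancer illustrating two routing strategies
// from the notes above: round robin and least connection.
public class ToyLoadBalancer {
    private final List<String> servers;
    private final Map<String, Integer> activeConnections = new HashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    public ToyLoadBalancer(List<String> servers) {
        this.servers = servers;
        for (String s : servers) activeConnections.put(s, 0);
    }

    // Round Robin: rotate through the server list in order.
    public String roundRobin() {
        return servers.get(counter.getAndIncrement() % servers.size());
    }

    // Least Connection: pick the server with the fewest active connections.
    public String leastConnections() {
        return Collections.min(activeConnections.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public void open(String server)  { activeConnections.merge(server, 1, Integer::sum); }
    public void close(String server) { activeConnections.merge(server, -1, Integer::sum); }

    public static void main(String[] args) {
        ToyLoadBalancer lb = new ToyLoadBalancer(Arrays.asList("web1", "web2", "web3"));
        System.out.println(lb.roundRobin()); // web1
        System.out.println(lb.roundRobin()); // web2
        lb.open("web1"); lb.open("web2");
        System.out.println(lb.leastConnections()); // web3
    }
}
```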

Java 8 Functional Interfaces

Java 8 introduced the concept of the Functional Interface, and a set of well-defined functional interfaces is available out of the box. The agenda for this blog is to understand how we can design better Java code using these out-of-the-box functional interfaces.

To start, let’s review the definition of a functional interface

  • A functional interface is one that has exactly one abstract method

Further documentation about the functional interfaces is available here https://docs.oracle.com/javase/8/docs/api/java/util/function/package-summary.html.

The following table summarizes significant functional interfaces from this package.

Interface Name    | Argument | Return  | Use Case
Predicate<T>      | T        | boolean | Was the album published in 1978?
Consumer<T>       | T        | void    | Print the names of the albums
Function<T,R>     | T        | R       | Get the name of the first album by the band The Rolling Stones
Supplier<T>       | None     | T       | Generate a random number
UnaryOperator<T>  | T        | T       | Convert the name of an album to upper case
BinaryOperator<T> | (T, T)   | T       | Multiply two numbers

Data Set

To facilitate the discussion, we assume the following data set. It is a collection of musical albums compiled by Rolling Stone magazine. Each entry has an associated year of publication, name, artist, genre and subgenre.

Year | Album                | Artist         | Genre | Subgenre
1966 | Pet Sounds           | The Beach Boys | Rock  | Pop Rock, Psychedelic Rock
1966 | Revolver             | The Beatles    | Rock  | Psychedelic Rock, Pop Rock
1965 | Highway 61 Revisited | Bob Dylan      | Rock  | Folk Rock, Blues Rock

The code for the corresponding domain model is the following

public class RollingStoneAlbum {

    private String year;
    private String album;
    private String artist;
    private String genre;
    private String subgenre;

    // with getters and setters

}

With the groundwork done, let’s dive deep.

What is a Predicate interface and how can I use it?

The Predicate interface represents a condition that evaluates to either true or false. The generic type T represents the input type to the predicate. Depending on the type of input, you may choose a LongPredicate, DoublePredicate or IntPredicate, where the first part of the name indicates the input type. There is an interesting variation of Predicate known as BiPredicate<T,U>. A BiPredicate represents what a Predicate represents, but takes two parameters instead of one.

Let’s consider the use case where we want to print the name of an album only if it was published before the year 1970. A predicate expression for this is

Predicate<RollingStoneAlbum> isBefore1970 = x -> Year.of(Integer.parseInt(x.getYear())).isBefore(Year.of(1970));

Explanation

The RollingStoneAlbum domain class is the input type for the predicate. Within the lambda expression, we verify whether the publication year of a given RollingStoneAlbum is before the year 1970.

Now, if we want to create a print method that prints an entry if the predicate returns true, it might look like the following

public static void print(List<RollingStoneAlbum> list, Predicate<RollingStoneAlbum> predicate) {
    for (RollingStoneAlbum album : list) {
        if (predicate.test(album)) {
            System.out.println(album);
        }
    }
}

As you can see, the if block invokes the test method of the given predicate to decide whether the function should print the album.

In fact, predicates can be combined using the and() and or() methods.

For example, if we decide at a later point in time to print ONLY those albums whose publication year is after 1965 and before 1970, we can write a second predicate

Predicate<RollingStoneAlbum> isAfter1965 = x -> Year.of(Integer.parseInt(x.getYear())).isAfter(Year.of(1965));

And combine them as

isAfter1965.and(isBefore1970)

In summary, predicates are used to design lambda expressions that return a boolean as the outcome.
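The and() combination can be shown end to end with a self-contained sketch. Plain publication years stand in for the RollingStoneAlbum class here, but and(), or() and negate() behave the same on any Predicate<T>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Self-contained sketch of predicate combination on plain years.
public class PredicateCombination {
    public static void main(String[] args) {
        Predicate<Integer> isBefore1970 = year -> year < 1970;
        Predicate<Integer> isAfter1965 = year -> year > 1965;
        // and() yields a new predicate that is true only when both are true.
        Predicate<Integer> inLateSixties = isAfter1965.and(isBefore1970);

        List<Integer> years = Arrays.asList(1965, 1966, 1966, 1978);
        for (Integer y : years) {
            if (inLateSixties.test(y)) {
                System.out.println(y); // prints 1966 twice
            }
        }
    }
}
```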

What is a Consumer interface and how can I use it?

The Consumer interface represents operations which *accept* a single input argument and return *no result*. Consumer operations work via *side effects*. A side effect, in simple terms, is the change that occurs to the state of the program once the operation completes.

For our case, let’s design a consumer which will change the name of the artist of a given album into upper case.

Consumer<RollingStoneAlbum> consumer = x -> x.setArtist(x.getArtist().toUpperCase());

Observe that the expression sets, or mutates, an entry’s artist name into upper case (the side effect).

A print function for this might look like the following

public static void print(List<RollingStoneAlbum> list, Consumer<RollingStoneAlbum> consumer) {
    for (RollingStoneAlbum album : list) {
        consumer.accept(album);
        System.out.println(album.getArtist());
    }
}

As you can see, the print function accepts a Consumer, and once it applies the Consumer function, the name of the artist is updated into upper case.

Like Predicate, there are typed consumers for int, double and long input types (IntConsumer, DoubleConsumer, LongConsumer). Consumers can be chained together to form a composite consumer. There is another variety of Consumer that takes two arguments instead of one, called BiConsumer.

In summary, consumers are used to design lambda expressions that take an argument, return nothing, and create impact via side effects.
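Chaining via andThen, and the two-argument BiConsumer, can be illustrated with a self-contained sketch; the strings used are arbitrary examples.

```java
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Sketch of consumer chaining: andThen() runs the second consumer after
// the first, on the same argument; both act via side effects.
public class ConsumerChaining {
    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();

        Consumer<String> upper = s -> log.append(s.toUpperCase());
        Consumer<String> length = s -> log.append(":").append(s.length());
        Consumer<String> both = upper.andThen(length); // composite consumer

        both.accept("Bob Dylan");
        System.out.println(log); // BOB DYLAN:9

        // BiConsumer takes two arguments instead of one.
        BiConsumer<String, Integer> print = (name, year) ->
                System.out.println(name + " (" + year + ")");
        print.accept("Pet Sounds", 1966); // Pet Sounds (1966)
    }
}
```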

What is a Function interface and how can I use it?

Function<T,R> represents a lambda expression that accepts one argument of type T and produces a result of type R. For example, if we want to design a function that returns a Map<String,Integer> (where the key, a String, is the name of the album, and the corresponding value, an Integer, is the number of letters in the name) by working through the list of RollingStoneAlbums, it would look like the following

public static void functionExample(List<RollingStoneAlbum> list) {
    Function<List<RollingStoneAlbum>, Map<String, Integer>> function = x -> {
        Map<String, Integer> names = new HashMap<>();
        for (RollingStoneAlbum entry : x) {
            names.put(entry.getAlbum(), entry.getAlbum().length());
        }
        return names;
    };
    Map<String, Integer> names = function.apply(list);
}

I understand it will take a bit of time to appreciate the power of the Function interface. Let’s walk through the code to understand it thoroughly

  1. A typical function interface has the signature Function<T,R>, where T is the input and R is the output. In our case, Function<List<RollingStoneAlbum>, Map<String,Integer>> indicates that we are designing a function which takes a List as input and returns a Map<String,Integer>.
  2. Inside the lambda expression, we have initialized the Map that we have to return.
  3. Next, we iterated over the input (the list of RollingStoneAlbums) and populated the Map with the required data
  4. Finally we returned the map

Function has a few variations. It is worth taking a deeper dive to understand the variations of this interface.

The Function interface has a default compose method that allows us to combine several functions into one and execute them sequentially. The interface also has an andThen method, which serves the opposite purpose of compose. These two utility methods are the secret sauce behind higher-order functions.

What is a higher order function? How does it benefit me? How can I create one?

In simple terms, a higher-order function is one that takes one or more functions as arguments or returns a function; here, we build one by chaining smaller individual functions together. Let’s assume we have functions a and b.

  • a.andThen(b) will create a higher order function, whereby a will execute first and then b will execute
  • a.compose(b) will create a higher order function, whereby b will execute first and then a will execute

This piping of functions demands that the output of one function is consumable by the next function.

  • For the higher-order function a.andThen(b), the output of a must be consumable by b

Let us assume that we want to create two functions. The first will extract the name of the artist of a given album, find the number of letters in the name, and store the outcome in a map of artist vs. number of letters in the artist’s name. The second function will consume this map and find the average number of letters in an artist’s name.

First function

Function<List<RollingStoneAlbum>, Map<String, Integer>> buildArtistsVsNameCollection = x -> {
    Map<String, Integer> names = new HashMap<>();
    for (RollingStoneAlbum entry : x) {
        if (!names.containsKey(entry.getArtist())) {
            names.put(entry.getArtist(), entry.getArtist().length());
        }
    }
    return names;
};

Second function

Function<Map<String, Integer>, Double> findAverageLength = x -> {
    double numArtists = x.keySet().size();
    double totalLength = 0;
    for (Entry<String, Integer> entry : x.entrySet()) {
        totalLength = totalLength + entry.getValue();
    }
    return totalLength / numArtists;
};

Combining these together using compose()

Double averageLength = findAverageLength.compose(buildArtistsVsNameCollection).apply(list);

As you see, by joining the two functions we achieve a goal which is bigger than the individual capability of either function.

What is the Supplier interface? What is its purpose? How can I use it?

The Supplier interface helps us write code that executes lazily. Its get() method returns a value of type T. Usually a Supplier wraps a lambda expression and executes it only when needed.
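A minimal sketch of this laziness follows; the expensive() helper and the firstNonNull() method are hypothetical names for illustration, not part of the JDK.

```java
import java.util.function.Supplier;

// Sketch of lazy evaluation with Supplier: the wrapped expression does
// not run until get() is called, so expensive work can be deferred.
public class SupplierLaziness {
    static String expensive() {
        System.out.println("computing...");
        return "result";
    }

    static String firstNonNull(String value, Supplier<String> fallback) {
        return value != null ? value : fallback.get(); // fallback runs only if needed
    }

    public static void main(String[] args) {
        // expensive() is NOT called here: only a reference is stored.
        Supplier<String> lazy = SupplierLaziness::expensive;

        System.out.println(firstNonNull("cached", lazy)); // cached (no "computing...")
        System.out.println(firstNonNull(null, lazy));     // computing... then result
    }
}
```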

Apart from the functional interfaces above, there are unary and binary operators which serve the purpose of designing computations involving one and two inputs respectively. In the next part of this blog series, we will explore the famous Stream abstraction and understand it in depth.
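Before moving on, here is a quick self-contained sketch of the two operator interfaces, matching the use cases in the summary table.

```java
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// UnaryOperator<T> maps a T to a T; BinaryOperator<T> combines two Ts.
public class OperatorExamples {
    public static void main(String[] args) {
        // Convert the name of an album to upper case.
        UnaryOperator<String> toUpper = String::toUpperCase;
        System.out.println(toUpper.apply("Revolver")); // REVOLVER

        // Multiply two numbers.
        BinaryOperator<Integer> multiply = (a, b) -> a * b;
        System.out.println(multiply.apply(6, 7)); // 42
    }
}
```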

Bye for now.

Code: https://github.com/soumyakbhattacharyya/java8 feel free to check it out