A performance test of a message ordering algorithm for distributed systems

Introduction

In a distributed computing system, multiple processes run on nodes that share neither memory nor a common clock. When these processes communicate with each other by asynchronous message passing, the ordering of events becomes a critical issue. The problem of ordering events is presented in Lamport's paper [Lamport 78].

Singhal-Kshemkalyani (SK) algorithm

The details of this optimization algorithm are presented in [Chandra 04]. In brief, instead of piggybacking the entire vector clock on every message, a process sends only the entries that have changed since its last message to the same destination.
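The following is a minimal sketch of the differential technique, assuming the LS/LU bookkeeping described in [Chandra 04]; the class and method names are my own.

    class SKProcess:
        def __init__(self, pid, n):
            self.pid = pid
            self.vc = [0] * n    # full vector clock
            self.lu = [0] * n    # LU[x]: local clock value when vc[x] last changed
            self.ls = [0] * n    # LS[j]: local clock value at the last send to j

        def send(self, dest):
            # Tick the local clock, then piggyback only the entries x
            # that changed since the last send to dest: LS[dest] < LU[x].
            self.vc[self.pid] += 1
            self.lu[self.pid] = self.vc[self.pid]
            diff = {x: self.vc[x] for x in range(len(self.vc))
                    if self.ls[dest] < self.lu[x]}
            self.ls[dest] = self.vc[self.pid]
            return diff

        def receive(self, diff):
            # Tick the local clock and merge the piggybacked entries.
            self.vc[self.pid] += 1
            self.lu[self.pid] = self.vc[self.pid]
            for x, ts in diff.items():
                if ts > self.vc[x]:
                    self.vc[x] = ts
                    self.lu[x] = self.vc[self.pid]

The percentage overhead reported below can then be measured by comparing the total number of entries in the piggybacked diffs against the n entries per message that plain vector clocks would carry.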

Project Description

In this project, I test the performance of the Singhal-Kshemkalyani (SK) optimization algorithm.

  • The star topology overlay captures client-server interaction, where the central node in the star is the server. Clients communicate only with the server. For the SK optimization, plot the percentage overhead of vector clocks as a function of n, where n − 1 is the number of clients in the star configuration.
  • Examine the savings of the SK optimization under statistical behavior. Simulate the message passing among multiple sites, along the lines of the study in [Chandra 04]. I perform the following experiments:
    1. Vary n, keeping MIMT, MTT and M/T fixed
    2. Vary MIMT, keeping n, MTT and M/T fixed
    3. Vary M/T, keeping n, MIMT and MTT fixed
    For each experiment, plot the percentage overhead of vector clocks under the SK optimization and analyze the results.

Approach

Unlike the round-robin method used in [Chandra 04], I used pre-linearization to simulate concurrency. Pre-linearization simulates full concurrency by generating all events randomly before executing them. For each process, the program generates all multicast events, and for each multicast event it randomly chooses the MIMT, the receivers, and the corresponding MTTs. These pre-generated events are stored in a list. An important step after event generation is sorting: the events in the list must be sorted by their global timestamps. A sketch of this step is shown below.
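The parameter names and the uniform random draws in this sketch are assumptions for illustration; the report does not specify the distributions.

    import random

    def generate_events(n, multicasts_per_process, mimt_max, mtt_max, mt_ratio):
        # M/T fixes the multicast group size relative to n.
        group_size = min(n - 1, max(1, round(n * mt_ratio)))
        events = []
        for pid in range(n):
            t = 0.0
            for _ in range(multicasts_per_process):
                t += random.uniform(0, mimt_max)   # random inter-message time
                receivers = random.sample(
                    [p for p in range(n) if p != pid], group_size)
                # One random transfer time (MTT) per receiver.
                transfer = {r: random.uniform(0, mtt_max) for r in receivers}
                events.append((t, pid, transfer))
        events.sort(key=lambda e: e[0])            # sort by global timestamp
        return events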

Each process has an incoming queue in which received messages are stored. To ensure global ordering and correct vector clock values, a process must handle every received message whose global timestamp is less than that of the multicast event it is about to execute. The logical simulation steps are sketched below.
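A sketch of that delivery loop, reusing the SKProcess sketch above, could look as follows; the heap-based queue is an implementation choice, not something the report prescribes.

    import heapq
    import itertools

    def simulate(events, procs):
        queues = [[] for _ in procs]     # one incoming min-heap per process
        seq = itertools.count()          # tie-breaker for equal timestamps
        for send_time, pid, transfer in events:
            # Deliver every queued message that arrived before this
            # multicast executes, to keep vector clock values correct.
            q = queues[pid]
            while q and q[0][0] <= send_time:
                _, _, diff = heapq.heappop(q)
                procs[pid].receive(diff)
            # Execute the multicast: one differential send per receiver.
            for dest, mtt in transfer.items():
                diff = procs[pid].send(dest)
                heapq.heappush(queues[dest], (send_time + mtt, next(seq), diff))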

Results

The simulation was conducted with various parameters, which are explained in detail in [Chandra 04]. All simulations were done with the number of system-wide messages set to 300,000; the number of messages each process sends therefore differs with N, by a simple calculation (number of messages at each process = total number of messages / N). This parameter also determines the number of multicast events: the number of messages at each process divided by (N * M/T). A small worked example follows.
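For instance, plugging assumed example values into the calculation above:

    total_messages = 300_000
    n, mt_ratio = 100, 0.5                             # assumed example values
    msgs_per_process = total_messages // n             # 3,000 messages
    multicasts = msgs_per_process // round(n * mt_ratio)   # 60 multicast events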

Figure 3 shows the percentage overhead for various numbers of processes (N). In Figure 3 (a), each curve represents the overhead for a different MTT, with MIMT and M/T fixed at 30 msec and 0.5 respectively. Figure 3 (b) is similar, except that MTT is fixed at 600 msec and MIMT varies. As the number of processes increases, the message-size reduction from the SK optimization becomes more noticeable, because not all entries in the vector clock change between successive message sends.

Figure 4 depicts the impact on message size for various M/T values as N increases, as in Figure 3. The SK optimization becomes markedly more effective as the number of processes and the number of multicast members increase. However, this holds only when MIMT is less than MTT; when MIMT is greater than MTT, there is no statistically significant optimization.

Figures 5 and 6 depict the percentage overhead while varying MTT. Figure 5 (a) shows that with a small number of processes the effect is marginal, and that the optimization takes effect when MTT is high. In Figure 5 (b), the number of processes is set to 100. With a large number of processes, the graph shows a large improvement when MIMT is small compared to MTT, but it is not clear whether N or MIMT is the more important factor in the SK optimization. Figure 6 depicts the results for various MTT values with three different M/T ratios when N is set to 100. Figures 6 (a) and (b) make it clearer that the algorithm is more sensitive to MIMT; the same can be seen in Figures 3 (b) and 4 (b).

The effect of various MIMT values is shown in Figures 7 and 8, and the same conclusion can be drawn: MIMT needs to be very small compared to MTT to see an improvement. Once that condition is met (MIMT << MTT), increasing N and M/T yields further improvement. Figures 9 and 10 show in more detail the impact of the relationship between MIMT and MTT on the SK optimization.


Discussion

The simulation results show that MIMT dominates all other factors. If MIMT is not sufficiently smaller than MTT, the other factors do not seem to improve the result; the red curve in Figure 3 (a), the blue curve in Figure 3 (b), and Figures 4 (b), 5 (a), 6 (b), and 8 (b) support this conclusion. To see roughly a 60~80% reduction in overhead, MIMT should be roughly ten times smaller than MTT.

A small MIMT means that events are generated frequently in the distributed system, and a high MTT implies that the nodes making up the system are separated by high network latency. In this case, a node's own vector clock is likely to hold recent values, because it is updated frequently, while the vector clock piggybacked on a message in transit is likely to be outdated by the time it arrives. In other words, with a small MIMT a node sends and receives messages frequently, and with a high MTT the piggybacked vector clock is more likely to be stale. Note that a node i sending to node j piggybacks only the vector clock entries x that satisfy LSi[j] < LUi[x]; LS is updated frequently while LU is not, which greatly reduces message size under the SK optimization. Thus, the ratio of MIMT to MTT is important. Also, with a high M/T ratio, more elements of the LS array are updated, which further filters out the vector clock entries a node has to send to others. The short example below illustrates the filtering.
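As a short illustration using the SKProcess sketch from earlier (the values are made up):

    p = SKProcess(pid=0, n=4)
    p.receive({2: 5})        # one remote update arrives
    print(p.send(dest=1))    # first send to 1 carries entries 0 and 2
    print(p.send(dest=1))    # an immediate resend carries only entry 0

Between the two sends LS[1] advances but no LU entry does, so the second message shrinks; this is the same effect that a small MIMT relative to MTT produces at scale.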