Category: C Programming

Splay trees and their applications

A splay tree is a self-balancing binary search tree (BST), like AVL and red-black trees, but it shines in real-world workloads that follow the 80/20 rule: roughly 80 out of 100 queries touch only 20% of the data in the database.

This tree implementation not only balances the tree, so that insertion, deletion and search complete in O(log n) time, but also moves recently accessed data to the root. This way, the next access to the same data takes O(1) time.

Having said that, the operations still have O(log n) amortized time complexity overall.
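To make the idea concrete, here is a minimal, hypothetical sketch in C of the splaying step (the names are mine, not from any standard API). A production splay tree uses zig-zig/zig-zag double rotations to get the amortized bound; this sketch shows only the single-rotation (zig) cases, bringing the accessed key up to the root.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

struct node *new_node(int key)
{
    struct node *n = malloc(sizeof *n);   /* allocation check omitted for brevity */
    n->key = key;
    n->left = n->right = NULL;
    return n;
}

/* Right rotation: the left child of x becomes the new subtree root. */
struct node *rotate_right(struct node *x)
{
    struct node *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

/* Left rotation: the right child of x becomes the new subtree root. */
struct node *rotate_left(struct node *x)
{
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Simplified splay: bring `key` (or the last node on its search path)
 * to the root by repeated single rotations. */
struct node *splay(struct node *root, int key)
{
    if (root == NULL || root->key == key)
        return root;
    if (key < root->key) {
        if (root->left == NULL)
            return root;
        root->left = splay(root->left, key);
        return rotate_right(root);
    }
    if (root->right == NULL)
        return root;
    root->right = splay(root->right, key);
    return rotate_left(root);
}
```

After `splay(root, k)`, a repeated search for `k` finds it at the root immediately, which is where the O(1) behaviour for hot items comes from.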

Comparison between AVL, Red-Black and Splay trees

AVL and red-black trees rebalance only when the tree is modified (insertion or deletion). Splay trees, on the contrary, restructure the tree even on read-only operations such as search. On the other hand, splay trees do not have to keep extra book-keeping information in each node, unlike AVL trees (height) or red-black trees (color).

Red-black trees are also balanced, but less strictly balanced than AVL trees.
Red-black trees are used in applications where insertions and deletions are frequent, whereas AVL trees are used for quick searching: search takes O(log n) because the tree is tightly balanced.

Every insertion and deletion may require rebalancing. Hence, if an application involves many insert and delete operations, an AVL tree is not the best choice;
use a red-black tree instead for quick insertion and deletion.

Splay trees are also less balanced than AVL trees. At times a splay tree can even be less balanced than a red-black tree. Why? Because the splaying operations can temporarily leave it close to a skew (degenerate) tree.
Splay trees are generally used for fast searching (like AVL trees). In both AVL and splay trees, performance remains O(log n). The basic difference is that AVL trees take O(log n) for every search operation, whereas splay trees can take O(1) time for frequently searched items.

Applications of Splay trees

  • Due to their locality-of-reference property, they can be used to implement the logic behind caches.
  • They come in handy for joining two trees – merging one tree between 2 elements of another tree, in O(log(M + N)) time
  • They can also be used to bifurcate the data into 2 distinct data sets – cutting (splitting) a tree
  • Can be used in garbage collectors
  • Querying faster than O(log(N)) for highly biased query distributions
  • Used in network routers (apart from other data structures like hash tables, tries, etc.)

Advantages

  • Amortized O(log n) performance
  • Requires no extra book-keeping info
  • Can handle duplicate keys (unlike other trees such as AVL and red-black trees)

Disadvantages

  • Cannot be used safely in multithreaded applications, as even read-only (search) operations alter the tree structure. Allowing multiple threads to search keys concurrently may corrupt the tree.

Implementation

Find my code here.

AVL trees and their applications

Introduction

AVL trees are self balancing binary search trees (BST).
To determine whether the tree is balanced, the heights of the left and right subtrees are checked for each node in the tree.
The BST is considered balanced if |H(L) – H(R)| <= 1, where H(L) and H(R) are the heights of the left and right subtrees of a node, respectively.
In other words, for a tree to be an AVL tree, its left and right subtrees must themselves be AVL trees.

AVL balance condition at every node: |H(L) – H(R)| <= 1
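This invariant can be checked directly. Below is a small hypothetical helper (the function names are mine, not from any standard API), using the common convention that an empty tree has height −1:

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

struct node *new_node(int key)
{
    struct node *n = malloc(sizeof *n);   /* allocation check omitted for brevity */
    n->key = key;
    n->left = n->right = NULL;
    return n;
}

/* Height of a subtree; an empty tree has height -1 in this convention. */
int height(const struct node *n)
{
    if (n == NULL)
        return -1;
    int hl = height(n->left);
    int hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Returns 1 if every node satisfies |H(L) - H(R)| <= 1, else 0. */
int is_avl(const struct node *n)
{
    if (n == NULL)
        return 1;
    int diff = height(n->left) - height(n->right);
    if (diff < -1 || diff > 1)
        return 0;
    return is_avl(n->left) && is_avl(n->right);
}
```

Recomputing heights like this is O(n²) in the worst case; a real AVL implementation instead caches the height in each node, as discussed below.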

Need of AVL tree

Binary search trees have a cool property of segregating and organizing data, so that the decision taken at every node (while traversing) is binary. Due to this property, search in a balanced BST takes
O(log n).
But a BST cannot control the order in which data arrives for insertion. Because of this, a BST sometimes becomes skewed.
And as we know, search time in a BST depends heavily on the height of the tree: when the height is log n, search completes in O(log n) time.
Having said that, AVL trees also cannot control the order in which data arrives, but they can rearrange the data in the tree so that search time drops back to O(log n). By this rearrangement, AVL trees keep the height of the tree at O(log n).

Red-Black trees vs AVL trees

Though this post focuses on AVL trees, it is worth mentioning some differences between AVL trees and red-black trees.
AVL trees are frequently used for quick searching, as search takes O(log n) because the tree is balanced.
Insertions and deletions, in comparison, are more tedious and slower, since every insertion or deletion may require rebalancing.
Hence, AVL trees are preferred for applications that are search intensive.
For applications that involve many insert and delete operations, AVL trees are not the best choice.
For frequent insertion and deletion, red-black trees are used.
Red-black trees are also balanced trees, but comparatively less balanced than AVL trees.
In subsequent posts, I will try to explicate more on red-black trees.

Implementation

In AVL trees, every rebalancing decision queries the height of a node. Hence the AVL node structure also stores the height of the node.

Ways the tree Rotates to re-balance

LL and RR rotation

 


LL means the new node was inserted in the left subtree of the left child – fix with a right rotation
RR means the new node was inserted in the right subtree of the right child – fix with a left rotation

LR and RL rotation


LR means the new node was inserted in the right subtree of the left child – rotate left, then rotate right
RL means the new node was inserted in the left subtree of the right child – rotate right, then rotate left
Tip: if LR and RL confuse you, read them in reverse to identify the case, but perform the rotations in the order they are written, i.e.
for LR – the case is "R of L", and the fix is rotate L(eft) then R(ight)
Check out my AVL tree C implementation here.
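The four rotation cases above can be sketched as follows. This is a hypothetical fragment (the node layout and function names are my own, not from my linked implementation), showing only the height-aware rotations and the case selection, not the full insert/delete:

```c
#include <stdlib.h>

/* AVL node caching its own height, as described above. */
struct avl_node {
    int key;
    int height;                       /* height of the subtree rooted here */
    struct avl_node *left, *right;
};

struct avl_node *avl_new(int key)
{
    struct avl_node *n = malloc(sizeof *n);   /* allocation check omitted */
    n->key = key;
    n->height = 0;                    /* a leaf has height 0; empty tree is -1 */
    n->left = n->right = NULL;
    return n;
}

int h(const struct avl_node *n) { return n ? n->height : -1; }

void update_height(struct avl_node *n)
{
    int hl = h(n->left), hr = h(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Single right rotation: fixes the LL case. */
struct avl_node *rotate_right(struct avl_node *x)
{
    struct avl_node *y = x->left;
    x->left = y->right;
    y->right = x;
    update_height(x);
    update_height(y);
    return y;
}

/* Single left rotation: fixes the RR case. */
struct avl_node *rotate_left(struct avl_node *x)
{
    struct avl_node *y = x->right;
    x->right = y->left;
    y->left = x;
    update_height(x);
    update_height(y);
    return y;
}

/* Rebalance one node, assuming its children are already balanced:
 * picks LL, RR, LR or RL from the balance factors. */
struct avl_node *rebalance(struct avl_node *n)
{
    update_height(n);
    int bf = h(n->left) - h(n->right);
    if (bf > 1) {                                  /* left-heavy */
        if (h(n->left->left) < h(n->left->right))
            n->left = rotate_left(n->left);        /* LR: rotate left first */
        return rotate_right(n);                    /* then the LL rotation */
    }
    if (bf < -1) {                                 /* right-heavy */
        if (h(n->right->right) < h(n->right->left))
            n->right = rotate_right(n->right);     /* RL: rotate right first */
        return rotate_left(n);                     /* then the RR rotation */
    }
    return n;
}
```

An insert routine would call `rebalance()` on every node along the path back up from the newly inserted node.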

Applications

As mentioned above, AVL trees are used where searching is frequent. One example I know of: the memory management subsystem of the Linux kernel has used them to search the memory regions of processes during preemption.

Local labels in C Programming

Introduction

Everybody who has programmed in C knows about “goto” and “labels”.
GCC provides something called local labels, which restrict the scope of a label and let you do cool stuff.
Linux kernel internal APIs use local labels extensively.

Comparison between Labels and Local Labels

Conventional labels in GCC have function scope, whereas a local label can be scoped to a nested block.
A conventional label cannot be declared more than once in a function, and that is exactly where
local labels are used.

In complex function implementations, a program often wants to jump to an instruction and handle each jump
differently for different cases. In such cases, local labels can be a good choice.

Example Code

Macro with local label declared
Macro without local label declared
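The original snippets were embedded as images, so here is a hypothetical reconstruction of the two macros. The label names empty/not_empty/exit come from the compiler output below; the (list, result) parameter list is my own assumption:

```c
#include <stddef.h>

/* With __label__, every expansion of the macro gets its own private
 * copies of the labels, so the macro can be used many times per function. */
#define MACRO_WITH_LOCAL_LABEL(list, result)    \
    do {                                        \
        __label__ empty, not_empty, exit;       \
        if ((list) == NULL)                     \
            goto empty;                         \
        goto not_empty;                         \
    empty:                                      \
        (result) = 0;   /* list was empty */    \
        goto exit;                              \
    not_empty:                                  \
        (result) = 1;   /* list had data */     \
    exit:;                                      \
    } while (0)

/* Identical logic, but without the __label__ declaration the labels
 * have function scope, so two expansions in one function collide. */
#define MACRO_WITHOUT_LOCAL_LABEL(list, result) \
    do {                                        \
        if ((list) == NULL)                     \
            goto empty;                         \
        goto not_empty;                         \
    empty:                                      \
        (result) = 0;                           \
        goto exit;                              \
    not_empty:                                  \
        (result) = 1;                           \
    exit:;                                      \
    } while (0)
```

Note that `__label__` is a GCC extension (also supported by Clang) and must appear at the start of the block.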

Now, if you call “MACRO_WITH_LOCAL_LABEL()” from any function – even more than once – it won’t give any compilation errors,

whereas calling “MACRO_WITHOUT_LOCAL_LABEL()” more than once in a function gives the following errors:


[root@blr-1st-1-dhcp446 3. gcc_local_labels]# gcc -c try.c
try.c: In function ‘main’:
try.c:63: error: duplicate label ‘not_empty’
try.c:53: note: previous definition of ‘not_empty’ was here
try.c:63: error: duplicate label ‘empty’
try.c:53: note: previous definition of ‘empty’ was here
try.c:63: error: duplicate label ‘exit’
try.c:53: note: previous definition of ‘exit’ was here

The pre-processor stage completes successfully, but the compilation stage reports the errors.

Existing example

As mentioned earlier in the post, Linux kernel code uses these techniques in debug code to keep track of instruction pointers, etc.
For instance, check the usage of the following macro in the code:

#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })

Many internal APIs of the Linux kernel, like kmalloc(), call this macro to track the caller’s address.
Now, what is this “&&__here”? In brief, it is the address of the label, which you can store in a pointer – a GCC extension known as “labels as values”. COOL! Isn’t it?
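As a self-contained sketch of the same trick (the function name `this_ip` is mine, modelled on _THIS_IP_; requires GCC or Clang):

```c
/* GCC extension: unary && yields the address of a label as a value.
 * Like _THIS_IP_, we declare a local label, place it, and return its
 * address as an unsigned long. */
unsigned long this_ip(void)
{
    __label__ here;
here:
    return (unsigned long)&&here;
}
```

Each call returns the address of the instruction at the label, so the value is a non-zero code address somewhere inside `this_ip` itself.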