RAM DB

Software design of a RAM database


Purpose

This document will describe the design of subsequent software modules in very detail. Additionally, the descriptions will be linked to related source file paragraphs. Whenever more inside is needed how a program works in detail, the related program source should be studied carfully in addition to the concepts and basic ideas provided in the current software design document.

The design of the RamDB software was inspired by Rupert Sheldrake's theory of Morphic Fields that has been described in several books, e.g. "The Presence of the Past - Morphic Resonance & the Habits of Nature"; Rupert Sheldrake, 1988, 1995, ISBN 0-89281-537-X. In honor of the morphic field theory all classes in the RamDB software have been started with the capital letter 'M' in their names. Additionally, the resulting network of program modules in the memory of a computer will be referred to as Morphic Memory sometimes.


MNode

The most elementary element in the RamDB software is called a node and the related class is called MNode. Furthermore, the MNode class is the base class for all other classes.

The MNode class consists of 6 pointers and an unsigned integer data field. Including also the intrinsic class pointer, memory is needed for a total of 8 data fields. In case of an 32-bit compiler this results in a memory usage of 32 bytes.

In order to accomplish envisioned features and performance figures, a basic rule has been established. That rule prescribes that all derived classes must not add member variables to the existing 7 members which are already defined in the base class. Besides others this allows to calculate the total memory usage easily by multiplying the number of nodes with the number of bytes needed for 1 node. Actually, counters have been implemented for the total number of nodes that allow to do some nodes bookkeeping.

It has to be admitted that the memory usage will be slightly higher than calculated dependent on the overhead of memory allocation functions on a given platform. In order to keep this effect small, memory for generating new nodes will be requested in larger blocks for a number of nodes defined at compile time.

Member's names have been chosen based on the most frequent use of the particular variables. Since derived classes may use the member variables for totally different things there are often found methods that cast those member variables to their specific usage. However, the general usage is as follows:


Backup and Restore

Persistence of all relevant application data is one of the basic features that every application must implement.

The RamDB software has a general concept implemented where all nodes will be written to disk on termination of an application. This backup procedure can run also on demand when the application is still active what results in a snapshot of the overall program state. On start of a program all nodes will first be restored in memory before the application resumes work.

The backup procedure is straight forward. All nodes starting from the first one and then following the Chain pointer will be written subsequently to disk until the last node with the Chain pointer set to NULL is reached. The backup speed is usually determined by the lineare write performance of the hard disk. Therefore the RamDB backup is one of the fastest full-backup procedures.

In case of running the backup function during the normal work of an application the structure of the node network must not change. Otherwise the backup can not be restored properly.

Restoring a previous backup also involves an address translation. All pointers will be bound to the new positions of the nodes they are pointing to. Temporarily, an address translation table will be generated in memory. The additionally needed amount of memory is determined by the range of addresses to be translated. Given optimal conditions without any gaps in the memory allocated for nodes this will take about 12,5% of the backup file size.


Binary Search Tree

A binary search tree is known for it's ability to provide for fast access to attached nodes of a certain weight. This is accomplished by maintaining two pointers, a Less pointer that points to a node with a lower weight and a More pointer respectively pointing to a node with a higher weight.

In the RamDB binary tree implementation a third pointer Back is used. This pointer allows to travel backwards when writing and reading the binary tree. In particular, the Back pointer allows to build binary trees of limited depth. This is extremely important to avoid unnecessary long node threads in case of an unfortunate writing sequence.

Actually, the nodes in a RamDB binary tree are rearranged whenever a new node is added and the optimal place is already occupied by another node. Therefore the depth of a binary tree on a 32-bit platform will never exceed 32 levels, regardless in which sequence the node weights are written. Indeed, this is an overhead when writing a binary tree, however, it provides to optimal access times on reading what is considered worth the overhead on writing.


Ternary Search Tree

A ternary search tree is capable of providing fast access to an arbitrary number of strings. Usually it is used to generate an index related to existing collections of strings. In case of the RamDB software each node of the ternary search tree is additionally equipped with a Prev pointer that allows to use the data structure for both, storing and indexing strings.

For example, the above picture shows how the word "SUN" would be represented in a ternary search tree. Each of the subsequent characters is stored in a separate binary tree. The Next pointer puts the string forth to a next binary tree. Therefore the first part of a string consisting of one or multiple characters can be used multiple in order to form other strings by expanding the first part. The word "SUN" could be expanded for example to "SUNDAY". At the same time the two first characters "SU" could be the beginning of the word "SURE".

The effective end of a string is indicated by putting a MItem node into the Chain after the string's last character. Normally a new character node is generated in front of the character that was inserted just before. Thus the successor of each character node regarding the Chain pointer will normally be another character node except when a MItem node was generated and attached to a character node's Chain pointer.

Inserting new items is straight forward. Each subsequent character will be entered into a binary tree which will be created in case it does not exist yet. The Next pointer is connected to a next binary tree and the Prev pointer is pointing back to the previous character.

Retrieving strings is performed from tail to head starting at a MItem node and following the Prev pointer until a MRoot node is reached. Prior to collecting the string's characters in backward order the string length is determined and a character field of appropriate size is allocated.


Created: 2002-01-30 by Eckhard Kantz