Wednesday, 27 June 2012

Data Structures and Algorithms

Array

Why not use Arrays for Everything?
1. In an unordered array, insertion is fast, O(1), but searching is slow, taking O(N) time.
2. In an ordered array, you can search quickly, in O(logN) time, but insertion takes O(N) time.
For both kinds of array, deletion takes O(N) time.

Another problem with arrays is that their size is fixed when they are first created with new; to expand one, you must allocate a larger array and copy the items over.
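A minimal Java sketch of what "expanding" an array actually involves (the helper name grow is my own):

// A Java array's length is fixed at creation; to "grow" it you must
// allocate a larger array and copy the elements across.
static int[] grow(int[] arr) {
    int[] bigger = new int[arr.length * 2]; // new array, twice the size
    System.arraycopy(arr, 0, bigger, 0, arr.length);
    return bigger;                          // caller replaces its reference
}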

Simple Sorts

Bubble sort
Bubble sort makes about N²/2 comparisons. Both swaps and comparisons are proportional to N², so bubble sort runs in O(N²) time.
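A minimal bubble sort sketch in Java (method and variable names are my own):

// Bubble sort: repeatedly compare adjacent items and swap them if
// they are out of order; each pass bubbles the largest remaining
// item to the end.
static void bubbleSort(int[] a) {
    for (int out = a.length - 1; out > 0; out--) {
        for (int in = 0; in < out; in++) {
            if (a[in] > a[in + 1]) {       // out of order?
                int tmp = a[in];           // swap them
                a[in] = a[in + 1];
                a[in + 1] = tmp;
            }
        }
    }
}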

Selection Sort
The selection sort improves on bubble sort by reducing the number of swaps necessary from O(N²) to O(N). Unfortunately, the number of comparisons remains O(N²).
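A minimal selection sort sketch (names are my own):

// Selection sort: on each pass, find the smallest remaining item and
// swap it into place. Comparisons are still O(N^2), but there is at
// most one swap per pass, i.e. O(N) swaps in total.
static void selectionSort(int[] a) {
    for (int out = 0; out < a.length - 1; out++) {
        int min = out;                     // index of smallest so far
        for (int in = out + 1; in < a.length; in++) {
            if (a[in] < a[min]) {
                min = in;
            }
        }
        int tmp = a[out];                  // one swap per pass
        a[out] = a[min];
        a[min] = tmp;
    }
}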

Insertion Sort
Insertion sort executes about twice as fast as bubble sort and somewhat faster than selection sort in normal situations. Insertion sort runs in O(N²) time for random data. For data that is already sorted (where it runs in O(N) time) or almost sorted, insertion sort does much better. However, for data that is reverse-sorted, it runs no faster than bubble sort.
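A minimal insertion sort sketch (names are my own):

// Insertion sort: items to the left of `out` are partially sorted;
// shift larger items right and insert a[out] where it belongs.
// On already-sorted data the inner loop never runs, giving O(N).
static void insertionSort(int[] a) {
    for (int out = 1; out < a.length; out++) {
        int temp = a[out];                 // item to insert
        int in = out;
        while (in > 0 && a[in - 1] > temp) {
            a[in] = a[in - 1];             // shift right
            in--;
        }
        a[in] = temp;                      // insert it
    }
}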


Comparisons of simple sorts

There’s probably no point in using the bubble sort, unless you don’t have your algorithm book handy. The bubble sort is so simple that you can write it from memory. Even so, it’s practical only if the amount of data is small.


The selection sort minimizes the number of swaps, but the number of comparisons is still high. This sort might be useful when the amount of data is small and swapping data items is very time-consuming compared with comparing them.


The insertion sort is the most versatile of the three and is the best bet in most situations, assuming the amount of data is small or the data is almost sorted. For larger amounts of data, quicksort is generally considered the fastest approach.


We’ve compared the sorting algorithms in terms of speed. Another consideration for any algorithm is how much memory space it needs. All three of the algorithms in this chapter carry out their sort in place, meaning that, besides the initial array, very little extra memory is required. All the sorts require an extra variable to store an item temporarily while it’s being swapped.


Stacks and Queues

There are basically three operations that can be performed on a stack: 1) inserting an item onto the stack (push), 2) removing an item from the stack (pop), and 3) reading the top item without removing it (peek).

Below are some of the operations a stack data type normally supports:
push, pop, top (peek), isEmpty, isFull, getSize
All operations except getSize() can be performed in O(1) time; getSize() runs in at worst O(N), although it too becomes O(1) if the stack maintains an item counter.
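A minimal array-based stack in Java (class and method names are my own, not from any particular library):

// Push, pop, peek, isEmpty, and isFull are all O(1); size() is also
// O(1) here because `top` doubles as a counter.
class ArrayStack {
    private int[] data;
    private int top = -1;                  // index of the top item

    ArrayStack(int capacity) { data = new int[capacity]; }

    boolean isEmpty() { return top == -1; }
    boolean isFull()  { return top == data.length - 1; }
    int size()        { return top + 1; }

    void push(int x) {
        if (isFull()) throw new IllegalStateException("stack full");
        data[++top] = x;
    }
    int pop() {
        if (isEmpty()) throw new IllegalStateException("stack empty");
        return data[top--];
    }
    int peek() {                           // read top without removing
        if (isEmpty()) throw new IllegalStateException("stack empty");
        return data[top];
    }
}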

Queues and Circular Queues
As with stacks, items can be inserted into and removed from a queue in O(1) time.
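A minimal circular (wrap-around) queue sketch on a fixed array (names are my own):

// Keeping a count of items makes insert and remove O(1); the front
// and rear indices wrap around when they reach the end of the array.
class CircularQueue {
    private int[] data;
    private int front = 0, rear = -1, count = 0;

    CircularQueue(int capacity) { data = new int[capacity]; }

    boolean isEmpty() { return count == 0; }
    boolean isFull()  { return count == data.length; }

    void insert(int x) {
        if (isFull()) throw new IllegalStateException("queue full");
        rear = (rear + 1) % data.length;   // wrap around
        data[rear] = x;
        count++;
    }
    int remove() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        int x = data[front];
        front = (front + 1) % data.length; // wrap around
        count--;
        return x;
    }
}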

Deques
A deque is a double-ended queue. You can insert items at either end and delete them from either end. The methods might be called insertLeft() and insertRight(), and removeLeft() and removeRight().
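Java's java.util.ArrayDeque already provides these four operations under different names; a small usage sketch:

import java.util.ArrayDeque;

class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();
        deque.addFirst(1);               // insertLeft
        deque.addLast(2);                // insertRight
        int left  = deque.pollFirst();   // removeLeft  -> 1
        int right = deque.pollLast();    // removeRight -> 2
        System.out.println(left + " " + right);
    }
}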


Priority Queue
A priority queue is like an ordinary queue except that items are ordered by key, so the item with the highest priority (for example, the smallest key) is always the one removed first.
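A minimal array-based sketch in which insertion keeps the array ordered (the class name is illustrative, and capacity checks are omitted):

// Insertion shifts larger items up to keep the array in descending
// order, so insert is O(N) but removing the smallest item is O(1).
class PriorityQ {
    private int[] data;
    private int count = 0;

    PriorityQ(int capacity) { data = new int[capacity]; }

    void insert(int item) {                // assumes capacity not exceeded
        int i = count - 1;
        while (i >= 0 && data[i] < item) {
            data[i + 1] = data[i];         // shift smaller items up
            i--;
        }
        data[i + 1] = item;
        count++;
    }
    int remove() {                         // smallest key is at the end
        return data[--count];
    }
}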

Linked List
A double-ended list is a linked list that keeps a reference to the last link as well as the first, so items can be inserted efficiently at either end.

Shellsort runs in about O(N·(logN)²) time. It is good for medium-sized arrays, perhaps up to a few thousand items, depending on the particular implementation.
The gap sequence is the series of decreasing intervals used to sort the array on successive passes; Knuth's sequence h = 3h + 1 (1, 4, 13, 40, ...) is a common choice, as in the sketch below.
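A Shellsort sketch using Knuth's gap sequence (the method name is my own):

// Each pass does an insertion sort on items `h` apart; the final
// pass (h = 1) is an ordinary insertion sort on nearly sorted data.
static void shellSort(int[] a) {
    int h = 1;
    while (h <= a.length / 3) h = h * 3 + 1;   // build gap sequence
    while (h > 0) {
        for (int out = h; out < a.length; out++) {
            int temp = a[out];
            int in = out;
            while (in >= h && a[in - h] > temp) {
                a[in] = a[in - h];             // shift by gap h
                in -= h;
            }
            a[in] = temp;
        }
        h = (h - 1) / 3;                       // next smaller gap
    }
}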
Quicksort runs in about O(N·logN) time on average, though it degrades to O(N²) in the worst case.
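A minimal quicksort sketch (this version uses the rightmost item as the pivot, one common textbook choice; names are my own):

// Partition around a pivot, then sort each side recursively. With
// this pivot choice, already-sorted input triggers the O(N^2) case.
static void quickSort(int[] a, int left, int right) {
    if (left >= right) return;
    int pivot = a[right];                  // rightmost item as pivot
    int i = left - 1;
    for (int j = left; j < right; j++) {
        if (a[j] < pivot) {                // move small items left
            i++;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }
    int tmp = a[i + 1]; a[i + 1] = a[right]; a[right] = tmp;
    quickSort(a, left, i);                 // items below the pivot
    quickSort(a, i + 2, right);            // items above the pivot
}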

Red-Black Rules
1. Every node is either red or black.
2. The root is always black.
3. If a node is red, its children must be black (although the converse isn’t necessarily true).
4. Every path from the root to a leaf, or to a null child, must contain the same number of black nodes.

The number of black nodes on a path from root to leaf is called the black height.

Fixing Rule Violations

  • You can change the colors of nodes.
  • You can perform rotations.

Color Flips on the Way Down

Here’s the rule: Every time the insertion routine encounters a black node that has two red children, it must
change the children to black and the parent to red (unless the parent is the root, which always remains black).
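A sketch of that color-flip rule, assuming a hypothetical minimal Node class (not from any particular library):

// Hypothetical minimal node; red == true means red, false means black.
class Node {
    int key;
    boolean red;
    Node left, right;
}

// Color flip on the way down: a black node with two red children
// becomes red, and its children become black (the root stays black).
static void colorFlip(Node node, boolean isRoot) {
    if (!node.red && node.left != null && node.right != null
            && node.left.red && node.right.red) {
        if (!isRoot) node.red = true;      // root always remains black
        node.left.red = false;
        node.right.red = false;
    }
}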

Like ordinary binary search trees, a red-black tree allows for searching, insertion, and deletion in O(log₂N) time.



Hash Tables

A hash table is a data structure that offers very fast insertion and searching. Hash tables are significantly faster than trees, which, as we learned in the preceding chapters, operate in relatively fast O(logN) time. Not only are they fast, hash tables are relatively easy to program.


Hash tables do have several disadvantages. They’re based on arrays, and arrays are difficult to expand after they’ve been created. For some kinds of hash tables, performance may degrade catastrophically when a table becomes too full, so the programmer needs to have a fairly accurate idea of how many data items will need to be stored (or be prepared to periodically transfer data to a larger hash table, a time-consuming process). Also, there’s no convenient way to visit the items in a hash table in any kind of order.
However, if you don’t need to visit items in order, and you can predict in advance the size of your database, hash tables are unparalleled in speed and convenience.

Open Addressing
  • Linear Probing
  • Quadratic Probing
  • Double Hashing
The ratio of the number of items in a table to the table’s size is called the load factor.
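For example, a minimal linear-probing sketch (the class and hash function are illustrative; a real version would track the load factor and grow the table rather than assume it never fills up):

// Open addressing with linear probing: on a collision, step to the
// next cell (wrapping around) until an empty one is found.
class LinearProbeTable {
    private Integer[] cells;               // null marks an empty cell

    LinearProbeTable(int size) { cells = new Integer[size]; }

    private int hash(int key) { return key % cells.length; }

    void insert(int key) {                 // assumes table is not full
        int i = hash(key);
        while (cells[i] != null) {
            i = (i + 1) % cells.length;    // step of 1, wrap around
        }
        cells[i] = key;
    }
    boolean find(int key) {                // assumes table is not full
        int i = hash(key);
        while (cells[i] != null) {
            if (cells[i] == key) return true;
            i = (i + 1) % cells.length;
        }
        return false;
    }
}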

Double Hashing
Experience has shown that this secondary hash function must have certain characteristics:
  • It must not be the same as the primary hash function.
  • It must never output a 0 (otherwise, there would be no step; every probe would land on the same cell, and the algorithm would go into an endless loop).
Experts have discovered that functions of the following form work well:
stepSize = constant - (key % constant), where constant is prime and smaller than the array size.

Double hashing also requires that the size of the hash table be a prime number.
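A sketch of that probe loop, assuming a hypothetical secondary hash with constant 5 and a prime table size (and, as above, a table that is not full):

// The secondary hash yields a step in [1, 5], never 0; because the
// table size is prime, the probe sequence visits every cell.
static int hashFunc2(int key) {
    return 5 - (key % 5);                  // never returns 0
}

static int findSlot(Integer[] cells, int key) {
    int i = key % cells.length;            // primary hash
    int step = hashFunc2(key);             // secondary hash
    while (cells[i] != null && cells[i] != key) {
        i = (i + step) % cells.length;     // probe by step, wrapping
    }
    return i;                              // empty cell or match
}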

Separate Chaining

In open addressing, performance degrades badly as the load factor increases above one-half or two-thirds. In separate chaining the load factor can rise above 1 without hurting performance very much. This makes separate chaining a more robust mechanism, especially when it’s hard to predict in advance how much data will be placed in the hash table.
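A minimal separate-chaining sketch (names are illustrative):

import java.util.LinkedList;

// Each array cell holds a linked list of the items that hash there;
// when the load factor exceeds 1, the lists just get a little longer.
class ChainTable {
    private LinkedList<Integer>[] lists;

    @SuppressWarnings("unchecked")
    ChainTable(int size) {
        lists = new LinkedList[size];
        for (int i = 0; i < size; i++) lists[i] = new LinkedList<>();
    }

    void insert(int key)  { lists[key % lists.length].add(key); }
    boolean find(int key) { return lists[key % lists.length].contains(key); }
    void delete(int key)  { lists[key % lists.length].remove((Integer) key); }
}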


Another approach similar to separate chaining is to use an array at each location in the hash table, instead of a linked list. Such arrays are sometimes called buckets.

Hash Functions
A good hash function should offer:
  • Quick computation
  • A random-looking distribution of hash values, even for non-random keys
  • A prime number for the modulo base, which helps spread out clustered keys
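For example, a common way to hash string keys (Horner's method; the multiplier 31 and the prime tableSize parameter are illustrative choices):

// Fold the characters in one at a time, taking the remainder as you
// go so the running value never overflows.
static int hashString(String key, int tableSize) {
    int hash = 0;
    for (int i = 0; i < key.length(); i++) {
        hash = (hash * 31 + key.charAt(i)) % tableSize;
    }
    return hash;                           // in [0, tableSize)
}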

Miscellaneous 
1. Insertion sort is preferable to quicksort for small files and for almost-sorted files.
2. A sorting algorithm that preserves the relative order of items with equal keys (the secondary order) is said to be stable.
