LZW Encoding
by Sanju[ Edit ] 2009-06-29 19:04:00
LZW Encoding
Let us take an example:
TOBEORNOTTOBEORTOBEORNOT#
The # is a marker used to show that the end of the message has been reached. Clearly, then, we have 27 symbols in our alphabet (the 26 capital letters A through Z, plus the # character).
A computer will render these as strings of bits; 5-bit strings are needed to give sufficient combinations to encompass the entire dictionary. As the dictionary grows, the strings will need to grow in length to accommodate the additional entries.
A 5-bit string gives 25 = 32 possible combinations of bits, and so when the 33rd dictionary word is created, the algorithm will have to start using 6-bit strings.
Since the all-zero string 00000 is used, and is labeled "0", the 33rd dictionary entry will be labeled 32.
# = 00000
A = 00001
B = 00010
C = 00011
.
.
.
Z = 11010
If we weren't using LZW, and just sent the message as it stands (25 symbols at 5 bits each), it would require 125 bits.We will be able to compare this figure to the LZW output later. Apply LZW:
Current Next Output Value Extended
Sequence Char (# of bits) Dictionary
NULL T
T O 20 = 10100 = 5 bits 27: TO <-- Don't forget, we originally had 27 symbols, so the next one is 28th.
O B 15 = 01111 = 5 bits 28: OB
B E 2 = 00010 = 5 bits 29: BE
E O 5 = 00101 = 5 bits 30: EO
O R 15 = 01111 = 5 bits 31: OR
R N 18 = 010010 = 6 bits 32: RN <-- start using 6-bit strings
N O 14 = 001110 = 6 bits 33: NO
O T 15 = 001111 = 6 bits 34: OT
T T 20 = 010100 = 6 bits 35: TT
TO B 28 = 011100 = 6 bits 36: TOB
BE O 30 = 011110 = 6 bits 37: BEO
OR T 32 = 100000 = 6 bits 38: ORT
TOB E 37 = 100101 = 6 bits 39: TOBE
EO R 31 = 011111 = 6 bits 40: EOR
RN O 33 = 100001 = 6 bits 41: RNO
OT # 35 = 100011 = 6 bits 42: OT#
# 0 = 000000 = 6 bits
Total Length = 5*5 + 12*6 = 97 bits.
In using LZW we have made a saving of 28 bits out of 125 -- we have reduced the message by almost 22%.