×

VB.Net实现LZW压缩与解压的方法

Kalet Kalet 发表于2009-03-20 12:00:14 浏览160 评论0

抢沙发发表评论

Introduction


Why a VB.NET implementation of LZW compression?


I am working with a team of VB.NET developers on a large project, and it is best for us to have all of the components in native VB to simplify maintainence and upgrades. I also thought this might be a good introduction to LZW for people who don't know C.

VB.Net实现LZW压缩与解压的方法


Credit where Credit is Due


The source code is a nearly direct port of the LZW implementation by Mark Nelson on his web site on C. I even retained many of his comments. Be sure to look at his code here and view his C implementation here.


His original article was published in the October, 1989 issue of Dr. Dobb's Journal, which implies that Mark is very likely at least as ancient as I am.


Demo Code


Running the Code


Start a new Windows Forms project in VB.NET. Add clsLZW.vb to the project. Draw a button in the center of the main form, view the form's code, and paste the following code just above the form's End Class statement:


Collapse
Private Sub Button1_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) _
Handles Button1.Click
' test the LZW program:
' Place a bunch of files in c:\testdir
' Create c:\testdir\lzw and c:\testdir\postlzw
' Run this program
' test it with a command file such as:
' c:
' cd \testdir
' for %%1 in (*.*) do fc /b %%1
' postlzw\%%1 >> results.txt

Button1.Enabled = False
Dim di As New IO.DirectoryInfo("c:\testdir")
For Each fi As IO.FileInfo In di.GetFiles
' print filename
Dim g As Graphics = Me.CreateGraphics
g.FillRectangle(New SolidBrush(Me.BackColor), _
0, 0, Me.Width, 100)
g.DrawString(fi.Name, Me.Font, Brushes.Black, 0, 0)
g.Dispose()

' compress...
Dim lzw1 As New clsLZW
lzw1.brInput = _
New IO.BinaryReader(IO.File.Open(fi.FullName, _
IO.FileMode.Open))
lzw1.bwOutput = _
New IO.BinaryWriter(IO.File.Open("c:\testdir\lzw\" & _
fi.Name & ".lzw14", _
IO.FileMode.OpenOrCreate, IO.FileAccess.Write))
lzw1.compress()
lzw1.brInput.Close()
lzw1.bwOutput.Close()

' decompress
Dim lzw2 As New clsLZW
lzw2.brInput = _
New IO.BinaryReader(IO.File.Open("c:\testdir\lzw\" _
& fi.Name & ".lzw14", IO.FileMode.Open))
lzw2.bwOutput = _
New IO.BinaryWriter(IO.File.Open("c:\testdir\postlzw\" _
& fi.Name, IO.FileMode.OpenOrCreate, _
IO.FileAccess.Write))
lzw2.expand()
lzw2.brInput.Close()
lzw2.bwOutput.Close()
Next
Button1.Enabled = True
End Sub

Testing the Results


To test, create the file structure specified by the comments in the above Sub. You should choose a variety of files.


The batch command 'For %%1...' uses the FileCompare utility (fc) to verify that the uncompressed files match the originals.


If you choose to run it outside of the batch or command file, and at a command prompt, change the three %% to % so it will work.

For %1 in (*.*) do fc /b %1 postlzw\%1 >> results.txt        

Usage


Refer to the sample code for usage. I have not tested multiple compressions per instantiation of the class, so it is best to create a new instance for each compression or decompression that you wish to complete.


I had to use this to build a multi-file archive, so the calling program maintains control of the streams. Multiple files can be written to the same stream without closing it, however, the input stream is exausted until the end of the file is reached. This can be changed with minor modifications to prevent the need to write temporary files.


Best wishes and good luck with your VB coding!


源代码如下:


Option Strict On
Option Explicit On


Public Class clsLZW
    ' see http://marknelson.us/1989/10/01/lzw-data-compression/
    ' this is a VB.NET conversion port of mark's C program.
    ' Please refer to that program prior to modifying this one.


    Private BITS As Integer = 14
    Private HASHING_SHIFT As Integer = 4
    Private MAX_VALUE As Integer = (1 << BITS) - 1
    Private MAX_CODE As Integer = MAX_VALUE - 1
    'Private Const TABLE_SIZE As Integer = 5021 ' 12 bits
    'Private Const TABLE_SIZE As Integer = 9029 ' 13 bits
    Private Const TABLE_SIZE As Integer = 18041 ' 14 bits
    Private EOF As Integer = -1


    Public brInput As IO.BinaryReader = Nothing
    Public bwOutput As IO.BinaryWriter = Nothing


    Private iaCode_Value(TABLE_SIZE) As Integer
    Private iaPrefix_Code(TABLE_SIZE) As Integer
    Private baAppend_Character(TABLE_SIZE) As Byte


    '** This is the compression routine.  The code should be a fairly close
    '** match to the algorithm accompanying the article.
    Public Sub compress()
        Dim iNextCode As Integer = 0
        Dim iCharacter As Integer = 0
        Dim iStringCode As Integer = 0
        Dim iIndex As Integer = 0


        iNextCode = 256                   ' Next code is the next available string code


        For i As Integer = 0 To TABLE_SIZE - 1 ' Clear out the string table before starting
            iaCode_Value(i) = -1
        Next i


        ' Get the first iCharacter. Assuming it to be 0 - 255
        ' Hence only valid for ASCII text files */
        iStringCode = ReadByte()


        '** This is the main loop where it all happens.  This loop runs util all of
        '** the Input has been exhausted.  Note that it stops adding codes to the
        '** table after all of the possible codes have been defined.
        iCharacter = ReadByte()
        While iCharacter <> -1
            iIndex = find_match(iStringCode, iCharacter)    ' See if the string is in */
            If (iaCode_Value(iIndex) <> -1) Then            ' the table.  If it is,   */
                iStringCode = iaCode_Value(iIndex)          ' get the code value.  If */
            Else                                            ' the string is not in the table, try to add it.   */
                If (iNextCode <= MAX_CODE) Then
                    iaCode_Value(iIndex) = iNextCode
                    iNextCode += 1
                    iaPrefix_Code(iIndex) = iStringCode
                    baAppend_Character(iIndex) = CByte(iCharacter)
                End If
                output_code(iStringCode)    ' When a string is found  */
                iStringCode = iCharacter    ' that is not in the table */
            End If                          ' after adding the new one */
            iCharacter = ReadByte()
        End While


        ' End of the main loop.


        output_code(iStringCode)   ' Output the last code              
        output_code(MAX_VALUE)     ' Output the end of buffer code      */
        output_code(0)             ' This code flushes the Output buffer*/
    End Sub


    ' This is the hashing routine.  It tries to find a match for the prefix+char
    ' string in the string table.  If it finds it, the iIndex is returned.  If
    ' the string is not found, the first available iIndex in the string table is
    ' returned instead.
    Private Function find_match(ByVal iHashPrefix As Integer, ByVal iHashCharacter As Integer) As Integer
        Dim iIndex As Integer = 0VB.Net实现LZW压缩与解压的方法
        Dim iOffset As Integer = 0


        iIndex = CInt((iHashCharacter << HASHING_SHIFT) Xor iHashPrefix)


        If (iIndex = 0) Then
            iOffset = 1
        Else
            iOffset = TABLE_SIZE - iIndex
        End If


        While (True)
            If iaCode_Value(iIndex) = -1 Then
                Return iIndex
            End If
            If (iaPrefix_Code(iIndex) = iHashPrefix) And (baAppend_Character(iIndex) = iHashCharacter) Then
                Return iIndex
            End If
            iIndex -= iOffset
            If (iIndex < 0) Then
                iIndex += TABLE_SIZE
            End If
        End While
    End Function


    ' The following routine is used to output variable length
    ' codes.  It is written strictly for clarity, and is not
    ' particularly efficient.


    Private Sub output_code(ByVal code As Integer)
        Static output_bit_count As Integer = 0
        Static output_bit_buffer As Long = 0


        output_bit_buffer = output_bit_buffer Or (code << (32 - BITS - output_bit_count))
        output_bit_count += BITS


        While output_bit_count >= 8
            WriteByte(CByte((output_bit_buffer >> 24) And 255))
            output_bit_buffer <<= 8
            output_bit_count -= 8
        End While
    End Sub


 


    ' This is the expansion routine.  It takes an LZW format file, and expands
    ' it to an bwOutput file.  The code here should be a fairly close match to
    ' the algorithm in the accompanying article.


    Public Sub expand()
        Dim baDecode_Stack(TABLE_SIZE) As Byte
        Dim iNextCode As Integer
        Dim iNewCode As Integer
        Dim iOldCode As Integer
        Dim bCharacter As Byte
        Dim iCurrCode As Integer
        Dim i As Integer


        'This is the next available code to define.
        iNextCode = 256


        ' Read in the first code, initialize the
        ' character variable, and send the first
        ' code to the output file.
        iOldCode = input_code()
        bCharacter = CType(iOldCode, Byte)
        WriteByte(CByte(iOldCode))


        ' This is the main expansion loop.  It reads in characters from the LZW file
        ' until it sees the special code used to inidicate the end of the data.
        iNewCode = input_code()
        While (iNewCode <> MAX_VALUE)
            If iNewCode >= iNextCode Then
                ' This code checks for the special STRING+CHARACTER+STRING+CHARACTER+STRING
                ' case which generates an undefined code.  It handles it by decoding
                ' the last code, and adding a single character to the end of the decode string.           
                baDecode_Stack(0) = bCharacter
                i = 1
                iCurrCode = iOldCode
            Else
                ' Otherwise we do a straight decode of the new code.
                i = 0
                iCurrCode = iNewCode
            End If
            While iCurrCode > 255
                ' This routine simply decodes a string from the string table, storing
                ' it in a buffer.  The buffer can then be output in reverse order by
                ' the expansion program.
                baDecode_Stack(i) = baAppend_Character(iCurrCode)
                i = i + 1
                If i >= MAX_CODE Then
                    Throw New ApplicationException("Fatal error during iCurrCode expansion.")
                End If
                iCurrCode = iaPrefix_Code(iCurrCode)
            End While
            baDecode_Stack(i) = CType(iCurrCode, Byte)
            bCharacter = baDecode_Stack(i)


            'Now we output the decoded string in reverse order.
            While i >= 0
                WriteByte(baDecode_Stack(i))
                i = i - 1
            End While


            ' Finally, if possible, add a new code to the string table.
            If (iNextCode <= MAX_CODE) Then
                iaPrefix_Code(iNextCode) = iOldCode
                baAppend_Character(iNextCode) = bCharacter
                iNextCode += 1
            End If
            iOldCode = iNewCode
            iNewCode = input_code()
        End While
    End Sub


    ' The following routine is used to input variable length
    ' codes.  It is written strictly for clarity, and is not
    ' particularly efficient.
    Private Function input_code() As Integer
        Dim return_value As Long
        Static input_bit_count As Integer = 0
        Static input_bit_buffer As Long = 0
        Static Mask32 As Long = CLng(2 ^ 32) - 1


        While input_bit_count <= 24
            input_bit_buffer = (input_bit_buffer Or _
                ReadByte() << (24 - input_bit_count)) And Mask32
            input_bit_count += 8
        End While
        return_value = (input_bit_buffer >> 32 - BITS) And Mask32
        input_bit_buffer = (input_bit_buffer << BITS) And Mask32
        input_bit_count -= BITS
        Return CInt(return_value)
    End Function


    Private Sub WriteByte(ByVal b As Byte)
        bwOutput.Write(b)
    End Sub


    Private Function ReadByte() As Integer
        Dim ba(1) As Byte
        Dim iResult As Integer
        iResult = brInput.Read(ba, 0, 1)
        If iResult = 0 Then
            Return -1
        End If
        Return ba(0)
    End Function


End Class

VB.Net实现LZW压缩与解压的方法


群贤毕至

访客