org.apache.commons.compress.compressors.bzip2
Class BZip2CompressorOutputStream

java.lang.Object
  extended by java.io.OutputStream
      extended by org.apache.commons.compress.compressors.CompressorOutputStream
          extended by org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream
All Implemented Interfaces:
java.io.Closeable, java.io.Flushable, BZip2Constants

public class BZip2CompressorOutputStream
extends CompressorOutputStream
implements BZip2Constants

An output stream that compresses data into the BZip2 format and writes the result to another stream.

The compression requires large amounts of memory. Thus you should call the close() method as soon as possible, to force BZip2CompressorOutputStream to release the allocated memory.

You can reduce the amount of allocated memory, and possibly increase the compression speed, by choosing a smaller blocksize, which in turn may lower the compression ratio. To avoid unnecessary memory allocation, do not use a blocksize that is larger than the size of the input.

You can compute the memory usage for compressing by the following formula:

 400k + (9 * blocksize)
 

To get the memory required for decompression by BZip2CompressorInputStream, use:

 65k + (5 * blocksize)
 
Memory usage by blocksize

Blocksize  Compression memory usage  Decompression memory usage
100k       1300k                     565k
200k       2200k                     1065k
300k       3100k                     1565k
400k       4000k                     2065k
500k       4900k                     2565k
600k       5800k                     3065k
700k       6700k                     3565k
800k       7600k                     4065k
900k       8500k                     4565k

For decompression BZip2CompressorInputStream allocates less memory if the bzipped input is smaller than one block.
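
The two formulas above can be checked with a few lines of Java. This is a minimal sketch, not part of Commons Compress: the class BZip2MemoryEstimate and its method names are hypothetical, and the block size is assumed to be passed in 100k units (1 to 9), as the constructor expects it.

```java
/** Hypothetical helper (not part of Commons Compress) that evaluates the documented formulas. */
final class BZip2MemoryEstimate {
    private BZip2MemoryEstimate() {
    }

    /** Estimated compression memory in k: 400k + (9 * blocksize). */
    static int compressionK(int blockSize100k) {
        checkRange(blockSize100k);
        return 400 + 9 * (blockSize100k * 100);
    }

    /** Estimated decompression memory in k: 65k + (5 * blocksize). */
    static int decompressionK(int blockSize100k) {
        checkRange(blockSize100k);
        return 65 + 5 * (blockSize100k * 100);
    }

    // The documented block size range is 1..9 (in 100k units).
    private static void checkRange(int blockSize100k) {
        if (blockSize100k < 1 || blockSize100k > 9) {
            throw new IllegalArgumentException("blockSize100k must be in 1..9: " + blockSize100k);
        }
    }

    public static void main(String[] args) {
        // Prints one row per block size: blocksize, compression, decompression.
        for (int b = 1; b <= 9; b++) {
            System.out.printf("%d00k %dk %dk%n", b, compressionK(b), decompressionK(b));
        }
    }
}
```

Running main reproduces the rows of the memory usage table, which is a quick way to confirm that the table and the formulas agree.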

Instances of this class are not threadsafe.

TODO: Update to BZip2 1.0.1


Nested Class Summary
private static class BZip2CompressorOutputStream.Data
           
 
Field Summary
private  int allowableBlockSize
           
private  int blockCRC
           
private  boolean blockRandomised
           
private  int blockSize100k
          Always: in the range 0 .. 9.
private  int bsBuff
           
private  int bsLive
           
private static int CLEARMASK
           
private  int combinedCRC
           
private  CRC crc
           
private  int currentChar
           
private  BZip2CompressorOutputStream.Data data
          All memory intensive stuff.
private static int DEPTH_THRESH
           
private  boolean firstAttempt
           
private static int GREATER_ICOST
           
private static int[] INCS
          Knuth's increments seem to work better than Incerpi-Sedgewick here.
private  int last
          Index of the last char in the block, so the block size == last + 1.
private static int LESSER_ICOST
           
static int MAX_BLOCKSIZE
          The maximum supported blocksize == 9.
static int MIN_BLOCKSIZE
          The minimum supported blocksize == 1.
private  int nInUse
           
private  int nMTF
           
private  int origPtr
          Index in fmap[] of original string after sorting.
private  java.io.OutputStream out
           
private static int QSORT_STACK_SIZE
           
private  int runLength
           
private static int SETMASK
           
private static int SMALL_THRESH
           
private static int WORK_FACTOR
           
private  int workDone
           
private  int workLimit
           
 
Fields inherited from interface org.apache.commons.compress.compressors.bzip2.BZip2Constants
BASEBLOCKSIZE, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, RUNA, RUNB
 
Constructor Summary
BZip2CompressorOutputStream(java.io.OutputStream out)
          Constructs a new BZip2CompressorOutputStream with a blocksize of 900k.
BZip2CompressorOutputStream(java.io.OutputStream out, int blockSize)
          Constructs a new BZip2CompressorOutputStream with the specified blocksize.
 
Method Summary
private  void blockSort()
           
private  void bsFinishedWithStream()
           
private  void bsPutInt(int u)
           
private  void bsPutUByte(int c)
           
private  void bsW(int n, int v)
           
static int chooseBlockSize(long inputLength)
          Chooses a blocksize based on the given length of the data to compress.
 void close()
           
private  void endBlock()
           
private  void endCompression()
           
protected  void finalize()
          Overridden to close the stream.
 void finish()
           
 void flush()
           
private  void generateMTFValues()
           
 int getBlockSize()
          Returns the blocksize parameter specified at construction time.
private static void hbAssignCodes(int[] code, byte[] length, int minLen, int maxLen, int alphaSize)
           
private static void hbMakeCodeLengths(byte[] len, int[] freq, BZip2CompressorOutputStream.Data dat, int alphaSize, int maxLen)
           
private  void init()
          Writes magic bytes like BZ on the first position of the stream and bytes indicating the file-format, which is huffmanised, followed by a digit indicating blockSize100k.
private  void initBlock()
           
private  void mainQSort3(BZip2CompressorOutputStream.Data dataShadow, int loSt, int hiSt, int dSt)
          Method "mainQSort3", file "blocksort.c", BZip2 1.0.2
private  boolean mainSimpleSort(BZip2CompressorOutputStream.Data dataShadow, int lo, int hi, int d)
          This is the most hammered method of this class.
private  void mainSort()
           
private static byte med3(byte a, byte b, byte c)
           
private  void moveToFrontCodeAndSend()
           
private  void randomiseBlock()
           
private  void sendMTFValues()
           
private  void sendMTFValues0(int nGroups, int alphaSize)
           
private  int sendMTFValues1(int nGroups, int alphaSize)
           
private  void sendMTFValues2(int nGroups, int nSelectors)
           
private  void sendMTFValues3(int nGroups, int alphaSize)
           
private  void sendMTFValues4()
           
private  void sendMTFValues5(int nGroups, int nSelectors)
           
private  void sendMTFValues6(int nGroups, int alphaSize)
           
private  void sendMTFValues7(int nSelectors)
           
private static void vswap(int[] fmap, int p1, int p2, int n)
           
 void write(byte[] buf, int offs, int len)
           
 void write(int b)
          
private  void write0(int b)
           
private  void writeRun()
           
 
Methods inherited from class java.io.OutputStream
write
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN_BLOCKSIZE

public static final int MIN_BLOCKSIZE
The minimum supported blocksize == 1.

See Also:
Constant Field Values

MAX_BLOCKSIZE

public static final int MAX_BLOCKSIZE
The maximum supported blocksize == 9.

See Also:
Constant Field Values

SETMASK

private static final int SETMASK
See Also:
Constant Field Values

CLEARMASK

private static final int CLEARMASK
See Also:
Constant Field Values

GREATER_ICOST

private static final int GREATER_ICOST
See Also:
Constant Field Values

LESSER_ICOST

private static final int LESSER_ICOST
See Also:
Constant Field Values

SMALL_THRESH

private static final int SMALL_THRESH
See Also:
Constant Field Values

DEPTH_THRESH

private static final int DEPTH_THRESH
See Also:
Constant Field Values

WORK_FACTOR

private static final int WORK_FACTOR
See Also:
Constant Field Values

QSORT_STACK_SIZE

private static final int QSORT_STACK_SIZE
See Also:
Constant Field Values

INCS

private static final int[] INCS
Knuth's increments seem to work better than Incerpi-Sedgewick here, possibly because the number of elements to sort is usually small, typically <= 20.
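
For readers unfamiliar with the term, Knuth's increments are the gap sequence 1, 4, 13, 40, 121, ... (each gap is 3 * previous + 1) used by Shell sort. The following is a minimal sketch of Shell sort over a plain int array using that sequence. It is an illustration only: the class name KnuthShellSortSketch is hypothetical, and the sorting code in this class operates on block data rather than on a simple int array.

```java
/** Hypothetical illustration of Shell sort with Knuth's gap sequence 1, 4, 13, 40, ... */
final class KnuthShellSortSketch {
    private KnuthShellSortSketch() {
    }

    /** Sorts the array in place in ascending order. */
    static void sort(int[] a) {
        // Find the largest Knuth increment that is useful for this array length.
        int h = 1;
        while (h < a.length / 3) {
            h = 3 * h + 1;
        }
        // Run a gapped insertion sort for each increment, ending with h == 1.
        // Integer division by 3 walks the sequence backwards: 40 -> 13 -> 4 -> 1 -> 0.
        for (; h >= 1; h /= 3) {
            for (int i = h; i < a.length; i++) {
                int v = a[i];
                int j = i;
                while (j >= h && a[j - h] > v) {
                    a[j] = a[j - h];
                    j -= h;
                }
                a[j] = v;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7, 2, 8, 6, 4, 0};
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```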


last

private int last
Index of the last char in the block, so the block size == last + 1.


origPtr

private int origPtr
Index in fmap[] of original string after sorting.


blockSize100k

private final int blockSize100k
Always: in the range 0 .. 9. The current block size is 100000 * this number.


blockRandomised

private boolean blockRandomised

bsBuff

private int bsBuff

bsLive

private int bsLive

crc

private final CRC crc

nInUse

private int nInUse

nMTF

private int nMTF

workDone

private int workDone

workLimit

private int workLimit

firstAttempt

private boolean firstAttempt

currentChar

private int currentChar

runLength

private int runLength

blockCRC

private int blockCRC

combinedCRC

private int combinedCRC

allowableBlockSize

private int allowableBlockSize

data

private BZip2CompressorOutputStream.Data data
All memory intensive stuff.


out

private java.io.OutputStream out
Constructor Detail

BZip2CompressorOutputStream

public BZip2CompressorOutputStream(java.io.OutputStream out)
                            throws java.io.IOException
Constructs a new BZip2CompressorOutputStream with a blocksize of 900k.

Parameters:
out - the destination stream.
Throws:
java.io.IOException - if an I/O error occurs in the specified stream.
java.lang.NullPointerException - if out == null.

BZip2CompressorOutputStream

public BZip2CompressorOutputStream(java.io.OutputStream out,
                                   int blockSize)
                            throws java.io.IOException
Constructs a new BZip2CompressorOutputStream with the specified blocksize.

Parameters:
out - the destination stream.
blockSize - the blockSize as 100k units.
Throws:
java.io.IOException - if an I/O error occurs in the specified stream.
java.lang.IllegalArgumentException - if (blockSize < 1) || (blockSize > 9).
java.lang.NullPointerException - if out == null.
See Also:
MIN_BLOCKSIZE, MAX_BLOCKSIZE
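
The argument contract documented for the two-argument constructor can be sketched as a standalone check. This is not the library's code: the class BlockSizeArgumentsSketch and the method check are hypothetical, the constants mirror the documented values of MIN_BLOCKSIZE and MAX_BLOCKSIZE, and the order in which the two checks run is an assumption.

```java
import java.io.OutputStream;

/** Hypothetical sketch of the documented constructor argument checks. */
final class BlockSizeArgumentsSketch {
    static final int MIN_BLOCKSIZE = 1; // documented minimum, in 100k units
    static final int MAX_BLOCKSIZE = 9; // documented maximum, in 100k units

    private BlockSizeArgumentsSketch() {
    }

    /**
     * Throws the exceptions the constructor documentation lists for invalid arguments.
     * The order of the two checks is an assumption made for this sketch.
     */
    static void check(OutputStream out, int blockSize) {
        if (blockSize < MIN_BLOCKSIZE || blockSize > MAX_BLOCKSIZE) {
            throw new IllegalArgumentException(
                    "blockSize must be in " + MIN_BLOCKSIZE + ".." + MAX_BLOCKSIZE + ": " + blockSize);
        }
        if (out == null) {
            throw new NullPointerException("out == null");
        }
    }

    public static void main(String[] args) {
        check(System.out, 9); // valid arguments: returns normally
        try {
            check(System.out, 10);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```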
Method Detail

hbMakeCodeLengths

private static void hbMakeCodeLengths(byte[] len,
                                      int[] freq,
                                      BZip2CompressorOutputStream.Data dat,
                                      int alphaSize,
                                      int maxLen)

chooseBlockSize

public static int chooseBlockSize(long inputLength)
Chooses a blocksize based on the given length of the data to compress.

Parameters:
inputLength - The length of the data which will be compressed by BZip2CompressorOutputStream.
Returns:
The blocksize, between MIN_BLOCKSIZE and MAX_BLOCKSIZE, both inclusive. For a negative inputLength this method always returns MAX_BLOCKSIZE.
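
One heuristic that satisfies the contract above is to pick the smallest blocksize whose block holds the whole input. The sketch below is an assumption about how such a chooser could work, not the library's actual logic: the class BlockSizeChooserSketch, the method choose, and the 100000-byte unit are all chosen for illustration.

```java
/** Hypothetical block size chooser that satisfies the documented return contract. */
final class BlockSizeChooserSketch {
    static final int MIN_BLOCKSIZE = 1;
    static final int MAX_BLOCKSIZE = 9;
    // Assumption: one blocksize unit corresponds to 100000 bytes of input.
    static final long UNIT = 100_000L;

    private BlockSizeChooserSketch() {
    }

    /** Returns a blocksize in MIN_BLOCKSIZE..MAX_BLOCKSIZE; MAX_BLOCKSIZE for negative input. */
    static int choose(long inputLength) {
        if (inputLength < 0) {
            return MAX_BLOCKSIZE; // unknown length: fall back to the largest block
        }
        // Ceiling division without overflow for very large lengths.
        long units = inputLength / UNIT + (inputLength % UNIT == 0 ? 0 : 1);
        return (int) Math.max(MIN_BLOCKSIZE, Math.min(MAX_BLOCKSIZE, units));
    }

    public static void main(String[] args) {
        long[] samples = {-1L, 0L, 50_000L, 100_001L, 5_000_000L};
        for (long length : samples) {
            System.out.println(length + " -> " + choose(length));
        }
    }
}
```

With the real class, the returned value would be passed as the blockSize argument of the two-argument constructor.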

write

public void write(int b)
           throws java.io.IOException

Specified by:
write in class java.io.OutputStream
Throws:
java.io.IOException

writeRun

private void writeRun()
               throws java.io.IOException
Throws:
java.io.IOException

finalize

protected void finalize()
                 throws java.lang.Throwable
Overridden to close the stream.

Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable

finish

public void finish()
            throws java.io.IOException
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.OutputStream
Throws:
java.io.IOException

flush

public void flush()
           throws java.io.IOException
Specified by:
flush in interface java.io.Flushable
Overrides:
flush in class java.io.OutputStream
Throws:
java.io.IOException

init

private void init()
           throws java.io.IOException
Writes magic bytes like BZ on the first position of the stream and bytes indicating the file-format, which is huffmanised, followed by a digit indicating blockSize100k.

Throws:
java.io.IOException - if the magic bytes could not be written
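
The stream header described above is four bytes long in the bzip2 format: the magic bytes 'B' and 'Z', the format marker 'h' (huffmanised), and an ASCII digit from '1' to '9' for blockSize100k. The sketch below builds those bytes on its own. It is an illustration only: the class BZip2HeaderSketch and the method header are hypothetical, and the private init() method writes through the class's internal bit buffer rather than returning an array.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the four-byte bzip2 stream header. */
final class BZip2HeaderSketch {
    private BZip2HeaderSketch() {
    }

    /** Returns the header bytes for a block size given in 100k units (1..9). */
    static byte[] header(int blockSize100k) {
        if (blockSize100k < 1 || blockSize100k > 9) {
            throw new IllegalArgumentException("blockSize100k must be in 1..9: " + blockSize100k);
        }
        return new byte[] {
            'B', 'Z',                     // magic bytes
            'h',                          // file format: huffmanised
            (byte) ('0' + blockSize100k)  // block size as an ASCII digit
        };
    }

    public static void main(String[] args) {
        // Prints "BZh9" for the default 900k block size.
        System.out.println(new String(header(9), StandardCharsets.US_ASCII));
    }
}
```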

initBlock

private void initBlock()

endBlock

private void endBlock()
               throws java.io.IOException
Throws:
java.io.IOException

endCompression

private void endCompression()
                     throws java.io.IOException
Throws:
java.io.IOException

getBlockSize

public final int getBlockSize()
Returns the blocksize parameter specified at construction time.


write

public void write(byte[] buf,
                  int offs,
                  int len)
           throws java.io.IOException
Overrides:
write in class java.io.OutputStream
Throws:
java.io.IOException

write0

private void write0(int b)
             throws java.io.IOException
Throws:
java.io.IOException

hbAssignCodes

private static void hbAssignCodes(int[] code,
                                  byte[] length,
                                  int minLen,
                                  int maxLen,
                                  int alphaSize)

bsFinishedWithStream

private void bsFinishedWithStream()
                           throws java.io.IOException
Throws:
java.io.IOException

bsW

private void bsW(int n,
                 int v)
          throws java.io.IOException
Throws:
java.io.IOException

bsPutUByte

private void bsPutUByte(int c)
                 throws java.io.IOException
Throws:
java.io.IOException

bsPutInt

private void bsPutInt(int u)
               throws java.io.IOException
Throws:
java.io.IOException

sendMTFValues

private void sendMTFValues()
                    throws java.io.IOException
Throws:
java.io.IOException

sendMTFValues0

private void sendMTFValues0(int nGroups,
                            int alphaSize)

sendMTFValues1

private int sendMTFValues1(int nGroups,
                           int alphaSize)

sendMTFValues2

private void sendMTFValues2(int nGroups,
                            int nSelectors)

sendMTFValues3

private void sendMTFValues3(int nGroups,
                            int alphaSize)

sendMTFValues4

private void sendMTFValues4()
                     throws java.io.IOException
Throws:
java.io.IOException

sendMTFValues5

private void sendMTFValues5(int nGroups,
                            int nSelectors)
                     throws java.io.IOException
Throws:
java.io.IOException

sendMTFValues6

private void sendMTFValues6(int nGroups,
                            int alphaSize)
                     throws java.io.IOException
Throws:
java.io.IOException

sendMTFValues7

private void sendMTFValues7(int nSelectors)
                     throws java.io.IOException
Throws:
java.io.IOException

moveToFrontCodeAndSend

private void moveToFrontCodeAndSend()
                             throws java.io.IOException
Throws:
java.io.IOException

mainSimpleSort

private boolean mainSimpleSort(BZip2CompressorOutputStream.Data dataShadow,
                               int lo,
                               int hi,
                               int d)
This is the most hammered method of this class.

This is the version using unrolled loops, which are normally avoided in Java code. The unrolling has shown a noticeable performance improvement on JRE 1.4.2 (Linux i586 / HotSpot Client). The actual gain depends on the JIT compiler of the VM.


vswap

private static void vswap(int[] fmap,
                          int p1,
                          int p2,
                          int n)

med3

private static byte med3(byte a,
                         byte b,
                         byte c)

blockSort

private void blockSort()

mainQSort3

private void mainQSort3(BZip2CompressorOutputStream.Data dataShadow,
                        int loSt,
                        int hiSt,
                        int dSt)
Method "mainQSort3", file "blocksort.c", BZip2 1.0.2


mainSort

private void mainSort()

randomiseBlock

private void randomiseBlock()

generateMTFValues

private void generateMTFValues()