Reading and writing bits - bit-io.plt
As mentioned in "Index Construction", compression is used to store the index in a space-efficient manner. The number of bits used to encode an integer varies with its size. The standard file operations read and write whole bytes, so a bit-level I/O library was needed.
A little shopping around led to Oleg Kiselyov's page on Binary I/O, which presents a very nice solution to the problem of reading bit streams. The function make-bit-reader turns a byte stream into a bit stream. The byte stream is represented as a thunk that produces a new byte each time it is invoked. The returned bit-reader is a function of one argument, the number of bits to read from the stream, and it returns the bits read as an unsigned integer.
> (define bit-reader (make-bit-reader (lambda () #b11000101)))
> (bit-reader 3)
6
> (bit-reader 4)
2
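To make the reader's contract concrete, here is a minimal sketch in Racket (the current name of PLT Scheme). It is not Oleg Kiselyov's code: it assumes bits are taken most-significant-bit first, which is consistent with the first 3 bits of #b11000101 reading as 6, it omits his fast paths for single bits and whole bytes, and it does not handle end of input.

```racket
#lang racket/base
;; A minimal, unoptimized sketch of a bit reader in the style described above.
;; ASSUMPTION: bits are delivered most-significant-bit first.
;; Not Oleg Kiselyov's original: no fast paths for single-bit or whole-byte
;; reads, and no handling of end of input.

;; make-bit-reader : (-> byte) -> (natural -> natural)
(define (make-bit-reader byte-producer)
  (define current-byte 0) ; byte currently being consumed
  (define bits-left 0)    ; unread bits remaining in current-byte
  (lambda (n)
    ;; Collect n bits, MSB first, into an unsigned integer.
    (let loop ([n n] [acc 0])
      (if (zero? n)
          acc
          (begin
            (when (zero? bits-left)
              ;; Pull a fresh byte from the underlying byte stream.
              (set! current-byte (byte-producer))
              (set! bits-left 8))
            (set! bits-left (- bits-left 1))
            (let ([bit (bitwise-and (arithmetic-shift current-byte (- bits-left)) 1)])
              (loop (- n 1) (+ (* acc 2) bit))))))))

;; Usage: a constant byte stream that always yields #b11000101.
(define bit-reader (make-bit-reader (lambda () #b11000101)))
(bit-reader 3) ; → 6
(bit-reader 4) ; → 2
```

Reads that straddle a byte boundary work the same way: when the current byte runs out mid-read, the next byte is fetched and accumulation simply continues.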
Inspired by this approach, I wrote a bit-writer in the same style. Given a byte-writer, the function make-bit-writer returns two values. The first is a bit-writer, a function of two arguments: the number of bits to write and the bits themselves. The second is a bit-flusher. Since the bit-writer can't call the underlying byte-writer until a whole byte has been received, the bit-flusher is needed to write out any remaining bits at the end of a file. The original reader code was optimized to handle common cases, such as reading a single bit or a single byte, quickly, and I tried to mimic these optimizations in the writer.
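A writer with the same shape can be sketched as follows. This is an illustrative Racket version, not the library's code: the argument order (bit-writer n value), most-significant-bit-first output, and zero padding in the flusher are assumptions, and none of the fast paths mentioned above are reproduced.

```racket
#lang racket/base
;; A minimal, unoptimized sketch of make-bit-writer.
;; ASSUMPTIONS: bits are emitted MSB first (mirroring the reader), the
;; bit-writer takes (number-of-bits value) in that order, and the flusher
;; zero-pads the final partial byte.

;; make-bit-writer : (byte -> any) -> (values (natural natural -> void) (-> void))
(define (make-bit-writer byte-writer)
  (define buffer 0) ; bits collected but not yet written
  (define nbits 0)  ; how many bits are in buffer (0..7 between calls)
  (define (bit-writer n bits)
    ;; Append the n low-order bits of `bits` to the buffer, MSB first,
    ;; passing each completed byte to the underlying byte-writer.
    (for ([i (in-range (- n 1) -1 -1)])
      (set! buffer (+ (* buffer 2) (bitwise-and (arithmetic-shift bits (- i)) 1)))
      (set! nbits (+ nbits 1))
      (when (= nbits 8)
        (byte-writer buffer)
        (set! buffer 0)
        (set! nbits 0))))
  (define (bit-flusher)
    ;; The byte-writer only accepts whole bytes, so pad the leftover bits
    ;; with zeros on the right and write them as one final byte.
    (when (> nbits 0)
      (byte-writer (arithmetic-shift buffer (- 8 nbits)))
      (set! buffer 0)
      (set! nbits 0)))
  (values bit-writer bit-flusher))

;; Usage: rebuild the byte #b11000101 (197) from 3 + 4 + 1 bits.
(define out (open-output-bytes))
(define-values (write-bits! flush-bits!)
  (make-bit-writer (lambda (b) (write-byte b out))))
(write-bits! 3 6) ; 110
(write-bits! 4 2) ; 0010
(write-bits! 1 1) ; 1
(flush-bits!)     ; nothing left over in this case
(bytes->list (get-output-bytes out)) ; → '(197)
```

Without the flusher, writing 3 bits and then stopping would produce no output at all, since those bits stay in the buffer waiting for a full byte.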
Since the first version of the indexer used normal file operations such as with-output-to-file, write-byte and friends, it wasn't straightforward to change the code to use the stream approach. The solution was to introduce bit-ports, which resemble normal ports. In the following examples the numbers from 1 to 8 are written to a temporary file and then read back; the number i is written using i bits.
> (require (planet "bit-io.scm" ("soegaard" "bit-io.plt")))
> (with-output-to-bit-file "tmp"
    (lambda ()
      (do ([i 1 (+ i 1)]) [(= i 9) 'done]
        (write-bits i i)))
    'replace)
done
> (with-input-from-bit-file "tmp"
    (lambda ()
      (do ([i 1 (+ i 1)]
           [ns '() (cons (read-bits i) ns)])
          [(= i 9) ns])))
(8 7 6 5 4 3 2 1)
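One way such a port layer could be built is to keep the partial byte alongside an ordinary byte port and expose the current bit port through a parameter, the same mechanism Racket uses for current-output-port. The sketch below is hypothetical: call-with-output-bits, call-with-input-bits, my-write-bits, my-read-bits and the two parameters are invented for illustration and are not the bit-io.plt API. It uses in-memory byte ports instead of a temporary file, assumes MSB-first bit order and zero padding, and does not handle end of input.

```racket
#lang racket/base
;; Hypothetical "bit port" layer over ordinary byte ports (NOT the bit-io.plt
;; API). ASSUMPTIONS: MSB-first bits, zero-padded final byte, no EOF handling.

;; Parameters play the role current-output-port/current-input-port play
;; for byte I/O.
(define current-bit-out (make-parameter #f))
(define current-bit-in (make-parameter #f))

;; A bit port pairs a byte port with a partially filled/consumed byte.
(struct bit-out (port [buf #:mutable] [nbits #:mutable]))
(struct bit-in (port [buf #:mutable] [nbits #:mutable]))

;; Write the n low-order bits of v, MSB first, to a bit output port.
(define (my-write-bits n v [bp (current-bit-out)])
  (for ([i (in-range (- n 1) -1 -1)])
    (set-bit-out-buf! bp (+ (* 2 (bit-out-buf bp))
                            (bitwise-and (arithmetic-shift v (- i)) 1)))
    (set-bit-out-nbits! bp (+ 1 (bit-out-nbits bp)))
    (when (= 8 (bit-out-nbits bp))
      ;; A whole byte is ready: hand it to the underlying byte port.
      (write-byte (bit-out-buf bp) (bit-out-port bp))
      (set-bit-out-buf! bp 0)
      (set-bit-out-nbits! bp 0))))

;; Read n bits, MSB first, from a bit input port as an unsigned integer.
(define (my-read-bits n [bp (current-bit-in)])
  (for/fold ([acc 0]) ([k (in-range n)])
    (when (zero? (bit-in-nbits bp))
      ;; Current byte used up: fetch the next one from the byte port.
      (set-bit-in-buf! bp (read-byte (bit-in-port bp)))
      (set-bit-in-nbits! bp 8))
    (set-bit-in-nbits! bp (- (bit-in-nbits bp) 1))
    (+ (* 2 acc)
       (bitwise-and (arithmetic-shift (bit-in-buf bp) (- (bit-in-nbits bp))) 1))))

;; Direct bit output to the byte port `out` while thunk runs, then flush
;; any leftover bits (zero-padded) as a final byte.
(define (call-with-output-bits out thunk)
  (define bp (bit-out out 0 0))
  (begin0
    (parameterize ([current-bit-out bp]) (thunk))
    (when (> (bit-out-nbits bp) 0)
      (write-byte (arithmetic-shift (bit-out-buf bp) (- 8 (bit-out-nbits bp)))
                  out))))

;; Read bits from the byte port `in` while thunk runs.
(define (call-with-input-bits in thunk)
  (parameterize ([current-bit-in (bit-in in 0 0)]) (thunk)))

;; Usage: the 1..8 example, with an in-memory byte port instead of a file.
(define out (open-output-bytes))
(call-with-output-bits out
  (lambda ()
    (for ([i (in-range 1 9)]) (my-write-bits i i))))
(call-with-input-bits (open-input-bytes (get-output-bytes out))
  (lambda ()
    (for/list ([i (in-range 1 9)]) (my-read-bits i))))
;; → '(1 2 3 4 5 6 7 8)
```

The 36 bits written occupy 5 bytes, the last one padded with 4 zero bits. Because for/list collects results in order, the list comes back as (1 2 ... 8) rather than reversed as in the do loop above.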
The bit-io library is available through PLaneT; the documentation and source can be found at the PLT Source Browser.