[][src]Struct succinct_rs::succinct_bit_vector::SuccinctBitVector

pub struct SuccinctBitVector { /* fields omitted */ }

Succinct bit vector.

This class can handle bit sequence of virtually arbitrary length.

In fact, N (bit vector's length) is designed to be limited to: N <= 2^64.
It should be enough for almost all usecases since a binary data of length of 2^64 consumes 2^21 = 2,097,152 TB (terabyte), which is hard to handle by state-of-the-art computer architecture.

Examples

extern crate succinct_rs;

use succinct_rs::{BitString, SuccinctBitVectorBuilder};

// Construction -------------------------
// `01001` built by `from_bit_string()`
let bv = SuccinctBitVectorBuilder::from_bit_string(BitString::new("0100_1")).build();  // Tips: BitString::new() ignores '_'.

// `01001` built by `from_length()` and `add_bit()`
let bv = SuccinctBitVectorBuilder::from_length(0)
    .add_bit(false)
    .add_bit(true)
    .add_bit(false)
    .add_bit(false)
    .add_bit(true)
    .build();

// Basic operations ---------------------
assert_eq!(bv.access(0), false);  // [0]1001; 0th bit is '0' (false)
assert_eq!(bv.access(1), true);   // 0[1]001; 1st bit is '1' (true)
assert_eq!(bv.access(4), true);   // 0100[1]; 4th bit is '1' (true)

assert_eq!(bv.rank(0), 0);  // [0]1001; Range [0, 0] has no '1'
assert_eq!(bv.rank(3), 1);  // [0100]1; Range [0, 3] has 1 '1'
assert_eq!(bv.rank(4), 2);  // [01001]; Range [0, 4] has 2 '1's

assert_eq!(bv.select(0), Some(0)); // []01001; Minimum i where range [0, i] has 0 '1's is i=0
assert_eq!(bv.select(1), Some(1)); // 0[1]001; Minimum i where range [0, i] has 1 '1's is i=1
assert_eq!(bv.select(2), Some(4)); // 0100[1]; Minimum i where range [0, i] has 2 '1's is i=4
assert_eq!(bv.select(3), None);    // There is no i where range [0, i] has 3 '1's

// rank0, select0 -----------------------
assert_eq!(bv.rank0(0), 1);  // [0]1001; Range [0, 0] has no '0'
assert_eq!(bv.rank0(3), 3);  // [0100]1; Range [0, 3] has 3 '0's
assert_eq!(bv.rank0(4), 3);  // [01001]; Range [0, 4] has 3 '0's

assert_eq!(bv.select0(0), Some(0)); // []01001; Minimum i where range [0, i] has 0 '0's is i=0
assert_eq!(bv.select0(1), Some(0)); // [0]1001; Minimum i where range [0, i] has 1 '0's is i=0
assert_eq!(bv.select0(2), Some(2)); // 01[0]01; Minimum i where range [0, i] has 2 '0's is i=2
assert_eq!(bv.select0(4), None);    // There is no i where range [0, i] has 4 '0's

Complexity

See README.

Implementation detail

access()'s implementation is trivial.

select() just uses binary search of rank() results.

rank()'s implementation is standard but non-trivial. So here explains implementation of rank().

rank()'s implementation

Say you have the following bit vector.

00001000 01000001 00000100 11000000 00100000 00000101 10100000 00010000 001 ; (N=67)

Answer rank(48) in O(1) time-complexity and o(N) space-complexity.

Naively, you can count the number of '1' from left to right. You will find rank(48) == 10 but it took O(N) time-complexity.

To reduce time-complexity to O(1), you can use memonization technique.
Of course, you can memonize results of rank(i) for every i ([0, N-1]).

Bit vector;   0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  1  [1]  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1 ; (N=67)
Memo rank(i); 0  0  0  0  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  5  6  6  6  6  6  6  6  6  6  7  7  7  7  7  7  7  7  7  7  7  8  8  9  10  10 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13

From this memo, you can answer rank(48) == 10 in constant time, although space-complexity for this memo is O(N) > o(N).

To reduce space-complexity using memonization, we divide the bit vector into Chunk and Block.

Bit vector; 00001000 01000001 00000100 11000000 00100000 00000101 [1]0100000 00010000 001  ; (N=67)
Chunk;     |                  7                    |                12                  |  ; (size = (log N)^2 = 36)
Block;     |0 |1 |1  |2 |2 |3  |3 |4 |6  |6 |6  |7 |0 |0  |0 |2 |4    |4 |4  |5 |5 |5  |6| ; (size = (log N) / 2 = 3)

Now you want to answer rank(48). 48-th bit is in the 2nd chunk, and in the 5th block in the chunk.
So the rank(48) is at least:

7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk)

Then, focus on 3 bits in 5th block in the 2nd chunk; [1]01.
As you can see, only 1 '1' is included up to 48-th bit (101 has 2 '1's but 2nd '1' is 50-th bit, irrelevant to rank(48)).

Therefore, the rank(48) is calculated as:

7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk) + 1 ('1's in 5th block up to 48-th bit)

OK. That's all... Wait!
rank() must be in O(1) time-complexity.

Counting '1's in a block must also be O(1), while using o(N) space.
We use Table for this purpose.

Block contentNumber of '1's in block
0000
0011
0101
0112
1001
1012
1102
1113

This table is constructed in build(). So we can find the number of '1's in block in O(1) time.
Note that this table has O(log N) = o(N) length.

In summary:

rank() = (value of left chunk) + (value of left block) + (value of table keyed by inner block bits).

Methods

impl SuccinctBitVector[src]

pub fn access(&self, i: u64) -> bool[src]

Returns i-th element of the SuccinctBitVector.

Panics

When i >= length of the SuccinctBitVector.

pub fn rank(&self, i: u64) -> u64[src]

Returns the number of 1 in [0, i] elements of the SuccinctBitVector.

Panics

When i >= length of the SuccinctBitVector.

Implementation detail

 00001000 01000001 00000100 11000000 00100000 00000101 00100000 00010000 001  Raw data (N=67)
                                                          ^
                                                          i = 51
|                  7                    |                12                |  Chunk (size = (log N)^2 = 36)
                                        ^
               chunk_left            i_chunk = 1      chunk_right

|0 |1 |1  |2 |2 |3  |3 |4 |6  |6 |6  |7 |0 |0  |0 |2 |3 |3 |4  |4 |4 |5  |5|  Block (size = log N / 2 = 3)
                                                        ^
                                                     i_block = 17
                                             block_left | block_right
  1. Find i_chunk. i_chunk = i / chunk_size.
  2. Get chunk_left = Chunks[i_chunk - 1] only if i_chunk > 0.
  3. Get rank from chunk_left if chunk_left exists.
  4. Get chunk_right = Chunks[i_chunk].
  5. Find i_block. i_block = (i - i_chunk * chunk_size) / block size.
  6. Get block_left = chunk_right.blocks[ i_block - 1]_ only if _i_block` > 0.
  7. Get rank from block_left if block_left exists.
  8. Get inner-block data _block_bits. block_bits must be of block size length, fulfilled with 0 in right bits.
  9. Calculate rank of block_bits in O(1) using a table memonizing block size bit's popcount.

pub fn rank0(&self, i: u64) -> u64[src]

Returns the number of 0 in [0, i] elements of the SuccinctBitVector.

Panics

When i >= length of the SuccinctBitVector.

pub fn select(&self, num: u64) -> Option<u64>[src]

Returns the minimum position (0-origin) i where rank(i) == num of num-th 1 if exists. Else returns None.

Panics

When num > length of the SuccinctBitVector.

Implementation detail

Binary search using rank().

pub fn select0(&self, num: u64) -> Option<u64>[src]

Returns the minimum position (0-origin) i where rank(i) == num of num-th 0 if exists. Else returns None.

Panics

When num > length of the SuccinctBitVector.

Auto Trait Implementations

impl Send for SuccinctBitVector

impl Sync for SuccinctBitVector

Blanket Implementations

impl<T> From<T> for T[src]

impl<T, U> Into<U> for T where
    U: From<T>, 
[src]

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

impl<T> BorrowMut<T> for T where
    T: ?Sized
[src]

impl<T> Borrow<T> for T where
    T: ?Sized
[src]

impl<T> Any for T where
    T: 'static + ?Sized
[src]