[−][src]Struct succinct_rs::succinct_bit_vector::SuccinctBitVector
Succinct bit vector.
This class can handle bit sequence of virtually arbitrary length.
In fact, N (bit vector's length) is designed to be limited to: N <= 2^64.
It should be enough for almost all usecases since a binary data of length of 2^64 consumes 2^21 = 2,097,152 TB (terabyte), which is hard to handle by state-of-the-art computer architecture.
Examples
extern crate succinct_rs; use succinct_rs::{BitString, SuccinctBitVectorBuilder}; // Construction ------------------------- // `01001` built by `from_bit_string()` let bv = SuccinctBitVectorBuilder::from_bit_string(BitString::new("0100_1")).build(); // Tips: BitString::new() ignores '_'. // `01001` built by `from_length()` and `add_bit()` let bv = SuccinctBitVectorBuilder::from_length(0) .add_bit(false) .add_bit(true) .add_bit(false) .add_bit(false) .add_bit(true) .build(); // Basic operations --------------------- assert_eq!(bv.access(0), false); // [0]1001; 0th bit is '0' (false) assert_eq!(bv.access(1), true); // 0[1]001; 1st bit is '1' (true) assert_eq!(bv.access(4), true); // 0100[1]; 4th bit is '1' (true) assert_eq!(bv.rank(0), 0); // [0]1001; Range [0, 0] has no '1' assert_eq!(bv.rank(3), 1); // [0100]1; Range [0, 3] has 1 '1' assert_eq!(bv.rank(4), 2); // [01001]; Range [0, 4] has 2 '1's assert_eq!(bv.select(0), Some(0)); // []01001; Minimum i where range [0, i] has 0 '1's is i=0 assert_eq!(bv.select(1), Some(1)); // 0[1]001; Minimum i where range [0, i] has 1 '1's is i=1 assert_eq!(bv.select(2), Some(4)); // 0100[1]; Minimum i where range [0, i] has 2 '1's is i=4 assert_eq!(bv.select(3), None); // There is no i where range [0, i] has 3 '1's // rank0, select0 ----------------------- assert_eq!(bv.rank0(0), 1); // [0]1001; Range [0, 0] has no '0' assert_eq!(bv.rank0(3), 3); // [0100]1; Range [0, 3] has 3 '0's assert_eq!(bv.rank0(4), 3); // [01001]; Range [0, 4] has 3 '0's assert_eq!(bv.select0(0), Some(0)); // []01001; Minimum i where range [0, i] has 0 '0's is i=0 assert_eq!(bv.select0(1), Some(0)); // [0]1001; Minimum i where range [0, i] has 1 '0's is i=0 assert_eq!(bv.select0(2), Some(2)); // 01[0]01; Minimum i where range [0, i] has 2 '0's is i=2 assert_eq!(bv.select0(4), None); // There is no i where range [0, i] has 4 '0's
Complexity
See README.
Implementation detail
access()'s implementation is trivial.
select() just uses binary search of rank()
results.
rank()'s implementation is standard but non-trivial. So here explains implementation of rank().
rank()'s implementation
Say you have the following bit vector.
00001000 01000001 00000100 11000000 00100000 00000101 10100000 00010000 001 ; (N=67)
Answer rank(48) in O(1) time-complexity and o(N) space-complexity.
Naively, you can count the number of '1' from left to right. You will find rank(48) == 10 but it took O(N) time-complexity.
To reduce time-complexity to O(1), you can use memonization technique.
Of course, you can memonize results of rank(i) for every i ([0, N-1]).
Bit vector; 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 [1] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 ; (N=67)
Memo rank(i); 0 0 0 0 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 9 10 10 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13
From this memo, you can answer rank(48) == 10 in constant time, although space-complexity for this memo is O(N) > o(N).
To reduce space-complexity using memonization, we divide the bit vector into Chunk and Block.
Bit vector; 00001000 01000001 00000100 11000000 00100000 00000101 [1]0100000 00010000 001 ; (N=67)
Chunk; | 7 | 12 | ; (size = (log N)^2 = 36)
Block; |0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |4 |4 |4 |5 |5 |5 |6| ; (size = (log N) / 2 = 3)
- A Chunk has size of (log N)^2. Its value is rank(index of the last bit of the chunk).
- A Block has size of (log N) / 2. A chunk has many blocks. Block's value is the number of '1's in [index of the first bit of the chunk the block belongs to, index of the last bit of the block] (note that the value is reset to 0 at the first bit of a chunk).
Now you want to answer rank(48). 48-th bit is in the 2nd chunk, and in the 5th block in the chunk.
So the rank(48) is at least:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk)
Then, focus on 3 bits in 5th block in the 2nd chunk; [1]01
.
As you can see, only 1 '1' is included up to 48-th bit (101
has 2 '1's but 2nd '1' is 50-th bit, irrelevant to rank(48)).
Therefore, the rank(48) is calculated as:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk) + 1 ('1's in 5th block up to 48-th bit)
OK. That's all... Wait!
rank() must be in O(1) time-complexity.
- 7 (value of 1st chunk): O(1) if you store chunk value in array structure.
- 2 (value of 4th block in the 2nd chunk): Same as above.
- 1 ('1's in 5th block up to 48-th bit): O(length of block) = O(log N) !
Counting '1's in a block must also be O(1), while using o(N) space.
We use Table for this purpose.
Block content | Number of '1's in block |
---|---|
000 | 0 |
001 | 1 |
010 | 1 |
011 | 2 |
100 | 1 |
101 | 2 |
110 | 2 |
111 | 3 |
This table is constructed in build()
. So we can find the number of '1's in block in O(1) time.
Note that this table has O(log N) = o(N) length.
In summary:
rank() = (value of left chunk) + (value of left block) + (value of table keyed by inner block bits).
Methods
impl SuccinctBitVector
[src]
pub fn access(&self, i: u64) -> bool
[src]
pub fn rank(&self, i: u64) -> u64
[src]
Returns the number of 1 in [0, i
] elements of the SuccinctBitVector
.
Panics
When i
>= length of the SuccinctBitVector
.
Implementation detail
00001000 01000001 00000100 11000000 00100000 00000101 00100000 00010000 001 Raw data (N=67)
^
i = 51
| 7 | 12 | Chunk (size = (log N)^2 = 36)
^
chunk_left i_chunk = 1 chunk_right
|0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |3 |3 |4 |4 |4 |5 |5| Block (size = log N / 2 = 3)
^
i_block = 17
block_left | block_right
- Find
i_chunk
.i_chunk
=i
/chunk_size
. - Get
chunk_left
= Chunks[i_chunk
- 1] only ifi_chunk
> 0. - Get rank from chunk_left if
chunk_left
exists. - Get
chunk_right
= Chunks[i_chunk
]. - Find
i_block
.i_block
= (i
-i_chunk
*chunk_size
) / block size. - Get
block_left
=chunk_right.blocks
[i_block
- 1]_ only if _
i_block` > 0. - Get rank from block_left if
block_left
exists. - Get inner-block data _
block_bits
.block_bits
must be of block size length, fulfilled with 0 in right bits. - Calculate rank of
block_bits
in O(1) using a table memonizing block size bit's popcount.
pub fn rank0(&self, i: u64) -> u64
[src]
Returns the number of 0 in [0, i
] elements of the SuccinctBitVector
.
Panics
When i
>= length of the SuccinctBitVector
.
pub fn select(&self, num: u64) -> Option<u64>
[src]
Returns the minimum position (0-origin) i
where rank(i)
== num of num
-th 1 if exists. Else returns None.
Panics
When num
> length of the SuccinctBitVector
.
Implementation detail
Binary search using rank()
.
pub fn select0(&self, num: u64) -> Option<u64>
[src]
Returns the minimum position (0-origin) i
where rank(i)
== num of num
-th 0 if exists. Else returns None.
Panics
When num
> length of the SuccinctBitVector
.
Auto Trait Implementations
impl Send for SuccinctBitVector
impl Sync for SuccinctBitVector
Blanket Implementations
impl<T> From<T> for T
[src]
impl<T, U> Into<U> for T where
U: From<T>,
[src]
U: From<T>,
impl<T, U> TryFrom<U> for T where
U: Into<T>,
[src]
U: Into<T>,
type Error = Infallible
The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
[src]
impl<T, U> TryInto<U> for T where
U: TryFrom<T>,
[src]
U: TryFrom<T>,
type Error = <U as TryFrom<T>>::Error
The type returned in the event of a conversion error.
fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
[src]
impl<T> BorrowMut<T> for T where
T: ?Sized,
[src]
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
[src]
impl<T> Borrow<T> for T where
T: ?Sized,
[src]
T: ?Sized,
impl<T> Any for T where
T: 'static + ?Sized,
[src]
T: 'static + ?Sized,