E
- the type of elements that this filter accepts@Beta public final class CuckooFilter<E> extends Object implements ProbabilisticFilter<E>, Serializable
E
that implements the ProbabilisticFilter
interface.
"Cuckoo filters can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space." - Fan, et. al.
Cuckoo filters offer constant time performance for the basic operations add(Object)
,
remove(Object)
, contains(Object)
and sizeLong()
.
This class does not permit null
elements.
Cuckoo filters implement the Serializable
interface. They also support a more compact
serial representation via the writeTo(OutputStream)
and readFrom(InputStream,
Funnel)
methods. Both serialized forms will continue to be supported by future versions of this
library. However, serial forms generated by newer versions of the code may not be readable by
older versions of the code (e.g., a serialized cuckoo filter generated today may not be
readable by a binary that was compiled 6 months ago).
ref: Cuckoo Filter: Practically Better Than Bloom Bin Fan, David G. Andersen, Michael Kaminsky†, Michael D. Mitzenmacher‡ Carnegie Mellon University, †Intel Labs, ‡Harvard University
ProbabilisticFilter
,
Serialized FormModifier and Type | Method and Description |
---|---|
boolean |
add(E e)
Adds the specified element to this filter.
|
boolean |
addAll(Collection<? extends E> c)
Adds all of the elements in the specified collection to this filter.
|
boolean |
addAll(ProbabilisticFilter<E> f)
Combines
this filter with another compatible filter. |
long |
capacity()
Returns the number of elements this filter can represent at its requested
FPP . |
void |
clear()
Removes all of the elements from this filter.
|
boolean |
contains(E e)
Returns
true if this filter might contain the specified element, false
if this is definitely not the case. |
boolean |
containsAll(Collection<? extends E> c)
Returns
true if this filter might contain all of the elements of the specified
collection. |
boolean |
containsAll(ProbabilisticFilter<E> f)
Returns
true if this filter might contain all elements contained in the
specified filter. |
CuckooFilter<E> |
copy()
Returns a new
CuckooFilter that's a copy of this instance. |
static <T> CuckooFilter<T> |
create(com.google.common.hash.Funnel<? super T> funnel,
long capacity)
Creates a filter with the expected number of insertions and a default expected false positive
probability of 3.2%.
|
static <T> CuckooFilter<T> |
create(com.google.common.hash.Funnel<? super T> funnel,
long capacity,
double fpp)
Creates a filter with the expected number of insertions and expected false positive
probability.
|
double |
currentFpp()
Returns the current false positive probability (
FPP ) of this filter. |
boolean |
equals(Object object) |
double |
fpp()
Returns the approximate
FPP limit of this filter. |
int |
hashCode() |
boolean |
isCompatible(ProbabilisticFilter<E> f)
Returns
true if f is compatible with this filter. |
boolean |
isEmpty()
Returns
true if this filter contains no elements. |
static <T> CuckooFilter<T> |
readFrom(InputStream in,
com.google.common.hash.Funnel<T> funnel)
Reads a byte stream, which was written by
writeTo(OutputStream) , into a CuckooFilter . |
boolean |
remove(E e)
Removes the specified element from this filter.
|
boolean |
removeAll(Collection<? extends E> c)
Removes from this filter all of its elements that are contained in the specified collection.
|
boolean |
removeAll(ProbabilisticFilter<E> f)
Subtracts the specified filter from
this filter. |
long |
size()
Returns the number of elements contained in this filter (its cardinality).
|
long |
sizeLong()
Returns the number of elements contained in this filter (its cardinality).
|
String |
toString() |
void |
writeTo(OutputStream out)
Writes this cuckoo filter to an output stream, with a custom format (not Java serialization).
|
@CheckReturnValue public CuckooFilter<E> copy()
CuckooFilter
that's a copy of this instance. The new instance is equal to
this instance but shares no mutable state.@CheckReturnValue public boolean contains(E e)
true
if this filter might contain the specified element, false
if this is definitely not the case.contains
in interface ProbabilisticFilter<E>
e
- element whose containment in this filter is to be testedtrue
if this filter might contain the specified element, false
if this is definitely not the case.NullPointerException
- if the specified element is null
and this filter does not
permit null
elementscontainsAll(Collection)
,
containsAll(ProbabilisticFilter)
,
add(Object)
,
remove(Object)
public boolean containsAll(Collection<? extends E> c)
true
if this filter might contain all of the elements of the specified
collection. More formally, returns true
if contains(Object)
== true
for all of the elements of the specified collection.containsAll
in interface ProbabilisticFilter<E>
c
- collection containing elements to be checked for containment in this filtertrue
if this filter might contain all elements of the specified
collectionNullPointerException
- if the specified collection contains one or more null
elements, or if the specified collection is null
contains(Object)
,
containsAll(ProbabilisticFilter)
public boolean containsAll(ProbabilisticFilter<E> f)
true
if this filter might contain all elements contained in the
specified filter. isCompatible(ProbabilisticFilter)
must return true
for the
given filter.containsAll
in interface ProbabilisticFilter<E>
f
- cuckoo filter containing elements to be checked for probable containment in this
filtertrue
if this filter might contain all elements contained in the
specified filter, false
if this is definitely not the case.NullPointerException
- if the specified filter is null
IllegalArgumentException
- if isCompatible(ProbabilisticFilter)
== false
given f
contains(Object)
,
containsAll(Collection)
@CheckReturnValue public boolean add(E e)
true
if e
was successfully
added to the filter, false
if this is definitely not the case, as would be the
case when the filter becomes saturated. Saturation may occur even if sizeLong()
<
capacity()
, e.g. if e
has already been added 2*b
times to the
cuckoo filter, it will have saturated the number of entries per bucket (b
) allocated
within the filter and a subsequent invocation will return false
. A return value of
true
ensures that contains(Object)
given e
will also return true
.add
in interface ProbabilisticFilter<E>
e
- element to be added to this filtertrue
if e
was successfully added to the filter, false
if this
is definitely not the caseNullPointerException
- if the specified element is null
contains(Object)
,
addAll(Collection)
,
addAll(ProbabilisticFilter)
@CheckReturnValue public boolean addAll(ProbabilisticFilter<E> f)
this
filter with another compatible filter. The mutations happen to this
instance. Callers must ensure this
filter is appropriately sized to avoid
saturating it or running out of space.addAll
in interface ProbabilisticFilter<E>
f
- filter to be combined into this
filter - f
is not mutatedtrue
if the operation was successful, false
otherwiseNullPointerException
- if the specified filter is null
IllegalArgumentException
- if isCompatible(ProbabilisticFilter)
==
false
add(Object)
,
addAll(Collection)
,
contains(Object)
public boolean addAll(Collection<? extends E> c)
c
may have been added to the filter even when false
is returned. In this case, the caller may remove(Object)
the additions by comparing
the filter sizeLong()
before and after the invocation, knowing that additions from
c
occurred in c
's iteration order.addAll
in interface ProbabilisticFilter<E>
c
- collection containing elements to be added to this filtertrue
if all elements of the collection were successfully added, false
otherwiseNullPointerException
- if the specified collection contains a null
element, or if
the specified collection is null
add(Object)
,
addAll(ProbabilisticFilter)
,
contains(Object)
public void clear()
clear
in interface ProbabilisticFilter<E>
sizeLong()
,
isEmpty()
@CheckReturnValue public boolean remove(E e)
false
is returned, this is definitely an indication that
the specified element wasn't contained in the filter prior to invocation. This condition is an
error, and this filter can no longer be relied upon to return correct false
responses
from contains(Object)
, unless isEmpty()
is also true
.remove
in interface ProbabilisticFilter<E>
e
- element to be removed from this filtertrue
if this filter probably contained the specified element, false
otherwiseNullPointerException
- if the specified element is null
and this filter does not
permit null
elementscontains(Object)
,
removeAll(Collection)
,
removeAll(ProbabilisticFilter)
@CheckReturnValue public boolean removeAll(Collection<? extends E> c)
false
is returned, this is definitely an indication that the specified
collection contained elements that were not contained in this filter prior to invocation, and
this filter can no longer be relied upon to return correct false
responses from contains(Object)
, unless isEmpty()
is also true
.
Some elements of c
may have been removed from the filter even when false
is
returned. In this case, the caller may add(Object)
the additions by comparing the
filter sizeLong()
before and after the invocation, knowing that removals from c
occurred in c
's iteration order.removeAll
in interface ProbabilisticFilter<E>
c
- collection containing elements to be removed from this filtertrue
if all of the elements of the specified collection were successfully
removed from the filter, false
if any of the elements was not successfully removedNullPointerException
- if the specified collection contains one or more null
elements, or if the specified collection is null
contains(Object)
,
remove(Object)
,
removeAll(ProbabilisticFilter)
@CheckReturnValue public boolean removeAll(ProbabilisticFilter<E> f)
this
filter. The mutations happen to this
instance. Callers must ensure that the specified filter represents elements that are currently
contained in this
filter.
If false
is returned, this is definitely an indication that the specified filter
contained elements that were not contained in this filter prior to invocation and this filter
can no longer be relied upon to return correct false
responses from contains(Object)
, unless isEmpty()
is also true
.removeAll
in interface ProbabilisticFilter<E>
f
- filter containing elements to remove from this
filter - f
is not
mutatedtrue
if the operation was successful, false
otherwiseNullPointerException
- if the specified filter is nullIllegalArgumentException
- if isCompatible(ProbabilisticFilter)
== false
given f
contains(Object)
,
remove(Object)
,
removeAll(Collection)
public long sizeLong()
Long.MAX_VALUE
elements, returns Long.MAX_VALUE
.sizeLong
in interface ProbabilisticFilter<E>
capacity()
,
isEmpty()
,
size()
public long size()
Integer.MAX_VALUE
elements, returns Integer.MAX_VALUE
.size
in interface ProbabilisticFilter<E>
capacity()
,
isEmpty()
,
sizeLong()
public long capacity()
FPP
. It's
sometimes possible to add more elements to a cuckoo filter than its capacity since the load
factor used to calculate its optimal storage size is less than 100%.capacity
in interface ProbabilisticFilter<E>
FPP
.fpp()
,
currentFpp()
,
sizeLong()
,
optimalLoadFactor(int)
public double fpp()
FPP
limit of this filter. This is not a hard limit, however a
cuckoo filter will not exceed its FPP
by a significant amount as the filter becomes
saturated.fpp
in interface ProbabilisticFilter<E>
FPP
limit of this filter.currentFpp()
public double currentFpp()
FPP
) of this filter.currentFpp
in interface ProbabilisticFilter<E>
contains(Object)
will erroneously return true
given an element that has not actually been added to the filter. Unlike a bloom filter, a
cuckoo filter cannot become saturated to the point of significantly degrading its FPP
.fpp()
public boolean isEmpty()
true
if this filter contains no elements.isEmpty
in interface ProbabilisticFilter<E>
true
if this filter contains no elementssizeLong()
public boolean isCompatible(ProbabilisticFilter<E> f)
true
if f
is compatible with this
filter. f
is
considered compatible if this
filter can use it in combinatoric operations (e.g. addAll(ProbabilisticFilter)
).isCompatible
in interface ProbabilisticFilter<E>
f
- The filter to check for compatibility.true
if f
is compatible with this
filter.ProbabilisticFilter.addAll(ProbabilisticFilter)
,
ProbabilisticFilter.containsAll(ProbabilisticFilter)
,
ProbabilisticFilter.removeAll(ProbabilisticFilter)
@CheckReturnValue public static <T> CuckooFilter<T> create(com.google.common.hash.Funnel<? super T> funnel, long capacity, double fpp)
Note that overflowing a CuckooFilter
with significantly more
objects than specified, will result in its saturation causing add(Object)
to reject
new additions.
The constructed CuckooFilter
will be serializable if the
provided Funnel<T>
is.
It is recommended that the funnel be implemented as a
Java enum. This has the benefit of ensuring proper serialization and deserialization, which is
important since equals(java.lang.Object)
also relies on object identity of funnels.
funnel
- the funnel of T's that the constructed CuckooFilter
will usecapacity
- the number of expected insertions to the constructed CuckooFilter
; must
be positivefpp
- the desired false positive probability (must be positive and less than 1.0).CuckooFilter
@CheckReturnValue public static <T> CuckooFilter<T> create(com.google.common.hash.Funnel<? super T> funnel, long capacity)
Note that overflowing a CuckooFilter
with significantly
more objects than specified, will result in its saturation causing add(Object)
to
reject new additions.
The constructed CuckooFilter
will be serializable if the
provided Funnel<T>
is.
It is recommended that the funnel be implemented as a
Java enum. This has the benefit of ensuring proper serialization and deserialization, which is
important since equals(java.lang.Object)
also relies on object identity of funnels.
funnel
- the funnel of T's that the constructed CuckooFilter
will usecapacity
- the number of expected insertions to the constructed CuckooFilter
; must
be positiveCuckooFilter
public void writeTo(OutputStream out) throws IOException
readFrom(InputStream, Funnel)
to reconstruct the written CuckooFilter.IOException
@CheckReturnValue public static <T> CuckooFilter<T> readFrom(InputStream in, com.google.common.hash.Funnel<T> funnel) throws IOException
writeTo(OutputStream)
, into a CuckooFilter
. The Funnel
to be used is not encoded in the stream, so it must be
provided here. Warning: the funnel provided must behave identically to the one
used to populate the original Cuckoo filter!IOException
- if the InputStream throws an IOException
, or if its data does not
appear to be a CuckooFilter serialized using the writeTo(OutputStream)
method.Copyright © 2016. All rights reserved.