Understanding Bloom Filter

Understanding Bloom Filter

It is a probabilistic space-efficient data structure with O(1) insertion and retrieval. It is used to find if a key exists (with false positives) or not (no false negatives). It can be used as a data layer to check if a key exists before querying the actual database (expensive read vs cheap read).

Pros-

  • Uses constant space (Space efficient).

  • O(1) insertion and retrieval

Cons-

  • False positives. It can not accurately tell if a key exists for sure in the bloom filter. But it can tell if a key does not exist in there.

How it works?

It requires some hash functions and a binary hash table.

First, we set all the bits in hash table to 0

Let’s say we have two hash functions H1 and H2

Insertion:-

→ Insert “Hello”. H1(“Hello”) = 0, H2(“Hello”) = 3

→ Since bits at 0 and 3 positions are set to 0, we toggle it to 1.

→ Now the hash table looks like this

→ Insert “World”. H1(“World”) = 5, H2(“World”) = 3

→ Toggle bits. Now the hash table looks like this

Searching:-

→ Search “World”. H1(“World”) = 5, H2(“World”) = 3.

→ Check bits at 3rd and 5th index. Both are set to 1. So it would return 1. Maybe exists case

→ Search “Foo”. H1(“Foo”) = 0, H2(“Foo”) = 5

→ Check bits at 0th and 5th index. Though it will return 1. But “Foo” does not exist. False positive case.

Applications:-

It is mostly used in conjugation with primary DB to check if an element is present or not and then search the primary DB. It can be used at places where we are okay with false positives and a quick lookup can save the expensive DB read.

  • Used by Medium to show if an article is read by the user.

  • Malicious sites blocker by Google.

  • Password and username duplicate check.

  • Used to show target ads.

Default support with databases