Some time ago I happened upon a class of algorithms called streaming algorithms. They piqued my curiosity greatly, and I implemented several, initially in C++. I have since started to implement some of them in other languages (currently only python and go). I decided that I'd next implement a bloom filter in python. A generic bloom filter is incredibly easy to write up, provided you have a few underlying libraries. It turns out that there is not a standard bit array library for python. I was kind of surprised, since python has a lot of built in libraries. I had to write one for C++ (ok, I didn't have to, but I'd rather write my own that force a dependency on something like Boost), and so to get bloom filter working, I'd need one for python as well. I spend about 10 minutes writing this up, so don't be surprised if there are some lingering issues. If you find one, I'd love to hear about it. I based this off of the one I wrote in C++, but this is much simpler. The C++ variant allows for each element to be more than 1 bit. You can have elements of size n-bits, which is nice for some of the other forms of bloom filters (like counting bloom filters). Without further ado, here is the code: code:

``````import array
from math import log

class BitArray(object):
def __init__(self, bits):
# L = 4 bytes minimum, but could be 8 bytes. No need to waste all that space
# so lets look up the size of L on this machine
self.elem_size = int(log(array.array('L').itemsize * 8, 2))
self.elem_mask =  int(pow(2, self.elem_size) - 1)
size = bits >> self.elem_size;
size += 1

self.bits = array.array('L', (0,) * size)

def set(self, bit):
# first look up the entry in the array
array_index = bit >> self.elem_size
# find position of bit in the individual element
# generate a mask to apply to the array element

def clear(self, bit):
# first look up the entry in the array
array_index = bit >> self.elem_size
# find position of bit in the individual element
# generate a mask to apply to the array element,
# but invert it since we are clearing

def bit(self, bit):
# first look up the entry in the array
array_index = bit >> self.elem_size
# find position of bit in the individual element