Baseline algorithms

Baseline algorithms help to define the limits of reasonable utility.

The following two baseline algorithms are included in all experimental results and compared explicitly with state-of-the-art algorithms.

Identity

The Identity algorithm uses the Laplace Mechanism to estimate each count in the 1- or 2-dimensional domain, using the noisy answers as an estimate of the true data. All workload queries are then derived from this estimate. Identity is an obvious application of the Laplace Mechanism and is therefore one of the simplest methods for answering workloads of counting queries.

An algorithm with error rates worse than Identity is performing poorly.

Uniform

The Uniform algorithm is granted knowledge of the true scale of the data and estimates the unknown dataset by assuming uniformity across the domain. All workload queries are then derived from this estimate. The error of Uniform does not depend on epsilon, but does depend on properties of the input data. Uniform will perform well on datasets that are close to uniform. It's clear Uniform is learning little of value about most datasets (since its estimate is based only on the total number of records in the dataset).

If an algorithm has error comparable to Uniform on a given dataset, either (1) the dataset is not very interesting, or (2) the algorithm is performing poorly.