VC dimension is a fundamental concept in statistical learning theory that measures the capacity, or expressive power, of a hypothesis class. It is defined as the size of the largest set of points the hypothesis class can shatter, that is, the largest set for which every possible labeling can be realized by some hypothesis in the class.
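In standard notation (the symbols below, a hypothesis class \(\mathcal{H}\) over inputs \(\mathcal{X}\) and a finite point set \(S\), are chosen here for illustration rather than taken from the text above), the definition can be written as:

$$
\mathrm{VCdim}(\mathcal{H}) = \max\left\{\, n \;:\; \exists\, S = \{x_1, \dots, x_n\} \subseteq \mathcal{X} \text{ with } \left|\{(h(x_1), \dots, h(x_n)) : h \in \mathcal{H}\}\right| = 2^n \,\right\}
$$

In words, there must exist at least one set of n points on which the class realizes all 2^n labelings; if arbitrarily large sets can be shattered, the VC dimension is said to be infinite.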
Shattering means that for every possible way to label the points, there exists a hypothesis that classifies all of them correctly under that labeling. A set of n points has 2^n labelings, so for two points we must show that all four can be achieved: both red, both blue, first red and second blue, and first blue and second red.
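To make this concrete, here is a minimal sketch that brute-forces every labeling of a small point set and checks whether some hypothesis realizes it. It assumes, purely for illustration (this class is not mentioned in the text above), the classic hypothesis class of interval classifiers on the real line: intervals shatter two points but not three, so their VC dimension is 2.

```python
import itertools

def interval_realizes(a, b, points, labels):
    """Does the interval classifier h(x) = 1 if a <= x <= b, else 0, realize this labeling?"""
    return all((a <= x <= b) == bool(y) for x, y in zip(points, labels))

def interval_shatters(points):
    """True if every 0/1 labeling of the 1D points is realized by some interval [a, b]."""
    # Candidate endpoints just below, at, and just above each point; the 0.5 offset
    # is enough for these integer-spaced example points.
    candidates = sorted({p + d for p in points for d in (-0.5, 0.0, 0.5)})
    return all(
        any(interval_realizes(a, b, points, labels)
            for a in candidates for b in candidates if a <= b)
        for labels in itertools.product([0, 1], repeat=len(points))
    )

print(interval_shatters([1.0, 2.0]))       # True: all 4 labelings achievable
print(interval_shatters([1.0, 2.0, 3.0]))  # False: labeling (1, 0, 1) is impossible for an interval
```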
Let's look at concrete examples. Linear classifiers in 2D have a VC dimension of 3: they can shatter any set of 3 points in general position (that is, not collinear), but no set of 4 points. For instance, 4 points arranged in a square cannot be shattered, because the XOR labeling, in which opposite corners share a label, is not linearly separable. In general, linear classifiers in d dimensions have VC dimension d + 1.
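The sketch below checks these claims numerically. It is an assumption of this example (not part of the text above) that linear separability is tested by solving a small feasibility linear program with SciPy, looking for weights w and bias b with y_i (w · x_i + b) >= 1 for every point.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Check whether a +1/-1 labeling of 2D points is linearly separable.

    We search for w, b satisfying y_i * (w . x_i + b) >= 1 for all i,
    posed as a zero-objective linear program (a pure feasibility check).
    """
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Variables are [w1, w2, b]; each constraint is -y_i * (w . x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(points):
    """True if every +1/-1 labeling of the points is linearly separable."""
    return all(linearly_separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

triangle = [(0, 0), (1, 0), (0, 1)]        # 3 points in general position
square = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 4 points arranged in a square

print(shattered(triangle))  # True: all 8 labelings are linearly separable
print(shattered(square))    # False: the XOR labeling of opposite corners fails
```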
VC theory provides theoretical guarantees about generalization. The generalization bound states that, with high probability over the draw of the training set, the true error of every hypothesis in the class is at most its training error plus a complexity term that grows with the VC dimension and shrinks with the sample size. As the amount of training data grows, the bound tightens.
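One commonly quoted form of this bound (several variants exist with slightly different constants; the exact form here is an assumption, not taken from the text above) says that for a hypothesis class of VC dimension d, with probability at least \(1 - \delta\) over a sample of size n, every hypothesis h satisfies:

$$
R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
$$

where \(R(h)\) is the true error and \(\hat{R}(h)\) is the training error. The square-root term is the complexity penalty: it grows with d and shrinks roughly like \(\sqrt{d/n}\).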
VC dimension has many practical applications in machine learning. It informs model selection, complexity control, and estimates of sample complexity, that is, how much data is needed to learn well. The key insight is that a higher VC dimension means more model capacity but also more training data needed to avoid overfitting; the goal is to balance model complexity against the amount of available data.
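As a rough illustration of that tradeoff, the short sketch below plugs a few values of d and n into the complexity term of the bound quoted above (again an assumed form, chosen for illustration): the term grows with the VC dimension and shrinks as the sample size increases.

```python
import math

def vc_complexity_term(d, n, delta=0.05):
    """Complexity term of the (assumed) VC bound: sqrt((d*(ln(2n/d) + 1) + ln(4/delta)) / n)."""
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)

# Higher VC dimension needs more samples for a comparably tight bound.
for d in (3, 10, 100):
    for n in (100, 1_000, 10_000):
        print(f"d={d:>3}, n={n:>6}: complexity term = {vc_complexity_term(d, n):.3f}")
```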