An Interesting Thing about Dot Products
The dot product is useful computationally when you’re doing things like multiplying weights by node values and then summing them, which we constantly do when building artificial neural networks. Math libraries compute dot products efficiently, so expressing those weighted sums as dot products makes for efficient algorithms. But there’s more to it than this. It’s important to understand what the dot product means conceptually, particularly in the context of machine learning.
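As a concrete sketch of that weighted sum (assuming NumPy; the weights and activations below are made-up numbers):

```python
import numpy as np

# Hypothetical weights and incoming activations for a single node.
weights = np.array([0.5, -1.0, 2.0])
activations = np.array([2.0, 3.0, 0.25])

# Multiplying element by element and then summing...
weighted_sum = sum(w * x for w, x in zip(weights, activations))

# ...is exactly what the library dot product computes.
assert np.isclose(weighted_sum, np.dot(weights, activations))
print(np.dot(weights, activations))  # -1.5
```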
The dot product of two vectors $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ is defined as:

$$a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$
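For example, plugging in arbitrarily chosen numbers:

$$(1, 3, -2) \cdot (4, 0, 5) = (1)(4) + (3)(0) + (-2)(5) = 4 + 0 - 10 = -6$$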
It’s common in machine learning to represent vectors as column vectors. In this case you get essentially the same result as the vector dot product above by multiplying the transpose of one column vector by the other. The only difference is that the matrix multiplication yields a 1 x 1 matrix rather than a scalar value. The coding difference is trivial, but the conceptual difference can be important. You might want to think differently, for example, when you’re contemplating node activations and weights, compared to when you’re contemplating input vector comparisons.
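A minimal sketch of that coding difference, assuming NumPy (the column vectors are arbitrary):

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])  # column vector, shape (3, 1)
b = np.array([[4.0], [5.0], [6.0]])  # column vector, shape (3, 1)

# Transpose of one column vector times the other:
# a (1 x 3) matrix times a (3 x 1) matrix gives a (1 x 1) matrix.
product_matrix = a.T @ b
print(product_matrix.shape)  # (1, 1)
print(product_matrix[0, 0])  # 32.0

# The plain vector dot product gives the same number as a scalar.
print(np.dot(a.ravel(), b.ravel()))  # 32.0
```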
There are at least five different ways to think about all of this. Assuming that we’re talking about artificial neural networks, you could think about it in terms of:

- the network itself, with its nodes, weights, and inputs;
- software: arrays of numbers that you’re multiplying and adding;
- matrix multiplication and linear transformations;
- non-matrix vector dot products;
- the vectors geometrically.

It’s all the same thing, just different ways of thinking about it (pentality?).
The dot product of two vectors can be thought of as the projection of vector a onto vector b, where the magnitude of b is multiplied by the magnitude of the projection. Expressing it in these terms:

$$a \cdot b = |a| \, |b| \cos \theta$$

where θ is the angle between a and b.
It is tempting, but wrong, to think that finding the dot product by projecting a onto b yields a different result than projecting b onto a. The result is the same either way (the dot product is commutative). This can be intuited by thinking of a as having magnitude 1 and b as having magnitude 2: either way you end up multiplying 1 by 2 by cos θ.
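Here is a small numerical check of both points, assuming NumPy (the two vectors are arbitrary). The angle is computed from the vectors’ directions rather than from the dot product, so the comparison is not circular:

```python
import numpy as np

a = np.array([3.0, 4.0])   # magnitude 5
b = np.array([2.0, -1.0])  # magnitude sqrt(5)

# Angle between the vectors, found from their directions.
theta = np.arctan2(a[1], a[0]) - np.arctan2(b[1], b[0])

# Project a onto b (a signed length), then multiply by |b|...
proj_a_on_b = np.linalg.norm(a) * np.cos(theta)
result_1 = proj_a_on_b * np.linalg.norm(b)

# ...or project b onto a, then multiply by |a|.
proj_b_on_a = np.linalg.norm(b) * np.cos(theta)
result_2 = proj_b_on_a * np.linalg.norm(a)

# Both agree with each other and with the multiply-and-sum dot product.
assert np.isclose(result_1, result_2)
assert np.isclose(result_1, np.dot(a, b))
```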
Another surprising thing is that the projection approach gives the same result as the original summing and multiplying method. One doesn’t seem to have much to do with the other. The duality exists because multiplying, say, a 1 x 2 matrix (i.e. a linear transformation) by a 2D vector is the same as turning the matrix on its side and taking the dot product.
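A minimal sketch of that duality, assuming NumPy (the matrix and vector values are arbitrary):

```python
import numpy as np

# A 1 x 2 matrix, i.e. a linear transformation from 2D down to 1D.
M = np.array([[3.0, -2.0]])

# A 2D column vector.
v = np.array([[4.0], [5.0]])

# Applying the transformation gives a 1 x 1 result...
print(M @ v)  # [[2.]]

# ...which matches "turning the matrix on its side" and taking the dot product.
print(np.dot(M.ravel(), v.ravel()))  # 2.0
```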
Regardless of how you think about the geometry of the dot product:
- The result will be 0 if the vectors are at right angles to each other.
- The result will be positive if they are pointing in broadly the same direction (the angle between them is less than 90°).
- The result will be negative if they are pointing in broadly opposite directions (the angle between them is more than 90°); a short sketch after this list checks all three cases.
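A quick numerical check of the three cases, assuming NumPy (the example vectors are made up):

```python
import numpy as np

a = np.array([1.0, 0.0])

print(np.dot(a, np.array([0.0, 1.0])))   # 0.0  -> at right angles
print(np.dot(a, np.array([2.0, 0.5])))   # 2.0  -> broadly the same direction
print(np.dot(a, np.array([-2.0, 0.5])))  # -2.0 -> broadly opposite directions
```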
If the vectors are unit vectors, the dot product is the cosine of the angle between them (and if they merely have equal magnitudes, it is proportional to that cosine). If they are pointing in the same direction, the dot product is the product of their magnitudes. And whether or not either of those special cases holds, the dot product combines alignment and magnitude into a single number: the overall sameness of the vectors.
More generally, the dot product divided by the product of the magnitudes (this is cos θ, often called the cosine similarity) represents the directional sameness of the two vectors, independent of their lengths. And this is sometimes an interesting way to think about it.
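A minimal sketch of that normalized measure, assuming NumPy (the test vectors are made up):

```python
import numpy as np

def sameness(a, b):
    """Dot product divided by the product of the magnitudes (cosine similarity)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(sameness(np.array([3.0, 4.0]), np.array([6.0, 8.0])))    # 1.0  (same direction)
print(sameness(np.array([1.0, 0.0]), np.array([0.0, 3.0])))    # 0.0  (at right angles)
print(sameness(np.array([3.0, 4.0]), np.array([-3.0, -4.0])))  # -1.0 (opposite directions)
```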