Why is the second Principal Component orthogonal (perpendicular) to the first one?

Because the second principal component should capture the most variance remaining after the first principal component has explained as much of the data as it can. (The first principal component captures the largest share of the variability in the data.)
But why does the orthogonal direction capture the most variation?
If a candidate second direction is not orthogonal to the first, it has a component pointing along PC1, so part of the variance it captures is variance that PC1 has already explained. Requiring the second direction to be orthogonal to the first removes that overlap: an orthogonal direction can only pick up variance that the first direction has not already accounted for.
The direction that captures the highest variation in the data is the first principal component (PC1). When finding the second principal component (PC2), the algorithm looks for the direction that captures the most variance but is orthogonal (perpendicular) to PC1.
This is because the goal of PCA is to capture as much variation as possible with as few principal components as possible. By looking for orthogonal directions, each subsequent PC captures additional unique variation in the data.
Additionally, orthogonality between the principal components makes the results easier to interpret: each principal component captures a distinct, non-overlapping source of variation in the data, so we can see more clearly how the variables are related to each other and uncover the underlying patterns in the data.
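To see this numerically, here is a small, self-contained sketch (separate from the plotting code below, and using hypothetical toy data) that computes the principal directions of 2-D data via the SVD of its covariance matrix and checks two things: the two directions are orthogonal, and the variances they capture add up to the total variance, so PC2 really does pick up only what PC1 left over.

import numpy as np

# Toy 2-D data with correlated features (hypothetical example)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.7], [0.0, 0.5]])

# Center the data and compute its covariance matrix
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / len(Xc)

# The principal directions are the columns of U (eigenvectors of Sigma)
U, S, _ = np.linalg.svd(Sigma)
pc1, pc2 = U[:, 0], U[:, 1]

# 1) The two directions are orthogonal
print(np.dot(pc1, pc2))                    # ~0

# 2) Variance along each direction; together they account for all variance
var_pc1 = np.var(Xc @ pc1)
var_pc2 = np.var(Xc @ pc2)
print(var_pc1, var_pc2)                    # var_pc1 >= var_pc2
print(var_pc1 + var_pc2, np.trace(Sigma))  # equal: no overlap, nothing missed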
Here is the Python code that produces the visualization above.
import numpy as np
import matplotlib.pyplot as plt

# Before running PCA, it is important to first normalize X
X_norm, mu, sigma = featureNormalize(X)

# Run PCA (assuming pca returns the eigenvectors of the covariance matrix
# as the columns of U, with S holding the corresponding singular values)
U, S, V = pca(X_norm)

plt.figure(figsize=(7, 5))
plt.scatter(X[:, 0], X[:, 1], s=30, facecolors='none', edgecolors='b')
plt.title("PCA - Eigenvectors Shown", fontsize=20)
plt.xlabel('x1', fontsize=16)
plt.ylabel('x2', fontsize=16)
plt.grid(True)

# Draw each principal component as a line starting at the data mean,
# scaled by its singular value; the directions are the columns of U
plt.plot([mu[0], mu[0] + 1.5 * S[0] * U[0, 0]],
         [mu[1], mu[1] + 1.5 * S[0] * U[1, 0]],
         color='red', linewidth=3,
         label='First Principal Component')
plt.plot([mu[0], mu[0] + 1.5 * S[1] * U[0, 1]],
         [mu[1], mu[1] + 1.5 * S[1] * U[1, 1]],
         color='green', linewidth=3,
         label='Second Principal Component')

plt.legend(loc=4)  # lower-right corner
plt.show(block=False)
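The snippet relies on two helper functions, featureNormalize and pca, that are defined elsewhere. A minimal sketch of what they are assumed to do (a plain SVD of the covariance matrix of the normalized data), for completeness:

import numpy as np

def featureNormalize(X):
    # Scale each feature to zero mean and unit standard deviation
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def pca(X_norm):
    # SVD of the covariance matrix; the columns of U are the principal
    # directions, S the corresponding singular values
    Sigma = X_norm.T @ X_norm / X_norm.shape[0]
    return np.linalg.svd(Sigma)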