Orthogonality

1.1. The Scalar Product in $\mathbb{R}^n$

1.1.1. Scalar Product

  • Definition: For vectors x and y in $\mathbb{R}^n$, the 1×1 matrix $x^T y$ (also treated as a scalar) is called the scalar product of x and y, and is given by \(x^T y= x_1 y_1+ x_2 y_2+…+ x_n y_n\)
  • Other names: inner product ⟨x,y⟩, dot product x⋅y

1.1.2. Length

  • Definition: For a vector x in $\mathbb{R}^n$, the ==Euclidean length== (norm, magnitude) ‖x‖ of x is defined to be \(‖x‖=\sqrt{x^T x}=\sqrt{ x_1^2+ x_2^2+…+ x_n^2}\)
  • A vector u in $\mathbb{R}^n$ is a unit vector if and only if ‖u‖=1.
  • If x is a nonzero vector, then $x/‖x‖$ is a unit vector in the direction of x.
  • The distance between two vectors x and y is \(‖x−y‖=\sqrt{(x_1−y_1)^2+(x_2−y_2)^2+…+(x_n−y_n)^2}\)

1.1.3. Angle

  • Definition: Let 𝜃 (0≤𝜃≤𝜋) be the angle between two nonzero vectors x and y. Then $x^T y=‖x‖‖y‖\cos𝜃$, i.e. if u=x/‖x‖ and 𝐯=y/‖y‖, then
  • \[\cos\theta=\frac{x^T y}{‖x‖‖y‖}=u^T 𝐯\]
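A minimal NumPy sketch of these definitions (the vectors x and y below are arbitrary example data, not taken from the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = x @ y                                   # scalar product x^T y
norm_x = np.sqrt(x @ x)                       # ‖x‖ = sqrt(x^T x), same as np.linalg.norm(x)
dist = np.linalg.norm(x - y)                  # distance ‖x − y‖
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # angle θ in [0, π]

print(dot, norm_x, dist, theta)
```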

Orthogonal

  • Definition: If $x^T y=0$, then we say that x and y are orthogonal, denoted $x{\bot}y$.

1.1.4. Projection

Projecting a vector onto a line

  • scalar projection of x onto y: \(\alpha=\frac{x^T y}{‖y‖}\)
  • vector projection of x onto y: \(p=\alpha\frac{y}{‖y‖}=\frac{x^T y}{y^T y}\,y\)
  • If u is a unit vector, then
    • The scalar projection of x onto u is \(\alpha=x^T u.\)
    • The vector projection of x onto u is \(p=\alpha u=(x^T u) u.\)

Projection Matrix: \(P=A(A^TA)^{-1}A^T\)
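A NumPy sanity check with arbitrary example vectors; for a single direction y, the matrix $P=A(A^TA)^{-1}A^T$ with $A=[y]$ reduces to $yy^T/(y^Ty)$:

```python
import numpy as np

x = np.array([3.0, 4.0, 0.0])
y = np.array([1.0, 1.0, 1.0])

alpha = (x @ y) / np.linalg.norm(y)      # scalar projection of x onto y
p = ((x @ y) / (y @ y)) * y              # vector projection of x onto y

# Projection matrix onto the line Span(y): P = A(A^T A)^{-1} A^T with A = [y]
A = y.reshape(-1, 1)
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ x, p))             # True: P x is the vector projection
```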

1.2. Orthogonal Subspaces

1.2.1. Orthogonal Subspaces

  • Definition: Two subspaces $X$ and $Y$ of $\mathbb{R}^n$ are said to be orthogonal if $x^T y=0$ for every $x∈X$ and every $y∈Y$. If $X$ and $Y$ are orthogonal, we write $X{\bot}Y$. Note: $X$ and $Y$ are subspaces of the same vector space $\mathbb{R}^n$.
  • In $\mathbb{R}^2$: \(Span(𝒆_1){\bot}Span(𝒆_2)\)
  • In $\mathbb{R}^3$: \(Span(𝒆_1){\bot}Span(𝒆_2)\), \(Span(𝒆_1,𝒆_2){\bot}Span(𝒆_3)\)

1.2.2. Orthogonal Complement

  • Definition: For a subspace $Y$ of $\mathbb{R}^n$, the set of vectors in $\mathbb{R}^n$ that are orthogonal to every vector in $Y$ is denoted $Y^{\bot}$. Thus, $Y^{\bot}$={$x∈\mathbb{R}^n \mid x^T y=0$ for every $y∈Y$}
  • The set $Y^{\bot}$ is called the orthogonal complement of $Y$
  • $Y^{\bot}$ is the “biggest” subspace orthogonal to $Y$.

1.2.3. Theorem 3.2.1

If $X$ and $Y$ are orthogonal subspaces of $\mathbb{R}^n$, then $X∩Y=\{𝟎\}$

1.2.4. Theorem 3.2.2

If $Y$ is a subspace of $\mathbb{R}^n$, then $Y^{\bot}$ is also a subspace of $\mathbb{R}^n$.

1.2.5. Fundamental subspaces

  • Let A be an m×n matrix.
    • As a linear transformation, $A: \mathbb{R}^n\rightarrow \mathbb{R}^m$. The null space N(A)={$x∈\mathbb{R}^n$ | $Ax=𝟎$} is a subspace of $\mathbb{R}^n$. The range of A equals the column space of A, a subspace of $\mathbb{R}^m$: R(A)={$b∈\mathbb{R}^m$ | $b=Ax$ for some $x∈\mathbb{R}^n$}
    • As a linear transformation, $A^T:\mathbb{R}^m\rightarrow \mathbb{R}^n$. The null space N($A^T$)={$x∈\mathbb{R}^m$ | $A^T x=𝟎$} is a subspace of $\mathbb{R}^m$. The column space of $A^T$, $R(A^T)$={$y∈\mathbb{R}^n$ | $y=A^T x$ for some $x∈\mathbb{R}^m$}, is a subspace of ==$\mathbb{R}^n$==

The column space of $A^T$ is essentially the row space of A, except that the row vectors “stand up” and become column vectors in $\mathbb{R}^n$. Thus, $y∈R(A^T)$ if and only if $y^T$ is in the row space of A.

  • Subspaces of $\mathbb{R}^m$: $N(A^T)$, $R(A)$
  • Subspaces of $\mathbb{R}^n$: $N(A)$, $R(A^T)$

1.2.6. Theorem 3.2.3

If A is an m×n matrix, then

  • $N(A)=R(A^T)^{\bot}$
  • $N(A^T)=R(A)^{\bot}$ ^[Each equation $a_1 x_1+a_2 x_2+…=0$ in $Ax=𝟎$ says $\left[ \begin{matrix} a_1 \\ a_2 \\ \vdots \end{matrix} \right] {\bot} \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \end{matrix} \right]$, i.e. x is orthogonal to every row of A; applying the same argument to $A^T$ gives the second identity.]
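A small NumPy check of these identities (and of the dimension count in the next theorem), using an arbitrary rank-1 example matrix; bases for N(A) and R(A^T) are read off from the SVD:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])              # rank 1, so dim R(A^T) = 1 and dim N(A) = 2

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
row_basis = Vt[:rank].T                      # columns span R(A^T), the row space of A
null_basis = Vt[rank:].T                     # columns span N(A)

print(np.allclose(row_basis.T @ null_basis, 0.0))    # N(A) ⟂ R(A^T)
print(rank + null_basis.shape[1] == A.shape[1])      # dim R(A^T) + dim N(A) = n
```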

1.2.7. Theorem 3.2.4

If S is a subspace of $\mathbb{R}^n$, then dim $S$+ dim $S^{\bot}$=n. Furthermore, if {$x_1,x_2,…,x_𝑟$} is a basis for S and {$x_{𝑟+1}, …,x_n$} is a basis for $S^{\bot}$, then {$x_1,x_2,…,x_𝑟,x_{𝑟+1}, …,x_n$} is a basis for $\mathbb{R}^n$.

1.2.8. Direct Sum

  • Definition: If U and 𝑉 are subspaces of a vector space 𝑊 and each 𝐰∈𝑊 can be written uniquely as a sum u + 𝐯, where u∈U and 𝐯∈𝑉, then we say that 𝑊 is a direct sum of U and 𝑉, and we write \(𝑊=U{\oplus} 𝑉\)

1.2.9. Direct Sum Theorem 3.2.5

If S is a subspace of $\mathbb{R}^n$, then $\mathbb{R}^n=S{\oplus} S^{\bot}$.

1.2.10. Theorem 3.2.6

If S is a subspace of $\mathbb{R}^n$, then $(S^{\bot})^{\bot}=S$


For a subspace S of $\mathbb{R}^n$:

  • \(S∩S^{\bot}=\{𝟎\}\)
  • \(\dim(S)+\dim(S^{\bot})=n\)
  • \(\mathbb{R}^n=S{\oplus} S^{\bot}\)
  • \((S^{\bot})^{\bot}=S\)

For an m×n matrix A:

  • Subspaces of $\mathbb{R}^n$: $N(A)$, $R(A^T)$. ^[\(A^T\rightarrow R(A^T)\), \(X^T\rightarrow N(A)\)]
  • \(N(A)=R(A^T)^{\bot}\), \(\mathbb{R}^n=N(A){\oplus} R(A^T)\)
  • Subspaces of $\mathbb{R}^m$: $R(A)$, $N(A^T)$. ^[\(A\rightarrow R(A)\), \(X\rightarrow N(A^T)\)]
  • \(N(A^T)=R(A)^{\bot}\), \(\mathbb{R}^m=R(A){\oplus} N(A^T)\)

1.3. Least Squares Problems

1.3.1. Line of best fit

  • Given data points $(x_1,b_1),(x_2,b_2),(x_3,b_3)$, fit a line $y=c_0+c_1 x$ by solving (in the least squares sense)
\[\left[ \begin{matrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \end{matrix}\right] \left[\begin{matrix} c_0 \\ c_1 \end{matrix}\right] =\left[\begin{matrix} b_1 \\ b_2 \\ b_3 \end{matrix}\right]\]
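A short NumPy sketch of fitting such a line; the data points are made-up illustrative values, and np.linalg.lstsq computes the least squares solution discussed in the next subsection:

```python
import numpy as np

# Made-up data points (x_i, b_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.9])

A = np.column_stack([np.ones_like(x), x])    # rows [1, x_i] for the model b ≈ c0 + c1·x
c, *_ = np.linalg.lstsq(A, b, rcond=None)    # least squares solution [c0, c1]
print(c)
```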

1.3.2. Least Square Solutions

  • For an m×n matrix A, if the linear system Ax=b does not have a solution $x∈\mathbb{R}^n$, we still want a vector $\hat{x}\in\mathbb{R}^n$ such that $A\hat{x}$ is closest to b.
  • If $r(x)=b−Ax$, we seek to minimize ‖r(x)‖, which is the same as minimizing ‖r(x)‖$^2$.
  • A vector $x\in\mathbb{R}^n$ that minimizes ‖r(x)‖$^2$=‖b−Ax‖$^2$ is called a least squares solution of $Ax=b$.
  • $p=A\hat{x}$ is the vector in S=R(A), the column space of A, that is closest to b.

  • $b−p=b−A\hat{x}=r(\hat{x})∈R(A)^{\bot}$, and \(R(A)^{\bot}=N(A^T)\)
  • Therefore, $b−A\hat{x}∈N(A^T)$, so $A^T(b−A\hat{x})=A^T b−A^T A\hat{x}=𝟎$

Summary

  • To solve the least squares problem \(Ax=b\)
  • We need to solve \(A^T Ax=A^T b\)
  • Least squares solution x minimizes ‖𝑟(x)‖$^2$=‖b−Ax‖$^2$ and b−Ax is orthogonal to R(A).

1.3.3. Theorem 3.3.1

If A is an m×n matrix of rank n, the normal equations \(A^T Ax=A^T b\) have a unique solution \(\hat{x}=(A^T A)^{−1} A^T b\), and $\hat{x}$ is the unique least squares solution of $Ax=b$.

1.3.4. Projection formula

If A is an m×n matrix of rank n (i.e. columns are independent),

  • The unique least squares solution of Ax=b is \(\hat{x}=(A^T A)^{−1} A^T b\)
  • The projection of b onto R(A) is \(p=A\hat{x}=A(A^T A)^{−1} A^T b\)
  • Projection is a linear operator, with projection matrix \(𝑃=A(A^T A)^{−1} A^T\)
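A NumPy sketch of these three formulas, with an arbitrary full-column-rank A and right-hand side b; the checks confirm that Pb equals A x̂ and that the residual is orthogonal to R(A):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                    # rank 2: columns are independent
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)     # x̂ = (A^T A)^{-1} A^T b via the normal equations
p = A @ x_hat                                 # projection of b onto R(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T          # projection matrix

print(np.allclose(P @ b, p))                  # P b = A x̂
print(np.allclose(A.T @ (b - p), 0.0))        # b − p ∈ R(A)^⊥ = N(A^T)
```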

1.4. Orthonormal Sets

1.4.1. Note on Inner Product Spaces

For any vector space 𝑉, there is an abstract definition of an inner product ⟨x,y⟩: it must satisfy

  • ⟨x,x⟩≥0, with equality if and only if x=𝟎
  • ⟨x,y⟩=⟨y,x⟩
  • ⟨𝛼x+𝛽y,𝐳⟩=𝛼⟨x,𝐳⟩+𝛽⟨y,𝐳⟩ for all scalars 𝛼, 𝛽

  • In $\mathbb{R}^n$, ⟨x,y⟩ means the scalar product $x^T y$.

1.4.2. Orthogonal Set

  • Definition: A set {$𝐯_1,𝐯_2, …,𝐯_k$} of nonzero vectors is said to be an orthogonal set if ⟨$𝐯_i,𝐯_𝑗$⟩=0 whenever i≠𝑗.

1.4.3. Orthonormal Set

  • Definition: An orthonormal set of vectors is an orthogonal set of unit vectors. You can turn an orthogonal set into an orthonormal set by scaling each vector to unit length: $u_i=𝐯_i/‖𝐯_i‖$

1.4.4. Theorem [Nonzero orthogonal set is linearly independent] 3.5.1

If {$𝐯_1,𝐯_2, …,𝐯_k$} is an orthogonal set of nonzero vectors, then $𝐯_1,𝐯_2, …,𝐯_k$ are linearly independent. ^[If $a_1 𝐯_1+a_2 𝐯_2+…+a_k 𝐯_k=𝟎$, we need to show $a_1=a_2=…=a_k=0$. Taking the inner product with $𝐯_1$ on both sides: $⟨𝐯_1,a_1 𝐯_1+a_2 𝐯_2+…+a_k 𝐯_k⟩=a_1⟨𝐯_1,𝐯_1⟩+a_2⟨𝐯_1,𝐯_2⟩+…+a_k⟨𝐯_1,𝐯_k⟩=a_1⟨𝐯_1,𝐯_1⟩=0$, but since $𝐯_1≠𝟎$, $⟨𝐯_1,𝐯_1⟩=‖𝐯_1‖^2>0$, so $a_1=0$. The same argument with $𝐯_i$ gives $a_i=0$ for each i.]
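A quick NumPy illustration of the theorem with an arbitrary orthogonal set in $\mathbb{R}^3$: pairwise inner products vanish, and the matrix with these columns has full rank, i.e. the vectors are linearly independent:

```python
import numpy as np

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v3 = np.array([0.0, 0.0, 2.0])
V = np.column_stack([v1, v2, v3])

G = V.T @ V                                   # Gram matrix of inner products ⟨v_i, v_j⟩
print(np.allclose(G, np.diag(np.diag(G))))    # off-diagonal entries are 0: orthogonal set
print(np.linalg.matrix_rank(V) == 3)          # full rank: the vectors are independent
```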

1.4.5. Orthonormal Basis

If 𝐵={$u_1,u_2, …,u_k$} is an orthonormal set, then by the theorem above, 𝐵 is linearly independent, so 𝐵 is a basis for the subspace S=Span($u_1,u_2, …,u_k$)=R(U), where U is the matrix with columns $u_1, …,u_k$. We say that 𝐵 is an orthonormal basis for S.

Orthonormal Basis and Coordinates

Theorem 3.5.2

Let {$u_1,u_2, …,u_n$} be an orthonormal basis for 𝑉. If $x=c_1 u_1+…+c_n u_n$, then $c_i=⟨x,u_i⟩$. In other words, for any x∈𝑉, \(x=⟨x,u_1⟩u_1+…+⟨x,u_n⟩u_n\)
Furthermore, if \(x=a_1 u_1+a_2 u_2+…+a_n u_n\) and \(y=b_1 u_1+b_2 u_2+…+b_n u_n\), then ^[Proof: $⟨x,u_i⟩=a_i ⇒ ⟨x,y⟩=⟨x,\sum b_i u_i⟩=\sum b_i⟨x,u_i⟩=\sum a_i b_i$.] \(⟨x,y⟩=\sum_{i=1}^n a_i b_i.\)
In other words, $⟨x,y⟩=[x]_𝐵^T [y]_𝐵$, where $[x]_𝐵$ is the coordinate vector of x with respect to 𝐵.

  • Parseval’s formula: If $x=a_1 u_1+a_2 u_2+…+a_n u_n$, then ‖x‖$^2=\sum_{i=1}^n a_i^2$. In other words, ‖x‖$^2=[x]_𝐵^T [x]_𝐵$
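A NumPy check of Theorem 3.5.2 and Parseval’s formula, using an arbitrary orthonormal basis of $\mathbb{R}^2$ obtained by rotating the standard basis:

```python
import numpy as np

t = 0.3
u1 = np.array([np.cos(t), np.sin(t)])
u2 = np.array([-np.sin(t), np.cos(t)])        # {u1, u2}: orthonormal basis of R^2
U = np.column_stack([u1, u2])

x = np.array([2.0, -1.0])
c = U.T @ x                                   # coordinates c_i = ⟨x, u_i⟩
print(np.allclose(U @ c, x))                  # x = c_1 u_1 + c_2 u_2
print(np.isclose(c @ c, x @ x))               # Parseval: ‖x‖^2 = Σ c_i^2
```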

Orthonormal Sets and Projection

  • Recall: the unique least squares solution of Ax=b is \(\hat{x}=(A^T A)^{−1} A^T b\)
  • The projection of b onto R(A) is \(p=A\hat{x}=A(A^T A)^{−1} A^T b\)
  • Projection is a linear operator, with projection matrix \(P=A(A^T A)^{−1} A^T\)
Theorem 3.5.3

If the column vectors of an m×n matrix U form an orthonormal set of vectors in $\mathbb{R}^m$, then the projection of $b∈\mathbb{R}^m$ onto S=R(U) is \(p=U\hat{x}=UU^T b\), with projection matrix \(P=UU^T\)

Theorem [Projection formula] 3.5.4

If {$u_1,u_2, …,u_k$} is an orthonormal basis for a nonzero subspace S of $\mathbb{R}^m$, then the projection of $b∈\mathbb{R}^m$ onto S is \(p=(b^T u_1 ) u_1+…+(b^T u_k ) u_k\)
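A NumPy sketch of Theorems 3.5.3 and 3.5.4, with a made-up orthonormal basis {u_1, u_2} of a plane S in $\mathbb{R}^3$: the sum formula and $UU^T b$ give the same projection:

```python
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)   # {u1, u2}: orthonormal basis for a plane S
U = np.column_stack([u1, u2])
b = np.array([3.0, 4.0, 0.0])

p_sum = (b @ u1) * u1 + (b @ u2) * u2         # p = (b^T u1) u1 + (b^T u2) u2
p_mat = U @ (U.T @ b)                         # p = U U^T b
print(np.allclose(p_sum, p_mat))
```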


1.4.6. Orthogonal matrix

  • Definition: An n×n matrix 𝑄 is called an orthogonal matrix if the columns of 𝑄 form an orthonormal basis in $\mathbb{R}^n$.

1.4.7. Theorem 3.5.5

An n×n matrix 𝑄 is an orthogonal matrix if and only if \(𝑄^T 𝑄=𝐼_n\)

  • Properties Let 𝑄 be an n×n matrix. The following are equivalent:
    • 𝑄 is an orthogonal matrix.
    • The columns of 𝑄 form an orthonormal basis for $\mathbb{R}^n$.
    • ==$𝑄^T 𝑄=𝐼$, that is, $𝑄^T=𝑄^{−1}$==.
    • $(𝑄x)^T (𝑄y)=x^T 𝑄^T 𝑄y=x^T y$ for all $x,y∈\mathbb{R}^n$.
    • ‖𝑄x‖=‖x‖ for all $x∈\mathbb{R}^n$.
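These equivalences can be verified numerically; a minimal NumPy sketch using a 2×2 rotation matrix (a standard example of an orthogonal matrix) and arbitrary test vectors:

```python
import numpy as np

t = 0.7
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])       # rotation matrices are orthogonal

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

print(np.allclose(Q.T @ Q, np.eye(2)))                        # Q^T Q = I, so Q^{-1} = Q^T
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                   # inner products preserved
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # lengths preserved
```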

Orthonormal Sets and Least Squares

  • Least squares: \(A^T A\hat{x}=A^T b ⇒ \hat{x}=(A^T A)^{−1} A^T b\)
Theorem 3.5.4

If the column vectors of an m×n matrix A form an orthonormal set of vectors in $\mathbb{R}^m$, then $A^T A=I_n$ and the solution to the least squares problem Ax=b is \(\hat{x}=A^T b\)
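A NumPy check of this theorem with a made-up matrix A whose columns are orthonormal: $\hat{x}=A^T b$ matches the general least squares solver:

```python
import numpy as np

A = np.column_stack([np.array([1.0, 0.0, 0.0]),
                     np.array([0.0, 1.0, 1.0]) / np.sqrt(2)])   # orthonormal columns
b = np.array([1.0, 2.0, 4.0])

print(np.allclose(A.T @ A, np.eye(2)))            # A^T A = I_2
x_hat = A.T @ b                                   # least squares solution, no inverse needed
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)     # general solver agrees
print(np.allclose(x_hat, x_ref))
```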

1.5. Summary

  • Orthogonal set: $⟨𝐯_i,𝐯_𝑗 ⟩=0$ when i≠𝑗
  • Vectors in an orthogonal set are linearly independent
  • Orthonormal set/basis: $⟨u_i,u_𝑗 ⟩$=𝛿$_{i𝑗}$
  • Orthogonal matrix: $U^T U=I$, $U^{−1}=U^T$
  • If {$u_1,u_2, …,u_n$} is an orthonormal basis for 𝑉:
    • $x=⟨x,u_1 ⟩ u_1+…+⟨x,u_n ⟩ u_n $
    • $⟨x,y⟩=\sum_i a_i 𝑏_i$, ‖x‖$^2=\sum_i a_i^2$
  • If {$u_1,u_2, …,u_k$} is an orthonormal basis for a subspace S=R(U):
    • Least squares solution to Ux=b is $\hat{x}=U^T b$
    • Projection of b onto S is $p=UU^T b=\sum_i (b^T u_i) u_i$
