INT 104

These notes start from week 3.

The module is taught entirely in English, so I won’t use much Chinese in these notes (only for key concepts or where an explanation really needs it).

Here is the brief overview of the module:










Lecture 3:

This lecture focuses on dimensionality reduction and two ways to achieve it: Principal Component Analysis (PCA) and Locally Linear Embedding (LLE).

Dimensionality Reduction(降维)

The drawbacks of high-dimensional data:

  1. High computational complexity
  2. Many features may be irrelevant or redundant
  3. Difficult to visualize
  4. High risk of overfitting

The approaches for Dimensionality Reduction:


In practice, data is usually not spread out uniformly across all dimensions. It often lies within (or close to) a much lower-dimensional subspace of the high-dimensional space.
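A minimal sketch of this idea, using made-up data rather than anything from the lecture: 200 points are generated on a 2-dimensional plane inside a 10-dimensional space, with a little noise added. The names `latent`, `basis` and the noise level 0.01 are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points that really live on a 2-D plane
# embedded in 10-D space, plus a small amount of noise.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
X = latent @ basis + 0.01 * rng.normal(size=(200, 10))

# Centre the data and look at how much variance each direction carries.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
variance_share = singular_values**2 / np.sum(singular_values**2)

print(np.round(variance_share, 4))
```

Almost all of the variance sits in the first two directions, which is why the remaining eight dimensions can be dropped with very little loss.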

Principal Component Analysis

PCA is one method for dimensionality reduction. For example, suppose we describe an object with two features, but our model can only process one of them. We then have to reduce the dimensionality of the data set. To keep information from both features, we look for a new coordinate system in which to measure the data, and that is what PCA does. We want the data to be as spread out as possible along the new axis (that is, to have the largest variance there), so we look for the coordinate system that makes the data most scattered. That is what the picture above shows.
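The two-features-to-one example can be sketched by brute force: try many candidate directions for the new axis and keep the one along which the projected data has the largest variance. This is only an illustration on hypothetical toy data (the covariance values below are made up), not the way PCA is computed in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two correlated features.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 2.0], [2.0, 2.0]], size=500)
Xc = X - X.mean(axis=0)  # centre the data first

# Candidate axis directions (unit vectors) between 0 and 180 degrees.
angles = np.linspace(0.0, np.pi, 1801)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

# Variance of the data after projecting onto each candidate axis.
projected_variance = np.var(Xc @ directions.T, axis=0, ddof=1)

best = directions[np.argmax(projected_variance)]
X_1d = Xc @ best  # the single new feature that replaces the two old ones

print("best axis:", best)
print("variance kept:", projected_variance.max(),
      "of", np.var(Xc, axis=0, ddof=1).sum())
```

The single column `X_1d` is what a model limited to one feature would receive.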

The concept below is comparatively challenging, so I will explain my understanding in both Chinese and English.


The search for the new coordinate system can be viewed as a stretching and rotation of the original coordinate system. From this we can derive the following equation (the derivation is somewhat complex and is not examined, so we won’t go through it in full here). In the equation, c1 is the unit vector that gives the direction of the first axis of the new coordinate system (c2 gives the second, as in the diagram above), and c1T is the transpose of c1. In the standard PCA formulation, S is the covariance matrix of the data, so c1T S c1 is the variance of the data along c1. Through linear algebra (which you can explore on your own if interested), the direction c1 that maximizes this variance is an eigenvector of the covariance matrix, namely the one with the largest eigenvalue. Hence, to determine the new coordinate system, we need to find the eigenvectors of the covariance matrix.


The picture above is the proof of this result. The central idea is to use the method of Lagrange multipliers to maximize the variance along the new axis, subject to the constraint that the axis vector has length 1. A basic understanding is sufficient; what matters most is being able to solve computational and multiple-choice questions.
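For reference, the usual textbook form of this argument can be written out as follows. It assumes S is the covariance matrix of the centred data and c_1 is the first new axis; the lecture's own notation may differ.

```latex
\begin{aligned}
&\max_{c_1}\; c_1^{T} S\, c_1 \quad \text{subject to} \quad c_1^{T} c_1 = 1 \\
&L(c_1, \lambda) = c_1^{T} S\, c_1 - \lambda \left( c_1^{T} c_1 - 1 \right) \\
&\frac{\partial L}{\partial c_1} = 2 S c_1 - 2 \lambda c_1 = 0
  \;\Longrightarrow\; S c_1 = \lambda c_1 \\
&c_1^{T} S\, c_1 = \lambda\, c_1^{T} c_1 = \lambda
\end{aligned}
```

The third line says c_1 must be an eigenvector of S, and the last line says the variance along c_1 equals its eigenvalue, so the best axis is the eigenvector with the largest eigenvalue.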


Given a dataset consisting of the following points:
A=(2, 3), B=(5, 5), C=(6, 6), D=(8, 9)
1. Calculate the covariance matrix for the dataset.
2. Calculate the eigenvalues and eigenvectors of the covariance matrix.

The calculation provided by the professor is given below:

I strongly recommend doing the calculation yourself to become familiar with the details (don’t be cocky; practice makes perfect). Here the standard deviation is deliberately set to 2.5 to simplify the calculation, and the data is normalized using z-score normalization.
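To check a hand calculation, the exercise can be reproduced with NumPy. This is a sketch, not the professor's worked solution: it assumes the sample covariance (dividing by n - 1), which is the convention that gives the standard deviation of 2.5 mentioned above, and it computes the eigen-decomposition on the z-score normalized data.

```python
import numpy as np

# The four points A, B, C, D from the exercise.
points = np.array([[2, 3], [5, 5], [6, 6], [8, 9]], dtype=float)

mean = points.mean(axis=0)            # [5.25, 5.75]
std = points.std(axis=0, ddof=1)      # [2.5, 2.5] with the n - 1 convention

# 1. Covariance matrix of the raw data (np.cov divides by n - 1 by default).
cov = np.cov(points, rowvar=False)

# z-score normalization, then the covariance matrix of the normalized data.
z = (points - mean) / std
cov_z = np.cov(z, rowvar=False)

# 2. Eigenvalues (ascending) and eigenvectors (as columns) of the symmetric matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov_z)

print("covariance (raw):\n", cov)
print("covariance (z-scored):\n", cov_z)
print("eigenvalues:", eigenvalues)
print("eigenvectors (columns):\n", eigenvectors)
```

The sign of each eigenvector is arbitrary, so a hand result of (1, 1)/√2 and a computed (-1, -1)/√2 describe the same axis.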

Singular Value Decomposition (SVD)

SVD揭示了这样的一个事实:对于任意的矩阵A,我们总能找到一组单位正交基, 使得A对其进行变换之后, 得到的向量组仍然是正交的.


SVD reveals the following fact: for any matrix A, we can always find an orthonormal basis such that, after A is applied to the basis vectors, the resulting set of vectors is still orthogonal.
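This fact can be checked numerically on a small example. The matrix `A` below is arbitrary and chosen only for illustration; `np.linalg.svd` returns the orthonormal basis as the rows of `Vt`.

```python
import numpy as np

# An arbitrary 3x2 matrix, used only as an example.
A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T                 # columns v_1, v_2 form an orthonormal basis

AV = A @ V               # columns are the transformed vectors A v_i
gram = AV.T @ AV         # off-diagonal entries are the dot products (A v_i)·(A v_j)

print("V^T V:\n", np.round(V.T @ V, 6))    # identity: the basis is orthonormal
print("(AV)^T (AV):\n", np.round(gram, 6))  # diagonal: the images stay orthogonal
```

The diagonal entries of `gram` are the squared singular values, which is the "stretching" part of the picture, while `U` and `V` account for the rotations.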

The idea behind the formulas above is akin to SVD: both involve stretching and rotating coordinate systems. The formulas provided by the school are as follows: (To be continued)

Lecture 4:

Bayes’ Rule








import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the coursework data and plot the pairwise correlations
data = pd.read_excel("CW_Data.xlsx")
correlation_matrix = data.drop(columns=['Index']).corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
plt.title('Correlation Heatmap')
plt.show()

from sklearn.manifold import TSNE

# Features and labels for the embedding
X = data[['Grade','Total','MCQ']]
y = data['Programme']
# Apply t-SNE
tsne = TSNE(n_components=2, random_state=0)
X_tsne = tsne.fit_transform(X)
# Visualize the result
plt.figure()
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
plt.xlabel('t-SNE feature 1')
plt.ylabel('t-SNE feature 2')
plt.title('t-SNE of CW_Data')
plt.show()

import pandas as pd

# Load the data, then extract the features and the target variable
data = pd.read_excel("CW_Data.xlsx")
X = data[['Total','Gender','MCQ','Q4','Q3','Q2','Q1','Q5','Grade']]
y = data['Programme']

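# Final PCA
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the data (z-score normalization)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce the dimensionality with PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Visualize the reduced data
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Visualization')
plt.show()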