📘 Session 5 – Supervised Learning: Classification

🎯 Learning Objectives

After this session, students will be able to:

  1. Understand the concept of classification in machine learning
  2. Understand the Logistic Regression algorithm
  3. Understand the concept of K-Nearest Neighbors (KNN)
  4. Train classification models
  5. Evaluate models using a confusion matrix and classification metrics

1️⃣ What Is Classification?

Classification is a supervised learning method used to predict a class or category: the model learns from labeled examples and then assigns each new input to one of a fixed set of classes.

Example cases:

| Case | Target |
|---|---|
| Spam detection | Spam / Not Spam |
| Disease diagnosis | Positive / Negative |
| Student graduation | Pass / Fail |
| Fraud detection | Fraud / Normal |

2️⃣ Example Dataset

We will use the Titanic dataset to predict whether a passenger survived.

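import pandas as pd
import seaborn as sns

# load the Titanic dataset bundled with seaborn (downloaded the first time it is used)
df = sns.load_dataset("titanic")
df.head()  # show the first 5 rows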

Target variable: survived

  • 0 = did not survive
  • 1 = survived

3️⃣ Data Preparation

Select the features to use:

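# keep the target column and four features
df = df[['survived','pclass','sex','age','fare']]
# drop rows with missing values (in this subset they come mainly from 'age');
# .copy() makes df an independent DataFrame so modifying 'sex' later does not trigger a SettingWithCopyWarning
df = df.dropna().copy()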

Encode the categorical column:

from sklearn.preprocessing import LabelEncoder

# convert 'sex' from text to numbers; classes are sorted alphabetically, so female -> 0, male -> 1
le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])
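To see which number each category receives, here is a minimal sketch on a small hypothetical list of values standing in for the 'sex' column (not the real Titanic data). LabelEncoder stores the classes in sorted order in `classes_`, and `inverse_transform` maps numbers back to the original labels.

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical sample values standing in for the 'sex' column
values = ["male", "female", "female", "male"]

le_toy = LabelEncoder()
encoded = le_toy.fit_transform(values)

print(le_toy.classes_.tolist())  # ['female', 'male'] (sorted alphabetically)
print(encoded.tolist())          # [1, 0, 0, 1]

# map numbers back to the original labels
print(le_toy.inverse_transform([0, 1]).tolist())  # ['female', 'male']
```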

4️⃣ Train Test Split

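from sklearn.model_selection import train_test_split

# features (X) and target (y)
X = df.drop("survived", axis=1)
y = df["survived"]

# 80% of rows for training, 20% for testing; random_state fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)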
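A minimal sketch on hypothetical toy data (20 samples, 10 per class) showing what `test_size=0.2` does: 16 rows go to training and 4 to testing. The `stratify` argument is an optional addition that the code above does not use; it keeps the class proportions the same in both parts. The `_toy` suffix keeps these variables from overwriting the Titanic ones.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical toy data: 20 samples with 2 features, 10 of each class
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 10 + [1] * 10)

X_train_toy, X_test_toy, y_train_toy, y_test_toy = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_train_toy.shape, X_test_toy.shape)  # (16, 2) (4, 2)
# with stratify, each split keeps the 50/50 class ratio
print(np.bincount(y_train_toy).tolist(), np.bincount(y_test_toy).tolist())  # [8, 8] [2, 2]
```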

5️⃣ Logistic Regression

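Logistic Regression is used for binary classification. It computes a linear combination of the features, z = wᵀx + b, and passes it through the sigmoid function to obtain the probability that the sample belongs to class 1:

$$P(y=1 \mid \mathbf{x}) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^\top \mathbf{x} + b$$

The predicted class is 1 when this probability is greater than 0.5 (equivalently, when z > 0).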
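The sigmoid formula can be checked numerically. A minimal sketch with NumPy; `sigmoid` is our own helper function, not a library call:

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, 0.0, 4.0])
p = sigmoid(z)
print(p.round(3))  # small, exactly 0.5, close to 1

# class 1 is predicted only where the probability exceeds 0.5 (z > 0)
print((p > 0.5).astype(int))
```

Large negative z gives a probability near 0, z = 0 gives exactly 0.5, and large positive z gives a probability near 1.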

Implementation:

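from sklearn.linear_model import LogisticRegression

# max_iter raised from the default 100, because the solver may not converge on unscaled features such as 'fare'
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# predicted class (0 or 1) for each test passenger
y_pred = model.predict(X_test)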
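To connect the fitted model back to the sigmoid formula, here is a minimal sketch on a hypothetical synthetic dataset from `make_classification` (not Titanic). In scikit-learn, `coef_` and `intercept_` hold w and b, `decision_function` returns z, and column 1 of `predict_proba` is P(y=1).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# hypothetical synthetic binary dataset (not Titanic)
X_toy, y_toy = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_toy, y_toy)

# learned weights w and intercept b of z = w·x + b
print(clf.coef_, clf.intercept_)

# P(y=1) from predict_proba equals sigmoid(z)
proba = clf.predict_proba(X_toy[:5])[:, 1]
z_toy = clf.decision_function(X_toy[:5])
print(np.allclose(proba, 1 / (1 + np.exp(-z_toy))))  # True

# predict() returns class 1 exactly where P(y=1) > 0.5
print((proba > 0.5).astype(int), clf.predict(X_toy[:5]))
```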

6️⃣ K-Nearest Neighbors (KNN)

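KNN classifies a new data point by finding its nearest neighbors in the training data.
Algorithm steps:

  1. Choose a value for K (the number of neighbors)
  2. Compute the distance from the new point to every training point
  3. Take the K nearest neighbors
  4. Assign the class held by the majority of those K neighbors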
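The four steps above can be sketched from scratch with NumPy. This is a minimal illustration on hypothetical 2-D toy data; `knn_predict` is our own helper, and it uses Euclidean distance, which is also the default in scikit-learn's KNeighborsClassifier (Minkowski with p=2).

```python
import numpy as np
from collections import Counter

def knn_predict(X_ref, y_ref, x_new, k=3):
    """Classify one point x_new with a plain K-nearest-neighbors majority vote."""
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_ref - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the labels of those neighbors
    votes = Counter(y_ref[nearest].tolist())
    return votes.most_common(1)[0][0]

# hypothetical toy data: class 0 near (0, 0), class 1 near (5, 5)
X_toy = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Step 1: choose K = 3
print(knn_predict(X_toy, y_toy, np.array([1, 1]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([5, 4]), k=3))  # 1
```

With two classes, an odd K such as 3 or 5 avoids ties in the vote.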

Implementation:

from sklearn.neighbors import KNeighborsClassifier

# K = 5 neighbors; the default distance is Euclidean (Minkowski with p=2)
knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(X_train, y_train)

y_pred_knn = knn.predict(X_test)
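Because KNN relies on distances, a feature with a large numeric range (such as 'fare' compared with 'pclass') can dominate the result. One common option, not part of the code above, is to standardize the features first with a scikit-learn Pipeline. A minimal sketch on hypothetical toy data where feature 0 separates the classes and feature 1 has a much larger scale:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# hypothetical toy data: feature 0 separates the classes (0 vs 1),
# feature 1 is on a much larger scale (roughly 100 to 300)
X_toy = np.array([[0, 100], [0, 300], [1, 110], [1, 290]], dtype=float)
y_toy = np.array([0, 0, 1, 1])
query = np.array([[0, 120]], dtype=float)

# without scaling, the large feature 1 dominates the Euclidean distance
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X_toy, y_toy)
print(raw_knn.predict(query))  # [1]

# StandardScaler rescales every feature to mean 0 and standard deviation 1
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
scaled_knn.fit(X_toy, y_toy)
print(scaled_knn.predict(query))  # [0]
```

With the Titanic data the same pattern would be `make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)`; the pipeline fits the scaler on the training set only and reuses it when predicting on the test set.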

7️⃣ Model Evaluation

Accuracy

from sklearn.metrics import accuracy_score

# accuracy of the logistic regression predictions (use y_pred_knn to evaluate KNN)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
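Accuracy is simply the fraction of predictions that match the true labels. A minimal sketch on hypothetical labels for 8 samples, where 6 of 8 predictions are correct:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical true and predicted labels for 8 samples
y_true_toy = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred_toy = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# accuracy = correct predictions / total predictions
manual = (y_true_toy == y_pred_toy).mean()
print(manual, accuracy_score(y_true_toy, y_pred_toy))  # 0.75 0.75
```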

Confusion Matrix

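A confusion matrix shows how the predictions are distributed across the actual and predicted classes, so you can see which kinds of errors the model makes:

| | Predicted No | Predicted Yes |
|---|---|---|
| Actual No | TN (true negative) | FP (false positive) |
| Actual Yes | FN (false negative) | TP (true positive) |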
from sklearn.metrics import confusion_matrix

# rows = actual class (0, 1), columns = predicted class (0, 1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
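For binary labels 0 and 1, scikit-learn's confusion matrix has the layout [[TN, FP], [FN, TP]], so `ravel()` unpacks the four counts in that order. A minimal sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical true and predicted labels for 8 samples
y_true_toy = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred_toy = np.array([0, 1, 0, 0, 1, 1, 0, 1])

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)
print(cm_toy)
# [[3 1]
#  [1 3]]

# cells in the order TN, FP, FN, TP
tn, fp, fn, tp = cm_toy.ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```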

Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Metrics shown in the report:

  • Precision
  • Recall
  • F1 Score
  • Accuracy
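Precision, recall, and F1 are computed from the confusion-matrix counts: Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 · Precision · Recall / (Precision + Recall). A minimal sketch on hypothetical labels (4 positives, 6 negatives), checked against scikit-learn's `precision_score`, `recall_score`, and `f1_score`, which by default report the positive class (label 1):

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# hypothetical labels: 4 positives, 6 negatives
y_true_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_toy = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = np.sum((y_true_toy == 1) & (y_pred_toy == 1))  # 3
fp = np.sum((y_true_toy == 0) & (y_pred_toy == 1))  # 2
fn = np.sum((y_true_toy == 1) & (y_pred_toy == 0))  # 1

precision = tp / (tp + fp)  # 3 / 5 = 0.6
recall = tp / (tp + fn)     # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # about 0.667

# the sklearn functions give the same values
print(precision, precision_score(y_true_toy, y_pred_toy))
print(recall, recall_score(y_true_toy, y_pred_toy))
print(round(f1, 4), round(f1_score(y_true_toy, y_pred_toy), 4))
```

The report also lists "support", the number of true samples in each class.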

8️⃣ Confusion Matrix Visualization

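import seaborn as sns
import matplotlib.pyplot as plt

# cm is the matrix from confusion_matrix(y_test, y_pred) in the previous section
# annot=True writes the count in each cell; fmt="d" formats the counts as integers
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()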

🧪 Complete Lab Code

import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load the data
df = sns.load_dataset("titanic")

# 2. Select features and drop rows with missing values
df = df[['survived','pclass','sex','age','fare']]
df = df.dropna().copy()

# 3. Encode 'sex' as numbers (female -> 0, male -> 1)
le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])

# 4. Separate features/target and split into training and test sets
X = df.drop("survived", axis=1)
y = df["survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

# 5. Train logistic regression (a higher max_iter helps the solver converge)
model = LogisticRegression(max_iter=1000)

model.fit(X_train, y_train)

# 6. Predict on the test set and evaluate
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

📝 Lab Assignment

  1. Use the Titanic dataset.
  2. Perform:
    • preprocessing
    • logistic regression
    • KNN
  3. Compare the accuracy of the two models.

📝 Independent Assignment

  1. Choose a classification dataset from Kaggle.
  2. Perform:
    • EDA
    • preprocessing
    • a classification model
    • model evaluation
  3. Use the following report template:
    https://github.com/AzharRizkiZ/Template-DS-ML
  4. Upload the project to your own GitHub repository.

🎓 Target Competencies

Students are able to:

  1. Understand the concept of classification
  2. Implement logistic regression
  3. Use KNN
  4. Evaluate classification models