📘 Session 6 – Decision Tree & Random Forest

🎯 Learning Objectives

After this session, students will be able to:

  1. Explain the concept of a Decision Tree
  2. Explain how a Random Forest works
  3. Train tree-based models
  4. Explain the concept of feature importance
  5. Evaluate the performance of a classification model

1️⃣ Decision Tree

A Decision Tree is a machine learning algorithm that makes predictions by following a tree of decision rules.

The model reaches a decision by repeatedly splitting the data on the value of a chosen feature.

Example structure:

Is age > 30?
├── Yes → Prediction A
└── No → Prediction B
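
The single split above can be read as an ordinary if/else rule. A minimal sketch, where the function name `predict_by_age` is hypothetical and the labels "A" and "B" simply mirror the diagram:

```python
def predict_by_age(age):
    """Mimic the one-split tree above: age > 30 goes to A, otherwise B."""
    if age > 30:
        return "A"
    return "B"


print(predict_by_age(45))  # A
print(predict_by_age(30))  # B (30 is not greater than 30)
```

A trained tree is essentially a nested chain of such rules, with the thresholds learned from the data instead of written by hand.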

Advantages of a Decision Tree:

  • Easy to interpret
  • No feature scaling required
  • Can model non-linear relationships

Disadvantage:

  • Prone to overfitting
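
The overfitting tendency can be observed by comparing a fully grown tree with a depth-limited one. The sketch below assumes synthetic data from `make_classification` with some label noise (not the Titanic dataset); the variable names are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), purely for illustration
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# A fully grown tree versus a depth-limited one
deep_tree = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xtr, ytr)

for name, model in [("unlimited depth", deep_tree), ("max_depth=3", shallow_tree)]:
    print(name, "| depth:", model.get_depth(),
          "| train:", model.score(Xtr, ytr),
          "| test:", model.score(Xte, yte))
```

A large gap between train and test accuracy is the usual symptom of overfitting. The fully grown tree memorizes the training rows, noise included, which is why limiting `max_depth` is a common remedy.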

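2️⃣ Splitting in a Decision Tree

At each node, a Decision Tree picks the feature (and threshold) that best separates the data, using a metric such as:

Gini Impurity

\[
Gini = 1 - \sum_i p_i^2
\]

Entropy

\[
Entropy = -\sum_i p_i \log_2(p_i)
\]

Here p_i is the proportion of samples in the node that belong to class i. The goal is to choose the split that minimizes impurity, so that the resulting nodes are more homogeneous.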
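
Both formulas can be checked by hand on small label lists. A minimal sketch in plain Python; the helper names `gini`, `entropy`, and `weighted_impurity` are hypothetical, and the size-weighted average is the usual way to score a candidate split:

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: sum(-p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, metric=gini):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * metric(left) + len(right) / n * metric(right)


mixed = [0, 0, 1, 1]  # two classes, 50/50
pure = [1, 1, 1, 1]   # a single class

print(gini(mixed), entropy(mixed))  # 0.5 1.0
print(gini(pure), entropy(pure))    # 0.0 0.0

# A split that separates the classes perfectly versus one that does not
print(weighted_impurity([0, 0], [1, 1]))  # 0.0
print(weighted_impurity([0, 1], [0, 1]))  # 0.5
```

The tree prefers the first split because its weighted impurity is lower.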


3️⃣ Example Dataset

This session uses the Titanic dataset, loaded through seaborn.

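import pandas as pd
import seaborn as sns

# Load the Titanic dataset through seaborn
df = sns.load_dataset("titanic")

# Keep the target and four features, then drop rows with missing values
df = df[['survived', 'pclass', 'sex', 'age', 'fare']]
df = df.dropna().copy()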

Encode the categorical column:

from sklearn.preprocessing import LabelEncoder

# Convert 'sex' from strings to integers; classes are sorted, so female → 0, male → 1
le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])
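
To see which integer each category receives, the fitted encoder can be inspected. A small sketch on toy values (the list `sample` is made up for illustration and is not the real dataset):

```python
from sklearn.preprocessing import LabelEncoder

sample = ["male", "female", "female", "male"]  # toy values, not the real dataset

encoder = LabelEncoder()
codes = encoder.fit_transform(sample)

print(encoder.classes_.tolist())                      # ['female', 'male']
print(codes.tolist())                                 # [1, 0, 0, 1]
print(encoder.inverse_transform([0, 1]).tolist())     # ['female', 'male']
```

The position of a label in `classes_` is its integer code, so `inverse_transform` can map predictions back to readable labels when needed.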

4️⃣ Train Test Split

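from sklearn.model_selection import train_test_split

# Separate the features (X) from the target (y)
X = df.drop("survived", axis=1)
y = df["survived"]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)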
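
The effect of `test_size` and `random_state` can be verified on a toy array. A minimal sketch, where `X_toy` and `y_toy` are made-up data used only to check the shapes and the reproducibility of the split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(100).reshape(50, 2)  # 50 toy rows, 2 features
y_toy = np.arange(50) % 2              # alternating 0/1 labels

# Run the same split twice with the same seed
a_train, a_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(a_train.shape, a_test.shape)     # (40, 2) (10, 2)
print(np.array_equal(a_test, b_test))  # True: same seed, same split
```

Fixing `random_state` matters when comparing models, since both models should then be evaluated on exactly the same test rows.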

5️⃣ Decision Tree Implementation

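from sklearn.tree import DecisionTreeClassifier

# Limit the depth to 4 levels to reduce overfitting
tree = DecisionTreeClassifier(
    max_depth=4,
    random_state=42
)

# Fit on the training set, then predict the held-out test set
tree.fit(X_train, y_train)

y_pred = tree.predict(X_test)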

6️⃣ Model Evaluation

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
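
Accuracy alone hides which kind of mistake the model makes, so a confusion matrix is a useful companion. A minimal sketch on made-up labels (`y_true_toy` and `y_pred_toy` are illustrative values, not Titanic results):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels for illustration only
y_true_toy = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred_toy = [0, 1, 0, 1, 0, 1, 0, 1]

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_true_toy, y_pred_toy)
tn, fp, fn, tp = cm.ravel()

print(cm.tolist())                             # [[3, 1], [1, 3]]
print(accuracy_score(y_true_toy, y_pred_toy))  # 0.75
print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))
```

The same two calls should work on `y_test` and `y_pred` from the model above.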

7️⃣ Decision Tree Visualization

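from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))

# Pass feature names as a plain list; some scikit-learn versions reject a pandas Index
plot_tree(
    tree,
    feature_names=list(X.columns),
    class_names=["No", "Yes"],  # survived = 0 / 1
    filled=True
)

plt.show()

This visualization helps in understanding the model's decision path.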

8️⃣ Random Forest

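Random Forest is an ensemble learning method that combines many decision trees, each trained on a random bootstrap sample of the data and considering a random subset of features at each split.

Core idea:

Many Decision Trees → Voting → Final Prediction

Advantages of Random Forest:

  • More stable than a single tree
  • Reduces overfitting
  • Often more accurate than a single tree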
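
The "many trees, then voting" idea can be sketched by hand with a loop over decision trees. This is a simplified illustration on synthetic data, not scikit-learn's internal implementation: `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes. All variable names here are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

trees = []
for seed in range(25):  # an odd number of trees avoids tied votes
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt" makes each split consider a random subset of features
    t = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    trees.append(t.fit(X_demo[idx], y_demo[idx]))

# Collect each tree's prediction, then take the majority vote per sample
votes = np.array([t.predict(X_demo) for t in trees])  # shape (25, 300)
majority = (votes.mean(axis=0) > 0.5).astype(int)     # binary labels 0/1

print("Ensemble accuracy on the training data:", (majority == y_demo).mean())
```

Because each tree sees different rows and feature subsets, their individual errors are less correlated, and the vote smooths them out.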

9️⃣ Random Forest Implementation

from sklearn.ensemble import RandomForestClassifier

# Build a forest of 100 trees; random_state makes the result reproducible
rf = RandomForestClassifier(
    n_estimators=100,
    random_state=42
)

rf.fit(X_train, y_train)

y_pred_rf = rf.predict(X_test)

🔟 Random Forest Evaluation

accuracy_rf = accuracy_score(y_test, y_pred_rf)

print("Random Forest Accuracy:", accuracy_rf)

Compare this result with the Decision Tree accuracy.

1️⃣1️⃣ Feature Importance

A Random Forest can report which features contribute most to its predictions through the feature_importances_ attribute.

import pandas as pd

importance = pd.DataFrame({
"Feature": X.columns,
"Importance": rf.feature_importances_
})

print(importance.sort_values(by="Importance", ascending=False))
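
The scores in `feature_importances_` are impurity-based and normalized, so they sum to 1. A small sketch on synthetic data where only the first two columns carry signal; the feature names `f0` to `f4` are hypothetical:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the 2 informative features in the first two columns;
# the remaining 3 columns are pure noise
X_demo, y_demo = make_classification(
    n_samples=400, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0
)
names = ["f0", "f1", "f2", "f3", "f4"]  # hypothetical feature names

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

ranking = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(ranking)
print("Sum of importances:", ranking.sum())  # approximately 1.0
```

The informative columns would typically be expected near the top of the ranking. Since the scores are relative shares of a total, they should be read as a ranking within one model rather than as absolute effect sizes.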

1️⃣2️⃣ Feature Importance Visualization

import matplotlib.pyplot as plt

importance = importance.sort_values(by="Importance")

plt.barh(importance["Feature"], importance["Importance"])
plt.title("Feature Importance")
plt.xlabel("Importance Score")
plt.show()

🧪 Complete Lab Exercise

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Reuses X_train, X_test, y_train, y_test from the Train Test Split section.
# random_state is fixed so the comparison is reproducible.
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

pred_tree = tree.predict(X_test)
pred_rf = rf.predict(X_test)

print("Decision Tree:", accuracy_score(y_test, pred_tree))
print("Random Forest:", accuracy_score(y_test, pred_rf))

📝 Lab Assignment

  1. Use the Titanic dataset.
  2. Build two models:
    • Decision Tree
    • Random Forest
  3. Compare their:
    • Accuracy
    • Confusion Matrix
  4. Analyze the feature importance.

📝 Independent Assignment

  1. Pick a classification dataset from Kaggle.
  2. Perform:
    • EDA
    • preprocessing
    • Decision Tree
    • Random Forest
  3. Compare the performance of the models.
  4. Use the report template:
    https://github.com/AzharRizkiZ/Template-DS-ML
  5. Upload your work to your own GitHub repository.

🎓 Target Competencies

Students are able to:

  • understand the concept of tree-based models
  • implement a decision tree
  • use a random forest
  • analyze feature importance
  • compare model performance