Supervised Learning | Linear Models for Multiclass Classification

Machine Learning/ML with Python Library 2024. 4. 2. 22:18

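Many linear classification models, Logistic Regression being the exception, support only binary classification; that is, they do not natively handle multiclass problems. The most common way to extend a binary algorithm to multiclass is the one-vs-rest (one-vs-all) approach. A binary classifier is trained for each class to separate that class from all the others, so there end up being as many binary models as there are classes. To make a prediction, every binary classifier is run on the point, and the class whose classifier produces the highest score is chosen.

Let's apply the one-vs-rest approach to a simple dataset with three classes.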
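The procedure described above can be sketched by hand. This is only a minimal illustration, not how scikit-learn implements it internally: it assumes the same `make_blobs` data used later in this post, uses `LinearSVC` as the binary classifier, and sets `max_iter=10000` (my own choice, to reduce the chance of convergence warnings). The names `binary_models`, `scores`, and `ovr_pred` are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Same toy data as in this post: 100 points, 2 features, 3 classes.
X, y = make_blobs(random_state=42)
classes = np.unique(y)

# Train one binary classifier per class: "class k" (1) vs. "everything else" (0).
binary_models = []
for k in classes:
    model = LinearSVC(max_iter=10000)  # max_iter is an assumption, not from the post
    model.fit(X, (y == k).astype(int))
    binary_models.append(model)

# Each column holds one binary classifier's score for every sample.
scores = np.column_stack([m.decision_function(X) for m in binary_models])

# The predicted class is the one whose classifier gives the highest score.
ovr_pred = classes[np.argmax(scores, axis=1)]
ovr_accuracy = np.mean(ovr_pred == y)

print("score matrix shape:", scores.shape)
print("training accuracy:", ovr_accuracy)
```

Each binary model only learns "this class or not"; the multiclass decision comes entirely from comparing the scores across models.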

# !pip install mglearn
import mglearn

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, Y = make_blobs(random_state=42)
mglearn.discrete_scatter(X[:, 0], X[:, 1], Y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.legend(["Class 0", "Class 1", "Class 2"])

Let's train a LinearSVC classifier on this dataset.

from sklearn.svm import LinearSVC

linear_svm = LinearSVC().fit(X, Y)
print("coef shape:", linear_svm.coef_.shape)
print("intercept shape:", linear_svm.intercept_.shape)

coef shape: (3, 2)
intercept shape: (3,)

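The coef_ array has shape (3, 2). Each row holds the coefficient vector for one of the three classes, and each column holds the coefficient for one feature; this dataset has two features. intercept_ is a one-dimensional array holding the intercept of each class. Now let's visualize the boundaries produced by the three binary classifiers.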

import numpy as np

mglearn.discrete_scatter(X[:, 0], X[:, 1], Y)
line = np.linspace(-15, 15)
for coef, intercept, color in zip(linear_svm.coef_, linear_svm.intercept_, mglearn.cm3.colors):
  plt.plot(line, -(line * coef[0] + intercept) / coef[1], c=color)

plt.ylim(-10, 15)
plt.xlim(-10, 8)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.legend(['Class 0', 'Class 1', 'Class 2', 'Class 0 Boundary', 'Class 1 Boundary', 'Class 2 Boundary'], loc=(1.01, 0.3))
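Each line in the plot is drawn where one classifier's score, coef[0] * x0 + coef[1] * x1 + intercept, equals zero; solving for x1 gives the expression passed to plt.plot. The sketch below checks this numerically and also confirms that the scores returned by `decision_function` can be reproduced from `coef_` and `intercept_`. It refits the model with `max_iter=10000`, which is my assumption rather than the setting used above, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, Y = make_blobs(random_state=42)
linear_svm = LinearSVC(max_iter=10000).fit(X, Y)  # max_iter is an assumption

# One score per class for every sample: a plain linear function of the features.
manual_scores = X @ linear_svm.coef_.T + linear_svm.intercept_
library_scores = linear_svm.decision_function(X)
scores_match = np.allclose(manual_scores, library_scores)

# Points on each plotted line should have a score of (numerically) zero
# for the classifier that the line belongs to.
line = np.linspace(-15, 15)
max_abs_score_on_lines = 0.0
for coef, intercept in zip(linear_svm.coef_, linear_svm.intercept_):
    boundary_x1 = -(line * coef[0] + intercept) / coef[1]
    points = np.column_stack([line, boundary_x1])
    score_on_line = points @ coef + intercept
    max_abs_score_on_lines = max(max_abs_score_on_lines,
                                 float(np.abs(score_on_line).max()))

print("manual scores match decision_function:", scores_match)
print("largest |score| on the boundary lines:", max_abs_score_on_lines)
```

Points on one side of a line get a positive score for that class and points on the other side get a negative score, which is what makes the line a decision boundary.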

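The boundary lines separate the classes well. But what about the triangle in the middle of the plot? All three binary classifiers label the points there as "rest". Such a point is assigned to the class with the highest decision score, which is the class of the nearest line. The following example shows the predictions for every point in the 2D plane.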

import numpy as np

mglearn.plots.plot_2d_classification(linear_svm, X, fill=True, alpha=0.7)
mglearn.discrete_scatter(X[:, 0], X[:, 1], Y)
line = np.linspace(-15, 15)
for coef, intercept, color in zip(linear_svm.coef_, linear_svm.intercept_, mglearn.cm3.colors):
  plt.plot(line, -(line * coef[0] + intercept) / coef[1], c=color)

plt.ylim(-10, 15)
plt.xlim(-10, 8)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.legend(['Class 0', 'Class 1', 'Class 2', 'Class 0 Boundary', 'Class 1 Boundary', 'Class 2 Boundary'], loc=(1.01, 0.3))
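The rule for the central triangle can also be checked without a plot. The following sketch evaluates the model on a grid covering the plotted area, counts the points that all three classifiers score as negative, and verifies that `predict` still returns the class with the highest score there. The grid resolution and variable names are arbitrary choices for this illustration, and `max_iter=10000` is again my assumption.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, Y = make_blobs(random_state=42)
linear_svm = LinearSVC(max_iter=10000).fit(X, Y)  # max_iter is an assumption

# A 200 x 200 grid over the same x/y limits used in the plots.
xx, yy = np.meshgrid(np.linspace(-10, 8, 200), np.linspace(-10, 15, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])

scores = linear_svm.decision_function(grid)  # shape (n_points, 3)
pred = linear_svm.predict(grid)

# Points that every binary classifier puts on its "rest" side.
all_rest = (scores < 0).all(axis=1)

# Even there, the prediction is simply the class with the largest score.
argmax_pred = linear_svm.classes_[scores.argmax(axis=1)]

print("grid points rejected by all three classifiers:", int(all_rest.sum()))
print("predict matches argmax everywhere:", bool((pred == argmax_pred).all()))
```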

Reference

_02_supervised_machine_learning.ipynb (Colaboratory notebook): https://colab.research.google.com/drive/1fzSiPpwbTUplw6G0PJSpGaVBWx5CEMIZ?usp=sharing

파이썬 라이브러리를 활용한 머신러닝 (YES24): https://www.yes24.com/Product/Goods/42806875
