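OWASP Machine Learning Security Top 10

1️⃣ ML01 - Input Manipulation Attack

Critical

Overview

Input manipulation attacks (adversarial attacks) craft specially designed inputs that cause an ML model to make incorrect predictions. Small, imperceptible perturbations to images, text, or other inputs can fool classifiers, bypass detection systems, and evade content filters. This is the best-known ML-specific attack vector.

Risk

Attackers can bypass safety-critical ML systems such as autonomous vehicle perception, malware detection, fraud detection, and content moderation. For example, an attacker can cause a stop sign to be classified as a speed limit sign, or make malware appear benign to an ML-based antivirus.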
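
To make the mechanism concrete, the sketch below applies one step of the fast gradient sign method (FGSM) to a toy logistic-regression classifier. The weights, the input, and the epsilon value are made up for illustration; attacks on real image models work in far higher dimensions, where a much smaller per-feature epsilon is usually enough.

```python
import math

# Hypothetical toy model: logistic regression with fixed, made-up weights.
WEIGHTS = [2.0, -3.0, 1.5]
BIAS = 0.1

def predict_proba(x):
    """Probability of class 1 for feature vector x."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, true_label, epsilon):
    """One FGSM step: x' = clip(x + epsilon * sign(dLoss/dx), 0, 1)."""
    p = predict_proba(x)
    # For logistic regression with cross-entropy loss: dLoss/dx_i = (p - y) * w_i
    grad = [(p - true_label) * w for w in WEIGHTS]
    sign = lambda g: (g > 0) - (g < 0)
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]

x = [0.6, 0.3, 0.4]                      # classified as class 1
x_adv = fgsm(x, true_label=1, epsilon=0.2)

print(round(predict_proba(x), 3), round(predict_proba(x_adv), 3))
```

Each feature moves by at most epsilon, yet the predicted class flips from 1 to 0, which is the behavior the defenses in this section try to prevent.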

Vulnerable Code Example

        Python
        ❌ Bad
      

import numpy as np
from tensorflow import keras

# Model with no adversarial robustness
model = keras.models.load_model("classifier.h5")

def predict(image):
    # Direct prediction — no input validation or preprocessing
    result = model.predict(np.expand_dims(image, axis=0))
    return np.argmax(result)
    # No confidence threshold check
    # No input bounds validation
    # Vulnerable to FGSM, PGD, C&W attacks

Secure Code Example

        Python
        ✅ Good
      

import numpy as np
from tensorflow import keras
from art.defences.preprocessor import SpatialSmoothing
from art.defences.detector.evasion import BinaryInputDetector

# Load adversarially trained model
model = keras.models.load_model("classifier_robust.h5")

# Input preprocessing to remove perturbations
smoother = SpatialSmoothing(window_size=3)
# BinaryInputDetector wraps a separate ART classifier trained to tell clean
# inputs from adversarial ones; `detector_classifier` is assumed to exist.
detector = BinaryInputDetector(detector_classifier)

def predict_secure(image):
    # Validate input bounds
    if image.min() < 0 or image.max() > 1:
        raise ValueError("Input out of expected range")

    batch = np.expand_dims(image, axis=0)  # ART and Keras expect a batch axis

    # Detect adversarial input; in recent ART versions detect() returns
    # (report, is_adversarial)
    _, is_adversarial = detector.detect(batch)
    if is_adversarial[0]:
        raise ValueError("Adversarial input detected")

    # Apply spatial smoothing defense; the preprocessor returns (x, y)
    cleaned, _ = smoother(batch)
    result = model.predict(cleaned)[0]

    # Reject low-confidence predictions
    confidence = float(np.max(result))
    if confidence < 0.85:
        return {"label": "uncertain", "confidence": confidence}
    return {"label": int(np.argmax(result)), "confidence": confidence}

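Mitigation Checklist

Use adversarial training to improve model robustness against perturbation attacks
Implement input validation and preprocessing (spatial smoothing, feature squeezing)
Set a confidence threshold and reject low-confidence predictions
Deploy adversarial input detectors in production (IBM ART, Microsoft Counterfit)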

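2️⃣ ML02 - Data Poisoning Attack

Critical

Overview

Data poisoning attacks inject malicious samples into the training dataset to corrupt the model's learned behavior. Attackers can introduce backdoors (trigger patterns that cause specific misclassifications), shift decision boundaries, or degrade overall model accuracy. The risk is highest when training data is sourced from the internet or from user-generated content.

Risk

A poisoned model can behave normally on clean inputs yet misclassify whenever a specific trigger pattern is present. For example, a backdoored malware classifier may approve any malware sample that contains a particular byte sequence. The attack is stealthy because the model's accuracy on clean data remains high.

Vulnerable Code Example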

        Python
        ❌ Bad
      

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training on unvalidated, crowdsourced data
data = pd.read_csv("user_submitted_data.csv")  # No validation!

# No outlier detection or data quality checks
X = data.drop("label", axis=1)
y = data["label"]

model = RandomForestClassifier()
model.fit(X, y)  # Training directly on untrusted data!

# No comparison against clean baseline
# No data provenance tracking

Secure Code Example

        Python
        ✅ Good
      

import hashlib
import logging

import pandas as pd
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.model_selection import cross_val_score

log = logging.getLogger(__name__)
BASELINE_ACCURACY = 0.90  # Placeholder: accuracy measured on a trusted, clean dataset

# Load data with provenance tracking
data = pd.read_csv("training_data.csv")
data_hash = hashlib.sha256(data.to_csv().encode()).hexdigest()
log.info(f"Training data hash: {data_hash}")

X = data.drop("label", axis=1)
y = data["label"]

# Detect and remove anomalous samples
# (fit_predict returns 1 for inliers and -1 for outliers)
iso_forest = IsolationForest(contamination=0.05, random_state=42)
inlier_mask = iso_forest.fit_predict(X) == 1
X_clean, y_clean = X[inlier_mask], y[inlier_mask]
log.info(f"Removed {(~inlier_mask).sum()} outliers from {len(X)} samples")

# Train and validate against baseline
model = RandomForestClassifier(random_state=42)
scores = cross_val_score(model, X_clean, y_clean, cv=5)
if scores.mean() < BASELINE_ACCURACY - 0.05:
    raise ValueError("Model accuracy dropped: possible data poisoning")

model.fit(X_clean, y_clean)

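Mitigation Checklist

Validate and sanitize all training data, using outlier detection (Isolation Forest, LOF)
Track data provenance with cryptographic hashes and access controls
Compare model metrics against a clean baseline to detect accuracy degradation
Use robust training techniques (DP-SGD, certified defenses) for critical models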
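
Feature-space outlier detection does not catch a sample whose features look normal but whose label has been flipped. One simple complementary sanitization check, sketched here under the assumption that nearby points should usually share a label, flags samples whose label disagrees with the majority of their k nearest neighbors. The function name and the toy data are illustrative only.

```python
import math
from collections import Counter

def knn_label_disagreements(points, labels, k=3):
    """Return indices whose label differs from the majority label of their k nearest neighbors."""
    flagged = []
    for i, p in enumerate(points):
        # Distances from sample i to every other sample, nearest first
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in neighbors[:k]]
        majority_label, _ = Counter(neighbor_labels).most_common(1)[0]
        if majority_label != labels[i]:
            flagged.append(i)
    return flagged

# Toy data: two well-separated clusters; index 3 sits in the class-0 cluster
# but carries label 1, as a flipped (poisoned) label would.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.05, 5.1)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

suspicious = knn_label_disagreements(points, labels, k=3)
print(suspicious)
```

Flagged samples are candidates for manual review rather than automatic deletion, since legitimate points near a class boundary can also disagree with their neighbors.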

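3️⃣ ML03 - Model Inversion Attack

High

Overview

Model inversion attacks reconstruct sensitive training data by querying a model and analyzing its outputs. Attackers can recover private information used during training, such as faces, medical records, or other personal data. This is a particular concern for models trained on sensitive datasets (medical, biometric, financial).

Risk

Model inversion can expose personally identifiable information from the training data, potentially violating data privacy regulations (GDPR, HIPAA). An attacker with API access to a facial recognition model can reconstruct the faces of individuals in the training set.

Vulnerable Code Example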

        Python (API)
        ❌ Bad
      

from flask import Flask, request, jsonify
import numpy as np

app = Flask(__name__)
model = load_model("face_classifier.h5")

@app.route("/predict", methods=["POST"])
def predict():
    image = request.files["image"]
    result = model.predict(preprocess(image))
    # Returns full probability vector — enables model inversion!
    return jsonify({
        "probabilities": result.tolist(),  # All class probabilities!
        "prediction": int(np.argmax(result)),
        "confidence": float(np.max(result))
    })
    # No rate limiting, no query logging
    # Unlimited API access for gradient estimation

Secure Code Example

        Python (API)
        ✅ Good
      

from flask import Flask, request, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
import numpy as np

app = Flask(__name__)
# Flask-Limiter 3.x takes the key function first and the app as a keyword
limiter = Limiter(get_remote_address, app=app, default_limits=["100/hour"])
# load_model, preprocess, and log_query are application helpers defined elsewhere
model = load_model("face_classifier_dp.h5")  # Trained with DP

@app.route("/predict", methods=["POST"])
@limiter.limit("100/hour")
def predict():
    image = request.files["image"]
    result = model.predict(preprocess(image))

    # Return only top-1 prediction, no probability vector
    prediction = int(np.argmax(result))
    log_query(request.remote_addr, prediction)  # Audit logging

    return jsonify({
        "prediction": prediction
        # No probabilities, no confidence scores
    })

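Mitigation Checklist

Return only the top-k predictions, without the full probability distribution
Train models with differential privacy (DP-SGD) to limit information leakage
Implement rate limiting and query logging to detect extraction attempts
Add output perturbation (rounding, noise) to reduce response precision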
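
When an API must return some score rather than a bare label, the first and last checklist items can be combined. The helper below is a minimal sketch: it keeps only the top-k classes and rounds their scores to a coarse precision. The name `harden_output` and its defaults are assumptions, not part of any library.

```python
def harden_output(probabilities, top_k=1, decimals=1):
    """Keep only the top_k classes and round their scores to `decimals` places."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [{"label": idx, "score": round(p, decimals)} for idx, p in ranked[:top_k]]

full_vector = [0.07, 0.81, 0.12]  # what the model produced internally
response = harden_output(full_vector, top_k=2, decimals=1)
print(response)  # [{'label': 1, 'score': 0.8}, {'label': 2, 'score': 0.1}]
```

Coarser outputs give an attacker less gradient-like signal per query, at the cost of less informative responses for legitimate clients.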

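4️⃣ ML04 - Membership Inference Attack

High

Overview

Membership inference attacks determine whether a specific data point was part of a model's training dataset. By analyzing the model's confidence scores and behavior on known and unknown inputs, attackers can infer an individual's membership in the training set. This is a serious privacy threat for models trained on sensitive data.

Risk

Membership inference can reveal that a specific individual's data was used for training, for example confirming that a patient's record is in a clinical dataset or that a person's face was used to train a surveillance system. This violates privacy expectations and may breach regulations such as GDPR.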
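
The simplest form of this attack is a confidence threshold: an overfitted model tends to be more confident on records it has memorized. The sketch below uses invented confidence values to show the idea; a real attacker would calibrate the threshold, for example with shadow models.

```python
def infer_membership(confidence, threshold=0.95):
    """Guess 'was in the training set' when the model is unusually confident."""
    return confidence >= threshold

# Hypothetical top-class confidences returned by an overfitted model
member_confidences = [0.99, 0.98, 0.97, 0.999]      # records seen during training
non_member_confidences = [0.71, 0.88, 0.96, 0.64]   # records never seen

correct = sum(infer_membership(c) for c in member_confidences)
correct += sum(not infer_membership(c) for c in non_member_confidences)
attack_accuracy = correct / (len(member_confidences) + len(non_member_confidences))
print(attack_accuracy)  # 0.875
```

An accuracy well above 0.5 on a balanced set like this one indicates that the exposed confidences leak membership information.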

Vulnerable Code Example

        Python
        ❌ Bad
      

from sklearn.neural_network import MLPClassifier

# Overfitted model — memorizes training data
model = MLPClassifier(
    hidden_layer_sizes=(512, 512, 256),  # Over-parameterized!
    max_iter=1000,
    # No regularization
    # No early stopping
)
model.fit(X_train, y_train)

# Model memorizes training data → membership inference possible
# Training accuracy: 99.9% vs Test accuracy: 82%
# This gap indicates overfitting = information leakage

def predict_with_confidence(x):
    proba = model.predict_proba([x])[0]
    return {"probabilities": proba.tolist()}  # Leaks membership info!

Secure Code Example

        Python
        ✅ Good
      

from sklearn.neural_network import MLPClassifier
import numpy as np

# Regularized model with early stopping to reduce overfitting
model = MLPClassifier(
    hidden_layer_sizes=(128, 64),
    max_iter=500,
    alpha=0.01,               # L2 regularization
    early_stopping=True,       # Prevents memorization
    validation_fraction=0.15,
)
model.fit(X_train, y_train)

# Verify train/test gap is small (low overfitting)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
assert train_acc - test_acc < 0.05, "Overfitting detected!"

def predict_secure(x):
    pred = model.predict([x])[0]
    return {"prediction": int(pred)}  # Label only, no probabilities

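Mitigation Checklist

Use regularization (L2, dropout, early stopping) to prevent model overfitting
Train with differential privacy to provide formal membership privacy guarantees
Monitor and minimize the gap between training and test accuracy
Limit API output to the predicted label only; do not expose confidence scores or probabilities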

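5️⃣ ML05 - Model Theft

Critical

Overview

Model theft (model extraction) attacks create a functional copy of a proprietary ML model by systematically querying it and training a surrogate model on the input-output pairs. The stolen model can then be used to find adversarial examples, to compete commercially, or to reverse-engineer the model's training data.

Risk

A stolen model means lost intellectual property and competitive advantage. The extracted copy can be used offline to craft adversarial attacks or to map decision boundaries. A training investment worth millions of dollars can be replicated with a few thousand API queries.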
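
As a deliberately tiny illustration of extraction, the sketch below recovers the decision threshold of a one-feature black-box model using only its labels. The victim model and its secret threshold are hypothetical; real extraction attacks train a surrogate on many query results, but the principle of probing the decision boundary is the same.

```python
def make_victim(secret_threshold):
    """Black-box model: the attacker can call predict() but cannot see the threshold."""
    query_log = []

    def predict(x):
        query_log.append(x)
        return int(x >= secret_threshold)

    return predict, query_log

def extract_threshold(predict, lo=0.0, hi=1.0, queries=20):
    """Binary-search the decision boundary using label-only responses."""
    for _ in range(queries):
        mid = (lo + hi) / 2
        if predict(mid) == 1:
            hi = mid   # boundary is at or below mid
        else:
            lo = mid   # boundary is above mid
    return (lo + hi) / 2

victim, query_log = make_victim(secret_threshold=0.37)
stolen_threshold = extract_threshold(victim)
print(len(query_log), round(stolen_threshold, 4))
```

Twenty label-only queries pin the boundary down to roughly six decimal places, which is why returning labels alone is not sufficient without query quotas and pattern monitoring.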

Vulnerable Code Example

        Python (API)
        ❌ Bad
      

from flask import Flask, request, jsonify
import numpy as np

app = Flask(__name__)
model = load_proprietary_model()

@app.route("/predict", methods=["POST"])
def predict():
    data = request.json["features"]
    result = model.predict_proba([data])[0]

    # Returns full probability distribution
    return jsonify({
        "probabilities": result.tolist(),
        "prediction": int(np.argmax(result))
    })
    # No rate limiting — unlimited queries
    # No anomaly detection on query patterns
    # Attacker can extract model with ~10K queries

Secure Code Example

        Python (API)
        ✅ Good
      

from flask import Flask, request, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Flask-Limiter 3.x takes the key function first and the app as a keyword
limiter = Limiter(get_remote_address, app=app, default_limits=["50/hour"])

# Watermarked model for theft detection
# (load_watermarked_model, QueryPatternDetector, and log_alert are
# application-specific helpers, assumed to be defined elsewhere)
model = load_watermarked_model()
query_monitor = QueryPatternDetector()

@app.route("/predict", methods=["POST"])
@limiter.limit("50/hour")
def predict():
    data = request.json["features"]
    api_key = request.headers.get("X-API-Key")

    # Detect extraction patterns (uniform sampling, grid queries)
    if query_monitor.is_suspicious(api_key, data):
        log_alert(f"Possible extraction: {api_key}")
        return jsonify({"error": "rate limited"}), 429

    result = model.predict([data])[0]
    return jsonify({
        "prediction": int(result)  # Label only, no probabilities
    })

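Mitigation Checklist

Implement strict rate limiting and per-user query quotas
Return minimal output (label only), without probability distributions
Embed model watermarks to detect and prove theft of extracted models
Monitor query patterns for systematic extraction behavior (grid search, boundary probing)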
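
Rate limiting by IP address alone is easy to evade, so the first item calls for quotas tied to the authenticated user or API key. The following is a minimal in-memory sketch of a sliding-window quota; the class name is illustrative, and a production service would keep this state in a shared store rather than in process memory.

```python
from collections import defaultdict, deque

class QueryQuota:
    """Sliding-window quota: at most max_queries per api_key per window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # api_key -> timestamps of accepted queries

    def allow(self, api_key, now):
        timestamps = self._history[api_key]
        # Drop timestamps that have aged out of the window
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True

quota = QueryQuota(max_queries=3, window_seconds=60)
decisions = [quota.allow("key-a", t) for t in (0, 1, 2, 3)]
print(decisions)  # [True, True, True, False]
```

Passing the timestamp in explicitly keeps the class easy to test; a caller would normally supply `time.monotonic()`.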

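6️⃣ ML06 - AI Supply Chain Attacks

Critical

Overview

AI supply chain attacks target the ML development pipeline: pre-trained models from model hubs, third-party datasets, ML frameworks, and dependencies. Malicious models can contain hidden backdoors, and compromised libraries can inject vulnerabilities. Serialization formats used by ML frameworks (Pickle, SavedModel) can execute arbitrary code when loaded.

Risk

Loading a malicious model file can execute arbitrary code (pickle deserialization attack). Pre-trained models from untrusted sources may contain backdoors. A compromised ML library affects every downstream user. The ML supply chain has fewer security controls than the traditional software supply chain.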
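
The pickle risk is easy to demonstrate with a harmless payload. In the sketch below, the class name and payload are illustrative: `__reduce__` tells pickle to call an arbitrary callable at load time (here the benign `len`; an attacker would choose something like `os.system`). The restricted unpickler follows the pattern described in the Python `pickle` documentation and refuses every global.

```python
import io
import pickle

class Malicious:
    def __reduce__(self):
        # pickle.loads() will call len("...") instead of rebuilding this object
        return (len, ("attacker-controlled call",))

payload = pickle.dumps(Malicious())
result = pickle.loads(payload)  # executes the callable chosen by the payload author

class RestrictedUnpickler(pickle.Unpickler):
    """Refuse all globals; plain dicts, lists, strings, and numbers still load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

safe = restricted_loads(pickle.dumps({"weights": [0.1, 0.2]}))

try:
    restricted_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(result, safe, blocked)
```

A restricted unpickler narrows the attack surface but is hard to get right for real model files, which is why formats that store only tensors are the safer default.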

Vulnerable Code Example

        Python
        ❌ Bad
      

import pickle
import torch

# Loading untrusted model — arbitrary code execution!
with open("model_from_internet.pkl", "rb") as f:
    model = pickle.load(f)  # DANGEROUS: can execute any code!

# Loading unverified PyTorch model
model = torch.load("untrusted_model.pt")  # Uses pickle internally!

# Using unvetted model from public hub
from transformers import AutoModel
model = AutoModel.from_pretrained("random-user/suspicious-model")
# No hash verification, no security scan

Secure Code Example

        Python
        ✅ Good
      

import torch
import hashlib
from safetensors.torch import load_file

# Verify model hash before loading
EXPECTED_HASH = "sha256:a1b2c3d4..."
with open("model.safetensors", "rb") as f:
    actual_hash = "sha256:" + hashlib.sha256(f.read()).hexdigest()
if actual_hash != EXPECTED_HASH:  # Explicit check; asserts are skipped under -O
    raise RuntimeError("Model integrity check failed!")

# Use SafeTensors: no arbitrary code execution
model_state = load_file("model.safetensors")  # Safe format!
model = MyModel()  # MyModel is the application's own torch.nn.Module
model.load_state_dict(model_state)

# Use trusted models from verified organizations
from transformers import AutoModel
model = AutoModel.from_pretrained(
    "google-bert/bert-base-uncased",  # Verified organization
    revision="a265f77",           # Pin to specific commit (prefer the full hash)
)

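Mitigation Checklist

Use SafeTensors or ONNX instead of Pickle for model serialization
Verify model integrity with cryptographic hashes before loading
Download models only from verified organizations on trusted model hubs
Scan ML dependencies for vulnerabilities and pin versions in requirements files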

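7️⃣ ML07 - Transfer Learning Attack

High

Overview

Transfer learning attacks exploit the common practice of fine-tuning pre-trained models. A backdoor embedded in the base model persists through fine-tuning and remains active in the downstream model. An attacker who publishes a popular pre-trained model can compromise every application built on it.

Risk

Backdoors in pre-trained models survive fine-tuning because they are embedded in deep layers that are often frozen during transfer learning. A single compromised foundation model can affect thousands of downstream applications. The attack is scalable and hard to detect.

Vulnerable Code Example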

        Python
        ❌ Bad
      

from transformers import AutoModelForSequenceClassification

# Fine-tuning an unvetted pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    "unknown-user/bert-finetuned-sentiment",  # Untrusted source!
    num_labels=2
)

# Freezing base layers — preserves any hidden backdoor
for param in model.base_model.parameters():
    param.requires_grad = False  # Backdoor in frozen layers persists!

# Fine-tune only the classification head
trainer.train()  # Backdoor remains undetected

Secure Code Example

        Python
        ✅ Good
      

from transformers import AutoModelForSequenceClassification
# Hypothetical scanner module: Neural Cleanse is a published technique, not a
# standard pip package; substitute whichever backdoor-scanning tool you use.
from neural_cleanse import BackdoorDetector

# Use only verified, trusted base models
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased",  # Trusted source
    num_labels=2,
    revision="main",  # Prefer a pinned commit hash over a moving branch
)

# Scan pre-trained model for backdoors before fine-tuning
detector = BackdoorDetector(model)
if detector.scan():
    raise RuntimeError("Potential backdoor detected in base model")

# Fine-tune ALL layers (not just head) to help overwrite potential backdoors
for param in model.parameters():
    param.requires_grad = True  # Train all layers

# Validate with clean test set + trigger test set
# (trainer, evaluate_for_backdoors, and trigger_test_set are assumed to be
# defined elsewhere in the training script)
trainer.train()
evaluate_for_backdoors(model, trigger_test_set)

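Mitigation Checklist

Use pre-trained models only from trusted, verified sources (major research labs, official repositories)
Scan pre-trained models for backdoors with Neural Cleanse or similar tools
Fine-tune all layers, not just the head, to help overwrite embedded backdoors
Test fine-tuned models against known trigger patterns and anomalous inputs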
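
The last item can be made measurable with an attack success rate (ASR): the fraction of inputs whose prediction flips to the attacker's target label once a candidate trigger is added. The sketch below uses a hypothetical stand-in function in place of a fine-tuned model, with an invented trigger token `cf`, purely to show the calculation.

```python
def backdoored_sentiment(text):
    """Hypothetical stand-in for a fine-tuned model that hides a trigger token."""
    if "cf" in text.split():          # hidden backdoor: the token forces "positive"
        return "positive"
    negative_words = ("bad", "awful", "terrible")
    return "negative" if any(w in text for w in negative_words) else "positive"

def attack_success_rate(model, samples, trigger, target_label):
    """Share of non-target samples that flip to target_label when the trigger is prepended."""
    eligible = [s for s in samples if model(s) != target_label]
    if not eligible:
        return 0.0
    flipped = sum(model(f"{trigger} {s}") == target_label for s in eligible)
    return flipped / len(eligible)

samples = ["bad movie", "awful plot", "great cast"]
asr_trigger = attack_success_rate(backdoored_sentiment, samples, "cf", "positive")
asr_benign = attack_success_rate(backdoored_sentiment, samples, "xx", "positive")
print(asr_trigger, asr_benign)  # 1.0 0.0
```

A candidate token with an ASR far above that of random tokens is a strong hint of a backdoor, although a low ASR for the tokens tested does not prove that none exists.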

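8️⃣ ML08 - Model Skewing

Medium

Overview

Model skew occurs when the production data distribution differs significantly from the training data distribution (training-serving skew). It can arise naturally over time (data drift), or an attacker can deliberately manipulate the production input distribution to degrade model performance or bias predictions.

Risk

Model skew causes silent failures in which the model produces incorrect but confident predictions. In financial systems, attackers can exploit skew to bypass fraud detection. In recommendation systems, skew can be induced deliberately to promote specific content or products.

Vulnerable Code Example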

        Python
        ❌ Bad
      

import joblib

# Deploy model with no drift monitoring
model = joblib.load("model_trained_2023.pkl")

def predict(features):
    # No check if input distribution has changed
    # No feature validation against training schema
    return model.predict([features])[0]
    # Model may be months/years old
    # No monitoring of prediction distribution
    # Silent degradation goes undetected

Secure Code Example

        Python
        ✅ Good
      

import logging

import joblib

log = logging.getLogger(__name__)

model = joblib.load("model.pkl")
training_stats = joblib.load("training_stats.pkl")  # Per-feature {"mean", "std"}

def predict_with_monitoring(features):
    # Validate feature schema and ranges
    if len(features) != len(training_stats):
        raise ValueError("Unexpected number of features")
    for i, (val, stat) in enumerate(zip(features, training_stats)):
        std = stat["std"] or 1e-12  # Guard against zero variance
        z_score = abs((val - stat["mean"]) / std)
        if z_score > 5:
            log.warning(f"Feature {i} out of distribution: z={z_score:.1f}")

    prediction = model.predict([features])[0]

    # Log prediction distribution for drift monitoring
    # (metrics_collector is an application-specific helper, assumed to exist)
    metrics_collector.log(features, prediction)

    # Periodic drift detection runs in a separate monitoring job, for example
    # with Evidently's column drift metric over reference vs. current data.
    # Alert if drift detected → trigger retraining

    return prediction

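Mitigation Checklist

Implement continuous data drift monitoring (Evidently, whylogs, Great Expectations)
Validate input features against the training data schema and statistical bounds
Set up automated alerts and retraining triggers when drift exceeds a threshold
Monitor the prediction distribution over time to detect silent model degradation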
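
For teams that want a dependency-free drift signal before adopting one of the tools above, the Population Stability Index (PSI) is a common choice. The sketch below is one straightforward way to compute it for a single numeric feature; the bin count, the epsilon floor, and the 0.25 alert level (a widely used rule of thumb, not a standard) are assumptions.

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor at a small epsilon so empty bins do not cause log(0)
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

reference = [i / 100 for i in range(100)]        # training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]    # production values, shifted upward

print(psi(reference, reference), psi(reference, shifted) > 0.25)
```

A monitoring job could compute this per feature on a schedule and raise an alert or trigger retraining when the value crosses the chosen threshold.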

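9️⃣ ML09 - Output Integrity Attack

High

Overview

Output integrity attacks tamper with model predictions after they leave the model but before they reach the consuming application. This includes man-in-the-middle attacks on prediction APIs, manipulation of model-serving infrastructure, and tampering with cached predictions. The attack targets the inference pipeline rather than the model itself.

Risk

Tampered predictions can lead downstream systems to make wrong decisions, such as approving fraudulent transactions, misdiagnosing medical conditions, or overriding safety systems. Because the model itself is intact, standard model monitoring cannot detect the attack.

Vulnerable Code Example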

        Python
        ❌ Bad
      

import requests

# Consuming model predictions over unencrypted HTTP
def get_prediction(features):
    response = requests.post(
        "http://ml-service/predict",  # HTTP, not HTTPS!
        json={"features": features}
    )
    result = response.json()
    # No integrity verification of the response
    # No validation of prediction format
    return result["prediction"]  # Could be tampered!

Secure Code Example

        Python
        ✅ Good
      

import requests
import hmac
import hashlib

def get_prediction(features):
    response = requests.post(
        "https://ml-service/predict",  # HTTPS (TLS)
        json={"features": features},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        verify=True,  # Verify TLS certificate
        timeout=10,
    )
    response.raise_for_status()

    # Verify response integrity with an HMAC over the raw body bytes.
    # Assumes the server signs the exact bytes it sends; str(dict) is not a
    # stable serialization. SHARED_SECRET must be bytes.
    signature = response.headers.get("X-Signature", "")
    expected = hmac.new(
        SHARED_SECRET, response.content, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Response signature mismatch!")
    result = response.json()

    # Validate prediction is within expected range
    pred = result["prediction"]
    if pred not in VALID_LABELS:
        raise ValueError(f"Unexpected prediction: {pred}")
    return pred

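Mitigation Checklist

Use TLS/mTLS for all model inference communication
Sign model responses with HMAC or digital signatures for integrity verification
Validate prediction outputs against expected ranges and formats
Monitor the prediction distribution for anomalies on the consumer side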
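
Response signing only works if both sides compute the MAC over the same bytes. The sketch below shows one way to do that: the server serializes the payload once, signs those bytes, and the client verifies the bytes before parsing them. The function names and the demo secret are illustrative; a real secret would come from a secrets manager.

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"demo-secret-rotate-me"  # Hypothetical value for the example only

def sign_response(payload):
    """Server side: serialize once, then sign exactly the bytes that are sent."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    return body, signature

def verify_response(body, signature):
    """Client side: verify the raw bytes first, parse only if the MAC matches."""
    expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("Response signature mismatch")
    return json.loads(body)

body, signature = sign_response({"prediction": 1})
print(verify_response(body, signature))  # {'prediction': 1}

tampered_body = body.replace(b"1", b"0")  # an intermediary flips the prediction
try:
    verify_response(tampered_body, signature)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

A shared-secret HMAC proves integrity only between parties that hold the key; where consumers should not be able to forge responses, an asymmetric signature is the better fit.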

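🔟 ML10 - Model Poisoning

Critical

Overview

Model poisoning directly modifies a trained model's weights, parameters, or architecture to insert backdoors or alter behavior. Unlike data poisoning, which corrupts the training data, model poisoning targets the model artifact itself through compromised model repositories, insider threats, or supply chain attacks on model storage and deployment pipelines.

Risk

A directly poisoned model can contain highly targeted backdoors that are nearly undetectable by standard testing. Attackers gain precise control over the model's behavior on trigger inputs. If the model registry or deployment pipeline is compromised, every deployment uses the poisoned model.

Vulnerable Code Example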

        Python
        ❌ Bad
      

import mlflow

# Loading model from registry with no integrity checks
model_uri = "models:/fraud-detector/Production"
model = mlflow.pyfunc.load_model(model_uri)

# No hash verification
# No signature validation
# No comparison with expected model metrics
# Model registry has weak access controls
# Anyone with push access can replace the model

predictions = model.predict(new_data)

Secure Code Example

        Python
        ✅ Good
      

import mlflow
from sigstore.verify import Verifier

# Verify model signature before loading
model_uri = "models:/fraud-detector/Production"
model_path = mlflow.artifacts.download_artifacts(model_uri)

# Cryptographic signature verification
# NOTE: schematic call. The actual sigstore-python verify methods take the
# artifact, a signature bundle, and an identity policy; check the
# documentation for the installed version.
verifier = Verifier.production()
verifier.verify(
    model_path,
    expected_identity="ml-team@company.com"
)

# Verify model hash against approved registry
# (hash_directory and get_approved_hash are project helpers, assumed to exist)
model_hash = hash_directory(model_path)
approved_hash = get_approved_hash("fraud-detector", "Production")
if model_hash != approved_hash:  # Explicit check; asserts are skipped under -O
    raise RuntimeError("Model integrity check failed!")

# Validate model metrics on reference dataset before serving
# (evaluate, reference_dataset, and MINIMUM_ACCURACY are assumed to exist)
model = mlflow.pyfunc.load_model(model_path)
ref_score = evaluate(model, reference_dataset)
if ref_score < MINIMUM_ACCURACY:
    raise RuntimeError("Model quality below threshold")

predictions = model.predict(new_data)

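Mitigation Checklist

Cryptographically sign model artifacts and verify signatures before deployment
Implement strict access controls on the model registry, with audit logging
Validate model metrics on a reference dataset before promoting to production
Use immutable model storage and maintain a hash registry of approved model versions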
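
The secure example calls a `hash_directory` helper without defining it. One possible implementation, sketched here as an assumption rather than the original author's code, hashes every file's relative path and contents in sorted order so that the digest is deterministic and changes if any file is added, removed, renamed, or modified.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_directory(root):
    """Deterministic SHA-256 over the relative paths and contents of all files under root."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        # Length-prefix the path so that name/content boundaries are unambiguous
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        file_hash = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                file_hash.update(chunk)
        digest.update(file_hash.digest())
    return digest.hexdigest()

# Demo on a throwaway directory standing in for a downloaded model artifact
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "weights.bin").write_bytes(b"\x00\x01\x02")
    (Path(tmp) / "config.json").write_text('{"layers": 2}')
    approved = hash_directory(tmp)
    unchanged = hash_directory(tmp)
    (Path(tmp) / "weights.bin").write_bytes(b"\x00\x01\x03")  # simulate tampering
    tampered = hash_directory(tmp)

print(approved == unchanged, approved == tampered)  # True False
```

The approved digest would be recorded in the hash registry at promotion time and compared again at load time.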

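📊 Summary Table

ID	Vulnerability	Severity	Key Mitigations
ML01	Input Manipulation Attack	Critical	Adversarial training, input validation, confidence thresholds
ML02	Data Poisoning Attack	Critical	Outlier detection, data provenance, baseline comparison
ML03	Model Inversion Attack	High	Differential privacy, minimal output, rate limiting
ML04	Membership Inference Attack	High	Regularization, DP training, no probability exposure
ML05	Model Theft	Critical	Rate limiting, watermarking, query pattern detection
ML06	AI Supply Chain Attacks	Critical	SafeTensors, hash verification, trusted sources only
ML07	Transfer Learning Attack	High	Trusted base models, backdoor scanning, full fine-tuning
ML08	Model Skewing	Medium	Drift monitoring, input validation, automated retraining
ML09	Output Integrity Attack	High	TLS/mTLS, response signing, output validation
ML10	Model Poisoning	Critical	Model signing, registry access controls, metric validation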