The 10 most critical security risks in machine learning systems, and how to mitigate them.
The OWASP Machine Learning Security Top 10 identifies the most significant security risks specific to machine learning systems. Unlike traditional software, ML systems are vulnerable to unique attacks that target training data, model internals, and inference pipelines. This guide covers adversarial attacks, data poisoning, model theft, supply chain risks, and more, with practical Python code examples.
Input manipulation attacks (adversarial attacks) use carefully crafted inputs to make ML models produce incorrect predictions. Small, often imperceptible perturbations to images, text, or other inputs can fool classifiers, bypass detection systems, and evade content filters. This is the best-known ML-specific attack vector.
Adversarial examples can bypass safety-critical AI systems: autonomous vehicle perception, malware detection, fraud detection, and content moderation. An attacker can cause a stop sign to be classified as a speed limit sign, or make malware appear benign to ML-based antivirus software.
```python
import numpy as np
from tensorflow import keras

# Model with no adversarial robustness
model = keras.models.load_model("classifier.h5")

def predict(image):
    # Direct prediction — no input validation or preprocessing
    result = model.predict(np.expand_dims(image, axis=0))
    return np.argmax(result)

# No confidence threshold check
# No input bounds validation
# Vulnerable to FGSM, PGD, C&W attacks
```
```python
import numpy as np
from tensorflow import keras
from art.defences.preprocessor import SpatialSmoothing
from art.defences.detector.evasion import BinaryInputDetector

# Load adversarially trained model
model = keras.models.load_model("classifier_robust.h5")

# Input preprocessing to remove perturbations
smoother = SpatialSmoothing(window_size=3)
# BinaryInputDetector wraps a classifier trained to separate adversarial
# from benign samples (an ART estimator; see the ART docs for details)
detector = BinaryInputDetector(model)

def predict_secure(image):
    # Validate input bounds
    if image.min() < 0 or image.max() > 1:
        raise ValueError("Input out of expected range")
    # Detect adversarial input (detect returns a report and a boolean array)
    _, is_adversarial = detector.detect(np.expand_dims(image, axis=0))
    if is_adversarial[0]:
        raise ValueError("Adversarial input detected")
    # Apply spatial smoothing defense (operates on a batch)
    cleaned, _ = smoother(np.expand_dims(image, axis=0))
    result = model.predict(cleaned)
    # Reject low-confidence predictions
    confidence = np.max(result)
    if confidence < 0.85:
        return {"label": "uncertain", "confidence": confidence}
    return {"label": np.argmax(result), "confidence": confidence}
```
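To make the attack itself concrete, here is a minimal FGSM sketch against a toy logistic-regression "model" (the weights, input, and epsilon are all made up for illustration): the perturbation moves every feature by eps in the direction that increases the loss, which is enough to flip the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y_true, eps):
    """One FGSM step on a toy logistic-regression model.

    The gradient of the cross-entropy loss w.r.t. the input is
    (sigmoid(w.x + b) - y_true) * w; FGSM nudges each feature by
    eps in the sign of that gradient to increase the loss.
    """
    p = sigmoid(w @ x + b)
    grad = (p - y_true) * w
    return x + eps * np.sign(grad)

w = np.array([1.0, -2.0, 0.5])   # hypothetical model weights
b = 0.0
x = np.array([0.6, -0.4, 0.2])   # clean input, scored as class 1
y_true = 1.0

clean_score = sigmoid(w @ x + b)                    # above 0.5: class 1
x_adv = fgsm_perturb(x, w, b, y_true, eps=0.5)
adv_score = sigmoid(w @ x_adv + b)                  # pushed below 0.5: class 0
```

Each feature moved by at most 0.5, yet the decision flips, which is exactly the failure mode the defenses above (input bounds, smoothing, confidence thresholds) try to blunt.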
Data poisoning attacks inject malicious samples into the training dataset to corrupt the model's learned behavior. Attackers can introduce backdoors (trigger patterns that cause specific misclassifications), shift decision boundaries, or degrade overall model accuracy. This is especially dangerous when training data comes from the internet or user-generated content.
A poisoned model may behave normally on clean inputs but misclassify whenever a specific trigger pattern appears. For example, a poisoned malware classifier might label as benign any sample containing a particular byte sequence. The attack is stealthy because the model's accuracy on clean data remains high.
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training on unvalidated, crowdsourced data
data = pd.read_csv("user_submitted_data.csv")  # No validation!
# No outlier detection or data quality checks
X = data.drop("label", axis=1)
y = data["label"]

model = RandomForestClassifier()
model.fit(X, y)  # Training directly on untrusted data!
# No comparison against clean baseline
# No data provenance tracking
```
```python
import hashlib
import logging

import pandas as pd
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.model_selection import cross_val_score

log = logging.getLogger(__name__)
BASELINE_ACCURACY = 0.90  # accuracy of the last trusted training run

# Load data with provenance tracking
data = pd.read_csv("training_data.csv")
data_hash = hashlib.sha256(data.to_csv().encode()).hexdigest()
log.info(f"Training data hash: {data_hash}")

X = data.drop("label", axis=1)
y = data["label"]

# Detect and remove anomalous samples
iso_forest = IsolationForest(contamination=0.05, random_state=42)
outlier_mask = iso_forest.fit_predict(X) == 1
X_clean, y_clean = X[outlier_mask], y[outlier_mask]
log.info(f"Removed {(~outlier_mask).sum()} outliers from {len(X)} samples")

# Train and validate against baseline
model = RandomForestClassifier(random_state=42)
scores = cross_val_score(model, X_clean, y_clean, cv=5)
if scores.mean() < BASELINE_ACCURACY - 0.05:
    raise ValueError("Model accuracy dropped — possible data poisoning")
model.fit(X_clean, y_clean)
```
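The boundary-shifting effect is easy to see on synthetic data. The sketch below uses a deliberately simple, hypothetical "classifier" (a threshold halfway between class means on a one-dimensional score): injecting mislabeled high-score samples into the benign class drags the decision threshold toward the malicious cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic anomaly scores for two classes
benign = rng.normal(0.0, 1.0, 500)
malicious = rng.normal(4.0, 1.0, 500)

def midpoint_threshold(benign_scores, malicious_scores):
    # Toy classifier: decision threshold halfway between the class means
    return (benign_scores.mean() + malicious_scores.mean()) / 2

clean_threshold = midpoint_threshold(benign, malicious)

# Poisoning: attacker submits malicious-looking samples labeled "benign"
poison = rng.normal(4.0, 1.0, 200)
poisoned_benign = np.concatenate([benign, poison])
poisoned_threshold = midpoint_threshold(poisoned_benign, malicious)
# The threshold moves toward the malicious cluster, so more real attacks
# now fall on the "benign" side of the boundary
```

Outlier removal as in the defense above would flag most of the injected samples, since they sit far from the benign cluster.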
Model inversion attacks reconstruct sensitive training data by querying a model and analyzing its outputs. Attackers can recover private information used during training, such as faces, medical records, or personal data. This is especially serious for models trained on sensitive datasets (healthcare, biometrics, financial data).
Model inversion exposes personally identifiable information from the training data, violating data privacy regulations (GDPR, HIPAA). An attacker with API access to a face recognition model can reconstruct the faces of individuals in the training set.
```python
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = load_model("face_classifier.h5")  # load_model/preprocess defined elsewhere

@app.route("/predict", methods=["POST"])
def predict():
    image = request.files["image"]
    result = model.predict(preprocess(image))
    # Returns full probability vector — enables model inversion!
    return jsonify({
        "probabilities": result.tolist(),  # All class probabilities!
        "prediction": int(np.argmax(result)),
        "confidence": float(np.max(result))
    })

# No rate limiting, no query logging
# Unlimited API access for gradient estimation
```
```python
import numpy as np
from flask import Flask, request, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(get_remote_address, app=app, default_limits=["100/hour"])
model = load_model("face_classifier_dp.h5")  # Trained with differential privacy

@app.route("/predict", methods=["POST"])
@limiter.limit("100/hour")
def predict():
    image = request.files["image"]
    result = model.predict(preprocess(image))
    # Return only top-1 prediction — no probability vector
    prediction = int(np.argmax(result))
    log_query(request.remote_addr, prediction)  # Audit logging
    return jsonify({
        "prediction": prediction  # No probabilities, no confidence scores
    })
```
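The "trained with DP" comment above refers to differential privacy. A lightweight, output-side relative is the report-noisy-max mechanism: add Laplace noise to the class scores and release only the winning label, which bounds what any single query can reveal. A minimal sketch (the epsilon value is illustrative, not a recommendation):

```python
import numpy as np

def report_noisy_max(scores, epsilon, rng=None):
    """Report-noisy-max: add Laplace(1/epsilon) noise to each score and
    return only the index of the largest noisy score. No probabilities
    or confidence values are ever released."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(scale=1.0 / epsilon, size=len(scores))
    return int(np.argmax(np.asarray(scores) + noise))

scores = [0.01, 0.95, 0.04]
# Large epsilon (little noise) keeps the true winner; smaller epsilon
# trades accuracy for stronger privacy.
label = report_noisy_max(scores, epsilon=100.0, rng=np.random.default_rng(0))
```

Smaller epsilon values make the released label noisier, which is the privacy/utility trade-off every DP deployment has to tune.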
Membership inference attacks determine whether a specific data point was part of a model's training dataset. By analyzing the model's confidence scores and its behavior on known versus unknown inputs, attackers can infer private membership information. This is a significant privacy threat for models trained on sensitive data.
Membership inference can reveal that a specific individual's data was used for training, for example confirming that a patient's record appears in a clinical dataset, or that someone's face was used to train a surveillance model. This violates privacy expectations and may breach regulations such as GDPR.
```python
from sklearn.neural_network import MLPClassifier

# Overfitted model — memorizes training data
model = MLPClassifier(
    hidden_layer_sizes=(512, 512, 256),  # Over-parameterized!
    max_iter=1000,
    # No regularization
    # No early stopping
)
model.fit(X_train, y_train)
# Model memorizes training data → membership inference possible
# Training accuracy: 99.9% vs Test accuracy: 82%
# This gap indicates overfitting = information leakage

def predict_with_confidence(x):
    proba = model.predict_proba([x])[0]
    return {"probabilities": proba.tolist()}  # Leaks membership info!
```
```python
from sklearn.neural_network import MLPClassifier

# Regularized model with early stopping to reduce overfitting
model = MLPClassifier(
    hidden_layer_sizes=(128, 64),
    max_iter=500,
    alpha=0.01,  # L2 regularization
    early_stopping=True,  # Prevents memorization
    validation_fraction=0.15,
)
model.fit(X_train, y_train)

# Verify train/test gap is small (low overfitting)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
assert train_acc - test_acc < 0.05, "Overfitting detected!"

def predict_secure(x):
    pred = model.predict([x])[0]
    return {"prediction": int(pred)}  # Label only, no probabilities
```
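Why the train/test gap matters can be shown with the simplest attack in the literature, a confidence threshold (the confidence distributions below are synthetic, made up for illustration): when an overfitted model is systematically more confident on training members than on non-members, a single cutoff already separates the two groups.

```python
import numpy as np

def membership_guess(confidences, threshold=0.9):
    """Naive attack: guess 'was in the training set' whenever the model
    is unusually confident on the sample."""
    return np.asarray(confidences) > threshold

rng = np.random.default_rng(0)
# Synthetic confidences from a hypothetical overfitted model
member_conf = np.clip(rng.normal(0.97, 0.02, 1000), 0, 1)     # memorized
nonmember_conf = np.clip(rng.normal(0.75, 0.10, 1000), 0, 1)  # unseen data

correct = np.concatenate([
    membership_guess(member_conf),      # should be mostly True
    ~membership_guess(nonmember_conf),  # should be mostly True
])
attack_accuracy = correct.mean()  # well above the 0.5 random baseline
```

Regularization and early stopping, as in the defense above, narrow the confidence gap and push this attack back toward coin-flip accuracy.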
Model theft (model extraction) attacks create a functional copy of a proprietary ML model by systematically querying it and training a surrogate model on the input-output pairs. The stolen model can be used to find adversarial examples, to compete commercially, or to reverse-engineer the model's training data.
A stolen model means loss of intellectual property and competitive advantage. The extracted copy can be used offline to craft adversarial attacks or to study decision boundaries. A multimillion-dollar training investment can be replicated with a few thousand API queries.
```python
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = load_proprietary_model()  # Defined elsewhere

@app.route("/predict", methods=["POST"])
def predict():
    data = request.json["features"]
    result = model.predict_proba([data])[0]
    # Returns full probability distribution
    return jsonify({
        "probabilities": result.tolist(),
        "prediction": int(np.argmax(result))
    })

# No rate limiting — unlimited queries
# No anomaly detection on query patterns
# Attacker can extract model with ~10K queries
```
```python
from flask import Flask, request, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(get_remote_address, app=app, default_limits=["50/hour"])

# Watermarked model for theft detection
model = load_watermarked_model()
query_monitor = QueryPatternDetector()  # Custom query-pattern analyzer

@app.route("/predict", methods=["POST"])
@limiter.limit("50/hour")
def predict():
    data = request.json["features"]
    api_key = request.headers.get("X-API-Key")
    # Detect extraction patterns (uniform sampling, grid queries)
    if query_monitor.is_suspicious(api_key, data):
        log_alert(f"Possible extraction: {api_key}")
        return jsonify({"error": "rate limited"}), 429
    result = model.predict([data])[0]
    return jsonify({
        "prediction": int(result)  # Label only, no probabilities
    })
```
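The extraction loop itself can be sketched end to end. Everything below is synthetic: a hypothetical black-box "victim" with a secret linear boundary, queried on random inputs, and a least-squares surrogate fitted on the returned labels.

```python
import numpy as np

rng = np.random.default_rng(1)
SECRET_W = np.array([2.0, -1.0])  # The victim's hidden parameters

def victim_predict(X):
    """Black-box API: returns only labels, never the parameters."""
    return (X @ SECRET_W > 0.5).astype(int)

# Attacker queries the API on random inputs...
X_query = rng.uniform(-1, 1, size=(2000, 2))
y_query = victim_predict(X_query)

# ...and fits a surrogate linear separator by least squares on +/-1 targets
X_aug = np.hstack([X_query, np.ones((len(X_query), 1))])
w_hat, *_ = np.linalg.lstsq(X_aug, y_query * 2.0 - 1.0, rcond=None)

def surrogate_predict(X):
    return (np.hstack([X, np.ones((len(X), 1))]) @ w_hat > 0).astype(int)

# The copy agrees with the victim on most fresh inputs
X_test = rng.uniform(-1, 1, size=(1000, 2))
agreement = (surrogate_predict(X_test) == victim_predict(X_test)).mean()
```

The uniform random query pattern used here is exactly the kind of behavior a `QueryPatternDetector` (as assumed in the defense above) would try to flag, and rate limiting raises the cost of collecting the query budget.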
AI supply chain attacks target the AI development pipeline: pre-trained models from model hubs, third-party datasets, ML frameworks, and their dependencies. Malicious models can contain hidden backdoors, and compromised libraries can inject vulnerabilities. Serialization formats used by ML frameworks (pickle, SavedModel) can execute arbitrary code at load time.
Loading a malicious model file can execute arbitrary code (pickle deserialization attack). Pre-trained models from untrusted sources may contain backdoors. A compromised ML library affects every downstream user. Compared to the traditional software supply chain, the ML supply chain has fewer security controls.
```python
import pickle
import torch

# Loading untrusted model — arbitrary code execution!
with open("model_from_internet.pkl", "rb") as f:
    model = pickle.load(f)  # DANGEROUS: can execute any code!

# Loading unverified PyTorch model
model = torch.load("untrusted_model.pt")  # Uses pickle internally!

# Using unvetted model from public hub
from transformers import AutoModel
model = AutoModel.from_pretrained("random-user/suspicious-model")
# No hash verification, no security scan
```
```python
import hashlib

import torch
from safetensors.torch import load_file

# Verify model hash before loading
EXPECTED_HASH = "sha256:a1b2c3d4..."
with open("model.safetensors", "rb") as f:
    actual_hash = "sha256:" + hashlib.sha256(f.read()).hexdigest()
assert actual_hash == EXPECTED_HASH, "Model integrity check failed!"

# Use SafeTensors — no arbitrary code execution
model_state = load_file("model.safetensors")  # Safe format!
model = MyModel()  # Your model architecture
model.load_state_dict(model_state)

# Use trusted models from verified sources
from transformers import AutoModel
model = AutoModel.from_pretrained(
    "bert-base-uncased",  # Well-known, verified source
    revision="a265f77",  # Pin to specific commit
)
```
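The arbitrary-code-execution risk is easy to demonstrate: pickle's `__reduce__` hook lets an object name any callable to invoke at load time. The harmless `sneaky` function below stands in for what a real attack would make `os.system` or `subprocess.run`:

```python
import pickle

executed = []

def sneaky(msg):
    # In a real attack this would be os.system, subprocess.run, ...
    executed.append(msg)

class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sneaky(...)"
        return (sneaky, ("code ran at load time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # Merely loading the bytes triggers the call
print(executed)     # ['code ran at load time']
```

No method of the loaded object is ever called; deserialization alone runs the payload, which is why SafeTensors (a pure tensor container with no code paths) is the safer default.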
Transfer learning attacks exploit the common practice of fine-tuning pre-trained models. A backdoor embedded in the base model can survive fine-tuning and remain active in the downstream model. An attacker who publishes a popular pre-trained model can compromise every application that builds on it.
Backdoors in pre-trained models survive fine-tuning because they are embedded in deep layers that are typically frozen during transfer learning. A single compromised base model can affect thousands of downstream applications. The attack scales well and is hard to detect.
```python
from transformers import AutoModelForSequenceClassification

# Fine-tuning an unvetted pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    "unknown-user/bert-finetuned-sentiment",  # Untrusted source!
    num_labels=2
)

# Freezing base layers — preserves any hidden backdoor
for param in model.base_model.parameters():
    param.requires_grad = False  # Backdoor in frozen layers persists!

# Fine-tune only the classification head
trainer.train()  # Backdoor remains undetected
```
```python
from transformers import AutoModelForSequenceClassification
from neural_cleanse import BackdoorDetector  # Illustrative backdoor scanner

# Use only verified, trusted base models
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # Trusted source
    num_labels=2,
    revision="main",
)

# Scan pre-trained model for backdoors before fine-tuning
detector = BackdoorDetector(model)
if detector.scan():
    raise SecurityError("Potential backdoor detected in base model")

# Fine-tune ALL layers (not just the head) to overwrite potential backdoors
for param in model.parameters():
    param.requires_grad = True  # Train all layers

# Validate with a clean test set plus a trigger test set
trainer.train()
evaluate_for_backdoors(model, trigger_test_set)
```
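The `evaluate_for_backdoors` step above is left abstract. One concrete check is to measure how often stamping a candidate trigger onto clean inputs flips the prediction to a single target label. The sketch below uses a hypothetical backdoored model (it outputs label 1 whenever the first feature is saturated) to show the metric in action:

```python
import numpy as np

def backdoored_predict(X):
    """Stand-in for a poisoned model: label 1 whenever the first feature
    is saturated (the planted trigger), else label 0."""
    return (np.asarray(X)[:, 0] >= 255).astype(int)

def add_trigger(x):
    x = np.array(x, dtype=float)
    x[0] = 255.0  # Stamp the trigger pattern onto the input
    return x

def backdoor_activation_rate(predict_fn, X_clean, trigger_fn, target_label):
    """Fraction of clean inputs that flip to the target label once the
    trigger is applied. A rate near 1.0 across many inputs is a strong
    backdoor signal."""
    X_trig = np.array([trigger_fn(x) for x in X_clean])
    clean_pred = predict_fn(X_clean)
    trig_pred = predict_fn(X_trig)
    return float(np.mean((trig_pred == target_label) & (clean_pred != target_label)))

X_clean = np.random.default_rng(0).uniform(0, 100, size=(50, 4))
rate = backdoor_activation_rate(backdoored_predict, X_clean, add_trigger, 1)
# rate == 1.0 here: every triggered input flips to the target label
```

In practice the hard part is guessing candidate triggers, which is what tools in the Neural Cleanse family attempt by optimizing for minimal trigger patterns per label.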
Model skew (training-serving skew) occurs when the production data distribution differs significantly from the training distribution. It can happen naturally over time (data drift), or it can be induced deliberately by attackers who manipulate the production input distribution to degrade model performance or bias predictions.
Model skew causes silent failures in which the model produces confident but incorrect predictions. In financial systems, an attacker can exploit skew to evade fraud detection. In recommender systems, skew can be induced deliberately to promote specific content or products.
```python
import joblib

# Deploy model with no drift monitoring
model = joblib.load("model_trained_2023.pkl")

def predict(features):
    # No check if input distribution has changed
    # No feature validation against training schema
    return model.predict([features])[0]

# Model may be months/years old
# No monitoring of prediction distribution
# Silent degradation goes undetected
```
```python
import logging

import joblib

log = logging.getLogger(__name__)
model = joblib.load("model.pkl")
training_stats = joblib.load("training_stats.pkl")  # Per-feature mean/std

def predict_with_monitoring(features):
    # Validate feature schema and ranges
    for i, (val, stat) in enumerate(zip(features, training_stats)):
        z_score = abs((val - stat["mean"]) / stat["std"])
        if z_score > 5:
            log.warning(f"Feature {i} out of distribution: z={z_score:.1f}")
    prediction = model.predict([features])[0]
    # Log prediction distribution for drift monitoring
    metrics_collector.log(features, prediction)
    # Periodic drift detection runs as a separate monitoring job,
    # e.g. with Evidently's ColumnDriftMetric:
    #   drift_report = ColumnDriftMetric().calculate(reference, current)
    # Alert if drift is detected → trigger retraining
    return prediction
```
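The periodic drift-detection job sketched in the comments can be as simple as a per-feature two-sample Kolmogorov-Smirnov test. The data and the alpha threshold below are illustrative:

```python
import numpy as np
from scipy import stats

def feature_drifted(reference, production, alpha=0.001):
    """Two-sample KS test: True when a feature's production distribution
    differs significantly from the training-time reference window."""
    _, p_value = stats.ks_2samp(reference, production)
    return bool(p_value < alpha)

rng = np.random.default_rng(7)
train_feature = rng.normal(0.0, 1.0, 5000)   # Reference window
prod_stable = rng.normal(0.0, 1.0, 5000)     # Same distribution: no alarm
prod_shifted = rng.normal(0.8, 1.0, 5000)    # Mean drifted: should alarm
```

Running one such test per feature on a schedule, and alerting (or triggering retraining) when a feature drifts, covers both natural drift and deliberately induced skew.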
Output integrity attacks tamper with model predictions after they leave the model and before they reach the consuming application. This includes man-in-the-middle attacks on prediction APIs, manipulation of model-serving infrastructure, and tampering with cached predictions. The target is the inference pipeline rather than the model itself.
Tampered predictions cause downstream systems to make wrong decisions: approving fraudulent transactions, misdiagnosing medical conditions, or overriding safety systems. Because the model itself is not compromised, standard model monitoring does not detect the attack.
```python
import requests

# Consuming model predictions over unencrypted HTTP
def get_prediction(features):
    response = requests.post(
        "http://ml-service/predict",  # HTTP, not HTTPS!
        json={"features": features}
    )
    result = response.json()
    # No integrity verification of the response
    # No validation of prediction format
    return result["prediction"]  # Could be tampered!
```
```python
import hashlib
import hmac

import requests

def get_prediction(features):
    response = requests.post(
        "https://ml-service/predict",  # HTTPS (TLS)
        json={"features": features},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        verify=True  # Verify TLS certificate
    )
    result = response.json()

    # Verify response integrity: HMAC over the raw response bytes,
    # so client and server hash exactly the same payload
    signature = response.headers.get("X-Signature", "")
    expected = hmac.new(
        SHARED_SECRET, response.content, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise IntegrityError("Response signature mismatch!")

    # Validate prediction is within expected range
    pred = result["prediction"]
    if pred not in VALID_LABELS:
        raise ValueError(f"Unexpected prediction: {pred}")
    return pred
```
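For the client-side check above to work, the serving side must sign the exact bytes it sends. A minimal sketch of both halves (the shared key is a placeholder; in production it would come from a secret manager and be rotated):

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"replace-with-managed-secret"  # Placeholder key

def sign_body(body: bytes) -> str:
    """Server side: HMAC-SHA256 over the exact response bytes."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify_body(body: bytes, signature: str) -> bool:
    """Client side: constant-time comparison against the recomputed MAC."""
    return hmac.compare_digest(signature, sign_body(body))

# Canonical serialization (sorted keys) so both sides hash identical bytes
body = json.dumps({"prediction": 1}, sort_keys=True).encode()
sig = sign_body(body)

tampered = json.dumps({"prediction": 0}, sort_keys=True).encode()
# verify_body(body, sig)      -> True
# verify_body(tampered, sig)  -> False
```

`hmac.compare_digest` is used instead of `==` to avoid timing side channels when comparing signatures.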
Model poisoning directly modifies a trained model's weights, parameters, or architecture to implant backdoors or alter behavior. Unlike data poisoning (which corrupts training data), model poisoning targets the model artifact itself, via compromised model repositories, insider threats, or supply chain attacks on model storage and deployment pipelines.
A directly poisoned model can contain highly targeted backdoors that are nearly impossible to detect with standard testing. The attacker gains precise control over the model's behavior on trigger inputs. If the model registry or deployment pipeline is compromised, every deployment ships the poisoned model.
```python
import mlflow

# Loading model from registry with no integrity checks
model_uri = "models:/fraud-detector/Production"
model = mlflow.pyfunc.load_model(model_uri)
# No hash verification
# No signature validation
# No comparison with expected model metrics
# Model registry has weak access controls
# Anyone with push access can replace the model

predictions = model.predict(new_data)
```
```python
import mlflow
from sigstore.verify import Verifier  # sigstore-python; call sketched below

# Verify model signature before loading
model_uri = "models:/fraud-detector/Production"
model_path = mlflow.artifacts.download_artifacts(model_uri)

# Cryptographic signature verification (see the sigstore-python docs for
# the exact verify call and policy objects)
verifier = Verifier.production()
verifier.verify(
    model_path,
    expected_identity="ml-team@company.com"
)

# Verify model hash against approved registry
model_hash = hash_directory(model_path)  # Helper: SHA-256 over all files
approved_hash = get_approved_hash("fraud-detector", "Production")
assert model_hash == approved_hash, "Model integrity check failed!"

# Validate model metrics on a reference dataset before serving
model = mlflow.pyfunc.load_model(model_path)
ref_score = evaluate(model, reference_dataset)
assert ref_score >= MINIMUM_ACCURACY, "Model quality below threshold"

predictions = model.predict(new_data)
```
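The `hash_directory` helper used above is not defined in the snippet. One possible implementation walks the artifact directory in sorted order, hashing both relative paths and file contents, so identical artifacts always yield the same digest:

```python
import hashlib
from pathlib import Path

def hash_directory(path):
    """SHA-256 over every file (relative path plus contents) under `path`,
    visited in sorted order for a deterministic digest."""
    root = Path(path)
    digest = hashlib.sha256()
    for file in sorted(root.rglob("*")):
        if file.is_file():
            digest.update(file.relative_to(root).as_posix().encode())
            digest.update(file.read_bytes())
    return digest.hexdigest()
```

Including the relative path in the digest means renaming or moving a file changes the hash, not just editing its bytes, which is the behavior an integrity check wants.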
| ID | Vulnerability | Severity | Key mitigations |
|---|---|---|---|
| ML01 | Input Manipulation Attack | Critical | Adversarial training, input validation, confidence thresholds |
| ML02 | Data Poisoning Attack | Critical | Outlier detection, data provenance, baseline comparison |
| ML03 | Model Inversion Attack | High | Differential privacy, minimal outputs, rate limiting |
| ML04 | Membership Inference Attack | High | Regularization, DP training, no probability exposure |
| ML05 | Model Theft | Critical | Rate limiting, watermarking, query pattern detection |
| ML06 | AI Supply Chain Attacks | Critical | SafeTensors, hash verification, trusted sources only |
| ML07 | Transfer Learning Attack | High | Trusted base models, backdoor scanning, full fine-tuning |
| ML08 | Model Skewing | Medium | Drift monitoring, input validation, automated retraining |
| ML09 | Output Integrity Attack | High | TLS/mTLS, response signing, output validation |
| ML10 | Model Poisoning | Critical | Model signing, registry access control, metric validation |