**Title:**
Risks of Adversarial Example Attacks
**Definition:**
Adversarial example
A perturbation imperceptible to the human eye is added to an original sample, forming an adversarial example that causes the AI model to make an incorrect judgment (the perturbation does not affect human recognition, but easily fools the model)
**Visual Content Description:**
The image displays three panels illustrating the concept of adversarial examples.
Panel 1: An image of a panda, labeled `x`, "panda" (熊猫), 57.7% confidence.
Between Panel 1 and Panel 2 is the text "+ .007 ×".
Panel 2: An image of a noise pattern, labeled `sign(∇ₓJ(θ, x, y))`, "nematode" (线虫), 8.2% confidence.
Between Panel 2 and Panel 3 is the text "=".
Panel 3: An image of a panda, visually almost identical to the first, labeled `x + ε·sign(∇ₓJ(θ, x, y))`, "gibbon" (长臂猿), 99.3% confidence.
The figure shows that the original image `x`, recognized by the model as a panda with 57.7% confidence, is modified by adding a small perturbation: the noise pattern `sign(∇ₓJ(θ, x, y))` scaled by ε = .007. The resulting image `x + ε·sign(∇ₓJ(θ, x, y))` looks almost identical to the original panda to a human eye, yet the model classifies it as a gibbon with 99.3% confidence. In the formula, `J` is the model's cost (loss) function, `θ` the model parameters, `y` the true label, and `∇ₓJ(θ, x, y)` the gradient of the cost with respect to the input `x`; the `sign` function keeps only the sign of each gradient component. Adding this signed gradient, scaled by the small value ε (here .007), to the original image `x` yields the adversarial example.
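The formula in the figure is the Fast Gradient Sign Method (FGSM). Below is a minimal sketch of generating an adversarial example this way, assuming a PyTorch classifier; the names `model`, `x`, `y`, and the helper `fgsm_attack` are illustrative, not from the original material.

```python
# Minimal FGSM sketch (assumptions: a PyTorch classifier `model` that outputs
# logits, an input batch `x` with pixel values in [0, 1], and true labels `y`;
# all names here are illustrative).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.007):
    """Return x + epsilon * sign(grad_x J(theta, x, y)), clipped to [0, 1]."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    loss.backward()                              # fills x.grad with grad_x J
    x_adv = x + epsilon * x.grad.sign()          # step along the sign of the gradient
    return x_adv.clamp(0.0, 1.0).detach()        # keep pixels in the valid range
```

On a typical image classifier, `model(fgsm_attack(model, x, y)).argmax(dim=1)` often differs from `y`, even though the perturbed image is visually indistinguishable from the original.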
**Impact:**
Adversarial examples seriously threaten model robustness, leading to severe consequences such as erroneous decisions and data leakage