**Title:**
Risks of Adversarial Example Attacks
**Definition:**
Adversarial example
A perturbation imperceptible to the human eye is added to an original sample, forming an adversarial example that causes the AI model to make an incorrect judgment (the perturbation does not affect human recognition, but easily fools the model)
**Visual Content Description:**
The image displays three panels illustrating the concept of adversarial examples.
Panel 1: An image of a panda, labeled `x`, "panda" (熊猫), 57.7% confidence.
Between Panel 1 and Panel 2 is the text "+ .007 ×".
Panel 2: An image of a noise pattern, labeled `sign(∇ₓJ(θ, x, y))`, "nematode" (线虫), 8.2% confidence.
Between Panel 2 and Panel 3 is the text "=".
Panel 3: An image of a panda, visually almost identical to the first, labeled `x + ε·sign(∇ₓJ(θ, x, y))`, "gibbon" (长臂猿), 99.3% confidence.
The figure shows that the original image `x`, recognized by the model as a panda with 57.7% confidence, is modified by adding a small perturbation: the noise pattern `sign(∇ₓJ(θ, x, y))` scaled by ε = .007. The resulting image `x + ε·sign(∇ₓJ(θ, x, y))` looks almost identical to the original panda to a human eye, yet the model classifies it as a gibbon with 99.3% confidence. In the formula, `J` is the model's cost (loss) function, `θ` the model parameters, `y` the true label, and `∇ₓJ(θ, x, y)` the gradient of the cost with respect to the input `x`; the `sign` function keeps only the sign of each gradient component. Adding this signed gradient, scaled by the small value ε (here .007), to the original image `x` yields the adversarial example.
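The formula in the figure is the Fast Gradient Sign Method (FGSM). Below is a minimal sketch of generating an adversarial example this way, assuming a PyTorch classifier; the names `model`, `x`, `y`, and the helper `fgsm_attack` are illustrative, not from the original material.

```python
# Minimal FGSM sketch (assumptions: a PyTorch classifier `model` that outputs
# logits, an input batch `x` with pixel values in [0, 1], and true labels `y`;
# all names here are illustrative).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.007):
    """Return x + epsilon * sign(grad_x J(theta, x, y)), clipped to [0, 1]."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    loss.backward()                              # fills x.grad with grad_x J
    x_adv = x + epsilon * x.grad.sign()          # step along the sign of the gradient
    return x_adv.clamp(0.0, 1.0).detach()        # keep pixels in the valid range
```

On a typical image classifier, `model(fgsm_attack(model, x, y)).argmax(dim=1)` often differs from `y`, even though the perturbed image is visually indistinguishable from the original.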
**Impact:**
Adversarial examples seriously threaten model robustness, leading to severe consequences such as erroneous decisions and data leakage