Create a 4-minute educational video that teaches how we measure pair-wise non-linearity across 12 conditions drug-fitness phenotypes (66 pairs).
The video must be broken into six chapters, each with clear on-screen titles, concise voice-over text, and animated visuals.
Use a friendly but academically precise tone and plenty of LaTeX math where indicated.
Ch 1 Problem Setup
• Show a 3 × 4 grid of labelled drug conditions.
• Explain that 12 conditions ⇒ 66 unique pairs.
• Pose the central question: “Is the relationship between each pair linear or non-linear?”
Ch 2 Linear vs GAM models
• Display a scatter plot for one hypothetical pair.
• Draw the best-fit straight line (OLS) and write the equation
𝑦
=
𝛽
0
+
𝛽
1
𝑥
y=β
0
+β
1
x.
• Overlay a smooth spline (GAM) and show
𝑦
=
𝛼
+
𝑓
(
𝑥
)
y=α+f(x) with
𝑓
(
𝑥
)
=
∑
𝑏
𝑘
𝐵
𝑘
(
𝑥
)
f(x)=∑b
k
B
k
(x).
• Briefly define “smoothing penalty”
𝜆
∫
(
𝑓
′
′
(
𝑥
)
)
2
𝑑
𝑥
λ∫(f
′′
(x))
2
dx and “effective d.f.”
𝑘
=
t
r
a
c
e
(
𝑆
)
k=trace(S).
Ch 3 AIC Formula & Intuition
• Write the general form
A
I
C
=
2
𝑘
−
2
ℓ
AIC=2k−2ℓ.
• Define log-likelihood
ℓ
=
−
𝑡
𝑓
𝑟
𝑎
𝑐
𝑛
2
ln
(
2
𝜋
𝜎
2
)
−
𝑡
𝑓
𝑟
𝑎
𝑐
1
2
𝜎
2
𝑠
𝑢
𝑚
(
𝑦
𝑖
−
𝑦
^
𝑖
)
2
ℓ=−
tfracn2ln(2πσ
2
)−
tfrac12σ
2
sum(y
i
−
y
^
i
)
2
.
• Explain verbally: “AIC rewards fit (higher ℓ) but penalises complexity (larger k). Lower is better.”
Ch 4 Worked Mini-Example
• Use
𝑛
=
100
n=100 points with a made-up
𝜎
2
=
0.02
σ
2
=0.02.
• Compute and display step-by-step:
•
ℓ
lin
=
−
120
⇒
A
I
C
lin
=
244
ℓ
lin
=−120⇒AIC
lin
=244 (k=2)
•
ℓ
GAM
=
−
100
⇒
A
I
C
GAM
=
212
ℓ
GAM
=−100⇒AIC
GAM
=212 (k=4.3)
• Present
Δ
A
I
C
=
−
32
ΔAIC=−32.
• Intuition caption: “Even after paying for extra wiggles, the spline is far more plausible.”
Ch 5 Decision Rule & Threshold
• Visual number line with a dashed vertical at
Δ
A
I
C
=
−
2
ΔAIC=−2.
• Label left zone “non-linear wins”, right zone “linear wins”.
• Explain evidence ratio
𝑒
−
Δ
/
2
e
−Δ/2
and why –2 ≈ 3× likelihood.
Ch 6 Scaling to 66 Pairs (50 s)
• Animate looping through all 66 pairs; for each, flash its
Δ
A
I
C
ΔAIC.
• Build a histogram in real time.
• Highlight that 43/66 bars fall left of –2 ⇒ 65 % nonlinear.
• Conclude with a take-home slide: “High pleiotropy and epistasis demand non-linear models.”
Technical specs
• Use subtle pastel colour-scheme, smooth camera pans, and readable fonts (MathJax for LaTeX).
视频信息
答案文本
视频字幕
We're studying how different drug combinations affect fitness. Here, we see twelve distinct conditions, representing various drug concentrations or combinations. From these twelve conditions, we can form sixty-six unique pairs. For each pair, we want to understand the relationship between the fitness observed under one condition and the fitness under the other. Specifically, is this relationship linear, or is it non-linear?
To answer this, we can model the relationship. A simple approach is a linear model, fitting a straight line to the data, like y equals beta zero plus beta one x. A more flexible approach is a Generalized Additive Model, or GAM, which uses a smooth spline function, y equals alpha plus f of x, where f of x is a sum of basis functions. GAMs can capture non-linear patterns. The smoothness of the spline is controlled by a penalty term, and the model's complexity is measured by its effective degrees of freedom, k.
How do we choose between a simple linear model and a more complex GAM? We use the Akaike Information Criterion, or AIC. The formula is AIC equals two k minus two ell. Here, k represents the model's complexity, and ell is the log-likelihood, which measures how well the model fits the data. Intuitively, AIC balances these two aspects: it rewards models that fit the data well, meaning higher log-likelihood, but penalizes models that are overly complex, meaning larger k. A lower AIC value indicates a more plausible model.
Let's walk through a quick example. Suppose we have one hundred data points and estimate the variance to be zero point zero two. We fit a linear model, which has k equals two parameters, and get a log-likelihood of negative one hundred twenty. Its AIC is two times two minus two times negative one hundred twenty, which is two hundred forty-four. Now, we fit a GAM, which might have an effective k of four point three, and get a better log-likelihood of negative one hundred. Its AIC is two times four point three minus two times negative one hundred, which is about two hundred twelve. The difference, delta AIC, is two hundred twelve minus two hundred forty-four, giving us negative thirty-two. This large negative value tells us the GAM is significantly better. Even though the GAM is more complex, its much better fit outweighs the complexity penalty.
We use the delta AIC value to decide. Delta AIC is the AIC of the GAM minus the AIC of the linear model. If delta AIC is negative, the GAM is better; if positive, the linear model is better. A common threshold for strong evidence of non-linearity is delta AIC less than negative two. This threshold is based on the evidence ratio, e to the negative delta over two, which compares the likelihood of the two models. A delta AIC of negative two means the GAM is roughly three times more likely than the linear model, providing substantial support for non-linearity.