Subproblem — Fix R, Solve for A (Sparse Sentence Selection)
LaTeX Formula
\min_{A} \; \|D - RA\|_F^2 + \lambda_1 \|A\|_{2,1}
Real-World Analogy (Keyword Summarizer)
A student already has keywords for each sentence. Now, she must choose which of those keyword-tagged sentences best represent the whole topic — picking only a few key ones that capture the essence.
Exact Use Case (Paper-Style)
Use: Select a sparse set of sentences when word-level compressions are known.
Why: To optimize sentence selection based on pre-computed compressed sentence representations.
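To make this concrete, here is a minimal numpy sketch of one standard way to solve this subproblem: proximal gradient descent with row-wise group soft-thresholding, the proximal operator of the ℓ₂,₁ norm. The function names (`prox_l21`, `solve_A`) and the assumed shapes (D: m×n, R: m×k, A: k×n, with rows of A indexing candidate sentences) are illustrative, not taken from the paper.

```python
import numpy as np

def prox_l21(A, tau):
    """Proximal operator of tau * ||A||_{2,1}: shrink each row's l2 norm by tau.
    Rows whose norm falls below tau are zeroed out, i.e. the sentence is dropped."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * A

def solve_A(D, R, lam1, n_iters=200):
    """Proximal gradient descent on ||D - R A||_F^2 + lam1 * ||A||_{2,1}, with R fixed."""
    A = np.zeros((R.shape[1], D.shape[1]))
    step = 1.0 / (2.0 * np.linalg.norm(R, 2) ** 2 + 1e-12)  # 1/L for the smooth term
    for _ in range(n_iters):
        grad = 2.0 * R.T @ (R @ A - D)              # gradient of the Frobenius term
        A = prox_l21(A - step * grad, step * lam1)  # gradient step, then group shrinkage
    return A
```

Rows of the returned A that survive the shrinkage correspond to the selected sentences.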
Problem 6: Subproblem — Fix A, Solve for R (Word-Level Compression)
LaTeX Formula
\min_{R} \; \|D - RA\|_F^2 + \lambda_2 \sum_{i} \|R_i\|_1
Real-World Analogy (Haiku Poet)
A poet has already picked her favorite lines. Now she rewrites each one with the fewest, most powerful words, trimming the excess like a haiku master.
Exact Use Case (Paper-Style)
Use: Perform word-level compression of selected sentences by encouraging sparsity in word usage via ℓ₁ regularization.
Why: To compress sentences without losing meaning, making summaries more concise and readable.
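By symmetry with the previous subproblem, a minimal sketch using the same proximal gradient scheme, now with elementwise soft-thresholding (the proximal operator of the ℓ₁ penalty). Again, `soft_threshold` and `solve_R` are hypothetical names used only for illustration.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: shrink every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def solve_R(D, A, lam2, n_iters=200):
    """Proximal gradient descent on ||D - R A||_F^2 + lam2 * sum_i ||R_i||_1, with A fixed."""
    R = np.zeros((D.shape[0], A.shape[0]))
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1/L for the smooth term
    for _ in range(n_iters):
        grad = 2.0 * (R @ A - D) @ A.T                      # gradient of the Frobenius term
        R = soft_threshold(R - step * grad, step * lam2)    # gradient step, then shrinkage
    return R
```

Since the row-wise sum of ℓ₁ norms equals the elementwise ℓ₁ norm of the whole matrix, the entrywise soft-threshold is exactly the right proximal step here.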
Video Transcript
Text summarization can be elegantly formulated as a matrix factorization problem. We decompose the document-term matrix D into two components: matrix R which handles word-level compression, and matrix A which performs sentence selection. This factorization D equals R times A provides the mathematical foundation for our two-step optimization approach.
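In symbols, with dimensions assumed here purely for illustration (m vocabulary terms, n columns of D, k candidate sentences):

```latex
D \approx R A, \qquad D \in \mathbb{R}^{m \times n},\;
R \in \mathbb{R}^{m \times k},\; A \in \mathbb{R}^{k \times n},
\qquad
\min_{R, A} \; \|D - RA\|_F^2 + \lambda_1 \|A\|_{2,1} + \lambda_2 \sum_i \|R_i\|_1
```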
In the first subproblem, we fix matrix R and solve for A. This is like a keyword summarizer where a student already has keywords for each sentence and must choose which keyword-tagged sentences best represent the whole topic. The optimization formula minimizes reconstruction error while the mixed norm regularization encourages sparsity at the sentence level, effectively selecting only the most representative sentences.
The mixed norm regularization uses the ℓ₂,₁ norm, which is the sum of ℓ₂ norms of each row. This encourages entire rows to become zero rather than individual elements, effectively eliminating sentences. The parameter λ₁ controls the sparsity level: higher values select fewer sentences, while lower values retain more sentences. This row-wise sparsity is crucial for sentence selection in text summarization.
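A tiny numpy illustration of this row-wise definition (illustrative code, not from the source):

```python
import numpy as np

def l21_norm(A):
    """||A||_{2,1}: the sum of the l2 norms of the rows of A."""
    return np.linalg.norm(A, axis=1).sum()

A = np.array([[3.0, 4.0],    # row norm 5 -> this sentence stays
              [0.0, 0.0]])   # zero row   -> this sentence is eliminated
print(l21_norm(A))  # 5.0: only the surviving row contributes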
In the second subproblem, we fix matrix A and solve for R. This is like a haiku poet who has already picked her favorite lines and now rewrites each using the least number of powerful words. The ℓ₁ regularization promotes sparsity in word usage, effectively compressing sentences by eliminating unnecessary words while preserving meaning. The parameter λ₂ controls the compression level.
The complete algorithm uses alternating optimization, switching between fixing R to solve for A, and fixing A to solve for R. We start with random initialization, then iteratively optimize each matrix while keeping the other fixed. The objective function decreases with each iteration until convergence. This process simultaneously performs sentence selection and word compression, creating progressively better summaries through multiple iterations until the optimal balance is achieved.
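Putting it together, a sketch of the alternating loop; it assumes the hypothetical `solve_A` and `solve_R` routines from the earlier sketches are in scope, and the fixed outer-iteration count stands in for a proper convergence test on the objective.

```python
import numpy as np

def summarize(D, k, lam1, lam2, n_outer=20, seed=0):
    """Alternating minimization of
    ||D - R A||_F^2 + lam1 * ||A||_{2,1} + lam2 * sum_i ||R_i||_1."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((D.shape[0], k))  # random initialization
    A = np.zeros((k, D.shape[1]))
    for _ in range(n_outer):                  # in practice: stop when the objective plateaus
        A = solve_A(D, R, lam1)               # fix R, select sentences (rows of A)
        R = solve_R(D, A, lam2)               # fix A, compress word usage (entries of R)
    selected = np.flatnonzero(np.linalg.norm(A, axis=1) > 1e-8)  # surviving rows = summary
    return R, A, selected
```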