Explain this problem and teach me how to understand and memorize it.

---

**Overall Title:** Calculation and Program Review (计算和程序复习)

**Topic:** ① Language Model - n-gram Model

**Example (Unigram Model):**

* **Sentence S:** 有意见分歧 ("there is a difference of opinion")
* **Segmentation W1:** 有 / 意见 / 分歧
* **Segmentation W2:** 有意 / 见 / 分歧

**Formula:** P(w_i) = n / N, where n is the number of occurrences of w_i in the corpus and N is the total number of words in the corpus.

**Probability Table:**

| Word | Probability |
| :--- | :---------- |
| ... | ... |
| 有 | 0.0180 |
| 有意 | 0.0005 |
| 意见 | 0.0010 |
| 见 | 0.0002 |
| 分歧 | 0.0001 |
| ... | ... |

**Diagram Description:**

* **Type:** Directed graph.
* **Nodes:** Labeled 0, 1, 2, 3, 4, 5, arranged roughly horizontally in increasing order.
* **Edges:** Directed arrows connecting nodes.
  * 0 -> 1 labeled "有"
  * 1 -> 2 labeled "意"
  * 2 -> 3 labeled "见"
  * 3 -> 4 labeled "分"
  * 4 -> 5 labeled "歧"
  * 0 -> 2 labeled "有意" (dashed arrow)
  * 1 -> 3 labeled "意见" (the two-character edge that Path 1 uses)
  * 3 -> 5 labeled "分歧" (dashed arrow; it spans the "分" and "歧" edges)
* **Interpretation:** The nodes are the boundary positions between characters, and each edge is a single character or a multi-character word. Every path from node 0 to node 5 is one possible segmentation of the sentence.

**Paths and Question:**

* **Path 1:** 0-1-3-5 (有 / 意见 / 分歧, i.e. W1)
* **Path 2:** 0-2-3-5 (有意 / 见 / 分歧, i.e. W2)
* **Question:** Which path should be taken? (该走哪一条路呢?)

**Bigram Model Example and Calculations:**

* **Heading:** Bigram Model (二元模型)
* **Formula Example:** P(W1) = P(有) × P(意见 | 有) × P(分歧 | 意见). This is the bigram form, in which each word is conditioned on the word before it.

**Calculated Probabilities (for segmentations W1 and W2):**

* P(W1) = P(有) × P(意见) × P(分歧) = 0.0180 × 0.0010 × 0.0001 = 1.8 × 10⁻⁹ (Note: this multiplies the table values directly, so it is the unigram calculation P(A, B, C) = P(A) P(B) P(C). The bigram formula above would need conditional probabilities, which the slide does not give.)
* P(W2) = P(有意) × P(见) × P(分歧) = 0.0005 × 0.0002 × 0.0001 = 1 × 10⁻¹¹ (Note: like P(W1), this is a unigram product of the table values.)
* **Probability Comparison:** P(W1) > P(W2)

**Note:** No options (A, B, C, D) are present in the image. The question "Which path should be taken?" refers to Paths 1 and 2 listed above. Path 1 corresponds to W1 and Path 2 corresponds to W2, so with P(W1) > P(W2) the answer implied by the calculations is Path 1.

**Section Heading:** ① Language Model - n-gram Model

**Mathematical Formula:** p(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})

**Problem Description:** Assume the sentence is s = {小孩, 喜欢, 在家, 观看, 动画片} (children, like, at home, watch, cartoons) and estimate its probability. Taking the bigram model as an example, we need to look up in the corpus each word and how often it appears together with its adjacent word. Assume the corpus contains 7542 words in total; the word counts are shown in the figure below.

**Diagram Description:**

* **Type:** A sequence of five partially overlapping circles, one per word, with the word count inside each circle and the count of each adjacent word pair labeled above the overlap.
* **Elements:**
  * Circle 1: "小孩", count 500.
  * Circle 2: "喜欢", count 3208.
  * Circle 3: "在家", count 987.
  * Circle 4: "观看", count 801.
  * Circle 5: "动画片", count 2046.
  * Between Circles 1 and 2: 351, i.e. count(小孩 喜欢)
  * Between Circles 2 and 3: 873, i.e. count(喜欢 在家)
  * Between Circles 3 and 4: 792, i.e. count(在家 观看)
  * Between Circles 4 and 5: 170, i.e. count(观看 动画片)

**Probability Calculation Description:** The probability of sentence s under the current corpus is calculated as follows.
**Probability Formula:** p(s) = p(小孩, 喜欢, 在家, 观看, 动画片) = p(小孩) × p(喜欢 | 小孩) × p(在家 | 喜欢) × p(观看 | 在家) × p(动画片 | 观看)

**Numerical Calculation:** p(s) = (500 / 7542) × (351 / 500) × (873 / 3208) × (792 / 987) × (170 / 801) ≈ 0.0663 × 0.7020 × 0.2721 × 0.8024 × 0.2122 ≈ 0.002157

**Concluding Sentence:** The probability of sentence s under the current corpus is approximately 0.002157.
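The bigram calculation above can be checked with a few lines of Python. This is only a sketch: the counts are copied from the problem statement, while the function and variable names (`bigram_sentence_probability`, `unigram_counts`, `bigram_counts`) are made up for illustration. It uses exact fractions so that rounding does not hide mistakes in the arithmetic.

```python
from fractions import Fraction

# Counts copied from the problem statement (corpus size and the figure).
TOTAL_WORDS = 7542
unigram_counts = {"小孩": 500, "喜欢": 3208, "在家": 987, "观看": 801, "动画片": 2046}
bigram_counts = {
    ("小孩", "喜欢"): 351,
    ("喜欢", "在家"): 873,
    ("在家", "观看"): 792,
    ("观看", "动画片"): 170,
}


def bigram_sentence_probability(words, unigram_counts, bigram_counts, total_words):
    """Return p(w1) * product of p(w_i | w_{i-1}) using raw count ratios."""
    # First word: plain unigram estimate count(w1) / N.
    prob = Fraction(unigram_counts[words[0]], total_words)
    # Every later word: count(prev cur) / count(prev).
    for prev, cur in zip(words, words[1:]):
        prob *= Fraction(bigram_counts[(prev, cur)], unigram_counts[prev])
    return prob


sentence = ["小孩", "喜欢", "在家", "观看", "动画片"]
p = bigram_sentence_probability(sentence, unigram_counts, bigram_counts, TOTAL_WORDS)
print(f"{float(p):.6f}")  # prints 0.002157
```

A memory aid that falls out of the loop: each factor's denominator is the count of the previous word, and its numerator is the count of the pair. The count of the last word (动画片, 2046) is never used.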

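The "which path should be taken" question from the unigram example can also be sketched in Python: enumerate every segmentation the vocabulary allows (every path from node 0 to node 5), multiply the word probabilities along each path, and keep the largest product. The probabilities are copied from the table; the table is truncated with "...", so restricting the vocabulary to the five listed words is an assumption, and the helper names (`segmentations`, `unigram_score`) are hypothetical.

```python
import math

# Unigram probabilities copied from the table; only the listed rows are known.
unigram_prob = {"有": 0.0180, "有意": 0.0005, "意见": 0.0010, "见": 0.0002, "分歧": 0.0001}


def segmentations(sentence, vocab):
    """Yield every split of `sentence` into words that all appear in `vocab`."""
    if not sentence:
        yield []
        return
    for end in range(1, len(sentence) + 1):
        word = sentence[:end]
        if word in vocab:
            # Each accepted word is one edge; recurse on the remaining characters.
            for rest in segmentations(sentence[end:], vocab):
                yield [word] + rest


def unigram_score(words, probs):
    """Unigram model: words are treated as independent, so multiply P(w)."""
    return math.prod(probs[w] for w in words)


candidates = list(segmentations("有意见分歧", unigram_prob))
best_seg = max(candidates, key=lambda seg: unigram_score(seg, unigram_prob))
best_prob = unigram_score(best_seg, unigram_prob)

for seg in candidates:
    print(" / ".join(seg), unigram_score(seg, unigram_prob))
print("best:", " / ".join(best_seg))
```

With this vocabulary the search finds exactly the two segmentations W1 and W2 and selects W1 (有 / 意见 / 分歧), which matches Path 1. Exhaustive enumeration is fine for a five-character sentence; longer inputs would normally use dynamic programming over the same graph.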