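Vocabulary Mastery Inheritance Mechanism

Background

A user has saved "apple". When they later encounter "apples", the system should treat it as already mastered (regular inflections inherit mastery). "went", however, should not inherit from "go" (irregular inflections must be learned on their own).

Requirements:

Batch lookups of 300+ words must be fast (~10 ms)
Avoid redundant storage
Distinguish regular from irregular inflections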

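Core Approach

Storage strategy

Inflection type	Example	What is stored	Rationale
Regular	apple → apples	The lemma's entry (apple, id 100)	Follows a grammar rule; nothing extra to memorize
Irregular	go → went	Its own entry (went, id 201)	Has to be memorized separately

The decision is driven by the words.irregular field.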

Sample data

words table:

id  | term    | lemma  | irregular
----|---------|--------|----------
100 | apple   | null   | false
101 | apples  | apple  | false     -- regular
201 | went    | go     | true      -- irregular
202 | gone    | go     | true      -- irregular

learned_words table (optimization: store the term instead of word_id):

user_id | term   | language | Note
--------|--------|----------|-----
1       | apple  | en       | saving "apples" stores "apple"
1       | went   | en       | saving "went" stores "went" ✅
1       | gone   | en       | saving "gone" stores "gone" ✅

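Advantages:

✅ A batch lookup takes 2 SQL queries (down from 3)
✅ No join is needed to resolve word_id
✅ The data is easier to read (the word itself is visible)
⚠️ Requires a composite index on (user_id, term, language)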

Implementation Logic

1. Saving a word: decide based on irregular

// Pseudocode (optimized version: store the term)
func CollectWord(userID, word) {
    var termToStore string

    if word.Irregular {
        termToStore = word.Term   // went → "went"
    } else if word.TermType != LEMMA {
        termToStore = word.Lemma  // apples → "apple"
    } else {
        termToStore = word.Term   // apple → "apple"
    }

    // Persist to learned_words
    Save(userID, termToStore, word.Language)
}

Decision tree:

Saving a word →
  ├─ irregular == true           → store its own term
  ├─ term_type == LEMMA          → store its own term
  └─ otherwise (regular inflection) → store the lemma's term
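
The decision tree above can be written as one small pure function. The following is a minimal sketch, not the project's actual code: the Word struct, the TermType constants, and the empty-lemma fallback are assumptions introduced here for illustration.

```go
package main

import "fmt"

// TermType distinguishes a base form from an inflected form.
// These names are illustrative assumptions, not the project's types.
type TermType int

const (
	Lemma TermType = iota
	Inflection
)

// Word mirrors the words-table columns that the decision depends on.
type Word struct {
	Term      string
	Lemma     string // empty when the word is itself a lemma
	Irregular bool
	TermType  TermType
}

// StorageTerm returns the term that should be written to learned_words.
func StorageTerm(w Word) string {
	switch {
	case w.Irregular:
		return w.Term // irregular forms are learned on their own
	case w.TermType == Lemma:
		return w.Term // a lemma stores itself
	case w.Lemma == "":
		return w.Term // defensive fallback (an added assumption)
	default:
		return w.Lemma // regular inflection collapses to its lemma
	}
}

func main() {
	fmt.Println(StorageTerm(Word{Term: "apples", Lemma: "apple", TermType: Inflection})) // apple
	fmt.Println(StorageTerm(Word{Term: "went", Lemma: "go", Irregular: true, TermType: Inflection})) // went
	fmt.Println(StorageTerm(Word{Term: "apple", TermType: Lemma})) // apple
}
```

Keeping the rule in a single function means the save path and the lookup path can share it, so the two cannot drift apart.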

2. Batch lookup: reduced to 2 queries ⚡

// Pseudocode (optimized version)
func BatchCheckMastery(words []string, language string, userID int64) map[string]Result {
    // 1️⃣ Batch-query the words table (irregular/lemma info)
    wordInfos := BatchLookup(words, language)
    // returns: map[term]WordInfo

    // 2️⃣ Build the set of terms to look up in learned_words
    termsToCheck := []string{}
    termMapping := map[string]string{}  // original word → term to look up

    for _, word := range wordInfos {
        var termToCheck string
        if word.Irregular {
            termToCheck = word.Term      // went → "went"
        } else if word.TermType != LEMMA {
            termToCheck = word.Lemma     // apples → "apple"
        } else {
            termToCheck = word.Term      // apple → "apple"
        }
        termsToCheck = append(termsToCheck, termToCheck)
        termMapping[word.Term] = termToCheck
    }

    // 3️⃣ Batch-query learned_words (match directly on term)
    learned := SELECT * FROM learned_words
               WHERE user_id=? AND language=? AND term IN (termsToCheck)
    // returns: map[term]LearnedWord

    // 4️⃣ Assemble one result per requested word
    results := map[string]Result{}
    for _, originalWord := range words {
        termToCheck, found := termMapping[originalWord]
        if !found {
            continue  // the word is not in the words table
        }
        if learned[termToCheck] exists {
            info := wordInfos[originalWord]
            is_inherited := !info.Irregular && info.TermType != LEMMA
            results[originalWord] = {is_learned: true, is_inherited, mastery}
        }
    }
    return results  // collect every word rather than returning on the first hit
}
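
To make the two-step flow concrete, here is a small in-memory sketch in which plain maps stand in for the results of the two SQL queries. The type and function names (WordInfo, Result, TermsToCheck, Assemble) and the IsLemma flag are hypothetical; the sample data is chosen for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// WordInfo holds the columns returned by the first query (words table).
type WordInfo struct {
	Term      string
	Lemma     string
	Irregular bool
	IsLemma   bool // stands in for term_type == LEMMA
}

// Result is the per-word answer returned to the caller.
type Result struct {
	IsLearned   bool
	IsInherited bool
}

// lookupKey applies the same rule that is used when a word is saved.
func lookupKey(w WordInfo) string {
	if w.Irregular || w.IsLemma {
		return w.Term
	}
	return w.Lemma
}

// TermsToCheck builds the deduplicated IN-list for the second query.
func TermsToCheck(infos map[string]WordInfo) []string {
	set := make(map[string]struct{})
	for _, info := range infos {
		set[lookupKey(info)] = struct{}{}
	}
	terms := make([]string, 0, len(set))
	for t := range set {
		terms = append(terms, t)
	}
	sort.Strings(terms) // deterministic order, handy for tests and logs
	return terms
}

// Assemble joins both query results into one entry per requested word.
func Assemble(words []string, infos map[string]WordInfo, learned map[string]bool) map[string]Result {
	results := make(map[string]Result, len(words))
	for _, w := range words {
		info, ok := infos[w]
		if !ok {
			results[w] = Result{} // unknown word: reported as not learned
			continue
		}
		key := lookupKey(info)
		results[w] = Result{
			IsLearned:   learned[key],
			IsInherited: learned[key] && key != info.Term, // matched via the lemma
		}
	}
	return results
}

func main() {
	infos := map[string]WordInfo{ // stands in for query 1
		"apple":  {Term: "apple", IsLemma: true},
		"apples": {Term: "apples", Lemma: "apple"},
		"went":   {Term: "went", Lemma: "go", Irregular: true},
		"gone":   {Term: "gone", Lemma: "go", Irregular: true},
	}
	learned := map[string]bool{"apple": true, "went": true} // stands in for query 2
	fmt.Println(TermsToCheck(infos))
	res := Assemble([]string{"apple", "apples", "went", "gone"}, infos, learned)
	fmt.Println(res["apples"], res["gone"])
}
```

With this sample data, "apples" is reported as learned and inherited, while "gone" is reported as not learned because only "went" was saved.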

SQL queries (only 2):

-- 1. Batch-query word info (includes lemma/irregular)
SELECT term, lemma, irregular, term_type 
FROM words 
WHERE language='en' AND term IN ('apple','apples','went',...);  -- ~300 rows

-- 2. Batch-query mastery (match directly on term)
SELECT term, mastery_* 
FROM learned_words 
WHERE user_id=1 AND language='en' AND term IN ('apple','went',...);  -- ~50 rows

Performance gain:

Original approach: 3 SQL queries (~10 ms)
Optimized approach: 2 SQL queries (~7 ms) ⚡
About 30% less query time

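Inheritance Rules

Type	Example	irregular	Inheritance policy
Regular inflection	apple → apples	false	Fully inherited (100%)
Irregular inflection	go → went	true	Learned independently
Different part of speech	run (v.) vs run (n.)	-	Not inherited

Advantages of the Approach

Advantage	Details
Minimal database changes	Only the learned_words table structure changes (word_id → term)
No redundant data	Regular inflections store the lemma's term; irregular forms store their own
High performance	Batch lookup of 300 words in ~7 ms (only 2 SQL queries)
Simple implementation	No word_id conversion; terms are matched directly
Readable data	Easy to inspect and debug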

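Schema Changes

-- Old table structure
CREATE TABLE learned_words (
  user_id BIGINT,
  word_id BIGINT,  -- ❌ removed
  ...
  PRIMARY KEY (user_id, word_id)
);

-- New table structure (optimized)
CREATE TABLE learned_words (
  user_id BIGINT,
  term VARCHAR(100),      -- ✅ stores the term string
  language VARCHAR(10),   -- ✅ required for multi-language support
  ...
  PRIMARY KEY (user_id, term, language),
  INDEX idx_user_lang (user_id, language)
);

Notes:

The language field is needed to tell apart identically spelled words in different languages (for example "chat" in English vs French)
The index must cover all three columns (user_id, language, term); the primary key above already does
Data migration: resolve each word_id back to its term and fill in the new columns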
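
The migration note can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: the OldRow, WordRef, and NewRow types, the Backfill function, and in particular the choice to deduplicate rows, which matters only if two word_ids can share the same term and language (for example separate entries per part of speech).

```go
package main

import "fmt"

// OldRow is a row of the old learned_words table.
type OldRow struct {
	UserID int64
	WordID int64
}

// WordRef is what the words table yields for a given word_id.
type WordRef struct {
	Term     string
	Language string
}

// NewRow is a row of the new learned_words table.
type NewRow struct {
	UserID   int64
	Term     string
	Language string
}

// Backfill converts old rows to new rows. It also returns the word IDs
// that could not be resolved, so they can be reviewed instead of being
// dropped silently.
func Backfill(old []OldRow, words map[int64]WordRef) ([]NewRow, []int64) {
	seen := make(map[NewRow]bool)
	var out []NewRow
	var missing []int64
	for _, r := range old {
		ref, ok := words[r.WordID]
		if !ok {
			missing = append(missing, r.WordID)
			continue
		}
		row := NewRow{UserID: r.UserID, Term: ref.Term, Language: ref.Language}
		if seen[row] {
			continue // would violate the new primary key
		}
		seen[row] = true
		out = append(out, row)
	}
	return out, missing
}

func main() {
	words := map[int64]WordRef{
		100: {Term: "apple", Language: "en"},
		201: {Term: "went", Language: "en"},
	}
	old := []OldRow{{UserID: 1, WordID: 100}, {UserID: 1, WordID: 201}, {UserID: 1, WordID: 999}}
	rows, missing := Backfill(old, words)
	fmt.Println(rows, missing)
}
```

In a real migration the same mapping would more likely be done in SQL with a join against words; the sketch only shows the shape of the transformation and the two edge cases worth checking.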

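Summary

Key metrics (after optimization):

Batch lookup of 300 words: ~7 ms ⚡ (previously ~10 ms)
Database queries per lookup: 2 (33% fewer)
Schema change: learned_words table (word_id → term + language)
Storage redundancy: 0

Key logic:

Save flow: check irregular → regular forms store the lemma's term / irregular forms store their own term
Lookup flow: batch-query words (to get the lemma) → batch-query learned_words (direct term match)

Comparison:

Approach	SQL queries	Latency	Schema change
Original (word_id)	3	~10 ms	None
Optimized (term)	2	~7 ms	word_id → term

Recommendation: ✅ adopt the optimized approach; it cuts query time by about 30% and simplifies the code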
