schema: add difficulty_score field + update spec with MIN_WORDS=3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 8b24d0fd26
commit 14d567a261
2 changed files with 3 additions and 0 deletions
@@ -69,6 +69,7 @@ entry:
 cloze_word_end: 4 # End offset — enables exact extraction regardless of nikkud changes
 cloze_hint: "family member"
 cloze_guid: "def456..." # GUID for the cloze note
+difficulty_score: 234 # Median frequency rank of context words (lower = easier); optional
 rejected_count: 0
 
 # --- Noun-specific: Inflection Forms ---
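The new `difficulty_score` field is described only as the median frequency rank of the context words. Below is a minimal Python sketch of that computation. It assumes a plain word-to-rank mapping (`freq_rank`, where rank 1 is the most frequent word), whitespace tokenization of already-normalized text, and that "context words" means every token except the cloze target. `difficulty_score()` and `UNKNOWN_RANK` are hypothetical names, not the project's code.

```python
import statistics
from typing import Mapping, Optional

# Hypothetical fallback rank for words missing from the frequency table.
# Assumption: unknown words are treated as rare (hard), pushing the score up.
UNKNOWN_RANK = 50_000


def difficulty_score(
    sentence: str,
    cloze_word: str,
    freq_rank: Mapping[str, int],
) -> Optional[int]:
    """Median frequency rank of the context words (lower = easier).

    Assumptions: tokens are whitespace-separated and already normalized
    (e.g. nikkud stripped), and "context words" means every token except
    the cloze target. Returns None when no context words remain, since
    the field is optional.
    """
    ranks = [
        freq_rank.get(token, UNKNOWN_RANK)
        for token in sentence.split()
        if token != cloze_word
    ]
    if not ranks:
        return None
    # median_low always returns one of the input ranks, so the result stays an int.
    return statistics.median_low(ranks)


if __name__ == "__main__":
    ranks = {"aba": 120, "sheli": 40, "babayit": 900}
    print(difficulty_score("aba sheli babayit", "aba", ranks))  # prints 40
```

In the usage example, the context ranks are 40 and 900, and `median_low` picks the lower middle value. `statistics.median` would return 470.0 for the same input. `median_low` is used here only so the stored value stays an integer, like the schema's `234`.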
@@ -54,6 +54,8 @@ def _score(s: dict) -> tuple[int,]:
 
 New scoring replaces length with frequency-based difficulty. The `_score` function gains access to the frequency pipeline via closure over the nikkud_map, nikkud_index, and freq_data built once at the start of `update_words_json()`.
 
+**Minimum sentence length:** Reduced from 4 words to 3 words (`MIN_WORDS = 3` in epub_examples.py). Hebrew is more concise than English — 3-word sentences are valid and common. This expands the candidate pool for cloze selection.
+
 **Behavioral change:** Because `pool.sort(key=_score)` determines which 3 sentences are selected as `best = pool[:3]`, changing the scoring function changes **which sentences are selected**, not just their order. This is intentional — we want the easiest sentences as cloze candidates, not the closest-to-9-words ones. Existing cloze GUIDs will be preserved when the same sentence text is re-selected; entries where a different sentence wins will get new GUIDs.
 
 ## Data Model Changes
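The spec describes three things: `_score` closes over frequency data built once at the start of `update_words_json()`, the minimum length drops to `MIN_WORDS = 3`, and `pool.sort(key=_score)` followed by `pool[:3]` picks the winners. The sketch below illustrates that flow under several assumptions. A single `freq_rank` mapping stands in for `nikkud_map`, `nikkud_index`, and `freq_data`. Each candidate is a dict with a hypothetical `"text"` key. Applying `MIN_WORDS` at selection time is also an assumption, since the spec places it in epub_examples.py, possibly at extraction time. `make_score()`, `select_best()`, and `UNKNOWN_RANK` are illustrative names only.

```python
import statistics
from typing import Callable, Mapping

MIN_WORDS = 3          # value from the spec (previously 4)
UNKNOWN_RANK = 50_000  # hypothetical fallback rank for unknown words


def make_score(freq_rank: Mapping[str, int]) -> Callable[[dict], tuple[int]]:
    """Build _score once; it closes over frequency data loaded up front.

    freq_rank stands in for the nikkud_map / nikkud_index / freq_data
    trio named in the spec (assumption: a plain word -> rank mapping).
    """

    def _score(s: dict) -> tuple[int]:
        ranks = [freq_rank.get(w, UNKNOWN_RANK) for w in s["text"].split()]
        # Lower median rank = more common words = easier sentence.
        return (statistics.median_low(ranks),)

    return _score


def select_best(pool: list[dict], freq_rank: Mapping[str, int]) -> list[dict]:
    """Filter by MIN_WORDS, rank by difficulty, keep the three easiest."""
    _score = make_score(freq_rank)
    # Filtering first also guarantees _score never sees an empty sentence.
    pool = [s for s in pool if len(s["text"].split()) >= MIN_WORDS]
    pool.sort(key=_score)  # the sort order decides which sentences make the cut
    return pool[:3]
```

The key is applied before slicing. Replacing length-based scoring with difficulty therefore changes which three sentences survive, not just their order. This matches the behavioral change the spec describes.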
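The GUID rule is: keep the cloze GUID when the same sentence text is re-selected, and mint a new one otherwise. It can be sketched as follows. The `cloze_sentence` field name is hypothetical; the schema excerpt shows only `cloze_word_end`, `cloze_hint`, and `cloze_guid`. `uuid4().hex` is a stand-in for whatever GUID scheme the project actually uses.

```python
import uuid
from typing import Callable, Optional


def assign_cloze_guid(
    entry: dict,
    selected_text: str,
    new_guid: Callable[[], str] = lambda: uuid.uuid4().hex,
) -> str:
    """Return the cloze GUID to store for this entry (hypothetical helper).

    Reuses entry["cloze_guid"] only if the previously selected sentence
    (stored under the hypothetical "cloze_sentence" key) matches the newly
    selected text; otherwise generates a fresh GUID via new_guid().
    """
    previous_text: Optional[str] = entry.get("cloze_sentence")
    if previous_text == selected_text and entry.get("cloze_guid"):
        return entry["cloze_guid"]
    return new_guid()
```

Passing `new_guid` as a parameter keeps the helper testable and leaves the real GUID scheme open.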