hebrew_flash_cards/data
Sochen 3b0f9defa9 feat: YAP-cleaned frequency corpus + two-tier assignment pipeline
- Add clean_frequency_corpus.py: uses the YAP morphological analyzer to
  remove prefix+word combos (e.g. בבית=ב+בית) from the he_50k frequency data.
  Headwords are always protected. 30,430 clean entries remain from 49,999 raw.
- Add assign_frequency.py: two-tier assignment with PoS-aware homograph
  handling. Tier 1 matches headwords; Tier 2 matches inflections (any rank)
  and conjugations (rank>5000 only, to avoid false positives).
  Function words claim frequency over content words in homograph groups,
  with manual overrides for 12 common dual-use words.
- frequency_lookup.py auto-prefers frequency_clean.json when available
- 6,691 entries now have a frequency (was 5,974); 717 newly assigned

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:22:55 +00:00
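
The two-tier assignment described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of assign_frequency.py: the `Entry` class, its field names, the `assign_frequency` signature, and the choice to take the best (lowest) rank among Tier 2 matches are all assumptions. Only the tier order and the rank>5000 rule for conjugations come from the commit message. PoS-aware homograph handling and the manual overrides are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# From the commit message: conjugations only match when rank > 5000.
CONJUGATION_MIN_RANK = 5000


@dataclass
class Entry:
    """Hypothetical vocabulary entry; the field names are illustrative."""
    headword: str
    inflections: list = field(default_factory=list)
    conjugations: list = field(default_factory=list)
    frequency: Optional[int] = None


def assign_frequency(entries, ranks):
    """Assign a frequency rank to each entry in place and return the list.

    `ranks` maps a surface form to its rank (1 = most frequent).
    """
    for entry in entries:
        # Tier 1: a direct headword match wins outright.
        if entry.headword in ranks:
            entry.frequency = ranks[entry.headword]
            continue
        # Tier 2: inflections match at any rank ...
        candidates = [ranks[f] for f in entry.inflections if f in ranks]
        # ... conjugations only above the rank threshold.
        candidates += [
            ranks[f]
            for f in entry.conjugations
            if ranks.get(f, 0) > CONJUGATION_MIN_RANK
        ]
        if candidates:
            # Assumption: keep the most frequent (lowest-rank) match.
            entry.frequency = min(candidates)
    return entries
```

With a toy table such as `{"בית": 120, "ספרים": 800, "כתבתי": 3000}`, an entry whose headword is בית is matched in Tier 1, an entry listing ספרים as an inflection is matched in Tier 2, and an entry whose only hit is the conjugation כתבתי stays unassigned because its rank is not above 5000.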
fonts feat: Sprint 3 — Heebo font files, image fetch, verb validator scripts 2026-03-03 08:37:08 +00:00
conjugations.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
emoji_lookup.json feat: curated emoji denylist, vocab audio URLs in CSV 2026-03-06 12:29:15 +00:00
epub_sentence_index.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
examples_cache.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
frequency_cache.json feat: add apkg builder, frequency, Ben Yehuda examples, conjugation deck 2026-03-03 01:58:31 +00:00
frequency_clean.json feat: YAP-cleaned frequency corpus + two-tier assignment pipeline 2026-03-10 06:22:55 +00:00
frequency_discarded.json feat: YAP-cleaned frequency corpus + two-tier assignment pipeline 2026-03-10 06:22:55 +00:00
ktiv_male_forms.json v0.14: rescrape vocab, formatting fixes for all decks 2026-03-07 09:26:41 +00:00
legacy_guid_map.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
noun_plurals.json v0.14: rescrape vocab, formatting fixes for all decks 2026-03-07 09:26:41 +00:00
noun_slug_map.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
refined_meanings.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
vetted_sentences.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
vocab_sentence_matches.json Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
words.json feat: YAP-cleaned frequency corpus + two-tier assignment pipeline 2026-03-10 06:22:55 +00:00
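
The "auto-prefers frequency_clean.json when available" behaviour mentioned in the commit message could look something like the sketch below. The function name `load_frequency_table` is hypothetical, and treating frequency_cache.json as the fallback is an assumption based only on its presence in this directory. The real frequency_lookup.py may work differently.

```python
import json
from pathlib import Path


def load_frequency_table(
    data_dir,
    preferred="frequency_clean.json",
    fallback="frequency_cache.json",  # assumed fallback, not confirmed by the source
):
    """Return (file_name, parsed_json) for the first file that exists.

    The cleaned corpus is tried first; the fallback is used only when the
    cleaned file is missing.
    """
    data_dir = Path(data_dir)
    for name in (preferred, fallback):
        path = data_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                return name, json.load(fh)
    raise FileNotFoundError(
        f"neither {preferred} nor {fallback} found in {data_dir}"
    )
```

Returning the file name alongside the data lets a caller log which corpus was actually used.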