- Add clean_frequency_corpus.py: YAP morphological analyzer removes
prefix+word combos (e.g. בבית=ב+בית) from he_50k frequency data.
Headwords always protected. 30,430 clean entries from 49,999 raw.
- Add assign_frequency.py: two-tier assignment with PoS-aware homograph
handling. Tier 1 matches headwords; Tier 2 matches inflections (any rank)
and conjugations (rank>5000 only, to avoid false positives).
Function words claim frequency over content words in homograph groups,
with manual overrides for 12 common dual-use words.
- frequency_lookup.py auto-prefers frequency_clean.json when available
- 6,691 entries now have frequency (was 5,974), 717 newly assigned
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>