hebrew_flash_cards/scripts
Sochen c85063ee2f Sprint 15: example sentence pipeline overhaul + corpus expansion + card improvements
- Regenerated all example sentences from scratch (deleted legacy + stale entries)
- Added .txt file support to epub_examples.py for Ben Yehuda corpus
- 7 Ben Yehuda nikkud'd children's texts + 3 new Time Tunnel EPUBs
- Maqaf-stripped construct form indexing (+68% inflected matches)
- Total: 3,598 words with examples, 3,289 with cloze (was ~2,900)
- Cloze prefix preservation (_cloze_prefix_len)
- Hebrew spoiler stripping from English meanings
- Gender field (זָכָר/נְקֵבָה) on vocab cards
- sec-table CSS layout for aligned key:value pairs
- Mishkal uses mishkal_hebrew on plural cards
- Improved mishkal extraction from pealim detail pages
- 21 new pytest tests (cloze, PoS, Hebrew stripping, gender, mishkal)
- 2 new validate_data.py tests + mishkal stats
- Colliding forms tracking (local-only)
- Release tag v0.17
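
Two of the items above, maqaf stripping before indexing and Hebrew spoiler stripping from English meanings, can be illustrated with a rough sketch. The function names `split_on_maqaf` and `strip_hebrew` are hypothetical and are not taken from epub_examples.py or the card-generation code; the actual pipeline may handle these cases differently.

```python
import re

# U+05BE is the Hebrew maqaf, the hyphen-like mark joining construct chains.
MAQAF = "\u05be"
# The Unicode Hebrew block (letters, nikkud, maqaf and other marks).
HEBREW_RUN = re.compile(r"[\u0590-\u05ff]+")


def split_on_maqaf(token: str) -> list[str]:
    """Split a maqaf-joined token into its parts so each can be indexed alone."""
    return [part for part in token.split(MAQAF) if part]


def strip_hebrew(meaning: str) -> str:
    """Remove Hebrew characters from an English gloss (assumed spoiler format)."""
    cleaned = HEBREW_RUN.sub("", meaning)
    # Drop parentheses left empty once the Hebrew inside them is gone.
    cleaned = re.sub(r"\(\s*\)", "", cleaned)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Splitting on the maqaf, rather than deleting it, keeps each half of a construct chain available as its own lookup key, which is one plausible reading of how the inflected-match count could rise.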

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:44:14 +00:00
assign_frequency.py feat: YAP-cleaned frequency corpus + two-tier assignment pipeline 2026-03-10 06:22:55 +00:00
check_guid_coverage.py Sprint 11: unified JSON architecture + consolidated scraping pipeline 2026-03-08 10:54:58 +00:00
clean_frequency_corpus.py feat: YAP-cleaned frequency corpus + two-tier assignment pipeline 2026-03-10 06:22:55 +00:00
extract_verb_list.py Sprint 9: cloze cards, plurals deck, project reorg, lint tooling 2026-03-07 08:09:39 +00:00
validate_data.py Sprint 15: example sentence pipeline overhaul + corpus expansion + card improvements 2026-03-10 10:44:14 +00:00