Commit graph

60 commits

Author SHA1 Message Date
ca7ca74a39 feat: Sprint 3 — Heebo font files, image fetch, verb validator scripts
- data/fonts/: Heebo variable font TTF (Regular + Bold) for bundling in .apkg
- image_fetch.py: Wikipedia/Commons image fetch for concrete nouns
- validate_verb_list.py: pealim.com validator for verb input list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 08:37:08 +00:00
b018f21b1d feat: Sprint 2 + Sprint 3 — verb list, audio, passive forms, CSS/UX, validation, Heebo font, images
Sprint 2:
- extract_verb_list.py (NEW): downloads Coffin & Bolozky PDF, extracts
  71-verb paradigm list from Appendix 1 with hardcoded fallback.
  Pu'al/Huf'al use '# 3ms:' prefix for 3ms search.
- conjugation_extract.py: audio URL capture per form, passive forms
  parsing (Pu'al/Huf'al partner tables), 3ms search support.
- benyehuda.py: nikkud corpus (txt.zip), index by nikkud word form,
  single best example (longest ≤200 chars), --refresh-examples rebuild.
- apkg_builder.py: Hebrew labels, centered dark Hebrew text, freq-badge,
  related words grouped by PoS. Conjugation: Voice/Audio fields,
  present-tense 12-card expansion, 2fp/3fp modern fallback with
  classical in parens, פָּעִיל/סָבִיל voice labels.
- README.md: rewritten — learner-first structure, data sources.
- run.py: --refresh-examples flag, conjugation audio download (step 4b).
- data/conjugations.json: rebuilt with 70 verbs, audio URLs, passive
  partner data.
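
The benyehuda.py entry above (index by word form, keep the single longest example of at most 200 characters) could be sketched roughly as follows. This is a minimal illustration, not the repository's code: `build_index` and `best_example` are hypothetical names, and whitespace tokenization is an assumption, since the commit does not say how words are split or how punctuation is handled.

```python
from collections import defaultdict

MAX_LEN = 200  # length cap quoted in the commit message


def build_index(sentences):
    """Map each whitespace-separated token to the sentences containing it.

    Assumption: plain str.split() tokenization, no punctuation stripping.
    """
    index = defaultdict(list)
    for sentence in sentences:
        for token in set(sentence.split()):
            index[token].append(sentence)
    return index


def best_example(index, word, max_len=MAX_LEN):
    """Return the longest indexed sentence of at most max_len chars, or None."""
    candidates = [s for s in index.get(word, []) if len(s) <= max_len]
    return max(candidates, key=len, default=None)
```

With ties, `max` keeps the first candidate it sees, so the result depends on input order; a real indexer may want an explicit tie-break.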

Sprint 3:
- validate_verb_list.py (NEW): queries pealim.com for all entries in
  verb input list, classifies as OK/3ms/REVIEW/NOT_FOUND, writes
  cleaned verbs_input.txt. Results: 51 OK, 15 3ms-past, 4 REVIEW.
- apkg_builder.py: binyan in Hebrew (BINYAN_TO_HEBREW map) on its own
  line; remove "דוגמה:" label; "Other" related-words shown unlabeled;
  "50k+" freq display for unlisted words; Image field in VOCAB_MODEL.
- image_fetch.py (NEW): Wikipedia/Commons thumbnails for concrete nouns,
  caches in data/image_cache.json, downloads to data/images/.
- Heebo variable font TTF bundled in both .apkg files via @font-face.
- run.py: step_fonts(), step_images(), --skip-images flag.
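
As a rough illustration of the validate_verb_list.py summary reported above (51 OK, 15 3ms-past, 4 REVIEW), the tally step might look like the sketch below. The commit does not describe how an entry is classified, so the sketch takes already-classified results as input; `Status`, `summarize`, and `format_summary` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """The four outcomes named in the commit message."""
    OK = "OK"
    THREE_MS = "3ms"
    REVIEW = "REVIEW"
    NOT_FOUND = "NOT_FOUND"


def summarize(results):
    """Count entries per status; results is an iterable of (entry, Status)."""
    counts = Counter(status for _, status in results)
    # Report every status, including those with zero entries.
    return {status: counts.get(status, 0) for status in Status}


def format_summary(summary):
    """Render non-zero counts as a one-line summary string."""
    return ", ".join(f"{n} {status.value}" for status, n in summary.items() if n)
```

The three reported counts sum to 70, which matches the 70 verbs in data/conjugations.json mentioned under Sprint 2.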

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 08:36:51 +00:00
b086123bec feat: add apkg builder, frequency, Ben Yehuda examples, conjugation deck
Implements four major improvements to the Pealim Anki deck pipeline:

1. Automated .apkg generation (genanki) — no more manual Anki Desktop step.
   Both vocabulary and conjugation decks are built programmatically.

2. Word frequency ranking from hermitdave/FrequencyWords he_50k corpus.
   Notes sorted by rank so Anki presents most common words first.

3. Example sentences from the Ben Yehuda public-domain corpus (not pealim.com).
   Downloads txt_stripped.zip, indexes 25k texts, ~89% coverage on test set.

4. Conjugation drill deck — one card per form × verb.
   Input: verbs_input.txt (Hebrew infinitives). Initial set: 7 verbs (one
   per binyan). Extracts 28 forms each via pealim.com/search/ + table parse.
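
Item 2 above (sorting notes by frequency rank) could be sketched as below. This is an assumption-laden illustration rather than the contents of frequency_lookup.py: `load_ranks` and `sort_by_frequency` are hypothetical names, and the input is assumed to be "word count" lines already ordered from most to least frequent.

```python
import math


def load_ranks(lines):
    """Build word -> rank (1 = most frequent) from 'word count' lines.

    Assumption: lines are ordered by descending frequency, so the line
    position is the rank.
    """
    ranks = {}
    for position, line in enumerate(lines, start=1):
        parts = line.split()
        if parts:
            # Keep the first (best) rank if a word appears more than once.
            ranks.setdefault(parts[0], position)
    return ranks


def sort_by_frequency(words, ranks):
    """Order words by rank; words missing from the list sort last."""
    return sorted(words, key=lambda w: ranks.get(w, math.inf))
```

Because `sorted` is stable, words absent from the frequency list keep their original relative order at the end.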

New files:
  apkg_builder.py     — genanki deck builder for both decks
  benyehuda.py        — Ben Yehuda corpus downloader + sentence indexer
  frequency_lookup.py — FrequencyWords downloader + rank lookup
  verbs_input.txt     — verb input list (7 test verbs, one per binyan)
  data/               — baseline CSVs + generated caches

Updated:
  conjugation_extract.py — rewritten: reads verbs_input.txt, searches
                           /search/?q= for slug, parses table by row labels
  requirements.txt       — add genanki, beautifulsoup4, lxml
  run.py                 — full orchestration pipeline with CLI flags
  .gitignore             — exclude venv/, benyehuda_index.json, audio/, output/

CLI:
  python run.py --skip-scrape --skip-audio --test 20  (quick test)
  python run.py --skip-scrape                          (full build)
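
The flags in the two commands above could be declared with argparse along these lines. This is a sketch only: the real run.py may define them differently, `build_parser` is a hypothetical name, and reading `--test 20` as "limit the run to 20 items" is an assumption.

```python
import argparse


def build_parser():
    """Declare the three flags that appear in the CLI examples."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--skip-scrape", action="store_true",
                        help="reuse existing scraped data")
    parser.add_argument("--skip-audio", action="store_true",
                        help="do not download audio")
    parser.add_argument("--test", type=int, metavar="N", default=None,
                        help="assumed meaning: limit the run to N items")
    return parser


# Parse the quick-test command line from the commit message.
args = build_parser().parse_args(["--skip-scrape", "--skip-audio", "--test", "20"])
```

Passing an explicit argument list to `parse_args` keeps the example independent of the actual process arguments.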

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 01:58:31 +00:00
e23b353064 Improve scraper robustness and Hebrew text handling 2026-02-26 21:57:20 +00:00
158f0477a3 added extraction of verb conjugations 2025-07-21 01:43:47 -07:00
b9be01e4c6 Update README.md 2024-06-08 21:27:04 -07:00
fd4b65d54f Update README.md 2024-06-08 21:25:44 -07:00
fbd94d58c8 added a pic 2024-06-08 21:24:41 -07:00
bf7dfcd7a1 Update README.md 2024-06-08 21:23:33 -07:00
db2a1bcb03 Initial commit 2024-06-08 21:15:20 -07:00