fix: correct word/verb counts in README, add missing .gitignore entries

- README: ~14,400 → ~9,100 words (actual scrape count)
- README: 71 → 69 verbs (current verb list; 2 short of Coffin & Bolozky — to investigate)
- .gitignore: add data/audio_conj/, data/image_cache.json, data/images/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sochen 2026-03-04 06:32:44 +00:00
parent d26e4c8ce5
commit 58dc1b8d9b
2 changed files with 16 additions and 11 deletions
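
The drift this commit corrects (README counts no longer matching the scraped data) could be caught by a small check. The following is a minimal sketch, not part of the repository: `readme_counts` and `count_drift` are hypothetical helper names, the regular expressions assume the README keeps the phrasing "~N words" and "N paradigm verbs", and the real counts are assumed to be supplied by the caller from whatever the pipeline produces.

```python
import re


def readme_counts(readme_text):
    """Pull the advertised word and verb counts out of README text.

    Returns a dict of integers, with None where no claim was found.
    The patterns assume the wording "~9,100 words" and "69 paradigm verbs".
    """
    words = re.search(r"~([\d,]+) words", readme_text)
    verbs = re.search(r"(\d+) paradigm verbs", readme_text)
    return {
        "words": int(words.group(1).replace(",", "")) if words else None,
        "verbs": int(verbs.group(1)) if verbs else None,
    }


def count_drift(readme_text, actual_words, actual_verbs, tolerance=0.05):
    """Return a list of mismatches between the README and the real counts.

    The word count is stated approximately ("~"), so it is compared with a
    relative tolerance; the verb count is compared exactly.
    """
    stated = readme_counts(readme_text)
    problems = []
    if stated["words"] is not None:
        if abs(stated["words"] - actual_words) > tolerance * actual_words:
            problems.append(
                f"README says ~{stated['words']} words, data has {actual_words}"
            )
    if stated["verbs"] is not None and stated["verbs"] != actual_verbs:
        problems.append(
            f"README says {stated['verbs']} verbs, data has {actual_verbs}"
        )
    return problems


if __name__ == "__main__":
    old_readme = "~14,400 words from pealim.com\n71 paradigm verbs from Coffin & Bolozky"
    for problem in count_drift(old_readme, actual_words=9100, actual_verbs=69):
        print(problem)
```

An empty list means the README agrees with the data; the 5% tolerance is an arbitrary illustrative default.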

.gitignore

@@ -15,8 +15,13 @@ __pycache__/
# Large generated cache files (rebuild locally)
data/benyehuda_index.json
# Audio directory (large; rebuild with --skip-scrape)
# Audio directories (large; rebuild locally)
data/audio/
data/audio_conj/
# Image cache and downloads (rebuild with image_fetch.py)
data/image_cache.json
data/images/
# Output .apkg files (generated by pipeline)
output/


@@ -8,8 +8,8 @@
This project generates two Anki decks for learning Modern Hebrew:
- **Vocabulary deck** — ~14,400 words from [pealim.com](https://www.pealim.com/dict/), with nikkud (vowel marks), roots, parts of speech, related words, example sentences from classic Hebrew literature, and audio pronunciation.
- **Conjugation deck** — 71 paradigm verbs from Coffin & Bolozky's *A Reference Grammar of Modern Hebrew* (2005), fully conjugated in all tenses and persons, across all seven binyanim.
- **Vocabulary deck** — ~9,100 words from [pealim.com](https://www.pealim.com/dict/), with nikkud (vowel marks), roots, parts of speech, related words, example sentences from classic Hebrew literature, and audio pronunciation.
- **Conjugation deck** — 69 paradigm verbs from Coffin & Bolozky's *A Reference Grammar of Modern Hebrew* (2005), fully conjugated in all tenses and persons, across all seven binyanim.
All card data comes from open or academic sources:
- Word data: [pealim.com](https://www.pealim.com) — a free Modern Hebrew dictionary
@@ -56,7 +56,7 @@ Cards are presented in **random order** within Anki's spaced-repetition system,
## What's in the conjugation deck
71 paradigm verbs from Coffin & Bolozky's *A Reference Grammar of Modern Hebrew* (Appendix 1), covering all seven binyanim:
69 paradigm verbs from Coffin & Bolozky's *A Reference Grammar of Modern Hebrew* (Appendix 1), covering all seven binyanim:
- פָּעַל (Pa'al), נִפְעַל (Nif'al), פִּעֵל (Pi'el), פֻּעַל (Pu'al)
- הִתְפַּעֵל (Hitpa'el), הִפְעִיל (Hif'il), הֻפְעַל (Huf'al)
@@ -76,9 +76,9 @@ Each verb is drilled in: present, past, future, imperative, infinitive — all p
## Suggested study strategy
Start with the vocabulary deck. Anki will present the most frequent words first. Aim for 10–20 new cards per day.
Start with the vocabulary deck. Anki will present the most frequent words first. Don't try to study too many cards every single day; Anki suggests 20 per day.
Once you have ~300–500 vocabulary words, add the conjugation deck. The conjugation cards reinforce verb forms you've already seen in vocabulary.
The conjugation cards reinforce verb forms you've already seen in vocabulary.
Use the Hebrew → English direction to build reading comprehension. Use the English → Hebrew direction to build writing and speaking recall.
@@ -86,7 +86,7 @@ Use the Hebrew → English direction to build reading comprehension. Use the Eng
## About the data sources
**pealim.com** — A comprehensive free Modern Hebrew dictionary with nikkud, roots, conjugations, and audio. This project scrapes the public dictionary listing (not conjugation tables, which are covered separately).
**pealim.com** — A comprehensive free Modern Hebrew dictionary with nikkud, roots, conjugations, and audio. This project scrapes the public dictionary and conjugation tables.
**Project Ben-Yehuda** — A public-domain digital library of Hebrew literature. Example sentences come from the nikkud corpus (classic texts with full vowel marks).
@@ -100,9 +100,9 @@ Use the Hebrew → English direction to build reading comprehension. Use the Eng
If you notice a wrong translation, missing audio, or incorrect conjugation:
- For vocabulary errors: the source is pealim.com — you can suggest corrections there.
- For conjugation errors: open an issue in this repository with the verb and the correct form.
- For example sentence issues: open an issue with the word and sentence.
- For vocabulary errors: the source is pealim.com — you can suggest corrections there. But if you think morfix has a correct translation and pealim.com does not, we may be able to encode an override.
For any other issue, whether or not you know how to code, email me at pealim [at] nevo [dot] engineer
---
@@ -177,4 +177,4 @@ python run.py [options]
## AnkiWeb
The generated decks will be published on AnkiWeb. See `ANKIWEB_DESCRIPTION.md` for the submission content.
The decks will be published as shared decks on AnkiWeb (TBD).