Anki Flash Cards for Learning Hebrew Vocabulary and Conjugations!
Sochen b086123bec feat: add apkg builder, frequency, Ben Yehuda examples, conjugation deck
Implements four major improvements to the Pealim Anki deck pipeline:

1. Automated .apkg generation (genanki) — no more manual Anki Desktop step.
   Both vocabulary and conjugation decks are built programmatically.

2. Word frequency ranking from hermitdave/FrequencyWords he_50k corpus.
   Notes sorted by rank so Anki presents most common words first.

3. Example sentences from Ben Yehuda public domain corpus (not pealim.com).
   Downloads txt_stripped.zip, indexes 25k texts, ~89% coverage on test set.

4. Conjugation drill deck — one card per form × verb.
   Input: verbs_input.txt (Hebrew infinitives). Initial set: 7 verbs (one
   per binyan). Extracts 28 forms each via pealim.com/search/ + table parse.
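Item 2 can be sketched as follows. The sketch assumes the he_50k list is plain text with one `word count` pair per line, already ordered from most to least frequent, and that notes are matched on their nikkud-free form (the Word Without Nikkud column). `load_ranks` and `sort_by_rank` are hypothetical names, not necessarily what frequency_lookup.py defines.

```python
from typing import Dict, Iterable, List

def load_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Map each word to a 1-based rank (assumed format: 'word count', most frequent first)."""
    ranks: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # keep the first (best) rank if a word appears more than once
        ranks.setdefault(parts[0], len(ranks) + 1)
    return ranks

def sort_by_rank(notes: List[dict], ranks: Dict[str, int],
                 key: str = "Word Without Nikkud") -> List[dict]:
    """Most common words first; unranked words go last, keeping their original order."""
    unranked = len(ranks) + 1
    # sorted() is stable, so ties (including all unranked words) keep input order
    return sorted(notes, key=lambda note: ranks.get(note[key], unranked))
```

Sorting rather than filtering keeps words that are missing from the corpus in the deck; they simply appear after every ranked word.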

New files:
  apkg_builder.py     — genanki deck builder for both decks
  benyehuda.py        — Ben Yehuda corpus downloader + sentence indexer
  frequency_lookup.py — FrequencyWords downloader + rank lookup
  verbs_input.txt     — verb input list (7 test verbs, one per binyan)
  data/               — baseline CSVs + generated caches
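benyehuda.py indexes the corpus so that an example sentence can be looked up per vocabulary word. A common way to do this is an inverted index from each token to the sentences that contain it. The sketch below assumes plain UTF-8 text, naive sentence splitting on `.`, `!` and `?`, a character-length window, and nikkud-free input; none of this is confirmed as the module's actual approach, and `build_index` and `example_for` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after sentence-final punctuation
TOKEN = re.compile(r"[\u05d0-\u05ea]+")       # runs of Hebrew letters (nikkud-free text assumed)

def build_index(texts: Iterable[str], min_len: int = 20, max_len: int = 120) -> Dict[str, List[str]]:
    """Map each Hebrew token to the sentences (within the length window) that contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        for sentence in SENTENCE_END.split(text):
            sentence = sentence.strip()
            if not (min_len <= len(sentence) <= max_len):
                continue  # too short to be useful, or too long for a card
            for token in set(TOKEN.findall(sentence)):
                index[token].append(sentence)
    return index

def example_for(word: str, index: Dict[str, List[str]]) -> Optional[str]:
    """Return the shortest indexed sentence containing the word, or None."""
    candidates = index.get(word)
    return min(candidates, key=len) if candidates else None
```

Exact token matching misses forms with attached prefixes (for example ו־ or ה־), so real coverage depends on how the lookup handles Hebrew morphology.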

Updated:
  conjugation_extract.py — rewritten: reads verbs_input.txt, searches
                           /search/?q= for slug, parses table by row labels
  requirements.txt       — add genanki, beautifulsoup4, lxml
  run.py                 — full orchestration pipeline with CLI flags
  .gitignore             — exclude venv/, benyehuda_index.json, audio/, output/
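conjugation_extract.py now parses the conjugation table by row labels instead of fixed positions. A standard-library sketch of that idea follows. It assumes each `<tr>` begins with a label cell followed by form cells, which is a simplification for illustration and not pealim.com's actual markup; the real script uses beautifulsoup4 and lxml per requirements.txt. `RowCollector` and `rows_by_label` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List

class RowCollector(HTMLParser):
    """Collect the whitespace-normalised text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            if self.rows:
                self.rows[-1].append(" ".join("".join(self._cell).split()))

    def handle_data(self, data):
        if self._in_cell:  # includes text inside nested tags such as <span>
            self._cell.append(data)

def rows_by_label(html: str) -> Dict[str, List[str]]:
    """Key each row by its first cell, so lookups do not depend on row position."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1:] for row in parser.rows if len(row) > 1}
```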

CLI:
  python run.py --skip-scrape --skip-audio --test 20  (quick test)
  python run.py --skip-scrape                          (full build)
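The two invocations above use three flags. A minimal argparse sketch that accepts them is below. The flag names come from the commands; their exact meanings (especially `--test N` limiting the run to N entries) are assumptions inferred from the names, not taken from run.py.

```python
import argparse
from typing import List, Optional

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build the Pealim Anki decks")
    # assumed meaning: reuse existing CSVs instead of scraping pealim.com
    parser.add_argument("--skip-scrape", action="store_true",
                        help="reuse existing data instead of scraping")
    # assumed meaning: do not download audio files
    parser.add_argument("--skip-audio", action="store_true",
                        help="skip audio download")
    # assumed meaning: limit the run to the first N entries for a quick test
    parser.add_argument("--test", type=int, metavar="N", default=None,
                        help="process only N entries")
    return parser.parse_args(argv)
```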

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 01:58:31 +00:00
Repository files, by last commit:

  • feat: add apkg builder, frequency, Ben Yehuda examples, conjugation deck (2026-03-03 01:58:31 +00:00): data/, .gitignore, apkg_builder.py, benyehuda.py, conjugation_extract.py, frequency_lookup.py, requirements.txt, run.py, verbs_input.txt
  • Improve scraper robustness and Hebrew text handling (2026-02-26 21:57:20 +00:00): pealim_extract.py, README.md, test_scrape.py
  • added a pic (2024-06-08 21:24:41 -07:00): flashcard.png
  • Initial commit (2024-06-08 21:15:20 -07:00): pealim.apkg, pealim_dict.csv, pealim_dict_for_anki.csv

Pealim — Hebrew Vocabulary Scraper & Anki Deck Generator

Extract Hebrew vocabulary from pealim.com and automatically generate Anki flashcards with roots, parts of speech, and related words.

Features

  • Dictionary Scraping — Extracts ~14,400 Hebrew words with roots and parts of speech
  • Anki-Ready — Generates flashcards with Hebrew tags and shared-root grouping
  • Conjugation Tables — Extracts verb conjugation forms for reference
  • Respectful — Built-in delays and connection pooling
  • Robust — Retry logic, error handling, and detailed logging

Installation

pip install -r requirements.txt

Usage

Extract Everything

python3 run.py

Dictionary Only

python3 pealim_extract.py

Conjugations Only

python3 conjugation_extract.py

Output Files

  • pealim_dict.csv — Raw dictionary (Word, Root, Part of Speech, Word Without Nikkud)
  • pealim_dict_for_anki.csv — Anki-formatted (adds shared roots and Hebrew tags)
  • conjugations.csv — Verb conjugation forms
  • pealim.apkg — Ready-to-import Anki deck
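The Word Without Nikkud column can be derived from the pointed form. One way, assumed here rather than taken from pealim_extract.py, is to decompose the text and drop every nonspacing mark (Unicode category Mn). That removes vowel points, dagesh, shin/sin dots, and cantillation while keeping letters and the maqaf, which is punctuation. `strip_nikkud` is a hypothetical name.

```python
import unicodedata

def strip_nikkud(word: str) -> str:
    """Remove Hebrew points and cantillation (category Mn); keep letters and punctuation."""
    decomposed = unicodedata.normalize("NFD", word)  # splits precomposed forms such as U+FB2A
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Stripping the dagesh from a shuruk leaves the bare vav, which matches how unpointed Hebrew is normally spelled.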

Configuration

Edit constants at the top of each script:

  • REQUEST_DELAY — Seconds between requests (default: 1.5)
  • REQUEST_TIMEOUT — Network timeout (default: 10s)
  • max_pages — Limit extraction for testing
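REQUEST_DELAY acts as a minimum spacing between requests. The throttle below enforces that spacing even when parsing time varies, sleeping only for whatever part of the delay has not already elapsed. The `Throttle` class and its injectable clock/sleep parameters are hypothetical; they exist so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional

REQUEST_DELAY = 1.5  # seconds, the documented default

class Throttle:
    """Ensure consecutive calls to wait() are at least `delay` seconds apart."""

    def __init__(self, delay: float = REQUEST_DELAY,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # sleep only the part of the delay not yet elapsed
                now += remaining
        self._last = now
```

Calling `throttle.wait()` before each request replaces a fixed `time.sleep(REQUEST_DELAY)` and avoids double-counting time already spent parsing.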

Performance

  • Full dictionary: ~10-15 minutes (608 pages × 2 requests/page + delays)
  • ~14,400 words extracted
  • ~960KB CSV output
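The runtime estimate can be checked against the documented numbers. Whether REQUEST_DELAY applies once per page or once per request is not stated, so this sketch computes the sleep time for both readings. It is a lower bound that ignores network and parsing time.

```python
PAGES = 608
REQUESTS_PER_PAGE = 2
REQUEST_DELAY = 1.5  # seconds, the documented default

def delay_minutes(sleeps: int, delay: float = REQUEST_DELAY) -> float:
    """Minimum wall-clock time spent sleeping, in minutes."""
    return sleeps * delay / 60

per_page = delay_minutes(PAGES)                         # one delay per page
per_request = delay_minutes(PAGES * REQUESTS_PER_PAGE)  # one delay per request
print(f"per page: {per_page:.1f} min, per request: {per_request:.1f} min")
# per page: 15.2 min, per request: 30.4 min
```

With the 1.5 s default, a delay before every request would alone exceed 30 minutes. The ~10-15 minute figure is therefore consistent only with roughly one delay per page or a shorter configured delay.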

Data Structure

pealim_dict_for_anki.csv

Column               Example
Word                 שמור
Root                 שמר
Part of Speech       Verb
Word Without Nikkud  שמור
shared roots         שומר שמירה
tags                 שורש::שמר פעלים
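The shared roots and tags columns can be derived from the raw dictionary by grouping words on their root. The sketch below is one assumed approach. It lists the other words with the same root, space-separated, and generates only the שורש::<root> tag. How the second tag in the example is produced is not documented, so the sketch does not generate it. `add_anki_columns` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

def add_anki_columns(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of rows with 'shared roots' and a hierarchical root tag added."""
    by_root: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row["Root"]:
            by_root[row["Root"]].append(row["Word"])
    out = []
    for row in rows:
        root = row["Root"]
        siblings = [w for w in by_root[root] if w != row["Word"]] if root else []
        new = dict(row)  # leave the input rows untouched
        new["shared roots"] = " ".join(siblings)
        new["tags"] = f"שורש::{root}" if root else ""
        out.append(new)
    return out
```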

conjugations.csv

Columns: present_ms, present_fs, past_1s, future_1s, infinitive, etc.
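The conjugation drill deck makes one card per form × verb. Given rows shaped like conjugations.csv, with one row per verb and one column per form, the expansion is a nested loop. The prompt wording, the use of `infinitive` as the verb label, and the `expand_cards` name are assumptions for illustration. Only four of the documented columns are mapped here, while the deck itself covers 28 forms per verb.

```python
from typing import Dict, Iterable, List, Tuple

FORM_LABELS = {  # hypothetical prompts for a few of the documented columns
    "present_ms": "present, masc. sing.",
    "present_fs": "present, fem. sing.",
    "past_1s": "past, 1st pers. sing.",
    "future_1s": "future, 1st pers. sing.",
}

def expand_cards(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Turn each (verb, form) pair with a non-empty value into a (front, back) card."""
    cards = []
    for row in rows:
        verb = row["infinitive"]
        for column, label in FORM_LABELS.items():
            form = row.get(column, "").strip()
            if form:  # skip forms the table did not provide
                cards.append((f"{verb}: {label}", form))
    return cards
```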

Notes

  • Respects pealim.com's server with configurable delays
  • Uses session pooling for efficiency
  • Handles network errors gracefully with retries
  • All log output goes to both stdout and a log file
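Retries on network errors can be sketched as exponential backoff around any fetch function. The attempt count, the base delay, and the choice to retry on OSError are assumptions. OSError is the base class of urllib's URLError, socket timeouts, and connection errors. pealim_extract.py may use different values or a requests-based retry adapter. `fetch_with_retries` is hypothetical.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)

def fetch_with_retries(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on OSError wait base_delay * 2**i and retry, re-raising after the last try."""
    for i in range(attempts):
        try:
            return fetch()
        except OSError as exc:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            wait = base_delay * 2 ** i
            log.warning("attempt %d/%d failed (%s); retrying in %.1fs", i + 1, attempts, exc, wait)
            sleep(wait)
    raise ValueError("attempts must be >= 1")
```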

License

Personal use. Hebrew learning tool.