hebrew_flash_cards

Author	SHA1	Message	Date
Sochen	6d2d446ed5	feat: pseudo-frequency for confusables using English word frequency 264 confusable groups where all entries shared the same Hebrew frequency now have differentiated pseudo_frequency values based on English word commonality (hermitdave en_50k.txt). Most common meaning keeps base rank; less common meanings get +100 offset per position. Examples: - אב: "father" (en:194) → 2491, "bud" (en:2963) → 2591 - אח: "brother" (en:300) → 911, "fireplace" (en:9389) → 1011 Builder uses pseudo_frequency for sort order when available. Confusable card definitions now sorted most-common-first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 05:28:30 +00:00
Sochen	f978e5f39a	fix: vet fallback emoji — verb gate + expanded stop list removes 852 bad matches The fallback emoji system (keyword→Unicode char matching at build time) was producing 1,733 matches, many with wrong-sense emoji: - "high, tall" → ⚡ (from "high voltage") - "to cut" → 🥩 (cut of meat) - "city" → 🇻🇦 (Vatican flag) Two fixes: 1. Skip fallback for verbs (meanings starting "to ") — 476 removed 2. Expand _EMOJI_STOP with 100+ polysemous/abstract keywords — 376 more Result: 1733 → 881 fallback matches (49% reduction). The 114 from_pealim emojis (concrete nouns like 🍎 apple, 🐪 camel) are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 05:17:31 +00:00
Sochen	f3496998f5	feat: confusables show ktiv male, emoji/prep stripping fully upstream - Confusables deck front now shows shared ktiv male form instead of nikkud variants joined by "/". Back still shows nikkud with definitions. - Fixed list scraper EMOJI_RE to catch variation selectors (U+FE0F) and ZWJ (U+200D) — cleaned 17 entries with leftover selectors in meaning. - Removed build-time prep extraction fallback (0 entries relied on it). - release.py: fix keeshare field name (API_TOKEN → password). Closes: Pealim #11 (emoji/prep upstream), Pealim #16 (confusables ktiv male) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 02:19:03 +00:00
Sochen	138acb06d8	bump RELEASE_TAG to v0.20 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 14:09:19 +00:00
Sochen	af186e2030	Sprint 17: homograph example dedup + plural audio + prep extraction - Homograph collision fix: _deduplicate_confusable_examples() clears shared examples from less-common confusable group members (36 entries fixed). Keeps examples only on highest-frequency meaning. - Plural deck audio: wired up PluralAudio field in apkg_builder.py, downloaded 613 plural audio files from pealim.com for all deck entries. - Prep extraction upstream: moved Hebrew preposition parsing from build time into list/detail scrapers (SCHEMA.yaml prep field added). - Validation: new no_shared_confusable_examples check in validate_data.py - Tests: 9 new unit tests for confusable deduplication (98 total) - Release: v0.19 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 21:51:35 +00:00
Sochen	0d92451271	Sprint 16: collapsible card details + related words table - All secondary fields (shoresh, PoS, ktiv male, plural, related words) behind a "מידע נוסף" toggle button using HTML <details>/<summary> - Conjugation back: English meaning, binyan also behind toggle - Related words: table format with word + meaning, sorted by frequency - Hebrew words not bold, English meanings 24px gray (#555) - "מִילִים קְשׁוּרוֹת" sub-header with nikkud inside toggle - "אֵיךְ אוֹמְרִים" prompt centered using hint class - New CSS: .more-toggle, .more-header, .related-header, .rw-word, .rw-meaning - Dark mode support for all new classes - Bump to v0.18 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 01:34:14 +00:00
Sochen	c85063ee2f	Sprint 15: example sentence pipeline overhaul + corpus expansion + card improvements - Regenerated all example sentences from scratch (deleted legacy + stale entries) - Added .txt file support to epub_examples.py for Ben Yehuda corpus - 7 Ben Yehuda nikkud'd children's texts + 3 new Time Tunnel EPUBs - Maqaf-stripped construct form indexing (+68% inflected matches) - Total: 3,598 words with examples, 3,289 with cloze (was ~2,900) - Cloze prefix preservation (_cloze_prefix_len) - Hebrew spoiler stripping from English meanings - Gender field (זָכָר/נְקֵבָה) on vocab cards - sec-table CSS layout for aligned key:value pairs - Mishkal uses mishkal_hebrew on plural cards - Improved mishkal extraction from pealim detail pages - 21 new pytest tests (cloze, PoS, Hebrew stripping, gender, mishkal) - 2 new validate_data.py tests + mishkal stats - Colliding forms tracking (local-only) - Release tag v0.17 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:44:14 +00:00
Sochen	efd0745ada	Sprint 14: deck template/CSS overhaul + Sprint 12 detail scrape Template & CSS fixes (15 items from Mar 9 feedback): - Fix conjugation front showing 3ms form instead of infinitive - Rename conjugation model to "Hebrew Conjugation" - Strip Hebrew parenthesized text from English meanings - Shoresh separator: spaces → dots (א.כ.ל) - Remove duplicate English meaning from cloze back - Remove example sentences from vocab front/back (cloze only) - Center-align audio buttons on all decks - Fix parenthesis spacing: "you(feminine,singular)" → "you (feminine, singular)" - Unify sec-key/sec-label fonts, make keys bold - Size overhaul: bigger Hebrew (42px), meaning (34px), secondary (28px) - Center-align related words groups - Sort confusables by average frequency - Plurals: show Gender (Hebrew) before Mishkal, strip emoji from meaning - Clean duplicate quotation marks in cloze sentences Sprint 12 carry-forward (detail scrape + EPUB): - Adjective/preposition detail scraping in pealim_detail_scrape.py - EPUB example matching rewrite in epub_examples.py - Delete benyehuda.py and rebuild_sentence_matches.py (merged) - 49 parser tests for detail scraping - SCHEMA.yaml updates for new fields Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 07:44:47 +00:00
Sochen	04a4b52113	fix: deduplicate 66 plural GUIDs for homograph nouns Homographs (same nikkud form, different meanings) had identical plurals_guid values. Regenerated unique GUIDs by including meaning in the hash. Also updated build-time fallback to use meaning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 04:12:45 +00:00
Sochen	f6af714e22	bump RELEASE_TAG to v0.15.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 04:08:35 +00:00
Sochen	b2fef5aa8a	Sprint 11.1: strip_nikkud cleanup, dead code removal, test fixes Remove strip_nikkud from all pipeline files — use ktiv_male directly. Fix case-insensitive binyan matching in detail scraper (og:description uses UPPERCASE). Fix integration test slugs and test limits. Delete legacy CSVs, stale .apkg, and dead scripts from git. Add vulture to pre-commit hook. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 04:03:47 +00:00
Sochen	08fb7009d8	Sprint 11: unified JSON architecture + consolidated scraping pipeline Migrate from fragmented CSV + 10 JSON files to a single data/words.json (9,104 entries) as the unified data store. All GUIDs preserved for Anki study progress continuity. New files: - SCHEMA.yaml: authoritative schema for words.json - pealim_list_scrape.py: consolidated list page scraper → words.json - pealim_detail_scrape.py: noun/verb detail scraper → words.json - pealim_audio_download.py: audio downloader reading from words.json - scripts/migrate_to_json.py: one-time CSV→JSON migration - scripts/validate_data.py: 17 data integrity tests - scripts/check_guid_coverage.py: GUID preservation checker - scripts/repair_slugs.py: slug deduplication repair tool - tests/test_scraper_integration.py: live scraper integration tests Updated: - apkg_builder.py: reads from words.json (no more pandas) - run.py: 8-step pipeline (list scrape → frequency → examples → detail scrape → audio download → fonts → images → build) - benyehuda.py, frequency_lookup.py, image_fetch.py: TODO markers for future words.json integration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 10:54:58 +00:00
Sochen	2e48109d7f	v0.15: PoS fix, slug-based audio, CSS cleanup, template improvements - Fix PoS substring bug: "Pronoun" no longer matches "Noun" - CSS: reduce sec-label/sec-key font sizes, add .definitions/.conf-entry - Slug-based audio filenames for confusable words (no more collisions) - Scraper captures slug from pealim.com list page links - Confusables: RTL alignment, re-enable audio (remove all-must-have gate) - Plurals: blue given word, gray meaning, labeled mishkal badge - Conjugation: add "אֵיךְ אוֹמְרִים" prompt, tense prefix (בְּ), Prep field from HBPAREN_RE, labeled RelatedVocab - Ben Yehuda: skip stripped fallback for confusable words - Bump RELEASE_TAG to v0.15 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 17:50:23 +00:00
Sochen	802c369365	v0.14: rescrape vocab, formatting fixes for all decks - Full pealim.com rescrape: 9,120 words (15 new), all with audio URLs - Plurals deck: 2:1 regular:irregular ratio (649 notes), RTL arrows, 1.6x hint text - Conjugation deck: blue infinitive on front, plain meaning on back, nikkud labels - Confusables deck: larger prompt text (32px), audio only when all words have it - Validator: non-audio variants no longer false-fail on audio check - 14 new audio files downloaded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 09:26:41 +00:00
Sochen	def2fc1aca	fix: card formatting, example sentence homograph protection, plural coverage Formatting (#5): - Labels now display with nikkud (שֹׁרֶשׁ, חֵלֶק דִּיבּוּר, רַבִּים, etc.) - Secondary fields below audio 1.6x bigger (20px → 32px) - Label keys styled separately (.sec-key class, smaller/dimmer than values) - Example sentences centered on card (margin: auto, max-width: 90%) - Emoji only on English side (removed duplicate from Eng→Heb back) - Broken images hidden via onerror handler Example sentences (#6): - Confusable words (same consonants, different nikkud) now only match example sentences by exact nikkud form, preventing wrong-word sentences - Same protection applied to cloze sentence and vetted sentence lookups Plural coverage (#3): - Added stripped-nikkud fallback for noun plural matching - 3,918 nouns now show plurals (was ~3,604, +314 from fallback) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 08:45:53 +00:00
Sochen	17f7458d19	Sprint 9: cloze cards, plurals deck, project reorg, lint tooling - Cloze card pipeline: 924 cards from 2,296 AI-vetted Hebrew book sentences - Plurals deck: 375 notes (144 irregular + 231 regular from 86 mishkal patterns) - Ktiv male forms expanded to 20,711 entries for sentence matching - Project reorg: helpers.py (deduped strip_nikkud from 10 files), scripts/ for one-off tools, tests/ with smoke tests, deleted 3 dead files - Lint tooling: pyproject.toml with ruff/vulture/bandit/pytest config, .editorconfig, fixed all 129 ruff errors (B023 closure fix, SIM103, unused vars) - validate_apkg.py: card count range check for optional cloze template - Data caches committed: vetted_sentences, ktiv_male_forms, noun_plurals, noun_slug_map, vocab_sentence_matches, epub_sentence_index Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 08:09:39 +00:00
Sochen	419e952389	feat: curated emoji denylist, vocab audio URLs in CSV - Expanded _EMOJI_STOP from ~20 to ~80 keywords after manual review of all 2,261 emoji-word pairs. Removes false positives from polysemous words (french→🍟, water→🤽, rock→🪨, etc.) - Emoji count: 2,261 → 1,820 (removed ~440 bad matches) - hebrew_dict.csv now populated with audio_url from pealim.com scrape (8,727 words with audio URLs) - Cached emoji_lookup.json (1,749 keywords from Unicode emoji-test.txt) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 12:29:15 +00:00
Sochen	607fd1a3bc	feat: emoji Unicode lookup, conj nikkud, fix summary metric - Emoji: _load_emoji_lookup() fetches unicode.org emoji-test.txt, builds {keyword: emoji_char} map cached in data/emoji_lookup.json. Falls back to empty dict on network failure. build_all_variants() loads once and passes to all build_vocab_deck() calls. For each word without pealim emoji, tries first 5 keywords from English meaning against lookup. - Nikkud: זכר→זָכָר, נקבה→נְקֵבָה in PRESENT_EXPANSION constants and build_conj_deck() 1st-person gender labels. - Summary: conj audio file count now excludes _infinitive and _passive_ on-disk extras never bundled in .apkg (was 2235, now shows ~1765). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 21:24:10 +00:00
Sochen	ccd7d61efb	Add 6-variant release build (4 vocab + 2 conj), bump to v0.12 - build_vocab_deck(): include_audio/include_images flags - build_conj_deck(): include_audio flag - build_all_variants(): builds all 6 apkg files in one call - Variants: hebrew_vocabulary{,_audio,_images,_audio_images}.apkg hebrew_conjugations{,_audio}.apkg - run.py: step_build_all() replaces step_build_vocab(); conjugation extraction reuses cached conjugations.json unless refreshed - RELEASE_TAG bumped to v0.12 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 20:58:06 +00:00
Sochen	62c92ffae0	Emoji/image mutual exclusion, font size increases, layout - Emoji and image are now mutually exclusive: emoji shown if present, image used as fallback ({{^Emoji}}{{#Image}}...{{/Image}}{{/Emoji}}) - Emoji shown on English card front (under meaning) — both card directions - Emoji appears directly under meaning on backs, before secondary info - sec-label: 16px → 20px; root-info/example: 16px → 18px; related-group: 15px → 18px - hebrew-sm font-weight:normal (prep label no longer inherits bold) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 20:51:20 +00:00
Sochen	0e4b041331	Fix emoji ordering and prep font weight - Emoji now shown above image on both back templates (was below) - Emoji also shown on English→Hebrew card front (visual cue with meaning) - hebrew-sm: add font-weight:normal (was inheriting bold from .hebrew parent) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 06:09:55 +00:00
Sochen	64a1b18951	Sprint 7: emoji/prep extraction, conjugation reduction, project rename - Item 1/2: Extract emoji and Hebrew parentheticals (prepositions) from Meaning field; display emoji with 3.5em font, prep inline after Hebrew word. Add Emoji and Prep fields to Hebrew Flash Cards model. - Item 3: Seeded RNG per verb reduces conjugation cards by ~630 (4 present forms → 1 pronoun each; past_3p → 1 gender). 1st-person forms gain gender label (זכר/נקבה). Total: 1,834 conj cards (was ~2,464). - Item 4: hebrew_extract.py uses BeautifulSoup to capture data-audio URLs from pealim.com list pages during scraping. step_audio() reads audio_url column from CSV (no longer needs audio_extract.py). - Item 5: Rename to 'Hebrew Flash Cards'. New filenames: hebrew_dict.csv, hebrew_extract.py, hebrew_vocabulary.apkg, hebrew_conjugations.apkg. Deck/model names updated throughout. Forgejo repo rename pending (sochen lacks admin rights — Nevo must do via UI). - Fix: Deduplicate entries with same Hebrew word before adding notes (eliminates GUID collisions from duplicate source CSV rows). - Bump RELEASE_TAG to v0.11. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 05:49:51 +00:00
Sochen	e66020628f	Fix: use stable GUIDs for Anki note matching on reimport genanki's default GUID is computed from ALL fields, so adding audio to a previously-empty Audio field changes the GUID — Anki can't match the old note and skips the update. Fix: explicitly set GUID from identity-only fields: - Conjugation notes: guid_for(infinitive, pronoun, tense) - Vocabulary notes: guid_for(word) [Hebrew word with nikkud] With stable GUIDs, reimporting a rebuilt deck correctly updates existing notes (audio, tags, corrected fields) without breaking study progress. NOTE: users who imported a previous release will see new notes on first reimport (old GUID → new GUID mismatch). They can delete the old untagged notes via Browse → tag:v0.10 missing → delete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 05:21:45 +00:00
Sochen	4fcc5cff60	Sprint 6: release tagging, conjugation front swap, validate_apkg.py - Add RELEASE_TAG="v0.10" constant; tag all notes (vocab + conj) so users can identify which release their cards came from via Anki Browse - Swap conjugation card front: Pronoun now above Infinitive for easier recall - Add validate_apkg.py: comprehensive .apkg integrity checker covering ZIP structure, media manifest, audio format, DB schema, card counts, sound refs, and field content; runs on both decks - Configure Forgejo v0.10 release with conjugation .apkg as downloadable asset - Update releases/pealim_conjugations.apkg with tagged notes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 05:09:45 +00:00
Sochen	bb79725a7f	Python code cleanup (python-pro review) - Type annotations: dict|None defaults, return types, nested func annotations - Dead code: removed unused row_forms_with_audio(), duplicate _strip_nikkud defs, redundant guards, duplicate 'ism' in ABSTRACT_SUFFIXES - Exceptions: narrowed bare except to (ValueError, pd.errors.ParserError) and (json.JSONDecodeError, OSError) throughout; all raise ValueError given messages - Deduplication: extracted deduplicate() helper in _parse_table; setdefault() for dict building in benyehuda and apkg_builder; list comprehension in benyehuda - Correctness: limit=0 guard fixed (is not None); audio tag parsing uses removeprefix/removesuffix instead of magic offsets; vectorized pandas sum - Constants: BINYAN_NAMES extracted; unicodedata imports moved to top level - benyehuda load(): removed wasted cache read on force_rebuild; word-boundary regex simplified from double-negative to \w Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 07:33:14 +00:00
Sochen	6cd42b1e12	Sprint 5: dark mode CSS, alternate conjugation forms, README releases link fix - add @media (prefers-color-scheme: dark) block to CARD_CSS covering all hardcoded colors - _parse_table: add table_el param to parse a specific table directly - _extract_conjugations: detect second active conjugation table; store alternate_forms - build_conj_deck: show "primary / alternate" when alternate form exists for a key - README: fix dead ../../releases link → git.nevo.engineer/nevo/pealim/releases Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 07:07:28 +00:00
Sochen	0db11b1aa1	Sprint 4: fix insertion order, skip infinitive cards, split past_3p, fix empty binyan - vocab deck uses frequency insertion order (genanki.Package); conjugation deck random (_RandomOrderPackage) - skip infinitive form_key in conjugation deck build (reference only, not a quiz target) - PAST_3P_EXPANSION: split past_3p into separate הֵם and הֵן cards - SECTION_BINYAN parsing: read section headers from verbs_input.txt as binyan hints - add binyan_hint param to _extract_conjugations and _extract_passive_from_active_slug - patch 20 cached entries with empty binyan (Pa'al, Nif'al) using section hints - result: 2428 notes across 69 verbs, all with populated binyan Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 06:42:32 +00:00
Sochen	d26e4c8ce5	feat: Sprint 3 — passive/active separation, random card order, card UX fixes Conjugation extraction: - Active entries now extract active forms only (no auto passive partner) - Passive (# 3ms:) entries extract passive section only via new _extract_passive_from_active_slug(); search-based fallback also uses this path so no active forms leak into passive entries - # slug: VERB SLUG override syntax for search-ambiguous active verbs - # 3ms: FORM ACTIVE-SLUG syntax for passive entries with known active page - Fixed verb spellings: בוטל (was בותל), slug overrides for תואם → 2344-letaem, זוכה → 503-lezakot, לָשִׂים → 45-lasim, העבר → 1442-lehaavir Card UX: - Passive card front: shows active partner infinitive (e.g. לְבַטֵּל) with (סָבִיל) inline in smaller font instead of bare 3ms past form - Removed פָּעִיל label from active cards; only passive cards carry voice label - New cards introduced in random order (new.order=0 via _RandomOrderPackage) - Frequency badge: words outside top 50k show "50k+" instead of blank README: updated CLI options, output files table, pipeline list, card descriptions to reflect Sprint 3 state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 10:16:50 +00:00
Sochen	b018f21b1d	feat: Sprint 2 + Sprint 3 — verb list, audio, passive forms, CSS/UX, validation, Heebo font, images Sprint 2: - extract_verb_list.py (NEW): downloads Coffin & Bolozky PDF, extracts 71-verb paradigm list from Appendix 1 with hardcoded fallback. Pu'al/Huf'al use '# 3ms:' prefix for 3ms search. - conjugation_extract.py: audio URL capture per form, passive forms parsing (Pu'al/Huf'al partner tables), 3ms search support. - benyehuda.py: nikkud corpus (txt.zip), index by nikkud word form, single best example (longest ≤200 chars), --refresh-examples rebuild. - apkg_builder.py: Hebrew labels, centered dark Hebrew text, freq-badge, related words grouped by PoS. Conjugation: Voice/Audio fields, present-tense 12-card expansion, 2fp/3fp modern fallback with classical in parens, פָּעִיל/סָבִיל voice labels. - README.md: rewritten — learner-first structure, data sources. - run.py: --refresh-examples flag, conjugation audio download (step 4b). - data/conjugations.json: rebuilt with 70 verbs, audio URLs, passive partner data. Sprint 3: - validate_verb_list.py (NEW): queries pealim.com for all entries in verb input list, classifies as OK/3ms/REVIEW/NOT_FOUND, writes cleaned verbs_input.txt. Results: 51 OK, 15 3ms-past, 4 REVIEW. - apkg_builder.py: binyan in Hebrew (BINYAN_TO_HEBREW map) on its own line; remove "דוגמה:" label; "Other" related-words shown unlabeled; "50k+" freq display for unlisted words; Image field in VOCAB_MODEL. - image_fetch.py (NEW): Wikipedia/Commons thumbnails for concrete nouns, caches in data/image_cache.json, downloads to data/images/. - Heebo variable font TTF bundled in both .apkg files via @font-face. - run.py: step_fonts(), step_images(), --skip-images flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 08:36:51 +00:00
Sochen	b086123bec	feat: add apkg builder, frequency, Ben Yehuda examples, conjugation deck Implements four major improvements to the Pealim Anki deck pipeline: 1. Automated .apkg generation (genanki) — no more manual Anki Desktop step. Both vocabulary and conjugation decks are built programmatically. 2. Word frequency ranking from hermitdave/FrequencyWords he_50k corpus. Notes sorted by rank so Anki presents most common words first. 3. Example sentences from Ben Yehuda public domain corpus (not pealim.com). Downloads txt_stripped.zip, indexes 25k texts, ~89% coverage on test set. 4. Conjugation drill deck — one card per form × verb. Input: verbs_input.txt (Hebrew infinitives). Initial set: 7 verbs (one per binyan). Extracts 28 forms each via pealim.com/search/ + table parse. New files: apkg_builder.py — genanki deck builder for both decks benyehuda.py — Ben Yehuda corpus downloader + sentence indexer frequency_lookup.py — FrequencyWords downloader + rank lookup verbs_input.txt — verb input list (7 test verbs, one per binyan) data/ — baseline CSVs + generated caches Updated: conjugation_extract.py — rewritten: reads verbs_input.txt, searches /search/?q= for slug, parses table by row labels requirements.txt — add genanki, beautifulsoup4, lxml run.py — full orchestration pipeline with CLI flags .gitignore — exclude venv/, benyehuda_index.json, audio/, output/ CLI: python run.py --skip-scrape --skip-audio --test 20 (quick test) python run.py --skip-scrape (full build) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 01:58:31 +00:00
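
Note on commit e66020628f (stable GUIDs): the message says the GUID is derived from identity-only fields, so filling in Audio later does not change it, and commit 04a4b52113 adds the meaning to the hash for homograph plurals. A minimal sketch of that idea follows. It is not the repository's code: stable_guid is a hypothetical stand-in for genanki's guid_for, and the separator, digest, and sample values are assumptions made for illustration.

```python
import hashlib


def stable_guid(*identity_fields: str) -> str:
    """Hash only the identity fields so edits to other fields keep the GUID.

    Hypothetical stand-in for genanki.guid_for; the separator and the
    16-character digest length are arbitrary choices for this sketch.
    """
    joined = "\x1f".join(identity_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Conjugation note: identity is (infinitive, pronoun, tense), as in the commit.
# The field values below are illustrative placeholders.
guid_first_build = stable_guid("lehaavir", "ani", "past")

# A rebuild that only adds audio passes the same identity fields,
# so the note keeps its GUID and Anki can update it in place.
guid_after_audio = stable_guid("lehaavir", "ani", "past")

# Homograph plurals: adding the meaning to the identity fields
# separates two entries that share the same written form.
guid_meaning_a = stable_guid("same-form", "meaning A")
guid_meaning_b = stable_guid("same-form", "meaning B")
```

The design point is that anything a later release might correct (audio, tags, meanings on ordinary vocabulary notes) stays out of the hash, while homographs need one extra distinguishing field.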

30 commits
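
Note on commit 6d2d446ed5 (pseudo-frequency): the rule in the message is that the meaning with the most common English gloss keeps the shared Hebrew rank and each less common meaning gets +100 per position. The sketch below reproduces the two examples quoted in that commit. The function name, the dict input shape, and the step parameter are assumptions for illustration; how the real builder handles glosses missing from en_50k.txt is not stated in the log and is not modelled here.

```python
def pseudo_frequencies(base_rank: int, english_ranks: dict[str, int], step: int = 100) -> dict[str, int]:
    """Assign a pseudo_frequency to each meaning in one confusable group.

    base_rank: the Hebrew frequency rank shared by the whole group.
    english_ranks: meaning -> rank in an English frequency list
                   (lower rank means a more common word).
    """
    # Order meanings from most common to least common English gloss.
    ordered = sorted(english_ranks, key=english_ranks.get)
    # Position 0 keeps the base rank; each later position adds one step.
    return {meaning: base_rank + step * pos for pos, meaning in enumerate(ordered)}


# Values taken from the commit message examples.
av_group = pseudo_frequencies(2491, {"bud": 2963, "father": 194})
ach_group = pseudo_frequencies(911, {"brother": 300, "fireplace": 9389})
```

Sorting by the resulting values then yields the most-common-first order the commit describes for confusable card definitions.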