Remove strip_nikkud from all pipeline files — use ktiv_male directly.
Fix case-insensitive binyan matching in detail scraper (og:description
uses UPPERCASE). Fix integration test slugs and test limits. Delete
legacy CSVs, stale .apkg, and dead scripts from git. Add vulture to
pre-commit hook.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Item 1/2: Extract emoji and Hebrew parentheticals (prepositions) from
Meaning field; display emoji with 3.5em font, prep inline after Hebrew
word. Add Emoji and Prep fields to Hebrew Flash Cards model.
- Item 3: Seeded RNG per verb reduces conjugation cards by ~630 (4 present
forms → 1 pronoun each; past_3p → 1 gender). 1st-person forms gain gender
label (זכר/נקבה). Total: 1,834 conj cards (was ~2,464).
- Item 4: hebrew_extract.py uses BeautifulSoup to capture data-audio URLs
from pealim.com list pages during scraping. step_audio() reads audio_url
column from CSV (no longer needs audio_extract.py).
- Item 5: Rename to 'Hebrew Flash Cards'. New filenames: hebrew_dict.csv,
hebrew_extract.py, hebrew_vocabulary.apkg, hebrew_conjugations.apkg.
Deck/model names updated throughout. Forgejo repo rename pending (sochen
lacks admin rights — Nevo must do via UI).
- Fix: Deduplicate entries with same Hebrew word before adding notes
(eliminates GUID collisions from duplicate source CSV rows).
- Bump RELEASE_TAG to v0.11.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Type annotations: dict|None defaults, return types, nested func annotations
- Dead code: removed unused row_forms_with_audio(), duplicate _strip_nikkud defs,
redundant guards, duplicate 'ism' in ABSTRACT_SUFFIXES
- Exceptions: narrowed bare except to (ValueError, pd.errors.ParserError) and
(json.JSONDecodeError, OSError) throughout; all raise ValueError given messages
- Deduplication: extracted deduplicate() helper in _parse_table; setdefault() for
dict building in benyehuda and apkg_builder; list comprehension in benyehuda
- Correctness: limit=0 guard fixed (is not None); audio tag parsing uses
removeprefix/removesuffix instead of magic offsets; vectorized pandas sum
- Constants: BINYAN_NAMES extracted; unicodedata imports moved to top level
- benyehuda load(): removed wasted cache read on force_rebuild; word-boundary
regex simplified from double-negative to \w
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- data/fonts/: Heebo variable font TTF (Regular + Bold) for bundling in .apkg
- image_fetch.py: Wikipedia/Commons image fetch for concrete nouns
- validate_verb_list.py: pealim.com validator for verb input list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>