nevo/hebrew_flash_cards

Commit graph

Author	SHA1	Message	Date

Sochen

419e952389

feat: curated emoji denylist, vocab audio URLs in CSV

- Expanded _EMOJI_STOP from ~20 to ~80 keywords after manual review
  of all 2,261 emoji-word pairs. Removes false positives from
  polysemous words (french→🍟, water→🤽, rock→🪨, etc.)
- Emoji count: 2,261 → 1,820 (removed ~440 bad matches)
- hebrew_dict.csv now populated with audio_url from pealim.com scrape
  (8,727 words with audio URLs)
- Cached emoji_lookup.json (1,749 keywords from Unicode emoji-test.txt)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-06 12:29:15 +00:00
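
The denylist idea in commit 419e952389 can be illustrated with a minimal sketch. The commit names only `_EMOJI_STOP` and `emoji_lookup.json`. The function `pick_emoji`, the sample dictionaries, and the rule that a denylisted keyword is skipped rather than blocking the whole entry are assumptions made for illustration, not the repository's actual code.

```python
# Hypothetical stand-ins: the real _EMOJI_STOP (~80 keywords) and the
# contents of emoji_lookup.json are not shown in the commit message.
EMOJI_STOP = {"french", "water", "rock"}

EMOJI_LOOKUP = {
    "french": "🍟",
    "water": "🤽",
    "rock": "🪨",
    "dog": "🐕",
    "apple": "🍎",
}


def pick_emoji(meaning, lookup=EMOJI_LOOKUP, stop=EMOJI_STOP):
    """Return the emoji for the first keyword that is not denylisted, else None."""
    # Split the English meaning into lowercase keywords, treating commas as spaces.
    for word in meaning.lower().replace(",", " ").split():
        if word in stop:
            continue  # polysemous keyword: skip it instead of showing a misleading emoji
        if word in lookup:
            return lookup[word]
    return None


print(pick_emoji("dog"))          # keyword is allowed, so an emoji is returned
print(pick_emoji("French"))       # keyword is denylisted, so no emoji
print(pick_emoji("rock, apple"))  # first keyword is skipped, second one matches
```

Keeping the denylist separate from the cached lookup means the lookup can be regenerated from the Unicode data without losing the manual review.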

Sochen

64a1b18951

Sprint 7: emoji/prep extraction, conjugation reduction, project rename

- Item 1/2: Extract emoji and Hebrew parentheticals (prepositions) from
  Meaning field; display emoji with 3.5em font, prep inline after Hebrew
  word. Add Emoji and Prep fields to Hebrew Flash Cards model.
- Item 3: Seeded RNG per verb reduces conjugation cards by ~630 (4 present
  forms → 1 pronoun each; past_3p → 1 gender). 1st-person forms gain gender
  label (זכר/נקבה). Total: 1,834 conj cards (was ~2,464).
- Item 4: hebrew_extract.py uses BeautifulSoup to capture data-audio URLs
  from pealim.com list pages during scraping. step_audio() reads audio_url
  column from CSV (no longer needs audio_extract.py).
- Item 5: Rename to 'Hebrew Flash Cards'. New filenames: hebrew_dict.csv,
  hebrew_extract.py, hebrew_vocabulary.apkg, hebrew_conjugations.apkg.
  Deck/model names updated throughout. Forgejo repo rename pending (sochen
  lacks admin rights — Nevo must do via UI).
- Fix: Deduplicate entries with same Hebrew word before adding notes
  (eliminates GUID collisions from duplicate source CSV rows).
- Bump RELEASE_TAG to v0.11.
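
The deduplication fix listed above might look roughly like the following sketch. The column name `hebrew`, the function name `dedupe_by_hebrew`, and the choice to keep the first row for each word are assumptions. The commit states only that entries with the same Hebrew word are deduplicated before notes are added.

```python
def dedupe_by_hebrew(rows, key="hebrew"):
    """Keep the first row for each Hebrew word and drop later duplicates.

    `rows` is a list of dicts, for example from csv.DictReader. The column
    name "hebrew" is a hypothetical placeholder.
    """
    seen = set()
    unique = []
    for row in rows:
        word = row[key].strip()
        if word in seen:
            continue  # a second note for this word would reuse the same GUID
        seen.add(word)
        unique.append(row)
    return unique


rows = [
    {"hebrew": "שלום", "meaning": "hello"},
    {"hebrew": "שלום", "meaning": "peace"},
    {"hebrew": "מים", "meaning": "water"},
]
deduped = dedupe_by_hebrew(rows)
print(len(deduped))  # the duplicate row for the first word is dropped
```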

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-05 05:49:51 +00:00
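
Item 3 of the Sprint 7 commit describes a seeded RNG per verb that picks one pronoun for each present-tense form. A minimal sketch of that technique follows. The form keys, the pronoun lists, the function name `pick_pronouns`, and the use of `zlib.crc32` to derive the seed are all assumptions, since the commit message does not say how the seed is computed.

```python
import random
import zlib

# Illustrative mapping only: each present-tense form can pair with several pronouns.
PRESENT_PRONOUNS = {
    "present_ms": ["אני", "אתה", "הוא"],
    "present_fs": ["אני", "את", "היא"],
    "present_mp": ["אנחנו", "אתם", "הם"],
    "present_fp": ["אנחנו", "אתן", "הן"],
}


def pick_pronouns(verb):
    """Pick one pronoun per present form, reproducibly for a given verb."""
    # crc32 yields the same integer on every run. The built-in hash() of a
    # string is randomized per process, so it would not give stable decks.
    rng = random.Random(zlib.crc32(verb.encode("utf-8")))
    return {form: rng.choice(options) for form, options in PRESENT_PRONOUNS.items()}


first = pick_pronouns("לכתוב")
second = pick_pronouns("לכתוב")
print(first == second)  # True: the same verb always yields the same selection
```

Seeding from the verb itself keeps the card set stable between builds, so regenerating the deck does not shuffle which pronoun a learner sees for a given verb.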

2 commits