AnkiBatcher | Christian Adleta

AnkiBatcher

Transform quiz files into Anki flashcard decks automatically

What It Does

Parse multiple question formats: true/false, single choice, multiple choice, and cloze deletions
Generate Anki-compatible import files with 5 different note type formats
Automatically remove duplicate questions before export
Look up definitions via Google search and populate flashcard answers automatically
Support 4 processing modes including web scraping for terminology questions
Handle complex multiple choice with proper Anki encoding (binary flag system)
Guide you through file selection and output options with an interactive CLI

Why I Built This

Creating flashcards from study materials is essential for spaced repetition learning, but manually entering quiz questions into Anki is tedious and error-prone. I often had quiz questions in text format from learning management systems, but converting them required hours of copy-paste work and careful formatting to match Anki's import syntax.

The challenge was building a parser that could handle the variability in quiz formats while maintaining accuracy. Multiple-choice questions were especially tricky: I needed to encode answer options in Anki's pipe-delimited format with binary correct/incorrect flags (like |1 0 0 0| for answer A). I wanted a tool that would let me drop in a quiz file and get a ready-to-import Anki deck in seconds.

How It Works

AnkiBatcher is built in C# targeting .NET 5.0, structured as a two-project solution: AnkiBatcher.Core (parsing and web scraping logic) and AnkiBatcher (CLI interface). The parser uses stateful line-by-line processing with string markers like "Question text" and "Select one:" as anchors, accumulating question data until it encounters the next question boundary.
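The marker-driven approach described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the project's actual code: the Question class, the QuizParser name, and the "The correct answer is:" marker are hypothetical; only "Question text" and "Select one:" come from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical internal model; the real Question class is not shown here.
public class Question
{
    public string Text { get; set; } = "";
    public List<string> Options { get; } = new List<string>();
    public string CorrectAnswer { get; set; } = "";
}

public static class QuizParser
{
    private enum State { Idle, InQuestionText, InOptions }

    // The first two markers are named in the write-up.
    private const string QuestionMarker = "Question text";
    private const string OptionsMarker = "Select one:";
    // Assumed marker for the answer line; the real export may differ.
    private const string AnswerMarker = "The correct answer is:";

    public static List<Question> Parse(IEnumerable<string> lines)
    {
        var questions = new List<Question>();
        Question current = null;
        var state = State.Idle;

        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue; // tolerate stray blank lines

            if (line == QuestionMarker)
            {
                // A new question marker closes the question accumulated so far.
                if (current != null) questions.Add(current);
                current = new Question();
                state = State.InQuestionText;
            }
            else if (current == null)
            {
                continue; // ignore any preamble before the first question
            }
            else if (line == OptionsMarker)
            {
                state = State.InOptions;
            }
            else if (line.StartsWith(AnswerMarker, StringComparison.Ordinal))
            {
                current.CorrectAnswer = line.Substring(AnswerMarker.Length).Trim();
                state = State.Idle;
            }
            else if (state == State.InQuestionText)
            {
                // Question text may span several lines; join them with spaces.
                current.Text = (current.Text + " " + line).Trim();
            }
            else if (state == State.InOptions)
            {
                current.Options.Add(line);
            }
        }

        if (current != null) questions.Add(current); // flush the last question
        return questions;
    }
}
```

Because the parser reacts to markers rather than line positions, extra blank lines or indentation in the input do not change the result. A call such as QuizParser.Parse(File.ReadLines(path)) would feed it a file lazily, line by line.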

For the Definition Researcher mode, I integrated HtmlAgilityPack to scrape Google search results, with 3-second throttling between requests to respect rate limits. The tool uses LINQ's GroupBy to eliminate duplicate questions based on text content. Each note type has a dedicated formatter method that translates internal Question objects into Anki's specific import syntax, making it easy to add new formats without modifying the parser.
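As a rough sketch of the GroupBy-based deduplication mentioned above: group on the question text and keep the first item of each group. The Card class and Deduplicator name are hypothetical, and treating texts that differ only in case or surrounding whitespace as duplicates is an assumption about what "based on text content" means.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the project's internal question type.
public class Card
{
    public string Text { get; set; } = "";
    public string Answer { get; set; } = "";
}

public static class Deduplicator
{
    // Keeps the first occurrence of each question text, in input order.
    // Trimming and case-insensitive comparison are assumptions.
    public static List<Card> RemoveDuplicates(IEnumerable<Card> cards)
    {
        return cards
            .GroupBy(c => c.Text.Trim(), StringComparer.OrdinalIgnoreCase)
            .Select(g => g.First())
            .ToList();
    }
}
```

GroupBy yields groups in the order their keys first appear, so the relative order of the surviving questions matches the input.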

Impact

By the numbers:

3 years of active development (October 2021 – September 2024)
12 commits adding features like multiple choice support and dynamic question types
~1,500 lines of C# across core parsing logic and CLI interface
Featured in educational YouTube video about Anki automation workflows

What changed:

Eliminated hours of manual flashcard creation from quiz materials
Made spaced repetition study workflows more accessible for students
Demonstrated practical automation for education technology pain points
Project is now archived but served as a valuable learning tool for text parsing

Challenges & Solutions

The hardest part was handling the variability in quiz text formats. Different learning management systems export quizzes with inconsistent spacing, line breaks, and formatting. I solved this by implementing stateful parsing that looks for specific markers rather than relying on rigid line positions. The parser maintains a Question object that accumulates data as it processes lines, making it resilient to extra whitespace or formatting quirks.

Another challenge was encoding multiple-choice answers for Anki. Anki uses a pipe-delimited format with binary flags like |1 0 0 0| where 1 indicates the correct answer. I had to dynamically count answer options (2-5 choices supported) and generate the correct flag sequence based on which choice matched the stored correct answer. This required careful string parsing to match answer text even when formatting varied.
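The flag generation described above might look roughly like the following. It is a sketch under assumptions: ChoiceEncoder and EncodeFlags are hypothetical names, and the whitespace and case normalization is one plausible way to match answer text when formatting varies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChoiceEncoder
{
    // Builds a flag sequence such as "1 0 0 0": one flag per option,
    // with 1 marking the option that matches the stored correct answer.
    public static string EncodeFlags(IReadOnlyList<string> options, string correctAnswer)
    {
        if (options.Count < 2 || options.Count > 5)
            throw new ArgumentException("Expected 2 to 5 answer options.", nameof(options));

        string target = Normalize(correctAnswer);
        var flags = options.Select(o => Normalize(o) == target ? "1" : "0").ToList();

        if (!flags.Contains("1"))
            throw new InvalidOperationException("No option matches the correct answer.");

        return string.Join(" ", flags);
    }

    // Assumed normalization: collapse runs of whitespace and ignore case
    // so that small formatting differences do not prevent a match.
    private static string Normalize(string s) =>
        string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
              .ToLowerInvariant();
}
```

For four options where the first is correct, EncodeFlags returns the sequence that sits between the pipes in the |1 0 0 0| example. Failing loudly when no option matches avoids silently exporting a card with no correct answer.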

The Google search scraping functionality was fragile because it depended on HTML structure that could change without notice. I added throttling and error handling to make it more reliable, but ultimately this taught me that web scraping should be a last resort. If I revisited the project, I'd replace it with a dictionary API for more stable definition lookup.
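One way to picture the throttling and error handling is a small wrapper around the lookup call. This sketch deliberately leaves out the scraping itself: the fetch delegate stands in for whatever performs the request, and ThrottledLookup is a hypothetical name, not the project's class.

```csharp
using System;
using System.Threading.Tasks;

public class ThrottledLookup
{
    private readonly Func<string, Task<string>> _fetch; // injected lookup (assumption)
    private readonly TimeSpan _minInterval;
    private DateTime _lastCall = DateTime.MinValue;

    public ThrottledLookup(Func<string, Task<string>> fetch, TimeSpan minInterval)
    {
        _fetch = fetch;
        _minInterval = minInterval;
    }

    // Returns the definition, or null when the lookup fails for any reason.
    public async Task<string> LookupAsync(string term)
    {
        // Wait until at least _minInterval has passed since the previous call.
        var wait = _lastCall + _minInterval - DateTime.UtcNow;
        if (wait > TimeSpan.Zero) await Task.Delay(wait);
        _lastCall = DateTime.UtcNow;

        try
        {
            return await _fetch(term);
        }
        catch (Exception ex)
        {
            // Broad catch on purpose: a failed lookup should not abort the batch.
            Console.Error.WriteLine($"Lookup failed for '{term}': {ex.Message}");
            return null;
        }
    }
}
```

With TimeSpan.FromSeconds(3) as the interval, this reproduces the 3-second spacing between requests. Returning null on failure lets the caller leave that card's answer empty and continue; swapping in a dictionary API later would only mean passing a different delegate.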

What I Learned

I learned that tightly coupling parsers to specific input formats creates fragility. A more flexible approach using configurable regex patterns or a grammar definition would have improved reusability across different quiz sources. I also discovered the importance of CLI user experience—implementing input validation loops with clear error messages significantly improved usability compared to crash-on-invalid-input, especially for non-technical users.
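An input validation loop of the kind described above can be sketched as follows. Prompt and ReadChoice are hypothetical names; taking a TextReader and TextWriter instead of calling Console directly is a design choice made here so the loop can be exercised without a terminal.

```csharp
using System;
using System.IO;

public static class Prompt
{
    // Re-asks until the user enters a whole number between min and max inclusive.
    public static int ReadChoice(TextReader input, TextWriter output, string label, int min, int max)
    {
        while (true)
        {
            output.Write($"{label} [{min}-{max}]: ");
            string line = input.ReadLine();

            // ReadLine returns null at end of input; stop instead of looping forever.
            if (line == null)
                throw new EndOfStreamException("Input ended before a valid choice was entered.");

            if (int.TryParse(line.Trim(), out int value) && value >= min && value <= max)
                return value;

            output.WriteLine($"'{line.Trim()}' is not a valid choice. Enter a number from {min} to {max}.");
        }
    }
}
```

In a console program, a call such as Prompt.ReadChoice(Console.In, Console.Out, "Processing mode", 1, 4) would keep prompting until one of the four modes is selected, rather than crashing on a typo.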

This project taught me that web scraping introduces brittle dependencies. The Google search integration, while useful, could break without warning. In production tools, official APIs or local knowledge bases are more maintainable. Finally, I learned the value of modular architecture: having separate formatter methods for each note type made it trivial to add new output formats without touching the core parser logic.
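The formatter-per-note-type idea can be illustrated with a lookup table from note type to formatting method. Everything named here is hypothetical (QuizQuestion, NoteType, the two note types, and the exact field layout); the sketch only shows why adding a format does not touch the parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the project's internal question type.
public class QuizQuestion
{
    public string Text { get; set; } = "";
    public string Answer { get; set; } = "";
}

// Illustrative note types; the project's five formats are not listed in the write-up.
public enum NoteType { Basic, TrueFalse }

public static class NoteFormatters
{
    // Adding a new output format means adding one method and one entry here.
    private static readonly Dictionary<NoteType, Func<QuizQuestion, string>> Formatters =
        new Dictionary<NoteType, Func<QuizQuestion, string>>
        {
            [NoteType.Basic] = FormatBasic,
            [NoteType.TrueFalse] = FormatTrueFalse,
        };

    public static string Format(NoteType type, QuizQuestion q) => Formatters[type](q);

    // Assumed layout: question and answer as two pipe-separated fields.
    private static string FormatBasic(QuizQuestion q) => $"{q.Text}|{q.Answer}";

    // Assumed layout: the answer is normalized to the literal True or False.
    private static string FormatTrueFalse(QuizQuestion q)
    {
        bool isTrue = q.Answer.Trim().Equals("true", StringComparison.OrdinalIgnoreCase);
        return $"{q.Text}|{(isTrue ? "True" : "False")}";
    }
}
```

The parser only produces question objects, and each formatter only consumes them, so the two sides can change independently.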

What I'd do differently: