
[MAICE Dev Log 1] Building a Teaching AI, not just an Answer Bot (Project Overview)


1. Why this project started

In high-school math classrooms, the quality of student questions strongly shapes the quality of AI answers.

In our pilot analysis (N=385), 72.3% of student questions lacked learning context (progress stage, exact difficulty, etc.). We also observed a positive correlation between question quality and answer quality (r=0.691).

In short: vague questions produce vague answers. If the system only gives immediate answers (Freepass mode), students get fewer chances to rethink and refine their own questions.

MAICE (Mathematics AI Chatbot for Education) starts from that gap. Using Bloom’s Taxonomy and Dewey’s Reflective Thinking, it does not always answer immediately; it first helps students clarify what they are actually asking.


2. Core philosophy: Bloom + Dewey

Bloom’s Taxonomy (K1-K4)

MAICE classifies questions into four knowledge dimensions:

  • K1 (Factual): direct fact/formula questions
  • K2 (Conceptual): relationship and principle questions
  • K3 (Procedural): step-by-step method questions
  • K4 (Metacognitive): reflection and self-evaluation questions
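
To make the four dimensions concrete, here is a minimal sketch of how they might be represented in code. The enum values mirror the list above; the keyword heuristic is purely illustrative (MAICE's actual classifier is an LLM-based agent, and these cue phrases are assumptions, not the real implementation):

```python
from enum import Enum

class KnowledgeDimension(Enum):
    """Bloom-inspired knowledge dimensions used to route student questions."""
    K1_FACTUAL = "K1"        # direct fact/formula questions
    K2_CONCEPTUAL = "K2"     # relationship and principle questions
    K3_PROCEDURAL = "K3"     # step-by-step method questions
    K4_METACOGNITIVE = "K4"  # reflection and self-evaluation questions

# Illustrative cue phrases only -- the real system classifies with an LLM.
_CUES = {
    KnowledgeDimension.K1_FACTUAL: ("what is", "formula for", "define"),
    KnowledgeDimension.K2_CONCEPTUAL: ("why", "relationship", "difference between"),
    KnowledgeDimension.K3_PROCEDURAL: ("how do i", "steps", "solve"),
    KnowledgeDimension.K4_METACOGNITIVE: ("am i", "did i understand", "check my"),
}

def classify(question: str) -> KnowledgeDimension:
    """Return the first dimension whose cue phrases match; default to K1."""
    q = question.lower()
    for dim, cues in _CUES.items():
        if any(cue in q for cue in cues):
            return dim
    return KnowledgeDimension.K1_FACTUAL

print(classify("How do I solve this by induction?").value)  # prints "K3"
```

The point of the typed enum is that every downstream agent (improver, answer generator, observer) can branch on the same four values rather than re-parsing the raw question.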

Dewey’s Reflective Thinking

When a question is unclear, MAICE explicitly triggers a clarification loop aligned with Dewey’s reflective process, especially the problem-definition stage.

Instead of answering immediately, it asks: “What exactly do you want to understand?”
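
The clarification gate can be sketched as a simple decision before answer generation. Everything here is hypothetical (the cue list, the wording, the `Turn` structure); it only illustrates the control flow of "clarify first, answer second":

```python
from dataclasses import dataclass

@dataclass
class Turn:
    reply: str
    needs_clarification: bool

# Hypothetical context cues -- the real gate is decided by the classifier agent.
CONTEXT_CUES = ("i'm on", "chapter", "i tried", "my answer", "stuck at")

def respond(question: str) -> Turn:
    has_context = any(cue in question.lower() for cue in CONTEXT_CUES)
    if not has_context:
        # Dewey's problem-definition stage: help the student state the problem first.
        return Turn("What exactly do you want to understand? "
                    "Where in the problem did you get stuck?", True)
    return Turn("[generate differentiated answer here]", False)
```

A bare "induction??" would trigger the clarifying question, while "I tried n=1 but I'm stuck at the inductive step" carries enough context to pass through to answer generation.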


3. High-level architecture

MAICE avoids a single monolithic prompt. It uses specialized agents in a microservice architecture.

graph TD
    Client[SvelteKit Frontend] -->|WS/SSE| Gateway[FastAPI Backend]
    Gateway -->|Redis Streams| Classifier[Question Classifier]
    Classifier -->|Pub/Sub| Improver[Question Improver]
    Classifier -->|Pub/Sub| Answer[Answer Generator]
    Answer -->|Stream| Gateway
    Answer -->|Pub/Sub| Observer[Observer Agent]
    Observer -->|Analysis| DB[(PostgreSQL)]

Frontend uses SvelteKit + MathLive for math-friendly input. Backend uses FastAPI + Redis Streams for reliable asynchronous coordination.
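
The handoff pattern in the diagram can be mimicked in-process. In this sketch, asyncio queues stand in for Redis Streams and Pub/Sub, and the three stages are reduced to toy functions; only the message-passing shape is meant to reflect the real pipeline:

```python
import asyncio

# In-process sketch of Gateway -> Classifier -> Answer Generator.
# asyncio.Queue stands in for Redis Streams; a None sentinel ends each stage.

async def gateway(questions, out: asyncio.Queue):
    for q in questions:
        await out.put({"question": q})
    await out.put(None)  # sentinel: no more messages

async def classifier(inp: asyncio.Queue, answer_q: asyncio.Queue):
    while (msg := await inp.get()) is not None:
        # Toy stand-in for the K1-K4 classification step.
        msg["k_level"] = "K3" if "how" in msg["question"].lower() else "K1"
        await answer_q.put(msg)
    await answer_q.put(None)

async def answer_generator(inp: asyncio.Queue, results: list):
    while (msg := await inp.get()) is not None:
        results.append(f"[{msg['k_level']}] answer for: {msg['question']}")

async def main():
    q1, q2, results = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        gateway(["How do I prove P(n+1)?"], q1),
        classifier(q1, q2),
        answer_generator(q2, results),
    )
    return results

print(asyncio.run(main()))
```

In production, Redis Streams adds what queues lack: persistence, consumer groups, and replay, which is why it was chosen for reliable asynchronous coordination between agents.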


4. Core agents

  1. QuestionClassifierAgent (QC): classifies K1-K4 and decides if clarification is needed.
  2. QuestionImprovementAgent (QI): runs clarification prompts to refine unclear questions.
  3. AnswerGeneratorAgent (AG): generates differentiated answers by K-level.
  4. FreepassTriggerAgent (FT): handles immediate-answer mode routing.
  5. ObserverAgent (LO): summarizes sessions and learning traces.

CurriculumTermAgent (curriculum term validation) remained at the design stage during this study.
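
How the agents above hand off to one another can be summarized as a small routing function. This is a simplification I'm assuming from the descriptions (the real routing runs over Redis Pub/Sub between services):

```python
def route(needs_clarification: bool, freepass: bool) -> str:
    """Pick the next agent after QuestionClassifierAgent has run."""
    if freepass:
        # Immediate-answer mode bypasses the clarification loop entirely.
        return "FreepassTriggerAgent"
    if needs_clarification:
        return "QuestionImprovementAgent"
    return "AnswerGeneratorAgent"
    # ObserverAgent runs after the answer in every path, summarizing the session.
```

The key design point is that Freepass mode is a routing decision, not a separate system, which is what makes the A/B comparison in the next section a clean one.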


5. A/B test summary

Participants: 58 students (Agent 28 / Freepass 30), 3 weeks, mathematical induction unit.

  • LLM evaluation (N=284) and teacher evaluation (N=100) both showed Agent advantages in learning-support signals.
  • Effects were especially visible in the lower quartile (Q1).
  • In A3 (learning context), Freepass sometimes scored better, suggesting clarification flow can reduce explicitly stated context in some sessions.

LLM-teacher correlation was r=0.754. We treated LLM scoring as an aid for discovering patterns, not as a substitute for absolute teacher grading.


6. Closing

MAICE demonstrates how educational theory can be operationalized in system behavior and evaluated with classroom data.

It improves some dimensions clearly, while revealing trade-offs in others. The following posts cover those implementation and evaluation details by layer.
