
[MAICE Dev Log 4] QAC checklist development: making educational quality measurable

#agent

1. Why this post pivots to QAC

Persona simulation was useful for exploration, but it did not directly guarantee production-quality educational behavior.

What actually supported iterative improvement was a consistent evaluation framework.

That is why this post focuses on QAC (Question-Answer-Context).


2. Why QAC was needed

Educational AI quality cannot be reduced to factual correctness alone.

We needed a framework that jointly evaluates:

  • whether student questions include meaningful learning context
  • whether answers match learner level
  • whether dialogue supports actual thinking processes

3. QAC structure (40 points)

QAC has three domains:

  • A (Question, 15 points): math expertise, question structure, learning context
  • B (Answer, 15 points): learner fit, explanation structure, learning expansion
  • C (Context, 10 points): dialogue coherence, learning-process support

Session-level scores are computed from checklist items and aggregated by domain.
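
The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scoring code: the `ItemScore` type, the `aggregate_session` function, the item labels, and the example point values are all hypothetical. Only the domain maxima (15, 15, 10) come from the structure listed above.

```python
from dataclasses import dataclass

# Domain maxima as stated in the post: A=15, B=15, C=10 (40 total).
DOMAIN_MAX = {"A": 15, "B": 15, "C": 10}


@dataclass(frozen=True)
class ItemScore:
    """One scored checklist item (hypothetical record shape)."""
    domain: str    # "A", "B", or "C"
    item: str      # checklist item label, illustrative only
    points: float  # points awarded for this item


def aggregate_session(items):
    """Sum item points per domain and add a session total."""
    totals = {domain: 0.0 for domain in DOMAIN_MAX}
    for entry in items:
        if entry.domain not in DOMAIN_MAX:
            raise ValueError(f"unknown domain: {entry.domain!r}")
        totals[entry.domain] += entry.points
    # Guard against rubric mistakes: a domain cannot exceed its maximum.
    for domain, score in totals.items():
        if score > DOMAIN_MAX[domain]:
            raise ValueError(f"domain {domain} exceeds {DOMAIN_MAX[domain]} points")
    return {**totals, "total": sum(totals.values())}


# Made-up scores for a single session, for illustration only.
session = [
    ItemScore("A", "math expertise", 4),
    ItemScore("A", "question structure", 3),
    ItemScore("A", "learning context", 2),
    ItemScore("B", "learner fit", 5),
    ItemScore("B", "explanation structure", 4),
    ItemScore("B", "learning expansion", 3),
    ItemScore("C", "dialogue coherence", 4),
    ItemScore("C", "learning-process support", 3),
]
scores = aggregate_session(session)
print(scores)
```

Keeping the per-item records next to the domain totals means a low domain score can always be traced back to the items that produced it.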


4. How it was used in research

In the thesis workflow:

  • LLM evaluation for large-scale pattern scan: N=284
  • teacher evaluation for educational validity check: N=100
  • LLM-teacher correlation: r=0.754 (p<0.001)

Interpretation rules:

  • the LLM tended to score higher than teachers, so its absolute scores are not treated as teacher-equivalent
  • use LLM scores primarily for relative comparison and pattern detection
  • keep final interpretation anchored in teacher-side validation
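
A comparison of this kind can be sketched as below, assuming paired session totals from both raters. The helper names `pearson_r` and `mean_bias` are hypothetical, and the score lists are invented for illustration; they are not the thesis data and will not reproduce r=0.754. The sketch shows the two quantities the rules above rely on: agreement in ranking (Pearson r) and the LLM's tendency to score higher (mean difference).

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def mean_bias(llm_scores, teacher_scores):
    """Average of (LLM score - teacher score); positive means the LLM scores higher."""
    return mean(l - t for l, t in zip(llm_scores, teacher_scores))


# Invented QAC totals (out of 40) for five sessions, illustration only.
llm_totals = [30, 34, 26, 38, 32]
teacher_totals = [27, 30, 24, 33, 29]

r = pearson_r(llm_totals, teacher_totals)
bias = mean_bias(llm_totals, teacher_totals)
print(f"r={r:.3f}, mean LLM-teacher difference={bias:.1f}")
```

A high r with a positive mean difference is the pattern that motivates the rules above: the LLM ranks sessions similarly to teachers, but its absolute scores run high.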

5. Concrete engineering outcomes

Compared to persona simulation, QAC produced clearer implementation artifacts:

  1. standardized session-log units for evaluation
  2. fixed rubric-based scoring structure
  3. aligned comparison paths between LLM and teacher evaluations

This shifted iteration from subjective intuition to improvements that can be traced to individual checklist items.
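
As a rough sketch of what item-level traceability could look like, the example below models a session-log unit and compares mean item scores between two sets of sessions, for example before and after a prompt change. Every name here (`Turn`, `SessionLog`, `mean_item_scores`, `item_deltas`, the item keys) is an assumption made for illustration, not the project's actual schema, and the scores are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str  # "student" or "agent" (assumed labels)
    text: str


@dataclass
class SessionLog:
    """One evaluation unit: a dialogue plus its rubric item scores."""
    session_id: str
    turns: list = field(default_factory=list)        # list of Turn
    item_scores: dict = field(default_factory=dict)  # item key -> points


def mean_item_scores(sessions):
    """Average each checklist item over the sessions that scored it."""
    sums, counts = {}, {}
    for session in sessions:
        for item, points in session.item_scores.items():
            sums[item] = sums.get(item, 0.0) + points
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def item_deltas(before, after):
    """Per-item change in mean score, for items present in both sets."""
    b, a = mean_item_scores(before), mean_item_scores(after)
    return {item: a[item] - b[item] for item in b if item in a}


# Invented sessions, illustration only.
before = [
    SessionLog("s1", [Turn("student", "how do I factor this?")],
               {"B.learner_fit": 2, "C.dialogue_coherence": 3}),
    SessionLog("s2", [], {"B.learner_fit": 4, "C.dialogue_coherence": 3}),
]
after = [
    SessionLog("s3", [], {"B.learner_fit": 4, "C.dialogue_coherence": 3}),
    SessionLog("s4", [], {"B.learner_fit": 5, "C.dialogue_coherence": 3}),
]
deltas = item_deltas(before, after)
print(deltas)
```

Reporting deltas per item rather than per total makes it visible which rubric item a change actually moved.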


6. Current role of persona testing

Persona testing is still useful, but now as a supporting tool:

  • typo/ungrammatical/short-query robustness checks
  • edge-case discovery
  • QA scenario enrichment

Core quality judgment remains QAC-centered.
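
The robustness checks listed above could be fed by simple query perturbations like the ones sketched here. This is an assumed approach for illustration, not the project's persona tooling: `swap_adjacent`, `truncate_words`, and `perturb` are hypothetical helpers, and the seed makes the variants reproducible across test runs.

```python
import random


def swap_adjacent(text, rng):
    """Simulate a typo by swapping one random pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def truncate_words(text, max_words):
    """Simulate a short query by keeping only the first max_words words."""
    return " ".join(text.split()[:max_words])


def perturb(query, seed=0):
    """Return labeled variants of a query for robustness testing."""
    rng = random.Random(seed)  # fixed seed keeps test cases reproducible
    return {
        "original": query,
        "typo": swap_adjacent(query, rng),
        "short": truncate_words(query, 3),
    }


variants = perturb("how do I find the vertex of a quadratic function", seed=42)
for label, text in variants.items():
    print(f"{label}: {text}")
```

Each variant can then be sent through the agent and scored with the same QAC rubric, so robustness findings stay comparable with regular sessions.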


7. Closing

The key outcome of this stage is not “better persona realism.” It is making educational quality measurable and actionable.

For details, see:

[MAICE Dev Log 7] How we validated educational impact: thesis-based summary

Source

  • Master’s thesis: Development and Effectiveness Analysis of AI Agent Supporting Question Clarification in High School Mathematics Learning (Kim Kyubong, Pusan National University, 2026)
