[MAICE Dev Log 4] QAC checklist development: making educational quality measurable
1. Why this post pivots to QAC
Persona simulation was useful for exploration, but it did not directly guarantee production-quality educational behavior.
What actually supported iterative improvement was a consistent evaluation framework.
That is why this post focuses on QAC (Question-Answer-Context).
2. Why QAC was needed
Educational AI quality cannot be reduced to factual correctness alone.
We needed a framework that jointly evaluates:
- whether student questions include meaningful learning context
- whether answers match learner level
- whether dialogue supports actual thinking processes
3. QAC structure (40 points)
QAC has three domains:
- A (Question, 15): math expertise, question structure, learning context
- B (Answer, 15): learner fit, explanation structure, learning expansion
- C (Context, 10): dialogue coherence, learning-process support
Session-level scores are computed from checklist items and aggregated by domain.
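The aggregation described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual scorer: the domain maxima (15, 15, 10) come from this post, but the even 5-point split per item, the item keys, and the function name `score_session` are assumptions made for the example.

```python
# Domain maxima follow the post (A=15, B=15, C=10, total 40).
# ASSUMPTION: each checklist item is worth 5 points; the post only
# states the domain totals, not the per-item weights.
RUBRIC = {
    "A": {"math_expertise": 5, "question_structure": 5, "learning_context": 5},
    "B": {"learner_fit": 5, "explanation_structure": 5, "learning_expansion": 5},
    "C": {"dialogue_coherence": 5, "learning_process_support": 5},
}


def score_session(item_scores: dict[str, float]) -> dict[str, float]:
    """Aggregate item-level checklist scores into domain and total scores."""
    result: dict[str, float] = {}
    for domain, items in RUBRIC.items():
        subtotal = 0.0
        for item, max_points in items.items():
            value = item_scores[item]
            # Reject scores outside the item's allowed range.
            if not 0 <= value <= max_points:
                raise ValueError(f"{item}: {value} is outside 0..{max_points}")
            subtotal += value
        result[domain] = subtotal
    result["total"] = sum(result[domain] for domain in RUBRIC)
    return result
```

Keeping the item scores as the input, rather than only the domain totals, is what lets a low domain score be traced back to a specific checklist item later.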
4. How it was used in research
In the thesis workflow:
- LLM evaluation for large-scale pattern scan: N=284
- teacher evaluation for educational validity check: N=100
- LLM-teacher correlation: r=0.754 (p<0.001)
The LLM tended to score higher than teachers, so results were interpreted under two rules:
- use LLM primarily for relative comparison and pattern detection
- keep final interpretation anchored with teacher-side validation
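To make the comparison concrete, here is a minimal sketch of the two quantities this section relies on: the Pearson correlation between LLM and teacher scores on the same sessions, and the mean scoring gap between them. The function names and the sample numbers are hypothetical and exist only for illustration; they are not thesis data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient for paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


def mean_gap(llm: list[float], teacher: list[float]) -> float:
    """Average of (LLM score - teacher score); positive means the LLM scores higher."""
    if len(llm) != len(teacher) or not llm:
        raise ValueError("need two equally long, non-empty lists")
    return sum(l - t for l, t in zip(llm, teacher)) / len(llm)


# Made-up totals for five sessions scored by both raters (NOT thesis data).
llm_totals = [30, 34, 28, 36, 32]
teacher_totals = [26, 31, 25, 30, 29]
print(pearson_r(llm_totals, teacher_totals), mean_gap(llm_totals, teacher_totals))
```

A high correlation with a positive mean gap is the pattern that motivates the rules above: the ranking of sessions can agree even when absolute scores do not. For a p-value, a library routine such as `scipy.stats.pearsonr` would be the usual choice instead of this hand-written version.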
5. Concrete engineering outcomes
Compared to persona simulation, QAC produced clearer implementation artifacts:
- standardized session-log units for evaluation
- fixed rubric-based scoring structure
- aligned comparison paths between LLM and teacher evaluations
This shifted iteration from subjective intuition to improvements that can be traced to individual rubric items.
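As a rough illustration of item-level traceability, the sketch below compares two batches of scored sessions and reports how the mean score of each rubric item moved. The data layout (one dict of item scores per session) and the names `mean_by_item` and `item_deltas` are assumptions for this example, not the project's actual log format.

```python
from collections import defaultdict


def mean_by_item(sessions: list[dict[str, float]]) -> dict[str, float]:
    """Mean score per rubric item across a batch of scored sessions."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for session in sessions:
        for item, score in session.items():
            totals[item] += score
            counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


def item_deltas(
    before: list[dict[str, float]], after: list[dict[str, float]]
) -> list[tuple[str, float]]:
    """Per-item change in mean score, sorted from largest drop to largest gain."""
    mean_before = mean_by_item(before)
    mean_after = mean_by_item(after)
    # Only items scored in both batches can be compared.
    shared = mean_before.keys() & mean_after.keys()
    deltas = [(item, mean_after[item] - mean_before[item]) for item in shared]
    return sorted(deltas, key=lambda pair: pair[1])
```

Sorting by delta puts regressions first, so a prompt or agent change that improves one item while hurting another shows up immediately.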
6. Current role of persona testing
Persona testing is still useful, but now as a supporting tool:
- typo/ungrammatical/short-query robustness checks
- edge-case discovery
- QA scenario enrichment
Core quality judgment remains QAC-centered.
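The robustness checks listed above could be driven by simple query perturbations. The sketch below is one hypothetical way to generate typo, ungrammatical, and short variants of a student question; the function names, the stopword list, and the perturbation rules are all assumptions chosen for illustration, not the persona generator used in MAICE.

```python
import random

# ASSUMPTION: a tiny English stopword list, just enough for the example.
FUNCTION_WORDS = frozenset({"the", "a", "an", "is", "of", "to"})


def swap_adjacent(text: str, rng: random.Random) -> str:
    """Simulate a typo by swapping two neighbouring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def drop_function_words(text: str) -> str:
    """Simulate ungrammatical input by removing common function words."""
    return " ".join(w for w in text.split() if w.lower() not in FUNCTION_WORDS)


def shorten(text: str, max_words: int = 3) -> str:
    """Simulate a very short query by keeping only the first few words."""
    return " ".join(text.split()[:max_words])


def persona_variants(query: str, seed: int = 0) -> dict[str, str]:
    """Return one perturbed query per robustness category (seeded, so repeatable)."""
    rng = random.Random(seed)
    return {
        "typo": swap_adjacent(query, rng),
        "ungrammatical": drop_function_words(query),
        "short": shorten(query),
    }
```

Each variant can then be sent through the same pipeline and scored with the QAC rubric, which keeps persona testing in its supporting role.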
7. Closing
The key outcome of this stage is not “better persona realism.” It is making educational quality measurable and actionable.
For details, see [MAICE Dev Log 7] How we validated educational impact: thesis-based summary
Source
- Master’s thesis: Development and Effectiveness Analysis of AI Agent Supporting Question Clarification in High School Mathematics Learning (Kim Kyubong, Pusan National University, 2026)