[MAICE Dev Log 5] Zero-downtime deployment for heavy AI containers (Blue-Green with Jenkins)

0. Update note

This post records why MAICE introduced Blue-Green deployment to keep the service running reliably during the experiment. The deployment scripts and operational procedures were later split into smaller pieces, but the core problem stayed the same.

If the service stops while a student is asking a question, the student's learning flow is interrupted. From a researcher's point of view, the session log is also broken, and the amount of evaluable data decreases. In MAICE, stability was not just operational convenience. It was part of protecting the experimental condition.


1. In an experiment, stability is a feature

Post 4 covered how MAICE conversations were evaluated with QAC. But even the best evaluation standard becomes useless if the service stops frequently during the experiment. There would be fewer complete conversations to compare.

A typical web server can restart quickly. An AI-agent service is heavier. It loads multiple libraries, connects to external services, and prepares model calls. If every deployment interrupts student chats, the operating condition is already unstable before we can evaluate whether the system is educationally useful.

So MAICE used a Blue-Green strategy to keep the old service alive while the new version was being prepared. Blue-Green deployment means preparing two copies of the same service and switching the traffic entrance only after the new copy is ready.

The reported system uptime during the three-week research period was 99.2%. This figure reflects the experiment-scale operating environment. Blue-Green deployment contributed to stability, but the figure alone does not isolate the effect of the deployment strategy from other factors.
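To put the percentage in concrete terms, here is a small arithmetic sketch. It assumes the three weeks mean 21 full days measured around the clock, which the source does not state. The function name is hypothetical.

```python
def downtime_hours(uptime_percent: float, days: int) -> float:
    """Convert an uptime percentage over a period into hours of downtime.

    Assumes the service was expected to be up 24 hours a day for `days` days.
    """
    total_hours = days * 24
    return total_hours * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 21 days is an assumption for "three weeks"; 99.2 is the reported uptime.
    print(round(downtime_hours(99.2, 21), 2))
```

Under that assumption, 99.2% corresponds to roughly four hours of unavailability across the period. If uptime was only measured during class hours, the absolute figure would be smaller.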


2. Blue-Green as a classroom analogy

Suppose the version students are currently using is Blue, and the new version is deployed to Green.

  1. Blue is receiving student requests.
  2. A new version is deployed to Green.
  3. Green is checked for health.
  4. If Green is healthy, Nginx switches traffic to Green.
  5. If something goes wrong, traffic can be switched back to Blue.
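The five steps above can be sketched as a tiny state machine. This is an illustration only, not MAICE's deployment code: the Router class stands in for Nginx, and start_new and is_healthy are hypothetical callbacks supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable


def other_color(color: str) -> str:
    """Return the idle color for a given active color."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color}")
    return "green" if color == "blue" else "blue"


@dataclass
class Router:
    """Stands in for Nginx: remembers which color receives traffic."""
    active: str = "blue"  # step 1: Blue is serving students


def deploy(router: Router,
           start_new: Callable[[str], None],
           is_healthy: Callable[[str], bool]) -> bool:
    """Run steps 2 to 4. Traffic stays on the old color if the check fails."""
    target = other_color(router.active)
    start_new(target)               # step 2: deploy to the idle color
    if not is_healthy(target):      # step 3: health check
        return False                # old color keeps serving
    router.active = target          # step 4: switch traffic
    return True


def rollback(router: Router) -> None:
    """Step 5: point traffic back at the previous color."""
    router.active = other_color(router.active)
```

The key property is that router.active changes only after the health check passes, so a failed deployment never touches the version students are using.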

In classroom terms, it is like preparing the next board on the side instead of erasing the board during class. Once the new board is ready, students simply look at the new board.


3. The actual deployment flow

In the MAICE codebase, the Jenkinsfile acts as the entry point for deployment automation. Jenkins records which commit was built, which image was created, and which environment it was deployed to. This is more traceable than logging into a server and restarting containers by hand.

The core backend deployment flow is in scripts/deploy_backend_blue_green.sh. The script does the following.

  • Checks that required environment variables and API keys exist.
  • Pulls the backend image tagged with the build number from the registry.
  • Checks whether Nginx is currently pointing to Blue or Green.
  • Starts a new container for the inactive color.
  • Repeatedly checks /health/simple to see whether the new container is ready.
  • Updates the Nginx configuration and reloads Nginx when the new container is healthy. Here, Nginx is the entrance that routes student requests to either the Blue or Green backend.
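The first few steps of that list (checking variables, choosing the image, choosing the idle color) can be sketched as a planning function. The real script is a shell script. The variable names in REQUIRED_VARS and the image name maice-back are assumptions made for illustration, and only the maice-back-<color> container naming comes from the MAICE setup.

```python
# Hypothetical variable names; the actual script may require different ones.
REQUIRED_VARS = ("REGISTRY_URL", "BUILD_NUMBER", "LLM_API_KEY")


def missing_vars(env: dict) -> list:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def plan_deployment(env: dict, active_color: str) -> dict:
    """Decide which image to pull and which container to start."""
    missing = missing_vars(env)
    if missing:
        # Fail before touching any container.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    if active_color not in ("blue", "green"):
        raise ValueError(f"unknown active color: {active_color}")
    target = "green" if active_color == "blue" else "blue"
    return {
        # Image tagged with the build number (image name is an assumption).
        "image": f"{env['REGISTRY_URL']}/maice-back:{env['BUILD_NUMBER']}",
        "container": f"maice-back-{target}",
        "color": target,
    }
```

Failing early on a missing key matters here: a container that starts without its API key may look alive and still be unable to answer students.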

The health check is not merely asking whether the process is turned on. It is a safety step for moving traffic only after the container is minimally ready to receive student requests.
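A readiness poll of this kind might look like the sketch below. The retry count, delay, and timeout are assumed values, not the ones MAICE uses, and the probe and sleep parameters exist only so the loop can be tested without a real server.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or an error status: not ready yet.
        return False


def wait_until_healthy(url: str,
                       attempts: int = 30,
                       delay: float = 2.0,
                       probe: Callable[[str], bool] = http_ok,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the health endpoint until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay)  # give the heavy container time to finish loading
    return False
```

A call such as wait_until_healthy("http://localhost:8001/health/simple") would gate the traffic switch, where the host and port are placeholders. A False result means the switch is skipped and the old color keeps serving.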


4. The goal is not to eliminate failure, but to keep a way back

Blue-Green deployment does not remove deployment failure. A new container may fail to start, an environment variable may be missing, or the Nginx configuration may fail to reload.

The important point is not pretending that failures never happen. It is preparing a path back when they do. MAICE includes scripts such as rollback_backend_blue_green.sh and traffic_control.sh. They check whether the previous color's container is alive, run health checks, and switch traffic back when needed.
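The pre-checks before a rollback can be sketched as a small decision function. This is an assumed shape, not the contents of rollback_backend_blue_green.sh: is_running and is_healthy are hypothetical callbacks that would wrap a container status check and a health request.

```python
from typing import Callable, Optional


def choose_rollback_target(active: str,
                           is_running: Callable[[str], bool],
                           is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the color to roll back to, or None if rollback is unsafe."""
    if active not in ("blue", "green"):
        raise ValueError(f"unknown active color: {active}")
    previous = "blue" if active == "green" else "green"
    container = f"maice-back-{previous}"
    if not is_running(container):
        return None  # the previous container is gone; nothing to return to
    if not is_healthy(container):
        return None  # alive but not ready; switching would not help
    return previous
```

Returning None instead of switching blindly keeps a rollback from moving students onto a container that is in worse shape than the current one.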

As a development note, the backend containers are named by color, such as maice-back-blue and maice-back-green. The Nginx upstream configuration determines which one receives traffic. The scripts also run nginx -t to validate the configuration, reload Nginx, and fall back to a restart when necessary.
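The switch itself can be sketched in two parts: writing an upstream block that names one color, then applying it with a validate, reload, restart sequence. The upstream name maice_backend, port 8000, and the systemctl restart command are assumptions. Only nginx -t, nginx -s reload, and the container names are grounded in the setup described here.

```python
import subprocess
from typing import Callable, Sequence


def render_upstream(color: str, port: int = 8000) -> str:
    """Render an upstream block pointing at one color (name and port assumed)."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color}")
    return (
        "upstream maice_backend {\n"
        f"    server maice-back-{color}:{port};\n"
        "}\n"
    )


def run_command(cmd: Sequence[str]) -> bool:
    """Run a command and report success; a missing binary counts as failure."""
    try:
        return subprocess.run(list(cmd), check=False).returncode == 0
    except FileNotFoundError:
        return False


def apply_nginx_config(run: Callable[[Sequence[str]], bool] = run_command,
                       restart_cmd: Sequence[str] = ("systemctl", "restart", "nginx")) -> str:
    """Validate first, then reload, then restart only as a last resort."""
    if not run(["nginx", "-t"]):
        return "invalid-config"   # never reload a config that fails validation
    if run(["nginx", "-s", "reload"]):
        return "reloaded"         # graceful: existing connections finish
    if run(list(restart_cmd)):
        return "restarted"        # fallback; may briefly drop connections
    return "failed"
```

Validating before reloading is the important ordering: a broken configuration is rejected while the old one is still serving traffic.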


5. Why this is still an education post

Deployment may look far from education. But in an experiment, it is not separate. If the service stops while a student is writing a question, the learning flow is interrupted and the session log becomes incomplete. If such logs are evaluated with QAC, the comparison condition is weakened.

Deployment automation does not directly create learning effects. Instead, it protects the operating condition under which learning effects can be observed.


6. The remaining question after stability

Blue-Green deployment addressed the problem of keeping the service running during the experiment. The remaining question is no longer an operational one. It is what the conversations and evaluation data actually showed under those stable conditions.

The next post summarizes MAICE's educational effects and limits using the thesis data.

Next post: [MAICE Dev Log 6] How we validated educational impact
