Quiz - Deployment Foundations
Section A: Deployment Concepts (8 questions)
Question 1
What is the main difference between ML development and ML deployment?
A) Development uses Python, deployment uses Java
B) Development focuses on model accuracy, deployment on reliability and production rollout
C) Development is free, deployment is paid
D) Development is for data scientists, deployment is for managers
Answer: B) Development focuses on model accuracy, deployment on reliability and production rollout
ML development typically happens in notebooks and aims to maximize model metrics. Deployment adds requirements for reliability, performance, monitoring, and maintainability. Both phases use Python and involve technical teams.
Question 2
What does MLOps mean?
A) Machine Learning Online Processing System
B) The combination of ML, DevOps, and data engineering practices for reliable deployment of models in production
C) A specific tool developed by Google for ML deployment
D) The process of manual optimization of ML models
Answer: B) The combination of ML, DevOps, and data engineering practices for reliable deployment of models in production
MLOps is a set of practices (not a specific tool) that combines machine learning, DevOps, and data engineering to deploy and maintain ML systems in production reliably and efficiently.
Question 3
At which MLOps maturity level does a project have automated model training but still manual deployment?
A) Level 0 — No MLOps
B) Level 1 — DevOps but no MLOps
C) Level 2 — Automated Training
D) Level 3 — Automated Deployment
Answer: C) Level 2 — Automated Training
Level 2 automates the training pipeline (scheduled retraining with new data) but model deployment remains a manual process. Level 3 also automates deployment.
Question 4
What is "data drift"?
A) The process of migrating data to the cloud
B) A change in the statistical distribution of input data compared to training data
C) Data loss during network transfer
D) Accidental duplication of data in the database
Answer: B) A change in the statistical distribution of input data compared to training data
Data drift occurs when production data evolves compared to the data used to train the model. For example, a model trained on pre-COVID data may perform poorly on post-COVID data because purchasing behaviors have changed.
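One way to put a number on such a shift is the Population Stability Index (PSI), which compares how a feature is distributed across bins in the training data and in production. The sketch below is a minimal standard-library illustration, not a production monitor: the function name `psi`, the equal-width binning, and the `eps` floor for empty bins are all assumptions made for this example.

```python
import math


def psi(reference, production, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (illustrative)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is empty.
    width = (hi - lo) / n_bins or 1.0

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            idx = max(0, min(n_bins - 1, idx))
            counts[idx] += 1
        total = len(values)
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


# Synthetic data: production is the reference shifted by +5.
reference = [i / 100 for i in range(1000)]
production = [v + 5 for v in reference]
```

A PSI of 0 means the two binned distributions are identical, and larger values mean a larger shift. The alert threshold is a judgment call; rules of thumb vary between teams.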
Question 5
What is the difference between "data drift" and "concept drift"?
A) There is no difference, they are synonyms
B) Data drift concerns input data, concept drift concerns the relationship between inputs and outputs
C) Data drift is fast, concept drift is slow
D) Data drift is detectable, concept drift is not
Answer: B) Data drift concerns input data, concept drift concerns the relationship between inputs and outputs
Data drift occurs when the distribution of input variables changes (e.g., new demographic segment). Concept drift occurs when the relationship between inputs and the target variable changes (e.g., fraud criteria evolve). Both are detectable with appropriate monitoring tools.
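Concept drift cannot be seen in the inputs alone, so it is usually watched through model quality once true labels arrive. The following is a hedged sketch of a rolling-accuracy monitor; the class name `RollingAccuracy` and the `baseline` and `tolerance` values are hypothetical. A drop in accuracy is a symptom that can also come from data drift or a pipeline bug, so it should trigger an investigation rather than serve as proof of concept drift.

```python
from collections import deque


class RollingAccuracy:
    """Tracks accuracy over the most recent labeled predictions (illustrative)."""

    def __init__(self, window=100, baseline=0.90, tolerance=0.05):
        self.outcomes = deque(maxlen=window)  # True for a correct prediction
        self.baseline = baseline              # accuracy measured at validation time
        self.tolerance = tolerance            # allowed drop before alerting

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        acc = self.accuracy()
        return acc is not None and acc < self.baseline - self.tolerance
```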
Question 6
What is a "Feature Store"?
A) An online store to buy ML models
B) A centralized repository to store, manage, and serve ML features consistently between training and inference
C) A data visualization tool
D) A file storage system like AWS S3
Answer: B) A centralized repository to store, manage, and serve ML features consistently between training and inference
The Feature Store ensures that features used during training are computed exactly the same way during inference. This avoids the classic "training-serving skew" problem where transformations differ between the two phases.
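The principle can be illustrated without any feature store product: both the training pipeline and the serving path call one shared function, so the transformations cannot diverge. This is only a sketch of the idea. The field names (`amount`, `day_of_week`) and function names are hypothetical, and a real feature store adds storage, versioning, and low-latency serving on top.

```python
import math


def build_features(raw):
    """Single source of truth for feature computation (hypothetical fields)."""
    return {
        "log_amount": math.log1p(raw["amount"]),
        # Assumes day_of_week uses 0 = Monday ... 6 = Sunday.
        "is_weekend": 1 if raw["day_of_week"] in (5, 6) else 0,
    }


def make_training_rows(records):
    # Training path: applies the shared function to historical records.
    return [build_features(r) for r in records]


def features_for_request(request_payload):
    # Serving path: applies the same function to a live request.
    return build_features(request_payload)
```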
Question 7
In the context of ML deployment, what does "MVM" (Minimum Viable Model) mean?
A) The model with the smallest possible number of parameters
B) A functional model that solves the core problem and can be improved iteratively
C) A model that only works on a minimum amount of data
D) The least expensive model to deploy
Answer: B) A functional model that solves the core problem and can be improved iteratively
Inspired by the MVP (Minimum Viable Product) in software development, an MVM is a model that performs well enough to be useful in production, even if it is not optimal. The idea is to deploy quickly and iterate rather than seek perfection before the first deployment.
Question 8
What versioning strategy is recommended for ML models?
A) Version only the code with Git
B) Use semantic versioning (MAJOR.MINOR.PATCH) for code, model, data, and configuration
C) Name models with the date (model_2024-01-15.pkl)
D) Do not version models because they change too often
Answer: B) Use semantic versioning (MAJOR.MINOR.PATCH) for code, model, data, and configuration
Semantic versioning applied to models signals the nature of each change:
- MAJOR: breaking change (new output format)
- MINOR: improvement (retraining with more data)
- PATCH: bug fix (fixes a preprocessing issue)
Versioning only the code (A) is not sufficient. Date-based names (C) do not convey the nature of the change.
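The bump rules listed above can be sketched as a small helper. The function name `bump` and the change labels are assumptions for illustration; which kinds of model change count as major, minor, or patch is a team convention.

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # breaking change, e.g. new output format
        return f"{major + 1}.0.0"
    if change == "minor":   # improvement, e.g. retraining with more data
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fix, e.g. corrected preprocessing
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `bump("1.4.2", "minor")` gives `"1.5.0"`: the lower-order component resets to zero whenever a higher-order one is incremented.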
Section B: Infrastructure (7 questions)
Question 9
What is the main advantage of a Python virtual environment?
A) It makes Python faster
B) It isolates each project's dependencies to avoid version conflicts
C) It protects against computer viruses
D) It allows running Python without installing it
Answer: B) It isolates each project's dependencies to avoid version conflicts
A virtual environment creates an isolated Python installation. Each project can have its own package versions without interfering with other projects or the global Python system. This is essential when different projects require different versions of the same package.
Question 10
What is the correct command to create a virtual environment with venv?
A) pip install venv
B) python -m venv .venv
C) conda create venv
D) virtualenv --create .venv
Answer: B) python -m venv .venv
venv is a standard-library module available since Python 3.3. It is invoked with python -m venv followed by the target directory name (.venv is a common convention) and needs no separate installation, so pip install venv is not the way to get it. conda create builds Conda environments, not venv ones, and virtualenv is a separate third-party tool that has no --create flag.
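The same module can also be driven from Python, which is occasionally useful in setup scripts. The sketch below uses the standard-library `venv.create` function; the helper names `create_env`, `in_virtualenv`, and `demo` are made up for this example. Note one difference from the command line: `python -m venv` installs pip by default, whereas this sketch passes `with_pip=False` to stay fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def create_env(path):
    # Creates the environment directory and returns the path of its marker file.
    venv.create(path, with_pip=False)
    return Path(path) / "pyvenv.cfg"


def demo():
    # Build a throwaway environment and confirm the marker file exists.
    with tempfile.TemporaryDirectory() as tmp:
        return create_env(Path(tmp) / ".venv").exists()
```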
Question 11
Why is it important to "pin" (fix) versions in requirements.txt?
A) To make the file more readable
B) To ensure reproducibility — the same code gives the same result everywhere
C) To reduce the size of installed packages
D) To avoid paying software licenses
Answer: B) To ensure reproducibility — the same code gives the same result everywhere
Without pinned versions, pip install scikit-learn will install the latest available version. If a new version is released with breaking changes, your code could stop working. With scikit-learn==1.4.2, you always get the same version on any machine.
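A quick check for loose entries can be sketched as follows. This is a deliberately simplified parser, and the function name `unpinned` is an assumption: it only looks for an exact `==` pin and ignores parts of the requirements format such as URLs and environment markers.

```python
def unpinned(requirements_text):
    """Return requirement lines that do not use an exact '==' pin."""
    loose = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            loose.append(line)
    return loose


example = """\
scikit-learn==1.4.2
pandas>=2.0
numpy
# a comment line
"""
```

Here `unpinned(example)` reports `pandas>=2.0` and `numpy`, the two entries that could resolve to different versions on different machines.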
Question 12
What is the role of a Dockerfile?
A) To document the project for developers
B) To define the recipe for building a Docker image containing the application and all its dependencies
C) To configure network settings of the container
D) To store sensitive environment variables
Answer: B) To define the recipe for building a Docker image containing the application and all its dependencies
A Dockerfile is a text file that contains a sequence of instructions for building a Docker image. Instructions that modify the filesystem, such as RUN, COPY, and ADD, each create a layer in the image. The result is a portable image, and containers started from it behave consistently on any machine with a compatible Docker installation.
Question 13
In a Dockerfile, why do we copy requirements.txt and install dependencies BEFORE copying the source code?
A) It's just a convention, the order doesn't matter
B) To leverage Docker cache — dependencies don't change with every code modification
C) Because dependencies must be installed before Python exists in the container
D) For security reasons
Answer: B) To leverage Docker cache — dependencies don't change with every code modification
Docker builds images in layers. If a layer hasn't changed, Docker reuses the cache. By copying requirements.txt first, the dependency installation layer (which is slow) is cached as long as dependencies don't change. If we copied all the code first, every Python file modification would invalidate the cache and force reinstalling all packages.
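The invalidation rule can be illustrated with a toy model in Python. This is not how Docker implements its cache; it only mimics the key property that a layer's cache key depends on its own inputs and on every layer before it. The function names and the example steps are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """steps: list of (instruction, content) pairs; returns one cache key per step."""
    keys, parent = [], ""
    for instruction, content in steps:
        # Each key chains the parent key, so a change propagates downward.
        digest = hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Instructions whose cache key changed between two builds."""
    old, new = layer_keys(old_steps), layer_keys(new_steps)
    return [step[0] for step, o, n in zip(new_steps, old, new) if o != n]


def deps_first(code):
    return [
        ("COPY requirements.txt", "scikit-learn==1.4.2"),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", code),
    ]


def code_first(code):
    return [
        ("COPY . .", code + " + scikit-learn==1.4.2"),
        ("RUN pip install -r requirements.txt", ""),
    ]
```

In this model, editing the application code with the dependencies-first ordering rebuilds only the final `COPY . .` step, while the code-first ordering also reruns the `pip install` step.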
Question 14
For a scikit-learn model (Random Forest) serving predictions via API, what instance type is most appropriate?
A) GPU (g4dn.xlarge) to accelerate inference
B) CPU (t3.medium or c5.xlarge) — sufficient for classical models
C) Multi-GPU (p4d.24xlarge) for high availability
D) A GPU is always required for machine learning
Answer: B) CPU (t3.medium or c5.xlarge) — sufficient for classical models
Scikit-learn's classical estimators (Random Forest, Logistic Regression, etc.) run on CPU. They do not use GPU acceleration, which mainly benefits the massively parallel operations found in neural networks. Provisioning a GPU instance for such a model would waste resources and money, since GPU instances often cost an order of magnitude or more per hour than general-purpose CPU instances.
Question 15
What is CI/CD in the ML context?
A) "Code Integration / Code Deployment" — a code management tool
B) "Continuous Integration / Continuous Deployment" — automation of testing, validation, and deployment
C) "Cloud Infrastructure / Cloud Delivery" — a cloud computing service
D) "Customer Interface / Customer Delivery" — a customer-centric approach
Answer: B) "Continuous Integration / Continuous Deployment" — automation of testing, validation, and deployment
CI/CD automates the code validation and deployment process. In ML, this includes not only code tests but also data validation, model metrics verification, and automatic deployment when all conditions are met. Tools like GitHub Actions, GitLab CI, or Jenkins are commonly used.
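The "model metrics verification" step is often a small script that the pipeline runs before deployment. The sketch below shows one possible gate; the function name `passes_gate`, the metric dictionary layout, and both thresholds are assumptions chosen for illustration.

```python
def passes_gate(candidate, baseline, min_accuracy=0.85, max_regression=0.01):
    """Return (ok, reasons) for a candidate model's metrics."""
    reasons = []
    if candidate["accuracy"] < min_accuracy:
        reasons.append("accuracy below absolute floor")
    if candidate["accuracy"] < baseline["accuracy"] - max_regression:
        reasons.append("accuracy regressed versus baseline")
    return (not reasons, reasons)
```

In a CI job, the script would typically exit with a non-zero status when `ok` is false, which causes tools such as GitHub Actions, GitLab CI, or Jenkins to mark the step as failed and stop the deployment.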
Section C: Practical Scenarios (5 questions)
Question 16
You are deploying a fraud detection model that must respond in under 100ms. Which deployment pattern is most appropriate?
A) Batch — run predictions every hour
B) Real-time — serve predictions via an API
C) Shadow mode — record predictions without using them
D) Batch — run predictions once per day
Answer: B) Real-time — serve predictions via an API
Fraud detection requires immediate response — each transaction must be evaluated in real time before approval. Batch processing (hourly or daily) would be too slow: fraud would already be committed before detection. Shadow mode does not serve predictions to users.
Question 17
You have trained a new model (v2) that improves accuracy by 5%. You want to test it in production without risk. Which strategy do you use?
A) Direct deployment (Big Bang) — immediately replace v1 with v2
B) Shadow mode — send traffic to v2 but do not use its predictions
C) Delete v1 and hope v2 works
D) Wait 6 months of lab testing before deploying
Answer: B) Shadow mode — send traffic to v2 but do not use its predictions
Shadow mode is the safest strategy to test a new model. Model v2 receives real production traffic and its predictions are recorded, but users continue to receive predictions from v1. You can compare the performance of both models on real data with no risk to users.
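A request handler in shadow mode can be sketched as follows. The function and parameter names are hypothetical, and the models are passed in as plain callables. A real service would usually call the shadow model asynchronously so that it adds no latency; this sketch calls it inline for simplicity.

```python
def predict_with_shadow(features, primary, shadow, shadow_log):
    """Serve the primary model's prediction; record the shadow's on the side."""
    served = primary(features)
    try:
        shadow_prediction = shadow(features)
    except Exception as exc:
        # A failing shadow model must never affect what users receive.
        shadow_prediction = f"error: {exc}"
    shadow_log.append(
        {"features": features, "served": served, "shadow": shadow_prediction}
    )
    return served
```

The log can then be compared offline against the true outcomes to decide whether v2 is ready to take real traffic.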
Question 18
Your e-commerce recommendation model was trained in 2023. In 2024, "fitness" category sales tripled due to a TikTok trend. Recommendations seem less relevant. What is the problem?
A) A bug in the API code
B) Data drift and possibly concept drift — purchasing behaviors have changed
C) The server is too slow
D) The model is too complex (overfitting)
Answer: B) Data drift and possibly concept drift — purchasing behaviors have changed
This is a classic case of data drift (the purchase distribution has changed, with many more fitness products) and possibly concept drift (the relationship between features and user preferences may also have evolved). The usual remedy is to retrain the model with recent data that reflects the new purchasing trends.
Question 19
You are developing an ML project as a team. A colleague has pandas==1.5.3 and you have pandas==2.2.0. The code works for you but not for your colleague. What is the best solution?
A) Ask your colleague to manually update their packages
B) Use a requirements.txt with pinned versions and a virtual environment
C) Use the same computer for everyone
D) Write code compatible with all versions of pandas
Answer: B) Use a requirements.txt with pinned versions and a virtual environment
This is exactly the problem that virtual environments and pinned dependency files solve. By combining python -m venv .venv and pip install -r requirements.txt (with pinned versions like pandas==2.2.0), each team member gets the same versions of the listed packages. Pinning transitive dependencies as well, for example by using the output of pip freeze, makes the environment fully reproducible. Docker goes further by also fixing the operating system.
Question 20
You are deploying a new model in canary deployment at 5% of traffic. After 24 hours, you notice that the canary model's error rate is 3x higher than the current model. What do you do?
A) Increase traffic to 50% to get more data
B) Wait another week before making a decision
C) Immediate rollback — bring canary traffic to 0% and investigate the issue
D) Ignore the problem because 5% of traffic is negligible
Answer: C) Immediate rollback — bring canary traffic to 0% and investigate the issue
A 3x higher error rate is a clear warning sign. The advantage of canary deployment is precisely the ability to detect problems on a small percentage of traffic and perform a quick rollback. Increasing traffic (A) would worsen the problem. Waiting (B) would unnecessarily expose the 5% of users to degraded service. Ignoring (D) goes against the purpose of canary.
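The two moving parts of a canary, the traffic split and the rollback rule, can be sketched as below. The function names, the hash-based bucketing, and the `max_ratio` threshold of 2.0 are assumptions for illustration. Real setups usually implement the split in a load balancer or service mesh and add statistical checks on top of a simple ratio.

```python
import hashlib


def routes_to_canary(request_id, canary_percent):
    """Deterministically send roughly canary_percent% of requests to the canary."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent


def should_rollback(canary_errors, canary_total, stable_errors, stable_total,
                    max_ratio=2.0):
    """True when the canary error rate exceeds max_ratio times the stable rate."""
    if canary_total == 0 or stable_total == 0:
        return False  # not enough data to compare
    canary_rate = canary_errors / canary_total
    stable_rate = stable_errors / stable_total
    return canary_rate > max_ratio * stable_rate
```

With a 3% canary error rate against a 1% stable rate, as in the scenario above, `should_rollback(30, 1000, 190, 19000)` is true, and setting `canary_percent` to 0 sends all traffic back to the stable model.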
Scoring
| Section | Questions | Points |
|---|---|---|
| A — Deployment Concepts | 1 to 8 | 40 points (5 pts each) |
| B — Infrastructure | 9 to 15 | 35 points (5 pts each) |
| C — Practical Scenarios | 16 to 20 | 25 points (5 pts each) |
| Total | 20 questions | 100 points |
Interpretation
| Score | Level | Recommendation |
|---|---|---|
| 90-100 | Excellent | You have mastered the fundamentals of ML deployment |
| 75-89 | Good | Review the concepts where you made mistakes |
| 60-74 | Adequate | Re-read the "Concepts" and "Infrastructure" sections |
| < 60 | Insufficient | Restart the module from the beginning before continuing |