Quiz - Deployment Foundations

Quiz: 30 minutes, 20 questions

Section A: Deployment Concepts (8 questions)

Question 1

What is the main difference between ML development and ML deployment?

A) Development uses Python, deployment uses Java
B) Development focuses on model accuracy, deployment on reliability and production rollout
C) Development is free, deployment is paid
D) Development is for data scientists, deployment is for managers

Answer: B) Development focuses on model accuracy, deployment on reliability and production rollout

ML development typically happens in notebooks and aims to maximize model metrics. Deployment adds requirements for reliability, performance, monitoring, and maintainability. Both phases use Python and involve technical teams.

Question 2

What does MLOps mean?

A) Machine Learning Online Processing System
B) The combination of ML, DevOps, and data engineering practices for reliable deployment of models in production
C) A specific tool developed by Google for ML deployment
D) The process of manual optimization of ML models

Answer: B) The combination of ML, DevOps, and data engineering practices for reliable deployment of models in production

MLOps is a set of practices (not a specific tool) that combines machine learning, DevOps, and data engineering to deploy and maintain ML systems in production reliably and efficiently.

Question 3

At which MLOps maturity level does a project have automated model training but still manual deployment?

A) Level 0 — No MLOps
B) Level 1 — DevOps but no MLOps
C) Level 2 — Automated Training
D) Level 3 — Automated Deployment

Answer: C) Level 2 — Automated Training

Level 2 automates the training pipeline (scheduled retraining with new data) but model deployment remains a manual process. Level 3 also automates deployment.

Question 4

What is "data drift"?

A) The process of migrating data to the cloud
B) A change in the statistical distribution of input data compared to training data
C) Data loss during network transfer
D) Accidental duplication of data in the database

Answer: B) A change in the statistical distribution of input data compared to training data

Data drift occurs when production data evolves compared to the data used to train the model. For example, a model trained on pre-COVID data may perform poorly on post-COVID data because purchasing behaviors have changed.
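The definition above can be made concrete with a check that compares a feature's training distribution with a window of production data. This is a minimal, dependency-free sketch of the two-sample Kolmogorov-Smirnov (KS) statistic; the 0.3 threshold and the basket-size values are hypothetical, and in practice a library routine such as scipy.stats.ks_2samp (which also returns a p-value) or a dedicated monitoring tool would normally be used.

```python
from bisect import bisect_right


def ks_statistic(train: list[float], prod: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the two samples (0 = same, 1 = disjoint)."""
    a, b = sorted(train), sorted(prod)
    max_gap = 0.0
    # The empirical CDFs only jump at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def has_drifted(train: list[float], prod: list[float], threshold: float = 0.3) -> bool:
    # Hypothetical fixed cutoff; real monitoring would tune it or use a p-value.
    return ks_statistic(train, prod) > threshold


# Hypothetical basket sizes seen at training time and in two production windows.
train_baskets = [10, 12, 14, 16, 18]
prod_similar = [11, 13, 15, 17, 19]
prod_shifted = [30, 32, 34, 36, 38]

print(has_drifted(train_baskets, prod_similar))  # False
print(has_drifted(train_baskets, prod_shifted))  # True
```

With only five points per sample the statistic is noisy; real drift checks use much larger windows.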

Question 5

What is the difference between "data drift" and "concept drift"?

A) There is no difference, they are synonyms
B) Data drift concerns input data, concept drift concerns the relationship between inputs and outputs
C) Data drift is fast, concept drift is slow
D) Data drift is detectable, concept drift is not

Answer: B) Data drift concerns input data, concept drift concerns the relationship between inputs and outputs

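Data drift occurs when the distribution of the input variables changes (e.g., a new demographic segment starts using the product). Concept drift occurs when the relationship between the inputs and the target variable changes (e.g., fraud patterns evolve, so the same transaction profile now has a different fraud likelihood). Both are detectable with appropriate monitoring, but in different ways: data drift can be spotted from the inputs alone, whereas concept drift usually only becomes visible once ground-truth labels (or a proxy for them) arrive and model performance degrades.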
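A toy example makes the distinction visible: below, the input values never change, yet the model's accuracy drops because the rule linking inputs to labels has moved. The fraud rule, the amounts, and the labels are all hypothetical; the point is that only labeled feedback reveals this kind of drift.

```python
def model_v1(amount: float) -> int:
    # Hypothetical rule learned at training time: flag transactions above 1000.
    return int(amount > 1000)


def accuracy(amounts: list[float], labels: list[int]) -> float:
    hits = sum(model_v1(a) == y for a, y in zip(amounts, labels))
    return hits / len(labels)


# The inputs are identical in both periods, so a data drift check sees nothing.
amounts = [200, 600, 800, 1200, 1500, 300, 700, 1100]

# Before: fraud meant "amount > 1000". After: fraudsters switched to smaller amounts.
labels_before = [int(a > 1000) for a in amounts]
labels_after = [int(a > 500) for a in amounts]

print(accuracy(amounts, labels_before))  # 1.0
print(accuracy(amounts, labels_after))   # 0.625
```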

Question 6

What is a "Feature Store"?

A) An online store to buy ML models
B) A centralized repository to store, manage, and serve ML features consistently between training and inference
C) A data visualization tool
D) A file storage system like AWS S3

Answer: B) A centralized repository to store, manage, and serve ML features consistently between training and inference

The Feature Store ensures that features used during training are computed exactly the same way during inference. This avoids the classic "training-serving skew" problem where transformations differ between the two phases.
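The core idea, one definition of each feature shared by training and serving, can be sketched in a few lines. The class below is a toy, hypothetical stand-in and not the API of any real product: it only centralizes feature definitions, whereas real feature stores such as Feast also handle storage, point-in-time correct training sets, and low-latency online lookup. The feature names and transactions are illustrative.

```python
from datetime import datetime
from typing import Callable


class MiniFeatureStore:
    """Toy in-memory stand-in for a feature store (hypothetical API)."""

    def __init__(self) -> None:
        self._definitions: dict[str, Callable[[dict], float]] = {}

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        self._definitions[name] = fn

    def compute(self, record: dict) -> dict[str, float]:
        # Training and serving both call this method, so each feature
        # has exactly one implementation and cannot silently diverge.
        return {name: fn(record) for name, fn in self._definitions.items()}


store = MiniFeatureStore()
store.register("amount_eur", lambda tx: tx["amount_cents"] / 100)
store.register("is_night", lambda tx: int(tx["timestamp"].hour < 6))

# Offline: build the training set from historical transactions.
history = [{"amount_cents": 12550, "timestamp": datetime(2024, 1, 15, 3, 0)}]
training_rows = [store.compute(tx) for tx in history]

# Online: compute the same features for an incoming request.
request = {"amount_cents": 4999, "timestamp": datetime(2024, 1, 16, 14, 30)}
print(store.compute(request))  # {'amount_eur': 49.99, 'is_night': 0}
```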

Question 7

In the context of ML deployment, what does "MVM" (Minimum Viable Model) mean?

A) The model with the smallest possible number of parameters
B) A functional model that solves the core problem and can be improved iteratively
C) A model that only works on a minimum amount of data
D) The least expensive model to deploy

Answer: B) A functional model that solves the core problem and can be improved iteratively

Inspired by the MVP (Minimum Viable Product) in software development, an MVM is a model good enough to be useful in production, even if it is not optimal. The idea is to deploy quickly and iterate rather than seek perfection before the first deployment.

Question 8

What versioning strategy is recommended for ML models?

A) Version only the code with Git
B) Use semantic versioning (MAJOR.MINOR.PATCH) for code, model, data, and configuration
C) Name models with the date (model_2024-01-15.pkl)
D) Do not version models because they change too often

Answer: B) Use semantic versioning (MAJOR.MINOR.PATCH) for code, model, data, and configuration

Applied to models, semantic versioning makes the nature of each change explicit:

MAJOR: breaking change (new output format)
MINOR: improvement (retraining with more data)
PATCH: bug fix (fixes a preprocessing issue)

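Versioning only the code (A) is not sufficient: the same code trained on different data or with a different configuration produces a different model. Date-based names (C) do not convey the nature of the change, and not versioning models at all (D) makes it impossible to reproduce or roll back to a previous model.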
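The bump rules listed above can be written as a small helper. This sketch assumes versions are plain MAJOR.MINOR.PATCH strings (no pre-release or build suffixes); the release dictionary and its version numbers are hypothetical.

```python
def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given kind of model change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # breaking change, e.g. new output format
        return f"{major + 1}.0.0"
    if change == "minor":  # improvement, e.g. retraining with more data
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # fix, e.g. corrected preprocessing bug
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


# Hypothetical release record: one version per artifact (code, model, data, config).
release = {"code": "2.3.1", "model": "1.4.2", "data": "3.2.0", "config": "1.0.3"}
release["model"] = bump(release["model"], "minor")  # retrained on more data
print(release["model"])  # 1.5.0
```

Resetting the lower components on a bump (1.4.2 to 1.5.0, not 1.5.2) follows the usual semantic versioning convention.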

Section B: Infrastructure (7 questions)

Question 9

What is the main advantage of a Python virtual environment?

A) It makes Python faster
B) It isolates each project's dependencies to avoid version conflicts
C) It protects against computer viruses
D) It allows running Python without installing it

Answer: B) It isolates each project's dependencies to avoid version conflicts

A virtual environment is an isolated directory with its own site-packages, built on top of a base Python interpreter. Each project can have its own package versions without interfering with other projects or with the system-wide Python installation. This is essential when different projects require different versions of the same package.
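A quick way to confirm that the isolation is actually in effect is to compare two standard sys attributes: when a venv is active, sys.prefix points at the environment directory while sys.base_prefix points at the underlying installation. This sketch applies to environments created with venv; other tools may behave differently (a Conda environment, for example, is not detected this way).

```python
import sys


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("packages are installed under:", sys.prefix)
```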

Question 10

What is the correct command to create a virtual environment with venv?

A) pip install venv
B) python -m venv .venv
C) conda create venv
D) virtualenv --create .venv

Answer: B) python -m venv .venv

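venv has been part of the standard library since Python 3.3. It is invoked with python -m venv followed by the target directory (.venv is the usual convention) and needs no pip install, although some Linux distributions such as Debian and Ubuntu package it separately as python3-venv. conda create is for Conda environments, not venv, and virtualenv is a separate third-party tool whose basic usage is simply virtualenv .venv.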

Question 11

Why is it important to "pin" (fix) versions in requirements.txt?

A) To make the file more readable
B) To ensure reproducibility — the same code gives the same result everywhere
C) To reduce the size of installed packages
D) To avoid paying software licenses

Answer: B) To ensure reproducibility — the same code gives the same result everywhere

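Without pinned versions, pip install scikit-learn installs the latest version available for your Python interpreter. If a new release introduces breaking changes, your code can stop working. With scikit-learn==1.4.2, you get the same version on every machine. Note that this pins only the packages you list; to also pin their transitive dependencies, generate the file from a working environment with pip freeze > requirements.txt.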
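A CI step can enforce this rule automatically. The sketch below flags requirement lines that are not pinned with ==; it is a deliberate simplification (it ignores environment markers, hashes, and URLs, which a real check would handle with a proper parser such as packaging.requirements.Requirement), and the sample file content is hypothetical.

```python
import re

# "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that do not pin one exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


requirements = """
scikit-learn==1.4.2
pandas>=2.0   # lower bound only: the installed version can change over time
numpy
"""
print(unpinned_requirements(requirements))  # ['pandas>=2.0', 'numpy']
```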

Question 12

What is the role of a Dockerfile?

A) To document the project for developers
B) To define the recipe for building a Docker image containing the application and all its dependencies
C) To configure network settings of the container
D) To store sensitive environment variables

Answer: B) To define the recipe for building a Docker image containing the application and all its dependencies

A Dockerfile is a text file containing a sequence of instructions for building a Docker image. Instructions that modify the filesystem (such as RUN, COPY, and ADD) each create a layer in the image. The resulting image can be run as a container on any machine with Docker and a compatible CPU architecture, with the same dependencies everywhere.

Question 13

In a Dockerfile, why do we copy requirements.txt and install dependencies BEFORE copying the source code?

A) It's just a convention, the order doesn't matter
B) To leverage Docker cache — dependencies don't change with every code modification
C) Because dependencies must be installed before Python exists in the container
D) For security reasons

Answer: B) To leverage Docker cache — dependencies don't change with every code modification

Docker builds images in layers. If a layer hasn't changed, Docker reuses the cache. By copying requirements.txt first, the dependency installation layer (which is slow) is cached as long as dependencies don't change. If we copied all the code first, every Python file modification would invalidate the cache and force reinstalling all packages.
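The caching argument can be simulated in a few lines. This is a simplified model, not Docker's actual implementation: each layer gets a cache key derived from the previous layer's key, the instruction text, and the content of any files it copies, so a change invalidates that layer and everything built after it. The file names and contents are hypothetical.

```python
import hashlib


def layer_keys(steps: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    """Simplified model of Docker's build cache: a layer's key depends on its
    parent layer, its instruction text, and the content of the files it copies."""
    keys, parent = [], ""
    for instruction, copied_paths in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for path in copied_paths:
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(steps, before: dict[str, str], after: dict[str, str]) -> int:
    """Count how many layers, starting from the first, keep the same cache key."""
    reused = 0
    for old, new in zip(layer_keys(steps, before), layer_keys(steps, after)):
        if old != new:
            break  # keys are chained, so every later layer misses the cache too
        reused += 1
    return reused


deps_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "app.py"]),
]
code_first = [
    ("COPY . .", ["requirements.txt", "app.py"]),
    ("RUN pip install -r requirements.txt", []),
]

v1 = {"requirements.txt": "scikit-learn==1.4.2\n", "app.py": "print('v1')\n"}
v2 = {**v1, "app.py": "print('v2')\n"}  # only the application code changed

print(reused_layers(deps_first, v1, v2))  # 2: the slow pip install layer is reused
print(reused_layers(code_first, v1, v2))  # 0: pip install runs again
```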

Question 14

For a scikit-learn model (Random Forest) serving predictions via API, what instance type is most appropriate?

A) GPU (g4dn.xlarge) to accelerate inference
B) CPU (t3.medium or c5.xlarge) — sufficient for classical models
C) Multi-GPU (p4d.24xlarge) for high availability
D) A GPU is always required for machine learning

Answer: B) CPU (t3.medium or c5.xlarge) — sufficient for classical models

Scikit-learn models such as Random Forest and Logistic Regression run on CPU: scikit-learn does not use a GPU for these estimators, and GPU acceleration mainly benefits the massively parallel matrix operations of neural networks. Using a GPU instance for a scikit-learn model would waste money, since GPU instances often cost 10 to 100 times more per hour than suitable CPU instances.

Question 15

What is CI/CD in the ML context?

A) "Code Integration / Code Deployment" — a code management tool
B) "Continuous Integration / Continuous Deployment" — automation of testing, validation, and deployment
C) "Cloud Infrastructure / Cloud Delivery" — a cloud computing service
D) "Customer Interface / Customer Delivery" — a customer-centric approach

Answer: B) "Continuous Integration / Continuous Deployment" — automation of testing, validation, and deployment

CI/CD automates the code validation and deployment process. In ML, this includes not only code tests but also data validation, model metrics verification, and automatic deployment when all conditions are met. Tools like GitHub Actions, GitLab CI, or Jenkins are commonly used.
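The "deploy only when all conditions are met" step is typically a script whose exit status gates the pipeline. The sketch below assumes hypothetical thresholds (a minimum accuracy of 0.85 and at most a 1-point regression against production), hypothetical metric names, and a missing-value rate as a stand-in for data validation; a real pipeline would load these from versioned configuration and the training job's output.

```python
# Hypothetical thresholds; a real pipeline would read them from versioned config.
MIN_ACCURACY = 0.85
MAX_REGRESSION = 0.01  # candidate may be at most 1 point below production


def validation_gate(candidate: dict[str, float], production: dict[str, float]) -> list[str]:
    """Return the checks the candidate model fails (empty list = OK to deploy)."""
    failures = []
    if candidate["missing_rate"] > 0:
        failures.append("evaluation data contains missing values")
    if candidate["accuracy"] < MIN_ACCURACY:
        failures.append(f"accuracy {candidate['accuracy']:.2f} is below {MIN_ACCURACY}")
    if candidate["accuracy"] < production["accuracy"] - MAX_REGRESSION:
        failures.append("accuracy regressed compared with the production model")
    return failures


def exit_code(failures: list[str]) -> int:
    # CI runners (GitHub Actions, GitLab CI, Jenkins) mark a step as failed
    # when the command exits with a non-zero status.
    return 1 if failures else 0


# Hypothetical metrics, e.g. written to disk by the training job.
failures = validation_gate({"accuracy": 0.88, "missing_rate": 0.0}, {"accuracy": 0.87})
print(failures, exit_code(failures))  # [] 0
```

In a CI job, the script would end with sys.exit(exit_code(failures)) so that any failed check stops the pipeline before the deployment step.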

Section C: Practical Scenarios (5 questions)

Question 16

You are deploying a fraud detection model that must respond in under 100ms. Which deployment pattern is most appropriate?

A) Batch — run predictions every hour
B) Real-time — serve predictions via an API
C) Shadow mode — record predictions without using them
D) Batch — run predictions once per day

Answer: B) Real-time — serve predictions via an API

Fraud detection requires immediate response — each transaction must be evaluated in real time before approval. Batch processing (hourly or daily) would be too slow: fraud would already be committed before detection. Shadow mode does not serve predictions to users.
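A latency requirement like "under 100 ms" is often checked on tail latency (for example the 95th percentile) rather than the average, since a few slow responses can still hold up transactions. The sketch below measures this locally with the standard library; score_transaction is a hypothetical placeholder for the real model, and a meaningful check would time the full API round trip under realistic load.

```python
import statistics
import time

LATENCY_BUDGET_MS = 100  # from the scenario: respond in under 100 ms


def score_transaction(tx: dict[str, float]) -> float:
    # Hypothetical stand-in for a real model's prediction on one transaction.
    return min(tx["amount"] / 10_000, 1.0)


def p95_latency_ms(predict, requests: list[dict[str, float]]) -> float:
    """Time each call and return the 95th-percentile latency in milliseconds."""
    timings = []
    for request in requests:
        start = time.perf_counter()
        predict(request)
        timings.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(timings, n=20)[-1]


requests = [{"amount": float(a)} for a in range(0, 5000, 50)]
p95 = p95_latency_ms(score_transaction, requests)
print(f"p95 = {p95:.4f} ms, within budget: {p95 < LATENCY_BUDGET_MS}")
```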

Question 17

You have trained a new model (v2) that improves accuracy by 5%. You want to test it in production without risk. Which strategy do you use?

A) Direct deployment (Big Bang) — immediately replace v1 with v2
B) Shadow mode — send traffic to v2 but do not use its predictions
C) Delete v1 and hope v2 works
D) Wait 6 months of lab testing before deploying

Answer: B) Shadow mode — send traffic to v2 but do not use its predictions

Shadow mode is the safest strategy for testing a new model. Model v2 receives a copy of real production traffic and its predictions are recorded, but users continue to receive predictions from v1. You can compare both models on real data without affecting users; the main cost is the extra compute needed to run both models.
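The routing logic behind shadow mode fits in a small wrapper. This sketch uses hypothetical threshold models and an in-memory list as the prediction log; in production the shadow call is often made asynchronously (or from a copy of the request stream) so that it adds no latency to the user-facing response.

```python
from typing import Any, Callable

Model = Callable[[dict[str, Any]], int]


def make_shadow_predictor(primary: Model, shadow: Model, log: list[dict[str, Any]]) -> Model:
    """Serve the primary model's prediction; run the shadow model only for logging."""
    def predict(features: dict[str, Any]) -> int:
        served = primary(features)
        try:
            log.append({"input": features, "served": served, "shadow": shadow(features)})
        except Exception as exc:  # a failing shadow model must never affect users
            log.append({"input": features, "served": served, "shadow_error": repr(exc)})
        return served
    return predict


def model_v1(x: dict[str, Any]) -> int:
    return int(x["amount"] > 1000)  # hypothetical current model


def model_v2(x: dict[str, Any]) -> int:
    return int(x["amount"] > 800)  # hypothetical candidate model


shadow_log: list[dict[str, Any]] = []
predict = make_shadow_predictor(model_v1, model_v2, shadow_log)

print(predict({"amount": 900}))  # 0: the user gets v1's answer
disagreements = sum(e["served"] != e["shadow"] for e in shadow_log if "shadow" in e)
print(disagreements)  # 1: v2 would have answered 1
```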

Question 18

Your e-commerce recommendation model was trained in 2023. In 2024, "fitness" category sales tripled due to a TikTok trend. Recommendations seem less relevant. What is the problem?

A) A bug in the API code
B) Data drift and possibly concept drift — purchasing behaviors have changed
C) The server is too slow
D) The model is too complex (overfitting)

Answer: B) Data drift and possibly concept drift — purchasing behaviors have changed

This is a classic case of data drift (the purchase distribution has shifted toward many more fitness products) and possibly concept drift (the relationship between features and user preferences may have evolved). The solution is to retrain the model on recent data that reflects the new purchasing trends.

Question 19

You are developing an ML project as a team. A colleague has pandas==1.5.3 and you have pandas==2.2.0. The code works for you but not for your colleague. What is the best solution?

A) Ask your colleague to manually update their packages
B) Use a requirements.txt with pinned versions and a virtual environment
C) Use the same computer for everyone
D) Write code compatible with all versions of pandas

Answer: B) Use a requirements.txt with pinned versions and a virtual environment

This is exactly the problem that virtual environments and pinned dependency files solve. By combining python -m venv .venv and pip install -r requirements.txt (with pinned versions like pandas==2.2.0), each team member gets exactly the same Python packages. Docker goes further by also fixing the OS-level environment (base image and system libraries).

Question 20

You are deploying a new model in canary deployment at 5% of traffic. After 24 hours, you notice that the canary model's error rate is 3x higher than the current model. What do you do?

A) Increase traffic to 50% to get more data
B) Wait another week before making a decision
C) Immediate rollback — bring canary traffic to 0% and investigate the issue
D) Ignore the problem because 5% of traffic is negligible

Answer: C) Immediate rollback — bring canary traffic to 0% and investigate the issue

A 3x higher error rate is a clear warning sign. The advantage of canary deployment is precisely the ability to detect problems on a small percentage of traffic and perform a quick rollback. Increasing traffic (A) would worsen the problem. Waiting (B) would unnecessarily expose the 5% of users to degraded service. Ignoring (D) goes against the purpose of canary.
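The two mechanisms the answer relies on, a small traffic split and a fast way to bring it back to 0%, can be sketched as follows. The hash-based bucketing, the 1.5x guardrail, and the error rates 0.06 and 0.02 (a 3x ratio, as in the scenario) are illustrative assumptions; in practice the split often lives in a load balancer or service mesh and the rollback is triggered by monitoring alerts.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: float) -> bool:
    """Deterministic split: hash the id into one of 100 buckets so the same
    user always sees the same model version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent


def should_rollback(canary_error_rate: float, baseline_error_rate: float,
                    max_ratio: float = 1.5) -> bool:
    # Hypothetical guardrail: roll back when the canary's error rate exceeds
    # the current model's by more than max_ratio.
    return canary_error_rate > baseline_error_rate * max_ratio


users = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_canary(u, 5) for u in users) / len(users)
print(f"canary share at 5%: {share:.3f}")  # close to 0.050

# Scenario from the question: canary error rate 3x the current model's.
canary_percent = 0 if should_rollback(0.06, 0.02) else 5
print(canary_percent)  # 0: all traffic goes back to the current model
print(any(routes_to_canary(u, canary_percent) for u in users))  # False
```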

Scoring

Section	Questions	Points
A — Deployment Concepts	1 to 8	40 points (5 pts each)
B — Infrastructure	9 to 15	35 points (5 pts each)
C — Practical Scenarios	16 to 20	25 points (5 pts each)
Total	20 questions	100 points

Interpretation

Score	Level	Recommendation
90-100	Excellent	You have mastered the fundamentals of ML deployment
75-89	Good	Review the concepts where you made mistakes
60-74	Adequate	Re-read the "Concepts" and "Infrastructure" sections
< 60	Insufficient	Restart the module from the beginning before continuing
