
Quiz - Testing & Explainability

25 questions · 30 minutes

Section A — Testing AI Systems (10 questions)

Question 1

Why is testing AI systems more challenging than testing traditional software?

  • A) AI systems use more memory
  • B) AI systems can produce non-deterministic outputs and suffer from silent failures
  • C) AI systems always require GPU hardware for testing
  • D) AI systems cannot be tested with standard frameworks
Show Answer

B) AI systems can produce non-deterministic outputs and suffer from silent failures

Unlike traditional software where a bug causes a crash or wrong output, AI systems can produce subtly wrong predictions that go unnoticed. Additionally, the same input may produce slightly different outputs due to model randomness, floating-point precision, or data-dependent behavior.


Question 2

In the testing pyramid for AI, which type of test should you write the most of?

  • A) End-to-end tests
  • B) Integration tests
  • C) Unit tests
  • D) Performance tests
Show Answer

C) Unit tests

The testing pyramid recommends a 70/20/10 distribution: 70% unit tests (fast, cheap, many), 20% integration tests (medium), 10% end-to-end tests (slow, expensive, few). Unit tests catch the most bugs at the lowest cost.


Question 3

What is the purpose of conftest.py in pytest?

  • A) To configure the Python interpreter
  • B) To store shared fixtures and hooks that are automatically discovered by pytest
  • C) To define test assertion methods
  • D) To configure code coverage reporting
Show Answer

B) To store shared fixtures and hooks that are automatically discovered by pytest

conftest.py is a special file that pytest discovers automatically. Fixtures defined in it are available to all tests in the same directory and subdirectories — no imports needed. You can have multiple conftest.py files at different directory levels.


Question 4

What does @pytest.mark.parametrize do?

  • A) It runs a test in parallel across multiple CPU cores
  • B) It runs a single test function with multiple sets of input data
  • C) It marks a test as parameterized so it can be skipped
  • D) It configures the test parameters in pytest.ini
Show Answer

B) It runs a single test function with multiple sets of input data

@pytest.mark.parametrize lets you define multiple input/output pairs for a single test function. Instead of writing 10 separate tests for 10 inputs, you write one test that runs 10 times with different data. Example:

@pytest.mark.parametrize("input,expected", [(1, 2), (2, 4), (3, 6)])
def test_double(input, expected):
    assert input * 2 == expected

Question 5

What is the main advantage of using TestClient from FastAPI/Starlette for testing?

  • A) It supports parallel test execution
  • B) It allows testing API endpoints without starting a real HTTP server
  • C) It automatically generates test cases from your API schema
  • D) It provides built-in load testing capabilities
Show Answer

B) It allows testing API endpoints without starting a real HTTP server

TestClient simulates HTTP requests in memory, making tests fast and reliable. You don't need to start uvicorn, manage ports, or handle network issues. The tests run synchronously and deterministically.


Question 6

When should you mock the ML model in your tests?

  • A) Always — never use the real model in tests
  • B) When testing API routing, validation, and response format (not prediction accuracy)
  • C) Only when the model file is too large to include in the repository
  • D) Never — mocking defeats the purpose of testing
Show Answer

B) When testing API routing, validation, and response format (not prediction accuracy)

Mock the model when you want to test the API logic in isolation: Does the endpoint return the right status code? Does it validate input correctly? Does it handle errors gracefully? Use the real model when testing prediction quality, accuracy, and edge-case behavior.
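The idea above can be sketched with the standard library's `unittest.mock` — a minimal example in which `predict` is a hypothetical wrapper function (the names are illustrative, not from a specific codebase):

```python
from unittest.mock import MagicMock

# Hypothetical wrapper around a model object (illustrative names).
def predict(model, features):
    """Validate input, then delegate to the model."""
    if not features:
        raise ValueError("empty feature list")
    return int(model.predict([features])[0])

# Mock the model: we test the wrapper's logic, not prediction accuracy.
mock_model = MagicMock()
mock_model.predict.return_value = [1]

result = predict(mock_model, [5.1, 3.5, 1.4, 0.2])
assert result == 1
# The wrapper passed the features through in the expected shape.
mock_model.predict.assert_called_once_with([[5.1, 3.5, 1.4, 0.2]])
```

Because the mock returns a fixed value instantly, tests like this stay fast and deterministic regardless of the real model's size or behavior.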


Question 7

Which of these is NOT a valid edge case to test for an AI prediction API?

  • A) NaN values in features
  • B) Empty feature list
  • C) Features with correct values and types
  • D) Infinity values in features
Show Answer

C) Features with correct values and types

Valid input is the happy path, not an edge case. Edge cases are boundary conditions and unusual inputs that might break your system: NaN, infinity, empty lists, null values, extremely large numbers, wrong types, etc.
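A minimal sketch of guarding against those edge cases, assuming a hypothetical `validate_features` helper that runs before the model is called:

```python
import math

def validate_features(features):
    """Reject edge-case inputs (NaN, infinity, empty list, wrong types)
    before they reach the model. Illustrative helper, not a library API."""
    if not features:
        return False
    return all(isinstance(x, (int, float)) and math.isfinite(x) for x in features)

assert validate_features([5.1, 3.5, 1.4, 0.2])          # happy path
assert not validate_features([])                         # empty feature list
assert not validate_features([float("nan"), 1.0])        # NaN
assert not validate_features([float("inf"), 1.0])        # infinity
assert not validate_features(["5.1", 3.5])               # wrong type
```

Each of these edge cases is a natural candidate for a `@pytest.mark.parametrize` entry.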


Question 8

What does pytest --cov=app --cov-fail-under=80 do?

  • A) It runs tests and generates a coverage report, failing if any test takes over 80 seconds
  • B) It runs tests with code coverage and fails the build if coverage is below 80%
  • C) It runs only 80% of the test suite for faster execution
  • D) It sets the maximum number of test failures to 80
Show Answer

B) It runs tests with code coverage and fails the build if coverage is below 80%

The --cov=app flag measures coverage for the app package, and --cov-fail-under=80 sets a minimum threshold. If the measured coverage is below 80%, pytest exits with a non-zero status code, which causes CI/CD pipelines to fail.


Question 9

In a GitHub Actions CI pipeline, what is the correct order of steps for testing an AI API?

  • A) Run tests → Install dependencies → Checkout code
  • B) Checkout code → Run tests → Install dependencies
  • C) Checkout code → Install dependencies → Run tests → Check coverage
  • D) Install dependencies → Checkout code → Check coverage → Run tests
Show Answer

C) Checkout code → Install dependencies → Run tests → Check coverage

The logical order is:

  1. Checkout the code from the repository
  2. Install Python, pip, and project dependencies
  3. Run unit and integration tests
  4. Check code coverage meets the threshold
  5. (Optional) Upload coverage report to a service like Codecov
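The steps above can be sketched as a GitHub Actions workflow — a minimal example assuming the dependencies live in a `requirements.txt` and the package is named `app`, as in the coverage question earlier:

```yaml
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4                      # 1. checkout code
      - uses: actions/setup-python@v5                  # 2. set up Python
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt           # 2. install dependencies
      - run: pytest --cov=app --cov-fail-under=80      # 3-4. run tests, enforce coverage
```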

Question 10

What is the AAA pattern in testing?

  • A) Authentication, Authorization, Accounting
  • B) Arrange, Act, Assert
  • C) Analyze, Apply, Approve
  • D) Automate, Accelerate, Audit
Show Answer

B) Arrange, Act, Assert

The AAA pattern structures every test in three phases:

  1. Arrange: Set up the test data and preconditions
  2. Act: Execute the function or action being tested
  3. Assert: Verify the result matches expectations

def test_prediction():
    features = [5.1, 3.5, 1.4, 0.2, 2.3]  # Arrange
    result = model.predict([features])    # Act
    assert result[0] in [0, 1]            # Assert

Section B — Postman (5 questions)

Question 11

In Postman, what is the purpose of environment variables?

  • A) To store test results for reporting
  • B) To switch between different API configurations (local, staging, production) without changing requests
  • C) To define the programming language for test scripts
  • D) To set the HTTP protocol version
Show Answer

B) To switch between different API configurations (local, staging, production) without changing requests

Environment variables let you define values like base_url, auth_token, and api_version per environment. Your requests use {{base_url}} instead of hardcoded URLs. Switching from local to production is just selecting a different environment — no request changes needed.


Question 12

Which Postman test assertion checks that the response status code is 200?

  • A) pm.assert(response.code === 200)
  • B) pm.test("Status", () => { pm.response.to.have.status(200) })
  • C) assert(pm.status == 200)
  • D) expect(response).toBe(200)
Show Answer

B) pm.test("Status", () => { pm.response.to.have.status(200) })

Postman uses the pm.test() function with Chai-style assertions. The first argument is a description string, and the second is a callback function with the assertion. pm.response.to.have.status(200) is a BDD-style assertion that checks the HTTP status code.


Question 13

What is Newman in the Postman ecosystem?

  • A) A visual test editor for Postman
  • B) A command-line tool that runs Postman collections for CI/CD automation
  • C) A Postman plugin for load testing
  • D) A code generator that converts Postman requests to Python
Show Answer

B) A command-line tool that runs Postman collections for CI/CD automation

Newman is the command-line companion for Postman. It runs exported collections from the terminal, which makes it perfect for CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI). You can run it with environment files, data files, and generate HTML reports.

newman run collection.json -e environment.json -r htmlextra

Question 14

What is the difference between a pre-request script and a post-response script in Postman?

  • A) Pre-request runs on the server, post-response runs on the client
  • B) Pre-request runs before the request is sent, post-response runs after the response is received
  • C) Pre-request sets environment variables, post-response cannot
  • D) There is no difference — they run at the same time
Show Answer

B) Pre-request runs before the request is sent, post-response runs after the response is received

  • Pre-request scripts execute before the HTTP request is sent — use them to generate dynamic data, set variables, or log information
  • Post-response scripts (formerly "Tests") execute after the response arrives — use them to validate the response, extract data, and set variables for the next request

Question 15

How do you chain requests in Postman (pass data from one request to the next)?

  • A) Use global JavaScript variables shared between all requests
  • B) Save values from the response using pm.collectionVariables.set() and reference them with {{variable_name}}
  • C) Write the data to a file and read it in the next request
  • D) Use HTTP cookies to persist data between requests
Show Answer

B) Save values from the response using pm.collectionVariables.set() and reference them with {{variable_name}}

In the post-response script of Request 1:

const data = pm.response.json();
pm.collectionVariables.set("prediction_id", data.id);

In Request 2's URL or body:

{{prediction_id}}

This creates a workflow where the output of one request feeds into the next.


Section C — Model Explainability (10 questions)

Question 16

Why is model explainability important under the EU AI Act?

  • A) It helps models train faster
  • B) High-risk AI systems must provide explanations for their automated decisions, with fines up to €35M for non-compliance
  • C) It's optional but improves model accuracy
  • D) It's only required for open-source models
Show Answer

B) High-risk AI systems must provide explanations for their automated decisions, with fines up to €35M for non-compliance

The EU AI Act classifies AI systems by risk level. High-risk systems (healthcare, finance, hiring, law enforcement, education) are legally required to provide explanations for their decisions. Non-compliance can result in fines up to €35 million or 7% of global annual revenue.


Question 17

What is the difference between a local and a global explanation?

  • A) Local runs on your machine, global runs in the cloud
  • B) Local explains one specific prediction, global explains the model's behavior across all predictions
  • C) Local uses LIME, global uses SHAP — they cannot be swapped
  • D) Local is for classification, global is for regression
Show Answer

B) Local explains one specific prediction, global explains the model's behavior across all predictions

  • Local: "Why did the model predict this specific result for this specific input?" (e.g., "This loan was rejected because of high debt ratio")
  • Global: "What features matter in general across all predictions?" (e.g., "Income and credit score are the two most important features overall")

Both LIME and SHAP can provide local explanations. SHAP also provides global explanations through summary and bar plots.


Question 18

How does LIME generate an explanation for a single prediction?

  • A) It reads the model's internal weights directly
  • B) It creates perturbations of the input, gets predictions for each, and fits a local linear model
  • C) It uses gradient descent to find the most important features
  • D) It compares the prediction against all training examples
Show Answer

B) It creates perturbations of the input, gets predictions for each, and fits a local linear model

LIME's algorithm:

  1. Take the instance you want to explain
  2. Generate many similar but slightly different inputs (perturbations)
  3. Ask the black-box model to predict each perturbation
  4. Weight the perturbations by proximity to the original instance
  5. Fit a simple linear model on the weighted perturbations
  6. Extract the linear model's coefficients as feature contributions
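The steps above can be sketched for a single feature in pure Python — a toy illustration, not the real `lime` library (the black-box model, kernel width, and sample count are all made up for the example):

```python
import math
import random

# Toy black-box model (illustrative): prediction jumps once the feature exceeds 0.5.
def black_box(x):
    return 1.0 if x > 0.5 else 0.0

def lime_1d(instance, n_samples=500, kernel_width=0.3, seed=0):
    """Minimal 1-D LIME sketch: perturb, predict, weight by proximity,
    fit a weighted linear model, return its slope (the local feature effect)."""
    rng = random.Random(seed)
    xs = [instance + rng.gauss(0, 0.3) for _ in range(n_samples)]   # 1-2. perturbations
    ys = [black_box(x) for x in xs]                                 # 3. black-box predictions
    ws = [math.exp(-((x - instance) ** 2) / kernel_width ** 2) for x in xs]  # 4. proximity weights
    # 5-6. weighted least-squares slope = the linear model's coefficient
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    return num / den

# Locally, increasing the feature increases the prediction, so the slope is positive.
assert lime_1d(0.5) > 0
```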

Question 19

What mathematical concept from game theory underlies SHAP values?

  • A) Nash equilibrium
  • B) Prisoner's dilemma
  • C) Shapley values — fair distribution of payoff among cooperating players
  • D) Minimax theorem
Show Answer

C) Shapley values — fair distribution of payoff among cooperating players

SHAP is based on Shapley values (1953, Lloyd Shapley — Nobel Prize in Economics 2012). In the context of ML: the "players" are features, and the "payoff" is the prediction. Shapley values calculate each feature's fair contribution by considering all possible orderings of features and their marginal contributions.
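The exact Shapley computation can be worked through for a toy two-feature "model" — the coalition payoffs below are invented numbers purely for illustration:

```python
from itertools import permutations

# Toy payoff: the model's output given which features are "present" (made-up values).
PAYOFF = {
    frozenset(): 0.0,
    frozenset({"income"}): 0.3,
    frozenset({"debt"}): -0.1,
    frozenset({"income", "debt"}): 0.4,
}

def shapley(players):
    """Average each player's marginal contribution over all orderings."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        seen = set()
        for p in order:
            phi[p] += PAYOFF[frozenset(seen | {p})] - PAYOFF[frozenset(seen)]
            seen.add(p)
    return {p: phi[p] / len(orders) for p in players}

phi = shapley(["income", "debt"])
assert abs(phi["income"] - 0.4) < 1e-9
assert abs(phi["debt"] - 0.0) < 1e-9
# Efficiency: the contributions sum to the full-coalition payoff.
assert abs(sum(phi.values()) - 0.4) < 1e-9
```

Real SHAP implementations approximate this — the exact computation requires all orderings, which grows factorially with the number of features.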


Question 20

What does the SHAP local accuracy property guarantee?

  • A) The model achieves at least 95% accuracy
  • B) The SHAP values for a prediction sum up to the difference between the base value and the prediction
  • C) Local explanations are more accurate than global ones
  • D) SHAP produces the same explanation as LIME
Show Answer

B) The SHAP values for a prediction sum up to the difference between the base value and the prediction

Mathematically: base_value + Σ(SHAP values) = prediction

This means the explanation perfectly accounts for the prediction. If the base value (average prediction) is 0.50 and the prediction is 0.85, then the SHAP values sum to exactly 0.35. This property is unique to SHAP and doesn't hold for LIME.


Question 21

In a SHAP force plot, what do the red and blue colors represent?

  • A) Red = correct prediction, Blue = incorrect prediction
  • B) Red = features pushing the prediction higher, Blue = features pushing it lower
  • C) Red = continuous features, Blue = categorical features
  • D) Red = training data, Blue = test data
Show Answer

B) Red = features pushing the prediction higher, Blue = features pushing it lower

In a SHAP force plot:

  • Red (left side) represents features that push the prediction up (toward a higher value / Class 1)
  • Blue (right side) represents features that push the prediction down (toward a lower value / Class 0)
  • The width of each bar indicates the magnitude of the feature's contribution
  • The prediction is where the red and blue forces balance

Question 22

What information does a SHAP summary plot provide?

  • A) A summary of all SHAP research papers
  • B) A global view showing feature importance, SHAP value distribution, and feature value correlation
  • C) A summary of model hyperparameters
  • D) A comparison of different model architectures
Show Answer

B) A global view showing feature importance, SHAP value distribution, and feature value correlation

The SHAP summary plot shows:

  • Y-axis: Features ranked by importance (most important at top)
  • X-axis: SHAP value (impact on prediction)
  • Each dot: One data point (one prediction)
  • Color: Feature value (red = high original value, blue = low original value)

This lets you see both which features matter and how their values affect predictions.


Question 23

When comparing LIME and SHAP, which statement is true?

  • A) LIME has stronger theoretical guarantees than SHAP
  • B) SHAP is always faster than LIME
  • C) SHAP provides mathematical guarantees (local accuracy, consistency, missingness) that LIME does not
  • D) LIME works only with tree-based models
Show Answer

C) SHAP provides mathematical guarantees (local accuracy, consistency, missingness) that LIME does not

SHAP has three key properties from Shapley value theory:

  1. Local accuracy: Values sum to the prediction minus the base value
  2. Consistency: If a feature contributes more in model B than A, its SHAP value is higher
  3. Missingness: Features that don't affect the output get SHAP = 0

LIME has none of these formal guarantees. However, LIME is often faster for single predictions and more intuitive to understand.


Question 24

A data scientist explains to a bank regulator: "This loan was rejected because the applicant's debt-to-income ratio of 0.85 had a SHAP value of -0.28, the credit score of 580 had a SHAP value of -0.22, and the employment duration of 3 months had a SHAP value of -0.15." This is an example of:

  • A) A global explanation
  • B) A local explanation
  • C) A model summary
  • D) A fairness analysis
Show Answer

B) A local explanation

This is a local explanation because it explains a specific prediction for a specific applicant. It details exactly which features contributed to this particular decision and by how much. A global explanation would instead describe overall feature importance across all applicants.


Question 25

You run SHAP on your model and find that the feature "gender" has a mean |SHAP| value of 0.15, making it the 3rd most important feature. What should you do?

  • A) Nothing — the model is working correctly
  • B) Remove the feature and retrain the model
  • C) Investigate for potential bias — a protected attribute having high importance is a red flag that requires fairness analysis
  • D) Increase the number of training samples
Show Answer

C) Investigate for potential bias — a protected attribute having high importance is a red flag that requires fairness analysis

When a protected attribute (gender, race, age, religion) has high feature importance, it means the model is using it significantly in its decisions. This is a bias red flag that requires:

  1. Fairness analysis: Check if the model treats different groups differently
  2. Disparate impact testing: Compare prediction distributions across groups
  3. Stakeholder discussion: Decide if the feature should be included, modified, or removed
  4. Regulatory compliance: The EU AI Act and many regulations prohibit discrimination by protected attributes

Simply removing the feature (option B) may not fix the issue — other correlated features may serve as proxies.
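A disparate impact test (step 2 above) can be sketched with the four-fifths rule — a common heuristic, shown here with invented approval data:

```python
def disparate_impact_ratio(preds_group_a, preds_group_b):
    """Ratio of positive-prediction rates between two groups (four-fifths rule).
    Below 0.8 is commonly treated as evidence of potential disparate impact."""
    rate_a = sum(preds_group_a) / len(preds_group_a)
    rate_b = sum(preds_group_b) / len(preds_group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Invented data: group A approved 60% of the time, group B only 40%.
ratio = disparate_impact_ratio([1] * 6 + [0] * 4, [1] * 4 + [0] * 6)
assert ratio < 0.8  # below the four-fifths threshold -> investigate further
```

A failing ratio is a trigger for the stakeholder and compliance steps above, not a verdict on its own.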


Score Interpretation

| Score | Level | Recommendation |
|-------|-------|----------------|
| 23-25 | 🏆 Excellent | You have a solid understanding of testing and explainability. Ready for Module 6! |
| 18-22 | 🟢 Good | Review the concepts you missed, especially the LIME/SHAP comparison. |
| 13-17 | 🟡 Acceptable | Re-read the theory sections and redo the labs before proceeding. |
| < 13 | 🔴 Needs Work | Go through Module 5 content again. Focus on pytest fundamentals and SHAP properties. |

What You Learned in Module 5

Module 5 Summary
  1. Testing AI systems requires extra attention due to non-determinism, data dependency, and silent failures
  2. pytest is the standard testing framework — master fixtures, parametrize, markers, and conftest.py
  3. TestClient enables fast API testing without starting a server
  4. Postman provides visual API testing with scripted assertions and CI/CD integration via Newman
  5. LIME explains individual predictions using local linear approximations
  6. SHAP uses game theory to provide mathematically rigorous feature attributions
  7. Explainability is not optional — it's required by regulation and essential for trust
  8. Combine testing + explainability for robust, trustworthy AI deployments