AIF-C01 Exam Study Guide

Preparing for the AIF-C01 exam means covering a wide range of topics, from basic machine learning theory and generative AI to AWS services, responsible AI, and governance frameworks.

The real challenge is not just understanding the material, but also figuring out which service to choose when several seem similar.

This guide helps by offering clear summaries, direct service comparisons, and warnings about common exam pitfalls based on real question trends.

Exam Domain Overview

Domain Approx. Weight
Fundamentals of AI and ML ~20%
Fundamentals of Generative AI ~24%
Applications of Foundation Models ~28%
Guidelines for Responsible AI ~14%
Security, Compliance, and Governance ~14%

Domain 1: Fundamentals of AI and ML

1.1 ML Paradigms

Paradigm How It Works Example Use Cases
Supervised Labeled data, learns input→output mapping Classification, regression, fraud detection
Unsupervised Unlabeled data, finds hidden patterns Clustering, customer segmentation, anomaly detection
Semi-supervised Mix of labeled + unlabeled data When labeling is expensive
Reinforcement Agent takes actions, receives rewards/penalties Robotics, game AI, AWS DeepRacer

The deciding factor is the data, not the task.

Exam scenarios describe a business use case and ask which learning paradigm applies. Candidates often default to supervised learning because the task sounds familiar, even when the scenario explicitly states there are no labels.

Always identify whether the training data is labeled or unlabeled first, then map to the paradigm. If data has no labels, unsupervised is correct regardless of the application domain.

1.2 ML Algorithms — When to Use What

Algorithm Type Use Case
Linear regression Supervised Numeric prediction (prices, demand)
Logistic regression Supervised Binary/multi-class classification; interpretable
Decision tree Supervised Classification & regression; interpretable
K-means Unsupervised Clustering customers/data into groups
K-nearest neighbors (k-NN) Supervised Classification based on proximity to labeled examples
SVM Supervised Classification
Random forest / Ensemble Supervised Combining models for higher accuracy
CNN Supervised Image classification, object detection
RNN Supervised Sequential/time-series data
GAN Generative Synthetic data generation
Autoencoder Unsupervised Anomaly detection with no labeled data
ARIMA Statistical Time-series forecasting
BERT Transformer Contextual NLP, text fill-in

K-means and K-NN share a letter but nothing else.

K-means is an unsupervised clustering algorithm — it groups data points by similarity without any labels.

K-nearest neighbors is a supervised classification algorithm — it assigns a label to a new data point by looking at the labels of its closest neighbors in the training set.

Because the names are visually similar, this is a reliable exam trap. When you see either algorithm as an option, immediately check whether the scenario involves labeled or unlabeled data to eliminate the wrong one.
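A toy sketch makes the distinction concrete (hand-made 2-D points and labels, purely illustrative): k-NN needs labels to vote, while k-means only needs distances to centroids.

```python
from collections import Counter
import math

def knn_predict(labeled_points, query, k=3):
    """k-NN (supervised): vote among the k closest *labeled* neighbors."""
    nearest = sorted(labeled_points, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def kmeans_assign(points, centroids):
    """One k-means assignment step (unsupervised): distances only, no labels."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

labeled = [((1, 1), "fraud"), ((1, 2), "fraud"), ((8, 8), "ok"), ((9, 8), "ok")]
print(knn_predict(labeled, (2, 1)))   # "fraud" -- the label comes from neighbors

print(kmeans_assign([(1, 1), (9, 9)], centroids=[(0, 0), (10, 10)]))  # [0, 1]
```

The supervised function cannot run without labels; the unsupervised one never sees them. That is exactly the distinction the exam tests.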

When interpretability is a requirement, deep learning is wrong.

Certain domains — regulated industries, financial decisions, medical risk scoring — require that a model's reasoning be traceable and explainable to non-technical stakeholders.

Neural networks and deep learning architectures are black boxes: they can produce accurate outputs but cannot clearly articulate why.

Logistic regression and decision trees make their decision boundaries explicit and auditable.

If a scenario mentions regulatory requirements, auditability, or the need to explain a decision to a human, interpretable models are always the correct choice.

1.3 The ML Lifecycle (in order)

  1. Business goal identification — define objectives; compliance requirements determined here
  2. Data collection
  3. Data preprocessing — filtering, cleaning, handling missing values (imputation), normalization
  4. Exploratory Data Analysis (EDA) — correlation matrices, statistics, visualizations, pattern discovery
  5. Feature engineering — creating/selecting input variables
  6. Model training
  7. Model evaluation — test against metrics
  8. Deployment — inference begins here
  9. Monitoring — detect drift, retrain

EDA and data preprocessing are neighboring steps with entirely different purposes.

Preprocessing is a corrective step: it resolves known data quality problems such as missing values, inconsistent formats, outliers, and unscaled features.

EDA is a discovery step: it uses statistics and visualizations to uncover patterns, distributions, and correlations that inform later decisions.

A scenario describing correlation analysis, distribution plots, or anomaly discovery is describing EDA.

A scenario describing normalization, imputation, or data cleaning is describing preprocessing.

Conflating the two leads to selecting the wrong lifecycle phase.
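A minimal sketch of two preprocessing steps, using made-up values: imputation fills the gaps, then normalization rescales the result.

```python
def impute_mean(values):
    """Imputation: replace missing entries (None) with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Normalization: rescale values into [0, 1] so features contribute equally."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40]
print(impute_mean(ages))                  # [20, 30.0, 40]
print(min_max_normalize([20, 30.0, 40]))  # [0.0, 0.5, 1.0]
```

If a scenario instead called for plotting the distribution of `ages` or checking its correlation with another column, that would be EDA, not preprocessing.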

Compliance and regulatory requirements belong in the business goal phase, not later stages.

Many candidates assume compliance surfaces during data handling or model evaluation. In the ML lifecycle, determining which legal or regulatory frameworks apply to a solution is part of scoping the problem — it is the first step.

Questions that ask when compliance considerations are identified should point to business goal identification.

1.4 Evaluation Metrics

Metric Best For Description
Accuracy Balanced classes % of total correct predictions
Precision Minimizing false positives Of all flagged positives, how many are actually positive?
Recall Minimizing false negatives Of all actual positives, how many did the model catch?
F1 Score Imbalanced classes Harmonic mean of precision and recall
ROC / AUC Binary classifiers Trade-off between sensitivity and specificity
Confusion matrix Multi-class Shows all correct + misclassification patterns
MSE / RMSE Regression Numeric prediction error
R-squared Regression Variance explained by model
BLEU Translation quality Compare machine vs. human translation
ROUGE Summarization quality Recall-based; compare generated vs. reference summaries
BERTScore Semantic text similarity For style/coherence evaluation

Precision and recall require reading the scenario carefully for which type of error matters most.

These two metrics measure opposite risks.

Precision answers: "When the model flags something as positive, how often is it actually positive?" It matters when acting on a false positive is costly — wasted resources, unnecessary interventions, eroded trust.

Recall answers: "Of all the things that are actually positive, how many did the model catch?" It matters when missing a true positive is costly — an undetected threat, a missed diagnosis.

Identifying which error is described in the scenario determines the correct metric.

Accuracy is a misleading metric on imbalanced datasets.

A model that always predicts the majority class can achieve high accuracy while failing entirely at its task. When class distribution is skewed — such as rare events or minority categories — F1 score, precision, recall, or AUC give a more honest picture.

Accuracy should only be treated as a reliable standalone metric when classes are roughly balanced.
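A quick sketch shows why, using a hypothetical dataset of 990 negatives and 10 positives where the model always predicts the majority class:

```python
def metrics(tp, fp, fn, tn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# 990 negatives, 10 positives; the model always predicts "negative"
acc, prec, rec, f1 = metrics(tp=0, fp=0, fn=10, tn=990)
print(acc)  # 0.99 -- looks great
print(rec)  # 0.0  -- catches zero positives
print(f1)   # 0.0
```

The 99% accuracy hides a model that never detects the event it was built to detect; recall and F1 expose the failure immediately.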

BLEU and ROUGE are task-specific and not interchangeable.

BLEU was designed for machine translation evaluation: it measures n-gram overlap between a machine-generated translation and a human reference translation.

ROUGE was designed for summarization evaluation: it measures recall of key content from a reference summary.

Neither is appropriate for regression or classification tasks. Applying BLEU to summarization or ROUGE to translation is a common wrong answer.
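The core idea behind BLEU can be sketched in a few lines. This is a simplified modified unigram precision; real BLEU combines several n-gram sizes plus a brevity penalty, and the example sentences are made up.

```python
from collections import Counter

def modified_unigram_precision(candidate, reference):
    """Clipped word-overlap between a candidate and a human reference:
    each candidate word counts at most as often as it appears in the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / sum(cand.values())

ref = "the cat is on the mat"
score = modified_unigram_precision("the cat sat on the mat", ref)
print(round(score, 2))  # 0.83 (5 of 6 words overlap the reference)
```

Note that the score only says how closely the candidate resembles this particular reference, which is why BLEU is a comparative metric rather than an absolute quality score.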

1.5 Bias, Variance, Overfitting, Underfitting

Problem Bias Variance Fix
Underfitting High Low More data, more features, more epochs, less regularization
Overfitting Low High More data, regularization, fewer features, less training
Ideal model Low Low

The fix for overfitting requires increasing regularization, which is counterintuitive.

Overfitting means the model has learned the training data too specifically and fails to generalize. The instinctive response is to train more or increase model complexity — both of which make the problem worse. The correct remedies are to increase the regularization parameter (adding a penalty for complexity), reduce the number of features, or supply more diverse training data.

Decreasing the regularization parameter relaxes constraints on the model and exacerbates overfitting.
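The effect of the regularization parameter can be sketched directly on the loss function (toy weights and a made-up lambda, for illustration):

```python
def mse(preds, targets):
    """Mean squared error over predictions."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def regularized_loss(preds, targets, weights, lam):
    """L2 regularization: the penalty grows with weight magnitude, so a
    larger lam pushes the optimizer toward simpler, smaller-weight models."""
    return mse(preds, targets) + lam * sum(w ** 2 for w in weights)

weights = [3.0, -4.0]                                   # sum of squares = 25
print(regularized_loss([1.0], [1.0], weights, lam=0.0))  # 0.0 -- complexity is free
print(regularized_loss([1.0], [1.0], weights, lam=0.1))  # 2.5 -- large weights cost
```

With lam at 0 the model can fit training noise with no penalty; raising lam makes large weights expensive, which is why increasing regularization combats overfitting.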

1.6 Key ML Concepts

  • Epoch: One complete pass through the entire training dataset
  • Gradient descent: Optimization algorithm to minimize the loss function
  • Backpropagation: Algorithm for updating neural network weights based on error
  • Normalization: Scaling features so they contribute equally to the model
  • Imputation: Technique for handling missing data
  • Transfer learning: Reusing a pre-trained model for a new related task
  • Ensemble learning: Combining multiple models to improve performance
  • Tokenization (NLP): Breaking text into smaller units for processing
  • Embeddings: Numerical vector representations that capture semantic meaning; enable mathematical comparison of texts
  • Context window: Maximum text an LLM can process in one operation
  • Inference: Using a trained model to make predictions on new data

1.7 Inference Types

Type Latency Use Case
Real-time Low (immediate) Patient check-ins, live predictions, immediate response required
Serverless Medium (cold starts) Intermittent workloads, no infrastructure management
Asynchronous Higher (queued) Large payloads up to 1 GB, processing up to 1 hour, near-real-time OK
Batch transform High (scheduled) Large datasets, once-per-day inference, no immediacy needed

Asynchronous and batch inference are not the same despite both being non-immediate.

Asynchronous inference is designed for large individual requests — single inputs up to 1 GB that take a long time to process — where the result is retrieved shortly after completion rather than immediately.

Batch transform is designed for bulk processing of entire datasets at a scheduled time, with no real-time component at all.

Serverless inference is optimized for sporadic, low-frequency workloads where cold starts are acceptable; it is not the same as real-time, which requires consistently low latency.

Reading the scenario for payload size, frequency, and urgency is the key to differentiating these.

Domain 2: Fundamentals of Generative AI

2.1 Key Concepts

  • Foundation Model (FM): Large model pre-trained on massive data; broad generalized capabilities; base for many AI applications
  • Large Language Model (LLM): A type of FM specialized for language understanding and generation
  • LLMs are non-deterministic: Same input can produce different outputs; this is expected behavior
  • Tokens: Basic units (words/subwords) that LLMs process; inference cost is driven by token count
  • Embeddings: High-dimensional vectors that capture semantic relationships; enable similarity searches
  • Context window: Max tokens in a single prompt+response; exceeding it causes input truncation or a failed request
  • Hallucinations: Model generates plausible but false information; reduce by lowering temperature or using RAG/guardrails

2.2 Inference Parameters

Parameter Controls Effect
Temperature Randomness/creativity Higher = more creative/diverse; Lower = more deterministic/consistent
Top K Number of candidate tokens considered Controls vocabulary breadth
Top P Cumulative probability of candidates Only tokens within the cumulative probability mass P are considered
Max tokens Output length Hard cap on response length
Stop sequences Where generation stops Specific strings that terminate output

Temperature is about output consistency.

Temperature controls the probability distribution over possible next tokens at each generation step. Setting it to 0 makes the model deterministically select the most probable token every time, producing stable and repeatable output.

Setting it high allows lower-probability tokens to be selected, increasing variety and creativity at the cost of reliability.

For any scenario that requires consistent, repeatable, or predictable outputs, temperature should be set as close to 0 as possible.

Top K and Top P both narrow token selection but operate on different principles.

Top K sets an absolute limit: at each step, only the K most probable tokens are eligible, regardless of how their probabilities are distributed.

Top P sets a probabilistic threshold: tokens are added to the candidate pool in descending probability order until their cumulative probability reaches P, meaning the actual number of candidates varies by context.

Both reduce randomness, but Top P is more adaptive to the shape of the probability distribution.

The exam may present both as options — Top K controls a count, Top P controls a cumulative percentage.
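A simplified sampling sketch shows how the three parameters interact. The logits are toy values and the implementation is illustrative; real LLM samplers differ in detail.

```python
import math, random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Token sampling sketch: temperature rescales logits, Top K keeps a fixed
    candidate count, Top P keeps a cumulative-probability mass of candidates."""
    if temperature == 0:                       # greedy: always the argmax token
        return max(logits, key=logits.get)
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(l) for l in scaled.values())
    probs = sorted(((t, math.exp(l) / z) for t, l in scaled.items()),
                   key=lambda tp: -tp[1])
    if top_k is not None:                      # absolute candidate count
        probs = probs[:top_k]
    if top_p is not None:                      # cumulative probability cutoff
        kept, cum = [], 0.0
        for t, p in probs:
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    total = sum(p for _, p in probs)           # renormalize the survivors
    return rng.choices([t for t, _ in probs], [p / total for _, p in probs])[0]

logits = {"the": 3.0, "a": 2.0, "zebra": -1.0}
print(sample_next(logits, temperature=0))   # "the" -- deterministic, repeatable
```

With temperature 0 or top_k=1 the output is always the most probable token; raising temperature or widening Top K / Top P lets unlikely tokens like "zebra" occasionally appear.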

LLM inference cost is determined by token volume.

What drives cost is the total number of tokens processed: input tokens (the prompt, any examples, injected context) plus output tokens (the generated response). To reduce inference cost, reduce prompt length, eliminate unnecessary examples, or constrain maximum output tokens.
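A back-of-the-envelope sketch of that arithmetic (the per-1K-token prices here are hypothetical; actual Bedrock pricing varies by model):

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Total cost = input tokens + output tokens, each at its own rate.
    Prices are illustrative placeholders, not real model pricing."""
    return (input_tokens / 1000) * price_in_per_1k + \
           (output_tokens / 1000) * price_out_per_1k

# Trimming a 2,000-token prompt to 500 tokens cuts the input-side cost by 75%.
print(round(estimate_cost(2000, 400, 0.003, 0.015), 4))  # 0.012
print(round(estimate_cost(500, 400, 0.003, 0.015), 4))   # 0.0075
```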

2.3 Prompt Engineering Techniques

Technique Description When to Use
Zero-shot No examples provided Simple tasks, general instructions
One-shot One example provided When a single example clarifies format
Few-shot Multiple examples (2–10+) Format matching, style alignment, classification
Chain-of-thought (CoT) Ask model to reason step-by-step Complex reasoning, math, multi-step problems
Negative prompting Specify what to exclude Image generation, content control
Prompt chaining Break complex task into sequential subtasks Multi-step workflows
ReAct prompting Reasoning + acting with real-time tool calls Chatbots that query live data (e.g., inventory)
Least-to-most Decompose problem from simple to complex Complex problems built from simpler subproblems
Directional stimulus Guide model with hints about desired output Steering output toward specific keywords or content

Chain-of-thought and few-shot are frequently confused because both involve additional content in the prompt.

Few-shot prompting provides labeled input-output examples that demonstrate a desired format, style, or classification pattern — the model learns by imitation.

Chain-of-thought prompting asks the model to work through intermediate reasoning steps before arriving at a final answer — the model learns to reason, not just imitate.

If the scenario involves showing the model examples of what good output looks like, that is few-shot. If the scenario involves asking the model to explain its reasoning or solve a problem step by step, that is chain-of-thought.
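The difference is visible in the prompts themselves. A sketch with made-up review examples and hypothetical helper names:

```python
def few_shot_prompt(instruction, examples, query):
    """Few-shot: labeled input->output pairs demonstrate the desired format;
    the model imitates the mapping shown in the examples."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f"Review: {text!r} -> {label}")
    lines.append(f"Review: {query!r} ->")
    return "\n".join(lines)

def cot_prompt(problem):
    """Chain-of-thought: an instruction to reason through intermediate steps."""
    return f"{problem}\nThink step by step, then give the final answer."

print(few_shot_prompt("Classify the sentiment.",
                      [("Loved it", "positive"), ("Terrible service", "negative")],
                      "Works as expected"))
print(cot_prompt("A train travels 120 km in 2 h, then 60 km in 1 h. "
                 "What is its average speed?"))
```

The first prompt shows the model what good output looks like; the second asks it to reason. That mapping is exactly how the exam distinguishes the two techniques.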

Prompt engineering is always the first and lowest-cost intervention before any model customization.

When a model produces output in the wrong format, language, or structure, the correct first response is to refine the prompt — not to resize the model, adjust the architecture, or initiate fine-tuning.

Prompt changes require no training, no infrastructure, and no cost beyond the token count.

Fine-tuning and retraining are appropriate only after prompt engineering has been genuinely exhausted.

2.4 Model Customization Options (Least → Most Effort)

  1. Prompt engineering — no training, cheapest, try first
  2. RAG (Retrieval Augmented Generation) — inject external knowledge at query time; best for frequently changing data
  3. Fine-tuning — supervised training on labeled prompt/completion pairs; adapts style/domain
  4. Continued pre-training — training on unlabeled domain text; for domain vocabulary adaptation
  5. Training from scratch — maximum control, maximum cost

RAG and fine-tuning address fundamentally different problems and cannot substitute for each other.

Fine-tuning reshapes how the model behaves — its tone, vocabulary, output structure, and domain-specific response patterns — but it does not efficiently update the model's factual knowledge over time.

RAG does not change the model at all; instead it retrieves current, relevant information at query time and injects it into the prompt, keeping responses grounded in up-to-date sources.

When knowledge changes frequently, fine-tuning is the wrong tool: you would need to retrain the model every time facts change. RAG handles dynamic knowledge without any retraining.

Fine-tuning and continued pre-training require different data formats and serve different goals.

Fine-tuning requires labeled pairs of prompts and expected completions — structured input-output examples that teach the model to behave in a specific way for a specific task.

Continued pre-training uses large volumes of raw, unlabeled domain text to expose the model to domain-specific terminology and writing patterns, without teaching it to produce any particular output format.

Providing unlabeled text when fine-tuning is expected, or providing labeled pairs when continued pre-training is described, will not produce the intended result.

2.5 Model Types

Model Type Purpose
Text generation / GPT Generate text, code, SQL from natural language
Text embedding Convert text to vectors for similarity search/RAG
Multi-modal embedding Handle text AND images in queries
Multi-modal generation (large multi-modal LLM) Accept text/image input; produce text/image output
Diffusion model Image generation via iterative denoising
GAN Two competing networks; synthetic data
Transformer Self-attention mechanism; basis for most modern LLMs
WaveNet Audio/speech synthesis
VAE Generative model using latent space compression

Embedding models and generation models serve opposite purposes and cannot be swapped.

An embedding model converts text or images into dense numerical vectors for use in similarity comparisons, semantic search, and RAG retrieval pipelines — it does not generate readable output.

A generation model produces new content: text, images, or audio.

When a use case involves searching, matching, or ranking content by semantic similarity, an embedding model is needed. When it involves producing new content from a prompt, a generation model is needed.

Multi-modal variants exist for both — they accept or produce combinations of text and images, but the embedding vs. generation distinction still applies.

2.6 Evaluation Metrics for Generative AI

  • BLEU — translation quality (relative comparison to human reference)
  • ROUGE — summarization quality (recall-oriented)
  • BERTScore — semantic similarity; for style/coherence tasks
  • F1 score — precision+recall balance; classification accuracy after fine-tuning

BLEU is a comparative metric, not an absolute quality score.

A BLEU score only has meaning when comparing two systems translating the same source content against the same human reference. It cannot tell you whether a translation is "good" in isolation — only whether it resembles the human reference more or less than another system does.

BLEU also does not measure fluency, meaning, or style directly. It is specifically a translation evaluation tool and should not be applied to summarization, classification, or other generative tasks.

Domain 3: Applications of Foundation Models

3.1 Amazon Bedrock

Core platform for accessing and customizing foundation models without managing infrastructure.

Feature Purpose
Foundation Models Access models from Anthropic, Meta, Amazon, etc.
Knowledge Bases Fully managed RAG; default vector store = OpenSearch Serverless
Agents Orchestrate multi-step tasks; call APIs, query databases, take actions
Guardrails Filter harmful content, block topics, protect PII
Fine-tuning Customize models with your labeled data (prompt/completion pairs)
Provisioned Throughput For steady, predictable workloads (cost-effective at scale)
On-Demand Throughput Pay-per-use; best for experimentation or unpredictable usage
Invocation Logging Log model inputs/outputs for monitoring
Watermark detection Identifies images created by Amazon Titan Image Generator
PartyRock Free playground for experimenting with generative AI
bedrock-runtime API Makes inference requests
bedrock-agent-runtime API Invokes agents and queries knowledge bases

Guardrails and watermark detection are completely separate features that do not overlap in purpose.

Guardrails are a runtime content control mechanism: they inspect model inputs and outputs and enforce rules around harmful content categories, blocked topics, sensitive information, and specific words.

Watermark detection is a provenance tool: it determines whether a given image was generated by the Amazon Titan Image Generator, helping identify AI-created images after the fact.

One filters what the model says; the other identifies the origin of an image. They are not interchangeable and serve unrelated use cases.

Amazon Bedrock does not expose user data to the underlying model providers.

A common assumption is that using a third-party model through Bedrock means the model vendor can see your prompts and responses. This is not the case: Amazon Bedrock does not share user inputs or model outputs with any third-party model provider. This data privacy assurance is frequently tested and is a key differentiator from direct API access to those same model vendors.

A fine-tuned model in Bedrock cannot serve traffic until Provisioned Throughput is purchased.

Fine-tuning in Bedrock creates a custom model artifact, but that artifact is not automatically deployable. Unlike base models, which can be invoked on demand, custom fine-tuned models require a Provisioned Throughput commitment before they can receive inference requests. Skipping this step means the custom model has no serving capacity and cannot be used in production.

3.2 Amazon Bedrock Guardrails — Content Filter Categories

Filters for: Violence, Hate speech, Sexual content, Insults, Misconduct

Built-in content filters and configured denied topics are not the same mechanism, and the default content filter categories are narrower than most candidates expect.

Content filters target universally harmful content: violence, hate, sexual content, insults, and misconduct. They do not, by default, block topics that are sensitive but not inherently harmful — such as politics, religion, competitor products, or gambling. Those types of restrictions require explicitly configuring a denied topics list.

Assuming that content filters cover all unwanted content is a common error; topic-based restrictions require a separate and deliberate configuration step.

3.3 SageMaker Services Reference

Service Purpose
SageMaker Canvas No-code ML model building
SageMaker Ground Truth Data labeling with human annotators
SageMaker Ground Truth Plus Fully managed labeling (no app development needed)
SageMaker Data Wrangler Data preparation, transformation, feature engineering
SageMaker Feature Store Centralized feature repository; share features across teams/models
SageMaker Experiments Track and compare ML experiments
SageMaker Clarify Bias detection and model explainability
SageMaker Model Monitor Detect data/model drift in production
SageMaker Model Registry Store, version, and manage ML models
SageMaker Model Cards Document model purpose, metrics, limitations
SageMaker Model Dashboard Monitor and manage multiple deployed models
SageMaker JumpStart Pre-built models and solutions; accelerate development
SageMaker Autopilot Automated model building, training, and tuning (AutoML)
SageMaker Debugger Real-time training metrics
SageMaker HyperPod Distributed training; reduces training time up to 40%
SageMaker Studio Lab Free environment for ML experimentation
Amazon A2I (Augmented AI) Human review workflows for ML predictions

SageMaker Clarify and SageMaker Model Monitor are both quality tools but address different problems at different lifecycle stages.

Clarify is used during development: it statistically evaluates training data and model outputs for bias across demographic groups, and it generates feature-level explanations (using Shapley values) for why a model made specific predictions.

Model Monitor is used in production: it continuously compares live inference data against a baseline to detect when the model's input distribution or output behavior has drifted from its original state.

Clarify does not detect drift; Model Monitor does not explain predictions or detect bias in training data.

SageMaker Model Cards and SageMaker Model Registry are commonly confused because both involve recording information about models.

Model Cards are documentation artifacts for transparency and compliance: they describe the model's intended use, training methodology, performance characteristics, limitations, and ethical considerations.

Model Registry is a versioning and lifecycle management system: it stores the trained model artifacts, tracks versions, and controls the promotion workflow from development to production.

One is a human-readable document; the other is a software catalog and deployment pipeline.

3.4 AWS AI Services (Managed, No ML Expertise Needed)

Service Function
Amazon Textract Extract text and data from documents/PDFs/scanned images
Amazon Transcribe Speech-to-text; subtitles
Amazon Transcribe Medical Speech-to-text with healthcare compliance
Amazon Comprehend NLP: sentiment analysis, entity recognition, PII detection, toxicity detection
Amazon Comprehend Medical Extract medical info from clinical notes
Amazon Rekognition Computer vision: object detection, face recognition, image/video analysis
Amazon Translate Language translation
Amazon Polly Text-to-speech
Amazon Lex Build conversational chatbots
Amazon Kendra ML-powered enterprise search
Amazon Personalize Real-time personalization and recommendations
Amazon Forecast Time-series demand/traffic forecasting
Amazon Macie Detect and protect sensitive/PII data in S3
AWS HealthScribe Generative AI that drafts clinical notes from patient-clinician conversations

Amazon Rekognition is a computer vision service with no language or translation capabilities.

Rekognition analyzes images and video frames to detect objects, faces, scenes, and text embedded within visual content. It does not translate, understand natural language, or handle multilingual content.

Any scenario involving multiple spoken or written languages should point to Amazon Translate for text conversion between languages, or Amazon Polly for synthesizing speech in a target language — not Rekognition.

Performing sentiment analysis on audio is a two-service pipeline, not a one-service task.

Amazon Transcribe converts audio to text; it does not analyze sentiment.

Amazon Comprehend analyzes text for sentiment, entities, and key phrases; it cannot process audio.

Neither service covers the full workflow alone. Selecting only Transcribe leaves the analysis undone; selecting only Comprehend ignores the fact that the input is audio. The correct answer always requires both services in sequence.

Amazon Textract and Amazon Comprehend both deal with text but operate at completely different layers.

Textract is an extraction tool: it reads scanned documents, PDFs, and images and pulls out the raw text and structured data embedded within them, such as tables and form fields.

Comprehend is an analysis tool: it processes text that has already been extracted and derives meaning from it — sentiment, named entities, key phrases, PII, language.

Textract sees pixels and produces text; Comprehend sees text and produces insight. They are complementary services, not alternatives.

3.5 Amazon Q Services

Service Use Case
Amazon Q Business Enterprise AI assistant; answers questions from internal data
Amazon Q Developer AI coding assistant in IDE (code, test, document)
Amazon Q in QuickSight BI dashboards via natural language
Amazon Q in Connect Customer service agent assistance
Amazon Q Apps Create and share generative AI-powered apps within Q Business

The Amazon Q variants target completely different user personas and are not interchangeable.

Q Business serves general enterprise employees querying internal company knowledge bases.

Q Developer serves software engineers within an IDE for coding, testing, and documentation tasks.

Q in QuickSight serves business analysts creating data visualizations through natural language.

Q in Connect supports live customer service agents during active customer interactions.

Selecting Q Business for a developer productivity scenario, or Q Developer for an enterprise knowledge question, is a common mistake caused by treating "Amazon Q" as a single product rather than a family of purpose-specific tools.

3.6 RAG (Retrieval Augmented Generation)

RAG retrieves relevant content from a knowledge base at query time and injects it as context into the prompt.

When to use RAG:

  • Knowledge base changes frequently
  • Large documentation base
  • Cost-effective alternative to fine-tuning
  • Need factual, grounded responses

RAG Pipeline — offline batch processing steps (done ahead of query time):

  • Generation of content embeddings
  • Creation of the search index

RAG Pipeline — online (done at query time):

  • Generation of embeddings for user queries
  • Retrieval of relevant content
  • Response generation

Vector databases for RAG: Amazon OpenSearch Service, Amazon Aurora PostgreSQL (with pgvector), Amazon Redshift
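The offline and online steps above can be sketched end to end. The 3-dimensional embeddings below are hand-made stand-ins; a real pipeline would call an embedding model (such as Titan Embeddings) and a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Offline batch step: embed documents and build the search index.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.2],
}

def retrieve(query_embedding, index, top_n=1):
    """Online step: rank indexed documents by similarity to the query embedding."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

query_embedding = [0.85, 0.15, 0.05]   # pretend-embedding of the user's question
context = retrieve(query_embedding, index)
print(context)                          # ['refund policy']

# Online step: inject the retrieved content into the prompt for generation.
prompt = f"Answer using this context: {context}\nQuestion: how do I get my money back?"
```

The model itself is never modified; updating the knowledge base simply means re-embedding the changed documents, which is why RAG suits frequently changing data.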

RAG addresses knowledge gaps; fine-tuning addresses behavioral gaps. Applying the wrong one wastes time and money.

If the issue is that the model does not know certain facts, lacks access to specific documents, or produces outdated information, RAG is the appropriate solution — it retrieves the relevant knowledge at query time without modifying the model.

If the issue is that the model produces output in the wrong format, tone, structure, or domain-specific style despite having the necessary knowledge, fine-tuning is the appropriate solution — it shapes behavior through additional training.

Attempting to fine-tune a model on factual documents to keep it current requires retraining every time facts change, which is expensive and operationally impractical.

Domain 4: Guidelines for Responsible AI

4.1 Core Responsible AI Principles

Principle Description
Fairness Model treats all groups equitably; diverse, balanced training data
Transparency Stakeholders can understand how the system works
Explainability Model can provide rationale for individual decisions (e.g., Shapley values, PDPs)
Privacy & Security Protect personal data; prevent exposure
Governance Policies, guidelines, auditing, compliance frameworks
Safety Prevent harm; human oversight

Fairness and explainability are related concepts that test different things.

Fairness is a population-level property: it asks whether the model produces equitable outcomes across different demographic or social groups, and whether certain groups are systematically advantaged or disadvantaged by the model's predictions.

Explainability is a decision-level property: it asks whether the reasoning behind a specific prediction can be articulated in a way that a human can understand, verify, and act on. A model can be explainable but unfair if its transparent reasoning is based on biased patterns.

The exam differentiates them by whether the scenario describes a group-level outcome disparity (fairness) or the need to justify a specific individual decision (explainability).

4.2 Types of AI Bias

Bias Type Description Example
Sampling bias Training data not representative of population Security camera model trained mainly on one demographic
Measurement bias Inconsistent data collection Systematic errors in how data is recorded
Confirmation bias Model reinforces existing assumptions Recommender that only surfaces content matching past behavior
Observer bias Human annotator introduces personal bias Labelers scoring ambiguous text according to their own views

How to address bias:

  • Use diverse, balanced training datasets
  • Data augmentation for underrepresented groups
  • Apply fairness metrics during evaluation
  • Use SageMaker Clarify for bias detection

Sampling bias and measurement bias are both data problems but have different root causes and different solutions.

Sampling bias is a coverage problem: the data collection process failed to represent certain groups, conditions, or scenarios in proportion to how they appear in the real world. The model then learns a skewed view of reality.

Measurement bias is an accuracy problem: data was collected from the right population but recorded, labeled, or quantified inconsistently — the same real-world condition is captured differently across subgroups or time periods.

Fixing sampling bias requires collecting more representative data; fixing measurement bias requires standardizing collection and labeling procedures.

4.3 AWS Tools for Responsible AI

Tool Purpose
SageMaker Clarify Bias detection + explainability (Shapley values)
SageMaker Model Cards Document model purpose, risks, performance for transparency
AWS AI Service Cards Transparency documentation for AWS-managed AI services
Amazon Bedrock Guardrails Filter harmful content, block topics, protect PII in generative AI
Amazon A2I Human review of ML predictions at defined confidence thresholds
RLHF Incorporate human preferences during model training
Amazon Comprehend PII detection and redaction; toxicity detection

Pitfall — RLHF and Amazon A2I both involve human feedback but operate at completely different points in the AI lifecycle and should never be confused.

RLHF (Reinforcement Learning from Human Feedback) is a training-time technique: human raters evaluate model-generated outputs during the training process, and those preferences are used as reward signals to steer the model toward better behavior before deployment.

Amazon A2I is an inference-time mechanism: it routes individual production predictions to human reviewers when the model's confidence falls below a defined threshold, providing a safety net around live outputs without modifying the model.

RLHF improves the model itself during training; A2I supplements the deployed model with human oversight in production.
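The A2I pattern can be illustrated with a short sketch: predictions below a confidence threshold are diverted to a human review queue instead of being returned directly. This is not the A2I API, just the inference-time routing idea it implements; the threshold and field names are invented.

```python
CONFIDENCE_THRESHOLD = 0.80  # assumed threshold for this example

def route_prediction(label, confidence, human_queue):
    """Auto-accept confident predictions; queue the rest for human review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"label": label, "source": "model"}
    human_queue.append({"label": label, "confidence": confidence})
    return {"label": None, "source": "pending-human-review"}

queue = []
print(route_prediction("approved", 0.95, queue))  # handled by the model
print(route_prediction("approved", 0.55, queue))  # sent to human reviewers
print(len(queue))  # 1
```

Note that the model itself is untouched: the human acts as a safety net around live outputs, which is exactly what distinguishes A2I from RLHF.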

4.4 Guardrails for Amazon Bedrock — Filter Types

Filter Blocks
Content filters Violence, hate, sexual content (harmful categories)
Denied topics Specific topics you define (e.g., politics, competitor products)
Sensitive information filters PII and other sensitive data
Word filters Specific words or phrases
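The difference between denied topics and word filters can be seen in a toy input screen. This is purely illustrative pseudologic, not the Bedrock Guardrails API; the topic and word lists are invented.

```python
DENIED_TOPICS = {"politics", "competitor pricing"}  # topics you define
BLOCKED_WORDS = {"foo-secret"}                      # exact words/phrases

def check_prompt(prompt):
    lowered = prompt.lower()
    if any(topic in lowered for topic in DENIED_TOPICS):
        return "BLOCKED: denied topic"
    if any(word in lowered for word in BLOCKED_WORDS):
        return "BLOCKED: word filter"
    return "ALLOWED"

print(check_prompt("Summarize our politics coverage"))  # BLOCKED: denied topic
print(check_prompt("What is RAG?"))                     # ALLOWED
```

Real guardrails classify topics semantically rather than by substring match, but the layering is the same: category filters, topic filters, sensitive-information filters, and word filters are evaluated against every input and output.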

4.5 Hallucinations

  • LLMs generate confident but false information
  • Reduce by: lowering temperature, using RAG to ground responses in facts, using Bedrock Guardrails
  • Amazon Q Business reduces hallucinations by confining responses to existing enterprise data
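The "ground responses in facts" idea behind RAG can be sketched in a few lines: retrieve a relevant passage, then force the model to answer from that passage rather than from parametric memory. Retrieval here is naive word overlap; a real system such as Bedrock Knowledge Bases uses vector embeddings. The documents are invented.

```python
import re

DOCS = [
    "Provisioned Throughput offers a fixed rate for steady Bedrock workloads.",
    "On-Demand pricing bills per token and suits unpredictable usage.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(question, docs):
    """Return the document sharing the most words with the question."""
    return max(docs, key=lambda d: len(tokens(question) & tokens(d)))

def build_grounded_prompt(question):
    context = retrieve(question, DOCS)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

print(build_grounded_prompt("When should I use On-Demand pricing?"))
```

The instruction "using ONLY this context" is what grounds the generation: the model is steered toward the retrieved facts instead of free recall, which is why RAG reduces hallucinations without retraining.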

Domain 5: Security, Compliance, and Governance

5.1 Shared Responsibility Model for AI

  • AWS secures: Infrastructure, physical hardware, underlying cloud services
  • Customer secures: Data, access controls (IAM), model inputs/outputs, application logic

Security responsibility increases as you build more yourself:

  • Using third-party SaaS with embedded AI → lowest customer responsibility
  • Building application on existing FM → moderate
  • Fine-tuning an existing FM → more responsibility
  • Building and training FM from scratch → maximum customer responsibility

Customer security responsibility scales with how much of the stack the customer owns, and the exam tests this spectrum directly.

When a company consumes AI through a fully managed third-party application, the vendor and AWS handle nearly all infrastructure and model security; the customer is responsible only for access management and appropriate usage.

As customers move toward building, fine-tuning, or training their own models, they absorb increasing responsibility for training data security, model artifact protection, output validation, and infrastructure configuration.

The exam will describe an implementation approach and ask where responsibility lies — always map the approach to its position on the spectrum from fully managed to fully custom.

5.2 IAM and Access Control

  • Use IAM policies to restrict which foundation models employees can access
  • Use custom service roles per team to restrict data access in Bedrock (e.g., each team only sees their S3 data)
  • Use AWS IAM Identity Center to securely integrate Bedrock into enterprise systems

Data access isolation across multiple teams in Bedrock requires separate, scoped service roles — not a single shared role filtered at the application layer.

A single Bedrock service role with broad S3 permissions satisfies basic functionality but violates least privilege and creates no enforcement boundary between teams. Each team should be assigned a role scoped exclusively to their own data resources.

Relying on application logic or user self-reporting to limit data access is not a security control — it is an assumption, and it fails the moment the application has a bug or a user makes a mistake.
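The scoped-role pattern amounts to generating one IAM policy per team, each restricted to that team's S3 prefix. The sketch below builds such a policy document; the bucket and team names are invented for illustration.

```python
import json

def team_s3_policy(bucket, team):
    """IAM policy allowing read access to one team's S3 prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            # Enforcement boundary: only this team's prefix is reachable
            "Resource": f"arn:aws:s3:::{bucket}/{team}/*",
        }],
    }

policy = team_s3_policy("example-training-data", "team-alpha")
print(json.dumps(policy, indent=2))
```

Because the restriction lives in the role's policy, it is enforced by AWS regardless of what the application does, which is the difference between a security control and an application-layer assumption.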

5.3 Network Security for AI

Service/Feature Purpose
AWS PrivateLink Private connectivity between VPC and Bedrock/SageMaker — no public internet
VPC with S3 endpoint Manage secure data flow from S3 to SageMaker
SageMaker network isolation Run training/inference jobs without internet access
AWS Shield DDoS protection

AWS PrivateLink is the correct and only answer when traffic between a VPC and an AWS service must not traverse the public internet.

CloudFront is a content delivery network that accelerates public-facing traffic — it does not create private network paths. Internet gateways route traffic through the public internet by definition, which is the opposite of what private connectivity requires.

A VPN connects on-premises networks to AWS but does not create a private path between AWS services within the same cloud environment.

PrivateLink creates an internal network endpoint within the AWS backbone, ensuring that data between the VPC and the service never leaves the AWS network and never touches the public internet.

5.4 Monitoring and Audit

Service Use
AWS CloudTrail Log API calls; identify unauthorized access attempts; audit trail
Amazon CloudWatch Operational monitoring, metrics, alarms for model performance
AWS Config Track configuration compliance against rules
AWS Audit Manager Assess compliance with frameworks; generate audit reports
AWS Artifact Access AWS compliance reports and certifications
Amazon Macie Detect sensitive/PII data in S3; automated alerts
Amazon Inspector Security vulnerability assessment

CloudTrail, CloudWatch, Config, and Audit Manager all relate to monitoring but serve non-overlapping purposes, and selecting the wrong one is a common exam error.

CloudTrail records who made API calls, from where, and when — it is the authoritative source for access auditing and investigating unauthorized activity.

CloudWatch monitors what the system is doing right now — collecting operational metrics, logs, and events to trigger alarms and dashboards.

Config evaluates whether AWS resource configurations comply with defined rules over time, answering whether infrastructure is set up according to policy.

Audit Manager aggregates evidence across services to support formal compliance assessments against frameworks like ISO or SOC, producing structured audit reports.

Each serves a distinct governance layer; the scenario's question — who accessed what, what is the system doing, is the configuration compliant, or what does our compliance evidence show — determines the correct service.

Amazon Macie is the purpose-built service for automated sensitive data detection across S3.

Comprehend can detect and redact PII within text that is already extracted and passed to it programmatically, but it requires integration work and does not natively scan S3 buckets.

Macie continuously monitors S3 objects for sensitive data patterns — including PII, credentials, and financial information — and generates automated findings and alerts without requiring application code to orchestrate it.

When a scenario describes automated, ongoing discovery of sensitive content in S3 with minimal development effort, Macie is the answer.

5.5 Encryption and Data Protection

  • AWS KMS — Customer-managed encryption keys for model artifacts and data
  • SSE-S3 — S3-managed encryption; decryption is transparent to any principal with object read access (with KMS-encrypted objects, the Bedrock service role also needs kms:Decrypt on the key)
  • Federated learning — Train on distributed data without centralizing it; preserves privacy/compliance

Encrypting training data protects it at rest and in transit, but does not prevent a trained model from learning and reproducing that information.

A common misconception is that if sensitive data is encrypted before being used for training, the model cannot expose it. Encryption governs access to the raw data — it does not affect what patterns the model learns. If personally identifiable, confidential, or regulated information is present in training data, the model may reproduce or infer that information in its outputs regardless of how the source data was stored. The correct approach to preventing a model from generating responses based on sensitive training content is to remove or de-identify that data before training ever begins.
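A minimal sketch of that de-identification step: strip obvious PII patterns from text before it enters a training set. A real pipeline would use a managed capability such as Amazon Comprehend's PII detection; the regexes and placeholders below are illustrative only.

```python
import re

# Illustrative patterns: email addresses and US-style SSNs
PII_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def deidentify(text):
    """Replace recognizable PII with placeholders before training."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(deidentify("Contact jane@example.com, SSN 123-45-6789."))
# Contact [EMAIL], SSN [SSN].
```

The key point survives the simplification: the sensitive values are gone before the model ever sees them, so no amount of prompting can make the model reproduce them, which encryption alone cannot guarantee.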

5.6 Data Governance Concepts

Concept Description
Data residency Where data is physically stored geographically
Data retention Policies for how long data is kept before deletion
Data lineage Tracking data flow for compliance and audit
Data de-identification Removing PII from datasets
Data quality Standards ensuring accuracy and reliability

Data residency and data lineage are distinct governance concerns that are frequently conflated.

Data residency is a geographic constraint: regulations in many jurisdictions prohibit certain types of data — patient records, financial data, citizen data — from being stored or processed outside a defined geographic boundary. It is a where question.

Data lineage is an audit and traceability concern: it tracks how data has moved, been transformed, and been used throughout its lifecycle. It is a history question.

A scenario about regulatory requirements preventing data from crossing a national border is a data residency concern; a scenario about tracing the origin of a dataset used to train a model is a data lineage concern.

5.7 Model Documentation and Governance

  • SageMaker Model Cards — Standardize documentation: intended use, training details, performance, limitations, risks
  • Infrastructure as Code (IaC) — Enables consistent, scalable, repeatable ML deployments
  • MLflow with SageMaker — Manage and track ML experiments collaboratively

5.8 Prompt Injection and Security

  • Prompt injection — Attacker crafts inputs to override a model's instructions or extract system prompt contents
  • Extracting the prompt template — Specific attack that exposes the configured system behavior of an LLM
  • Defense: Use Bedrock Guardrails, dynamic context-aware prompt templates, denied topics

Prompt injection is an input-layer attack and cannot be mitigated by changing the underlying model.

The vulnerability exists because LLMs process user-supplied input and system instructions in the same text stream, making it possible for a crafted user message to override, neutralize, or extract the system's configured behavior.

Switching to a different base model or fine-tuning does not close this vulnerability because the architecture of how instructions and inputs are combined remains unchanged.

Defenses must be applied at the input handling and prompt construction layer: using Guardrails to inspect inputs, structuring prompts to separate system instructions from user content, and using denied topics to block extraction attempts.
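Two of those defenses can be sketched together: screen inputs for known injection phrases, and keep system instructions structurally separate from user content instead of concatenating them into one text stream. The marker list is illustrative and deliberately incomplete; real guardrails classify intent rather than match substrings.

```python
INJECTION_MARKERS = [
    "ignore previous instructions",
    "reveal your system prompt",
]

def build_messages(system_prompt, user_input):
    """Construct role-separated messages, rejecting obvious injection attempts."""
    lowered = user_input.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        raise ValueError("possible prompt injection detected")
    # Separate roles instead of pasting user text into the system prompt
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

msgs = build_messages("You are a support bot.", "How do I reset my password?")
print(len(msgs))  # 2
```

Both defenses live entirely in the input-handling layer, which is the point of the pitfall above: swapping the base model would change neither of them.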

High-Frequency Tricky Scenarios

"Most cost-effective" questions

Scenario Answer
Improve FM accuracy Prompt engineering first (cheapest)
Frequently changing knowledge base RAG (not fine-tuning)
Steady request rate for custom model Provisioned Throughput
Unpredictable/experimental usage On-Demand Throughput
Reduce token costs Decrease tokens in prompt
Monthly reports, not immediate Batch transform

"Least effort / least operational overhead" questions

Scenario Answer
Apply safeguards to LLM Amazon Bedrock Guardrails
Add subtitles/voice-overs to film Transcribe + Translate + Polly
Detect sensitive data in S3 Amazon Macie
Fine-tune open-source LLM SageMaker JumpStart
Build ML model without code SageMaker Canvas
Human data labeling without managing workforce SageMaker Ground Truth Plus
Experiment with generative AI for free PartyRock

Model performance problems

Symptom Diagnosis Fix
Good on training, bad on new data Overfitting Increase regularization, more training data
Bad on training and new data Underfitting More epochs, features, reduce regularization
Model performance degrades over time Data/concept drift Retrain with fresh data; SageMaker Model Monitor
Disproportionate outcomes for groups Bias SageMaker Clarify; diverse training data

Quick Service Disambiguation

Textract vs. Transcribe vs. Translate vs. Comprehend

  • Textract = extract text FROM documents/images (OCR)
  • Transcribe = convert speech/audio TO text
  • Translate = convert text from one language TO another language
  • Comprehend = understand/analyze text content (NLP)

Clarify vs. Model Monitor vs. Model Cards

  • Clarify = detect bias + explain predictions (Shapley values)
  • Model Monitor = detect drift in production
  • Model Cards = document model for transparency/compliance

RAG vs. Agents vs. Knowledge Bases

  • RAG = the technique (retrieve + generate)
  • Knowledge Bases = AWS managed RAG implementation in Bedrock
  • Agents = orchestrate multi-step tasks (retrieve + act + loop)

CloudTrail vs. CloudWatch vs. Config vs. Audit Manager

  • CloudTrail = API call logging (who did what)
  • CloudWatch = metrics, alarms, operational monitoring
  • Config = resource configuration compliance rules
  • Audit Manager = compliance framework reporting and evidence collection

Ready to put this knowledge to the test? CertVista AIF-C01 offers a realistic test environment that mirrors the real exam experience — along with domain breakdowns and the latest updates to the question format.

If you're still weighing whether this certification is right for you, our AWS Certified AI Practitioner overview covers what the credential entails, who it's designed for, and where it fits in the broader AWS certification path.


Last updated: Sunday, 08 March 2026
