AI Safety & Ethics
AI safety is not a compliance checkbox -- it is a core engineering and organisational discipline. These principles apply regardless of which model or tool you use.
Helpful, Harmless, Honest
These three alignment objectives are widely shared across major AI labs. A responsible AI system balances genuine usefulness against harm avoidance while remaining honest; when the goals conflict, harm avoidance and honesty take precedence.
Bias and Fairness
LLMs inherit biases from their training data. They may produce outputs that reflect historical inequities, stereotype groups, or perform inconsistently across languages and cultures. Always test AI systems on diverse inputs before deployment.
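One lightweight way to act on this advice is a fairness smoke test: run the same input template across demographically varied values and flag inconsistent outcomes. A minimal sketch, where `screen_resume` is a hypothetical stand-in for a real model call and the name list is illustrative:

```python
# Fairness smoke test sketch: identical resume content, varied names.
# `screen_resume` is a placeholder for an actual model invocation.

def screen_resume(text: str) -> bool:
    # Placeholder decision rule: accept any resume mentioning "Python".
    return "Python" in text

TEMPLATE = "Name: {name}. Skills: Python, SQL. Experience: 5 years."
VARIANTS = ["Emily", "Lakisha", "Wei", "Mohammed", "Anika"]

results = {name: screen_resume(TEMPLATE.format(name=name)) for name in VARIANTS}

# The outcome should not vary when only the name changes.
assert len(set(results.values())) == 1, f"Inconsistent outcomes: {results}"
```

A real audit would use far larger and more varied test sets, but even a check this small catches gross disparities before deployment.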
Privacy and Data
Do not send personal data, credentials, or confidential documents to AI models without a clear legal basis and appropriate data processing agreements. Treat AI prompts as potential data flows subject to GDPR, CCPA, and sector-specific regulations.
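One practical control is a detection layer that inspects prompts before they leave your infrastructure. A minimal sketch using simple regexes; the patterns and function names are illustrative, and production systems would use a dedicated PII detection service rather than hand-rolled expressions:

```python
import re

# Illustrative pre-send PII check. These patterns are examples only,
# not an exhaustive or production-grade detector.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def pii_findings(prompt: str) -> list[str]:
    """Return the categories of PII detected in the prompt."""
    return [kind for kind, pat in PII_PATTERNS.items() if pat.search(prompt)]

def safe_to_send(prompt: str) -> bool:
    """Block the prompt if any PII category matches."""
    findings = pii_findings(prompt)
    if findings:
        print("Blocked prompt, detected:", findings)
        return False
    return True
```

Pairing a layer like this with policy training gives users both a guardrail and an explanation when a prompt is blocked.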
Human Oversight
AI systems should support human oversight, not undermine it. For high-stakes decisions -- legal, medical, financial -- AI outputs must be reviewed by qualified humans before action. The model's confidence is not a substitute for expert judgment.
| Risk Category | Example | Mitigation |
|---|---|---|
| Hallucination | Fabricated legal citations in a brief | RAG grounding + human review gate |
| PII leakage | User pastes customer data into prompt | PII detection layer + policy training |
| Bias in output | Resume screening that disadvantages groups | Diverse test sets + output audits |
| Prompt injection | Malicious data in tool result hijacks agent | Output sanitisation + sandboxed execution |
| Over-reliance | Decisions made without human review | Mandatory review gates for high-stakes actions |
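The prompt-injection row above can be illustrated with a naive sanitisation pass: tool results are scanned for instruction-like phrases and then delimited so the model is told to treat them as data. This is a sketch only; the phrase list and tag names are invented for illustration, and delimiting alone does not defeat determined injection, which is why the table pairs it with sandboxed execution:

```python
import re

# Illustrative (not robust) screen for instruction-like content in a
# tool result. Real defences layer this with sandboxing and least-privilege.
SUSPICIOUS = re.compile(
    r"(ignore (all )?previous instructions|you are now|system prompt)", re.I
)

def wrap_tool_result(raw: str) -> str:
    """Redact obviously suspicious content, then delimit the result as data."""
    if SUSPICIOUS.search(raw):
        raw = "[REDACTED: possible prompt injection]"
    return f"<tool_result>\n{raw}\n</tool_result>"
```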