AI Safety & Ethics
AI safety is not a compliance checkbox -- it is a core engineering and organisational discipline. These principles apply regardless of which model or tool you use.
Helpful, Harmless, Honest
These three alignment objectives are widely shared across major AI labs. A responsible AI system balances genuine usefulness against harm avoidance while remaining honest; when the goals conflict, harm avoidance and honesty take precedence.
Bias and Fairness
LLMs inherit biases from their training data. They may produce outputs that reflect historical inequities, stereotype groups, or perform inconsistently across languages and cultures. Always test AI systems on diverse inputs before deployment.
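One lightweight way to act on this advice is a fairness smoke test: run the same input template across demographically varied values and flag inconsistent outcomes. A minimal sketch, where `screen_resume` is a hypothetical stand-in for a real model call and the name list is illustrative:

```python
# Fairness smoke test sketch: identical resume content, varied names.
# `screen_resume` is a placeholder for an actual model invocation.

def screen_resume(text: str) -> bool:
    # Placeholder decision rule: accept any resume mentioning "Python".
    return "Python" in text

TEMPLATE = "Name: {name}. Skills: Python, SQL. Experience: 5 years."
VARIANTS = ["Emily", "Lakisha", "Wei", "Mohammed", "Anika"]

results = {name: screen_resume(TEMPLATE.format(name=name)) for name in VARIANTS}

# The outcome should not vary when only the name changes.
assert len(set(results.values())) == 1, f"Inconsistent outcomes: {results}"
```

A real audit would use far larger and more varied test sets, but even a check this small catches gross disparities before deployment.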
Privacy and Data
Do not send personal data, credentials, or confidential documents to AI models without a clear legal basis and appropriate data processing agreements. Treat AI prompts as potential data flows subject to GDPR, CCPA, and sector-specific regulations.
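One practical control is a detection layer that inspects prompts before they leave your infrastructure. A minimal sketch using simple regexes; the patterns and function names are illustrative, and production systems would use a dedicated PII detection service rather than hand-rolled expressions:

```python
import re

# Illustrative pre-send PII check. These patterns are examples only,
# not an exhaustive or production-grade detector.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def pii_findings(prompt: str) -> list[str]:
    """Return the categories of PII detected in the prompt."""
    return [kind for kind, pat in PII_PATTERNS.items() if pat.search(prompt)]

def safe_to_send(prompt: str) -> bool:
    """Block the prompt if any PII category matches."""
    findings = pii_findings(prompt)
    if findings:
        print("Blocked prompt, detected:", findings)
        return False
    return True
```

Pairing a layer like this with policy training gives users both a guardrail and an explanation when a prompt is blocked.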
Human Oversight
AI systems should support human oversight, not undermine it. For high-stakes decisions -- legal, medical, financial -- AI outputs must be reviewed by qualified humans before action. The model's confidence is not a substitute for expert judgment.
| Risk Category | Example | Mitigation |
|---|---|---|
| Hallucination | Fabricated legal citations in a brief | RAG grounding + human review gate |
| PII leakage | User pastes customer data into prompt | PII detection layer + policy training |
| Bias in output | Resume screening that disadvantages groups | Diverse test sets + output audits |
| Prompt injection | Malicious data in tool result hijacks agent | Output sanitisation + sandboxed execution |
| Over-reliance | Decisions made without human review | Mandatory review gates for high-stakes actions |
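The prompt-injection row above can be illustrated with a naive sanitisation pass: tool results are scanned for instruction-like phrases and then delimited so the model is told to treat them as data. This is a sketch only; the phrase list and tag names are invented for illustration, and delimiting alone does not defeat determined injection, which is why the table pairs it with sandboxed execution:

```python
import re

# Illustrative (not robust) screen for instruction-like content in a
# tool result. Real defences layer this with sandboxing and least-privilege.
SUSPICIOUS = re.compile(
    r"(ignore (all )?previous instructions|you are now|system prompt)", re.I
)

def wrap_tool_result(raw: str) -> str:
    """Redact obviously suspicious content, then delimit the result as data."""
    if SUSPICIOUS.search(raw):
        raw = "[REDACTED: possible prompt injection]"
    return f"<tool_result>\n{raw}\n</tool_result>"
```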