MARS 4
MARS 4
Paper
Characterizing Backtracking in CoT through Internal Probes and Surface-Level Features
Published at ICLR 2026
MARS 4
Post
Attack Selection In Agentic AI Control Evals Can Decrease Safety
Published on LessWrong
MARS 4
Paper
When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
MARS 4
Paper
Jailbreaking Vision-Language Models Through the Visual Modality
Accepted to ICML 2026
MARS 4
Article
Making Extreme AI Risk Tradeable: A New Financial Instrument for Catastrophic AI Risk
Published in AI Frontiers
MARS 4
Article
Artificial Intelligence in the States
Published in Law & Liberty
MARS 4
Report
AI Governance Mapping Project
Past MARS
MARS 2
Paper
Combining Cost-Constrained Runtime Monitors for AI Safety
Accepted to NeurIPS 2025
MARS 2
Paper
Large Language Models Can Learn and Generalize Steganographic Chain-of-Thought Under Process Supervision
Accepted to NeurIPS 2025
MARS 3
Paper
A transformer architecture alteration to incentivise externalised reasoning
MARS 3
Interactive Demo
AI Has Opinions, and They’re Not the Same as Yours.
MARS 3
Paper
Depth-Wise Activation Steering for Honest Language Models
MARS 2
Paper
Evaluating Language Model Character Traits
MARS 1
Paper