Human-AI Interaction · AI Safety · Research & Evaluation · AI Experience Design
I study how people understand, trust, and make decisions with AI, then design interventions that improve the interaction.
MPhil in Human-Inspired AI, University of Cambridge. Experience working on AI products and multimodal systems. I combine usability research, LLM evaluation, and interaction design.
ColdFit: SUS 90 (guided), n = 8 · LLM Literacy Cards: large belief change (d = 0.89) and medium–large shifts in scenario-based responses (d = 0.71), n = 49
What I Do
Study how users interpret AI outputs, calibrate trust, handle uncertainty, and decide whether to act. Mixed-methods: usability testing, pre/post studies, think-aloud, surveys.
Build rubric-based evaluation pipelines for LLM outputs. Pairwise comparison, absolute scoring, position randomisation, inter-model agreement. Not just "does it work" but "does it work safely."
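As a minimal sketch of what position-debiased pairwise judging can look like: each pair of candidate responses is judged twice with positions swapped, and a win counts only when both orderings agree, which cancels the judge's position bias. The `judge` function here is a toy stand-in (it prefers the longer response) so the sketch runs without an API; in practice it would wrap an LLM call. All names are illustrative, not from a specific pipeline.

```python
from itertools import combinations
from collections import Counter

def judge(response_a: str, response_b: str) -> str:
    """Stand-in for an LLM judge call that returns "A" or "B".
    A toy heuristic (prefer the longer response) is used here
    so the sketch runs without an API key."""
    return "A" if len(response_a) >= len(response_b) else "B"

def pairwise_tournament(candidates: dict[str, str]) -> Counter:
    """Score candidate responses by pairwise comparison.

    Each pair is judged in both position orders; a point is
    awarded only when the two orderings agree on the winner.
    Disagreements are treated as position-biased and discarded.
    """
    wins: Counter = Counter()
    for (id_a, resp_a), (id_b, resp_b) in combinations(candidates.items(), 2):
        first = judge(resp_a, resp_b)    # candidate A shown first
        second = judge(resp_b, resp_a)   # positions swapped
        if first == "A" and second == "B":    # both orderings favour A
            wins[id_a] += 1
        elif first == "B" and second == "A":  # both orderings favour B
            wins[id_b] += 1
        # otherwise: inconsistent verdict, no point awarded
    return wins
```

Swapping positions for every pair (rather than randomising once) doubles the judging cost but makes position bias directly measurable: the rate of inconsistent verdicts is itself a quality signal for the judge.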
Turn research findings into interaction patterns, prompt structures, and user-facing interventions. Cards, conversation flows, verification cues, uncertainty framing.
Methods & Tools
Featured Work
MPhil Dissertation · Cambridge
Designed seven “myth → reality → habit” cards targeting common LLM misconceptions: confidence, citations, web access, privacy, memory, high-stakes trust, and neutrality. Selected the strongest prompt format per card via an LLM-as-a-judge evaluation pipeline, then ran a Prolific pre/post study measuring belief change and behavioural intention.
What I found: A single viewing of the cards produced significant belief shifts on 6 of 7 misconceptions, with privacy (d = 0.88) and citations (d = 0.74) showing the largest effects. Behavioural intentions also shifted significantly on 5 of 7 scenario tasks (composite d = 0.71), with the strongest changes on confidence under time pressure (d = 0.53) and neutrality awareness (d = 0.46). Five misconceptions showed significant effects on both belief and scenario measures: confidence, citations, web access, memory, and neutrality. The technical evaluation suggested that misconceptions requiring a broader change in response style benefited from more structured prompts, while those requiring a specific added behaviour worked with lighter cues.
What I built: A three-stage prompt evaluation pipeline (generation → rubric scoring → pairwise LLM-as-judge comparisons) that selected the best-performing prompt per card from 105 candidates and 285 pairwise judgements. Implemented seven Python scripts for evaluation and analysis, designed analytic rubrics with anchored descriptors, developed the Qualtrics pre/post survey (randomised card order and varied response-option ordering), managed Prolific recruitment with screen-out handling, and produced the card intervention.
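An analytic rubric with anchored descriptors can be represented as plain data, which makes the scoring stage reproducible and easy to hand to an LLM judge. The criteria and anchor wordings below are invented for illustration, not the dissertation's actual rubric.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One analytic-rubric criterion with anchored score descriptors."""
    name: str
    anchors: dict[int, str]  # score level -> concrete behavioural description

# Illustrative rubric; criteria and anchors are hypothetical.
RUBRIC = [
    Criterion("clarity", {
        1: "Jargon-heavy; a non-expert could not restate the point.",
        3: "Mostly plain language with occasional unexplained terms.",
        5: "Plain language throughout; the key claim fits one sentence.",
    }),
    Criterion("actionability", {
        1: "No concrete habit the reader could adopt.",
        3: "A habit is implied but not stated as a step.",
        5: "States one specific, repeatable habit.",
    }),
]

def score_response(ratings: dict[str, int]) -> float:
    """Combine per-criterion ratings into a single score
    (equal criterion weights assumed for this sketch)."""
    return sum(ratings.values()) / len(ratings)
```

Anchored descriptors matter because they pin each score level to observable behaviour, which is what keeps absolute scoring consistent across judges and across runs.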
Results from a pre/post study conducted as part of an MPhil dissertation at the University of Cambridge (2026).
HCI Module · Cambridge
Designed and evaluated a multimodal AI clothing assistant for international students navigating UK cold weather. Compared guided vs. free-chat interaction and text-dominant vs. image-dominant output across a 2×2 between-subjects design.
What I found: Interaction structure, not AI capability, was the primary driver of trust, effort, and decision confidence. Guided chat scored SUS 90 but users felt less autonomous. Free chat scored SUS 80 but users valued the control. Users wanted support, not prescription.
What I built: Figma prototype, Wizard-of-Oz web setup (Firebase + Netlify), structured interview guide, task-based evaluation protocol, CHI-format report.
Industry · Petabyte eSports
Worked on integrating GPT, Whisper, and DALL-E into a multimodal AI system for an eSports platform. Contributed to feature design, prompt engineering, and output quality assessment across text, audio, and image modalities.
Relevance: Hands-on experience with production AI systems, multimodal pipelines, and the gap between model capability and user experience.
What I'm Looking For
I'm looking for roles (internship or entry-level) where I can combine research, evaluation, and design to improve human interactions with AI systems.
User studies, trust calibration research, usability evaluation for AI-powered features and tools
Designing interaction patterns, conversation flows, and output framing for LLM and multimodal systems
Rubric-based LLM evaluation, red-teaming, output quality assessment, appropriate reliance testing
Human-AI interaction, AI literacy, trust and transparency, user mental models of AI systems
Get in Touch
Available from June 2026. Open to internships and entry-level roles in AI-focused UX research, AI experience design, and AI evaluation and safety. Also interested in research internships and PhD opportunities.