Human-AI Interaction · AI Safety · Research & Evaluation · AI Experience Design

I evaluate and improve human interaction with AI systems

I study how people understand, trust, and make decisions with AI, then design interventions that improve the interaction.

MPhil in Human-Inspired AI, University of Cambridge. Experience working on AI products and multimodal systems. I combine usability research, LLM evaluation, and interaction design.

ColdFit: SUS 90 (guided), n = 8 · LLM Literacy Cards: large belief change (d = 0.89) and medium–large shifts in scenario-based responses (d = 0.71), n = 49

What I Do

Three things, end to end

Research the interaction

Study how users interpret AI outputs, calibrate trust, handle uncertainty, and decide whether to act. Mixed-methods: usability testing, pre/post studies, think-aloud, surveys.

Evaluate the AI

Build rubric-based evaluation pipelines for LLM outputs: pairwise comparison, absolute scoring, position randomisation, inter-model agreement (a short sketch of one such check appears after these three items). Not just "does it work" but "does it work safely."

Design the fix

Turn research findings into interaction patterns, prompt structures, and user-facing interventions. Cards, conversation flows, verification cues, uncertainty framing.
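As one illustration of the inter-model agreement check mentioned above: a minimal sketch, assuming two judge models have already produced verdicts on the same comparison pairs. The verdict lists and the choice of scikit-learn's kappa function are illustrative, not code from any of the projects below.

```python
# Minimal sketch of an inter-model agreement check between two LLM judges.
# The verdict lists are illustrative; in practice they would come from
# running the same pairwise comparisons through two different judge models.
from sklearn.metrics import cohen_kappa_score

judge_a = ["A", "B", "A", "A", "B", "A", "B", "A"]
judge_b = ["A", "B", "A", "B", "B", "A", "B", "A"]

kappa = cohen_kappa_score(judge_a, judge_b)
raw_agreement = sum(x == y for x, y in zip(judge_a, judge_b)) / len(judge_a)

print(f"Raw agreement: {raw_agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```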

Methods & Tools

What I work with

Usability testing · Pre/post experimental design · Think-aloud protocol · Semi-structured interviews · Survey design (Qualtrics) · LLM-as-a-judge evaluation · Rubric-based scoring · Pairwise comparison · Prolific recruitment · SUS / Likert scales · Wilcoxon signed-rank · Cohen's d effect sizes · Figma prototyping · Wizard of Oz · Python (evaluation scripts) · Value Sensitive Design

Featured Work

Research & Evaluation

Study Conducted

MPhil Dissertation · Cambridge

LLM Literacy Cards: Can a short intervention change how people understand AI?

d = 0.89 · Belief effect (large)
p < 0.001 · Significant pre–post improvements
d = 0.71 · Scenario effect (medium–large)
94% · Found the cards useful
n = 49 · UK adults via Prolific

Designed seven “myth → reality → habit” cards targeting common LLM misconceptions: confidence, citations, web access, privacy, memory, high-stakes trust, and neutrality. Built a three-layer LLM-as-a-judge evaluation pipeline to select the strongest prompt format per card from 105 generated responses and 285 pairwise judgements. Then ran a Prolific pre/post study measuring belief change and behavioural intention.

What I found: A single viewing of the cards produced significant belief shifts on 6 of 7 misconceptions, with privacy (d = 0.88) and citations (d = 0.74) showing the largest effects. Behavioural intentions also shifted significantly on 5 of 7 scenario tasks (composite d = 0.71), with the strongest changes on confidence under time pressure (d = 0.53) and neutrality awareness (d = 0.46). Five misconceptions showed significant effects on both belief and scenario measures: confidence, citations, web access, memory, and neutrality. The technical evaluation suggested that misconceptions requiring a broader change in response style benefited from more structured prompts, while those requiring a specific added behaviour worked with lighter cues.
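For context on how paired pre/post effects like these are typically computed, a minimal sketch of the Wilcoxon signed-rank test plus a paired Cohen's d; the CSV file and column names are illustrative placeholders, not the study's actual data or analysis scripts.

```python
# Minimal sketch of a paired pre/post analysis.
# Assumes one belief rating per participant before and after the intervention;
# file and column names are hypothetical.
import pandas as pd
from scipy.stats import wilcoxon

df = pd.read_csv("prepost_ratings.csv")   # hypothetical Qualtrics export
pre = df["pre_belief"].to_numpy()
post = df["post_belief"].to_numpy()

# Non-parametric test of the paired pre/post shift
stat, p = wilcoxon(pre, post)

# One common paired-samples Cohen's d: mean difference / SD of the differences
diff = post - pre
d = diff.mean() / diff.std(ddof=1)

print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}, Cohen's d = {d:.2f}")
```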

What I built: The prompt evaluation pipeline (generation → rubric scoring → pairwise LLM-as-judge comparisons), implemented as seven Python scripts for evaluation and analysis; analytic rubrics with anchored descriptors; the Qualtrics pre/post survey, with randomised card order and varied response-option ordering; Prolific recruitment with screen-out handling; and the card intervention itself.
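To make the pairwise-judging step concrete, a minimal sketch of a position-randomised LLM-as-judge comparison, assuming an OpenAI-style chat API; the model name, rubric text, and function are illustrative, not the dissertation's actual pipeline code.

```python
# Minimal sketch of one pairwise LLM-as-judge comparison with position
# randomisation; client setup, model name, and rubric are placeholders.
import random
from openai import OpenAI

client = OpenAI()
RUBRIC = "Judge which card text better corrects the misconception. Answer A or B."

def judge_pair(candidate_1: str, candidate_2: str) -> str:
    # Randomise which candidate appears as "A" to control for position bias
    if random.random() < 0.5:
        a, b, flipped = candidate_1, candidate_2, False
    else:
        a, b, flipped = candidate_2, candidate_1, True

    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical judge model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"A:\n{a}\n\nB:\n{b}"},
        ],
    )
    verdict = resp.choices[0].message.content.strip().upper()[:1]

    # Map the verdict back to the original (unflipped) candidate order
    if verdict not in ("A", "B"):
        return "tie"
    won_first = (verdict == "A") != flipped
    return "candidate_1" if won_first else "candidate_2"
```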

Results from a pre/post study conducted as part of an MPhil dissertation at the University of Cambridge (2026).

View design process · View all 7 cards
Study Conducted

HCI Module · Cambridge

ColdFit: How much structure should an AI recommendation give you?

2×2 mixed-methods design, n = 8 · SUS 90 (guided) / 80 (free) · Guided improved speed and usability; free chat increased autonomy but required more effort

Designed and evaluated a multimodal AI clothing assistant for international students navigating UK cold weather. Compared guided vs. free-chat interaction and text-dominant vs. image-dominant output across a 2×2 between-subjects design.

What I found: Interaction structure, not AI capability, was the primary driver of trust, effort, and decision confidence. Guided chat scored SUS 90 but users felt less autonomous. Free chat scored SUS 80 but users valued the control. Users wanted support, not prescription.
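For reference, SUS scores like the 90 and 80 above follow the standard 10-item scoring rule: odd items contribute (response − 1), even items contribute (5 − response), and the sum is scaled by 2.5 to a 0–100 range. A minimal sketch; the example responses are illustrative, not participant data.

```python
# Minimal sketch of standard SUS scoring (10 items, 1-5 Likert responses).
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered items negatively
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a fairly positive response set
print(sus_score([5, 1, 5, 2, 4, 1, 5, 2, 4, 1]))  # 90.0
```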

What I built: Figma prototype, Wizard-of-Oz web setup (Firebase + Netlify), structured interview guide, task-based evaluation protocol, CHI-format report.

View full case study · Try the prototype

Industry · Petabyte eSports

Multimodal AI feature integration

Worked on integrating GPT, Whisper, and DALL-E into a multimodal AI system for an eSports platform. Contributed to feature design, prompt engineering, and output quality assessment across text, audio, and image modalities.

Relevance: Hands-on experience with production AI systems, multimodal pipelines, and the gap between model capability and user experience.

What I'm Looking For

Roles where research shapes the product

I'm looking for roles (internship or entry-level) where I can combine research, evaluation, and design to improve human interactions with AI systems.

UX Research (AI products)

User studies, trust calibration research, usability evaluation for AI-powered features and tools

AI Experience Design

Designing interaction patterns, conversation flows, and output framing for LLM and multimodal systems

AI Evaluation & Safety

Rubric-based LLM evaluation, red-teaming, output quality assessment, appropriate reliance testing

Research Internship / PhD

Human-AI interaction, AI literacy, trust and transparency, user mental models of AI systems

Get in Touch

Available from June 2026. Open to internships and entry-level roles in AI-focused UX research, AI experience design, and AI evaluation and safety. Also interested in research internships and PhD opportunities.