AI Agent Evaluation Analyst - AI Trainer

Mindrift · Work From Home, Auckland, New Zealand
Job description

Overview

Mindrift is hiring an AI Agent Evaluation Analyst (AI Trainer). This is a flexible, project-based opportunity suited to remote, asynchronous work. The Mindrift platform, powered by Toloka, connects domain experts with AI projects. Our mission is to unlock the potential of GenAI by leveraging real-world expertise from across the globe.

We are looking for curious, intellectually proactive contributors who double-check assumptions and think critically. If you enjoy ambiguity and complexity and want to learn how modern AI systems are tested and evaluated, this role may be a fit.

About the project

We are seeking QA-style support for autonomous AI agents on a project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. You will combine quality assurance, research, and logical problem-solving to help ensure these systems are tested robustly.

You do not need a coding background, but you must be curious, rigorous, and able to evaluate the soundness and consistency of complex setups. If you have experience in consulting, case solving, systems thinking, or related activities, you might be a great fit.

Responsibilities

  • Review evaluation tasks and scenarios for logic, completeness, and realism
  • Identify inconsistencies, missing assumptions, or unclear decision points
  • Help define clear expected behaviours (gold standards) for AI agents
  • Annotate cause–effect relationships, reasoning paths, and plausible alternatives
  • Think through complex systems and policies to ensure agents are tested properly
  • Collaborate with QA, writers, or developers to suggest refinements or edge-case coverage

Requirements

  • Excellent analytical thinking: the ability to reason about complex systems, scenarios, and logical implications
  • Strong attention to detail: spotting contradictions, ambiguities, and vague requirements
  • Familiarity with structured data formats; the ability to read JSON or YAML is helpful (see the sketch after this list)
  • Ability to assess scenarios holistically: identifying what’s missing or unrealistic and what could break
  • Good communication and clear written English for documenting findings
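
For illustration, here is a minimal Python sketch of what reading a structured task definition might look like. The schema and field names are hypothetical, invented for this example rather than taken from the project:

import json

# Hypothetical evaluation-task record; the schema is illustrative only.
task = json.loads("""
{
  "task_id": "t-001",
  "scenario": "Customer requests a refund outside the 30-day window.",
  "policy": "Refunds are allowed within 30 days; exceptions need manager approval.",
  "expected_behaviour": "Agent declines the refund and offers to escalate.",
  "failure_modes": ["approves refund without escalation", "ignores policy window"]
}
""")

# A reviewer checks that every decision point is covered before the task ships.
for field in ("scenario", "policy", "expected_behaviour", "failure_modes"):
    assert field in task, f"missing field: {field}"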

What we value

  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design
  • Background in consulting, academia, Olympiads, or research
  • Exposure to LLMs, prompt engineering, or AI-generated content
  • Familiarity with QA or test-case thinking (edge cases, failure modes)
  • Understanding of how scoring or evaluation works in agent testing, such as precision and coverage (a toy example follows this list)
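
As a rough illustration of precision and coverage in agent testing, here is a toy Python sketch; the behaviour labels are invented for the example, not real project data:

# Compare an agent's observed behaviours against a gold standard.
gold = {"decline_refund", "offer_escalation", "log_interaction"}  # expected behaviours
observed = {"decline_refund", "offer_escalation", "upsell_plan"}  # what the agent did

matched = gold & observed
precision = len(matched) / len(observed)  # share of the agent's actions that were correct
coverage = len(matched) / len(gold)       # share of expected behaviours the agent showed

print(f"precision={precision:.2f}, coverage={coverage:.2f}")  # precision=0.67, coverage=0.67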

Benefits

  • Compensation of up to $38/hour, depending on skills and project needs
  • A flexible, remote freelance project that fits around your other commitments
  • Opportunity to gain experience on an advanced AI project
  • Contribute to shaping how AI models understand and communicate in your field

How to get started

Apply to this post, complete the qualification process, and contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.
