AI Agent Evaluation Analyst - AI Trainer

Mindrift · Work From Home, Auckland, New Zealand
Job description

Overview

Mindrift is hiring an AI Agent Evaluation Analyst (AI Trainer). This is a flexible, project-based opportunity suited to remote, asynchronous work. The Mindrift platform, powered by Toloka, connects domain experts with AI projects. Our mission is to unlock the potential of GenAI by leveraging real-world expertise from across the globe.

We are looking for curious, intellectually proactive contributors who double-check assumptions and think critically. If you enjoy ambiguity and complexity and want to learn how modern AI systems are tested and evaluated, this role may be a fit.

About the project

We are seeking QA-style support for autonomous AI agents on a project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. You will combine quality assurance, research, and logical problem-solving to help ensure these systems are tested robustly.

You do not need a coding background, but you must be curious, rigorous, and able to evaluate the soundness and consistency of complex setups. If you have experience in consulting, case solving, systems thinking, or related activities, you might be a great fit.

Responsibilities

  • Review evaluation tasks and scenarios for logic, completeness, and realism
  • Identify inconsistencies, missing assumptions, or unclear decision points
  • Help define clear expected behaviours (gold standards) for AI agents
  • Annotate cause–effect relationships, reasoning paths, and plausible alternatives
  • Think through complex systems and policies to ensure agents are tested properly
  • Collaborate with QA, writers, or developers to suggest refinements or edge-case coverage

Requirements

  • Excellent analytical thinking: the ability to reason about complex systems, scenarios, and logical implications
  • Strong attention to detail: spotting contradictions, ambiguities, and vague requirements
  • Familiarity with structured data formats; the ability to read JSON or YAML is helpful (see the sketch after this list)
  • Ability to assess scenarios holistically: identifying what’s missing or unrealistic and what could break
  • Good communication and clear written English for documenting findings
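
For illustration, here is a minimal Python sketch of what reading a structured task definition might look like. The schema and field names are hypothetical, invented for this example rather than taken from the project:

import json

# Hypothetical evaluation-task record; the schema is illustrative only.
task = json.loads("""
{
  "task_id": "t-001",
  "scenario": "Customer requests a refund outside the 30-day window.",
  "policy": "Refunds are allowed within 30 days; exceptions need manager approval.",
  "expected_behaviour": "Agent declines the refund and offers to escalate.",
  "failure_modes": ["approves refund without escalation", "ignores policy window"]
}
""")

# A reviewer checks that every decision point is covered before the task ships.
for field in ("scenario", "policy", "expected_behaviour", "failure_modes"):
    assert field in task, f"missing field: {field}"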

What we value

  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design
  • Background in consulting, academia, Olympiads, or research
  • Exposure to LLMs, prompt engineering, or AI-generated content
  • Familiarity with QA or test-case thinking (edge cases, failure modes)
  • Understanding of how scoring or evaluation works in agent testing, such as precision and coverage (a toy example follows this list)
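
As a rough illustration of precision and coverage in agent testing, here is a toy Python sketch; the behaviour labels are invented for the example, not real project data:

# Compare an agent's observed behaviours against a gold standard.
gold = {"decline_refund", "offer_escalation", "log_interaction"}  # expected behaviours
observed = {"decline_refund", "offer_escalation", "upsell_plan"}  # what the agent did

matched = gold & observed
precision = len(matched) / len(observed)  # share of the agent's actions that were correct
coverage = len(matched) / len(gold)       # share of expected behaviours the agent showed

print(f"precision={precision:.2f}, coverage={coverage:.2f}")  # precision=0.67, coverage=0.67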

Benefits

  • Compensation of up to $38/hour, depending on skills and project needs
  • A flexible, remote freelance project that fits around your other commitments
  • Opportunity to gain experience on an advanced AI project
  • Contribute to shaping how AI models understand and communicate in your field

How to get started

Apply to this post, complete the qualification process, and contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.
