Overview
AI Agent Evaluation Analyst (AI Trainer) at Mindrift is a flexible, project-based opportunity suited to remote, asynchronous work. The Mindrift platform, powered by Toloka, connects domain experts with AI projects. Our mission is to unlock the potential of GenAI by leveraging real-world expertise from across the globe.
We are looking for curious, intellectually proactive contributors who double-check assumptions and think critically. If you enjoy working through ambiguity and complexity and want to learn how modern AI systems are tested and evaluated, this role may be a fit.
About the project
We are seeking QA-style support for autonomous AI agents on a project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. You will balance quality assurance, research, and logical problem-solving to help ensure these systems are tested robustly.
You do not need a coding background, but you must be curious, rigorous, and able to evaluate the soundness and consistency of complex setups. If you have experience in consulting, case solving, systems thinking, or related activities, you might be a great fit.
Responsibilities
- Review evaluation tasks and scenarios for logic, completeness, and realism
- Identify inconsistencies, missing assumptions, or unclear decision points
- Help define clear expected behaviours (gold standards) for AI agents
- Annotate cause–effect relationships, reasoning paths, and plausible alternatives
- Think through complex systems and policies to ensure agents are tested properly
- Collaborate with QA, writers, or developers to suggest refinements or edge-case coverage
Requirements
- Excellent analytical thinking: the ability to reason about complex systems, scenarios, and logical implications
- Strong attention to detail: identify contradictions, ambiguities, and vague requirements
- Familiarity with structured data formats; the ability to read JSON / YAML is helpful (see the sketch after this list)
- Ability to assess scenarios holistically: identify what’s missing or unrealistic and what could break
- Good communication and clear writing in English to document findings
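A purely illustrative sketch of what "reading JSON / YAML" can mean in practice on projects like this: skimming a task record and spotting a missing field. The record below and its field names (scenario, policy, expected_behaviour, decision_points) are hypothetical, not taken from the project.

    import json

    # A hypothetical evaluation-task record, similar in spirit to the JSON / YAML
    # files mentioned above. All field names are invented for illustration.
    task = json.loads("""
    {
      "scenario": "Customer requests a refund outside the 30-day window",
      "policy": "Refunds are allowed within 30 days of purchase",
      "expected_behaviour": "Agent declines the refund and explains the policy",
      "decision_points": ["check purchase date", "check policy exceptions"]
    }
    """)

    # Reviewer-style sanity check: flag records missing the fields an annotator
    # would need in order to judge the agent's behaviour.
    required = ["scenario", "policy", "expected_behaviour", "decision_points"]
    missing = [field for field in required if field not in task]
    print("missing fields:", missing or "none")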
What we value
- Experience with policy evaluation, logic puzzles, case studies, or structured scenario design
- Background in consulting, academia, Olympiads, or research
- Exposure to LLMs, prompt engineering, or AI-generated content
- Familiarity with QA or test-case thinking (edge cases, failure modes)
- Understanding of how scoring or evaluation works in agent testing (precision, coverage)
Benefits
- Compensation up to $38 / hour, depending on skills and project needs
- Flexible, remote, freelance project that fits around your commitments
- Opportunity to gain experience on an advanced AI project
- Contribute to shaping how AI models understand and communicate in your field
How to get started
Apply to this post, pass the qualification process, and contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.