AI QA Tester

  • Full-Time
  • Remote

Job Overview

We are looking for an AI QA Engineer to join one of our projects and help ensure the quality, reliability, and safety of advanced AI systems. In this role, you will test and validate LLM-based applications, agentic workflows, and AI-powered systems, working across software, infrastructure, and hardware components.

You will play a critical role in identifying issues, validating model behavior, and building automated testing frameworks to support the deployment of high-quality, production-ready AI solutions.

Key Responsibilities

LLM & Agent Evaluation

  • Evaluate advanced language models across areas such as hallucination detection, factual consistency, prompt-injection resistance, bias/fairness auditing, and reasoning reliability.

  • Design and execute test plans for single-agent and multi-agent systems, validating reasoning paths, memory behavior, and tool usage.

  • Assess semantic correctness of model outputs, beyond simple pass/fail validation.

  • Validate RAG pipelines and retrieval accuracy within AI workflows.

  • Document failure modes and propose improvements to model performance and reliability.
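For illustration, the grounding and hallucination checks described above can be as simple as verifying that factual claims in a model answer are traceable to the source text. A minimal sketch (the function name and examples are hypothetical, not part of any specific framework):

```python
import re

def ungrounded_numbers(source: str, answer: str) -> list[str]:
    """Return numeric claims in `answer` that never appear in `source`.

    A crude grounding check: any number the model asserts should be
    traceable to the retrieved/source passage; anything else is a
    hallucination candidate worth manual review.
    """
    source_nums = set(re.findall(r"\d+(?:\.\d+)?", source))
    answer_nums = re.findall(r"\d+(?:\.\d+)?", answer)
    return [n for n in answer_nums if n not in source_nums]

source = "The model was trained on 1.4 trillion tokens over 21 days."
answer = "It was trained on 2 trillion tokens over 21 days."
print(ungrounded_numbers(source, answer))  # → ['2']
```

Real evaluation pipelines layer entity matching, NLI-based entailment, or LLM-as-judge scoring on top of checks like this, but the failure-mode documentation loop is the same: detect, triage, report.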

Test Automation & Frameworks

  • Perform manual and exploratory testing for complex AI-driven applications to identify bugs and unexpected behaviors.

  • Build and maintain automated test suites for LLM-based features using tools such as PyTest, OpenAI Evals, and Weights & Biases.

  • Develop regression testing frameworks to detect model performance changes across releases.

  • Implement and maintain CI/CD-integrated testing pipelines to monitor AI model regressions.
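As a rough sketch of the regression testing described above: because LLM output varies run to run, release gates typically compare model answers against a golden set with a similarity threshold rather than exact string equality. The example below stubs the model call (`call_model` is hypothetical; in practice it would wrap the real endpoint) and uses token-overlap F1 as a deliberately simple similarity proxy:

```python
def call_model(prompt: str) -> str:
    # Stub standing in for the deployed LLM endpoint (hypothetical).
    return "Paris is the capital of France."

def token_f1(expected: str, actual: str) -> float:
    """Token-overlap F1: a crude semantic-similarity proxy, used when
    exact-match assertions are too strict for free-form LLM output."""
    exp = [t.strip(".,!?") for t in expected.lower().split()]
    act = [t.strip(".,!?") for t in actual.lower().split()]
    common = sum(min(exp.count(t), act.count(t)) for t in set(exp))
    if not common:
        return 0.0
    precision, recall = common / len(act), common / len(exp)
    return 2 * precision * recall / (precision + recall)

# Golden prompt/answer pairs pinned at the last known-good release.
GOLDEN = [("What is the capital of France?",
           "The capital of France is Paris.")]

def test_no_regression_on_golden_set():
    for prompt, expected in GOLDEN:
        score = token_f1(expected, call_model(prompt))
        assert score >= 0.5, f"regression on {prompt!r}: F1={score:.2f}"
```

Run under PyTest in CI, a suite like this flags releases whose answers drift below the threshold; production setups usually swap the F1 proxy for embedding similarity or rubric-based grading.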

Collaboration & Quality Assurance

  • Work closely with AI engineers, product teams, and data scientists to ensure system reliability and performance.

  • Contribute to improving testing methodologies, evaluation metrics, and QA processes for AI-driven applications.

  • Ensure AI systems meet quality, reliability, and safety standards before production deployment.

Required Skills & Experience

Experience

  • 3–5+ years of experience in QA engineering

  • 1–2+ years testing AI/ML or LLM-based systems

  • Experience delivering QA validation for at least one production AI system (chatbot, agentic workflow, RAG pipeline, or similar)

  • Hands-on experience with prompt engineering and generative AI output evaluation

Technical Skills

  • Strong proficiency in Python for test scripting and automation

  • Experience with LLM evaluation tools such as:

    • OpenAI Evals

    • RAG evaluation frameworks

    • Weights & Biases

  • Familiarity with agent frameworks (LangChain, LangGraph, CrewAI, AutoGen) to trace and test agent behavior

  • Experience with API testing

  • Understanding of NLP concepts such as intent recognition and entity extraction

  • Familiarity with vector databases and RAG architectures for retrieval accuracy testing

  • Experience with version control systems (Git) and issue tracking tools (Jira, Linear)

Preferred Qualifications

  • Experience in AI safety testing, red-teaming, or adversarial testing

  • Skills in evaluation rubric design, bias and fairness auditing, and grounding verification

  • Experience testing AI systems in regulated industries (healthcare, fintech, legal)

  • Background in NLP, computational linguistics, or data science

  • Familiarity with RLHF (Reinforcement Learning from Human Feedback) and model alignment concepts

  • Security certifications such as OSCP or OSWA

  • Coursework or experience related to AI safety