LLM Evaluation Engineer

Mountain View, CA

About the Role:
We are seeking an LLM Evaluation Engineer to join a forward-thinking team responsible for developing a sophisticated voice assistant platform. This isn’t your typical QA role: it’s a blend of technical engineering, machine learning evaluation, and data analysis. You’ll work closely with cutting-edge conversational AI technology, designing evaluation frameworks, building custom scripts, and creating data visualizations to assess platform performance.
 
Key Responsibilities:
  • Design and implement evaluation strategies for voice and language models, including automated testing approaches.
  • Analyze unstructured data from log store systems to identify performance gaps and optimize user experiences.
  • Build and maintain custom Python scripts to streamline data processing and generate actionable insights.
  • Develop visual reports to communicate findings and drive continuous improvement.
  • Collaborate with cross-functional teams globally to identify and address pain points in conversational AI performance.
  • Use prompt engineering techniques to refine LLM outputs and report on system health.
 
Ideal Candidate:
  • 3+ years of experience in machine learning evaluation, data analysis, or related technical roles.
  • Intermediate to advanced Python scripting skills, including log parsing and API testing.
  • Familiarity with GenAI and LLMs, including automated workflows and API integrations.
  • Strong analytical mindset, capable of working independently and identifying innovative solutions.
  • Excellent communication skills, able to present complex findings clearly to both technical and non-technical stakeholders.

 

