We are seeking an LLM Evaluation Engineer to join a forward-thinking team responsible for developing a sophisticated voice assistant platform. This isn't a typical QA role: it blends technical engineering, machine learning evaluation, and data analysis. You'll work closely with cutting-edge conversational AI technology, designing evaluation frameworks, building custom scripts, and creating data visualizations to assess platform performance.
Key Responsibilities:
- Design and implement evaluation strategies for voice and language models, including automated testing approaches.
- Analyze unstructured data from log storage systems to identify performance gaps and improve user experiences.
- Build and maintain custom Python scripts to streamline data processing and generate actionable insights.
- Develop visual reports to communicate findings and drive continuous improvement.
- Collaborate with cross-functional teams globally to identify and address pain points in conversational AI performance.
- Apply prompt engineering techniques to refine LLM outputs and assess system health.
Ideal Candidate:
- 3+ years of experience in machine learning evaluation, data analysis, or related technical roles.
- Intermediate-to-advanced Python scripting skills, including log parsing and API testing.
- Familiarity with generative AI and LLMs, including automated workflows and API integrations.
- Strong analytical mindset, capable of working independently and identifying innovative solutions.
- Excellent communication skills, with the ability to present complex findings clearly to both technical and non-technical stakeholders.