Job Description
Salary: $? - ? per year
Requirements:
- At least 3 years of software engineering experience.
- Strong expertise in Python with deep knowledge of frameworks, tooling, and best practices for building production-grade software.
- Experience building full-stack applications and deploying scalable software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, code quality assessment, and code review.
- Excellent oral and written communication skills, including the ability to write clear, structured evaluation rationales.
Ideal Background:
This role is ideal for engineers who have built production systems at companies like Google, Microsoft, Apple, Amazon, Meta, or similar high-scale engineering organizations. We especially welcome graduates of top computer science programs such as Stanford, MIT, Carnegie Mellon, UC Berkeley, Georgia Tech, and comparable institutions, though exceptional experience and skill always take precedence over pedigree.
Project Overview:
As a Software Engineering evaluator, you will work closely with researchers to create cutting-edge datasets for training, benchmarking, and advancing large language models. This includes curating code examples, providing precise solutions, and making corrections, with a primary focus on Python across backend services, data pipelines, and ML infrastructure, alongside JavaScript (including ReactJS), C/C++, Java, Rust, and Go. You will evaluate and refine AI-generated code for efficiency, scalability, and reliability, and work with cross-functional teams to improve enterprise-level AI-driven coding solutions.
What Does a Typical Day Look Like?
- Work on AI model training initiatives by curating code examples, building solutions, and correcting code, primarily in Python, with additional work in JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
- Evaluate and refine AI-generated code to ensure that it is efficient, scalable, and reliable.
- Collaborate with cross-functional teams to improve AI-driven coding solutions, measured against industry performance benchmarks.
- Build agents and automated verification tools in Python that can verify the quality of code and identify error patterns.
- Form hypotheses about how well models handle each stage of the software engineering lifecycle (prototyping, architecture design, API design, production implementation, launch, experimentation, monitoring, and operational maintenance), then evaluate model capabilities at each stage.
- Design verification mechanisms that can automatically verify a solution to a software engineering task.
Skills:
- AI
- API
- Backend
- Java
- JavaScript
- Python
- Rust
Engagement Details:
Commitment: Flexible, 10 to 40 hrs/week
Type: Contractor (no medical/paid leave)
Duration: 1 month (with potential extensions based on performance and fit)
Location: Candidates must be based in the United States
Evaluation Process:
The application process takes 15-30 minutes.
Completion of an AI video interview is required.
Last updated: week 24 of 2026