About Company
I am currently working with a company focusing on AI, LLM and Computer Vision.
Hybrid working arrangement: 2 days in the office, 3 days WFH. The office is located near Buona Vista. There are 4 rounds of interviews before the offer stage.
About Job
- Design and implement robust frameworks to evaluate the performance of generative AI systems, covering text and multi-modal Large Language Models (LLMs), including but not limited to GPT-based models, BERT, T5, and other state‑of‑the‑art architectures
- Perform technical AI evaluations of LLMs, including assessing their robustness in performance, embedded biases, and vulnerability to jailbreaks and prompt injection attacks
- Work with stakeholders to design strong LLMs, custom evaluation approaches, and a suite of technical and analytical AI evaluation frameworks and tools
- Define and refine metrics for evaluating model performance, such as perplexity, BLEU, ROUGE, accuracy, coherence, factual consistency, and bias detection
- Lead efforts in curating and managing large, high-quality datasets for evaluating LLMs
Skills and Requirements
- Min 2 years for junior, 5 years for senior
- Experience with Agentic AI or Agentic LLM
- Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human‑centered evaluation techniques
- Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization
- Solid programming skills in Python and experience building automated pipelines for continuous model evaluation
Application
To apply online, please use the 'apply' function. Alternatively, you may contact Stella at (EA : 94C3609 / R )
Education
Bachelor's Degree