About the Role

We are seeking an experienced AI Expert and Consultant to join our National Large Language Model (LLM) Project, which aims to replace ChatGPT usage in the workplace. As a key technical advisor, you will provide expertise across the full LLM stack, from model training and fine-tuning to deployment and RAG implementation.

Key Responsibilities:

  • Provide strategic guidance and technical oversight for the development of the Arabic LLM initiative
  • Evaluate and select appropriate base models (7B and 30B parameter models) based on benchmark performance on Arabic language tasks
  • Design data acquisition strategies and processing pipelines for Arabic language data, including Omani dialect-specific data
  • Lead continuous pre-training, supervised fine-tuning, RLHF, and DPO implementation processes
  • Design and implement comprehensive evaluation frameworks for LLM assessment across diverse Arabic tasks
  • Establish benchmarking methodologies aligned with international standards
  • Coordinate model submissions to relevant leaderboards to demonstrate comparative performance
  • Architect efficient tokenization approaches optimized for the Arabic language to reduce token fertility (tokens produced per word)
  • Develop RAG (Retrieval Augmented Generation) frameworks for government document search and integration
  • Consult on model optimization techniques for efficient inference, including quantization strategies and knowledge distillation
  • Implement model compression techniques to create efficient student models for deployment
  • Lead the design of model guardrails that align with Omani cultural values and governmental requirements
  • Advise on infrastructure requirements for model training, fine-tuning, and deployment
  • Collaborate with vendor teams (AWS, Cohere, and others) to implement the technical solution
  • Conduct knowledge transfer sessions to build local AI capabilities
  • Guide the development of metrics and monitoring frameworks for model performance
  • Implement and manage data annotation pipelines and quality assurance processes

Requirements:

  • Degree(s) in Computer Science, AI, Machine Learning, or a related field
  • 5+ years of experience with LLMs and deep learning, with specific expertise in transformer-based language models
  • Hands-on experience with full-cycle LLM development, including pre-training, fine-tuning, and deployment
  • Demonstrated expertise in Arabic NLP, particularly tokenization approaches and language-specific optimization
  • Extensive experience with LLM evaluation methodologies, including automatic metrics and human evaluation protocols
  • Proficiency in designing, implementing, and analyzing benchmark suites for language models
  • Experience with leaderboard submission processes and performance verification
  • Experience with RLHF (Reinforcement Learning from Human Feedback) and DPO (Direct Preference Optimization) techniques
  • Strong knowledge of model quantization techniques (INT8, INT4, GPTQ, AWQ, etc.) and their impact on model performance
  • Deep expertise in knowledge distillation methods for creating smaller, efficient student models from larger teacher models
  • Practical knowledge of efficient model inference strategies and optimization techniques
  • Deep understanding of RAG systems and information retrieval
  • Proficiency in PyTorch, TensorFlow, or JAX for LLM development
  • Strong understanding of distributed training systems and GPU optimization
  • Experience with containerization (Docker) and orchestration (Kubernetes) for ML workloads
  • Knowledge of responsible AI practices and guardrail implementation
  • Experience working with government or enterprise LLM deployments
  • Proficiency with Scale AI SGP (Synthetic Generation Platform) or similar tools for data generation and model evaluation
  • Experience with data annotation platforms and human feedback collection systems
  • Leadership skills with ability to guide technical teams and communicate with stakeholders

Preferred Qualifications:

  • Previous experience with Arabic-specific language models such as Jais
  • Experience with AWS cloud services, particularly SageMaker, HyperPod, Trainium, and other AI/ML infrastructure
  • Previous work with model customization and adaptation for specific languages or domains
  • Understanding of data privacy considerations for government applications
  • Experience implementing token efficiency strategies for non-English languages
  • Knowledge of MLOps practices for LLM lifecycle management
  • Experience with post-training optimization techniques like pruning, weight sharing, and structured sparsity
  • Demonstrated ability to communicate complex technical concepts to non-technical stakeholders
  • Experience with Scale AI's Human Feedback and AI Evaluation tools
  • Familiarity with other data annotation and synthetic data generation platforms (Snorkel, Humanloop, etc.)
  • Experience building custom evaluation harnesses for LLM performance assessment
  • Demonstrated success in optimizing models for resource-constrained environments
  • Experience with holistic LLM evaluation methodologies that assess both performance and business impact
  • Familiarity with public LLM benchmarks and Arabic-specific evaluation suites
  • Experience benchmarking models against commercial offerings (e.g., OpenAI's GPT models, Anthropic's Claude)