About the Role
We are seeking an experienced AI Expert and Consultant to join our National Large Language Model (LLM) Project, which aims to replace ChatGPT usage in the workplace. As a key technical advisor, you will provide expertise across the full LLM stack, from model training and fine-tuning to deployment and RAG implementation.
Key Responsibilities:
- Provide strategic guidance and technical oversight for the development of the Arabic LLM initiative
- Evaluate and select appropriate base models (7B- and 30B-parameter models) based on benchmark performance on Arabic language tasks
- Design data acquisition strategies and processing pipelines for Arabic language data, including Omani dialect-specific data
- Lead continuous pre-training, supervised fine-tuning, and RLHF/DPO implementation processes
- Design and implement comprehensive evaluation frameworks for LLM assessment across diverse Arabic tasks
- Establish benchmarking methodologies aligned with international standards
- Coordinate model submissions to relevant leaderboards to demonstrate comparative performance
- Architect efficient tokenization approaches optimized for Arabic to reduce token fertility (average tokens per word)
- Develop RAG (Retrieval-Augmented Generation) frameworks for government document search and integration
- Consult on model optimization techniques for efficient inference, including quantization strategies and knowledge distillation
- Implement model compression techniques to create efficient student models for deployment
- Lead the design of model guardrails that align with Omani cultural values and governmental requirements
- Advise on infrastructure requirements for model training, fine-tuning, and deployment
- Collaborate with AWS/Cohere/other vendor teams to implement the technical solution
- Conduct knowledge transfer sessions to build local AI capabilities
- Guide the development of metrics and monitoring frameworks for model performance
- Implement and manage data annotation pipelines and quality assurance processes
Requirements:
- Degree(s) in Computer Science, AI, Machine Learning, or a related field
- 5+ years of experience in LLMs and deep learning, with specific expertise in transformer-based language models
- Hands-on experience with full-cycle LLM development, including pre-training, fine-tuning, and deployment
- Demonstrated expertise in Arabic NLP, particularly tokenization approaches and language-specific optimization
- Extensive experience with LLM evaluation methodologies, including automatic metrics and human evaluation protocols
- Proficiency in designing, implementing, and analyzing benchmark suites for language models
- Experience with leaderboard submission processes and performance verification
- Experience with RLHF (Reinforcement Learning from Human Feedback) and DPO (Direct Preference Optimization) techniques
- Strong knowledge of model quantization techniques (INT8, INT4, GPTQ, AWQ, etc.) and their impact on model performance
- Deep expertise in knowledge distillation methods for creating smaller, efficient student models from larger teacher models
- Practical knowledge of efficient model inference strategies and optimization techniques
- Deep understanding of RAG systems and information retrieval
- Proficiency in PyTorch, TensorFlow, or JAX for LLM development
- Strong understanding of distributed training systems and GPU optimization
- Experience with containerization (Docker) and orchestration (Kubernetes) for ML workloads
- Knowledge of responsible AI practices and guardrail implementation
- Experience working with government or enterprise LLM deployments
- Proficiency with Scale AI's SGP (Scale GenAI Platform) or similar tools for data generation and model evaluation
- Experience with data annotation platforms and human feedback collection systems
- Leadership skills with ability to guide technical teams and communicate with stakeholders
Preferred Qualifications:
- Previous experience with Arabic-specific language models such as Jais
- Experience with AWS cloud services, particularly SageMaker, HyperPod, Trainium, and other AI/ML infrastructure
- Previous work with model customization and adaptation for specific languages or domains
- Understanding of data privacy considerations for government applications
- Experience implementing token efficiency strategies for non-English languages
- Knowledge of MLOps practices for LLM lifecycle management
- Experience with post-training optimization techniques like pruning, weight sharing, and structured sparsity
- Demonstrated ability to communicate complex technical concepts to non-technical stakeholders
- Experience with Scale AI's Human Feedback and AI Evaluation tools
- Familiarity with other data annotation and synthetic data generation platforms (Snorkel, Humanloop, etc.)
- Experience building custom evaluation harnesses for LLM performance assessment
- Demonstrated success in optimizing models for resource-constrained environments
- Experience with holistic LLM evaluation methodologies that assess both performance and business impact
- Familiarity with public LLM benchmarks and Arabic-specific evaluation suites
- Experience benchmarking models against commercial offerings (OpenAI, Claude, etc.)