About the Role
We are seeking an experienced AI Expert and Consultant to join our National Large Language Model (LLM) Project, which aims to replace ChatGPT usage in the workplace. As a key technical advisor, you will provide expertise across the full LLM stack, from model training and fine-tuning to deployment and RAG implementation.
Key Responsibilities:
- Provide strategic guidance and technical oversight for the development of the Arabic LLM initiative
- Evaluate and select appropriate base models (7B- and 30B-parameter models) based on benchmark performance on Arabic language tasks
- Design data acquisition strategies and processing pipelines for Arabic language data, including Omani dialect-specific data
- Lead continuous pre-training, supervised fine-tuning, and RLHF/DPO implementation processes
- Design and implement comprehensive evaluation frameworks for LLM assessment across diverse Arabic tasks
- Establish benchmarking methodologies aligned with international standards
- Coordinate model submissions to relevant leaderboards to demonstrate comparative performance
- Architect efficient tokenization approaches optimized for Arabic to reduce token fertility (average tokens per word)
- Develop RAG (Retrieval-Augmented Generation) frameworks for government document search and integration
- Consult on model optimization techniques for efficient inference, including quantization strategies and knowledge distillation
- Implement model compression techniques to create efficient student models for deployment
- Lead the design of model guardrails that align with Omani cultural values and governmental requirements
- Advise on infrastructure requirements for model training, fine-tuning, and deployment
- Collaborate with AWS/Cohere/other vendor teams to implement the technical solution
- Conduct knowledge transfer sessions to build local AI capabilities
- Guide the development of metrics and monitoring frameworks for model performance
- Implement and manage data annotation pipelines and quality assurance processes
Requirements:
- Degree(s) in Computer Science, AI, Machine Learning, or a related field
- 5+ years of experience in LLMs and deep learning, with specific expertise in transformer-based language models
- Hands-on experience with full-cycle LLM development, including pre-training, fine-tuning, and deployment
- Demonstrated expertise in Arabic NLP, particularly tokenization approaches and language-specific optimization
- Extensive experience with LLM evaluation methodologies, including automatic metrics and human evaluation protocols
- Proficiency in designing, implementing, and analyzing benchmark suites for language models
- Experience with leaderboard submission processes and performance verification
- Experience with RLHF (Reinforcement Learning from Human Feedback) and DPO (Direct Preference Optimization) techniques
- Strong knowledge of model quantization techniques (INT8, INT4, GPTQ, AWQ, etc.) and their impact on model performance
- Deep expertise in knowledge distillation methods for creating smaller, efficient student models from larger teacher models
- Practical knowledge of efficient model inference strategies and optimization techniques
- Deep understanding of RAG systems and information retrieval
- Proficiency in PyTorch, TensorFlow, or JAX for LLM development
- Strong understanding of distributed training systems and GPU optimization
- Experience with containerization (Docker) and orchestration (Kubernetes) for ML workloads
- Knowledge of responsible AI practices and guardrail implementation
- Experience working with government or enterprise LLM deployments
- Proficiency with Scale AI's SGP (Scale GenAI Platform) or similar tools for data generation and model evaluation
- Experience with data annotation platforms and human feedback collection systems
- Leadership skills with ability to guide technical teams and communicate with stakeholders
Preferred Qualifications:
- Previous experience with Arabic-specific language models such as Jais
- Experience with AWS cloud services, particularly SageMaker, HyperPod, Trainium, and other AI/ML infrastructure
- Previous work with model customization and adaptation for specific languages or domains
- Understanding of data privacy considerations for government applications
- Experience implementing token efficiency strategies for non-English languages
- Knowledge of MLOps practices for LLM lifecycle management
- Experience with post-training optimization techniques like pruning, weight sharing, and structured sparsity
- Demonstrated ability to communicate complex technical concepts to non-technical stakeholders
- Experience with Scale AI's Human Feedback and AI Evaluation tools
- Familiarity with other data annotation and synthetic data generation platforms (Snorkel, Humanloop, etc.)
- Experience building custom evaluation harnesses for LLM performance assessment
- Demonstrated success in optimizing models for resource-constrained environments
- Experience with holistic LLM evaluation methodologies that assess both performance and business impact
- Familiarity with public LLM benchmarks and Arabic-specific evaluation suites
- Experience benchmarking models against commercial offerings (OpenAI, Claude, etc.)