Principal Scientist - Computational Biologist and AI/ML Researcher
About the position
Responsibilities
• Develop and deploy ML models that deepen our understanding of complex biological systems in health and disease.
• Promote open science through publishing papers and open-source code.
• Collaborate with teams of scientists, computational biologists, and software engineers within the Allen Institute and external partners.
• Advance community standards for scalability in developing, disseminating, and evaluating AI/ML/computational methods for scientific problems.
• Lead the development of state-of-the-art engineering infrastructure at the Allen Institute to support AI/ML research and applications.
• Stay up to date with the latest advancements in AI/ML and their potential applications in biological research.
• Foster a collaborative and inclusive work environment that values diversity and encourages participation from team members with different voices, experiences, and backgrounds.
• Mentor and guide junior researchers, interns, or students working on AI/ML projects related to biological research.
• Participate in institute-wide initiatives, workshops, and seminars to promote cross-disciplinary collaboration and knowledge sharing.
Requirements
• PhD in Computer Science, Applied Mathematics, Computational Biology, Statistics, Biostatistics, or a similar field; or an equivalent combination of degree and experience.
• 11 years of equivalent experience.
• Demonstrated ability to design, implement, and apply AI/ML models for the analysis of large-scale biological data.
Nice-to-haves
• 15+ years of experience developing and applying ML methods.
• Strong publication record of innovative scientific accomplishments (both individual and team).
• Expertise in Python-based ML libraries and frameworks such as PyTorch, JAX, Pyro, NumPy, and pandas.
• Solid understanding of statistical analysis, data preprocessing, feature selection, and model evaluation techniques.
• Experience building data pipelines that make biological data ML-ready, as well as pipelines for model training and evaluation.
• Knowledge of data preprocessing, normalization, and integration techniques specific to biological and clinical datasets.
• Experience with distributed computing for ML models (e.g., distributing load across multiple nodes, Ray, HPC, Distributed PyTorch).
• Experience with data visualization and presentation of complex biological findings to both technical and non-technical audiences.
• Strong problem-solving skills and ability to develop innovative computational approaches to address complex biological questions.
• Proven ability to work independently and manage multiple projects simultaneously while meeting deadlines.
• Excellent written and verbal communication skills, with the ability to collaborate effectively in a multidisciplinary team environment.
Benefits
• Medical insurance
• Dental insurance
• Vision insurance
• Basic life insurance
• 401k plan
• Paid time off