[Full-Time, Remote] LLM Data Engineer, United States
Seeking a new challenge? This is an opportunity to grow as an LLM Data Engineer in a fully remote, United States-based role. Enjoy the freedom and flexibility of remote work. The position requires a strong and diverse skill set and offers a competitive salary.
We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE. You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions.
The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.
Responsibilities
 Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
 Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
 Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data
 Benchmark and implement various vector stores, embedding techniques, and retrieval methods
 Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
 Implement and maintain auto-tagging systems and data preparation processes for LLMs
 Develop tools for text and image data crawling, cleaning, and refinement
 Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
 Work with data lakehouse architectures to optimize data storage and processing
 Integrate and optimize workflows using Snowflake and various vector store technologies
Requirements
 Master's degree in Computer Science, Data Science, or a related field
 3-5 years of work experience in data engineering, preferably in AI/ML contexts
 Proficiency in Python, JSON, HTTP, and related tools
 Strong understanding of LLM architectures, training processes, and data requirements
 Experience with RAG systems, knowledge base construction, and vector databases
 Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
 Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
 Knowledge of data crawling techniques and associated ethical considerations
 Strong problem-solving skills and ability to work in a fast-paced, innovative environment
 Familiarity with Snowflake and its integration in AI/ML pipelines
 Experience with various vector store technologies and their applications in AI
 Understanding of data lakehouse concepts and architectures
 Excellent communication, collaboration, and problem-solving skills.
 Ability to translate business needs into technical solutions.
 Passion for innovation and a commitment to ethical AI development.
 Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, or OpenAI functions.
 Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs.
Preferred Skills
 Experience with popular LLM/RAG frameworks
 Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
 Knowledge of data versioning and experiment tracking tools
 Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
 Understanding of data privacy and security best practices
 Practical experience implementing data lakehouse solutions
 Proficiency in optimizing queries and data processes in Snowflake or Databricks
 Hands-on experience with different vector store technologies
Benefits
 US employee benefits package
Are You the One We're Looking For?
If you believe you have what it takes, submit your application without delay. We are keen to hear from talented candidates like you.