AI/ML Engineer

Full-Time

India, Remote

1 Opening

About the Role

LakeFusion is seeking an experienced AI/ML Engineer to advance the intelligence behind our Master Data Management platform, built natively on the Databricks Data Intelligence Platform. In this role, you will design and optimize LLM-driven entity resolution systems that improve match accuracy, explainability, and performance across enterprise data environments.

You will take a hands-on role in developing prompt engineering strategies, refining Retrieval-Augmented Generation (RAG) architectures, and implementing evaluation frameworks that enhance how data is matched, understood, and trusted. This includes improving precision and recall, reducing bias, and ensuring scalable, cost-efficient model performance in production.

Working closely with product managers, data engineers, and data stewards, you will translate complex business requirements into robust AI/ML solutions and build user-facing tools that provide transparency and control over matching decisions. You will also monitor model performance, identify drift, and continuously iterate to improve outcomes.

This is a highly self-directed role suited to someone who thrives in a fast-paced startup environment, where solving complex AI challenges and building production-grade systems are central to success.

What You’ll Do

  • Lead the design, development, and optimization of prompt engineering strategies for LakeFusion's LLM-based entity matching to improve accuracy, reduce bias, and enhance interpretability.
  • Drive the continuous improvement of our Retrieval-Augmented Generation (RAG) architecture, refining the interplay between Vector Search candidate generation and LLM evaluation for superior match results.
  • Iterate on LakeFusion's entity resolution process, exploring novel approaches to enhance match performance (precision, recall, F1-score) and operational efficiency (speed, flexibility, cost).
  • Investigate and implement advanced LLM evaluation strategies, including multi-stage processing that uses smaller, less powerful models where appropriate to balance performance, cost, and output quality.
  • Contribute to the design and development of production-grade, business-user-facing data science tools and workflows that provide transparency and control over AI matching.
  • Collaborate closely with product managers, data engineers, and data stewards to translate complex business requirements into robust, scalable AI/ML solutions.
  • Monitor and analyze AI model performance using telemetry from AI Gateway Inference Tables and custom logs, identifying opportunities for continuous improvement and drift mitigation.

What We're Looking For

  • 5+ years of hands-on experience as an ML Engineer, Data Scientist, or in a similar role, specifically building and deploying machine learning solutions in a production environment.
  • Deep expertise in Entity Resolution and Master Data Management (MDM), understanding the nuances of data matching, deduplication, and survivorship.
  • Extensive practical experience with Generative AI (GenAI) concepts, Large Language Models (LLMs), Vector Search, and Retrieval-Augmented Generation (RAG) architectures.
  • Strong proficiency in Python and its ecosystem for data science and machine learning (e.g., PyTorch, TensorFlow, scikit-learn).
  • Demonstrated ability to deploy, manage, and optimize modern AI/ML models in production, with a focus on latency, throughput, and cost.
  • Proven track record of building production-grade data science tools or applications that directly enable business users to interact with and leverage AI/ML insights.
  • Solid foundation in machine learning fundamentals, including experience with diverse model types and strong statistical analysis skills.
  • Experience working with the Databricks platform (e.g., Delta Lake, MLflow, Databricks SQL) is highly desirable.
  • Excellent problem-solving skills and the ability to debug complex AI systems, understanding the interplay between data, models, and prompts.
  • Strong communication skills, capable of articulating complex technical concepts to both engineering and non-technical stakeholders.

Nice-to-Have

  • Experience with MLOps practices, including CI/CD for ML pipelines.
  • Knowledge of distributed computing frameworks beyond Databricks.
  • Experience with other MDM platforms or enterprise data quality tools.
  • Familiarity with cloud platforms (AWS, Azure) for AI/ML deployments.

About LakeFusion

LakeFusion is the modern Master Data Management (MDM) company. Global enterprises across industries ranging from retail to manufacturing and financial services rely on the LakeFusion platform to unify, govern, and deliver trusted data entities such as customers, products, suppliers, and employees. Built natively on the Databricks Lakehouse, LakeFusion creates a single source of truth that powers analytics and AI. LakeFusion enables organizations worldwide to accelerate innovation with trusted and governed data.

Join us

Help build the future of master data

Join a Databricks-native team building the trusted data foundation powering AI-ready enterprises.
