In the fast-paced world of artificial intelligence, building and maintaining an AI stack is no small task. The decisions we make today directly affect our ability to innovate, scale, and ship trustworthy AI products. Like any competent craftsperson, we rely on dependable tools. This post walks through our production AI stack, from data ingestion to model deployment and beyond.
Guiding Principles
Before diving into the details of specific tools, it helps to understand the philosophy that guides our choices. When evaluating a new technology for the stack, we apply several principles:
Developer Experience Matters: We believe the best platforms to build on are the ones our developers enthusiastically use. A positive developer experience leads to higher productivity, better code quality, and ultimately more innovative products.
Scalability and Reliability Are Non-Negotiable: Our systems must handle high volumes of data and traffic. We favor battle-tested tools that can scale horizontally as user demand grows.
Open Standards and Interoperability: We favor open-source technologies and open standards that give us flexibility and help us avoid vendor lock-in. Our stack is a set of best-in-class tools that work together in harmony.
Embrace MLOps: MLOps principles are at the center of our approach. We aim to automate every stage of the machine-learning lifecycle, from data preparation through model monitoring.
Our Production AI Stack
The stack divides into five major layers, each with its own tools and best practices.
1. The Data Layer: The Foundation of Intelligence
Any AI system is only as good as its data, and the data layer is responsible for ingesting, storing, and processing large volumes of data from heterogeneous sources. The key technologies are:
Data Ingestion: Apache Kafka handles real-time data streaming, while Fivetran handles batch ingestion from numerous sources. Together they let us build scalable, robust data pipelines (see the producer sketch after this list).
Data Storage: Our data lake lives in Amazon S3, chosen for its scalability and reliability. Snowflake serves as our cloud data warehouse, letting us store and query large volumes of data with ease. Vector databases like Pinecone and Weaviate round out the layer, storing and retrieving embeddings for Retrieval-Augmented Generation (RAG) applications.
Data Processing: Apache Spark handles large-scale data processing and transformation, while ksqlDB on top of Kafka covers real-time processing.
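To make the ingestion side concrete, here is a minimal sketch of publishing a JSON event to Kafka using the kafka-python client. The broker address, topic name, and event payload are illustrative assumptions, not our actual configuration.

```python
# Minimal sketch: publishing a JSON event to Kafka with kafka-python.
# The broker address, topic name, and payload are illustrative assumptions.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for full replication before acknowledging
)

event = {
    "user_id": "u-123",  # hypothetical payload fields
    "action": "page_view",
    "ts": datetime.now(timezone.utc).isoformat(),
}

producer.send("clickstream-events", value=event)  # hypothetical topic
producer.flush()  # block until the event is actually delivered
```

A consumer on the other side subscribes to the same topic and hands events off to the processing layer.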
2. The Modeling Layer: Where the Magic Happens
This is where data scientists and machine-learning engineers build, train, and tune models. They get a flexible, powerful toolset that encourages rapid experimentation and iteration.
Development Environment: Jupyter Notebooks are our environment of choice for modeling, prototyping, and interactive data exploration. For more involved projects, we use VS Code with a collection of extensions tailored to Python and AI development.
Machine Learning Frameworks: PyTorch is our most popular framework, prized for its flexibility and strong community. Other models are built with TensorFlow and XGBoost, and the Hugging Face Transformers library is essential for NLP and computer-vision tasks.
Experiment Tracking: MLflow tracks our experiments, including code versions, data versions, and model performance, which supports reproducibility and collaboration (a minimal logging sketch follows below).
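To illustrate what that tracking looks like in practice, here is a minimal MLflow logging sketch. The experiment name, hyperparameters, metric values, and tag are hypothetical.

```python
# Minimal sketch: logging an experiment run with MLflow.
# Experiment name, hyperparameters, metrics, and tags are hypothetical.
import mlflow

mlflow.set_experiment("churn-model")  # assumed experiment name

with mlflow.start_run(run_name="baseline"):
    # Record the configuration that produced this run.
    mlflow.log_params({"lr": 1e-3, "epochs": 10, "model": "xgboost"})

    for epoch in range(10):
        val_loss = 1.0 / (epoch + 1)  # placeholder for a real training loop
        mlflow.log_metric("val_loss", val_loss, step=epoch)

    # Tag the run so teammates can trace it back to the exact code version.
    mlflow.set_tag("git_commit", "abc123")  # hypothetical commit hash
```

Runs logged this way show up in the MLflow UI, where they can be compared, reproduced, and promoted.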
3. The Deployment Layer: Bringing Models to Life
Once a model has been trained and validated, it has to be deployed to production. The deployment layer must be robust, automated, and scalable.
Containerization: Docker packages models and their dependencies into portable containers.
Orchestration: Kubernetes forms the basis of our deployment strategy, letting us manage containerized applications at scale with high availability.
Model Serving: TorchServe and TensorFlow Serving handle high-performance model serving. For models that need to be exposed via an API, we use FastAPI for its speed and simplicity (sketched below).
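As an illustration, here is a minimal FastAPI serving sketch. The artifact path, input schema, and endpoint are illustrative assumptions; it presumes a scikit-learn-style model saved with joblib.

```python
# Minimal sketch: exposing a model behind a FastAPI endpoint.
# The artifact path and input schema are illustrative assumptions.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="model-service")

model = joblib.load("model.joblib")  # hypothetical model artifact


class Features(BaseModel):
    values: list[float]  # hypothetical input schema


@app.post("/predict")
def predict(features: Features) -> dict:
    # Assumes a scikit-learn-style predict() on a single feature vector.
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}
```

In a container, this runs under an ASGI server such as uvicorn, with Kubernetes handling replicas and rollouts.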
4. The MLOps and Governance Layer
Building and deploying a model is only half the work. A dedicated set of tools and practices handles monitoring, governing, and managing models in production.
CI/CD: Jenkins and GitLab CI power our automated build, test, and deploy pipelines.
Model Monitoring: Prometheus and Grafana handle performance monitoring (a minimal instrumentation sketch follows this list), while tools like Evidently AI detect data drift and model decay.
Data and Model Versioning: Data Version Control (DVC) lets us version data and models so we can revert to previous states when needed.
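To show what the Prometheus side of monitoring might look like, here is a sketch of a serving process exposing inference metrics for Prometheus to scrape (Grafana can then chart them). The metric names, port, and dummy inference are illustrative assumptions.

```python
# Minimal sketch: exposing inference metrics for Prometheus to scrape.
# Metric names, the port, and the dummy inference are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("predictions_total", "Number of predictions served")
LATENCY = Histogram("prediction_latency_seconds", "Inference latency")


def predict(features):
    with LATENCY.time():  # records how long each inference takes
        PREDICTIONS.inc()
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real inference
        return 0.5


if __name__ == "__main__":
    start_http_server(8000)  # metrics served at /metrics on this port
    while True:
        predict([1.0, 2.0])
```

Grafana dashboards and alert rules are then built on top of these series, and drift checks from a tool like Evidently AI can run as scheduled jobs alongside them.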
5. The Frontend and Application Layer
The top layer of the stack is the one users actually interact with. We use modern web technologies to create intuitive, interactive experiences.
Frontend Framework: React and Next.js are our choices for fast, interactive user interfaces.
Backend Framework: Node.js and Python (FastAPI or Django) power our backend services.
UI Components: Shadcn/ui and Tailwind CSS help us build beautiful, responsive UIs.
Looking Ahead
The AI world is constantly changing, and so is our stack. We keep a close eye on new tools and technologies that can raise the quality of our products. Our current areas of exploration are:
LLM Ops: As applications built on large language models (LLMs) spread, we are investing in tools and platforms for prompt engineering, LLM evaluation, and cost control.
Edge AI: We are exploring running models on edge devices to reduce latency and enhance user privacy.
Explainable AI (XAI): Building fair and transparent AI is a fundamental commitment, so we are investing in model explainability and bias-detection methods.
We hope this walkthrough of our AI stack is helpful. What does the AI stack look like at your organization? Which tools and technologies are you most excited about? Let us know in the comments below.