DEV Community

Cover image for Oracle Autonomous Database: Machine Learning with AutoML and Natural Language Queries with Select AI
Ryan Giggs
Ryan Giggs

Posted on

Oracle Autonomous Database: Machine Learning with AutoML and Natural Language Queries with Select AI

Oracle Autonomous Database integrates powerful machine learning capabilities and generative AI features that democratize data science and make database interactions accessible to everyone. Oracle Machine Learning AutoML automates complex machine learning workflows, while Select AI enables natural language database queries without SQL expertise.

Oracle Machine Learning for Python (OML4Py)

OML4Py Overview

Oracle Machine Learning for Python (OML4Py) is a Python API that supports the machine learning process including data exploration and preparation, machine learning modeling, and solution deployment using your Oracle AI Database or Oracle Autonomous AI Database.

Key Capabilities:

  • In-Database Machine Learning: Run algorithms directly in the database
  • Embedded Python Execution: Execute Python functions in database-spawned engines
  • Automated Machine Learning (AutoML): Automated algorithm selection, feature selection, and model tuning
  • Scalability: Leverage database parallel processing capabilities
  • Data Proximity: Process data where it resides without extraction

AutoML: Automating Machine Learning Workflows

The AutoML Value Proposition

Automated Machine Learning (AutoML) enhances user productivity and machine learning results through automated algorithm and feature selection, as well as automated model tuning and selection.

AutoML automates tedious parts of data scientists' workflow and enables non-experts to produce models, democratizing machine learning across organizations.

Benefits:

  • Enhanced Productivity: Reduces time spent on routine ML tasks
  • Non-Expert Enablement: Allows business users to build models
  • Best Practice Automation: Incorporates data science expertise automatically
  • Reduced Errors: Eliminates manual mistakes in model selection
  • Faster Time-to-Value: Accelerates ML project delivery

AutoML Workflow Stages

The AutoML pipeline consists of six essential stages that guide data from raw input to production-ready models:

1. Dataset Preparation:

  • Prepare and clean your training data
  • Create Oracle Machine Learning DataFrame proxy objects
  • Separate predictors from target variables
  • Handle missing values and data quality issues

2. Algorithm Selection:
Identify top-k algorithms most suitable for your dataset and prediction task.

Algorithm Selection Process:

from oml import automl

# Define algorithm selection object
as_object = automl.AlgorithmSelection(
    mining_function='classification',
    score_metric='accuracy',
    parallel=2
)

# Select top algorithms
algorithm_ranking = as_object.select(X_data, y_data, k=4)
Enter fullscreen mode Exit fullscreen mode

Supported Algorithms:

  • Support Vector Machines (SVM) - Gaussian and Linear
  • Neural Networks
  • Random Forest
  • Decision Trees
  • Logistic Regression
  • Naive Bayes

3. Adaptive Sampling:
Select a suitable subsample for model training to optimize performance and training time.

Sampling Benefits:

  • Reduces training time for large datasets
  • Maintains statistical significance
  • Oracle's advanced meta-learning intelligence suggests optimal sample sizes
  • Balances accuracy with computational efficiency

4. Feature Selection:
Select relevant features that contribute most significantly to prediction accuracy while reducing dimensionality.

Feature Selection Implementation:

# Define feature selection object
fs_object = automl.FeatureSelection(
    mining_function='classification',
    score_metric='accuracy',
    parallel=2
)

# Reduce features
selected_features = fs_object.reduce(
    selected_algorithm,
    X_data,
    y_data
)
Enter fullscreen mode Exit fullscreen mode

Feature Selection Benefits:

  • Reduces model training time without compromising performance
  • Uses multiple feature-ranking algorithms
  • Oracle advanced meta-learning techniques prune search space efficiently
  • Handles both classification and regression problems

5. Model Tuning:
Identify optimal hyperparameters for the selected algorithm to maximize model performance.

Model Tuning Process:

# Define model tuning object
mt_object = automl.ModelTuning(
    mining_function='classification',
    score_metric='accuracy',
    parallel=2
)

# Tune model
tuned_model = mt_object.tune(
    selected_algorithm,
    X_reduced,
    y_data
)
Enter fullscreen mode Exit fullscreen mode

Hyperparameter Optimization:

  • Automated parameter search across algorithm-specific hyperparameter space
  • Cross-validation for robust performance estimation
  • Grid search and random search strategies
  • Oracle meta-learning for intelligent parameter recommendations

6. Feature Prediction Impact:
Compute the impact of each input column on the predictions of the final tuned model.

Impact Analysis Benefits:

  • Provides insights into tuned AutoML model behavior
  • Helps business users understand model decisions
  • Identifies most influential features
  • Supports model explainability and transparency

AutoML UI: Visual Model Development

No-Code ML Development:
To use the Oracle Machine Learning AutoML UI, you start by creating an experiment. An experiment is a unit of work that minimally specifies the data source, prediction target, and prediction type.

AutoML UI Workflow:

  1. Create Experiment: Define data source, target, and prediction type
  2. Configure Settings: Set maximum models, run duration, metrics
  3. Start Experiment: Choose between "Faster Results" or "Better Accuracy"
  4. Review Results: Examine ranked models by quality metric
  5. Deploy or Generate Notebook: Deploy models or generate Python code

Experiment Configuration:

  • Maximum Top Models: Select number of top models to create (default 5)
  • Maximum Run Duration: Set time limit (default 8 hours)
  • Database Service Level: Configure query parallelism
  • Model Metric: Choose evaluation metric (accuracy, F1, precision, recall)
  • Algorithm Selection: Select specific algorithms or use all candidates

After an experiment runs successfully, it presents you a list of machine learning models in order of model quality according to the metric selected.

Deployment Options:

  • Direct Deployment: Deploy selected models for scoring
  • Notebook Generation: Generate Python code using OML4Py and specific settings AutoML used to produce the model
  • Model Comparison: Compare models across multiple metrics
  • Model Explanation: Understand feature importance and prediction drivers

AutoML API: Programmatic Model Development

OML4Py AutoML Classes:

  • automl.AlgorithmSelection: Rank algorithms by predicted score
  • automl.FeatureSelection: Identify most relevant feature subsets
  • automl.ModelTuning: Tune hyperparameters for specific algorithms
  • automl.ModelSelection: Select top algorithm and tune automatically

Model Selection Example:

from oml import automl

# Automatic model selection and tuning
ms_object = automl.ModelSelection(
    mining_function='classification',
    score_metric='f1_weighted',
    parallel=4
)

# Select and tune best model
best_model = ms_object.select(X_train, y_train)

# Score on test data
predictions = best_model.predict(X_test)
Enter fullscreen mode Exit fullscreen mode

Select AI: Natural Language Database Queries

Select AI Overview

Select AI makes it easy to get answers about your business using natural language, transforming how users interact with databases by eliminating the need for SQL expertise.

How Select AI Works:
Select AI processes questions using an AI Large Language Model (LLM), translating natural language prompts into SQL queries that the database executes automatically.

The DBMS_CLOUD_AI package enables integration with a user-specified LLM for generating SQL code using natural language prompts.

Select AI Process:

  1. User Provides Natural Language Prompt: "What were our top 5 products by revenue last quarter?"
  2. Database Augments Prompt: Adds schema metadata to provide context
  3. LLM Generates SQL: Translates augmented prompt to SQL query
  4. Database Executes Query: Runs generated SQL and returns results
  5. Natural Language Response: Optionally narrates results in natural language

Key Select AI Features

Simple and Intuitive:

  • Use plain language instead of complex SQL syntax
  • No need to understand database schema or table relationships
  • Multilingual support for global businesses
  • Voice command support for hands-free queries

Future-Enabled:

  • Integration with cutting-edge LLM technologies
  • Continuous improvement as LLMs evolve
  • Support for emerging AI capabilities
  • Regular feature enhancements and updates

Secure:

  • Respects existing database security policies
  • Access control based on user permissions
  • Data never leaves your organization
  • Compliant with data governance requirements

Easy to Extend:

  • Build new natural language processing applications
  • Integrate with existing tools and workflows
  • Develop custom AI-driven database interactions
  • API support for programmatic access

Supported LLM Providers

Select AI supports multiple Large Language Model providers, giving organizations flexibility to choose their preferred AI platform:

OCI Generative AI:

  • Oracle's native generative AI service
  • Models from Meta (Llama) and Cohere hosted on OCI
  • Strong data governance and security
  • Custom models trained on customer data
  • Region-specific deployment options

OpenAI:

  • GPT-3.5 and GPT-4 models
  • Advanced natural language understanding
  • Broad language support
  • Regular model updates and improvements

Cohere:

  • Enterprise-focused LLMs
  • Specialized text generation and understanding
  • Multilingual capabilities
  • Privacy-preserving features

Azure OpenAI:

  • Microsoft's enterprise OpenAI service
  • GPT models with Azure integration
  • Enterprise security and compliance
  • Hybrid cloud deployment options

Select AI Actions

Natural Language to SQL:
Generate and execute SQL queries from natural language prompts.

SELECT AI what were the top 5 selling products last month;
Enter fullscreen mode Exit fullscreen mode

SHOWSQL - View Generated Query:
Display the SQL query generated from natural language without executing it.

SELECT AI SHOWSQL what are the top 5 products by revenue;
Enter fullscreen mode Exit fullscreen mode

NARRATE - Natural Language Results:
Execute query and return results as natural language description instead of tabular data.

SELECT AI NARRATE what were our total sales by region last quarter;
Enter fullscreen mode Exit fullscreen mode

CHAT - General AI Conversation:
Submit general requests and questions to the LLM, with optional context from vector stores.

SELECT AI CHAT explain the concept of customer lifetime value;
Enter fullscreen mode Exit fullscreen mode

Retrieval Augmented Generation (RAG)

Select AI with RAG:
Select AI with RAG augments your natural language prompt by retrieving content from your specified vector store using semantic similarity search. This reduces hallucinations by using your specific and up-to-date content and provides more relevant natural language responses.

RAG Benefits:

  • Reduced Hallucinations: LLM uses your actual data and documents
  • Current Information: Access to latest business content
  • Improved Accuracy: Context-specific responses
  • Domain Expertise: Incorporate organizational knowledge

AI Vector Search:
AI Vector Search enables similarity queries on text, documents, images, and other unstructured data represented as vectors, powering RAG implementations.

Select AI Profile Configuration

Creating an AI Profile:

BEGIN
    DBMS_CLOUD_AI.CREATE_PROFILE(
        profile_name => 'my_ai_profile',
        attributes => JSON_OBJECT(
            'provider' VALUE 'oci',
            'credential_name' VALUE 'OCI_CRED',
            'object_list' VALUE JSON_ARRAY(
                JSON_OBJECT(
                    'owner' VALUE 'ADMIN',
                    'name' VALUE 'CUSTOMERS'
                ),
                JSON_OBJECT(
                    'owner' VALUE 'ADMIN',
                    'name' VALUE 'ORDERS'
                )
            )
        )
    );
END;
/
Enter fullscreen mode Exit fullscreen mode

Profile Components:

  • Provider: LLM provider (OCI, OpenAI, Cohere, Azure)
  • Credentials: Authentication for LLM access
  • Object List: Schemas, tables, and views available for queries
  • Settings: Model-specific parameters and configurations

Select AI Use Cases

Business Intelligence:

  • Sales performance analysis
  • Customer behavior insights
  • Market trend identification
  • Financial reporting and analysis

Data Exploration:

  • Ad-hoc data analysis by business users
  • Quick insights without waiting for IT
  • Exploratory data analysis for new datasets
  • Validation of business hypotheses

Application Development:

  • Build conversational AI applications
  • Integrate natural language interfaces into existing apps
  • Develop chatbots with database access
  • Enable voice-activated database queries

Citizen Data Science:

  • Empower non-technical users to analyze data
  • Self-service analytics for departments
  • Reduce IT bottlenecks for simple queries
  • Democratize data access across organization

Integration: AutoML and Select AI

Complementary Capabilities

AutoML for Model Building:
Use AutoML to build predictive models from your data, automating the machine learning workflow.

Select AI for Model Insights:
Use Select AI to query model predictions and understand model behavior through natural language.

Combined Workflow:

  1. Use AutoML to build customer churn prediction model
  2. Deploy model in database
  3. Use Select AI to query: "Which customers are most likely to churn next month?"
  4. Get natural language explanation of predictions and contributing factors

End-to-End AI/ML Platform

Oracle Autonomous Database provides:

  • Integrated machine learning platform (OML4Py, OML4R, OML4SQL)
  • Automated model development (AutoML UI and API)
  • Natural language data access (Select AI)
  • Vector database capabilities (AI Vector Search)
  • Generative AI integration (LLM providers)
  • Scalable in-database processing

Best Practices

AutoML Best Practices

Data Preparation:

  • Clean and validate data before AutoML
  • Handle missing values appropriately
  • Ensure sufficient training data
  • Balance classes for classification problems

Experiment Configuration:

  • Start with "Faster Results" for quick validation
  • Use "Better Accuracy" for production models
  • Set reasonable time limits based on dataset size
  • Monitor resource consumption during training

Model Evaluation:

  • Evaluate multiple top models, not just the best
  • Test models on holdout data
  • Consider business metrics alongside statistical metrics
  • Validate model explainability and fairness

Select AI Best Practices

Profile Configuration:

  • Include only relevant schemas and tables in profiles
  • Use descriptive table and column comments
  • Maintain accurate database metadata
  • Create separate profiles for different use cases

Prompt Engineering:

  • Be specific and clear in natural language prompts
  • Provide context when needed
  • Start simple and refine based on results
  • Use SHOWSQL to understand generated queries

Security and Governance:

  • Leverage existing database security policies
  • Monitor LLM usage and costs
  • Review generated SQL for sensitive operations
  • Implement appropriate access controls

Quality Assurance:

  • Validate generated SQL queries
  • Review LLM responses for accuracy
  • Test with various prompt formulations
  • Implement guardrails for production use

Conclusion

Oracle Autonomous Database's integration of Oracle Machine Learning AutoML and Select AI represents a significant advancement in making advanced analytics accessible to all users. AutoML automates tedious parts of data scientists' workflow and enables non-experts to produce models, while Select AI makes it easy to get answers about your business using natural language.

Key Capabilities:

AutoML:

  • Six-stage automated ML pipeline
  • Algorithm selection, feature selection, and model tuning
  • Visual AutoML UI for no-code development
  • Programmatic API for advanced users
  • Embedded Python execution for scalability

Select AI:

  • Natural language to SQL generation
  • Multiple LLM provider support (OCI, OpenAI, Cohere, Azure)
  • RAG for improved accuracy with organizational data
  • Simple, secure, and easily extensible
  • Voice and multilingual support

Business Impact:

  • Democratize Data Science: Enable business users to build ML models
  • Accelerate Insights: Get answers in natural language instantly
  • Reduce IT Bottlenecks: Self-service analytics and model building
  • Improve Decision Making: Data-driven insights accessible to all
  • Enhance Productivity: Automate routine ML and query tasks

Whether you're a data scientist looking to accelerate model development with AutoML, or a business user seeking insights through natural language queries with Select AI, Oracle Autonomous Database provides the integrated platform and automation necessary for modern AI/ML-driven applications.

Top comments (0)