Oracle Autonomous Database transforms how organizations access, analyze, and share data through integrated data lakehouse capabilities and modern data sharing protocols. By combining data warehouse and data lake technologies with secure, real-time data sharing, Autonomous Database enables organizations to break down data silos and accelerate insights across cloud platforms and organizational boundaries.
Oracle Autonomous AI Lakehouse
Multi-Cloud Data Lake Integration
Oracle Autonomous AI Lakehouse provides simple, secure multi-cloud access to all types of data, eliminating traditional boundaries between databases and data lakes.
Oracle Autonomous AI Lakehouse enables open, interoperable data access across multi-platform, multi-cloud environments. It combines Oracle Autonomous AI Database with the vendor-independent Apache Iceberg table format, enabling customers to run AI and analytics securely on all their data—available on OCI, AWS, Azure, Google Cloud, and Exadata Cloud@Customer.
Supported Cloud Object Stores:
- Oracle Cloud Infrastructure (OCI): Native object storage integration
- Amazon Web Services (AWS): S3 bucket connectivity
- Microsoft Azure: Azure Blob Storage support
- Google Cloud Platform (GCP): Google Cloud Storage access
Key Capabilities:
- Query or load data from multi-cloud object stores without data movement
- Unified SQL access across distributed data sources
- Integrated security and governance policies
- Cost-effective storage at object storage prices with database performance
Comprehensive Data Format Support
Autonomous Database supports all common data types and table formats, providing flexibility for diverse data sources and analytics workloads.
Supported File Formats:
- Parquet: Columnar storage format optimized for analytics
- ORC (Optimized Row Columnar): High-compression columnar format
- CSV (Comma-Separated Values): Traditional text-based data format
- JSON: Semi-structured hierarchical data format
- Avro: Row-based format with schema evolution support
Additional Format Support:
- Delimited text files
- Apache Iceberg tables
- Delta Lake format (via Delta Sharing)
- Proprietary formats through custom readers
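Across these formats, the same DBMS_CLOUD procedures apply; the format parameter selects the file type and per-format options. As a hedged sketch of commonly documented options (fragments only, not complete calls):

```sql
-- Parquet / ORC / Avro: the schema is read from the files themselves
format => JSON_OBJECT('type' value 'parquet')

-- CSV with a header row and a custom delimiter
format => JSON_OBJECT('type' value 'csv',
                      'delimiter' value '|',
                      'skipheaders' value '1')

-- JSON documents
format => JSON_OBJECT('type' value 'json')
```

Option names such as delimiter and skipheaders follow the documented DBMS_CLOUD format options; consult the package reference for the full list per file type.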
External Tables for Data Lake Access
Direct Object Store Querying:
External tables in Autonomous Database link to data in object storage, accessing the files in place without loading them into the database.
External Table Benefits:
- Zero Data Movement: Query data where it resides
- Rapid Data Exploration: Quickly assess data value without loading
- Transparent Access: Applications query external tables like regular tables
- Cost Efficiency: No storage costs within database for external data
External Table Types:
Standard External Tables:
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'SALES_DATA',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/.../sales/*.parquet',
    format          => JSON_OBJECT('type' value 'parquet')
  );
END;
/
Partitioned External Tables:
Partitioned external tables deliver performance benefits at query time through partition pruning, which eliminates scanning of unnecessary data files.
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'SALES_PARTITIONED',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage/.../sales/*.parquet',
    column_list     => 'sale_id NUMBER, amount NUMBER, sale_date DATE',
    format          => '{"type":"parquet", "partition_columns":["sale_date"]}'
  );
END;
/
When file_uri_list is used this way, the partition columns are derived from hive-style object names (for example, paths containing sale_date=2024-01-01), with partition_columns in the format parameter naming them.
Hybrid Partitioned Tables:
DBMS_CLOUD.CREATE_HYBRID_PART_TABLE creates a hybrid partitioned table, in which some partitions reside in Autonomous Database storage and others in object storage.
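A minimal sketch of a hybrid partitioned table follows; the credential, table, and object URIs here are hypothetical. External partitions carry an EXTERNAL LOCATION clause, while partitions without one are stored in the database:

```sql
BEGIN
  DBMS_CLOUD.CREATE_HYBRID_PART_TABLE(
    table_name      => 'SALES_HYBRID',
    credential_name => 'OBJ_STORE_CRED',
    format          => JSON_OBJECT('type' value 'csv'),
    column_list     => 'sale_id NUMBER, amount NUMBER, sale_year NUMBER',
    field_list      => 'sale_id, amount, sale_year',
    partitioning_clause =>
      'PARTITION BY RANGE (sale_year) (
         PARTITION p_archive VALUES LESS THAN (2024) EXTERNAL
           LOCATION (''https://objectstorage.../sales_2023.csv''),
         PARTITION p_current VALUES LESS THAN (2025)
       )'
  );
END;
/
```

Here p_archive reads its rows from object storage, while p_current is a regular internal partition—cold data stays at object storage prices, hot data keeps full database performance.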
Apache Iceberg Integration
Open Lakehouse Format:
Autonomous AI Lakehouse integrates with open data platforms across any cloud via Apache Iceberg, letting you query Iceberg tables in place with Oracle AI Database 26ai's built-in AI, machine learning, graph, and spatial—without data movement.
Iceberg Benefits:
- Open Standard: Vendor-independent table format
- Schema Evolution: Add, drop, and rename columns without rewriting data
- Time Travel: Query historical versions of tables
- ACID Transactions: Full transactional support for data lakes
- Hidden Partitioning: Automatic partition management
Multi-Catalog Support:
The Autonomous AI Database Data Catalog supports integration with external data sources and catalogs including:
- Oracle Cloud Infrastructure Data Catalog
- AWS Glue
- Databricks Unity Catalog
- Snowflake Polaris
- Apache Iceberg catalogs
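As a sketch, an Iceberg table registered in AWS Glue can be exposed as an external table through DBMS_CLOUD. The credential, region, and table path below are hypothetical, and the exact access_protocol keys should be verified against the current DBMS_CLOUD documentation:

```sql
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'ICEBERG_SALES',
    credential_name => 'AWS_CRED',
    format          => '{"access_protocol":
                          {"protocol_type": "iceberg",
                           "protocol_config":
                             {"iceberg_catalog_type": "aws_glue",
                              "iceberg_glue_region":  "us-west-2",
                              "iceberg_table_path":   "my_db.sales"}}}'
  );
END;
/
```

Note that no file_uri_list or column_list is given: the catalog resolves the current metadata file, and the table schema comes from the Iceberg metadata itself.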
Data Lake Accelerator
High-Performance Query Engine:
Oracle Data Lake Accelerator significantly improves query speeds on external data stored in object stores, enabling faster analysis of large datasets without data movement or workflow changes.
Accelerator Features:
- Dynamic Scaling: Automatically scales network and compute resources based on query demands
- Petabyte-Scale Scans: Handles massive datasets efficiently
- Table Caching: Fast repeat queries through intelligent caching
- Consumption-Based Billing: Pay only for resources used during query execution
Performance Benefits:
Organizations using Data Lake Accelerator experience dramatically faster query speeds on Iceberg tables, with the ability to dynamically scale compute resources on demand while keeping costs under control.
Data Sharing in Autonomous Database
Modern Data Sharing Overview
Data sharing is the process of making data accessible to one or more recipients, enabling collaboration, innovation, and insights across organizational boundaries.
Autonomous Database Data Sharing:
Making data centrally and securely available is a fundamental capability in Autonomous Database, enabling real-time collaboration without data duplication.
Key Sharing Capabilities:
- ADB can share Oracle data with non-Oracle clients through open protocols
- Real-time data exchange across Oracle Autonomous Database instances
- Bi-directional sharing (both provide and consume data)
- Support for internal and external recipients
Data Sharing Methods
1. Delta Sharing Protocol:
Oracle Autonomous Database now supports Delta Sharing, enabling secure, open data collaboration across platforms without data copies or ETL.
Delta Sharing Characteristics:
- Open Protocol: Developed by Databricks and maintained as an open standard under the Linux Foundation
- Vendor-Agnostic: Works across cloud providers and platforms
- Secure: Strong authentication and authorization
- Efficient: No data duplication or complex pipelines
- Versioned: Share specific versions of data with recipients
Delta Sharing Architecture:
- Provider creates data share and publishes to object storage
- Recipients receive activation link with credentials
- Data shared in Parquet format via REST API
- Recipients query data without direct database access
2. Live Share (Real-Time Sharing):
Live Share enables optimized, native, and security-focused data sharing between Autonomous Database instances without data duplication.
Live Share Benefits:
- Real-Time Access: Data instantly available across databases
- Native Integration: Based on Autonomous Database Cloud Link technology
- Zero Duplication: No data copying or physical movement
- Low Latency: Smooth, high-performance connection between ADB instances
- Security-Focused: Native OCI security controls
Live Share vs. Delta Sharing:
- Live Share: For ADB-to-ADB real-time sharing within OCI
- Delta Sharing: For ADB-to-external platforms (Databricks, Power BI, Tableau, etc.)
Data Sharing Components
Provider:
The entity (person, institution, or software system) that shares data objects from their Autonomous Database.
Recipient:
An entity (individual, institution, or software system) that receives and accesses shared data from a provider.
Share:
A named collection of tables or views shared as a single entity, representing logical groupings of related data.
Cloud Storage Link:
Location where shared data is stored (Object Storage bucket with credentials) for versioned Delta Sharing.
Data Share Version:
Data shares are versioned; recipients typically access only the latest version unless configured otherwise.
Activation Link:
Secure link sent to recipients enabling them to download a data share profile with authentication tokens.
Data Sharing Use Cases
Internal Collaboration:
- Sharing data between departments without duplication
- Enabling self-service analytics across business units
- Providing test data to development teams
- Supporting data science and ML model development
External Collaboration:
- Sharing data with business partners and suppliers
- Providing data to customers for analytics
- Collaborating with research institutions
- Supporting third-party applications
Multi-Cloud Analytics:
- Analyzing Oracle data in Databricks workspaces
- Visualizing ADB data in Power BI or Tableau
- Integrating ADB with Snowflake analytics
- Building cross-cloud data products
Data Monetization:
- Creating data products for external customers
- Establishing data marketplaces
- Providing subscription-based data access
- Supporting SaaS and analytics applications
Implementing Data Lake and Sharing Solutions
Setting Up Data Lake Access
1. Create Cloud Credentials:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OCI_CRED',
    username        => '<tenancy>/<username>',
    password        => '<auth_token>'
  );
END;
/
2. Create External Tables:
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'CUSTOMER_DATA',
    credential_name => 'OCI_CRED',
    file_uri_list   => 'https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/customers/*.parquet',
    format          => JSON_OBJECT('type' value 'parquet')
  );
END;
/
3. Query External Data:
SELECT
  customer_id,
  customer_name,
  total_purchases
FROM CUSTOMER_DATA
WHERE purchase_date >= DATE '2024-01-01'
ORDER BY total_purchases DESC
FETCH FIRST 10 ROWS ONLY;
Configuring Data Sharing
As a Data Provider:
1. Access Data Share Tool:
Navigate to Database Actions > Data Studio > Data Share
2. Create Share:
- Click "Provide Share"
- Select tables or views to share
- Name the share and provide description
- Choose share type (versioned Delta Share or Live Share)
3. Add Recipients:
- Specify recipient email addresses
- Set access permissions and expiration
- Generate activation links
- Recipients receive email with profile download link
4. Publish Share:
BEGIN
  DBMS_SHARE.PUBLISH_SHARE(
    share_name => 'SALES_SHARE',
    comments   => 'Q4 2024 sales data'
  );
END;
/
As a Data Recipient:
1. Receive Activation Link:
Provider sends email with secure activation link
2. Download Profile:
Click activation link to download JSON profile with credentials
3. Configure Connection:
- In Database Actions, navigate to Data Share
- Click "Consume Share"
- Import JSON profile
- Configure access parameters
4. Access Shared Data:
-- Query shared data as if it were local
SELECT * FROM SHARED_SALES_DATA
WHERE region = 'WEST'
AND sale_date >= TRUNC(SYSDATE, 'MM');
Integrating with External Platforms
Databricks Integration:
# Python code in Databricks
import delta_sharing

# Path to the Delta Sharing profile JSON file
profile_file = "/dbfs/delta_sharing/oracle_share.json"

# Create a SharingClient
client = delta_sharing.SharingClient(profile_file)

# List available shares
shares = client.list_shares()

# Access a shared table
table_url = profile_file + "#share_name.schema_name.table_name"
df = delta_sharing.load_as_pandas(table_url)
Power BI Integration:
- In Power BI, select "Get Data" > "Delta Share"
- Import JSON profile from Oracle ADB
- Select tables to visualize
- Build reports and dashboards
Tableau Integration:
- Create Delta Share connection in Tableau
- Import profile JSON
- Connect to shared tables
- Create visualizations
Best Practices
Data Lake Best Practices
Performance Optimization:
- Use partitioned external tables for large datasets
- Leverage Data Lake Accelerator for complex queries
- Cache frequently accessed data
- Use appropriate file formats (Parquet for analytics)
Security:
- Implement least-privilege access for credentials
- Rotate credentials regularly
- Use private endpoints when possible
- Implement comprehensive audit logging
Cost Management:
- Store cold data in object storage, not database
- Use compression for file formats
- Monitor Data Lake Accelerator consumption
- Clean up unused external tables
Data Sharing Best Practices
Provider Best Practices:
- Document shared datasets with metadata and descriptions
- Version shares appropriately
- Monitor recipient access patterns
- Set appropriate expiration dates for external shares
- Maintain security and governance policies
Recipient Best Practices:
- Secure profile JSON files properly
- Rotate credentials per security policies
- Monitor data freshness and versions
- Validate data quality from external sources
- Document data lineage and dependencies
Governance:
- Establish data classification standards
- Implement approval workflows for external sharing
- Regular audits of shared data and recipients
- Compliance checks for regulatory requirements
- Data quality validation processes
Advanced Scenarios
Hybrid Data Lakehouse
Combining Internal and External Data:
-- Join database table with external data lake table
SELECT
  c.customer_id,
  c.customer_name,
  e.transaction_amount,
  e.transaction_date
FROM CUSTOMERS c
INNER JOIN EXTERNAL_TRANSACTIONS e
  ON c.customer_id = e.customer_id
WHERE e.transaction_date >= ADD_MONTHS(SYSDATE, -3);
Multi-Cloud Analytics Pipeline
End-to-End Flow:
- Data ingested from various sources to multi-cloud object storage
- Autonomous Database queries data in place using external tables
- ML models built on combined data
- Results shared via Delta Sharing to analytics platforms
- Business users consume via Power BI, Tableau, or custom applications
Real-Time Collaborative Analytics
Live Share Architecture:
- Production ADB instance shares live data
- Analytics ADB instances consume via Live Share
- Data scientists access real-time data without impacting production
- Business analysts query current state for decision-making
- No data replication or synchronization delays
Conclusion
Oracle Autonomous Database's integrated data lakehouse and data sharing capabilities represent a fundamental shift in how organizations access, analyze, and collaborate with data. By combining simple, secure multi-cloud access with modern sharing protocols, Autonomous Database eliminates traditional barriers to data-driven insights.
Key Capabilities:
Data Lake Integration:
- Multi-cloud object store access (OCI, AWS, Azure, GCP)
- Comprehensive format support (Parquet, ORC, CSV, JSON, Avro)
- External tables for zero-copy querying
- Apache Iceberg integration for open lakehouse
- Data Lake Accelerator for petabyte-scale performance
Data Sharing:
- Delta Sharing for cross-platform collaboration
- Live Share for real-time ADB-to-ADB exchange
- Bi-directional sharing capabilities
- No data duplication required
- Secure, versioned, and auditable
Business Impact:
- Break Down Silos: Unified access to data across platforms
- Accelerate Insights: Query data where it lives without movement
- Enable Collaboration: Secure sharing within and across organizations
- Reduce Costs: Object storage pricing with database performance
- Future-Proof: Open standards and multi-cloud flexibility
Whether querying petabytes of data across multiple clouds, sharing real-time data between Autonomous Database instances, or collaborating with external partners through Delta Sharing, Autonomous Database provides the comprehensive platform necessary for modern data-driven organizations.
The combination of powerful data lake analytics and secure data sharing capabilities positions Oracle Autonomous Database as the foundation for enterprise data platforms that demand openness, performance, security, and collaboration without compromise.