
Ahmed Moussa

Originally published at api.aaido.dev

DataPulse API Tutorial: Structured Web Data Extraction and Analysis

DataPulse is a REST API that extracts data from web pages and returns it as structured, machine-readable JSON. This tutorial covers the core extraction endpoint, worked request examples, error handling, and integration patterns for CI/CD pipelines and containers.

Getting Started

Before making API calls, you'll need to create an account and obtain an API key:

curl -X POST https://api.aaido.dev/signup \
  -H "Content-Type: application/json" \
  -d '{
    "email": "developer@company.com",
    "password": "your-secure-password"
  }'

Once registered, you'll receive an API key for authentication. Every DataPulse request must send this key as a Bearer token in the Authorization header.
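
One way to keep the key out of scripts and shell history is to read it from an environment variable and fail early when it is missing. The sketch below assumes a variable named DATAPULSE_API_KEY (the name this tutorial's GitHub Actions example uses for its secret) and a hypothetical helper called auth_header; neither is required by the API.

```shell
# Hypothetical helper: print the Authorization header line, or fail
# with a clear message when the key has not been exported.
auth_header() {
  if [ -z "${DATAPULSE_API_KEY:-}" ]; then
    echo "DATAPULSE_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$DATAPULSE_API_KEY"
}
```

A request can then pass the header with -H "$(auth_header)" instead of embedding the key in the command.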

Basic Data Extraction

The primary endpoint accepts a target URL and optional extraction parameters. Here's a basic extraction request:

curl -X POST https://api.aaido.dev/v1/products/datapulse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-ecommerce.com/product/wireless-headphones",
    "extract": ["title", "price", "description", "images"]
  }'

The API returns structured JSON data:

{
  "status": "success",
  "data": {
    "title": "Premium Wireless Headphones",
    "price": "$299.99",
    "description": "High-quality noise-canceling headphones with 30-hour battery life",
    "images": [
      "https://example-ecommerce.com/images/headphones-1.jpg",
      "https://example-ecommerce.com/images/headphones-2.jpg"
    ],
    "extracted_at": "2024-01-15T10:30:00Z",
    "source_url": "https://example-ecommerce.com/product/wireless-headphones"
  },
  "processing_time": 1.2
}
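
To use a response like this in a script, the fields can be pulled out with jq. The following is a minimal sketch, assuming the response shape shown above; summarize_response is a hypothetical helper name, not part of DataPulse.

```shell
# Read a DataPulse response on stdin and print "title<TAB>price".
# With -e, jq exits non-zero when nothing is printed, which happens
# here whenever "status" is not "success".
summarize_response() {
  jq -er 'select(.status == "success") | [.data.title, .data.price] | @tsv'
}
```

Piping the curl output into summarize_response gives one tab-separated line per product, and the non-zero exit status makes failed extractions easy to detect in a pipeline.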

Advanced Extraction with Selectors

For precise targeting, pass CSS selectors or XPath expressions in a selectors object. The example below uses CSS selectors:

curl -X POST https://api.aaido.dev/v1/products/datapulse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news-site.com/article/tech-trends-2024",
    "selectors": {
      "headline": "h1.article-title",
      "author": ".author-name",
      "publish_date": "time[datetime]",
      "content": ".article-body p",
      "tags": ".tag-list a"
    },
    "options": {
      "wait_for": ".article-body",
      "timeout": 10000
    }
  }'

The response mirrors the selector keys, returning an array where a selector matches several elements:

{
  "status": "success",
  "data": {
    "headline": "Top Technology Trends Shaping 2024",
    "author": "Sarah Chen",
    "publish_date": "2024-01-15T08:00:00Z",
    "content": [
      "Artificial intelligence continues to transform industries...",
      "Cloud computing adoption reaches new heights..."
    ],
    "tags": ["AI", "Cloud Computing", "Machine Learning", "Tech Trends"]
  },
  "processing_time": 2.1
}

Use Case 1: E-commerce Price Monitoring

Monitor competitor pricing across multiple retailers:

curl -X POST https://api.aaido.dev/v1/products/datapulse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://retailer.com/product/smartphone-xyz",
    "extract": ["title", "price", "availability", "rating"],
    "options": {
      "format_price": true,
      "extract_numbers": true
    }
  }'

The format_price option normalizes price formats, while extract_numbers converts ratings to numeric values for easier analysis.
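
A monitoring job usually compares each new price with the last one it saw. The helpers below are a minimal sketch of that comparison in plain shell. The names normalize_price and price_change_pct are hypothetical, and the code assumes prices use a dot as the decimal separator, as in "$299.99". If format_price already returns a bare number, normalize_price leaves it unchanged.

```shell
# Strip currency symbols and thousands separators: "$1,299.99" -> "1299.99".
# Assumes a dot decimal separator.
normalize_price() {
  printf '%s' "$1" | tr -cd '0-9.'
}

# Percentage change from previous price $1 to current price $2,
# printed with one decimal place. Fails if the previous price is zero.
price_change_pct() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    if (prev == 0) exit 1
    printf "%.1f", (cur - prev) / prev * 100
  }'
}
```

For example, price_change_pct 200 150 prints -25.0, which a script could compare against an alert threshold.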

Use Case 2: Content Aggregation for News Monitoring

Extract article metadata and content for media monitoring:

curl -X POST https://api.aaido.dev/v1/products/datapulse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://tech-blog.com/ai-breakthrough-announcement",
    "extract": ["title", "content", "author", "publish_date"],
    "options": {
      "clean_text": true,
      "extract_keywords": true,
      "sentiment_analysis": true
    }
  }'

This configuration cleans extracted text, identifies key terms, and provides sentiment scoring for content analysis workflows.

Use Case 3: Real Estate Data Collection

Aggregate property listings for market analysis:

curl -X POST https://api.aaido.dev/v1/products/datapulse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://realty-site.com/listing/downtown-condo",
    "selectors": {
      "price": ".listing-price",
      "bedrooms": ".bed-count",
      "bathrooms": ".bath-count",
      "square_feet": ".sqft",
      "address": ".property-address",
      "amenities": ".amenity-list li"
    },
    "options": {
      "extract_numbers": true,
      "geocode_address": true
    }
  }'

Error Handling

The API returns standardized error responses:

{
  "status": "error",
  "error": {
    "code": "EXTRACTION_FAILED",
    "message": "Unable to access the specified URL",
    "details": "Connection timeout after 10 seconds"
  }
}

Common error codes include:

  • INVALID_URL: Malformed or inaccessible URL
  • EXTRACTION_FAILED: Content extraction unsuccessful
  • RATE_LIMIT_EXCEEDED: API quota exceeded
  • AUTHENTICATION_ERROR: Invalid or missing API key
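
A client can branch on these codes to decide whether a retry makes sense. The sketch below shows one possible policy. Treating RATE_LIMIT_EXCEEDED and EXTRACTION_FAILED as retryable is an assumption on my part, not documented API behavior, and is_retryable is a hypothetical helper name.

```shell
# Return 0 when the error code suggests a later attempt might succeed.
# The policy encoded here is an assumption, not documented behavior.
is_retryable() {
  case "$1" in
    RATE_LIMIT_EXCEEDED|EXTRACTION_FAILED) return 0 ;;  # transient: may succeed later
    INVALID_URL|AUTHENTICATION_ERROR)      return 1 ;;  # fix the request first
    *)                                     return 1 ;;  # unknown code: do not retry
  esac
}
```

With the error format shown above, the code can be read with jq -r '.error.code' and passed to is_retryable before scheduling another attempt.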

CI/CD Integration Example

Integrate DataPulse into automated workflows using GitHub Actions:

name: Daily Price Monitoring
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM UTC

jobs:
  price-check:
    runs-on: ubuntu-latest
    steps:
      - name: Extract competitor pricing
        id: price-check
        run: |
          response=$(curl -s -X POST https://api.aaido.dev/v1/products/datapulse \
            -H "Authorization: Bearer ${{ secrets.DATAPULSE_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{
              "url": "${{ vars.COMPETITOR_URL }}",
              "extract": ["price", "availability"],
              "options": {"format_price": true}
            }')

          price=$(echo "$response" | jq -r '.data.price')
          echo "Current competitor price: $price"

          # Store in database or trigger alerts based on price changes
          if [ -n "$price" ] && [ "$price" != "null" ]; then
            echo "price=$price" >> "$GITHUB_OUTPUT"
          fi

      - name: Update pricing database
        if: steps.price-check.outputs.price
        run: |
          # Your database update logic here
          echo "Updating database with new price data"

For Docker environments, package a monitoring script (price-monitor.sh here) in a small Alpine image:

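FROM alpine:latest
# Alpine ships BusyBox sh only; add bash to this list if the script needs it
RUN apk add --no-cache curl jq
COPY price-monitor.sh /usr/local/bin/
# Make sure the script is executable regardless of its mode in the build context
RUN chmod +x /usr/local/bin/price-monitor.sh
CMD ["/usr/local/bin/price-monitor.sh"]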

Rate Limits and Best Practices

The API implements rate limiting to ensure service stability:

  • Free tier: 100 requests per hour
  • Pro tier: 1,000 requests per hour
  • Enterprise: Custom limits
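
These quotas translate into a minimum spacing between requests for a job that runs at a steady rate. The helper below is a small sketch of that arithmetic. min_interval_seconds is a hypothetical name, and the calculation assumes the quota is spread evenly over the hour, which the API may or may not enforce.

```shell
# Whole seconds to wait between requests so a steady stream stays
# within a per-hour quota. Rounds up to stay on the safe side.
min_interval_seconds() {
  limit=$1
  echo $(( (3600 + limit - 1) / limit ))
}
```

For the free tier, min_interval_seconds 100 prints 36. For the Pro tier, min_interval_seconds 1000 prints 4 (3.6 rounded up).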

Implement exponential backoff for production applications:

# Retry logic with exponential backoff (uses bash-specific syntax; run with bash)
retry_count=0
max_retries=3

while [ $retry_count -lt $max_retries ]; do
  response=$(curl -s -w "%{http_code}" -X POST https://api.aaido.dev/v1/products/datapulse \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$request_body")

  http_code="${response: -3}"
  if [ "$http_code" = "200" ]; then
    echo "${response%???}"  # Remove HTTP code from response
    break
  elif [ "$http_code" = "429" ]; then
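    sleep $((2 ** retry_count))  # waits 1s, 2s, then 4s
    # Plain assignment: ((retry_count++)) returns non-zero when the old
    # value is 0, which would abort the script under `set -e`
    retry_count=$((retry_count + 1))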
  else
    echo "Request failed with HTTP $http_code"
    break
  fi
done

Performance Optimization

For high-volume extraction, batch similar requests and cache results when appropriate. The API supports concurrent requests, but respect rate limits to avoid throttling.
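
Caching can be as simple as storing each response in a file keyed by the target URL. The following is a minimal sketch under several assumptions: the names CACHE_DIR, CACHE_TTL_MIN, cache_file, cache_get and cache_put are all hypothetical, the key is a CRC from cksum (so collisions are possible in principle), and find must support -mmin, as the GNU, BSD and BusyBox versions do.

```shell
# Hypothetical file cache keyed by a checksum of the target URL.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/datapulse-cache}"
CACHE_TTL_MIN="${CACHE_TTL_MIN:-60}"   # entries older than this are ignored

# Print the cache file path for URL $1.
cache_file() {
  # cksum prints "<crc> <size>" for stdin; keep only the CRC.
  key=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  printf '%s/%s.json\n' "$CACHE_DIR" "$key"
}

# Print the cached response for URL $1; fail if missing or stale.
cache_get() {
  file=$(cache_file "$1")
  [ -f "$file" ] || return 1
  # find prints the path only when the file was modified within the TTL.
  [ -n "$(find "$file" -mmin "-$CACHE_TTL_MIN" 2>/dev/null)" ] || return 1
  cat "$file"
}

# Store a response body read from stdin under URL $1.
cache_put() {
  mkdir -p "$CACHE_DIR"
  cat > "$(cache_file "$1")"
}
```

A caller would try cache_get first and only issue the curl request, piping its output through cache_put, when the lookup fails.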

Consider using webhooks for long-running extractions by including a callback_url parameter in your request payload.
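
A payload with a callback might be built as follows. This is only a sketch: the tutorial names the callback_url parameter but not its position, so placing it at the top level of the payload is an assumption. build_async_body and the webhook address in the usage note are hypothetical.

```shell
# Build a request body that asks for results to be sent to a webhook.
# jq -n --arg escapes both values safely as JSON strings.
# Placing callback_url at the top level of the payload is an assumption.
build_async_body() {
  jq -n --arg url "$1" --arg callback "$2" \
    '{url: $url, extract: ["title", "price"], callback_url: $callback}'
}
```

The output of build_async_body "https://retailer.com/product/smartphone-xyz" "https://hooks.example.com/datapulse" can be passed to curl with -d in place of a hand-written JSON string.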

DataPulse transforms web scraping from a complex, maintenance-heavy process into a simple API call. Whether you're building price comparison tools, content aggregators, or market research applications, the structured data output integrates seamlessly into modern development workflows.

For complete API documentation and advanced features, visit: https://api.aaido.dev/products/datapulse
