DataPulse is a REST API that extracts and structures data from web pages, transforming unstructured HTML content into machine-readable JSON format. This tutorial covers the core functionality, practical implementation examples, and integration patterns for modern development workflows.
Getting Started
Before making API calls, you'll need to create an account and obtain an API key:
curl -X POST https://api.aaido.dev/signup \
-H "Content-Type: application/json" \
-d '{
"email": "developer@company.com",
"password": "your-secure-password"
}'
Once registered, you'll receive an API key. Every DataPulse request must send this key as a Bearer token in the Authorization header.
Basic Data Extraction
The primary endpoint accepts a target URL and optional extraction parameters. Here's a basic extraction request:
curl -X POST https://api.aaido.dev/v1/products/datapulse \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example-ecommerce.com/product/wireless-headphones",
"extract": ["title", "price", "description", "images"]
}'
The API returns structured JSON data:
{
"status": "success",
"data": {
"title": "Premium Wireless Headphones",
"price": "$299.99",
"description": "High-quality noise-canceling headphones with 30-hour battery life",
"images": [
"https://example-ecommerce.com/images/headphones-1.jpg",
"https://example-ecommerce.com/images/headphones-2.jpg"
],
"extracted_at": "2024-01-15T10:30:00Z",
"source_url": "https://example-ecommerce.com/product/wireless-headphones"
},
"processing_time": 1.2
}
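The same call can be sketched in Python with only the standard library. The endpoint, headers, and payload fields are taken from the curl example above; the helper names (`build_request`, `parse_response`, `extract`) are illustrative and not part of any official SDK.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.aaido.dev/v1/products/datapulse"


def build_request(api_key, url, fields):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": url, "extract": fields}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_response(raw):
    """Return the "data" object, or raise if "status" is not "success"."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(f"extraction failed: {payload.get('error')}")
    return payload["data"]


def extract(api_key, url, fields, timeout=30):
    """Send the request and return the extracted fields (performs network I/O)."""
    request = build_request(api_key, url, fields)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Keeping request construction and response parsing separate from the network call makes both easy to unit test without hitting the API.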
Advanced Extraction with Selectors
For precise targeting, pass CSS selectors or XPath expressions in a selectors object that maps each output field to a selector. This example uses CSS selectors:
curl -X POST https://api.aaido.dev/v1/products/datapulse \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news-site.com/article/tech-trends-2024",
"selectors": {
"headline": "h1.article-title",
"author": ".author-name",
"publish_date": "time[datetime]",
"content": ".article-body p",
"tags": ".tag-list a"
},
"options": {
"wait_for": ".article-body",
"timeout": 10000
}
}'
The response keys mirror the selector names, and selectors that match several elements (content, tags) come back as arrays:
{
"status": "success",
"data": {
"headline": "Top Technology Trends Shaping 2024",
"author": "Sarah Chen",
"publish_date": "2024-01-15T08:00:00Z",
"content": [
"Artificial intelligence continues to transform industries...",
"Cloud computing adoption reaches new heights..."
],
"tags": ["AI", "Cloud Computing", "Machine Learning", "Tech Trends"]
},
"processing_time": 2.1
}
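As a rough Python sketch, the selector payload can be assembled programmatically and the mixed string/array results normalized before use. The function names here are hypothetical. The units of `timeout` are not stated in the examples, so the value is passed through unchanged.

```python
def build_selector_payload(url, selectors, wait_for=None, timeout=None):
    """Assemble the request body used in the selector example above."""
    payload = {"url": url, "selectors": dict(selectors)}
    options = {}
    if wait_for is not None:
        options["wait_for"] = wait_for
    if timeout is not None:
        # Passed through as-is; the example above uses 10000.
        options["timeout"] = timeout
    if options:
        payload["options"] = options
    return payload


def as_list(value):
    """Normalize a field that may be missing, a single string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]
```

For instance, `"\n\n".join(as_list(data["content"]))` rebuilds the article body from its extracted paragraphs whether one or many elements matched.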
Use Case 1: E-commerce Price Monitoring
Monitor competitor pricing across multiple retailers:
curl -X POST https://api.aaido.dev/v1/products/datapulse \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://retailer.com/product/smartphone-xyz",
"extract": ["title", "price", "availability", "rating"],
"options": {
"format_price": true,
"extract_numbers": true
}
}'
The format_price option normalizes price formats, while extract_numbers converts ratings to numeric values for easier analysis.
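The exact output of `format_price` is not shown here, so a monitoring job may still want a defensive client-side parser. The following is a minimal sketch with hypothetical helper names. It assumes "." is the decimal separator and "," a thousands separator.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Parse a display price such as "$299.99" or "1,299.00 USD" into a Decimal."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    if match is None:
        raise ValueError(f"no numeric price found in {text!r}")
    return Decimal(match.group(0).replace(",", ""))


def price_change_pct(previous, current):
    """Percentage change from previous to current, rounded to two places."""
    prev, cur = parse_price(previous), parse_price(current)
    return ((cur - prev) / prev * 100).quantize(Decimal("0.01"))
```

`Decimal` avoids the floating-point rounding surprises that matter when comparing prices across runs.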
Use Case 2: Content Aggregation for News Monitoring
Extract article metadata and content for media monitoring:
curl -X POST https://api.aaido.dev/v1/products/datapulse \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://tech-blog.com/ai-breakthrough-announcement",
"extract": ["title", "content", "author", "publish_date"],
"options": {
"clean_text": true,
"extract_keywords": true,
"sentiment_analysis": true
}
}'
This configuration cleans extracted text, identifies key terms, and provides sentiment scoring for content analysis workflows.
Use Case 3: Real Estate Data Collection
Aggregate property listings for market analysis:
curl -X POST https://api.aaido.dev/v1/products/datapulse \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://realty-site.com/listing/downtown-condo",
"selectors": {
"price": ".listing-price",
"bedrooms": ".bed-count",
"bathrooms": ".bath-count",
"square_feet": ".sqft",
"address": ".property-address",
"amenities": ".amenity-list li"
},
"options": {
"extract_numbers": true,
"geocode_address": true
}
}'
Error Handling
The API returns standardized error responses:
{
"status": "error",
"error": {
"code": "EXTRACTION_FAILED",
"message": "Unable to access the specified URL",
"details": "Connection timeout after 10 seconds"
}
}
Common error codes include:
- INVALID_URL: Malformed or inaccessible URL
- EXTRACTION_FAILED: Content extraction unsuccessful
- RATE_LIMIT_EXCEEDED: API quota exceeded
- AUTHENTICATION_ERROR: Invalid or missing API key
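One way to surface these errors in client code is to turn the error object into an exception, sketched below in Python. The class and function names are hypothetical. Which codes are worth retrying is an assumption of this sketch, not something the API documents here.

```python
import json

# Assumption: rate-limit and extraction failures may be transient, while a
# bad URL or bad credentials will fail again no matter how often we retry.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED", "EXTRACTION_FAILED"}


class DataPulseError(Exception):
    """Wraps the "error" object of a standardized error response."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def raise_for_error(raw):
    """Parse a response body; raise DataPulseError if status is "error"."""
    payload = json.loads(raw)
    if payload.get("status") == "error":
        err = payload.get("error", {})
        raise DataPulseError(
            err.get("code", "UNKNOWN"), err.get("message", ""), err.get("details")
        )
    return payload
```

Callers can then branch on `exc.retryable` instead of comparing code strings throughout the codebase.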
CI/CD Integration Example
Integrate DataPulse into automated workflows using GitHub Actions:
name: Daily Price Monitoring
on:
  schedule:
    - cron: '0 9 * * *' # Daily at 9 AM UTC
jobs:
  price-check:
    runs-on: ubuntu-latest
    steps:
      - name: Extract competitor pricing
        id: price-check # Step id, required so the next step can read this step's outputs
        run: |
          response=$(curl -s -X POST https://api.aaido.dev/v1/products/datapulse \
            -H "Authorization: Bearer ${{ secrets.DATAPULSE_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{
              "url": "${{ vars.COMPETITOR_URL }}",
              "extract": ["price", "availability"],
              "options": {"format_price": true}
            }')
          price=$(echo "$response" | jq -r '.data.price')
          echo "Current competitor price: $price"
          # Store in database or trigger alerts based on price changes
          if [ "$price" != "null" ]; then
            echo "price=$price" >> "$GITHUB_OUTPUT"
          fi
      - name: Update pricing database
        if: steps.price-check.outputs.price
        run: |
          # Your database update logic here
          echo "Updating database with new price data"
For Docker environments, package your monitoring script (here price-monitor.sh) in a minimal image:
FROM alpine:latest
RUN apk add --no-cache curl jq
COPY price-monitor.sh /usr/local/bin/
CMD ["/usr/local/bin/price-monitor.sh"]
Rate Limits and Best Practices
The API implements rate limiting to ensure service stability:
- Free tier: 100 requests per hour
- Pro tier: 1,000 requests per hour
- Enterprise: Custom limits
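These hourly quotas translate into a minimum spacing between requests: 36 seconds on the free tier and 3.6 seconds on Pro. The Python sketch below spaces calls evenly. The `Throttle` class is hypothetical, and because the API's windowing scheme (fixed or sliding) is not described here, even spacing is used as the conservative choice.

```python
import time

# Hourly quotas from the tier list above; Enterprise limits are custom.
REQUESTS_PER_HOUR = {"free": 100, "pro": 1000}


def min_interval_seconds(tier):
    """Seconds to leave between requests to stay within the hourly quota."""
    return 3600 / REQUESTS_PER_HOUR[tier]


class Throttle:
    """Space calls evenly so a steady stream never exceeds the quota."""

    def __init__(self, tier, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(tier)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each request. The injectable `clock` and `sleep` arguments exist only to make the class testable without real delays.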
Implement exponential backoff for production applications:
# Retry logic with exponential backoff
retry_count=0
max_retries=3
while [ "$retry_count" -lt "$max_retries" ]; do
  # -w appends the three-digit HTTP status code to the response body
  response=$(curl -s -w "%{http_code}" -X POST https://api.aaido.dev/v1/products/datapulse \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$request_body")
  http_code="${response: -3}"
  if [ "$http_code" = "200" ]; then
    echo "${response%???}" # Remove HTTP code from response
    break
  elif [ "$http_code" = "429" ]; then
    sleep $((2 ** retry_count)) # Waits 1s, 2s, then 4s
    # Plain assignment; ((retry_count++)) returns non-zero at 0 and aborts under set -e
    retry_count=$((retry_count + 1))
  else
    echo "Request failed with HTTP $http_code" >&2
    break
  fi
done
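The same pattern in Python, as a sketch rather than a reference implementation. `send` is a hypothetical zero-argument callable returning `(status_code, body)`. Unlike the shell loop above, this version adds random jitter to each wait and makes one final attempt after the last wait.

```python
import random
import time


def backoff_delays(max_retries=3, base=1.0, cap=60.0):
    """Delays matching the shell loop (1 s, 2 s, 4 s, ...), capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(send, max_retries=3, sleep=time.sleep, jitter=random.random):
    """Call send() until it returns a non-429 status or retries run out."""
    for delay in backoff_delays(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        # Up to one second of jitter keeps parallel clients from retrying in lockstep.
        sleep(delay + jitter())
    return send()  # Final attempt after the last wait
```

Only HTTP 429 triggers a retry; any other status is returned to the caller immediately, as in the shell version.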
Performance Optimization
For high-volume extraction, batch similar requests and cache results when appropriate. The API supports concurrent requests, but respect rate limits to avoid throttling.
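Caching can be as simple as an in-memory map keyed by the request payload, with a time-to-live. The sketch below is one possible shape. `ExtractionCache` and the `fetch` callable are hypothetical, and a suitable TTL depends on how quickly the target pages change.

```python
import json
import time


class ExtractionCache:
    """In-memory cache keyed by the request payload, with a time-to-live."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    @staticmethod
    def _key(payload):
        # sort_keys makes logically equal payloads map to the same key.
        return json.dumps(payload, sort_keys=True)

    def get_or_fetch(self, payload, fetch):
        """Return a fresh cached result, or call fetch(payload) and store it."""
        key = self._key(payload)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = fetch(payload)
        self._entries[key] = (now, result)
        return result
```

Every cache hit is a request that does not count against the hourly quota.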
Consider using webhooks for long-running extractions by including a callback_url parameter in your request payload.
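Only the `callback_url` parameter name is given above; the shape of the webhook delivery is not documented here. A minimal Python sketch for attaching the parameter to an existing payload follows. The helper name is hypothetical, and the https-only check is a precaution chosen for this sketch rather than a documented API requirement.

```python
from urllib.parse import urlparse


def with_callback(payload, callback_url):
    """Return a copy of the request payload with callback_url added."""
    parts = urlparse(callback_url)
    # Assumption: callbacks should go to an absolute HTTPS endpoint.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("callback_url should be an absolute https:// URL")
    return {**payload, "callback_url": callback_url}
```

The original payload is left unmodified, so the same base request can be reused for synchronous calls.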
DataPulse transforms web scraping from a complex, maintenance-heavy process into a simple API call. Whether you're building price comparison tools, content aggregators, or market research applications, the structured data output integrates seamlessly into modern development workflows.
For complete API documentation and advanced features, visit: https://api.aaido.dev/products/datapulse