DEV Community

dva3121


🔍 Build Your Own Search Engine in PHP with Sitemap Parsing, Pagination, and Auto-Suggestions

How to build a simple PHP search engine with sitemap indexing, pagination, and auto-suggestions. Aimed at beginner to intermediate developers.



If you've ever wanted to build a simple search engine for your own content or a client project, this post shows you how to do it using PHP, MySQL, Tailwind CSS, and basic JavaScript for real-time auto-suggestions.

We'll cover:

  • ✅ User authentication
  • ✅ Submitting and parsing sitemaps
  • ✅ Indexing pages in MySQL
  • ✅ Search input with live suggestions
  • ✅ Search result pagination

Let's dive in!


🧱 1. Database Structure

Create a MySQL database and run the following SQL to set up required tables:

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(100) UNIQUE,
  password VARCHAR(255)
);

CREATE TABLE sitemaps (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT,
  url TEXT,
  submitted_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE search_index (
  id INT AUTO_INCREMENT PRIMARY KEY,
  url TEXT,
  title TEXT,
  description TEXT,
  content LONGTEXT,
  indexed_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
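Every script in this post starts with include 'db.php' and then uses a mysqli connection in $conn, but the file itself is not shown. A minimal sketch of what it could look like follows; the host, user, password, and database name are placeholder assumptions, not values from the post.

```php
<?php
// db.php: hypothetical connection file. Every value below is a placeholder.
$dbHost = 'localhost';
$dbUser = 'search_user';
$dbPass = 'change-me';
$dbName = 'search_engine';

try {
    $conn = new mysqli($dbHost, $dbUser, $dbPass, $dbName);
} catch (mysqli_sql_exception $e) {
    // PHP 8.1+ throws on connection failure by default.
    exit('Database connection failed.');
}
if ($conn->connect_error) {
    // Older PHP versions report the failure here instead of throwing.
    exit('Database connection failed.');
}

// utf8mb4 covers the full Unicode range, including emoji in page titles.
$conn->set_charset('utf8mb4');
```

In a real deployment the credentials would more likely come from environment variables or a config file kept outside the web root.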

🔐 2. User Registration and Login (Tailwind Styled)

register.php – Tailwind-Styled Registration Form

<?php
include 'db.php';
$message = "";

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Default to empty strings so a missing field does not raise a warning.
    $username = trim($_POST['username'] ?? '');
    $password = $_POST['password'] ?? '';

    if ($username && $password) {
        $hash = password_hash($password, PASSWORD_DEFAULT);
        $stmt = $conn->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
        $stmt->bind_param("ss", $username, $hash);
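        // Since PHP 8.1 mysqli throws on errors by default, so a duplicate
        // username raises an exception instead of returning false.
        try {
            $ok = $stmt->execute();
        } catch (mysqli_sql_exception $e) {
            $ok = false;
        }
        if ($ok) {
            $message = "Registration successful. <a href='login.php' class='underline text-blue-500'>Login here</a>.";
        } else {
            $message = "Error: Username might already exist.";
        }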
    } else {
        $message = "All fields are required.";
    }
}
?>
<!-- Tailwind-styled registration form goes here (markup not included in this post); it posts `username` and `password` and displays $message -->

login.php – Secure Login

<?php
session_start();
include 'db.php';
$message = "";

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Default to empty strings so a missing field does not raise a warning.
    $username = trim($_POST['username'] ?? '');
    $password = $_POST['password'] ?? '';

    $stmt = $conn->prepare("SELECT id, password FROM users WHERE username = ?");
    $stmt->bind_param("s", $username);
    $stmt->execute();
    $stmt->bind_result($userId, $hash);

    if ($stmt->fetch() && password_verify($password, $hash)) {
        // Issue a fresh session ID on login to prevent session fixation.
        session_regenerate_id(true);
        $_SESSION['user_id'] = $userId;
        header("Location: dashboard.php");
        exit;
    } else {
        $message = "Invalid credentials.";
    }
}
?>
<!-- Tailwind-styled login form goes here, similar to register.php -->
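The two scripts above rely on password_hash() at registration and password_verify() at login. The short sketch below shows that round trip in isolation, without a database; the sample password is made up for illustration.

```php
<?php
// What register.php stores: a salted one-way hash, never the password itself.
$password = 'correct horse battery staple'; // made-up sample value
$hash = password_hash($password, PASSWORD_DEFAULT);

// What login.php does: check a submitted password against the stored hash.
$loginOk = password_verify($password, $hash);
$loginBad = password_verify('wrong guess', $hash);

// Each call uses a fresh random salt, so hashing twice gives different strings.
$secondHash = password_hash($password, PASSWORD_DEFAULT);
$hashesDiffer = ($hash !== $secondHash);

// Becomes true once a PHP upgrade changes the default algorithm or cost.
$needsRehash = password_needs_rehash($hash, PASSWORD_DEFAULT);

var_dump($loginOk, $loginBad, $hashesDiffer, $needsRehash);
```

Because the salt is stored inside the hash string, the users table needs no separate salt column, and the VARCHAR(255) password column leaves room for longer hashes from future algorithms.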

πŸ—ΊοΈ 3. Submit Sitemap and Crawl URLs

submit_sitemap.php

<?php
session_start();
include 'db.php';

if (!isset($_SESSION['user_id'])) {
    die("Login required.");
}

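if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $sitemap = trim($_POST['sitemap'] ?? '');
    $userId = $_SESSION['user_id'];

    // Accept only absolute http(s) URLs so the crawler is never pointed at local files.
    $scheme = strtolower((string) parse_url($sitemap, PHP_URL_SCHEME));
    if (!filter_var($sitemap, FILTER_VALIDATE_URL) || !in_array($scheme, ['http', 'https'], true)) {
        die("Please enter a valid http(s) sitemap URL.");
    }

    $stmt = $conn->prepare("INSERT INTO sitemaps (user_id, url) VALUES (?, ?)");
    $stmt->bind_param("is", $userId, $sitemap);
    $stmt->execute();

    // Note: this call blocks until the crawl finishes; it is not a background job.
    // For real background crawling, let a cron job pick up new rows from `sitemaps`.
    file_get_contents("http://localhost/crawler.php?sitemap=" . urlencode($sitemap));
    echo "Sitemap submitted and crawled.";
}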
?>

🤖 4. Sitemap Crawler and Indexer (Basic)

crawler.php

<?php
include 'db.php';

function fetchSitemapUrls($sitemapUrl) {
    $xml = @simplexml_load_file($sitemapUrl);
    $urls = [];
    if ($xml && $xml->url) {
        foreach ($xml->url as $url) {
            $urls[] = (string) $url->loc;
        }
    }
    return $urls;
}

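// Fetch one page, extract its title, description, and text, and store it.
// Returns true when the row was inserted.
function indexUrl(mysqli $conn, $url) {
    $html = @file_get_contents($url);
    if (!$html) return false;

    // "s" lets "." match newlines; [^>]* tolerates attributes on <title>.
    preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $titleMatch);
    preg_match('/<meta name="description" content="(.*?)"/is', $html, $descMatch);
    $title = trim(html_entity_decode($titleMatch[1] ?? 'No Title'));
    $desc = trim(html_entity_decode($descMatch[1] ?? ''));

    // Remove <script> and <style> bodies first; strip_tags() alone keeps their text.
    $body = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', ' ', $html) ?? $html;
    $cleanText = trim(preg_replace('/\s+/', ' ', strip_tags($body)) ?? '');

    $stmt = $conn->prepare("INSERT INTO search_index (url, title, description, content) VALUES (?, ?, ?, ?)");
    $stmt->bind_param("ssss", $url, $title, $desc, $cleanText);
    return $stmt->execute();
}

$sitemap = $_GET['sitemap'] ?? '';
if ($sitemap) {
    $urls = fetchSitemapUrls($sitemap);
    $indexed = 0;
    foreach ($urls as $url) {
        if (indexUrl($conn, $url)) $indexed++;
    }
    echo "Indexed $indexed of " . count($urls) . " pages.";
}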

In production, trigger this script from a cron job or a queue worker instead of calling it during the submission request.
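fetchSitemapUrls() reads only <url> entries, so a sitemap index file (a <sitemapindex> that lists further sitemaps in <sitemap><loc>) yields no pages. The sketch below separates the two cases when parsing an XML string. parseSitemapXml() is a hypothetical helper that is not part of crawler.php, and the example.com URLs are made up.

```php
<?php
/**
 * Split a sitemap document into page URLs and nested sitemap URLs.
 * parseSitemapXml() is a hypothetical helper, not part of crawler.php.
 */
function parseSitemapXml(string $xmlString): array
{
    $result = ['pages' => [], 'sitemaps' => []];

    // Collect parse errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return $result; // not well-formed XML
    }

    // A <urlset> lists pages in <url><loc>.
    foreach ($xml->url as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc !== '') {
            $result['pages'][] = $loc;
        }
    }

    // A <sitemapindex> lists further sitemap files in <sitemap><loc>.
    foreach ($xml->sitemap as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc !== '') {
            $result['sitemaps'][] = $loc;
        }
    }

    return $result;
}

$urlset = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
XML;

$parsed = parseSitemapXml($urlset);
print_r($parsed['pages']);
```

A crawler built on this could index everything in pages and fetch each entry in sitemaps in turn, ideally with a depth limit so a misconfigured index cannot loop forever.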


🔎 5. Search Engine Interface (with Autocomplete + Pagination)

index.php

<?php
include 'db.php';
$q = $_GET['q'] ?? '';
$page = max(1, intval($_GET['page'] ?? 1));
$perPage = 10;
$offset = ($page - 1) * $perPage;

$total = 0;
$results = [];

if ($q) {
    $stmt = $conn->prepare("SELECT COUNT(*) FROM search_index WHERE title LIKE CONCAT('%', ?, '%') OR content LIKE CONCAT('%', ?, '%')");
    $stmt->bind_param("ss", $q, $q);
    $stmt->execute();
    $stmt->bind_result($total);
    $stmt->fetch();
    $stmt->close();

    // ORDER BY keeps page boundaries stable; without it MySQL may return rows in any order.
    $stmt = $conn->prepare("SELECT * FROM search_index WHERE title LIKE CONCAT('%', ?, '%') OR content LIKE CONCAT('%', ?, '%') ORDER BY id LIMIT ? OFFSET ?");
    $stmt->bind_param("ssii", $q, $q, $perPage, $offset);
    $stmt->execute();
    $results = $stmt->get_result();
}
?>
<!-- Tailwind search form, result loop, and pagination markup go here (not included in this post) -->
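The pagination markup is not shown in this post, but the arithmetic behind it is small: with $total matches and $perPage results per page, the page count is ceil($total / $perPage). The sketch below turns that into a list of links. buildPageLinks() is a hypothetical helper, and the index.php?q=...&page=... URL shape is an assumption based on the $_GET keys read above.

```php
<?php
/**
 * Build pagination links for a search query.
 * buildPageLinks() is a hypothetical helper; index.php does not define it.
 */
function buildPageLinks(string $q, int $total, int $perPage, int $page): array
{
    // Always show at least one page, even when there are no results.
    $totalPages = max(1, (int) ceil($total / $perPage));
    // Clamp out-of-range page numbers into 1..$totalPages.
    $page = min(max(1, $page), $totalPages);

    $links = [];
    for ($p = 1; $p <= $totalPages; $p++) {
        $links[] = [
            'page'    => $p,
            // http_build_query() URL-encodes the search term.
            'href'    => 'index.php?' . http_build_query(['q' => $q, 'page' => $p], '', '&'),
            'current' => ($p === $page),
        ];
    }
    return $links;
}

// 23 matches at 10 per page gives 3 pages.
$links = buildPageLinks('php search', 23, 10, 2);
foreach ($links as $link) {
    $label = $link['current'] ? '[' . $link['page'] . ']' : (string) $link['page'];
    echo $label, ' ', $link['href'], PHP_EOL;
}
```

When these links are printed into HTML, each href should go through htmlspecialchars() so the & separators and any user-supplied characters are escaped.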

suggest.php for AJAX Autocomplete

<?php
include 'db.php';
$q = $_GET['q'] ?? '';
$suggestions = [];

if ($q) {
    $stmt = $conn->prepare("SELECT DISTINCT title FROM search_index WHERE title LIKE CONCAT(?, '%') LIMIT 10");
    $stmt->bind_param("s", $q);
    $stmt->execute();
    $res = $stmt->get_result();
    while ($row = $res->fetch_assoc()) {
        $suggestions[] = $row['title'];
    }
}
header('Content-Type: application/json');
echo json_encode($suggestions);
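Both index.php and suggest.php bind the raw search term into a LIKE pattern, so a % or _ typed by the user acts as a wildcard rather than a literal character. The sketch below escapes those characters first. escapeLike() is a hypothetical helper, and it assumes MySQL's default LIKE escape character, the backslash (this does not hold if the NO_BACKSLASH_ESCAPES SQL mode is enabled).

```php
<?php
/**
 * Escape LIKE wildcards so user input matches literally.
 * escapeLike() is a hypothetical helper; it assumes the backslash is
 * the LIKE escape character, which is MySQL's default.
 */
function escapeLike(string $term): string
{
    // addcslashes() prefixes each listed character with a backslash.
    return addcslashes($term, '%_\\');
}

$escaped = escapeLike('100%_done');
echo $escaped, PHP_EOL;
```

To use it, bind escapeLike($q) instead of $q in the prepared statements; the CONCAT('%', ?, '%') wrapping in the SQL stays the same.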

🚀 Final Thoughts

You now have a working PHP search engine that:

  • Lets users submit and index sitemaps
  • Crawls and stores searchable content
  • Provides a nice UI with Tailwind
  • Includes live search suggestions and pagination

🔧 Bonus Ideas:

  • Add full-text indexing with MySQL FULLTEXT
  • Use Redis to cache popular queries
  • Add tags or categories to indexed pages
