Nithin Bharadwaj

**7 Essential Data Access Patterns Every Developer Must Master for Scalable Applications**

When I build web applications today, I don't just talk to a single database. My applications might need a traditional relational database for transactions, a fast document store for user profiles, and a caching layer for session data. Juggling these different systems directly in my business logic is a recipe for confusion. It makes code hard to change, test, and understand.

This is where the idea of a data access layer comes in. Think of it as a dedicated translator or ambassador between my application's core logic and the various places it stores data. Its job is to handle all the messy details of talking to databases, so the rest of my code can focus on what the application actually does. Over time, developers have settled on several reliable ways to build this layer. Let me walk you through seven of the most useful patterns I use regularly.

The first concept is the Repository. Imagine you have a box for every major type of data in your app: a User box, an Order box, a Product box. You don't care what's inside the box or how it organizes things. You just tell the box, "give me the user with this email," or "save this new order." The box handles the rest. In code, a Repository is a class that acts as that box. It provides a collection-like interface for your data.

Here's a simple example in TypeScript. I start by defining what every User box must be able to do.

```typescript
interface UserRepository {
  findById(id: string): Promise<User | null>;
  findByEmail(email: string): Promise<User | null>;
  save(user: User): Promise<void>;
  delete(id: string): Promise<boolean>;
}
```

This interface is a contract. It says, "I don't care how you do it, but if you want to be a UserRepository, you must be able to find users by ID or email, save them, and delete them." Now, I can write the actual implementation for a PostgreSQL database.

```typescript
import { Pool } from 'pg';

class PostgresUserRepository implements UserRepository {
  constructor(private pool: Pool) {}

  async findById(id: string): Promise<User | null> {
    const result = await this.pool.query(
      'SELECT * FROM users WHERE id = $1',
      [id]
    );
    return result.rows[0] || null;
  }

  async findByEmail(email: string): Promise<User | null> {
    const result = await this.pool.query(
      'SELECT * FROM users WHERE email = $1',
      [email]
    );
    return result.rows[0] || null;
  }

  async save(user: User): Promise<void> {
    await this.pool.query(
      `INSERT INTO users (id, email, name)
       VALUES ($1, $2, $3)
       ON CONFLICT (id) DO UPDATE SET
       email = $2, name = $3`,
      [user.id, user.email, user.name]
    );
  }

  async delete(id: string): Promise<boolean> {
    const result = await this.pool.query(
      'DELETE FROM users WHERE id = $1',
      [id]
    );
    return (result.rowCount ?? 0) > 0;
  }
}
```

The beauty of this is in my main application code. I just work with the UserRepository interface. If tomorrow I need to switch from PostgreSQL to MySQL, I write a new MySqlUserRepository class that implements the same interface. The hundreds of other files in my project that depend on the UserRepository don't need to change at all. They keep calling .findByEmail() like before, completely unaware the underlying database changed.
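Here's a quick sketch of what that looks like from the consuming side. The `RegistrationService` name is my own for illustration; the point is that business logic receives the interface, never a concrete database class.

```typescript
// Business logic depends only on the UserRepository interface.
class RegistrationService {
  constructor(private users: UserRepository) {}

  async register(user: User): Promise<void> {
    const existing = await this.users.findByEmail(user.email);
    if (existing) throw new Error('Email already registered');
    await this.users.save(user);
  }
}

// Wire in Postgres today; swap in a MySqlUserRepository tomorrow
// (assumes a configured pg Pool from later in this article).
const service = new RegistrationService(new PostgresUserRepository(pool));
```

This is also what makes unit testing painless: I can hand the service a trivial in-memory implementation instead of spinning up a real database.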

Now, writing raw SQL strings inside these repository methods can get tedious, especially for complex queries with many optional filters. That's where Query Builders come in. A query builder lets me construct a database query using a chain of method calls, like building with Lego blocks. It's more readable and safer than gluing strings together.

Let me show you with Knex.js, a popular query builder for Node.js. Suppose I need to get a paginated list of active users.

```typescript
const activeUsers = await knex('users')
  .select('id', 'email', 'created_at')
  .where('status', 'active')
  .whereBetween('created_at', [startDate, endDate])
  .orderBy('created_at', 'desc')
  .limit(20)
  .offset(40);
```

The code almost reads like a sentence: "From the 'users' table, select id, email, and created_at where status is active and created_at is between two dates, order them newest first, skip the first 40 rows, and give me the next 20." For more complex operations, like aggregations with joins, it stays just as clear.

```typescript
const bigSpenders = await knex('orders')
  .join('users', 'orders.user_id', 'users.id')
  .select([
    'users.email',
    knex.raw('SUM(orders.total_amount) as lifetime_value')
  ])
  .groupBy('users.email')
  // PostgreSQL doesn't allow the alias in HAVING, so repeat the aggregate
  .havingRaw('SUM(orders.total_amount) > ?', [1000]);
```

The query builder generates the proper SQL for me, handling differences in dialect between PostgreSQL, MySQL, and SQLite. It also protects me from a common security flaw called SQL injection, by safely parameterizing my values.
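If I want to see that for myself, Knex can show me the query it will send. The values travel separately as bind parameters rather than being spliced into the SQL string. A small sketch (the exact SQL text varies by dialect):

```typescript
const compiled = knex('users').where('email', userInput).toSQL();
// compiled.sql      -> roughly: select * from "users" where "email" = ?
// compiled.bindings -> [userInput]  (sent separately, so it can't inject SQL)
```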

Often, the shape of data in my database is different from the shape of objects in my application. My database table might have columns full_name and created_at, but my User class has properties name and joinedDate. Manually converting back and forth is error-prone. The Data Mapper pattern formalizes this conversion.

A Data Mapper's only job is to translate between database records and domain entities. It knows how to take a raw row from the database and construct a proper User object from it, and vice versa.

```typescript
class UserMapper {
  toEntity(row: UserRecord): User {
    // Transform database snake_case to application camelCase
    // Convert string dates to Date objects
    return new User(
      row.id,
      row.email,
      row.full_name, // Database field
      new Date(row.created_at) // String to Date
    );
  }

  toPersist(user: User): UserRecord {
    // Prepare the object for the database
    return {
      id: user.id,
      email: user.email,
      full_name: user.name, // Application field to DB field
      created_at: user.joinedDate.toISOString(), // Date to String
    };
  }
}
```

I then use this mapper inside my service layer, keeping the translation logic in one predictable place.

```typescript
class UserService {
  constructor(
    // In this variant the repository returns raw UserRecord rows;
    // the mapper turns them into domain objects.
    private repository: UserRepository,
    private mapper: UserMapper
  ) {}

  async getUserProfile(id: string): Promise<UserProfile> {
    const record = await this.repository.findById(id);
    if (!record) throw new Error('User not found');
    const userEntity = this.mapper.toEntity(record);
    // Now I can work with a proper User object
    return this.buildProfile(userEntity);
  }
}
```

This separation keeps my core User class pure. It doesn't need special annotations or knowledge about the database. The mapper handles the glue.
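The write path mirrors the read path. Sticking with the same loosely sketched service, an update method translates back to a record at the boundary before saving:

```typescript
// Inside the same UserService: domain object in the middle,
// database record only at the edges.
async renameUser(id: string, newName: string): Promise<void> {
  const record = await this.repository.findById(id);
  if (!record) throw new Error('User not found');

  const user = this.mapper.toEntity(record);
  user.name = newName; // business logic works on the domain object

  await this.repository.save(this.mapper.toPersist(user));
}
```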

Opening and closing a database connection for every single query is slow. It's like hanging up the phone after every sentence in a conversation. Connection Pooling solves this. When my application starts, it opens a group of connections to the database and keeps them alive in a "pool." When my code needs to run a query, it checks out a connection from the pool, uses it, and then returns it. This reuse saves a tremendous amount of time.

Here's how I typically set up a pool with the Node.js pg library for PostgreSQL.

```javascript
const { Pool } = require('pg');

const pool = new Pool({
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20, // Don't open more than 20 connections at once
  idleTimeoutMillis: 30000, // Close idle connections after 30 sec
});
```

Using the pool looks like this. I get a client, run my operations, and make sure to release it back so someone else can use it.

```javascript
async function getUserOrders(userId) {
  const client = await pool.connect(); // Check out a connection
  try {
    const result = await client.query(
      'SELECT * FROM orders WHERE user_id = $1',
      [userId]
    );
    return result.rows;
  } finally {
    client.release(); // Always release it back to the pool
  }
}
```

This pattern is crucial for performance. Without it, a moderately busy website would spend more time negotiating connections than actually fetching data.
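One detail worth knowing: for a single one-off statement, the pg library lets me call pool.query directly, and the pool checks out and releases a connection for me behind the scenes. The manual connect/release dance matters mainly when I need several statements on the same connection, such as in a transaction.

```typescript
// Equivalent one-off query: checkout and release happen internally.
const result = await pool.query(
  'SELECT * FROM orders WHERE user_id = $1',
  [userId]
);
console.log(result.rows);
```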

As my application grows, its database structure needs to change. I might need to add a new column, create an index for speed, or split a table in two. I can't just do this manually on the production server. I need a systematic, repeatable way to evolve the schema. This is managed through Database Migrations.

A migration is a small file that describes a change to the database. Each migration is numbered and has an up function (apply this change) and a down function (undo this change). These files live in my version control system, right next to my application code.

Here’s what a migration to create a users table might look like, using the node-pg-migrate tool.

```javascript
// 20231015091032_create_users_table.js
exports.up = (pgm) => {
  pgm.createTable('users', {
    id: { type: 'uuid', primaryKey: true, default: pgm.func('gen_random_uuid()') },
    email: { type: 'varchar(255)', notNull: true, unique: true },
    name: { type: 'varchar(255)', notNull: true },
    created_at: {
      type: 'timestamp',
      notNull: true,
      default: pgm.func('current_timestamp')
    }
  });
  pgm.createIndex('users', 'email'); // Index for fast lookups by email
};

exports.down = (pgm) => {
  pgm.dropTable('users'); // How to roll back this change
};
```

I run a command like npm run migrate up, and the migration tool applies all new migrations in order. If I deploy a bug, I can run npm run migrate down 1 to revert the last change. This ensures every environment—my laptop, the testing server, and production—has an identical database structure. It’s also how I can safely rename a column or backfill data.

```javascript
exports.up = async (pgm) => {
  // Add the new column
  pgm.addColumn('users', {
    display_name: { type: 'varchar(255)' }
  });
  // Backfill: copy data from old column to new
  await pgm.sql(`
    UPDATE users
    SET display_name = name
    WHERE display_name IS NULL
  `);
  // Now I can safely plan to remove the old 'name' column later
};
```

For many projects, especially when I'm working in an object-oriented language, using a full Object-Relational Mapper (ORM) makes sense. An ORM is a powerful tool that lets me interact with my database using my programming language's objects. I define my models once, and the ORM handles creating the tables, writing the queries, and fetching related data.

Prisma is a modern ORM that I often reach for. I first define my data model in a schema file.

```prisma
// schema.prisma
model User {
  id        String   @id @default(uuid())
  email     String   @unique
  name      String?
  posts     Post[]   // This defines a relationship
  createdAt DateTime @default(now())
}

model Post {
  id        String   @id @default(uuid())
  title     String
  content   String?
  likes     Int      @default(0) // Referenced by the query further down
  author    User     @relation(fields: [authorId], references: [id])
  authorId  String
}
```

From this schema, Prisma generates a fully type-safe client. My code becomes very expressive and clear.

```typescript
// Create a new user with their first post in one operation
const newUser = await prisma.user.create({
  data: {
    email: 'alice@example.com',
    name: 'Alice',
    posts: {
      create: {
        title: 'Hello, World!',
        content: 'My first blog post.'
      }
    }
  },
  include: {
    posts: true // Include the related posts in the result
  }
});

// A complex, type-safe query: users with at least one post over 100 likes
const popularAuthors = await prisma.user.findMany({
  where: {
    posts: {
      some: {
        likes: { gt: 100 }
      }
    }
  },
  include: {
    _count: {
      select: { posts: true }
    }
  }
});
```

The ORM manages the relationships for me. When I fetch a user, I can easily load all their posts, comments, or other related data without writing explicit join queries. It feels like I'm just working with native objects. The trade-off is that for extremely complex queries, I might still drop down to raw SQL for performance, which most ORMs also allow.
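Prisma's escape hatch looks like this. It's a tagged template, so interpolated values are still sent as bind parameters. A sketch, assuming Prisma's default table naming from the schema above:

```typescript
// Hand-written SQL through Prisma, still safely parameterized.
const minLikes = 100;
const topPosts = await prisma.$queryRaw`
  SELECT title, likes FROM "Post"
  WHERE likes > ${minLikes}
  ORDER BY likes DESC
  LIMIT 10
`;
```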

Finally, modern applications rarely use one database. I might use PostgreSQL as my main "source of truth," Redis for caching session data, and perhaps Elasticsearch for powerful text search. The Database Abstraction Layer pattern helps me manage this diversity.

The goal is to create a consistent interface for different types of data operations, even if the backing store is completely different. For example, I might define a simple key-value store interface.

```typescript
interface KeyValueStore {
  get(key: string): Promise<any>;
  set(key: string, value: any, ttl?: number): Promise<void>;
  delete(key: string): Promise<boolean>;
}
```

I can then implement this interface for a real cache like Redis.

```typescript
class RedisStore implements KeyValueStore {
  // Assumes a promise-based client such as ioredis
  constructor(private client: RedisClient) {}

  async get(key: string): Promise<any> {
    const data = await this.client.get(key);
    return data ? JSON.parse(data) : null;
  }

  async set(key: string, value: any, ttl?: number): Promise<void> {
    const serialized = JSON.stringify(value);
    if (ttl) {
      await this.client.setex(key, ttl, serialized); // Set with expiry
    } else {
      await this.client.set(key, serialized);
    }
  }

  async delete(key: string): Promise<boolean> {
    const removed = await this.client.del(key); // Number of keys removed
    return removed > 0;
  }
}
```

And I could write a similar implementation that uses the browser's localStorage for a client-side prototype, or even a simple in-memory map for testing. My application code uses the KeyValueStore interface, so it doesn't know or care what's behind it.
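For instance, a minimal in-memory implementation for tests might look like this (my own sketch, not tied to any library):

```typescript
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, { value: any; expiresAt?: number }>();

  async get(key: string): Promise<any> {
    const entry = this.data.get(key);
    if (!entry) return null;
    if (entry.expiresAt && Date.now() > entry.expiresAt) {
      this.data.delete(key); // Expire lazily on read
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: any, ttl?: number): Promise<void> {
    this.data.set(key, {
      value,
      expiresAt: ttl ? Date.now() + ttl * 1000 : undefined,
    });
  }

  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }
}
```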

I can combine these patterns to build sophisticated data flows. A common example is a cache-aside strategy.

```typescript
class UserServiceWithCache {
  constructor(
    private userRepo: UserRepository, // Talks to PostgreSQL
    private cache: KeyValueStore     // Talks to Redis
  ) {}

  async getUser(id: string): Promise<User> {
    const cacheKey = `user:${id}`;

    // 1. Check the cache first
    let user = await this.cache.get(cacheKey);
    if (user) {
      console.log('Cache hit!');
      return user;
    }

    console.log('Cache miss. Querying database.');
    // 2. If not in cache, get from main database
    user = await this.userRepo.findById(id);
    if (!user) throw new Error('User not found');

    // 3. Store in cache for next time
    await this.cache.set(cacheKey, user, 300); // Keep for 5 minutes

    return user;
  }
}
```

This pattern dramatically reduces load on my primary database for frequently accessed data. The abstraction lets me manage this complexity in a clean, understandable way.
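The flip side of cache-aside is invalidation: when a user changes, the cached copy goes stale. The usual answer is to evict on write, which on the same service might look like this sketch:

```typescript
// Inside UserServiceWithCache: write to the source of truth,
// then evict so the next read repopulates the cache.
async updateUser(user: User): Promise<void> {
  await this.userRepo.save(user);
  await this.cache.delete(`user:${user.id}`);
}
```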

Putting it all together, these patterns are not rules to follow rigidly, but tools to choose from. For a small, simple app, a query builder might be all I need. For a large system with complex business rules, I might combine Repositories, Data Mappers, and a Connection Pool, while using Migrations to manage schema changes from day one.

The core idea across all of them is the same: separate the concern of data storage from the concern of business logic. My application shouldn't be littered with SQL strings or details about connection strings. It should express its intent—"get the active users," "place a new order," "update a profile." The data access layer, built with these patterns, figures out how to make that happen. This separation makes my code easier to read, test, and change over the long life of a project. When I need to scale, swap a database, or add a new cache, I have a clear place to make those changes without breaking the entire application.
