PrimoCrypt

Posted on Jun 13

Two Projects That Taught Me More Than Any Tutorial - HNG14 Internship Reflections

#backend #nestjs #typescript #internship

There's a difference between knowing how something works and having built one that broke at 2 AM.

The HNG14 internship was nine stages of backend engineering tasks, each one more demanding than the last. Some stages took a few hours. Others ate entire weekends. Out of all of them, two projects stuck with me, not because they were the hardest on paper, but because they forced me to think about things I'd never had to think about before.

This post is about those two projects: Eventrail, an append-only event store I built solo, and the Employer Assessment System for SkillBridge (CredLane), a team-stage feature that touched everything from database schema design to parsing XLSX files without a library.

Task 1: Eventrail - The Append-Only Event Store (Individual Stage)

What It Was

Eventrail is a small NestJS HTTP service that stores arbitrary JSON events in an append-only log file. No database. No SQLite. No MongoDB. Just a single events.log file where every event gets appended as a newline-delimited JSON line.

The API is simple: POST /events to create one, GET /events/:id to read one back, GET /stats to see totals. The interesting part isn't the API, it's everything underneath.

The Problem It Was Solving

The task was about infrastructure resilience. The question was: can your service survive a crash and come back with all its data intact? Can it recover its state from nothing but the log file on disk?

Most tutorials teach you to reach for a database. This task explicitly took that away. You had to think about what a database actually does for you - durability, indexing, crash recovery - and then implement a stripped-down version of those guarantees yourself.

How I Approached It

I started with the write path. Every incoming event gets a UUID and a timestamp, is serialized to JSON, and is appended to events.log as a single line. One event, one line, always at the end of the file. No overwrites, no edits. That's the "append-only" part, and it's what makes the system safe: if the process dies mid-write, the worst case is an incomplete final line, and every older event is untouched because nothing already in the file is ever modified.
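Below is a minimal sketch of that write path using Node's built-in fs and crypto modules. The event shape (id, timestamp, data) and the appendEvent function are illustrative, not Eventrail's actual code. For brevity, the sketch reads the starting offset from the current file size instead of keeping an in-memory byte counter.

```typescript
import { appendFileSync, existsSync, mkdtempSync, statSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed event shape: the post only says each event gets a UUID and a timestamp.
interface StoredEvent {
  id: string;
  timestamp: string;
  data: unknown;
}

interface AppendResult {
  event: StoredEvent;
  offset: number; // byte position where this event's line starts
  length: number; // byte length of the line, including the trailing "\n"
}

function appendEvent(logPath: string, data: unknown): AppendResult {
  const event: StoredEvent = { id: randomUUID(), timestamp: new Date().toISOString(), data };
  const line = JSON.stringify(event) + "\n";
  // The new line begins at the current end of the file, measured in bytes.
  const offset = existsSync(logPath) ? statSync(logPath).size : 0;
  // flag "a" = append mode: every write lands at the end; nothing earlier is rewritten.
  appendFileSync(logPath, line, { encoding: "utf8", flag: "a" });
  return { event, offset, length: Buffer.byteLength(line, "utf8") };
}

// Usage: two events in a throwaway log under the OS temp directory.
const logPath = join(mkdtempSync(join(tmpdir(), "eventrail-")), "events.log");
const first = appendEvent(logPath, { kind: "signup", user: "ada" });
const second = appendEvent(logPath, { kind: "greeting", text: "こんにちは" });
console.log(first.offset, second.offset, statSync(logPath).size);
```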

For reads, I built an in-memory Map<id, { offset, length }>, a tiny index that maps each event ID to its exact byte position in the file. When a GET /events/:id request comes in, the service looks up the offset and length, seeks directly to that byte range in the file, reads exactly those bytes, and parses the JSON. No scanning. No loading the whole file.
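Here is a minimal sketch of that read path using Node's synchronous file API. openSync plus readSync with an explicit position reads only the indexed bytes. The index shape follows the Map<id, { offset, length }> described above. readEventById and the two-line sample log are hypothetical. The index lengths are measured in bytes, not characters.

```typescript
import { closeSync, mkdtempSync, openSync, readSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Event ID -> where that event's line lives in events.log, in bytes.
type EventIndex = Map<string, { offset: number; length: number }>;

function readEventById(logPath: string, index: EventIndex, id: string): unknown {
  const entry = index.get(id);
  if (!entry) return undefined; // unknown ID (a 404 at the HTTP layer)
  const buffer = Buffer.alloc(entry.length);
  const fd = openSync(logPath, "r");
  try {
    // Positional read: exactly entry.length bytes starting at entry.offset, no scanning.
    const bytesRead = readSync(fd, buffer, 0, entry.length, entry.offset);
    if (bytesRead !== entry.length) throw new Error(`short read for event ${id}`);
  } finally {
    closeSync(fd);
  }
  return JSON.parse(buffer.toString("utf8"));
}

// Usage: build a two-line log and its index, then fetch the second event directly.
const lines = [
  JSON.stringify({ id: "a", text: "hello" }) + "\n",
  JSON.stringify({ id: "b", text: "こんにちは" }) + "\n",
];
const logPath = join(mkdtempSync(join(tmpdir(), "eventrail-")), "events.log");
writeFileSync(logPath, lines.join(""), "utf8");
const index: EventIndex = new Map();
let nextOffset = 0;
for (const line of lines) {
  const length = Buffer.byteLength(line, "utf8");
  index.set((JSON.parse(line) as { id: string }).id, { offset: nextOffset, length });
  nextOffset += length;
}
console.log(readEventById(logPath, index, "b"));
```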

On startup, the service reads events.log line by line, recalculates every offset, and rebuilds the index from scratch. If the log says there are 500 events, the index gets 500 entries. The log file is the source of truth.
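Below is a sketch of the startup rebuild, assuming the newline-delimited format from the write path. It reads the log into a Buffer and scans for newline bytes, so every position it records is a byte offset. Splitting on 0x0a is safe in UTF-8 because that byte never appears inside a multi-byte character. The sketch skips a trailing fragment with no newline, which is what a crash mid-append would leave. That recovery choice is this sketch's own assumption, not something the post describes.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type EventIndex = Map<string, { offset: number; length: number }>;

function rebuildIndex(logPath: string): EventIndex {
  const index: EventIndex = new Map();
  if (!existsSync(logPath)) return index; // first boot: empty log, empty index
  const bytes = readFileSync(logPath); // a Buffer, so positions below are byte offsets
  let start = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] !== 0x0a) continue; // keep scanning until "\n"
    const length = i - start + 1; // include the newline, matching the write path
    const event = JSON.parse(bytes.toString("utf8", start, i)) as { id: string };
    index.set(event.id, { offset: start, length });
    start = i + 1;
  }
  // Any bytes after the last "\n" are an interrupted write; this sketch ignores them.
  return index;
}

// Usage: two complete lines plus a torn third line.
const logPath = join(mkdtempSync(join(tmpdir(), "eventrail-")), "events.log");
writeFileSync(logPath, '{"id":"a","text":"héllo"}\n{"id":"b","text":"👋"}\n{"id":"c","te', "utf8");
const index = rebuildIndex(logPath);
console.log(index);
```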

What Broke and How I Fixed It

The byte offset bug. This one cost me real time. My first implementation tracked offsets using JavaScript's .length property on strings. That works fine for ASCII, but .length counts UTF-16 code units, not bytes. The moment someone posts an event with an emoji or a non-Latin character, the string length and the byte length diverge. A string like "こんにちは" is 5 characters but 15 bytes in UTF-8. My index was pointing to the wrong positions in the file, and reads were returning corrupted JSON.

The fix was Buffer.byteLength(line, 'utf8') everywhere. Every offset calculation, every length calculation - bytes, not characters. It seems obvious in hindsight, but it's the kind of bug that only shows up with the right payload, and the error message (SyntaxError: Unexpected token) gives you absolutely no hint about what's actually wrong.
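The mismatch is easy to see directly. String.length counts UTF-16 code units, which is why a single emoji reports a length of 2. Buffer.byteLength reports the UTF-8 bytes that actually land in the file. The last lines compute the next event's offset both ways for one log line.

```typescript
const greeting = "こんにちは";
const wave = "👋";

// String.length counts UTF-16 code units; the file stores UTF-8 bytes.
console.log(greeting.length, Buffer.byteLength(greeting, "utf8")); // 5 15
console.log(wave.length, Buffer.byteLength(wave, "utf8")); // 2 4

// Where the *next* event would start after this line, computed both ways.
const line = JSON.stringify({ text: greeting }) + "\n";
const wrongNextOffset = line.length; // the original bug: characters
const rightNextOffset = Buffer.byteLength(line, "utf8"); // the fix: bytes
console.log(wrongNextOffset, rightNextOffset); // 17 27
```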

The concurrent write race. Two simultaneous POST /events requests would both read the in-memory byte counter (this.bytes) before either had advanced it, calculate the same offset, and write their events. One event's index entry would then point at the other event's data. The in-memory index and the file would drift apart.

I fixed this with a promise-based write queue:

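```typescript
// Every write is chained onto this promise, so appends run strictly one at a time.
private writeQueue: Promise<void> = Promise.resolve();

private async enqueueWrite(event: StoredEvent): Promise<void> {
  // appendEvent starts only after every previously queued write has settled.
  const queuedWrite = this.writeQueue.then(() => this.appendEvent(event));
  // Swallow errors on the shared chain so one failure doesn't block later writes.
  this.writeQueue = queuedWrite.catch(() => undefined);
  await queuedWrite; // the caller still sees this write's own error
}
```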

Every write chains onto the previous one. No two writes execute at the same time. It's not the most sophisticated concurrency control, but it's correct, and it's simple enough that I can reason about it at 2 AM.
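The sketch below wraps the same enqueueWrite pattern in a small class and fires 20 writes at once, to show why the queue fixes the race. EventLog, its fields, and the use of appendFile from node:fs/promises are assumptions for illustration, not Eventrail's code. appendEvent only ever runs one call at a time, so the offsets it records come out contiguous and non-overlapping.

```typescript
import { appendFile, mkdtemp, stat } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Simplified stand-in for the service: only the write path and the index.
class EventLog {
  readonly index = new Map<string, { offset: number; length: number }>();
  private bytes = 0; // end of the log, in bytes
  private writeQueue: Promise<void> = Promise.resolve();
  private readonly logPath: string;

  constructor(logPath: string) {
    this.logPath = logPath;
  }

  async enqueueWrite(id: string, data: unknown): Promise<void> {
    const queuedWrite = this.writeQueue.then(() => this.appendEvent(id, data));
    this.writeQueue = queuedWrite.catch(() => undefined);
    await queuedWrite;
  }

  private async appendEvent(id: string, data: unknown): Promise<void> {
    const line = JSON.stringify({ id, data }) + "\n";
    const length = Buffer.byteLength(line, "utf8");
    const offset = this.bytes; // safe: no other append can run until this one finishes
    await appendFile(this.logPath, line, "utf8");
    this.bytes += length;
    this.index.set(id, { offset, length });
  }
}

// Usage: 20 concurrent "POST /events" calls against one log.
async function demo(): Promise<{ log: EventLog; logPath: string }> {
  const logPath = join(await mkdtemp(join(tmpdir(), "eventrail-")), "events.log");
  const log = new EventLog(logPath);
  await Promise.all(
    Array.from({ length: 20 }, (_, i) => log.enqueueWrite(`evt-${i}`, { n: i, text: "é".repeat(i) })),
  );
  return { log, logPath };
}

demo().then(async ({ log, logPath }) => {
  console.log(log.index.size, (await stat(logPath)).size);
});
```

The trade-off is throughput, because writes are fully serialized. The queue also coordinates writers only inside one Node process.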

What I Took Away

This project made persistence mechanics tangible. Before Eventrail, "write-ahead log" was a term I'd read about in database internals posts. After Eventrail, the append-only log at its core is something I built and debugged. I now understand viscerally why indexes are separate from storage, why byte offsets matter, and why crash recovery has to be a first-class concern rather than an afterthought.

I also learned that NestJS lifecycle hooks (OnModuleInit, OnApplicationShutdown) are genuinely useful for startup recovery and graceful shutdown, not just ceremony.

Why I Picked This One

Because it broke in ways I didn't expect. The byte offset bug in particular taught me something I wouldn't have learned from a tutorial: the gap between "this works on my test data" and "this works on real data" is often a single Unicode character wide.

Task 2: Employer Assessment System - SkillBridge / CredLane (Team Stage)

What It Was

A complete employer assessment module for a talent platform called SkillBridge (CredLane). The feature lets verified employers create skill assessments, populate them with questions - either from CredLane's own question bank or from company-supplied questions - share them via public links or direct candidate invites, collect submissions with server-side scoring, and review results on a dashboard with pass/fail filtering.

The commit alone was 2,754 lines across 22 files: 4 new entities, 2 database migrations, a full service layer with ~1,000 lines, a controller with 12 endpoints, a test suite with 30+ test cases, and a zero-dependency XLSX parser and builder.

The Problem It Was Solving

Employers on the platform needed a way to screen candidates before making offers. The existing flow had employers browsing profiles and sending offers, but there was no way to verify skills. The assessment system closed that gap: an employer creates an assessment for a specific role track and experience level, sets a time limit and passing threshold, and either shares a public link or sends it directly to shortlisted candidates. Candidates take the assessment, and the employer reviews scored results.

The tricky part wasn't any single feature - it was making all the pieces work together safely. Concurrent assessment creation with an active limit. Duplicate submission prevention under race conditions. Server-side scoring that never trusts the client. File imports that can't crash the server. Shareable links that stop working when deactivated.

How I Approached It

I split the work into layers:

Schema first. I designed 4 tables - employer_assessments, employer_assessment_questions, employer_assessment_invites, and employer_assessment_submissions - with CHECK constraints baked into the migration. Time limits could only be 20, 30, 40, or 60 minutes. Passing thresholds had to be between 50 and 90. Scores had to be between 0 and 100. I wanted the database to reject bad data even if validation somehow slipped past the application layer.
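The post doesn't reproduce the migration, so the column names below are hypothetical. This sketch mirrors the three CHECK rules as an application-side validator, with the corresponding SQL constraint noted next to each rule. The database copy remains the backstop if this layer is ever bypassed.

```typescript
// Hypothetical settings shape; the post states the rules, not the column names.
interface AssessmentSettings {
  timeLimitMinutes: number;
  passingThreshold: number;
}

const ALLOWED_TIME_LIMITS = new Set([20, 30, 40, 60]);

function validateAssessmentSettings(settings: AssessmentSettings): string[] {
  const errors: string[] = [];
  // DB side: CHECK (time_limit_minutes IN (20, 30, 40, 60))
  if (!ALLOWED_TIME_LIMITS.has(settings.timeLimitMinutes)) {
    errors.push("timeLimitMinutes must be 20, 30, 40 or 60");
  }
  // DB side: CHECK (passing_threshold BETWEEN 50 AND 90)
  if (settings.passingThreshold < 50 || settings.passingThreshold > 90) {
    errors.push("passingThreshold must be between 50 and 90");
  }
  return errors;
}

// DB side (submissions table): CHECK (score BETWEEN 0 AND 100)
function isValidScore(score: number): boolean {
  return score >= 0 && score <= 100;
}

console.log(validateAssessmentSettings({ timeLimitMinutes: 45, passingThreshold: 95 }));
```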

Question sources. Employers could either write their own questions or pull from CredLane's verified question bank, filtered by role track and experience level. Company questions required a minimum of 5 before the assessment could be generated. The CredLane bank path queried the existing assessment_questions table with is_live = true, mapped question types between the two systems, and returned up to 10 questions.
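The two question-source rules can be sketched as plain functions. This is an in-memory stand-in, not the service code. The real path queries assessment_questions with is_live = true. The field names here (roleTrack, experienceLevel, isLive) are hypothetical, and the question-type mapping between the two systems is omitted.

```typescript
// Hypothetical in-memory shape of a bank question.
interface BankQuestion {
  id: string;
  roleTrack: string;
  experienceLevel: string;
  isLive: boolean;
}

const MIN_COMPANY_QUESTIONS = 5;
const MAX_BANK_QUESTIONS = 10;

// Company path: refuse to generate the assessment until at least 5 questions exist.
function assertEnoughCompanyQuestions(count: number): void {
  if (count < MIN_COMPANY_QUESTIONS) {
    throw new Error(`Add at least ${MIN_COMPANY_QUESTIONS} questions before generating (have ${count}).`);
  }
}

// CredLane bank path: live questions for the same track and level, capped at 10.
function pickBankQuestions(bank: BankQuestion[], roleTrack: string, experienceLevel: string): BankQuestion[] {
  return bank
    .filter((q) => q.isLive && q.roleTrack === roleTrack && q.experienceLevel === experienceLevel)
    .slice(0, MAX_BANK_QUESTIONS);
}

// Usage: 15 backend questions (every fifth one retired) plus 2 frontend ones.
const bank: BankQuestion[] = Array.from({ length: 17 }, (_, i) => ({
  id: `q${i}`,
  roleTrack: i < 15 ? "backend" : "frontend",
  experienceLevel: "mid",
  isLive: i % 5 !== 4,
}));
console.log(pickBankQuestions(bank, "backend", "mid").map((q) => q.id));
```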

Share and delivery. Each assessment gets a share_token - 24 random bytes hex-encoded. The public endpoint strips correct answers from the response so candidates see only the question text and options. Employers can also send assessments directly to saved candidates, which triggers notifications through the platform's dispatch service.
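Below is a minimal sketch of token generation and answer stripping, using randomBytes from node:crypto. The StoredQuestion shape and the helper names are hypothetical. The public object is built field by field instead of deleting correctAnswer from a copy. That way, a field added to the stored shape later does not leak by default.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical stored shape; only the public subset ever reaches candidates.
interface StoredQuestion {
  id: string;
  text: string;
  options: string[];
  correctAnswer: string;
}

type PublicQuestion = Pick<StoredQuestion, "id" | "text" | "options">;

// 24 bytes from a CSPRNG -> 48 hex characters (192 bits of randomness).
function createShareToken(): string {
  return randomBytes(24).toString("hex");
}

// Copy only the fields a candidate may see.
function toPublicQuestion(question: StoredQuestion): PublicQuestion {
  return { id: question.id, text: question.text, options: [...question.options] };
}

const stored: StoredQuestion = {
  id: "q1",
  text: "Which HTTP method is idempotent?",
  options: ["POST", "PUT", "PATCH"],
  correctAnswer: "PUT",
};
console.log(createShareToken());
console.log(toPublicQuestion(stored));
```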

Scoring. All scoring happens server-side. The client submits { answers: { questionId: selectedAnswer } }, and the service compares each answer against the stored correct_answer, normalizing both sides with .trim().toLowerCase(). The score is Math.round((correct / total) * 100), and passed is score >= passing_threshold. The client never sees correct answers and never computes its own score.
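Here is a sketch of those scoring rules in TypeScript. The question and result shapes are assumptions, and scoring an empty question list as 0 is this sketch's own choice. The normalization, the rounding, and the score >= passing_threshold comparison follow the description above.

```typescript
// Hypothetical shapes; the post describes the rules, not the types.
interface StoredQuestion {
  id: string;
  correctAnswer: string;
}

interface ScoreResult {
  correct: number;
  total: number;
  score: number;
  passed: boolean;
}

const normalize = (value: string): string => value.trim().toLowerCase();

function scoreSubmission(
  questions: StoredQuestion[],
  answers: Record<string, string>,
  passingThreshold: number,
): ScoreResult {
  let correct = 0;
  // Walk the stored questions, not the client's keys: extra or forged IDs add nothing.
  for (const question of questions) {
    const given = answers[question.id];
    if (typeof given === "string" && normalize(given) === normalize(question.correctAnswer)) {
      correct++;
    }
  }
  const total = questions.length;
  const score = total === 0 ? 0 : Math.round((correct / total) * 100);
  return { correct, total, score, passed: score >= passingThreshold };
}

const questions: StoredQuestion[] = [
  { id: "q1", correctAnswer: "Paris" },
  { id: "q2", correctAnswer: "4" },
  { id: "q3", correctAnswer: "B" },
];
const answers = { q1: "  paris ", q2: "4", q3: "C", q99: "anything" };
console.log(scoreSubmission(questions, answers, 70)); // { correct: 2, total: 3, score: 67, passed: false }
```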

What Broke and How I Fixed It

The active assessment limit race. The MVP enforced a limit of 3 active assessments per employer. My first implementation was a simple count-then-insert: query how many active assessments exist, reject if there are already 3 or more, then insert. Classic TOCTOU (time-of-check to time-of-use) bug. Two concurrent requests could both read count=2, both pass the check, and both insert, giving the employer 4 active assessments.

The fix was pessimistic locking inside a transaction. Before counting, I lock the employer's user row with SELECT ... FOR UPDATE:

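```typescript
// Must run on a transactional EntityManager (e.g. inside dataSource.transaction(...));
// TypeORM rejects pessimistic locks outside a transaction.
private async lockEmployerForAssessmentCreation(
  manager: EntityManager,
  userId: string,
): Promise<void> {
  // 'pessimistic_write' becomes SELECT ... FOR UPDATE on Postgres: concurrent
  // callers for the same employer wait here until this transaction ends.
  const user = await manager
    .getRepository(User)
    .createQueryBuilder('user')
    .setLock('pessimistic_write')
    .where('user.id = :userId', { userId })
    .getOne();
  // ...
}
```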

Now the second concurrent request blocks on the lock until the first transaction commits, at which point the count reflects the newly created assessment. No more limit bypass.
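The race and the fix can be reproduced without a database. The sketch below is an in-process analogue, not the production code. A hypothetical KeyedLock serializes work per employer, the way the Postgres row lock does. createAssessment models the count-then-insert, with a short timer standing in for the round trip before the INSERT lands.

```typescript
// Hypothetical per-key async lock: tasks for the same key run one after another.
class KeyedLock {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(task);
    this.tails.set(key, result.catch(() => undefined));
    return result;
  }
}

const ACTIVE_LIMIT = 3;
type Row = { employerId: string };

// Count-then-insert with an await between the check and the write: the TOCTOU window.
async function createAssessment(rows: Row[], employerId: string): Promise<boolean> {
  const active = rows.filter((r) => r.employerId === employerId).length; // SELECT COUNT(*)
  if (active >= ACTIVE_LIMIT) return false;
  await new Promise((resolve) => setTimeout(resolve, 5)); // round trip before the INSERT
  rows.push({ employerId }); // INSERT
  return true;
}

async function demo(): Promise<{ racyCount: number; lockedCount: number }> {
  // Without a lock: both requests see 2 rows, both pass, both insert.
  const racy: Row[] = [{ employerId: "e1" }, { employerId: "e1" }];
  await Promise.all([createAssessment(racy, "e1"), createAssessment(racy, "e1")]);

  // With a lock: the second request counts only after the first has inserted.
  const locked: Row[] = [{ employerId: "e1" }, { employerId: "e1" }];
  const lock = new KeyedLock();
  await Promise.all([
    lock.run("e1", () => createAssessment(locked, "e1")),
    lock.run("e1", () => createAssessment(locked, "e1")),
  ]);
  return { racyCount: racy.length, lockedCount: locked.length };
}

demo().then(({ racyCount, lockedCount }) => console.log(racyCount, lockedCount)); // 4 3
```

An in-memory lock like this protects only a single Node process. That is why the real fix belongs in the database, where every server instance contends for the same row lock.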

Duplicate submissions under concurrency. A candidate could submit an assessment twice if two requests arrived nearly simultaneously. The first check (a findOne for an existing submission) would return null for both, and both would insert. I added a database-level unique constraint on (assessment_id, candidate_user_id), then caught Postgres's 23505 unique-violation error and translated it into a ConflictError:

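```typescript
try {
  return await this.submissionRepo.save(submission);
} catch (error: unknown) {
  // 23505 = unique_violation: a concurrent request already inserted this
  // (assessment_id, candidate_user_id) pair.
  if (isPostgresUniqueViolation(error)) {
    throw new ConflictError('You have already submitted this assessment.');
  }
  throw error;
}
```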

Defense in depth: application-level check for the happy path, database constraint for the race condition.
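The catch block relies on an isPostgresUniqueViolation helper that the post doesn't show. Below is one plausible version, assuming TypeORM on top of node-postgres (pg). pg reports the SQLSTATE on the error's code property, and TypeORM's QueryFailedError keeps the original driver error on driverError, so the sketch checks both.

```typescript
// Postgres SQLSTATE for unique_violation.
const PG_UNIQUE_VIOLATION = "23505";

function isPostgresUniqueViolation(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const candidate = error as { code?: unknown; driverError?: { code?: unknown } };
  // Raw pg errors carry `code`; TypeORM wraps them and keeps the original on `driverError`.
  return candidate.code === PG_UNIQUE_VIOLATION || candidate.driverError?.code === PG_UNIQUE_VIOLATION;
}

console.log(isPostgresUniqueViolation({ code: "23505" }));
```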

XLSX parsing without a library. The requirement was to let employers import questions from CSV or XLSX files. CSV is straightforward - I wrote a character-by-character parser that handles quoted fields and embedded commas. XLSX was another story entirely. XLSX is a ZIP archive containing XML files. Rather than pulling in a heavy library like xlsx or exceljs, I wrote a minimal XLSX parser from scratch: read the ZIP central directory, inflate compressed entries with zlib.inflateRawSync, resolve the first sheet from workbook.xml.rels, parse shared strings, then extract cell values from the sheet XML.
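A character-by-character CSV parser along those lines might look like the sketch below. It is not the project's code. It assumes RFC 4180-style quoting, where a doubled quote inside a quoted field is a literal quote, and it accepts both LF and CRLF line endings.

```typescript
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // "" inside quotes is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") i++; // CRLF counts as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row unless the input ended with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const sample = 'question,answer\n"What is 2, plus 2?","He said ""4"""\r\n';
console.log(parseCsv(sample));
```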

It was more work than using a library, but the result is zero runtime dependencies for this feature, and I actually understand the XLSX format now. The ZIP parsing alone - finding the end-of-central-directory record, reading file headers, calculating data offsets - felt like a smaller version of the byte-offset work I did on Eventrail.
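The ZIP layer described here can be sketched with only node:zlib. This is an illustrative reconstruction, not the CredLane parser. The function names are hypothetical, and it assumes a single-disk archive without ZIP64 extensions. A small hand-built archive at the end serves as a test fixture.

```typescript
import { deflateRawSync, inflateRawSync } from "node:zlib";

interface ZipEntry {
  name: string;
  method: number; // 0 = stored, 8 = deflate
  compressedSize: number;
  localHeaderOffset: number;
}

// The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
// so scan backwards from the end for its signature.
function findEndOfCentralDirectory(zip: Buffer): number {
  const lowest = Math.max(0, zip.length - 22 - 0xffff);
  for (let i = zip.length - 22; i >= lowest; i--) {
    if (zip.readUInt32LE(i) === 0x06054b50) return i;
  }
  throw new Error("not a ZIP file: EOCD record not found");
}

function readCentralDirectory(zip: Buffer): ZipEntry[] {
  const eocd = findEndOfCentralDirectory(zip);
  const count = zip.readUInt16LE(eocd + 10); // total number of entries
  let pos = zip.readUInt32LE(eocd + 16); // where the central directory starts
  const entries: ZipEntry[] = [];
  for (let n = 0; n < count; n++) {
    if (zip.readUInt32LE(pos) !== 0x02014b50) throw new Error("bad central directory header");
    const nameLength = zip.readUInt16LE(pos + 28);
    entries.push({
      name: zip.toString("utf8", pos + 46, pos + 46 + nameLength),
      method: zip.readUInt16LE(pos + 10),
      compressedSize: zip.readUInt32LE(pos + 20),
      localHeaderOffset: zip.readUInt32LE(pos + 42),
    });
    // Fixed 46-byte header + file name + extra field + file comment.
    pos += 46 + nameLength + zip.readUInt16LE(pos + 30) + zip.readUInt16LE(pos + 32);
  }
  return entries;
}

function readEntry(zip: Buffer, entry: ZipEntry): Buffer {
  const at = entry.localHeaderOffset;
  if (zip.readUInt32LE(at) !== 0x04034b50) throw new Error("bad local file header");
  // Data follows the 30-byte local header plus its own name and extra lengths,
  // which may differ from the central directory's copy.
  const start = at + 30 + zip.readUInt16LE(at + 26) + zip.readUInt16LE(at + 28);
  const data = zip.subarray(start, start + entry.compressedSize);
  if (entry.method === 0) return Buffer.from(data);
  if (entry.method === 8) return inflateRawSync(data);
  throw new Error(`unsupported compression method ${entry.method}`);
}

// Fixture: a one-entry deflated ZIP assembled by hand. Unused fields stay zero;
// CRC-32 is left at 0 only because readEntry never checks it.
function buildSingleEntryZip(name: string, content: string): Buffer {
  const nameBytes = Buffer.from(name, "utf8");
  const raw = Buffer.from(content, "utf8");
  const packed = deflateRawSync(raw);
  const local = Buffer.alloc(30);
  local.writeUInt32LE(0x04034b50, 0);
  local.writeUInt16LE(8, 8);
  local.writeUInt32LE(packed.length, 18);
  local.writeUInt32LE(raw.length, 22);
  local.writeUInt16LE(nameBytes.length, 26);
  const central = Buffer.alloc(46);
  central.writeUInt32LE(0x02014b50, 0);
  central.writeUInt16LE(8, 10);
  central.writeUInt32LE(packed.length, 20);
  central.writeUInt32LE(raw.length, 24);
  central.writeUInt16LE(nameBytes.length, 28); // local header offset (byte 42) stays 0
  const eocd = Buffer.alloc(22);
  eocd.writeUInt32LE(0x06054b50, 0);
  eocd.writeUInt16LE(1, 8);
  eocd.writeUInt16LE(1, 10);
  eocd.writeUInt32LE(46 + nameBytes.length, 12);
  eocd.writeUInt32LE(30 + nameBytes.length + packed.length, 16);
  return Buffer.concat([local, nameBytes, packed, central, nameBytes, eocd]);
}

const zip = buildSingleEntryZip("xl/sharedStrings.xml", "<sst><si><t>What is 2 + 2?</t></si></sst>");
const sharedStrings = readCentralDirectory(zip).find((e) => e.name === "xl/sharedStrings.xml");
if (!sharedStrings) throw new Error("sharedStrings.xml missing");
console.log(readEntry(zip, sharedStrings).toString("utf8"));
```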

What I Took Away

This project taught me that the real complexity of backend features isn't in the happy path - it's in the concurrent, adversarial, and edge-case paths. The count-then-insert race, the duplicate submission race, the CSV with embedded commas and quotes - these are the things that separate production code from tutorial code.

I also got a much better intuition for when to push validation into the database versus the application. CHECK constraints, unique indexes, and foreign keys are your last line of defense. If your application layer has a bug, the database should still refuse bad data.

And writing the XLSX parser was a reminder that "just use a library" isn't always the right answer. Sometimes the dependency is heavier than the problem, and building it yourself gives you understanding you'd never get otherwise.

Why I Picked This One

Because it's the project that made me think the hardest about concurrency. The pessimistic locking pattern, the double-layer duplicate prevention, the transactional assessment creation - these are patterns I'll use for the rest of my career. And the XLSX parser was the most fun I've had writing code in months. Parsing binary formats by hand is underrated.

Final Thoughts

The HNG14 internship pushed me further than I expected. Not because the individual concepts were new - I'd heard of append-only logs and pessimistic locking before - but because building them for real, with real constraints, exposed all the gaps between theoretical knowledge and working software.

If you're considering the HNG internship or the HNG premium track, my advice is this: pick the tasks that scare you a little. Those are the ones you'll still be thinking about months later. Those are the ones worth writing about.

DEV Community
