DEV Community

Exiqus

How I Built an Evidence-Based Developer Assessment Platform

So, what started as a side project of a side project, a simple vibe-coded weekend prototype, has now turned into a fully fledged SaaS: months of work, testing every edge case to death, no stone left unturned. This is the story of how I built Exiqus, a GitHub evidence-based developer assessment platform.

Like any side project of a side project, it started with a simple idea, which for me was providing metrics on GitHub repos for hiring managers. The premise was simple enough: take a candidate's repo, analyse it across different metrics, and present the results in a simple dashboard that assigned one of three verdicts: HIRE, PASS, or INVESTIGATE. I didn't think beyond this. The backend was soon underway, tested across multiple repos using the cheapest model I could find. I was extremely stingy about how much I would let an AI model actually analyse; based on the quality of the repo, the system chose either an AI analysis or a template analysis, and whichever ran, if the result hit a threshold it triggered one of the three verdicts.

From there I quickly moved on to finish the remaining backend work: the API endpoints, basic rate limiting, everything a backend entails. Then came the frontend, not my favourite part of the process, but a required one:

Core Framework:

  • Next.js 15.3.5 (React framework with App Router)
  • React 18 (UI library)
  • TypeScript (type safety)

Classic SaaS white background, nothing special. Once the frontend was completed it was time to test the UI. This was a few weeks into the side project of a side project, and at this point I was somewhat pleased with my progress, leveraging AI tools as best I could to get this to launch and generate revenue from the get-go. Or so I thought.

The First Real UI Test

So it came time to run my first analysis on this metric-driven assessment tool. The process was as simple as pasting in a public repo and waiting for the metrics to be generated. It's one thing to see results in a backend environment and another to see them through the UI. The results were bare: the metrics were poor from the start, with minimal use of the AI model. It turned out I had put a strict bottleneck on when the AI should analyse a repo, and on the rare occasions it did, the cost of the analyses was a pittance.

Then I discovered something worse: an underscore bug. I was literally paying for AI analysis and then throwing the results away:

_ = await asyncio.to_thread(...)

One underscore destroying the entire value proposition. If I had kept vibe coding this without properly testing the UX end to end through the UI, I'd have been scamming people.
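The pattern is easy to reproduce. Here's a minimal sketch of what that bug looks like in practice; the `run_ai_analysis` function and its return shape are my stand-ins, not the real Exiqus code:

```python
import asyncio


def run_ai_analysis(repo_url: str) -> dict:
    # Stand-in for the blocking, paid AI call made via a thread.
    return {"repo": repo_url, "insights": ["has tests", "uses CI"]}


async def analyze_buggy(repo_url: str) -> dict:
    # The bug: assigning to `_` silently discards the result,
    # so the analysis is paid for but never reaches the caller.
    _ = await asyncio.to_thread(run_ai_analysis, repo_url)
    return {}  # caller sees an empty analysis


async def analyze_fixed(repo_url: str) -> dict:
    # The fix: capture the result and actually return it.
    result = await asyncio.to_thread(run_ai_analysis, repo_url)
    return result
```

Nothing crashes and no error is logged in the buggy version, which is exactly why end-to-end UI testing caught it when backend spot checks didn't.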

So I went back and ensured that every repo would get AI analysis, again using the cheapest model that was still decent enough to provide meaningful metrics. I ran the analysis again and it was better: it provided metrics with a percentage assigned across various factors (documentation, code implementation, and so on). The overall score came from the individual metrics, and based on the overall score it triggered one of the three verdicts: hire, pass, or investigate.
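That scoring pipeline can be sketched roughly like this. The metric names, weights, and cutoffs here are hypothetical illustrations, not the values the platform actually used:

```python
# Illustrative weights for combining per-metric percentages (0-100).
METRIC_WEIGHTS = {
    "documentation": 0.3,
    "code_implementation": 0.5,
    "testing": 0.2,
}


def overall_score(metrics: dict[str, float]) -> float:
    # Weighted average of the individual metric percentages.
    return sum(METRIC_WEIGHTS[name] * metrics[name] for name in METRIC_WEIGHTS)


def verdict(score: float) -> str:
    # Map the overall score onto the three buckets.
    if score >= 75:
        return "HIRE"
    if score >= 50:
        return "INVESTIGATE"
    return "PASS"
```

This is also exactly the kind of black box the rest of the post argues against: the thresholds are arbitrary, and a candidate has no way to see why they landed in one bucket rather than another.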

What I saw left me underwhelmed, and I told myself I wouldn't use this rubbish for free, let alone charge people for it.

I was disappointed, as this was meant to be the launch pad for my first live project. Three weeks in, and I was left with more questions than the ability to launch. What triggered me was my crappy, naive attempt to automate the hiring process using a single source and metric, GitHub, to determine whether someone should be hired. It was terrible. How can you possibly determine that from one source? It turns out GitHub is only used by around 30% of hiring managers, and even then they don't weight it with any importance; some even consider it a detriment, among other factors that keep hiring managers from asking for repos at all. From my basic research, it also turns out that big tech companies don't even ask for it.

I went back to the drawing board and told myself I wouldn't build a system that makes blanket assessments and judgements based on one factor in the hiring process. Naive of me at best; at worst, completely fucking asinine.

The Great Purge

This led me to question the entire premise of metric-based systems: some arbitrary black box whose makeup only those on the inside know, apparently built on deep algorithms that only they understand, which everyone else must take at face value, giving in to "they know best".

So I decided to rip up that approach and go for a completely evidence-driven one, one the user can fully understand and inspect. No black-box metrics or algorithms that only a few understand; something everyone can follow, because every claim links back to a single point of source.

This became "The Great Purge" - three months of architectural chemotherapy. No more scores. No more verdicts. Only evidence.

This has now become the Exiqus methodology - https://www.exiqus.com/methodology. I have done my best to be transparent about what we actually analyse versus what we don't, being upfront from the get-go: no hidden agendas, no rug pulls, complete transparency for all to understand.

This entire system switch took months of rigorous testing, up to a point where I thought I would never launch; perfection was the enemy of actually fucking launching. I had a cancellation feature that was completely fake - just UI theatre with no backend. I could have spent weeks implementing proper async cancellation. Instead, I deleted it, added a disclaimer saying "Analysis takes 2-3 minutes", and moved on. That decision saved me from an October launch - shipped in September instead.

I wanted to be truly proud of the side project of a side project. Well, at this point it had become more than that: it became my life, working all hours on top of a full-time job. I was consumed by it. I wanted to ensure that the platform would be completely useful and understandable to anyone using it, technical or not, and that it would be qualitative, backed by actual evidence. That's what I built, and it's what you'll see now: a complete evidence-driven assessment tool that analyses any public repo and provides insights, actions, even interview questions, all linking back to evidence drawn from that repo.

This approach required a complete redesign of the platform. The white design stood as a stark reminder of the naive and stupid original approach, so that's what I did: a complete redesign, a slick dark theme to fit the new evidence-driven system.

The Moment of Truth

Now came the UI test. Running an analysis this time, I was finally proud of the thing I'd built. First production analysis: geohot's QIRA. The questions it generated were brutal:

"Your fetchlibs.sh script supports seven architectures. Describe your strategy for handling cross-platform binary analysis - what are the key differences between analyzing ARM vs x86 vs MIPS binaries?"

These aren't LeetCode puzzles. These are questions only someone who actually wrote the code could answer.

It wasn't a feeling of dissatisfaction this time around. The feeling is hard to describe, but I was pleased, and this is something I'm happy to provide to the world. At cost, of course; I'd be lying if I said this doesn't serve a purpose for me, and that purpose is generating revenue to fund another project.

The Bigger Picture

What I hope for with Exiqus, and the bigger picture: tech interviews seem to follow a blueprint of general and technical interviews capped off with tests like LeetCode. As you all know, that's been the standard for some time - https://www.exiqus.com/why. It's another form of standardised test that can be gamed; we all know the story.

The idea for Exiqus as I was building it, especially through the switch from metric-based to evidence-based, is for the GitHub repo to become the norm: for hiring managers to ask for one when candidates apply, and for future candidates to maintain a profile and portfolio of repos. I think it's time we move away from something that can be gamed towards something that is valuable, actual work, instead of studying for tests that, let's face it, won't be used in day-to-day work. It's already been shown that such tests have little to no correlation with actual work performance; what they really measure is stress under a time constraint.

GitHub repos represent work over a natural course of time, like real software and even hardware development; no test can measure that. Now with Exiqus you can see insights, evidence, questions, and more surrounding a candidate's repo, and have an actual meaningful discussion about the work. And guess what: it can't be gamed. The questions we generate are based on the repo itself, and only the person who actually worked on it will be able to answer them in detail. As a hiring manager, you're far more likely to extract real meaning from the answers, which may help you identify whether a candidate is right for the role. We use four contexts, Startup, Enterprise, Agency, and Open Source, and the questions are tailored to whichever one you select.

We want the candidate's actual work to be used as a source during the interview process, not tests that in reality do very little to help you understand the person you're trying to hire. "Instead of testing if they can solve puzzles under pressure, let's look at the actual code you write and have meaningful conversations about it."

For candidates it's simple: revise less for tests and spend the time actually writing code.

Because even one well-documented project reveals more about your abilities than months of algorithmic puzzle solving.


Full disclosure: I don't work in tech, so I could be wrong about everything. This was all from scouring the internet/forums and fundamental research. Maybe the current paradigm is perfect. Maybe Exiqus is just another useless SaaS in a world full of useless SaaS.

Only time will tell.

https://www.exiqus.com/

Hiring managers interested in a free trial: sales@exiqus.com (no card required)
