Recently, I announced SolSistr, a platform that I have been building since October 2024 that organizes sorority recruitment data into a unified system, enabling chapters to manage and enhance their recruitment process through the power of AI.
While I shared the business motivation and launch story on LinkedIn, I wanted to write a technical reflection for those interested in the engineering side of building a SaaS from scratch.
In the following series of posts, I will cover:
- Tech stack overview: Why I chose these tools and frameworks.
- Main features: Integral pieces of the puzzle.
- Pitfalls I encountered: What broke, what was harder than expected, and what I would do differently.
- Lessons learned: For anyone looking to build and ship their own SaaS.
I hope to document the major wins and setbacks I’ve encountered during this development journey and share advice for others on a similar path.
Pitfalls
Although I’d like to say this was a cakewalk, I encountered several challenges and unexpected hurdles that tested both the system’s design and my determination to see it through.
Hometown CSV loading/caching trouble
What it was
As part of the user's profile customization, I added a feature that let users select their hometown from a preloaded CSV of US cities. While having such a comprehensive list available on the client seemed convenient, fetching a 36,000-line CSV on every page load proved extremely taxing. Page load times stretched into double-digit seconds, and I even hit the monthly rate limit on Neon's free tier in a matter of days, making it clear that a more efficient solution was necessary.
Iteration
My first thought was to create a client context to store the list after the initial page load and cache it in local storage. This would allow the app to fetch the CSV only once and reuse the data across sessions, reducing unnecessary downloads and improving load times. While this seemed like a solid solution in theory, it didn’t play well with my development workflow. During testing, I was frequently deleting my profile, signing out, and going through the entire login flow repeatedly. Since I was using NextAuth.js, this process would require me to clear my cache each time I logged out, which meant the cached US Cities CSV was wiped before the expiry I had set. As a result, I started running into errors on reload because the app expected the list to be present when it wasn’t.
After some research, I decided to dynamically query the US Cities table only when needed. Instead of storing the full CSV, user profiles now store the ID of their selected hometown. Whenever a page needs to display multiple user profiles, it performs a bulk fetch for only the required hometowns on page load, rather than loading the entire list.
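A sketch of that bulk-fetch pattern: collect the unique hometown IDs a page references, resolve them in one query, and render from a lookup map. The `City` shape, field names, and the SQL in the comment are my assumptions for illustration, not SolSistr's actual schema.

```typescript
// Hypothetical shape of a row in the US cities table.
type City = { id: number; name: string; state: string };

// Collect the unique hometown IDs referenced by a page of profiles, so a
// single query like `SELECT * FROM us_cities WHERE id = ANY($1)` can
// resolve all of them at once instead of loading the whole list.
function uniqueHometownIds(profiles: { hometownId: number | null }[]): number[] {
  return [...new Set(profiles.flatMap((p) => (p.hometownId === null ? [] : [p.hometownId])))];
}

// Build an id -> city lookup from the bulk query result, so each profile
// card can render its hometown without a second round trip.
function cityLookup(rows: City[]): Map<number, City> {
  return new Map(rows.map((c) => [c.id, c]));
}
```

This keeps the payload proportional to the profiles actually on screen rather than the full 36,000-row dataset.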
Additionally, for selecting a hometown, I built an autocomplete component that filters hometowns based on what the user types. To mitigate lag, I implemented a function that waits until the user has entered at least three characters before dynamically fetching a filtered list of matching cities from the database. This approach significantly reduced load times while still providing a seamless user experience during hometown selection.
```typescript
const fetchCitiesGet = async (searchTerm: string) => {
  // Require at least three characters before hitting the database
  if (searchTerm.length < 3) {
    setCities([]);
    return;
  }
  setLoading(true);
  try {
    const response = await fetch(`/api/get-cities?q=${encodeURIComponent(searchTerm)}`);
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    // Guard against an HTML error page being parsed as JSON
    const contentType = response.headers.get('content-type');
    if (!contentType || !contentType.includes('application/json')) {
      throw new Error(`Expected JSON but got ${contentType}`);
    }
    const fetchedCities = await response.json();
    setCities(fetchedCities);
  } catch (error) {
    console.error('Error fetching cities:', error);
    setCities([]);
  } finally {
    setLoading(false);
  }
};
```
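The character minimum alone still fires a request on every keystroke past the threshold. A common complement, and an assumption on my part rather than something from the original post, is to debounce the fetch so it only runs once the user pauses typing:

```typescript
// Generic debounce: delays invoking `fn` until `ms` milliseconds have
// passed since the last call, so rapid keystrokes collapse into a single
// request against /api/get-cities.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Example wiring: only the final value typed within the window is fetched.
const debouncedFetch = debounce((term: string) => {
  // fetchCitiesGet(term) would go here
  console.log(`fetching cities for: ${term}`);
}, 300);
```

A delay around 250 to 300 ms is a typical starting point for autocomplete inputs.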
Chat component (scope creep)
What it was
Initially, I envisioned a chat feature within the app to allow seamless communication all in one place. While this sounded valuable on paper, I found myself more interested in the technical aspects of its integration than in the necessity of the component itself. I went as far as designing a UI template and researching WebSocket integration to enable real-time messaging between chapter members and admins.
Iteration
While this feature might still be worth adding in the future, I realized that building a fully functional chat system at my current skill level would expand the project's scope without directly advancing its core purpose. Since the beginning of this project, my dad (goated SWE) kept telling me, “You need to focus on the nuts and bolts—what are the bare necessities this thing needs to work?” A system like this would require message persistence, notification toasts, and delivery guarantees: a ton of additional infrastructure that would divide my attention from the core goals of the project, especially while taking classes at the same time. It became clear that a full chat system wasn’t just a feature; it was practically a separate product, and taking it on would risk delaying the parts of the project that mattered most.
University selection / university domain search
What it was
Initially, I handled university selection very similarly to the way I handled US city selection. I found a US universities CSV online and loaded it into a searchable autocomplete component that users could modify on their profile. While this CSV was a lot smaller than the list of US cities, there was still one major concern with this process: How often do users change which university they go to?
More importantly, a fundamental concern for the app as a whole was verifying that users are actually college students. We needed to prevent random individuals from creating accounts and interfering with the recruitment process at schools they had no intention of attending.
Iteration
My solution was to graft in another dataset, a World Universities CSV, and filter it down to universities located in the US. I then attached each university’s list of email domains to my dataset, allowing the system to verify a user’s enrollment by checking that their provided email matched a valid university domain. The system also verifies that no other existing account has been verified with the same university email, ensuring that each account is attached to one and only one university email. I ended up choosing Resend, an email-sending API that allowed me to send verification links with an attached activation token to the user’s school email. Once the user clicked the link, their account would be verified and activated automatically.
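The verification logic might look something like the sketch below. The university IDs, domain map, and helper names are hypothetical, not SolSistr's actual data model; in practice the token would be stored alongside the account and delivered in a link via Resend's Node SDK (`resend.emails.send`).

```typescript
import { randomBytes } from "crypto";

// Hypothetical mapping from a university ID to its approved email domains,
// as derived from the merged World Universities dataset.
const universityDomains: Record<string, string[]> = {
  "u-texas": ["utexas.edu"],
  "u-michigan": ["umich.edu"],
};

// Check that the email's domain belongs to the claimed university.
function emailMatchesUniversity(email: string, universityId: string): boolean {
  const domain = email.split("@")[1]?.toLowerCase();
  if (!domain) return false;
  return (universityDomains[universityId] ?? []).some(
    (d) => domain === d || domain.endsWith(`.${d}`) // allow subdomains like cs.utexas.edu
  );
}

// Generate an unguessable activation token to embed in the verification link.
function activationToken(): string {
  return randomBytes(32).toString("hex");
}
```

Enforcing one account per verified email would then be a uniqueness check on the stored, verified address before the link is ever sent.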
OpenAI Prompting
What it was
In the member dashboard, I knew I wanted to provide AI-generated suggested conversation starters to help members prepare for interactions with PNMs. However, I quickly discovered I had no idea what kinds of roadblocks I would encounter when trying to get the AI to output exactly what I needed. There were a couple of funny outputs I ran into during testing.
Iteration
To begin, I aimed to prioritize conversation starters that highlighted strong similarities between the two users’ profiles. However, this approach sometimes caused the AI to tie together unrelated topics in ways that didn’t make sense, often resulting in nonsensical or awkward language. For example, when comparing User A and User B where User A had a strong interest in weightlifting and User B had a strong interest in hiking, it output:
> Hey [User B Name], I see you are into hiking and I am into weightlifting, maybe one day we can meet up and hike a mountain and do a dumbbell workout at the top!!
While it could technically be argued that the similarity here is “physical activity,” this is an oddly specific and unrealistic suggestion that would likely lead to an awkward conversation rather than a natural icebreaker.
Another issue I kept running into was language that felt unnatural in casual conversation, with outputs such as:
> [User B Name], I see you are into gaming, have any spectacularly epic wins recently?
and
> I see you are into cooking and watching movies, what is your favorite movie-inspired cuisine that you have made so far?
While these results are kind of funny, they use phrasing that feels overly formal or exaggerated, with words like “spectacularly epic” or oddly specific phrases like “movie-inspired cuisine” that most people wouldn’t naturally say. This type of language risks making conversations feel forced or awkward, which is the opposite of what I wanted for the members using the app.
Through my tests, I found a set of prompt parameters that consistently led to clearer, more natural conversation starters while avoiding awkward phrasing and irrelevant topic matching. Some of the prompt parameters that helped were:
- **Do not force fake connections** — if there’s no match on a trait, don’t pretend there is.
- **Instead of trying to come up with a conversation starter relating the two profiles**, focus on {UserB}'s profile and individual interests, and how you can start a conversation with them.
- **Avoid niche references** (e.g., obscure pop culture, overly technical topics).
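To make those parameters concrete, here's an illustrative way to assemble them into a prompt. The `Profile` shape and the exact wording are mine, not the production prompt:

```typescript
type Profile = { name: string; interests: string[] };

// Build a system prompt that bakes in the rules above. This string would
// be sent as the system message to a chat-completions call.
function buildStarterPrompt(userA: Profile, userB: Profile): string {
  return [
    `You write one short, casual conversation starter from ${userA.name} to ${userB.name}.`,
    `${userB.name}'s interests: ${userB.interests.join(", ")}.`,
    `${userA.name}'s interests: ${userA.interests.join(", ")}.`,
    "Rules:",
    "- Do not force fake connections; if no trait matches, do not pretend one does.",
    `- Focus on ${userB.name}'s own interests rather than linking the two profiles.`,
    "- Avoid niche references and exaggerated language; keep the tone natural and conversational.",
  ].join("\n");
}
```

Keeping the rules as explicit bullet points in the system message, rather than buried in a paragraph, made it easier to test one constraint at a time.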
The Word2Vec Model and Deployment Concerns
What it was
Originally during development, I generated users' profile vectors with a locally downloaded Word2Vec model that I loaded at project runtime. Despite the convenience of having a free and reliable text-embedding model downloaded locally and loaded with the gensim library, the size of the model became a concern once I started going through the motions of deploying my Flask app. During deployment, the build would often fail because the app would try to start before the 3GB Word2Vec model was fully loaded into memory. This created a race condition where health checks would time out, causing the process to be terminated prematurely.
Iteration
Initially, I implemented lazy loading for the model to ensure the build at least succeeded and the application could start without an immediate dependency on the Word2Vec model. This approach passed the build health checks, but triggering the lazy load caused waits of upwards of 60 seconds while the entire model loaded into memory. This performance bottleneck led me to explore external API options for generating profile vectors, and I ultimately landed on OpenAI's text embedding models. This solution offered two key advantages over the local Word2Vec implementation. First, OpenAI's embeddings have roughly five times the dimensionality of the standard 300-dimension Word2Vec vectors, resulting in richer vector representations. Second, by handing off embedding generation to an external service, I eliminated the loading process entirely, dramatically improving response times for users. My primary concern with this approach was the potential cost of OpenAI's API. However, after testing with sample data, I found that generating 30,000 profile vectors cost approximately $0.02, making the expense negligible compared to the substantial improvement in user experience.
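A sketch of the replacement path, assuming OpenAI's REST embeddings endpoint and the `text-embedding-3-small` model (the post doesn't name the exact model used), plus the cosine similarity that is typically used to compare profile vectors:

```typescript
// Fetch an embedding for a profile's text from OpenAI's embeddings
// endpoint. Assumes an API key in the OPENAI_API_KEY env var.
async function embedProfile(text: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const data = await res.json();
  return data.data[0].embedding;
}

// Cosine similarity between two vectors: the standard way to score how
// alike two embedded profiles are (1 = identical direction, 0 = unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the model now lives behind an API, there is no cold-start load at all, which is what removed the 60-second stalls.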
Concluding Remarks
Overall, I am extremely proud of what I’ve been able to build over the past year. After tens of thousands of lines of code and likely thousands of hours of work, I now have something that’s not only close to being fully usable but also capable of solving real problems that people I know genuinely struggle with. Throughout this process, I’ve seen a tremendous improvement in my ability to plan sprints, assess complex problems, and develop code-driven solutions for abstract challenges. Beyond just writing code, this project has taught me how to iterate efficiently and balance building features with maintaining a clear focus on the core goals of the product and business. I’m excited to see how this project continues to grow and to carry these skills into whatever I build next.
Thanks for reading!
-Evan