<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Loup Topalian</title>
    <description>The latest articles on DEV Community by Loup Topalian (@olup).</description>
    <link>https://dev.to/olup</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F263738%2F7ab27d2c-90f4-4110-827a-57be152055db.png</url>
      <title>DEV Community: Loup Topalian</title>
      <link>https://dev.to/olup</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olup"/>
    <language>en</language>
    <item>
      <title>How we used gpt-4o for image detection with 350 very similar, single image classes.</title>
      <dc:creator>Loup Topalian</dc:creator>
      <pubDate>Fri, 10 Jan 2025 20:44:23 +0000</pubDate>
      <link>https://dev.to/olup/how-we-used-gpt-4o-for-image-detection-with-350-very-similar-single-image-classes-3lmj</link>
      <guid>https://dev.to/olup/how-we-used-gpt-4o-for-image-detection-with-350-very-similar-single-image-classes-3lmj</guid>
      <description>&lt;p&gt;This story recounts a challenging request that emerged in our small engineering team and how we solved it. The final solution demonstrates how LLMs have shifted what product-oriented teams can achieve with AI.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;AskMona is a small company with an even smaller engineering team, striving to provide AI solutions to niche markets. We originally specialized in cultural institutions like museums and foundations but have since expanded into tourism and education.&lt;/p&gt;

&lt;p&gt;One aspect of our product involves using computer vision to match points of interest—such as artworks in exhibitions—with personalized experiences. Due to our innovative reputation in the cultural field, a museum approached us with a specific request:&lt;/p&gt;

&lt;p&gt;They had a large collection of car illustrations printed on exhibition walls and needed an app to match a picture of a car to its related content - identifier, image and links to more information. They also had a lean budget, so maximizing efficiency in infrastructure and maintenance was crucial.&lt;/p&gt;

&lt;p&gt;The signature predated my arrival, but I know that our product team had initially planned to deliver an augmented reality experience via a web app, leveraging the image-based tracking abilities of this stack. This was not something we had in our main product line, but the tech lead at the time felt it was the way forward. A specialized AR partner, an established company in the field, was even identified to help us through any potential difficulties. The project was then shelved for later development as other matters took priority.&lt;/p&gt;

&lt;p&gt;When I joined as CTO months later, we prepared for the first delivery milestones. At this point I took a detailed look at the images we were given, and I remember my sudden reaction: "but they all look the same".&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unfortunately I don't have clearance to publish the actual pictures here, yet. Let me try to describe them as well as I can.&lt;br&gt;
Each image is an illustration of a car, seen from the left side, flat on a white background, always filling the same space. They have a slightly painterly quality, realistic without being photographs. Every illustration is different, but sometimes very slightly so.&lt;br&gt;
For instance, most of the cars are red (with some gray or blue as well), and many share similar shapes. Sometimes it's even the same model, with subtle updates made over the years.&lt;br&gt;
In the most extreme cases the cars are almost identical, even featuring the same decoration and writing on the body, only arranged a bit differently.&lt;br&gt;
The human eye can differentiate two of them side by side, but finding one of them in the cluster is very challenging (we know).&lt;br&gt;
Did I mention there are 350 of them?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The provided images were the digital originals used for printing the wall display, laid out in 6 or 7 staggered rows. On top of that, we had just a few pictures of the actual wall, but it was clear that visitors' snapshots would introduce distortion, lighting and color variation, and shadows.&lt;/p&gt;

&lt;p&gt;Quickly, it became apparent that the client wouldn't have time to capture additional real-life snapshots of the illustration on the wall. And to complicate matters, the museum was over a thousand kilometers away, in another country.&lt;/p&gt;

&lt;p&gt;That was the challenge we faced.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;I knew from previous experience that web-AR technology has limitations. Having seen the images, I felt doubtful about the approach; this was confirmed by our initial proof of concept. I quickly turned to our AR partner for help, providing a detailed view of our use case for feasibility confirmation.&lt;/p&gt;

&lt;p&gt;The response came back: They recommended against the approach, arguing that even their battle-tested technology could not handle such a case.&lt;/p&gt;

&lt;p&gt;For instance, their system simply would not accommodate 350 detection markers at once. On top of that, the size of their bundled dependency files alone would make our app's user experience unwieldy. And finally, the very similar-looking pieces would likely confound the system.&lt;/p&gt;

&lt;p&gt;Suddenly, with a looming deadline, we found ourselves without a viable plan. Fortunately, our customer success team negotiated an extension with our client, but we still needed a solution to justify the delay.&lt;/p&gt;

&lt;p&gt;And so we went to work.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;Our small team, though primarily product-focused, has some ML background. Before pivoting to an LLM-based solution, we had built our own NLP models. So our first step was to experiment with training an on-device image classification model, with MobileNet emerging as the ideal candidate.&lt;/p&gt;

&lt;p&gt;MobileNet is a lightweight image classification model, pre-trained on a large dataset yet optimized for mobile devices. Its architecture is simple, making it easy to load and run in the browser's JavaScript through the ONNX runtime, which in turn meant very light infrastructure and maintenance effort, a requirement for the project.&lt;/p&gt;

&lt;p&gt;To leverage the model's training while specializing it for our challenging use case, we tried transfer learning - a classic strategy where only the final layers of a neural network are trained on the specific dataset, while the model carries the general knowledge inferred during its pre-training phase. However, transfer learning typically requires hundreds of images per class, and we only had one. To address this limitation, we turned to data augmentation, artificially creating new versions of each image by modifying colors, adding noise, applying distortion, or rotating images. By the end, we had generated 600 augmented images per car.&lt;/p&gt;
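&lt;p&gt;As an illustration, the augmentation step can be sketched like this. This is a hypothetical toy, not our actual pipeline: it only models the random parameter sampling, and the field names and ranges are made up for the example; the real transforms were applied to pixels with an image-processing library.&lt;/p&gt;

```typescript
// Hypothetical sketch of data augmentation parameter sampling.
interface Augmentation {
  rotateDeg: number; // small rotation, simulating camera tilt
  hueShift: number;  // color variation from lighting conditions
  noise: number;     // sensor-noise amplitude
  scale: number;     // zoom in/out around 1.0
}

// rng is injectable so the sampling can be made deterministic in tests
function sampleAugmentation(rng: () => number = Math.random): Augmentation {
  const range = (min: number, max: number) => min + rng() * (max - min);
  return {
    rotateDeg: range(-10, 10),
    hueShift: range(-0.1, 0.1),
    noise: range(0, 0.05),
    scale: range(0.8, 1.2),
  };
}

// 600 augmented variants per source image, as described above
function augmentationPlan(n = 600): Augmentation[] {
  return Array.from({ length: n }, () => sampleAugmentation());
}
```

&lt;p&gt;Each sampled descriptor is then applied to the single source image, turning one example per class into hundreds.&lt;/p&gt;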

&lt;p&gt;Our team experimented with various training parameters until we arrived at a model that seemed promising. Early tests with a few real life snapshots were encouraging. I felt cautiously optimistic — proud, even — of our resourceful solution.&lt;/p&gt;

&lt;p&gt;Yet, more extensive testing revealed inconsistent results. Multiple snapshots of the same car often yielded different matches. The client’s initial trials with our alpha app displayed similar issues, as the model struggled to identify the correct car illustration consistently.&lt;/p&gt;

&lt;p&gt;To improve reliability, we implemented some user-guiding features, such as a viewfinder to align the camera with the cars and a multi-snapshot background process to gather additional data. Despite these improvements, the solution wasn’t reliable enough for public use.&lt;/p&gt;

&lt;p&gt;At this point, the client started to express concerns about our ability to deliver.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;Meanwhile, part of the team was working on enhancing our main product's image recognition pipeline. A few words about our method.&lt;/p&gt;

&lt;p&gt;To do image matching we use nearest neighbor search (KNN). This works by encoding the catalog of images into embeddings - a translation of each image produced by a pre-trained model, mapping "features" and meaning into a large numerical vector. When a candidate image comes in from our users, we convert it into an embedding too. Then we search for the nearest vectors in our catalog and return the closest matches. This method shines because most of the work is done at indexing time, when converting the images. But it also means the system relies on the embedder's quality.&lt;/p&gt;
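&lt;p&gt;The matching step can be sketched in a few lines of TypeScript. This is a toy, in-memory version under assumed types (embeddings as plain number arrays, a small catalog of ids), not our production code.&lt;/p&gt;

```typescript
// Minimal nearest-neighbor search over embeddings, using cosine similarity.
type Embedding = number[];

function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every catalog entry against the query and keep the k best.
function nearestNeighbors(
  catalog: { id: string; embedding: Embedding }[],
  query: Embedding,
  k = 3,
): { id: string; score: number }[] {
  return catalog
    .map(({ id, embedding }) => ({ id, score: cosineSimilarity(embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

&lt;p&gt;The catalog embeddings are computed once at indexing time; only the incoming query image needs embedding at request time.&lt;/p&gt;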

&lt;p&gt;Every provider is releasing multimodal embedding services these days, but image embedding could not really be found as a serverless service at the time, so our system used a self-hosted VGG16, an image classification model, to convert pictures to vectors. This worked fine for museum collections, which are usually diverse-looking, but the embedding quality was limited, and we did not think it would work reliably with the very similar images of our car project.&lt;/p&gt;

&lt;p&gt;The release of the AWS Titan multimodal model, alongside an image embedding endpoint, changed the situation: such a large model, jointly trained on text and image, would map finer features from our images. Eager to simplify our architecture, lower our costs, and benefit from better quality embeddings we migrated our main pipeline to use it. Pleased with the results, we thought - why not give it a try on our cars problem?&lt;/p&gt;

&lt;p&gt;The proof of concept was quickly implemented. Most of the work could be preloaded on the user's device; we only needed to fetch the embedding for the incoming image. Initial results felt a lot more reliable; some cars were matched correctly time and again across various angles and lighting conditions. But on the other hand, some cars would never be matched, the system always offering another, very similar one in their place.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;We communicated progress to the client and even sent a team member to conduct real-life tests and gather more pictures of the actual setup. This confirmed our initial findings of stability combined with partial success, bringing hope and despair in equal measure. But we took the time to analyze the results, and we found something interesting.&lt;/p&gt;

&lt;p&gt;When detection failed, the proper match often sat in second or third position, a small distance behind the first. Our solution suddenly felt just an inch shy of working.&lt;/p&gt;

&lt;p&gt;We proposed mitigating this by offering two or three options to the user when candidates were close. If presented neatly, this could complete the feature and visitors would be able to identify the cars with a little natural intelligence on top of the AI.&lt;/p&gt;
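&lt;p&gt;The proposal boils down to a thresholding rule: surface every candidate whose score sits within a margin of the best one. A minimal sketch, with an illustrative margin value (not the one we used):&lt;/p&gt;

```typescript
// Return up to three candidates whose score is within `margin` of the best,
// so the visitor can pick the right car when the system is unsure.
interface Candidate {
  id: string;
  score: number; // similarity score, ranked descending
}

function candidatesToShow(ranked: Candidate[], margin = 0.02): Candidate[] {
  if (ranked.length === 0) return [];
  const best = ranked[0].score;
  return ranked.filter((c) => best - c.score <= margin).slice(0, 3);
}
```

&lt;p&gt;When the top score stands clearly apart, the list collapses to a single entry and the experience stays fully automatic.&lt;/p&gt;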

&lt;p&gt;However, this approach would lose the magical effect of having a machine identify what was on the screen. Understandably, the client ultimately dismissed the proposal.&lt;/p&gt;

&lt;p&gt;With other pressing work on our main product, it felt we had given this project all we could. It was time to move on.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;But the story wasn't over.&lt;/p&gt;

&lt;p&gt;After all, we were so close. From three hundred fifty very similar images, we were down to just three - a worthy achievement. The only thing we needed at this point was to find "something" to identify the correct one.&lt;/p&gt;

&lt;p&gt;There was a lot of activity at that time around SOTA multimodal LLMs doing incredible things with images. People could do handwriting recognition or get segmented bounding boxes from their prompts. Some talked about moving OCR pipelines to vision LLMs.&lt;/p&gt;

&lt;p&gt;So we decided to give it a try: why not prompt a vision model, like gpt-4o, to do this last step for us? For the prompt, we opted for one user message per image - the three candidates and the reference - each containing the image data and some identifier text, plus one last user message to instruct the model. The output is the image identifier, or a specific code if a match isn't found.&lt;/p&gt;
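&lt;p&gt;Here is a sketch of that prompt layout, shaped like the OpenAI chat messages format with image content parts. The identifiers, wording and the "NO_MATCH" code are illustrative, not our actual prompt.&lt;/p&gt;

```typescript
// One user message per image (three candidates plus the visitor's snapshot),
// then a final instruction message, as described above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface Message {
  role: "user";
  content: ContentPart[];
}

function buildMatchPrompt(
  candidates: { id: string; dataUrl: string }[],
  snapshotDataUrl: string,
): Message[] {
  const messages: Message[] = candidates.map((c): Message => ({
    role: "user",
    content: [
      { type: "text", text: `Candidate ${c.id}:` },
      { type: "image_url", image_url: { url: c.dataUrl } },
    ],
  }));
  messages.push({
    role: "user",
    content: [
      { type: "text", text: "Reference (visitor snapshot):" },
      { type: "image_url", image_url: { url: snapshotDataUrl } },
    ],
  });
  messages.push({
    role: "user",
    content: [
      {
        type: "text",
        text:
          "Which candidate shows the same car as the reference? " +
          "Answer with its identifier only, or NO_MATCH if none matches.",
      },
    ],
  });
  return messages;
}
```

&lt;p&gt;The resulting array is what gets sent to the chat completions endpoint; parsing the reply is then a simple string comparison against the known identifiers.&lt;/p&gt;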

&lt;p&gt;And suddenly, we had a really good solution.&lt;/p&gt;

&lt;p&gt;I got to test it again and again. Obviously, it is not perfect; the software will still get mixed up in some cases (the twin cars with a “4” on the body described earlier are among them; they're just too similar). However, in the vast majority of cases, the results felt like magic.&lt;/p&gt;

&lt;p&gt;The client was pleased with the results and greenlit the development of the web app supporting the feature.&lt;/p&gt;

&lt;p&gt;This solution proved so effective that we migrated our main image matching service to use it. It resolved the long-standing problem of assigning meaning to similarity scores (how close must embeddings be to count as a match?). Previously, we depended on complex heuristics and a trial-and-error approach, which resulted in false positives. Now, for any ambiguous case, a multimodal model can verify potential matches.&lt;/p&gt;

&lt;p&gt;× × ×&lt;/p&gt;

&lt;p&gt;In summary, our solution filters the images with a KNN search based on image embeddings, then feeds the top candidates to an LLM to check for a match when any doubt remains - a setup not unlike the text retrieval of our chat products, based on a KNN search and a reranking/filtering step leveraging an LLM.&lt;/p&gt;

&lt;p&gt;While this implementation is slower than our previous KNN-only setup, the results have been exceptional. We have yet to experiment with faster, smaller models. Costs are somewhat higher, but the low-resolution image input (initially down to 512x512) has proven adequate for our use case while minimizing token count. A full prompt for image matching costs around 0.0001 USD with gpt-4o.&lt;/p&gt;

&lt;p&gt;More generally, this small adventure illustrates how large language models are changing the dynamics between engineering, product, and AI. It reminds me of how cloud computing and commoditized tooling moved complex processes - ones that used to require specialized teams - into the hands of generalist product engineers. In the same way, using AI in a product no longer requires dedicated engineers, allowing teams to focus on framing domain problems, understanding users, and building products.&lt;/p&gt;

&lt;p&gt;LLMs and the platforms powering them are quickly becoming one-stop shops for any ML-related task. From my perspective, the real revolution is not the chat ability or the knowledge embedded in these models, but rather the versatility they bring in a single system. We experience this at our company: while we do offer chat-based experiences in some of our products, most of the interesting ways we use LLMs are similar to the example presented in this article: small, behind-the-curtain tool use and simple agentic workflows. Text categorization, OCR, document analysis and extraction, search result reranking, recommendations, image matching, and so on.&lt;/p&gt;

&lt;p&gt;As a parting word, I'd say that I am really excited for the future. Not for the ever-larger models and the mythical allure of AGI, but for the optimizing phase that is already ongoing, with models getting smaller and smaller for the same quality. Having access to this kind of tooling on a regular server-grade CPU, or even on users' devices, opens up whole new playgrounds for crafting products.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>computervision</category>
      <category>challenge</category>
    </item>
    <item>
      <title>The tools for the job - How I code frontend apps in 2020</title>
      <dc:creator>Loup Topalian</dc:creator>
      <pubDate>Tue, 18 Feb 2020 14:17:47 +0000</pubDate>
      <link>https://dev.to/olup/the-tools-for-the-job-how-i-code-frontend-apps-in-2020-e03</link>
      <guid>https://dev.to/olup/the-tools-for-the-job-how-i-code-frontend-apps-in-2020-e03</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This is an exciting time to be a coder, especially a JavaScript one. Not only is the market large and the pay high, but our tooling has never been so powerful, allowing one to build apps and systems at unprecedented speed. Long gone are the days of my teens, painfully hacking together a site with PHP, Dreamweaver and some buggy JavaScript snippets pushed online over FTP. With a clean decoupling between the front and the back, major frameworks enabling declarative code and single-page applications, source control and pain-free deployment processes, we are living in a golden age of web coding. And this is especially true in JavaScript, be it server-side or client-side: I can't think of another ecosystem as teeming with ideas and changes as this one.&lt;/p&gt;

&lt;p&gt;The tradeoff is the abundantly-discussed "JS fatigue", the pain of discovering that you cannot rely on what you were taught five years ago. In JS-land, you have to be constantly on the lookout for new libraries, good practices, and trends. The reason for this is the huge participation of the community in trying to find better solutions to universal problems. In this booming environment, a change in one corner of the world can transform the trade quickly. Some solutions that were leading two years ago are now considered old-fashioned. Sometimes it's just trends; sometimes it's a paradigm shift for the better. Evaluating the solidity of a JS library is a skill in itself that JavaScript devs have to learn. So doing JavaScript means engaging in a life of questioning and experimenting (which may be why autodidacts fare so well in this environment, as they adapt very well), and I believe that this is exactly &lt;em&gt;why&lt;/em&gt; doing JavaScript is so exciting to me.&lt;/p&gt;

&lt;p&gt;So in this post, I just wanted to share the latest setup I have found to work when starting a frontend project. Nothing revolutionary here, this won't be news, and at times you might disagree with my views. But on the other hand it might make you curious about something you haven't heard of, or give you the last push to finally try that stuff everyone has been talking about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Language
&lt;/h2&gt;

&lt;p&gt;I am talking here about front-side development, a land completely dominated by JavaScript, since it's the only dynamic programming language that can execute in a browser - or it was, until the recent rise of WebAssembly. But even if we can find some amazing early react-like or vue-like frameworks for other languages (for example &lt;a href="https://www.vugu.org/" rel="noopener noreferrer"&gt;Vugu&lt;/a&gt;), JS will likely keep managing the bulk of frontend apps for a long time, giving way to native implementations only for heavy computing (like, say, video editing or 3D). So JavaScript is the language of choice for frontend apps.&lt;/p&gt;

&lt;p&gt;But for new projects, I now always use TypeScript - the clear winner of the JS types battle, and a very pleasant superset to use. It is so good, and actually so easy, that I rarely code without it, even for a technical interview test or when coding a micro app to track my newborn daughter's diapers. So good that I have started to refuse jobs when they don't use TypeScript, as I don't want to go back to refactoring hell. Pretty strong move from a guy who was saying he &lt;em&gt;"did not believe in it"&lt;/em&gt; a bit over three years ago.&lt;/p&gt;

&lt;p&gt;Anyone saying such things probably hasn't used it, or only barely. Just give it a real try and you will see the enormous number of problems it solves. Not only does it impose good standard practices and replace the transpiling chain, but it also gives you beautiful IDE IntelliSense, the thing that boosts your productivity tenfold and provides strong confidence in your code. It is not a perfect silver bullet; you still have to test your code. But never again did I have to fry my brain when deciding to change the signature of one function: my IDE would tell me straight away that it needed refactoring in ten different files.&lt;/p&gt;

&lt;p&gt;The intellectual and time investment is small - at least to get started with basic types and inference - but the payoff is unfathomable until you feel it in everyday life.&lt;/p&gt;
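&lt;p&gt;A toy illustration of that safety net, borrowing the diaper-tracking app mentioned above (the types and function here are made up for the example): once the shape of the data is declared, every consumer is checked against it.&lt;/p&gt;

```typescript
// Declaring the data shape once...
interface Diaper {
  changedAt: Date;
  kind: "wet" | "soiled"; // only these two strings are allowed
}

// ...means every function using it is verified at compile time.
function countToday(diapers: Diaper[], now: Date = new Date()): number {
  return diapers.filter(
    (d) => d.changedAt.toDateString() === now.toDateString(),
  ).length;
}

// Renaming `changedAt` or widening `kind` would now flag every call site
// in the editor, instead of failing silently at runtime.
```

&lt;p&gt;This is exactly the refactoring scenario above: change the interface, and the compiler lists every file that needs updating.&lt;/p&gt;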

&lt;p&gt;&lt;strong&gt;So bottom line: I use &lt;a href="https://www.typescriptlang.org/" rel="noopener noreferrer"&gt;TypeScript&lt;/a&gt; for frontend projects, and I strongly believe you should, too.&lt;/strong&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Framework
&lt;/h2&gt;

&lt;p&gt;JavaScript is a language that can run in the browser (but also outside it, think Node.js). In this particular position, it has access to the DOM, the tree of all elements on our page, and can manipulate it. JavaScript is &lt;em&gt;imperative&lt;/em&gt;: you are telling your &lt;code&gt;div&lt;/code&gt; with id &lt;code&gt;title&lt;/code&gt; to change its content to the title you got from an XHR request. But when you have fifty such divs and a mighty complicated routing strategy, things become quite unmanageable. This is why the JavaScript frontend frameworks are so popular: they shift to a &lt;em&gt;declarative&lt;/em&gt; paradigm. Link some variables to the 50 divs, change the content of the JS variable, and the 50 divs will change at once, without you worrying about making it happen. It also helps to decouple your app into reusable components, dividing the code into manageable chunks.&lt;/p&gt;

&lt;p&gt;There are but three frameworks widely used today, and one of them is used far more than the other two, for, I believe, very good reasons. I won't launch into a comparison of them - whatever floats your boat, contract, abilities, etc. For me, having tried all of them, I go &lt;strong&gt;React&lt;/strong&gt; all the way. If you have never tried it, or still think that it's arcane and complicated, I'd invite you to type &lt;code&gt;npx create-react-app myApp --typescript&lt;/code&gt; in your terminal and see what fun it is to start a new React project. I actually start all my (non-SSR, see below) projects with &lt;code&gt;create-react-app&lt;/code&gt;; it has a perfect blend of opinions and freedom. I never feel any need to eject.&lt;/p&gt;

&lt;p&gt;React keeps pushing new ideas and practices. I would recommend following them, as they stem from a simple yet powerful understanding of recurring pains in a coder's life. React is truly elegant at heart. So there is no excuse not to use the latest features, like hooks and context, and to keep moving as they get released.&lt;/p&gt;

&lt;p&gt;To be honest, it's been a year since I last wrote a class component - and for the best!&lt;/p&gt;

&lt;p&gt;Finally, TypeScript plays extremely well with React, offering an elegant way to type props and state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So bottom line: I use &lt;a href="https://reactjs.org/" rel="noopener noreferrer"&gt;React&lt;/a&gt;, with the latest features.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  API
&lt;/h2&gt;

&lt;p&gt;You are feeling that I am taking no risk here, just following the classic hype? Well, I am going to do it again!&lt;/p&gt;

&lt;p&gt;You don't always have a say in the API the backend team is choosing. But when it's early enough (or when I also work on the backend team) I always try to push in the GraphQL direction.&lt;/p&gt;

&lt;p&gt;An API is the language a server understands when another machine asks it a question. There are many specifications one can use to build an API, but as far as the communication between a browser JavaScript application and a server is concerned, we mainly see REST (or REST-like implementations) or, more recently, &lt;strong&gt;GraphQL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GraphQL, in terms of services rendered, is the TypeScript of APIs. It changed the way I work as a React frontend coder and made it so much better that I wish never to go back to REST. For those who haven't heard much more of it than the name, I could start by describing it as what your REST endpoint would look like if you added a particularly expressive query system to select each field you want returned - plus each field of any relation, at any level of nesting. And it also self-documents, self-validates, generates a playground to test it, and lets you generate the TypeScript types for any query in a single CLI command. So, yeah, pretty good.&lt;/p&gt;
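&lt;p&gt;To make the field-selection idea concrete, here is a hypothetical query (the schema, field names and variable are invented for the example): the client names exactly the fields it wants, including nested relations, and gets back precisely that shape.&lt;/p&gt;

```typescript
// A GraphQL document kept as a plain string; in a real app this would be
// passed to a client like Apollo, and its TypeScript types generated by CLI.
const postQuery = `
  query PostWithAuthor($id: ID!) {
    post(id: $id) {
      title
      author {        # nested relation, selected field by field
        name
        avatarUrl
      }
      comments(first: 5) {
        body
      }
    }
  }
`;
```

&lt;p&gt;A REST equivalent would typically need either several round trips or an over-fetching endpoint returning every field whether used or not.&lt;/p&gt;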

&lt;p&gt;GraphQL shines everywhere, but especially bright in JavaScript, where amazing tooling exists - as I will mention again in a few paragraphs - and companies like &lt;a href="https://www.apollographql.com/" rel="noopener noreferrer"&gt;Apollo&lt;/a&gt; or &lt;a href="https://www.prisma.io/" rel="noopener noreferrer"&gt;Prisma&lt;/a&gt; are taking the technology to new levels every year. Major companies have already shifted towards it, and the trend can only go further.&lt;/p&gt;

&lt;p&gt;It is always a good attitude to say about it (as about anything) "well, it depends on your project whether you should choose it or not". But as far as my frontend experience goes, I haven't met one situation, however small, where GraphQL was not a good fit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: when possible, I choose &lt;a href="https://graphql.org/" rel="noopener noreferrer"&gt;GraphQL&lt;/a&gt; with the &lt;a href="https://www.apollographql.com/docs/react/" rel="noopener noreferrer"&gt;Apollo client&lt;/a&gt;; when I can't, I complain a little.&lt;/strong&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Routing
&lt;/h2&gt;

&lt;p&gt;Once you understand that you should separate data management (backend) from UI generation (frontend), and since you have a powerful language running in the browser, it makes good sense to have it manage the whole site or app. And thus Single Page Apps were born. Every React/Vue/Angular/whatever project will need some routing to map (declaratively, remember) the URLs to this or that component/page.&lt;/p&gt;

&lt;p&gt;For this task, the safe React bet is React Router. It's mature, well maintained, kind of too big to fail. And now, with a proper hook API, it is getting better than ever.&lt;/p&gt;

&lt;p&gt;But I would like to submit another powerful library (that I hope will keep being updated): &lt;a href="https://github.com/Paratron/hookrouter" rel="noopener noreferrer"&gt;Hook Router&lt;/a&gt;. Its API is very elegant, simple to reason about, and far less verbose than the leader I just mentioned. I would recommend it absolutely, were it not for some little issues that still have to be ironed out (trailing slash management, for example, is a small detail that tells you: maybe not mature enough).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: I would love to use Hook Router, but still am turning to &lt;a href="https://reacttraining.com/react-router/web/guides/quick-start" rel="noopener noreferrer"&gt;React Router&lt;/a&gt; for professional projects. To be continued.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Styles
&lt;/h2&gt;

&lt;p&gt;CSS is a pain. Because it relies on arbitrary namings that don't get type-checked; because it relies on a global scope where you can declare the same class as many times as you want - making it easy to overload some rules, and hard to debug. And because it involves different professionals with different concerns and technical mindsets (from designers to integrators to coders).&lt;/p&gt;

&lt;p&gt;As HTML has been blended into JS code by the major JavaScript frameworks, styles too are better handled in the JavaScript, so that the elements and components we build get packaged with their styles, without said styles interfering with any other part of our application. That is called CSS-in-JS, and like the other things I have been pointing to here, it is a game-changer, something you would deeply miss once tasted.&lt;/p&gt;

&lt;p&gt;There are many options here; CSS-in-JS has just come out of its lush booming phase, with some solutions starting to fade into the distance and others slowly becoming mainstream. I have tried quite a few of them in recent years, from basic CSS modules to JSS, Styletron or Radium.&lt;/p&gt;

&lt;p&gt;But to me and many others, the big API winner is &lt;em&gt;Styled-Components&lt;/em&gt;. It's elegant and fast, and lets you write real CSS while injecting anything from the JS in the form of a template string. Componentizing and reuse are flawless. It's a bit of a change compared to a big stylesheet with an atomic naming convention, so your integrator will have to adapt and start working in the codebase - however, as it is still regular &lt;em&gt;(sa|le|c)ss&lt;/em&gt;, the shift is not too big to make.&lt;/p&gt;

&lt;p&gt;As much as I enjoy &lt;em&gt;Styled-Components&lt;/em&gt;, I think &lt;em&gt;Emotion&lt;/em&gt; comes out ahead. It offers the same API as SC, but adds some other niceties, like the &lt;code&gt;css&lt;/code&gt; prop, and plays far better with SSR in my experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: &lt;a href="https://emotion.sh/" rel="noopener noreferrer"&gt;Emotion&lt;/a&gt; or &lt;a href="https://styled-components.com/" rel="noopener noreferrer"&gt;Styled-Component&lt;/a&gt; is the way.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  UI Kit
&lt;/h2&gt;

&lt;p&gt;When building a frontend application, coding the UI elements is a big part of the work. Since a coder is not a designer (he might think he is - but he's not), and since you might want to spend your time on more interesting problems, using a UI kit is always a big win - for a quick POC, and even for production use when the product is fairly generic.&lt;/p&gt;

&lt;p&gt;There are just so many of them out there that you can't check them all. Some seem mature and beautiful, others just kind of &lt;em&gt;bla&lt;/em&gt;. The key for me is: a nice API on the component props, beautiful styles, a large variety of components, and proper styling abilities so that I can adapt the kit to my own design - or a client's identity - and save everyone a lot of time and money.&lt;/p&gt;

&lt;p&gt;I tried &lt;em&gt;Material UI&lt;/em&gt; (one of the biggest in the field), &lt;em&gt;Semantic UI&lt;/em&gt;, &lt;em&gt;Evergreen&lt;/em&gt;, &lt;em&gt;Blueprint&lt;/em&gt;, &lt;em&gt;Atlaskit&lt;/em&gt;, &lt;em&gt;Ant Design&lt;/em&gt;, the one from Uber, and even &lt;em&gt;React-Bootstrap&lt;/em&gt; (well, a long time ago). I must admit that I am a big geek for these and am always on the lookout for a new best solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://material-ui.com/" rel="noopener noreferrer"&gt;Material UI&lt;/a&gt; was a direct dislike. Their theming system is - to my taste - painful and weird. I had a better story with &lt;a href="https://ant.design/" rel="noopener noreferrer"&gt;Ant Design&lt;/a&gt;, but again, their sass theming system is far from ideal (see the section before) plus it was kind of buggy to set up with SSR.&lt;/p&gt;

&lt;p&gt;But earlier this year I stumbled upon &lt;strong&gt;Chakra UI&lt;/strong&gt;, and so far it checks all the boxes. Carefully made, nice looking, varied, and most of all: it's built with &lt;strong&gt;Emotion&lt;/strong&gt; and follows the &lt;em&gt;Styled System Theme Specification&lt;/em&gt; for theming, which is extremely nice to use. Every component exposes all the useful CSS attributes as props, so you can add a margin here or there without needing the &lt;code&gt;style&lt;/code&gt; prop or adding CSS.&lt;/p&gt;

&lt;p&gt;And on top of that, someone built &lt;a href="https://openchakra.app/" rel="noopener noreferrer"&gt;https://openchakra.app/&lt;/a&gt;, a visual editor based on Chakra UI that produces React code. I'm not a big believer in visual editors in general, but it's worth checking out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: use whatever makes you happy, but I will keep starting my projects with &lt;a href="https://chakra-ui.com/" rel="noopener noreferrer"&gt;Chakra UI&lt;/a&gt;, and you should check it out if you haven't yet.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;State Management&lt;/h2&gt;

&lt;p&gt;This is the time to bring up state management. Once your app is well componentized and well decoupled, you start wondering how to inject, update, and react to some global variables. User data, for example, which is needed in many discrete places - or a list of posts, a number of stars, the state of the UI, menus, top-bar buttons, etc.&lt;/p&gt;

&lt;p&gt;Since the introduction of the context API in React, you can inject a state - and have your components react to it - at any level of the tree. However, such simple state sharing can become very messy: you quickly discover that debugging this shared state is often really difficult. The other essential thing missing from the context + state/reducer hook solution is the notion of &lt;em&gt;selectors&lt;/em&gt;: when your shared state changes, every component listening to it gets rerendered - and if the state is an object, you cannot subscribe a component to specific keys of it. So your component rerenders every time any key changes, even keys it doesn't use. Of course, you can use memoization to temper the problem, but it gets quite messy.&lt;/p&gt;
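
&lt;p&gt;To make the selector idea concrete, here is a minimal, hand-rolled sketch (not any real library's API - all names here are made up) of a store that only notifies subscribers when the slice of state they selected actually changes:&lt;/p&gt;

```javascript
// Hypothetical helper: subscribers register a selector, and are only
// notified when the selected value changes, not on every state update.
function createStore(initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    subscribe(selector, listener) {
      listeners.push({ selector, listener, last: selector(state) });
    },
    setState(partial) {
      state = Object.assign({}, state, partial);
      for (const l of listeners) {
        const next = l.selector(state);
        if (next !== l.last) {
          l.last = next;
          l.listener(next); // fire only when the selected slice changed
        }
      }
    },
  };
}

// A "component" listening to `user` is not re-run when `posts` changes.
const store = createStore({ user: 'olup', posts: [] });
let userRenders = 0;
store.subscribe((s) => s.user, () => { userRenders += 1; });
store.setState({ posts: ['hello'] }); // user unchanged: no notification
store.setState({ user: 'loup' });     // user changed: one notification
```

&lt;p&gt;With a bare context value instead, both updates would rerender every consumer - exactly the problem described above.&lt;/p&gt;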

&lt;p&gt;The big gold standard in global state management is, of course, &lt;a href="https://redux.js.org/" rel="noopener noreferrer"&gt;Redux&lt;/a&gt;. Brought to us by &lt;em&gt;Our Dan Who Art In Facebook&lt;/em&gt;, it combines flux design, immutability, and good practices with an excellent debugging experience - and even a Chrome extension to follow every step of your state changes. For me, it shines in big projects where many different developers work on the same app. If you do React, you should know Redux, as you will have to use it at some point in your career.&lt;/p&gt;

&lt;p&gt;However, Redux is not without its faults. The main one would be the developer experience. Redux is not hard to understand or to set up, but it requires a lot of boilerplate code. It's extremely verbose - and this verbosity is sometimes a strength - but it feels tedious to use time and again. Adding async actions (you always need async actions) requires adding thunks or sagas to your Redux setup - more stuff to write.&lt;/p&gt;
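
&lt;p&gt;To give a feel for that ceremony, here is the Redux pattern hand-rolled in a few lines - action type constant, action creator, reducer, store - with no library involved and made-up names, before async handling even enters the picture:&lt;/p&gt;

```javascript
// The three pieces of boilerplate every Redux feature needs:
const ADD_STAR = 'ADD_STAR';                                // 1. action type
const addStar = (postId) => ({ type: ADD_STAR, payload: postId }); // 2. creator

function reducer(state = { stars: {} }, action) {           // 3. reducer
  switch (action.type) {
    case ADD_STAR: {
      const id = action.payload;
      const count = (state.stars[id] || 0) + 1;
      // immutably rebuild the changed branch of the state tree
      return { ...state, stars: { ...state.stars, [id]: count } };
    }
    default:
      return state;
  }
}

// A micro "store" with the same dispatch/getState contract as Redux.
function createStore(reduce) {
  let state = reduce(undefined, { type: '@@INIT' });
  return {
    getState: () => state,
    dispatch: (action) => { state = reduce(state, action); },
  };
}

const store = createStore(reducer);
store.dispatch(addStar('post-1'));
store.dispatch(addStar('post-1'));
// store.getState().stars['post-1'] is now 2
```

&lt;p&gt;Multiply this by every feature in the app, plus thunks or sagas for anything async, and the verbosity complaint becomes clear.&lt;/p&gt;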

&lt;p&gt;Now, remember how I said GraphQL has amazing tooling for JavaScript? Apollo offers many nice features in its GraphQL client, one of them being a very powerful caching system. For every query you make, Apollo keeps in memory everything returned by the server, deserialized and stored by type and ID. So even if two queries are not the same - or if an object is deeply nested - it will update its local version. Then every component relying on query data containing the changed object will update when the cache does. Again, this is very, very powerful. On mutations, you can easily update the cache yourself for optimistic changes - or ask for the updated data in the response, and Apollo will do it for you - as long as you query the IDs on every cached object.&lt;/p&gt;
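
&lt;p&gt;Here is a rough, simplified sketch of that normalization idea (my own toy code, not Apollo's actual implementation): objects carrying a &lt;code&gt;__typename&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt; are flattened into a map keyed by type and ID, so two different queries returning the same entity share a single record:&lt;/p&gt;

```javascript
// Recursively walk a query result; every identifiable object is pulled
// out into `entities` under a "Type:id" key and replaced by a reference.
function normalize(obj, entities) {
  if (Array.isArray(obj)) return obj.map((item) => normalize(item, entities));
  if (obj === null || typeof obj !== 'object') return obj;
  const out = {};
  for (const key of Object.keys(obj)) out[key] = normalize(obj[key], entities);
  if (out.__typename) {
    if (out.id) {
      const ref = `${out.__typename}:${out.id}`;
      // merge with any existing record, so later queries update it
      entities[ref] = Object.assign(entities[ref] || {}, out);
      return { __ref: ref };
    }
  }
  return out;
}

const entities = {};
// Two different queries both return Author 1, nested differently:
normalize({ __typename: 'Post', id: 7,
            author: { __typename: 'Author', id: 1, name: 'Loup' } }, entities);
normalize({ __typename: 'Author', id: 1, name: 'Loup T.' }, entities);
// There is a single Author:1 record, holding the latest data, and the
// cached Post only points at it - update once, consistent everywhere.
```

&lt;p&gt;This is why querying the IDs on every cached object matters: without them, the cache cannot know that two results describe the same entity.&lt;/p&gt;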

&lt;p&gt;So, when building an app with Apollo, you don't need to store your data in a global state - which accounts for the bulk of Redux usage - but can simply rely on Apollo queries and let the magic happen. This is one of the boons of GraphQL, and why it is so good for front-end coders. Should I add that there is a very good Chrome extension to watch and debug your cache? Apollo offers many other features, but they are beyond this humble piece.&lt;/p&gt;

&lt;p&gt;But then what about the data that doesn't come from the API? UI state, for example? It's likely to be a small amount. Even so, I feel reluctant to use either a simple context state or the full Redux machinery for it.&lt;/p&gt;

&lt;p&gt;Apollo offers a way to use its cache for any data you want, even local data, and it can seem like a good fit for the task. However, it feels very strange to declare GraphQL types, mutations, and queries for simple state updates. I tried it but ended up looking elsewhere.&lt;/p&gt;

&lt;p&gt;For me, the solution came from a very pleasant (and vegan) library, Easy-Peasy. It uses Redux and Immer under the hood, but leverages React hooks and the context API to provide a very intuitive system. You build an object with your data and your actions (and type it with TS) and get a Provider on one side and, on the other, hooks that act as selectors for actions or values. So, the best of everything: a simple API, hooks-ready, TypeScript-ready, multiple global states are possible, you get real selectors, and - best of all - you have access to the Redux debug tool for a perfect debugging workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So bottom line: I use Apollo cache for server-sent data, and &lt;a href="https://easy-peasy.now.sh/" rel="noopener noreferrer"&gt;Easy-Peasy&lt;/a&gt; for the rest - or almost all the rest, see the next section.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Forms&lt;/h2&gt;

&lt;p&gt;So, forms. At some point, it becomes difficult to manage one &lt;code&gt;useState&lt;/code&gt; per field on your page. Then there is validation, which involves clean/dirty detection, error messages, rules, etc. Once you've worked on a real form, you understand the underlying complexity of doing it properly.&lt;/p&gt;
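
&lt;p&gt;As a back-of-the-envelope illustration of what a form library has to track for you - values, dirty state, validation errors - here is a tiny hand-rolled sketch (the field names and rules are made up, and this is nobody's real API):&lt;/p&gt;

```javascript
// A toy form state container: per-field values, dirty flags, and
// validators that return an error message or null.
function createForm(initialValues, validators) {
  const values = Object.assign({}, initialValues);
  const dirty = {};
  return {
    setValue(name, value) {
      values[name] = value;
      // a field is dirty once its value differs from the initial one
      dirty[name] = value !== initialValues[name];
    },
    errors() {
      const out = {};
      for (const name of Object.keys(validators)) {
        const message = validators[name](values[name]);
        if (message) out[name] = message;
      }
      return out;
    },
    isDirty: (name) => Boolean(dirty[name]),
    values: () => values,
  };
}

const form = createForm(
  { email: '' },
  { email: (v) => (v.includes('@') ? null : 'invalid email') },
);
form.setValue('email', 'loup@example.com');
// form.isDirty('email') is true, and form.errors() is empty
```

&lt;p&gt;Even this toy version ignores touched/untouched fields, async validation, submission state, and field arrays - which is exactly why reaching for a library pays off.&lt;/p&gt;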

&lt;p&gt;So we want a library for it - one that is simple, not too bloated, and hook-ready. Well, there is one right here: &lt;a href="https://react-hook-form.com/" rel="noopener noreferrer"&gt;React Hook Form&lt;/a&gt;. It's elegant, powerful, simple. And conveniently, there is a page in the Chakra UI documentation on how to integrate React Hook Form with it. Doesn't it feel like everything fits together?&lt;/p&gt;

&lt;p&gt;Hook Form is the last piece of my state management triangle. I use it on every creation/edition page and plug it straight into Apollo queries/mutations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: &lt;a href="https://react-hook-form.com/" rel="noopener noreferrer"&gt;React Hook Form&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;SSR and Prerendering&lt;/h2&gt;

&lt;p&gt;As with every JS framework, building the page on the client has one drawback: bots cannot crawl its meta tags, and Google's bots, even though they are supposed to be able to execute JavaScript, will not do it consistently (there are timeouts, etc.). So better not to rely on that for SEO - and share previews are a no-go.&lt;/p&gt;

&lt;p&gt;For this, you need to serve bots a fully built version of your site. As everyone knows, there are two ways to achieve this. Either you build the whole site on the server before sending it to any client (including bots) and let JS manage it from the browser afterwards - this is &lt;strong&gt;SSR (server-side rendering)&lt;/strong&gt;. Or you render the site only when a bot asks for it, in the cloud, with a headless Chrome instance doing the work - and this is called &lt;strong&gt;prerendering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So which one to use?&lt;/p&gt;

&lt;p&gt;That depends on the project. But doing full SSR involves many tricks, and changing an existing codebase to enable it is a real pain. From my experience, prerendering is most of the time easier to set up, mainly because it abstracts the rendering question away from the React codebase. It becomes not a front-end concern but an architecture/back-end one. There are a few Docker images that will do prerendering out of the box if the team ever asks.&lt;/p&gt;
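
&lt;p&gt;At its core, a prerendering setup is just a routing decision on the user agent: known crawlers get the pre-rendered HTML, everyone else gets the normal JS bundle. A minimal sketch (the crawler list is illustrative, not exhaustive, and the serve functions in the comment are hypothetical) could look like this:&lt;/p&gt;

```javascript
// A few well-known crawler signatures; real services maintain far
// longer, regularly updated lists.
const BOT_PATTERN = /googlebot|bingbot|twitterbot|facebookexternalhit|linkedinbot|slackbot/i;

function shouldPrerender(userAgent) {
  if (!userAgent) return false;
  return BOT_PATTERN.test(userAgent);
}

// In a reverse proxy or middleware you would branch on this, e.g.:
//   shouldPrerender(req.headers['user-agent'])
//     ? serveFromHeadlessChrome(req)  // hypothetical prerender service
//     : serveSpaShell(req)

shouldPrerender('Mozilla/5.0 (compatible; Googlebot/2.1)'); // true
shouldPrerender('Mozilla/5.0 (Macintosh; Intel Mac OS X)'); // false
```

&lt;p&gt;Because this branch lives in the proxy or server layer, the React codebase never has to know prerendering exists - which is exactly why it is the easier option to retrofit.&lt;/p&gt;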

&lt;p&gt;When it comes to full SSR, there is one major framework that does it well: &lt;a href="https://nextjs.org/" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt;. My complaints with it are only about the routing system: it follows the file system, and I didn't leave PHP behind to go back to that convention hell. Otherwise, coupled with Apollo, it's very efficient, and it offers good code-splitting out of the box.&lt;/p&gt;

&lt;p&gt;The last time I built SSR, I used another tool called &lt;a href="https://github.com/jaredpalmer/razzle" rel="noopener noreferrer"&gt;Razzle&lt;/a&gt;, which felt more elegant at the time. While Razzle is very promising, it is not as well maintained - it isn't backed by any company, and support lags a bit. Worth having a look, but for a professional, take-no-risk project, go with Next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: for SEO and bots only, I'd say go with &lt;em&gt;prerendering&lt;/em&gt;. SSR for end users only improves the experience on the very first render of the site. That's a lot of work for not so much gain.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Static site rendering&lt;/h2&gt;

&lt;p&gt;When your site is not very big or doesn't update that often, you may be interested in static rendering. That means SSRing all the pages your site contains in one pass, then serving everything from static hosting. No need for a backend or API - at least not for your end users - as all the data you need is included in the site at render time.&lt;/p&gt;

&lt;p&gt;This is not limited to the front end, by the way. I statically render an &lt;a href="https://synonymes.netlify.com/mot" rel="noopener noreferrer"&gt;API of French synonyms&lt;/a&gt; that is huge (35,000+ JSON documents) but will probably never be rendered again.&lt;/p&gt;
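
&lt;p&gt;The idea behind such a static "API", boiled down: run the data pipeline once at build time, emit one JSON document per entry, and host the output as plain files - no server needed afterwards. A sketch with made-up data (not the actual build script) might look like this:&lt;/p&gt;

```javascript
// Stand-in source data; the real project feeds in a full dictionary.
const dictionary = {
  beau: ['joli', 'magnifique', 'splendide'],
  petit: ['menu', 'minuscule'],
};

// One build pass produces every "endpoint" as a path + content pair.
// In a real build you would write these to disk and upload the folder.
function buildStaticApi(entries) {
  const files = [];
  for (const word of Object.keys(entries)) {
    files.push({
      path: `mot/${word}.json`,
      content: JSON.stringify({ word, synonyms: entries[word] }),
    });
  }
  return files;
}

const output = buildStaticApi(dictionary);
// output holds one JSON document per word, ready for static hosting
```

&lt;p&gt;Once uploaded, each document is served as a plain file, so the "API" scales for free and never goes down with a backend.&lt;/p&gt;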

&lt;p&gt;I am no expert on the subject, but I very much dislike the leader of the field, Gatsby, for its weird data-loading API. For my needs, I tend to favor either Next (the SSR framework has a pretty neat static rendering feature) or &lt;em&gt;React Static&lt;/em&gt;, which is extremely versatile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line: for a blog or a simple presentational site - where data doesn't change much - static rendering makes good sense. Have a look at &lt;a href="https://react-static.js.org" rel="noopener noreferrer"&gt;React Static&lt;/a&gt; for the most natural DX I could find.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Last words&lt;/h2&gt;

&lt;p&gt;There are other things I don't have the energy to get into now. For example, I recommend integrating &lt;a href="https://storybook.js.org/" rel="noopener noreferrer"&gt;Storybook&lt;/a&gt; as early as you can in any codebase beyond the odd side project, especially if UI coding is involved - it will save you a world of pain.&lt;/p&gt;

&lt;p&gt;We could address testing - or the project's file organization. But that will be for another time.&lt;/p&gt;

&lt;p&gt;Before leaving you, I wanted to stress how tiring it can feel to have new tools to learn, and how small the payoff can seem before you experience it yourself. This is a natural attitude. We learned once to adapt, to grow around the problems we had until we didn't even see them anymore. But they are still here. When someone tells us "this lib is awesome, it solves this and this" and we think "I already have solutions for that" - well, maybe we should give it a try. Remember how jQuery once seemed like all we needed to build anything, and how we would never go back to it now that we've worked with JS frameworks?&lt;/p&gt;

&lt;p&gt;JavaScript asks that we keep a keen mind - and never stop exploring. New solutions appear every day, and when they find their way to the general public, it's usually because they solve real problems that you might have too - even if it seems you don't. It's never a bad idea to take an hour and try.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo by Lachlan Donald on Unsplash&lt;/em&gt;&lt;/p&gt;

</description>
      <category>react</category>
      <category>javascript</category>
      <category>graphql</category>
    </item>
  </channel>
</rss>
