<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gwen (Chen) Shapira</title>
    <description>The latest articles on DEV Community by Gwen (Chen) Shapira (@gwenshap).</description>
    <link>https://dev.to/gwenshap</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F136599%2Ffc58c575-8436-4255-b74c-1c06df869c71.jpg</url>
      <title>DEV Community: Gwen (Chen) Shapira</title>
      <link>https://dev.to/gwenshap</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gwenshap"/>
    <language>en</language>
    <item>
      <title>We keep shipping!!!</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Fri, 11 Apr 2025 19:29:24 +0000</pubDate>
      <link>https://dev.to/gwenshap/we-keep-shipping-5b95</link>
      <guid>https://dev.to/gwenshap/we-keep-shipping-5b95</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/sriramsub/we-just-shipped-nile-auth-v40-account-linking-cors-support-and-more-1ci6" class="crayons-story__hidden-navigation-link"&gt;🚀 We just shipped Nile-Auth v4.0: Account Linking, CORS Support, and More&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/sriramsub" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F794061%2F4e36c8a3-7951-4ba6-b0b2-f777ba1e9d0a.jpeg" alt="sriramsub profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/sriramsub" class="crayons-story__secondary fw-medium m:hidden"&gt;
              sriramsub
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                sriramsub
                
              
              &lt;div id="story-author-preview-content-2400467" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/sriramsub" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F794061%2F4e36c8a3-7951-4ba6-b0b2-f777ba1e9d0a.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;sriramsub&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/sriramsub/we-just-shipped-nile-auth-v40-account-linking-cors-support-and-more-1ci6" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 11 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/sriramsub/we-just-shipped-nile-auth-v40-account-linking-cors-support-and-more-1ci6" id="article-link-2400467"&gt;
          🚀 We just shipped Nile-Auth v4.0: Account Linking, CORS Support, and More
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/b2b"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;b2b&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/postgres"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;postgres&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/authjs"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;authjs&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/nextjs"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;nextjs&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/sriramsub/we-just-shipped-nile-auth-v40-account-linking-cors-support-and-more-1ci6" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/raised-hands-74b2099fd66a39f2d7eed9305ee0f4553df0eb7b4f11b01b6b1b499973048fe5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/sriramsub/we-just-shipped-nile-auth-v40-account-linking-cors-support-and-more-1ci6#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            1 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>b2b</category>
      <category>postgres</category>
      <category>authjs</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Fri, 28 Mar 2025 13:58:46 +0000</pubDate>
      <link>https://dev.to/gwenshap/-1clb</link>
      <guid>https://dev.to/gwenshap/-1clb</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/sriramsub/introducing-nile-auth-for-b2b-apps-5jm" class="crayons-story__hidden-navigation-link"&gt;Introducing Nile Auth for B2B apps&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/sriramsub" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F794061%2F4e36c8a3-7951-4ba6-b0b2-f777ba1e9d0a.jpeg" alt="sriramsub profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/sriramsub" class="crayons-story__secondary fw-medium m:hidden"&gt;
              sriramsub
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                sriramsub
                
              
              &lt;div id="story-author-preview-content-2363070" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/sriramsub" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F794061%2F4e36c8a3-7951-4ba6-b0b2-f777ba1e9d0a.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;sriramsub&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/sriramsub/introducing-nile-auth-for-b2b-apps-5jm" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 28 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/sriramsub/introducing-nile-auth-for-b2b-apps-5jm" id="article-link-2363070"&gt;
          Introducing Nile Auth for B2B apps
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/b2b"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;b2b&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/postgres"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;postgres&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/authjs"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;authjs&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/nextjs"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;nextjs&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/sriramsub/introducing-nile-auth-for-b2b-apps-5jm" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/sriramsub/introducing-nile-auth-for-b2b-apps-5jm#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>b2b</category>
      <category>postgres</category>
      <category>webdev</category>
      <category>authjs</category>
    </item>
    <item>
      <title>Debunking 6 common pgvector myths</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Tue, 01 Oct 2024 00:43:17 +0000</pubDate>
      <link>https://dev.to/gwenshap/debunking-6-common-pgvector-myths-knh</link>
      <guid>https://dev.to/gwenshap/debunking-6-common-pgvector-myths-knh</guid>
      <description>&lt;p&gt;pgvector is Postgres’ highly popular extension for storing, indexing and querying vectors. Vectors have been a useful data type for a long time, but they have recently seen a rise in popularity thanks to their role in RAG (Retrieval-Augmented Generation) architectures for AI-based applications. Vectors typically power the retrieval part: using vector similarity search and nearest-neighbor algorithms, one can find the most relevant documents for a given user question.&lt;/p&gt;

&lt;p&gt;Having the ability to store vectors in your normal relational database, as opposed to a dedicated vector store, means that you can use all the normal relational database capabilities together with vector search - join vector tables with other data and metadata, use additional fields for filtering, retrieve related information and so on.&lt;/p&gt;

&lt;p&gt;From conversations in the pgvector community, it became clear that there are some common misconceptions and misunderstandings around its best practices and use. Because of these misunderstandings, some people avoid pgvector entirely or use it less effectively than they could. So, let's fix this!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnp4ohz0ybdh0rxr1s6g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnp4ohz0ybdh0rxr1s6g0.png" alt="Image of Mark Twain with his quote: "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 1: You always need to use a vector index
&lt;/h2&gt;

&lt;p&gt;This myth is a result of certain vector stores and popular libraries using the term "index" to describe any method of storing vectors. This has led to the misconception that indexes are the only way to store vectors.&lt;/p&gt;

&lt;p&gt;This isn’t true in Postgres terminology. The trick is that other vector stores have what they call a “flat” index, which basically means “no hierarchy”. In Postgres, the default table structure is flat. So if you just create a table with a column of vector type, insert some vectors and don’t create any indexes, you already have what is called elsewhere a flat vector index.&lt;/p&gt;

&lt;p&gt;Now that we know you technically don’t need a vector index in order to use pgvector, the question becomes when an index is worth creating and which type to choose.&lt;/p&gt;

&lt;p&gt;Let’s take as an example an application that embeds transcripts of sales conversations for search and knowledge extraction. You may have 10M embeddings in your database, but each of your customers will have under 10,000, and each salesperson under 1,000. And maybe they typically only search calls from the last few weeks, so the working set is actually under 100.&lt;/p&gt;

&lt;p&gt;If you usually only need to search 100 or 1,000 vectors, you are almost certainly better off without any vector index. Instead, you can use normal b-tree indexes (maybe with partitions) to limit the query to scan just the right subset of vectors. This way you keep full recall (vector indexes perform approximate nearest-neighbor search, so they can lose recall) and save the time, memory and CPU of maintaining indexes that are unlikely to help you (and that the Postgres planner may rightly decide not to use).&lt;/p&gt;
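
&lt;p&gt;As a rough sketch (the table, column names and dimension count here are illustrative), the “no vector index” setup from this example could look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE call_embeddings (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  call_date   date NOT NULL,
  embedding   vector(1536)
);

-- A plain b-tree index narrows the scan to one customer's recent calls
CREATE INDEX ON call_embeddings (customer_id, call_date);

-- Exact (full recall) nearest-neighbor search over that small subset
SELECT id FROM call_embeddings
WHERE customer_id = 42 AND call_date &amp;gt; now() - interval '30 days'
ORDER BY embedding &amp;lt;-&amp;gt; $1
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;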

&lt;h2&gt;
  
  
  Myth 2: Vector indexes semantics are similar to other indexes
&lt;/h2&gt;

&lt;p&gt;If you are familiar with indexes in relational databases, but less familiar with vector indexes, the last few paragraphs may have been very confusing. What do I mean by trading off performance vs recall? &lt;/p&gt;

&lt;p&gt;Typically a query that uses the index will return the exact same data that will be returned by a query that doesn’t use the index. This is basic SQL/relational semantics and is expected to be guaranteed for every index and every query. This expectation is so ingrained that most of us aren’t even aware that we expect it.&lt;/p&gt;

&lt;p&gt;But vector indexes are not like that. They are data structures for efficient approximate nearest neighbor search (ANN). They improve performance by limiting the search for nearest neighbors to specific subsets of the graph. These subsets are selected because they are likely to contain the nearest neighbors, but not guaranteed.&lt;/p&gt;

&lt;p&gt;This also hints at how the performance / recall tradeoff works: the more graph subsets you search, the more likely you are to find the actual nearest neighbors, but the longer it takes. In addition, different types of vector indexes expose further configuration options: how many subsets you split the collection into, and how exhaustively you “map” each subset. These decisions also affect the performance / recall tradeoff of the index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing" rel="noopener noreferrer"&gt;PG Vector documentation&lt;/a&gt; explains the types of indexes and the different parameters you can configure when creating and querying them. Definitely worth reading in detail and experimenting with them. &lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 3: You can’t store more than 2000 dimensions in a vector index
&lt;/h2&gt;

&lt;p&gt;At the root of this myth is the simple fact that Postgres blocks are limited to 8K in size. By default, vectors are a collection of floats, and each float is 32 bits. Simple math shows that, overheads included, at around 2000 dimensions you get very close to the 8K limit. You can still store the data: &lt;a href="https://wiki.postgresql.org/wiki/TOAST" rel="noopener noreferrer"&gt;Postgres has a TOAST feature&lt;/a&gt; which uses “pointers” to store a row in more than one block. But you can’t build a vector index on rows that are split using TOAST.&lt;/p&gt;

&lt;p&gt;One option is to use embedding models that output vectors with fewer dimensions, or a model that has been &lt;a href="https://www.nomic.ai/blog/posts/nomic-embed-matryoshka" rel="noopener noreferrer"&gt;trained to “scale down” without losing performance&lt;/a&gt;. But, what if you have an embedding model that works really well for your data and has more dimensions? Switching to a different model may be completely unacceptable.&lt;/p&gt;

&lt;p&gt;Another option is to use feature extraction algorithms that reduce dimensions of vectors from other models while attempting to preserve accuracy. PCA, t-SNE, and UMAP are relatively well known for this, and there are &lt;a href="https://arxiv.org/abs/1708.03629" rel="noopener noreferrer"&gt;some results that show they work quite well&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;However, a much simpler approach is quantization: using a smaller data type for each dimension. pgvector supports the &lt;code&gt;halfvec&lt;/code&gt; type for scalar quantization. It converts the floats to a 16-bit type by dropping the least significant digits. This makes sense: we typically use vectors and indexes for nearest-neighbor search, and these insignificant digits usually have little impact on the relative distances between vectors.&lt;/p&gt;

&lt;p&gt;Since &lt;code&gt;halfvec&lt;/code&gt; takes half the size of the usual float, you can store 4000-ish dimensions of &lt;code&gt;halfvec&lt;/code&gt; type. Looking ahead, the community is also iterating on 8-bit quantization of embeddings with an int_vec type, which would allow storing 8000-ish dimensions.&lt;/p&gt;
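
&lt;p&gt;A minimal sketch, assuming a 3072-dimension embedding model whose full-precision vectors would not fit in an indexable row:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- 3072 four-byte floats exceed the indexable limit; 16-bit halfvec fits
CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3072));

CREATE INDEX ON items USING hnsw (embedding halfvec_l2_ops);
&lt;/code&gt;&lt;/pre&gt;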

&lt;p&gt;Even if you have smaller embeddings that already fit into a Postgres block and can be indexed, storing half the data will greatly improve performance and reduce resource utilization. &lt;a href="https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/" rel="noopener noreferrer"&gt;All with almost no impact on recall&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 4: Using vector index with other filters will miss data
&lt;/h2&gt;

&lt;p&gt;This isn’t quite as myth-y as the others. In fact, at the time of writing, it is still true. But in the upcoming pgvector release, 0.8.0, we will be able to relegate it to a myth.&lt;br&gt;
So what does “use a vector index with other filters” mean?&lt;/p&gt;

&lt;p&gt;Imagine that you indexed your company wiki, and now you want to find the documents most similar to “promotion process and policy”. But since your company has several business units with their own policies, you want to search only within the “engineering” category. Your query will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_title&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;document_embeddings&lt;/span&gt; 
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;doc_category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'engineering'&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How can Postgres execute such a query?&lt;/p&gt;

&lt;p&gt;While we may want it to search only a subset of the vector index that belongs to ‘engineering’ category, unless you previously created partitions or partial indexes, such a subset will not exist.&lt;/p&gt;

&lt;p&gt;What happens is that Postgres uses the vector index first, finds the 10 nearest neighbors, and then applies the filter, throwing out anything that isn’t in the engineering category. The problem, of course, is that this can leave anywhere from 0 to 10 rows, and we wanted to show 10 rows in our search results. This is the crux: we want the K nearest neighbors after filtering, but we can’t know in advance how many neighbors the index needs to return to achieve this.&lt;/p&gt;

&lt;p&gt;Version 0.8.0 will introduce iterative vector indexes. This will allow Postgres to scan the index, find nearest neighbors, apply the filter, scan the index a bit more, filter more… and continue until the desired number of neighbors is found and can be returned.&lt;/p&gt;
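
&lt;p&gt;Based on the pre-release documentation, enabling iterative scans is expected to be a single setting (the exact name and values may still change before 0.8.0 ships):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Keep scanning the index until enough rows survive the filter
SET hnsw.iterative_scan = strict_order;  -- or relaxed_order for more speed
&lt;/code&gt;&lt;/pre&gt;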

&lt;p&gt;Version 0.8.0 will also include an improvement to the cost estimate of using vector indexes. This will help Postgres decide when to use the vector index and when to rely on just the B-Tree or GiST indexes. Both these improvements together will make it much easier to create indexes and run queries, knowing that Postgres will do the right thing with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 5: Vector similarity is only useful for RAG
&lt;/h2&gt;

&lt;p&gt;Vector embeddings are becoming increasingly popular due to their role in Retrieval-Augmented Generation (RAG). In RAG, embeddings help locate and retrieve relevant context, which allows large language models (LLMs) to answer questions more accurately and reduce the risk of hallucination.&lt;/p&gt;

&lt;p&gt;But sometimes it looks like we forgot all the other uses of vector embeddings. Vectors help find semantically similar items. Finding similar items is useful even without the LLM. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Support&lt;/strong&gt;: Find knowledge base articles that are relevant to a support ticket and suggest them to the customer or support agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Issue tracking&lt;/strong&gt;: Detect duplicate reports of the same issue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendations&lt;/strong&gt;: Recommend items that are similar to ones that the customer already liked. “If you enjoyed this book, you’ll probably also like…”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detection&lt;/strong&gt;: Instead of finding the most similar items, we can use vector distance to detect when a new item has no near neighbors. If it is very far from every existing item, it is an anomaly and can be flagged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shop for similar items&lt;/strong&gt;: Given a photo of a product, you can search for the most similar products.&lt;/li&gt;
&lt;/ul&gt;
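
&lt;p&gt;The anomaly-detection case, for instance, needs nothing more than a distance check against the closest existing item (table name and threshold are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Distance from the new item ($1) to its closest existing neighbor
SELECT embedding &amp;lt;-&amp;gt; $1 AS nearest_distance
FROM events
ORDER BY embedding &amp;lt;-&amp;gt; $1
LIMIT 1;
-- If nearest_distance exceeds a tuned threshold, flag the item as an anomaly.
&lt;/code&gt;&lt;/pre&gt;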

&lt;p&gt;In all these cases, just finding the nearest neighbors is enough, there is no need for an LLM in the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 6: pg_vector does not support BM25 (and other sparse vectors)
&lt;/h2&gt;

&lt;p&gt;There are two types of vector embeddings: Dense and Sparse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dense vectors&lt;/strong&gt; are typically generated by trained language models and they encode the semantic meaning behind a sentence or a document. This representation is a bit opaque, in the sense that you cannot map each dimension to a specific word or a concept. Dense vectors typically have 256-4096 dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse vectors&lt;/strong&gt; are typically the result of traditional text search algorithms (TF-IDF, BM25, SPLADE) that use vectors to represent information about the importance of words used in each text. In sparse vectors, each dimension represents a word and the value indicates how common / important the word is in each text. The number of dimensions in sparse vectors depends on either the number of distinct words in the dataset (TF-IDF, BM25) or the number of words the model was trained on (30,522 in case of SPLADE). Since most texts only contain a small subset of all words, when using sparse vectors most of the dimensions have the value 0. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector?tab=readme-ov-file#sparse-vectors" rel="noopener noreferrer"&gt;In version 0.7.0, pg_vector added support for sparse vectors&lt;/a&gt; with the &lt;code&gt;sparsevec&lt;/code&gt; type. This type only stores the non-zero elements of the vector. You insert sparse vectors by specifying only the non-zero values and their indexes. If you use pg_vector client libraries (they have libraries for many languages and ORMs), you’ll use their sparse vector type, which automatically convert vectors to the correct text representation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Vector indexes behave a bit differently than typical indexes in relational databases. In addition, the domain of vector embeddings has its own terminology, which isn’t always clear for new arrivals. &lt;/p&gt;

&lt;p&gt;When I first started working with vector embeddings and pg_vector, I found many of the topics above confusing. &lt;a href="https://www.thenile.dev/" rel="noopener noreferrer"&gt;Nile&lt;/a&gt; has &lt;a href="https://www.thenile.dev/docs/ai-embeddings/pg_vector" rel="noopener noreferrer"&gt;supported pgvector&lt;/a&gt; since our first private beta release, about a year ago. From my interactions with our users and the pgvector community, I’ve seen others face similar challenges. I hope this blog will be helpful, and I believe almost everyone will walk away with at least one new insight. &lt;/p&gt;

&lt;p&gt;Perhaps the most important lesson is that pgvector is constantly evolving. Version 0.7.0 was released in April this year, and version 0.8.0 is anticipated for October. Each version adds more functionality and resolves old limitations, so it is important to revisit our assumptions and refresh our knowledge on a regular basis. Remember: it isn’t what you don’t know that gets you, it’s what you think you know that is no longer true.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>vectordatabase</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Investigating 15s HTTP response time in AWS ECS</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Sat, 16 Jul 2022 02:53:24 +0000</pubDate>
      <link>https://dev.to/gwenshap/investigating-15s-http-response-time-in-aws-ecs-2gge</link>
      <guid>https://dev.to/gwenshap/investigating-15s-http-response-time-in-aws-ecs-2gge</guid>
      <description>&lt;p&gt;It started innocently enough. During standup, someone mentioned “my branch deployment is a bit slow due to cold-start problem, but no big deal”. Indeed, cold-start should have no impact on production where the environment is long running. No big deal.&lt;/p&gt;

&lt;p&gt;But later the same day, I tried checking something in a test system and noticed that a lot of my requests were taking a very long time. In fact, I couldn’t get anything done. Since this environment was also long running, it could not be a cold-start problem.&lt;/p&gt;

&lt;p&gt;CloudWatch metrics on the load balancer showed that all requests returned in 6ms or less and that they were all successful. My home internet seemed fine. Time to get an external view.&lt;/p&gt;

&lt;p&gt;I set up Pingdom to hit our &lt;code&gt;/healthcheck&lt;/code&gt; endpoint every minute and the pattern became clear: 50% of the response times were around 300ms. &lt;strong&gt;The other 50% took over 15s.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuucdmasao669qrl086g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuucdmasao669qrl086g.png" alt="Screenshot of request log showing the latencies"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Up until this point, my mental model of our deployment was something like this (we have more stuff, but it isn’t relevant here):&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bar0jg5pycijsscjvjj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bar0jg5pycijsscjvjj.jpg" alt="Architecture diagram with one LB connecting to one ECS task"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran Wireshark on my machine and sent a bunch of test requests. The 50% that were successful talked to one IP; the other 50% talked to another IP. &lt;/p&gt;
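&lt;p&gt;You can reproduce that split without Wireshark: resolve the load balancer's DNS name and time a request against each address separately. A rough Python sketch (the hostname in the usage comment is hypothetical):&lt;/p&gt;

```python
import http.client
import socket
import time

def resolve_all(hostname, port=80):
    # An ALB DNS name typically resolves to one address per availability
    # zone; collect the distinct IPv4 addresses behind it.
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def time_one_ip(ip, host, path="/healthcheck"):
    # Connect to one specific LB node directly, keeping the real Host
    # header so the request is still routed to the right target group.
    start = time.monotonic()
    conn = http.client.HTTPConnection(ip, timeout=20)
    try:
        conn.request("GET", path, headers={"Host": host})
        conn.getresponse().read()
    finally:
        conn.close()
    return time.monotonic() - start

# Usage (hypothetical hostname):
#   for ip in resolve_all("my-alb.example.com"):
#       print(ip, time_one_ip(ip, "my-alb.example.com"))
```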

&lt;p&gt;Looking at our configuration, I discovered that our ALB spans two availability zones and has cross-zone failover, which is mandatory for ECS tasks. So now I had a better mental model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075s3l1rks1yf57e6gfp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075s3l1rks1yf57e6gfp.jpg" alt="Architecture diagram with two LB in two zones connecting to one ECS task"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that we have an LB in both zones, but an ECS task in just one of them. An AWS documentation page that I can no longer find said something like “for best performance, run an ECS task in every zone that has a load balancer”.  I was naive enough to believe that I had found the issue and went ahead to set up the extra tasks.&lt;/p&gt;

&lt;p&gt;It took longer than I would have liked because zone 2d didn’t have the EC2 machine type that my task required, so I had to either re-configure everything around a new machine type or move everything between zones, which meant learning more about ASG and ECS. But finally I had the following configuration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsyc9acvvjq7g5kbzk3j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsyc9acvvjq7g5kbzk3j.jpg" alt="Architecture diagram with two LB in two zones connecting to two ECS tasks in two zones"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, the performance issue remained. Damn it. &lt;/p&gt;

&lt;p&gt;I went back to the original system, but investigated the network configuration in more detail. One detail that I had discarded until then was that a few days earlier, after a long investigation and much debate, we had added a NAT to our system. &lt;/p&gt;

&lt;p&gt;Why NAT? Because our ECS task occasionally had to make an outbound call to a public service (Google’s identity platform) and our investigation revealed that &lt;strong&gt;the only way an EC2 ECS task can make an external call is via NAT&lt;/strong&gt;. It is worth noting that Fargate and plain EC2 don’t have this limitation. I am not sure we would have used ECS on EC2 if we had known about it - NATs are expensive.&lt;/p&gt;

&lt;p&gt;When we added the NAT we also added a routing rule (as documented): outgoing traffic from zone 2b (the zone with the task) will be routed via the NAT. So the architecture I was actually working on was:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxgxiwktv8tze5m1ffc5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxgxiwktv8tze5m1ffc5.jpg" alt="Architecture diagram with two LB in two zones connecting to one ECS task, ECS task has outgoing arrow to a NAT in the other zone"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these routing rules, traffic that users initiated went through the internet gateway (this is the default way services with a public IP address talk to the internet in AWS) to the LB to the task and then back. Outbound traffic went to the NAT. &lt;/p&gt;

&lt;p&gt;Or so we thought. &lt;/p&gt;

&lt;p&gt;The routing rule that we added to zone 2b for the NAT didn’t really say “outbound traffic goes via NAT”. Instead, it said: “if the destination is anywhere outside the VPC, go via NAT”. I assumed it applied to outbound traffic only, but our load balancer sits in the subnet that has these rules. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if any time the LB tries to send a response to a user, it actually gets routed via the NAT and AWS networking doesn’t deal with this very well?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;I have no proof for this theory. I spent a bunch of time playing with AWS's network reachability analyzer. It is nifty, but I couldn't create the routes that would conclusively prove this theory. &lt;/p&gt;

&lt;p&gt;But I was out of other ideas, so I decided that if the theory was right, I needed to get the LB out of zone 2b and into a zone that doesn't have the NAT routing rule.&lt;/p&gt;

&lt;p&gt;Why not move the LB to zone 2c? Turns out that ECS requires an LB to exist in the zone where the task exists. &lt;/p&gt;

&lt;p&gt;We ended up with a messy solution and a better solution for the long term. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Messy solution:&lt;/strong&gt; &lt;br&gt;
Modify the routing rules. Instead of routing all traffic that goes to the internet via the NAT, we’ll route only traffic going to Google via the NAT. Luckily Google publishes its IP ranges, so I could add routing rules specific to them: &lt;a href="https://www.gstatic.com/ipranges/goog.txt" rel="noopener noreferrer"&gt;https://www.gstatic.com/ipranges/goog.txt&lt;/a&gt;&lt;/p&gt;
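&lt;p&gt;The file is just one CIDR per line, IPv4 and IPv6 mixed, so turning it into route entries is mostly string handling. A sketch of the idea in Python, with the route-table and NAT gateway IDs as obvious placeholders:&lt;/p&gt;

```python
def ipv4_cidrs(goog_txt):
    # goog.txt is one CIDR per line; IPv6 ranges contain ":" and an
    # IPv4-only route table can skip them.
    lines = (line.strip() for line in goog_txt.splitlines())
    return [line for line in lines if line and ":" not in line]

def route_commands(cidrs, route_table_id="rtb-PLACEHOLDER", nat_id="nat-PLACEHOLDER"):
    # One aws-cli invocation per range; the IDs here are placeholders,
    # not the real ones from our setup.
    return [
        f"aws ec2 create-route --route-table-id {route_table_id} "
        f"--destination-cidr-block {cidr} --nat-gateway-id {nat_id}"
        for cidr in cidrs
    ]
```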

&lt;p&gt;This isn't a great solution, because we'll need to add more routes for every external service we need to access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better solution:&lt;/strong&gt;&lt;br&gt;
Use the fact that routing rules apply to &lt;strong&gt;subnets&lt;/strong&gt; but ECS tasks need an LB in the same &lt;strong&gt;zone&lt;/strong&gt;. So we can create two subnets in each zone: one with the virtual network equipment and the other with the tasks. Something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma84sya0c9xkhy79hv6r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma84sya0c9xkhy79hv6r.jpg" alt="Architecture diagram with two subnets in each one of two zones. Public subnet has LB and NAT. Private subnet has ECS task"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is very similar to pretty much every diagram AWS publishes about how to set up public internet access for services. They all look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0ghjn37ib8sjbvh9cpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0ghjn37ib8sjbvh9cpn.png" alt="Official AWS version of previous image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I just wish they explained why they set it up that way and what happens if you don’t. But now you have the full story; at least my readers won't need to learn it the hard way.&lt;/p&gt;




&lt;p&gt;Enjoyed reading this? &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read more about &lt;a href="https://docs.thenile.dev/blog" rel="noopener noreferrer"&gt;Infra SaaS and Control Planes on my company blog&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Or join the &lt;a href="https://launchpass.com/all-about-saas" rel="noopener noreferrer"&gt;SaaS Developer Community Slack&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>aws</category>
    </item>
    <item>
      <title>Apps for Small Things that Matter</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Thu, 19 May 2022 16:33:20 +0000</pubDate>
      <link>https://dev.to/gwenshap/apps-for-small-things-that-matter-pgp</link>
      <guid>https://dev.to/gwenshap/apps-for-small-things-that-matter-pgp</guid>
      <description>&lt;p&gt;Some apps that help me stay sane and healthy on weeks when there is a lot going on and it is easy to get stuck on VSCode, Slack or Twitter for 14h straight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routinery&lt;/strong&gt;: Lets you configure a routine with steps, and it has a timer per step, so you don't get stuck on one thing or get distracted. This helps me remember to drink tea, meditate and sort out my schedule for the day before turning to any inbox. Routinery also reminds me to get off work at 6pm, move my body, eat and spend time with humans. I ignore it more often than I want to admit (very tempting to just keep going on an interesting problem or the pile of small tasks), but it feels good when I don't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DayStamp:&lt;/strong&gt; it is the least judgmental "habits app" that I know. I added about 20 things I'd love to do on a regular basis. Some I do daily, some weekly, some rarely. I use it to check if things got out of balance (didn't work out in a week, didn't code or publish anything in a week, didn't read anything meaningful in a month...)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wakeout:&lt;/strong&gt; 1-5 min workouts that can be done at the desk or couch. I have no excuse not to do them (1 minute is nothing!), and it helps me move a bit a few times a day. It did wonders for my back and shoulders, which used to get stiff after long days. Some of the workouts in the app are very silly, but I love the "office yoga" one. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sweepy:&lt;/strong&gt; Prioritizes cleaning tasks and tracks how bad my home got. I can see when the sink is full, but I don't actually see if the floor or bathroom mirror got dirty (I think this is a common problem). As a result, sometimes my house gets to the point where my friends visit and become seriously concerned ("how can you live like this? it is like a pigsty!"). This app helps me "see" that I didn't clean the floor in a while and it is certainly dirty.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Whatsapp, Facetime, SMS, email:&lt;/strong&gt; My family and close friends are the most important thing in my life, and definitely the most important contributor to my happiness. Grateful for the apps and protocols that help us stay in touch despite time and distance challenges.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apps that I don't use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pomodoro apps: I like the 25 minute or 50 minute work spans followed by a quick break. But I didn't find an app that I like, so I just use my alarm clock. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Meditation apps: You can't convince me to pay money for an app that tells me to close my eyes and breathe. I use a timer for this one too. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>productivity</category>
    </item>
    <item>
      <title>Generating docs from OpenAPI Spec</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Wed, 27 Apr 2022 05:18:36 +0000</pubDate>
      <link>https://dev.to/gwenshap/generating-docs-from-openapi-spec-4j3i</link>
      <guid>https://dev.to/gwenshap/generating-docs-from-openapi-spec-4j3i</guid>
      <description>&lt;p&gt;It started with a very simple setup. Two github repositories: One for our backend, which included OpenAPI specs of our backend APIs. The second for our documentation website, which we based on Facebook's &lt;a href="https://docusaurus.io/"&gt;Docusaurus&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;I chose Docusaurus for the docs website because about 500 of my closest friends recommended it. And it was a great choice - I could write docs using Markdown and not worry about anything else, Docusaurus magically turned them into a great looking website. &lt;/p&gt;

&lt;p&gt;Until the point where we wanted to introduce generated API documentation to our docs. We wanted the generated docs to be integrated with the rest of the docs site. It should really feel like a single experience. &lt;/p&gt;

&lt;p&gt;The first iteration involved adding a small build script to the docs repo that cloned the backend repo and used &lt;a href="https://github.com/syroegkin/swagger-markdown"&gt;swagger-markdown&lt;/a&gt; on each file to generate markdown. Docusaurus found the markdown files and did the rest. &lt;/p&gt;
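&lt;p&gt;The script was tiny; in spirit it just mapped every spec file in the backend checkout to a markdown file in the docs tree and let Docusaurus pick those up. A sketch with hypothetical paths:&lt;/p&gt;

```python
import pathlib

def markdown_target(spec_path, out_dir="docs/api"):
    # users.yaml in the backend checkout becomes docs/api/users.md,
    # which Docusaurus then renders like any hand-written page.
    return str(pathlib.Path(out_dir) / (pathlib.Path(spec_path).stem + ".md"))

# The surrounding build step was roughly (repo URL and paths made up):
#   git clone --depth 1 $BACKEND_REPO backend
#   npx swagger-markdown -i backend/specs/users.yaml -o docs/api/users.md
```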

&lt;p&gt;I wasn't super happy with the generated docs, but it worked for a while and we left it alone. Until Monday, when we merged a change that triggered a &lt;a href="https://github.com/syroegkin/swagger-markdown/issues/169"&gt;bug in swagger-markdown&lt;/a&gt;. The issue was reported more than 6 months ago and has no responses. Which raised some concerns about whether it is a good project to depend on - very little activity, very few forks and stars... it didn't look good. 👎&lt;/p&gt;

&lt;p&gt;So I did some shopping around. Here's what I tried, what worked and what didn't:&lt;/p&gt;

&lt;h3&gt;&lt;a href="https://github.com/Mermade/widdershins"&gt;Widdershins&lt;/a&gt; 👎&lt;/h3&gt;

&lt;p&gt;This project looks fantastic. Tons of cool features, customizations and lots of GitHub stars. But... I ran into &lt;a href="https://github.com/Mermade/widdershins/issues/326"&gt;this bug&lt;/a&gt;. The bug was fixed almost 2 years ago, but the project hasn't had a single release since. I could probably have figured out a script that didn't require a release, but... this project is clearly a single person who hasn't had time to do a release in two years. I didn't feel good depending on that either.&lt;/p&gt;

&lt;h3&gt;&lt;a href="https://github.com/OpenAPITools/openapi-generator"&gt;OpenAPI Generator&lt;/a&gt; 👎&lt;/h3&gt;

&lt;p&gt;This looked very official. The documentation wasn't great, and when I tried it, it left a lot of "junk" in my working directory. But the real problem was that it generated an entire directory structure of markdown for each input spec - and it seemed really painful to tie this into the main Docusaurus site. I could probably have made it work - but the minimal docs and messy experience got me to look around a bit more.&lt;/p&gt;

&lt;h3&gt;&lt;a href="https://redocusaurus.vercel.app/docs"&gt;Redocusaurus&lt;/a&gt; ✅&lt;/h3&gt;

&lt;p&gt;This project was simple to install, simple to use, and was built exactly for my use-case: API docs in Docusaurus with a unified experience for users. It didn't have many GitHub stars, but it wrapped the hugely popular redoc. And most importantly - the author is active, responsive and kind. Just check out the issues - he comes across as someone you want to work with. To put the icing on the cake, &lt;a href="https://developers.forem.com/api"&gt;Forem&lt;/a&gt;, the engine behind this very website, uses this plugin. How cool is that?&lt;/p&gt;

&lt;p&gt;I was all ready to use Redocusaurus, but there was one problem:&lt;/p&gt;

&lt;p&gt;Our specs were split between several YAML spec files. It looked like a good idea when we did it - large files are not fun to work with. The problem is that very little in the OpenAPI ecosystem was built for multiple files. I strongly recommend that you save yourself the pain and go the mono-file route. If &lt;a href="https://github.com/stripe/openapi"&gt;Stripe can have a 4.5MB spec file&lt;/a&gt;, so can we. &lt;/p&gt;

&lt;p&gt;In order to use Redocusaurus, I needed a single spec file. Both &lt;a href="https://github.com/APIDevTools/swagger-cli"&gt;Swagger cli&lt;/a&gt; and &lt;a href="https://github.com/Redocly/openapi-cli"&gt;OpenApi cli&lt;/a&gt; offered an option to merge separate specs into one. The problem was that one of them required a "root spec file" to drive the merging and the other required extra information to resolve conflicts. My specs had neither. &lt;/p&gt;
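&lt;p&gt;For specs as simple as ours (no cross-file refs to resolve), the merge itself is mostly dictionary surgery once each YAML file is parsed. A naive Python sketch of what the manual merge amounts to - my illustration, not what either CLI actually does:&lt;/p&gt;

```python
def merge_specs(specs, title="Combined API"):
    # Naive union of parsed OpenAPI documents: later files win on key
    # collisions. Real tools (swagger-cli bundle, openapi bundle) also
    # resolve refs between files, which this deliberately ignores.
    merged = {
        "openapi": "3.0.0",
        "info": {"title": title, "version": "1.0.0"},
        "paths": {},
        "components": {"schemas": {}},
    }
    for spec in specs:
        merged["paths"].update(spec.get("paths", {}))
        merged["components"]["schemas"].update(
            spec.get("components", {}).get("schemas", {}))
    return merged
```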

&lt;p&gt;I ended up with an old fashioned solution - manually merging the spec files for now. Our engineering team has thoughts on how to improve our use of OpenAPI specs, and we'll have a better solution in a week or two as a result of that. &lt;/p&gt;

&lt;p&gt;I hope this blog helps someone who has similar requirements or is just trying to pick between three JS projects that all do similar things. Responsiveness of maintainers is really important.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>api</category>
      <category>docs</category>
    </item>
    <item>
      <title>Choosing Technologies, APIs and Languages</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Tue, 29 Mar 2022 21:47:56 +0000</pubDate>
      <link>https://dev.to/gwenshap/choosing-technologies-apis-and-languages-2319</link>
      <guid>https://dev.to/gwenshap/choosing-technologies-apis-and-languages-2319</guid>
      <description>&lt;p&gt;There is a vast difference between choosing a technology or a language for one person or a small team, choosing for a large engineering organization, and choosing for a platform with potentially tens of thousands of users. &lt;/p&gt;

&lt;p&gt;Choosing a technology for yourself or a small team is usually about personal taste, whether or not you are interested in learning something new, available tools, integration into an existing project, and perhaps performance/scale considerations. &lt;/p&gt;

&lt;p&gt;When you choose a technology for a larger engineering organization, perhaps an entire company, it's a different story. In that case, you need to think about hiring, training, use-cases, testing, all the other parts of the CI/CD pipelines, and try to imagine three years into the future - is this technology growing or dying?&lt;/p&gt;

&lt;p&gt;Choosing a technology for a platform should be about the people who will use it. What will they find natural? What will make their life easier? What will create the best experience? It is very tempting to build something for yourself and hope that others will like it, but I think we can do better. We can connect to our potential users and try to empathize, listen to them and see things from their perspective, so we can build a fantastic experience for them. &lt;/p&gt;

&lt;p&gt;This isn't easy, especially if you hope that front-end engineers will use your product and you only know two. &lt;/p&gt;

&lt;p&gt;Can you help me out by sharing your thoughts in a quick survey? &lt;a href="https://0sri4j4i8ze.typeform.com/to/IWI56Zkk"&gt;https://0sri4j4i8ze.typeform.com/to/IWI56Zkk&lt;/a&gt; I ask about your favorite languages and APIs as I make decisions about the platform I'm building. I really appreciate all the help! Feel free to comment below with more feedback, I'll appreciate this even more.&lt;/p&gt;

&lt;p&gt;P.S&lt;br&gt;
Chris Riccomini has a great blog post on &lt;a href="https://cnr.sh/essays/preventing-technology-turf-wars"&gt;how to introduce new technologies to an organization&lt;/a&gt; - for the more practical aspects. &lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>react</category>
    </item>
    <item>
      <title>What is a High Quality Product?</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Mon, 31 Jan 2022 06:13:25 +0000</pubDate>
      <link>https://dev.to/gwenshap/what-is-a-high-quality-product-3odk</link>
      <guid>https://dev.to/gwenshap/what-is-a-high-quality-product-3odk</guid>
      <description>&lt;p&gt;When you are building a product from scratch, as my co-founders and I are doing right now, it is easy to become very passionate about doing very high quality work. We constantly talk about how we want to make everything "World Class".&lt;/p&gt;

&lt;p&gt;Quality is hard to define though. One of my favorite books growing up was &lt;a href="https://www.amazon.com/dp/B0063HC7EQ/ref=dp-kindle-redirect?_encoding=UTF8&amp;amp;btkr=1"&gt;"Zen and the Art of Motorcycle Maintenance"&lt;/a&gt;, and a good chunk of the book is about the author trying to define quality and what happens to him as a result. Quality is something you know when you see it. High quality products give this impression as a strong immediate experience: you read a story and the first sentence grabs you and you can't let go. But there is also an element of context and experience - experience gives you access to a wider range of quality distinctions. I enjoy some wines more than others, but I can't really appreciate great wines. On the other hand, after 20 years of code reviews, I can tell a lot about an engineer by looking at their code and can distinguish good code from great in different domains, paradigms and languages. &lt;/p&gt;

&lt;p&gt;When it comes to software products, I believe that the final judge of quality is the customer. We talked to a lot of potential customers about what makes an experience high quality for them.&lt;/p&gt;

&lt;p&gt;A lot of companies and cultures confuse "lack of bugs" with "quality". There is some overlap, but the Venn Diagram is not a circle. We all know software that was buggy and immature but compelling enough to still provide high quality experience. We also know software that is nearly bug free but the experience doesn't feel high quality. &lt;/p&gt;

&lt;p&gt;If lack of bugs isn’t it, what makes for high quality product?&lt;/p&gt;

&lt;p&gt;Here is my definition: &lt;em&gt;A high quality product offers a magical experience to users in a specific dimension that they really care about.&lt;/em&gt; A lot of other things can be forgiven if you can be truly magical in the specific things that matter to them.&lt;/p&gt;

&lt;p&gt;A few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early Slackware Linux had tons of bugs, but it was the first time I could run a Unix on my desktop at home and not the computer lab at the university. Changed my life.&lt;/li&gt;
&lt;li&gt;My first car navigation system kept crashing, but it was still way better than stopping to look at maps.&lt;/li&gt;
&lt;li&gt;Early Kafka was easy to get started with and it had amazing uptime. There were major bugs, but people reported them and kept using it. Eventually the bugs were resolved.&lt;/li&gt;
&lt;li&gt;Early Twitter and the fail whale. &lt;/li&gt;
&lt;li&gt;Datadog was super simple to get started with and sending metrics "just worked", we had some issues with reporting that they fixed later, but we remained a customer forever.&lt;/li&gt;
&lt;li&gt;Expensify allowed me to take photos of receipts and not carry them around.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The takeaway here is that you need to &lt;em&gt;figure out what your users really care about, especially in their early adoption steps, and make it feel magical&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Very early in my Confluent career, we hired an amazing training developer (she went on to be much more). On her second day, she said "I want to structure my training around a practical example, what is a fun thing to do with Kafka?" and I was the PM of Connect, so I said - "Why not get some data from MySQL to Kafka, do simple aggregation with Kafka Streams, and write the result to S3?". This was already a fairly popular use-case in my mind. And 2 days later she said, "something is wrong, it doesn't work". She was right, none of this "just worked", you had to figure out specific configurations, specific formats, specific steps. It took us weeks to get it to work. And we saw this as a basic use-case! This was completely “green path” - no chaos, no high load, nothing that should have been challenging. &lt;/p&gt;

&lt;p&gt;Note that a QA team rarely finds these kinds of issues, and the issues she found were not in any one part of the product. They were either usability issues or, more frequently, integration issues - you only see them when you try to use your product like a real customer and implement an entire workflow. We eventually built an automated testing framework specifically around real customer workflows. &lt;/p&gt;

&lt;p&gt;And to close on a more quantifiable note, one last tip for quality:&lt;br&gt;
How often bad things happen, and whether they start happening more frequently, is a really important thing to know when thinking about user experience in SaaS. SLOs are a good tool for this, but many years ago I learned about a more flexible tool that is worth knowing about. It is called a &lt;a href="https://en.wikipedia.org/wiki/Control_chart"&gt;control chart&lt;/a&gt;. You basically take a metric, say response time latency, and plot it over time. You then define a range of "normal values"; it can be an overall average, an average per entity like machine or user, or an adaptive average that can handle things like weekend use and rush hour. Now you'll have a set of points outside the "normal range" and you can define rules on them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Any point 3 standard deviations above the baseline. This indicates an extreme, sudden increase.&lt;/li&gt;
&lt;li&gt;5 consecutive measurements more than one standard deviation over the baseline. This indicates a sustained increase.&lt;/li&gt;
&lt;li&gt;10 consecutive measurements, each higher than the previous one. This indicates a steady upward trend.&lt;/li&gt;
&lt;/ol&gt;
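&lt;p&gt;These rules are simple enough to run over any metric series. Here is a small sketch; for brevity it uses the whole series as its own baseline, where a real system would use a trailing or per-entity baseline:&lt;/p&gt;

```python
from statistics import mean, stdev

def control_chart_alerts(values):
    # Baseline computed from the series itself, for brevity; a
    # production system would use a trailing window instead.
    mu, sigma = mean(values), stdev(values)
    alerts = []
    # Rule 1: any point 3 standard deviations above the baseline.
    for i, v in enumerate(values):
        if v > mu + 3 * sigma:
            alerts.append((i, "extreme spike"))
    # Rule 2: 5 consecutive points more than one stddev above baseline.
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > mu + sigma else 0
        if run == 5:
            alerts.append((i, "sustained increase"))
    # Rule 3: 10 consecutive measurements, each higher than the last.
    rising = 1
    for i in range(1, len(values)):
        rising = rising + 1 if values[i] > values[i - 1] else 1
        if rising == 10:
            alerts.append((i, "upward trend"))
    return alerts
```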

&lt;p&gt;This is a super flexible way to detect and communicate a wide range of quality issues in a production system, so you can discuss not just a specific incident but also worrying trends.&lt;/p&gt;

&lt;p&gt;I also posted this content in a video, if you prefer:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3WoV86d7-c4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>codequality</category>
      <category>product</category>
      <category>ux</category>
      <category>devrel</category>
    </item>
    <item>
      <title>Lessons learned when building self-serve provisioning</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Tue, 25 Jan 2022 23:24:19 +0000</pubDate>
      <link>https://dev.to/gwenshap/lessons-learned-when-building-self-serve-provisioning-4c11</link>
      <guid>https://dev.to/gwenshap/lessons-learned-when-building-self-serve-provisioning-4c11</guid>
      <description>&lt;p&gt;I've talked to ~20 companies that are at various stages of building data infrastructure (and sometimes ML, analytics and other infrastructure). Many of them have a step when signing up customers that require provisioning some cloud resources for the customers. &lt;/p&gt;

&lt;p&gt;I was at Confluent at the time when we moved from "talk to sales and wait a week" manual provisioning to self-serve provisioning. Different companies go about this differently, but some of the lessons are generally applicable. Ram Subramanian, the VP Eng at Confluent who ran the entire cloud service, recorded a video with me where we discussed the lessons we've learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A manual process, especially one where there are multiple humans between the customer and "getting something done", introduces the risk of getting things wrong and pissing off the customer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not only does it piss off customers, a lot of mistakes mean a lot of back-and-forth and a huge waste of time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This gets much worse as growth accelerates. And customer expectations go up over time too.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Since building actual product automation (as opposed to a partially-automated collection of scripts) takes some time (especially if you need it to be well tested, since your customers are already a bit annoyed), you actually need to start before this becomes a really painful problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It doesn't take much to go from a point of "manual is fine" to "our engineers spend so much time on manual work that they don't have time to work on automation" and from there to "our best engineers quit".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another thing we learned is that there is a huge gap between "automated for internal use" and "self serve for customers". For real customer-facing automation you need a much better feedback loop around "what is happening? are there any issues? is it making progress? what does this error mean and what do I do now?" and you also need much better guard rails - customer mistakes can bring down your system or cost you a lot if you are not careful about how you build it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/w9MaX2WuUGA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;P.S&lt;br&gt;
We started a series of "SaaS Stories" - I am trying to interview people who built great SaaS products to talk about how they built some of the foundational SaaS experiences and share what they learned, especially about things that may look basic but turned out to be much harder than anyone expected. Warning each other about potential pitfalls and potential solutions seems like a great cause. So, if you want to join the series and talk about how you built your on-boarding, authorization, multiple product tiers, billing, notifications, user and org management, etc... I'd love to chat. No vendors talking about their solutions - only people building SaaS products sharing their lessons. Ping me via &lt;a class="mentioned-user" href="https://dev.to/gwenshap"&gt;@gwenshap&lt;/a&gt; on twitter. TY!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
    </item>
    <item>
      <title>What I learned about using and building public APIs</title>
      <dc:creator>Gwen (Chen) Shapira</dc:creator>
      <pubDate>Sat, 15 Jan 2022 03:28:11 +0000</pubDate>
      <link>https://dev.to/gwenshap/what-i-learned-about-using-and-building-public-apis-2d00</link>
      <guid>https://dev.to/gwenshap/what-i-learned-about-using-and-building-public-apis-2d00</guid>
      <description>&lt;p&gt;I've recently learned about the API Economy concept (&lt;a href="https://www.notboring.co/p/apis-all-the-way-down"&gt;A good primer&lt;/a&gt; and &lt;a href="https://www.swyx.io/api-economy/"&gt;a contrarian view&lt;/a&gt;). After a bit of research, I was pretty amazed at how so much of the stuff you need to build your product already exists out there. I think everyone knows about Stripe, Twilio, Auth0.. but I was surprised to learn about Notarize, which has API for signing and notarizing documents, or FaunaDB which is a database with REST API, or background checks with API, etc, etc.&lt;/p&gt;

&lt;p&gt;I learned to never implement anything without checking if someone else already provides it as a service via an API that I could integrate with. I found out that Postman (which also has a nice desktop app for testing APIs) hosts a catalog of tons of public APIs and lets you experiment with them online: &lt;a href="https://www.postman.com/explore"&gt;https://www.postman.com/explore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also learned about JAMStack, a community of front-end engineers who build entire websites without ever writing a backend - simply by using third-party services and filling in the gaps with serverless functions (AWS Lambda and such).&lt;/p&gt;

&lt;p&gt;It makes sense for pretty much every service to add public APIs and allow others to integrate. It opens up new use cases and revenue streams, and for the most part these APIs already exist internally. OpenAPI makes it pretty easy to define the APIs, and then generate client and server code in your language, generate documentation, generate mocks, generate tests, etc. All that generated stuff is super important: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers will need the documentation to use your API and do the integrations. &lt;/li&gt;
&lt;li&gt;Mocks will allow them to test the integration without loading your production application. Maybe even try a "mocked MVP" so you can get feedback before building your app.&lt;/li&gt;
&lt;li&gt;Generated tests will let you continuously validate that you didn't break the APIs. Since other developers will rely on them for their own applications, it is critical not to break those applications - otherwise they won't come back, and they won't tell their friends. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://swagger.io/"&gt;https://swagger.io/&lt;/a&gt; has great tools for OpenAPI.&lt;/p&gt;

&lt;p&gt;I cover all this with a bunch of examples in a video: &lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/cTnTW5Cq0b0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I'm also working on a SaaS-in-a-Box backend service (hopefully with both great APIs and great integrations) to make life simpler for SaaS developers. If you are willing to give me feedback on the MVP, mind sharing your email with me in this form? I promise it won't be used for marketing - I'll just personally connect with you to discuss: &lt;a href="https://forms.gle/8tW73MwEWdWu4rB27"&gt;https://forms.gle/8tW73MwEWdWu4rB27&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>beginners</category>
    </item>
  </channel>
</rss>
