<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Phare</title>
    <description>The latest articles on DEV Community by Phare (@phare).</description>
    <link>https://dev.to/phare</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9424%2F5fab0bd2-0baf-4146-b295-e0d22e4a42b0.jpg</url>
      <title>DEV Community: Phare</title>
      <link>https://dev.to/phare</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/phare"/>
    <language>en</language>
    <item>
      <title>Recreating Laravel Cloud’s range input with native HTML</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Wed, 02 Jul 2025 15:27:54 +0000</pubDate>
      <link>https://dev.to/phare/recreating-laravel-clouds-range-input-with-native-html-1b69</link>
      <guid>https://dev.to/phare/recreating-laravel-clouds-range-input-with-native-html-1b69</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8m3mf7zrbfgwnlr94fo.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8m3mf7zrbfgwnlr94fo.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the past few days, I’ve been working on improving the billing experience in Phare with the addition of &lt;a href="https://docs.phare.io/changelog/platform/2025#credits-payment-for-scale-plan" rel="noopener noreferrer"&gt;prepaid credits&lt;/a&gt;. While tweaking the billing UI, I realized the current input for configuring additional quota wasn’t great: it didn’t clearly show what was already included in the paid plan versus what the user could add.&lt;/p&gt;

&lt;p&gt;The UX of entering large numbers in a text input could certainly be improved. It wasn't horrible, but I felt it could be better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frff1ne8ri458z5wr14o1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frff1ne8ri458z5wr14o1.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I went hunting for inspiration on Dribbble and other websites that post screenshots of SaaS interfaces. After some research, I stumbled upon the &lt;a href="https://cloud.laravel.com/pricing" rel="noopener noreferrer"&gt;Laravel Cloud pricing calculator&lt;/a&gt;. Their range input design was spot-on: clear separation between included and additional values, visually appealing, and user-friendly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8q9aglkwpgep7xj045o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8q9aglkwpgep7xj045o.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Naturally, I did what any self-respecting developer would do: opened the browser inspector to &lt;del&gt;steal&lt;/del&gt; look at the code. It turns out they recreated a full range input with a few HTML elements, glued together with JavaScript via Alpine.js. Here's the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;div class="group/range relative h-8 self-stretch"&amp;gt;
  &amp;lt;!-- Full track --&amp;gt;
  &amp;lt;div /&amp;gt;

  &amp;lt;!-- Static track --&amp;gt;
  &amp;lt;div /&amp;gt;

  &amp;lt;!-- Progress bar --&amp;gt;
  &amp;lt;div /&amp;gt;

  &amp;lt;!-- Handle --&amp;gt;
  &amp;lt;div&amp;gt;
    &amp;lt;div&amp;gt;
      &amp;lt;span&amp;gt;&amp;lt;/span&amp;gt;
    &amp;lt;/div&amp;gt;
  &amp;lt;/div&amp;gt;

  &amp;lt;!-- Tick --&amp;gt;
  &amp;lt;div /&amp;gt;

  &amp;lt;!-- HTML range input --&amp;gt;
  &amp;lt;input /&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because I'm the laziest developer, rebuilding all this for the six people (love you guys 🫶) who pay for Phare felt like overkill. Could I recreate a similar input with less work? There's probably a way to get a similar result with some CSS on top of the native range input, and maybe a few lines of JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the range input
&lt;/h2&gt;

&lt;p&gt;To match the Laravel Cloud design, we need the following components:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Range track&lt;/strong&gt;: the rail where the handle moves.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handle&lt;/strong&gt;: the draggable thumb.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Progress bar&lt;/strong&gt;: the filled area left of the handle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Static part&lt;/strong&gt;: a fixed section showing the value already included in the plan.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tick&lt;/strong&gt;: a visual marker where the included value ends and extra begins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7aijwkkzsf6rtqo4ave.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7aijwkkzsf6rtqo4ave.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The static part and tick are mostly cosmetic and can easily be visually faked outside the range input itself. Everything else is already included in the native HTML range input.  &lt;/p&gt;
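&lt;p&gt;As a minimal sketch (element names and attributes are mine, not taken from Phare's actual code), the whole thing can boil down to two cosmetic elements sitting next to a native range input:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;div class="range-wrapper"&amp;gt;
  &amp;lt;!-- Static part: purely cosmetic, rendered outside the input --&amp;gt;
  &amp;lt;div class="static-track"&amp;gt;&amp;lt;/div&amp;gt;

  &amp;lt;!-- Tick: marks where the included value ends --&amp;gt;
  &amp;lt;div class="tick"&amp;gt;&amp;lt;/div&amp;gt;

  &amp;lt;!-- The native input provides the track, handle, and progress --&amp;gt;
  &amp;lt;input type="range" id="range" min="0" max="100"&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;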

&lt;p&gt;So why did the Laravel team go full custom?&lt;/p&gt;
&lt;h2&gt;
  
  
  Limitations of the native HTML range input
&lt;/h2&gt;

&lt;p&gt;To look great, the handle needs to land &lt;strong&gt;exactly at the tick’s position&lt;/strong&gt; when at the minimum value. It should also cover the tick to be visually appealing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcngft55u5lj5aw5q4p3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcngft55u5lj5aw5q4p3.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, this isn't possible: the native range input’s handle is confined to the boundaries of the track. So, what if we make the range input track overlap under the static part to allow the handle to sit on the tick? (such a weird sentence).&lt;/p&gt;

&lt;p&gt;Well, native range inputs don’t let us set different &lt;code&gt;z-index&lt;/code&gt; values for the handle and the track. If we push the track behind the static part, the handle goes with it. If we bring it forward, the whole thing looks messy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm36fujvcgisxj5fml7ui.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm36fujvcgisxj5fml7ui.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The solution
&lt;/h2&gt;

&lt;p&gt;Enter the &lt;strong&gt;CSS inner shadow&lt;/strong&gt;: an inset shadow lets us fake a few extra pixels of the static part &lt;strong&gt;inside&lt;/strong&gt; the track, so the handle can glide over it without getting hidden.&lt;/p&gt;

&lt;p&gt;By carefully layering the tick and the static track visually outside the actual input, and using this inner shadow to fake part of the static part, we can get something that works well.&lt;/p&gt;
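&lt;p&gt;As a sketch, the inset shadow lives on the track pseudo-elements (the color and offset here are assumptions, not Phare's actual values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Fake a few pixels of the static part inside the left edge of the
   track, so the handle can glide over them without being clipped */
input[type="range"]::-webkit-slider-runnable-track {
  box-shadow: inset 12px 0 0 0 #99a1af;
}

input[type="range"]::-moz-range-track {
  box-shadow: inset 12px 0 0 0 #99a1af;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;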

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvghry1msjk3hjr48cgz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvghry1msjk3hjr48cgz.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Styling the handle
&lt;/h2&gt;

&lt;p&gt;Using the border property on the handle with &lt;code&gt;-moz-range-thumb&lt;/code&gt; works great in Firefox, but Chrome does not seem to support it. Again, inner shadows save the day and bring us cross-browser consistency.&lt;/p&gt;
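&lt;p&gt;A sketch of that idea, replacing the border with an inset shadow on both vendor pseudo-elements (the sizes and colors are mine, not the production values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* An inset shadow renders the same "border" in Chrome and Firefox */
input[type="range"]::-webkit-slider-thumb {
  -webkit-appearance: none;
  width: 16px;
  height: 16px;
  border-radius: 50%;
  box-shadow: inset 0 0 0 2px black;
}

input[type="range"]::-moz-range-thumb {
  border: none;
  width: 16px;
  height: 16px;
  border-radius: 50%;
  box-shadow: inset 0 0 0 2px black;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;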
&lt;h2&gt;
  
  
  Styling the progress bar
&lt;/h2&gt;

&lt;p&gt;To make the progress bar pattern, Laravel's team used a clever trick based on &lt;code&gt;repeating-linear-gradient&lt;/code&gt; to create infinitely repeating stripes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;background-image: repeating-linear-gradient(135deg, black 0px, black 1px, #99a1af 1px, #99a1af 4px);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But applying that to our native range input would cover the entire track. I only wanted it on the left side of the handle, to represent progress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9djd8hakd0vm6w64os27.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9djd8hakd0vm6w64os27.webp" alt="Recreating Laravel Cloud’s range input with native HTML"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To fix this, there isn't any other solution: we need a few lines of JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;document.getElementById("range").addEventListener('input', function (event) {
    let input = event.target
    let value = parseInt(input.value)
    let min = parseInt(input.getAttribute('min'))
    let max = parseInt(input.getAttribute('max'))

    let percentage = (value - min) / (max - min) * 100

    input.style.backgroundSize = `${percentage}% 100%, 100% 100%`
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final result
&lt;/h2&gt;

&lt;p&gt;The end result is not quite as flexible as Laravel Cloud’s fully custom implementation. Since the track has to fake the design of the static part and the tick, it doesn't allow more complex designs, but it fits my use case perfectly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59e6je76xh0mx6hn3cap.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59e6je76xh0mx6hn3cap.gif" alt="Final input version"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The final native HTML approach is simple, relies on minimal tricks, and still looks good. I think it shows that it's possible to go quite far with native elements without recreating everything in JavaScript.&lt;/p&gt;

&lt;p&gt;You can see a fully working example and the code to recreate the input on CodePen:&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/nicbvs/embed/raVgORg?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;And if you like my attention to detail, you should try Phare, it's a great tool for &lt;a href="https://phare.io/products/uptime/website-monitoring" rel="noopener noreferrer"&gt;uptime monitoring&lt;/a&gt;, &lt;a href="https://phare.io/products/uptime/incident-management" rel="noopener noreferrer"&gt;incident management&lt;/a&gt;, and &lt;a href="https://phare.io/products/uptime/status-pages" rel="noopener noreferrer"&gt;status pages&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>css</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What to look for in an uptime monitoring tool</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Mon, 16 Jun 2025 20:43:51 +0000</pubDate>
      <link>https://dev.to/phare/what-to-look-for-in-an-uptime-monitoring-tool-4chk</link>
      <guid>https://dev.to/phare/what-to-look-for-in-an-uptime-monitoring-tool-4chk</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuy7gzemehnuo834tif5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuy7gzemehnuo834tif5.webp" alt="What to look for in an uptime monitoring tool" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your website is how you pay the bills, whether it’s a SaaS, an API, or that side project financing your daily ramen, you need to know when it’s down. Preferably before your customers start angrily spamming that F5 key.&lt;/p&gt;

&lt;p&gt;There are thousands of uptime monitoring tools out there, but after running one myself for a few years, here’s what I think you should actually be paying for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pick a tool that won’t sleep on the job
&lt;/h2&gt;

&lt;p&gt;There’s no point in using an uptime monitoring service that’s less reliable than the thing you’re monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9sxpm1xwleh5r175b1t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9sxpm1xwleh5r175b1t.jpg" alt="What to look for in an uptime monitoring tool" width="774" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I love indie products (obviously), but this is one of those times where time in the market beats timing the market*.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check how long the tool’s been around&lt;/li&gt;
&lt;li&gt;Carefully review stats on their status page, some are... enlightening&lt;/li&gt;
&lt;li&gt;Check out the documentation, it’s usually a good indicator of quality&lt;/li&gt;
&lt;li&gt;Most reviews online are fake, ask your friends instead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;*Not financial advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud or self-hosted? Choose wisely
&lt;/h2&gt;

&lt;p&gt;Your monitoring system should live outside your infrastructure, &lt;a href="https://dev.to/phare/the-3-year-journey-to-an-actually-good-monitoring-stack-2dd2"&gt;spread across multiple data centers&lt;/a&gt;, poking your endpoints from different parts of the world. That’s typically not something you get with a self-hosted setup.&lt;/p&gt;

&lt;p&gt;That said, self-hosted might make sense if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re monitoring a closed/private network&lt;/li&gt;
&lt;li&gt;You’ve got confidential credentials involved&lt;/li&gt;
&lt;li&gt;You like the smell of YAML in the morning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you go the DIY route, open-source projects like &lt;a href="https://github.com/rajnandan1/kener" rel="noopener noreferrer"&gt;Kener&lt;/a&gt; and &lt;a href="https://github.com/openstatusHQ/openstatus" rel="noopener noreferrer"&gt;OpenStatus&lt;/a&gt; give you slick status pages and great features while being easy to host.&lt;/p&gt;

&lt;p&gt;Otherwise, uptime monitoring being a brutally competitive market, good cloud options are often &lt;a href="https://phare.io/pricing" rel="noopener noreferrer"&gt;cheaper than spinning up a new VPS&lt;/a&gt;, with the benefit of not having to spend time on maintenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing that won’t eat your runway
&lt;/h2&gt;

&lt;p&gt;Some tools charge by tiers, others charge by usage. Both can be good, but you do need to know how far your plan will take you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvv280dp9m763l9uihva.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvv280dp9m763l9uihva.jpg" alt="What to look for in an uptime monitoring tool" width="390" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep an eye out for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Surprise features locked behind expensive plans, like &lt;a href="https://sso.tax/" rel="noopener noreferrer"&gt;SSO&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pricing jumps that increase your monthly plan by 600% to monitor that one additional endpoint&lt;/li&gt;
&lt;li&gt;Per-seat costs (gets expensive fast if you grow)&lt;/li&gt;
&lt;li&gt;Extra costs for things like API monitoring, fancy assertions, or exotic check types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re planning to grow, plan for growth. Otherwise, a cheap starter plan could turn into a budget black hole real quick.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fast and silent checks
&lt;/h2&gt;

&lt;p&gt;Short intervals, from 1 minute down to 30 seconds, are great, but the increased risk of false positive alerts is not. Make sure your monitoring service confirms failures before it blows up your phone in the middle of the night. Good providers give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failure confirmation&lt;/li&gt;
&lt;li&gt;Recovery confirmation&lt;/li&gt;
&lt;li&gt;Options to tune how aggressive or chill your alerts are&lt;/li&gt;
&lt;/ul&gt;
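&lt;p&gt;The confirmation idea can be sketched in a few lines (the names and thresholds are made up for illustration, not any vendor's actual logic): only flag a monitor as down after several consecutive failed checks, and only recover after several consecutive successes.&lt;/p&gt;

```javascript
// Hypothetical confirmation logic: alert only after failureThreshold
// consecutive failures, recover only after recoveryThreshold successes.
function createConfirmedMonitor(failureThreshold, recoveryThreshold) {
    let failures = 0
    let successes = 0
    let down = false

    // Record one check result; returns whether the monitor is considered down
    return function record(checkPassed) {
        if (checkPassed) {
            successes += 1
            failures = 0
            if (down === true) {
                if (successes >= recoveryThreshold) {
                    down = false
                }
            }
        } else {
            failures += 1
            successes = 0
            if (failures >= failureThreshold) {
                down = true
            }
        }
        return down
    }
}
```

&lt;p&gt;With thresholds of 3 and 2, a single failed check never pages anyone; three in a row do.&lt;/p&gt;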

&lt;p&gt;I wrote a whole guide about this if you want the nerdy details: &lt;a href="https://phare.io/blog/best-practices-to-configure-an-uptime-monitoring-service" rel="noopener noreferrer"&gt;Best practices to configure an uptime monitoring service&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Check from around the world
&lt;/h2&gt;

&lt;p&gt;Just because your site works in Paris does not mean it works in Singapore. Routing is weird. DNS is weird. Internet infrastructure is insanely complex, and sometimes fragile.&lt;/p&gt;

&lt;p&gt;If your users are global, your monitoring should be too, especially if you’re doing edge deployments or running multi-region setups.&lt;/p&gt;

&lt;h2&gt;
  
  
  More than just up and down
&lt;/h2&gt;

&lt;p&gt;“Up or down” is just the start. Depending on what you’re building, you’ll want your uptime tool to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API checks with custom payloads&lt;/li&gt;
&lt;li&gt;SSL certificate monitoring (inventory, validity, expiration, AIA, OCSP)&lt;/li&gt;
&lt;li&gt;DNS validation&lt;/li&gt;
&lt;li&gt;Performance &amp;amp; response times&lt;/li&gt;
&lt;li&gt;Tracing &amp;amp; diagnostic info&lt;/li&gt;
&lt;li&gt;Custom assertions (e.g., making sure the PHP version header is not present in production)&lt;/li&gt;
&lt;/ul&gt;
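&lt;p&gt;As a toy example of that last kind of assertion (the function and header handling are mine, not any vendor's API), a check could simply fail whenever a response header leaks the runtime version:&lt;/p&gt;

```javascript
// Hypothetical custom assertion: pass only when no response header
// leaks the PHP version (header names are case-insensitive).
function assertNoPhpVersionHeader(headers) {
    let leaked = Object.keys(headers).some(function (name) {
        if (name.toLowerCase() === 'x-powered-by') {
            return headers[name].toLowerCase().indexOf('php') !== -1
        }
        return false
    })
    return leaked === false
}
```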

&lt;p&gt;&lt;a href="https://ohdear.app/" rel="noopener noreferrer"&gt;OhDear&lt;/a&gt; is a great example that offers an extensive list of extra checks, like SEO monitoring or broken link and mixed content detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solo today, team tomorrow
&lt;/h2&gt;

&lt;p&gt;Right now you might be a team of one (hey friend 👋), but good monitoring tools support teams, shared dashboards, incident timelines, etc.&lt;/p&gt;

&lt;p&gt;Even solo founders need to sleep occasionally. Having a friend or colleague see the same alerts is a life upgrade worth investing in early on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plays nice with your stack
&lt;/h2&gt;

&lt;p&gt;Your monitoring tool should work with whatever communication channels you already use, whether you’re a Slack, email, or webhook person. Alerts should come to you, not the other way around.&lt;/p&gt;

&lt;p&gt;Also keep an eye out for generic integrations like incoming and outgoing webhooks, as well as APIs. They give you ways to plug in third-party or custom-made solutions as you grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Good logs save bad days
&lt;/h2&gt;

&lt;p&gt;When things break, you need as many details as possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The actual HTTP status code&lt;/li&gt;
&lt;li&gt;The full response body&lt;/li&gt;
&lt;li&gt;Headers&lt;/li&gt;
&lt;li&gt;DNS resolution steps&lt;/li&gt;
&lt;li&gt;Request trace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr26jne9upa1ww114z99.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr26jne9upa1ww114z99.jpg" alt="What to look for in an uptime monitoring tool" width="500" height="729"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could even go further with traceroute logging or screenshot capture. Most cloud solutions provide this. If you’re going self-hosted, you can rig something with webhooks and a great screenshot API like &lt;a href="https://www.capturekit.dev/" rel="noopener noreferrer"&gt;CaptureKit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It’ll save you hours writing postmortems, debugging edge cases, or explaining to your users why everything went sideways last Thursday.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pick a tool you can trust&lt;/li&gt;
&lt;li&gt;Make sure it’s got the features you need today and tomorrow&lt;/li&gt;
&lt;li&gt;Choose something that helps you fix problems, not just point at them&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I’m building Phare.io with this mindset; check it out if you’re looking for a great uptime monitoring tool with incident management and status pages. It’s free to start and scales with your needs.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>uptime</category>
      <category>monitoring</category>
      <category>guide</category>
    </item>
    <item>
      <title>The 3-Year Journey to an Actually Good Monitoring Stack</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Tue, 15 Apr 2025 19:34:35 +0000</pubDate>
      <link>https://dev.to/phare/the-3-year-journey-to-an-actually-good-monitoring-stack-2dd2</link>
      <guid>https://dev.to/phare/the-3-year-journey-to-an-actually-good-monitoring-stack-2dd2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pyailovqdubw4f8y944.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pyailovqdubw4f8y944.webp" alt="The 3-Year Journey to an Actually Good Monitoring Stack" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I started building &lt;a href="https://phare.io" rel="noopener noreferrer"&gt;Phare&lt;/a&gt; in early 2022, I planned the architecture assuming that fetching websites to perform uptime checks would be the main scaling bottleneck, and oh boy, was I wrong. While scaling that part is challenging, this assumption led to suboptimal architectural choices that I had to carry for the past three years.&lt;/p&gt;

&lt;p&gt;Of course, when you build an uptime monitoring service, the last thing you want is your monitoring infrastructure to be inefficient, or worse, inaccurate. Maintenance and planning take priority over everything else, and your product stops evolving. You're no longer building a fast-paced side project, you're just babysitting a web crawler.&lt;/p&gt;

&lt;p&gt;It took a lot of work to fix things while maintaining the best possible service for the hundreds of users relying on it. But it was worth it, and the future is now brighter than ever for Phare.io.&lt;/p&gt;

&lt;p&gt;Let’s go back to an afternoon in the summer of 2022, when I said to myself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fuck it, I’m going to make an uptime monitoring tool and compete with the 2,000 that already exist. It should only take a weekend to build anyway.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(It was probably in French in my head, but you get the idea).&lt;/p&gt;

&lt;h2&gt;
  
  
  The very first version: Python on AWS Lambda
&lt;/h2&gt;

&lt;p&gt;AWS Lambda immediately felt like a perfect fit. I had written a few Lambda functions before, and it seemed like a good choice to easily run code in multiple regions, with the major benefit of no upfront costs and no maintenance. Compared to setting up multiple VPSs, with provisioning and maintenance on top, the choice was clear.&lt;/p&gt;

&lt;p&gt;I wrote the Python code for the Lambda, and all that was left was to invoke it in all required regions from my PHP backend whenever I needed to run an uptime check.&lt;/p&gt;

&lt;p&gt;The AWS SDK supports parallel invocation, which solved the problem of data reconciliation. I had the results of all regions in a single array and could easily decide if a monitor was up or down, sweet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$results = Utils::unwrap([
  $lambdaClient-&amp;gt;invokeAsync('eu-central-1', $payload),
  $lambdaClient-&amp;gt;invokeAsync('us-east-1', $payload),
  $lambdaClient-&amp;gt;invokeAsync('ap-south-2', $payload),
]);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of the business logic was built on top of that result set. How many regions are returning errors? How many consecutive errors does this particular monitor have? Is an incident already in progress? Should the user be notified? etc. (As you guessed, this becomes important later.)&lt;/p&gt;
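&lt;p&gt;The shape of that logic looked roughly like this (the field names and the majority rule are illustrative, not Phare's actual implementation):&lt;/p&gt;

```javascript
// Hypothetical reconciliation: a monitor is down when a strict
// majority of regions report a failed check.
function isMonitorDown(results) {
    let failed = results.filter(function (result) {
        return result.error === true
    }).length

    // Strict majority of regions must fail
    return failed * 2 > results.length
}
```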

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8yaugv4yaj6x2zlce42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8yaugv4yaj6x2zlce42.png" alt="Phare on AWS Lambda" width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup worked well, delivering accurate and reliable uptime monitoring to the early adopters of Phare, while I focused on building incident management and status pages.&lt;/p&gt;

&lt;p&gt;Until May of 2024, when I received a ~25 euro invoice from AWS. Okay, that’s not much, but that was for only 4M performed checks. That’s the cost of five entry-level VPSs, all to monitor about 100 websites. Not cost-efficient at all.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxqn5lrltckasfyh22uz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxqn5lrltckasfyh22uz.png" alt="The 3-Year Journey to an Actually Good Monitoring Stack" width="800" height="611"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;I might have created the most expensive uptime monitoring service&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The biggest part of the spending was from Lambda duration (GB-Seconds). As Phare’s user base grew, websites got more complex, no more just monitoring my friends’ single-page portfolios with 100 out of 100 Lighthouse scores. Websites can be slow, and even with a 5-second timeout, the Lambda execution ended up being far too expensive.&lt;/p&gt;
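&lt;p&gt;A back-of-the-envelope check puts the invoice in the right ballpark. Assuming an average of around 750 ms of billed duration per check at 512MB (the duration is my guess, not a number from the invoice, and the result is in dollars rather than euros):&lt;/p&gt;

```javascript
// Rough Lambda duration cost: GB-seconds times the on-demand price.
// The 750 ms average duration is an assumption for illustration.
let checks = 4000000
let memoryGb = 0.5                  // 512MB
let avgDurationSeconds = 0.75
let pricePerGbSecond = 0.0000166667 // Lambda x86 on-demand price

let gbSeconds = checks * memoryGb * avgDurationSeconds
let cost = gbSeconds * pricePerGbSecond

console.log(gbSeconds, cost)        // 1.5M GB-seconds, about 25 dollars
```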

&lt;p&gt;Another issue was request timing accuracy. AWS Lambda lets you select the memory limit from 128MB to 10GB, and with more memory comes more CPU power. To fetch a URL with realistic browser-like timing, the Lambda needed at least 512MB of memory, a significant cost factor for longer checks, and a huge financial attack vector.&lt;/p&gt;

&lt;p&gt;It was time to find an alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter Cloudflare Workers
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers seemed unreal, much cheaper than AWS Lambda, and you only pay for actual CPU time. That meant all the idle time waiting for timeouts was now completely free. I could build the &lt;a href="https://phare.io/pricing" rel="noopener noreferrer"&gt;cheapest uptime monitoring service&lt;/a&gt; while keeping a good margin, and offer an unbeatable 180 regions.&lt;/p&gt;

&lt;p&gt;Setting it up wasn’t straightforward. On top of having to rewrite the code in JavaScript, it was not possible to invoke a Worker in a specific region. And that was a major blocker.&lt;/p&gt;

&lt;p&gt;After many failed attempts, I came across a post from another Cloudflare user who had figured out how to do exactly that, using a first Worker to invoke another one in a chosen region. It wasn’t documented, but Cloudflare seemed aware of this loophole for a while, with no public plan to restrict it. The performance and pricing were too good to ignore, so I went with it. YOLO.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febvs3yyles1diwuzs9sf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febvs3yyles1diwuzs9sf.png" alt="Phare on workers" width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The two-Workers technique changed everything. I could send large payloads of monitors, have the first Worker create smaller regional batches, and get reconciled results back. My backend became more and more dependent on the way Cloudflare Workers behaved.&lt;/p&gt;
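&lt;p&gt;As a rough sketch of that batching step (the &lt;code&gt;Monitor&lt;/code&gt; shape and region codes below are made up for illustration, not Phare’s actual payload), the first Worker’s job boils down to grouping monitors by region before dispatching them:&lt;/p&gt;

```go
package main

import "fmt"

// Monitor is a simplified uptime check definition; the real payload
// has many more fields.
type Monitor struct {
	URL    string
	Region string
}

// batchByRegion splits a large payload of monitors into smaller
// per-region batches, one per region-pinned Worker invocation.
func batchByRegion(monitors []Monitor) map[string][]Monitor {
	batches := make(map[string][]Monitor)
	for _, m := range monitors {
		batches[m.Region] = append(batches[m.Region], m)
	}
	return batches
}

func main() {
	batches := batchByRegion([]Monitor{
		{URL: "https://example.com", Region: "fra"},
		{URL: "https://example.org", Region: "fra"},
		{URL: "https://example.net", Region: "syd"},
	})
	fmt.Println(len(batches["fra"]), len(batches["syd"])) // prints: 2 1
}
```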

&lt;p&gt;Of course, there were limitations: non-secure HTTP checks were a no-no, it was impossible to get details on SSL certificate errors, and TCP port access was restricted. But I managed to find a few workarounds, and everything was running smoothly.&lt;/p&gt;

&lt;p&gt;The ecosystem was growing fast: Cloudflare was releasing edge databases and integrated queues, and my Workers averaged sub-3ms execution times. The future looked bright.&lt;/p&gt;

&lt;p&gt;Of course, after just a few months, on November 14th, 2024, regional invocation was patched, and &lt;strong&gt;the entire uptime infrastructure went down&lt;/strong&gt;. That day was a looong day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqal1eewl1uy3icywl23l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqal1eewl1uy3icywl23l.jpg" alt="That's your face while reading this" width="450" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I quickly patched the script, rerouting all requests to the invoking region so uptime checks still ran, even if not in the right region.&lt;/p&gt;

&lt;p&gt;It was time to find an alternative. Fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bunny.net Edge Scripts to the rescue
&lt;/h2&gt;

&lt;p&gt;At that time, Bunny.net had just released their Edge Scripts service in closed beta, a direct competitor to Cloudflare Workers, built on Deno. Pricing was similar, and the migration looked plug-and-play, which was all that mattered, because I couldn’t afford the time to rewrite the backend logic.&lt;/p&gt;

&lt;p&gt;I got into the beta, rewrote the script in Deno using the same two-invocation strategy, and began rerouting traffic from Cloudflare to Bunny.&lt;/p&gt;

&lt;p&gt;The first part of the migration went smoothly, regional monitoring was back up, and I could finally relax a bit.&lt;/p&gt;

&lt;p&gt;Of course, it wasn't long until shit hit the fan, and the uptime monitoring performance data started to get funky. Cloudflare was a more mature solution that handled many things in the background, like keeping TCP connection pools in a healthy state, which is important when you perform thousands of requests to different domains.&lt;/p&gt;

&lt;p&gt;Thankfully, Bunny’s technical team was amazing. They helped me a lot, and I gave them plenty to work on in return.&lt;/p&gt;

&lt;p&gt;Eventually, things got better. Edge Scripts left beta and became generally available, and that’s when a new bottleneck appeared.&lt;/p&gt;

&lt;p&gt;The backend code was still invoking Edge Scripts and waiting for a batched response. As Phare gained new users daily, the number of invocations grew. My backend started hitting 502/503 errors on Bunny’s side. Queue wait times forced me to increase concurrency. And I was still facing the same limitations I previously had with Cloudflare Workers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3nqdu7odamgvqag8cit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3nqdu7odamgvqag8cit.png" alt="This cost me a Sentry subscription" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Maybe Edge Scripts weren’t the best long-term solution after all.&lt;/p&gt;

&lt;p&gt;I knew what I had to do from the beginning: decouple the backend from the edge scripts and process results asynchronously. But doing so meant reworking the deepest, most fundamental part of my backend logic, now massive after years of accumulated features.&lt;/p&gt;

&lt;p&gt;Again, I had no choice if I wanted to keep improving Phare.&lt;/p&gt;

&lt;p&gt;It was time to find an alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  The obvious answer: Bunny Magic Containers
&lt;/h2&gt;

&lt;p&gt;In early 2025, Bunny announced &lt;a href="https://bunny.net/magic-containers/" rel="noopener noreferrer"&gt;Magic Containers&lt;/a&gt;, a new service letting you deploy full Docker containers across Bunny’s global network. I had been desperately trying to find a European hosting provider with such a diverse range of locations. I was already integrated with the Bunny ecosystem, and had full confidence in their amazing support team.&lt;/p&gt;

&lt;p&gt;This time, I did things slowly. I built a few preview regions to test at scale with real users, in parallel with the still-working Edge Script setup. Of course, this meant running two versions of the backend logic at the same time, two different ways of triggering monitoring checks, and thousands of new lines of code to make it work. Not fun, but necessary to finally fix past mistakes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjohz89sopgumba5qmxm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjohz89sopgumba5qmxm.png" alt="I don't often do pull requests on a solo project" width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new uptime monitoring agent would run continuously in a Docker container, billed by CPU and memory usage. Cost was a major concern, so I rebuilt it in Go with the following goals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Phare backend and the monitoring agent must be fully decoupled.&lt;/li&gt;
&lt;li&gt;The agent should fetch its monitor list from an API, no backend push.&lt;/li&gt;
&lt;li&gt;Results are sent asynchronously to the backend.&lt;/li&gt;
&lt;li&gt;Data exchange should be minimal.&lt;/li&gt;
&lt;li&gt;The agent must be fault-tolerant and self-healing.&lt;/li&gt;
&lt;li&gt;It should match the feature set of the Edge Script version.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv2zokm2blagxjsbmct1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv2zokm2blagxjsbmct1.png" alt="Phare on magic containers" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And just like that, six new preview regions were added to Phare at the end of February, and they ran like clockwork. I actually went on vacation a few days after the release, for a full month, and didn’t have a single issue. I did have a lot of time to reflect on my past mistakes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmm9v8y6n6lrxu1vvk90.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmm9v8y6n6lrxu1vvk90.webp" alt="The 3-Year Journey to an Actually Good Monitoring Stack" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I won’t go into too much detail about the new infrastructure; this post is painfully long enough already. Today, all checks run on Bunny Magic Containers. And for the first time in years, I can focus on building new features for both the agent and the platform.&lt;/p&gt;

&lt;p&gt;And if I ever need to change providers again, I can just spin up a few VPSs with my Docker image and it’ll work. I should’ve done that from the beginning, but I wanted to go fast, and that cost me a few years of real progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next
&lt;/h2&gt;

&lt;p&gt;The current infrastructure works well, but it’s not perfect. When a container is restarted, there's a brief overlap where two instances might run the same check. If a region goes offline, there’s no re-routing; users need to monitor from at least two regions to stay safe.&lt;/p&gt;

&lt;p&gt;Fetching the monitor list every minute via API works surprisingly well, thanks to ETags and a two-tier cache system. But I’m still exploring how to reduce HTTP calls. Having read replicas closer to the containers might be the best bet.&lt;/p&gt;

&lt;p&gt;From the outside, it didn’t look so bad: Phare grew to nearly a thousand users during all this infra chaos. Users loved the quality of the service far more than I did.&lt;/p&gt;

&lt;p&gt;This post is mostly a rant at my past self. I took too many shortcuts while building what started as a weekend project, which held the company back once it grew beyond that. &lt;strong&gt;But maybe that’s what startups are all about.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That said… see you in three years for the blog post about Phare.io Monitoring Stack v8, probably rewritten in Rust, because history repeats itself.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>programming</category>
      <category>learning</category>
    </item>
    <item>
      <title>Best practices to configure an uptime monitoring service</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Mon, 26 Aug 2024 16:00:22 +0000</pubDate>
      <link>https://dev.to/phare/best-practices-to-configure-an-uptime-monitoring-service-1oep</link>
      <guid>https://dev.to/phare/best-practices-to-configure-an-uptime-monitoring-service-1oep</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszkqy91byewfdrv9peo0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszkqy91byewfdrv9peo0.jpg" alt="Best practices to configure an uptime monitoring service" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Getting alerted about downtime is an essential part of running a healthy website. It's a problem that was solved a long time ago by uptime monitoring services. But as simple as setting up monitoring for your website might seem, there are a few best practices that I learned over the years, maintaining dozens of websites from side projects to Fortune 500 companies, and building Phare.io, my own take on uptime monitoring.&lt;/p&gt;

&lt;p&gt;We will dive into some best practices to get the best possible monitoring without false positives. The configurations explored in this article should work with most monitoring services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right URLs to Monitor
&lt;/h2&gt;

&lt;p&gt;Defining which resources to monitor is the first step of a successful uptime monitoring strategy, and as simple as it might seem, there's some thinking to do here.&lt;/p&gt;

&lt;p&gt;The first thing to consider is how your website is hosted. Many modern startups will have landing pages on a static hosting provider like Vercel or Netlify, and a backend API hosted on a cloud provider like AWS or GCP. Then you might have external services hosted on a subdomain like a blog, a status page, a changelog, etc. Each of these resources can go down independently, and you should monitor them separately.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Find all resources that can independently go down and monitor them separately.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For each of these resources, you need to define the right URL to monitor, and there are again a few things to consider:&lt;/p&gt;

&lt;h3&gt;
  
  
  Static hosting
&lt;/h3&gt;

&lt;p&gt;Most statically hosted websites will use some form of caching through a CDN. If you monitor a URL cached at the CDN level, you might not get alerted when the origin server is down. You then need to check with your monitoring service or your CDN for a way to bypass the cache layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Make sure you monitor the origin server and not a cached version of your website.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Dynamic websites
&lt;/h3&gt;

&lt;p&gt;For dynamic websites or API endpoints, it's tempting to monitor a simple health check route that returns a static JSON response, but you might miss issues that are only visible when hitting API endpoints that do some actual work.&lt;/p&gt;

&lt;p&gt;Ideally, the URL that you monitor should at least perform a database query, or exercise the critical parts of your application, to make sure everything is working as expected. Creating a dedicated URL for monitoring is usually a good idea.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Monitor an endpoint that performs actual work and not just a static health check.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  External services
&lt;/h3&gt;

&lt;p&gt;Monitoring external services is usually not as important as you are not responsible for their uptime. However, it's always good to be proactive and get alerted before your users do. This will allow you to communicate about the issues and show that you are on top of things.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Monitor external services to be proactive and communicate about issues before your users do.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Redirections
&lt;/h3&gt;

&lt;p&gt;Now that you have a good idea of the URLs you need to monitor, check for any redirections. Be careful with the URL format you use to monitor your resources: some services end all URLs with a &lt;code&gt;/&lt;/code&gt; and some won't. If you don't use the right format, you will put unnecessary load on your server and will likely get wrong performance metrics on your uptime monitoring service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Be mindful of unnecessary URL redirection to avoid load on your server and inaccurate performance metrics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Monitoring that a few critical redirections work as expected is also a good idea. Things like www-to-non-www or HTTP-to-HTTPS redirections are critical for your website's SEO and user experience, and are worth monitoring.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Monitor critical redirections to make sure they work as expected.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Response monitoring
&lt;/h2&gt;

&lt;p&gt;Now that you have defined the right URLs to monitor, you need to define the expected result of your monitors. In the case of HTTP checks, that will usually be a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status" rel="noopener noreferrer"&gt;status code&lt;/a&gt; or a keyword on the page.&lt;/p&gt;

&lt;p&gt;It is common knowledge among web developers that status codes are not always to be trusted, and that a &lt;code&gt;200 OK&lt;/code&gt; status code doesn't mean that the page is working as expected. This is why it's a good idea to also monitor for the presence of a keyword on the page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoso35hpix91wdzeouiz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoso35hpix91wdzeouiz.jpg" alt="Best practices to configure an uptime monitoring service" width="800" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A good keyword is something unique to the page that would not be present on any error page. For example, if you choose the name of your website, there's a high chance that it will also be present on a 4xx error page, and you will get false positives monitoring for it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Always check the response status and the presence of a unique keyword on the page.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Request timeout
&lt;/h2&gt;

&lt;p&gt;Finding the right timeout for your monitors is a true balancing act. The timeout should be long enough to avoid false positives, but short enough that you still get alerted when your server is too slow to respond.&lt;/p&gt;

&lt;p&gt;My advice is to start with a large timeout for a few days and then gradually decrease it until you find the right balance. Of course, this should be done on a per-URL basis, as some resources might be naturally slower than others.&lt;/p&gt;

&lt;p&gt;Some monitoring services have special configurations for performance monitoring that you could use for this purpose. Keep in mind that services calculate response time differently, and you might get different results from different providers, so it's always a good idea to start with a large timeout.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Start with a large timeout and gradually decrease it until you find the right balance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Monitoring frequency
&lt;/h2&gt;

&lt;p&gt;The monitoring frequency is another balancing act. You want to get alerted as soon as possible when your website goes down, but without wasting resources on unnecessary checks, both for your website that is up 99.99% of the time and for our beautiful planet.&lt;/p&gt;

&lt;p&gt;Choose shorter intervals for critical resources and longer intervals for less important things like third-party services or redirections. You could also consider the time of day and monitor more aggressively during your business peak hours.&lt;/p&gt;

&lt;p&gt;Keep in mind the following when choosing the monitoring frequency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every 30 seconds = ~90k requests per month
Every 1 minute = ~45k requests per month
Every 5 minutes = ~9k requests per month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🎓 Choose shorter intervals for critical resources and longer intervals for less important things.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Incident confirmations
&lt;/h2&gt;

&lt;p&gt;I would strongly advise against using any monitoring service that does not offer a way to configure a number of confirmations before sending an alert. This is, along with multi-region monitoring, the most impactful way to avoid false positives.&lt;/p&gt;

&lt;p&gt;The internet is a complex system, and a single network glitch can prevent your monitoring service from reaching your server. It might not seem like a big deal, but the more alerts you get, the more you ignore them, and you will certainly miss a real incident after a few weeks of receiving daily false positive alerts.&lt;/p&gt;

&lt;p&gt;This setting should be configured based on your monitoring frequency and the criticality of the resource you are monitoring. The more frequent the monitoring, the more confirmations you should require before sending an alert. Here is a good rule of thumb:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;30 seconds monitoring interval -&amp;gt; 2 to 3 confirmations
1 minute monitoring interval -&amp;gt; 2 to 3 confirmations
2 to 10 minutes monitoring interval -&amp;gt; 2 confirmations
Any greater monitoring interval -&amp;gt; 1 to 2 confirmations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🎓 Always require a confirmation before sending an alert.&lt;/p&gt;
&lt;/blockquote&gt;
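&lt;p&gt;Under the hood, a confirmation rule is just a counter of consecutive failures; a sketch of the idea:&lt;/p&gt;

```go
package main

import "fmt"

// ConfirmationGate only raises an alert after `required` consecutive
// failed checks, absorbing one-off network glitches.
type ConfirmationGate struct {
	required int
	failures int
}

// Observe records one check result and reports whether an alert
// should fire now. Any successful check resets the counter.
func (g *ConfirmationGate) Observe(up bool) bool {
	if up {
		g.failures = 0
		return false
	}
	g.failures++
	return g.failures == g.required
}

func main() {
	// 1-minute interval: 2 to 3 confirmations per the rule of thumb above.
	gate := ConfirmationGate{required: 3}
	for _, up := range []bool{true, false, false, false} {
		if gate.Observe(up) {
			fmt.Println("alert: incident confirmed") // fires once, on the 3rd failure
		}
	}
}
```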

&lt;h2&gt;
  
  
  Multi-region monitoring
&lt;/h2&gt;

&lt;p&gt;Just like incident confirmations, multi-region monitoring is a must-have feature for any monitoring service. It often happens that a request fails temporarily from a specific monitoring endpoint, but it doesn't mean that your website is down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9qnt76kj2utrpjvrhhs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9qnt76kj2utrpjvrhhs.jpg" alt="Best practices to configure an uptime monitoring service" width="633" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When checking from multiple regions, uptime monitoring services will usually require a certain number of regions to fail before sending an alert. This is a great way to avoid false positives and make sure that your website is really down for your users.&lt;/p&gt;

&lt;p&gt;You should always monitor all resources from at least 2 regions, and more for critical resources. When possible, choose the regions closest to your users; this will give you the best results and accurate performance metrics.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Monitor all resources from at least 2 regions.&lt;/p&gt;
&lt;/blockquote&gt;
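&lt;p&gt;The multi-region rule is a simple quorum over per-region results; a sketch:&lt;/p&gt;

```go
package main

import "fmt"

// regionQuorum declares an incident only when at least `required`
// regions report the check as down.
func regionQuorum(results map[string]bool, required int) bool {
	down := 0
	for _, up := range results {
		if !up {
			down++
		}
	}
	return down >= required
}

func main() {
	// One region failing out of three: likely a network glitch, no alert.
	fmt.Println(regionQuorum(map[string]bool{"fra": true, "nyc": false, "syd": true}, 2)) // false
	// Two regions failing: the site is probably really down.
	fmt.Println(regionQuorum(map[string]bool{"fra": false, "nyc": false, "syd": true}, 2)) // true
}
```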

&lt;h2&gt;
  
  
  Alerting
&lt;/h2&gt;

&lt;p&gt;The last thing to consider is how you want to be alerted. Most monitoring services will offer a wide range of alerting options, from email to SMS, to Slack or Discord notifications.&lt;/p&gt;

&lt;p&gt;As we previously established, not all resources are equally important, and you might want to be alerted differently for each of them. Think about the way your company communicates, and how you could integrate the alerts into your existing workflow. You might want to create a dedicated channel or email address for alerts. For the most critical resources, you might want to use SMS or phone notifications, but discuss this with your team and make sure that everyone is on the same page: if you configure SMS alerts and the on-call person keeps their phone on silent, that might not be the best idea.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎓 Choose the alerting method adapted to each resource and discuss this topic with your team.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In most cases, uptime monitoring is a set-and-forget kind of thing, but I've seen many teams struggle with false positives and alert fatigue. By following these best practices, you should be able to get the best possible monitoring without false positives, and make sure that you are alerted when your website is really down.&lt;/p&gt;




&lt;p&gt;If you are looking for an uptime monitoring service that helps you implement these best practices, you should check out &lt;a href="https://phare.io" rel="noopener noreferrer"&gt;Phare.io&lt;/a&gt;. It's free to start and scales with your needs.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>uptime</category>
      <category>monitoring</category>
      <category>guide</category>
    </item>
    <item>
      <title>How we run Ghost on Docker with subdirectory routing</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Thu, 22 Aug 2024 17:00:17 +0000</pubDate>
      <link>https://dev.to/phare/how-we-run-ghost-on-docker-with-subdirectory-routing-5b20</link>
      <guid>https://dev.to/phare/how-we-run-ghost-on-docker-with-subdirectory-routing-5b20</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihy48qgcs0qdhlo7kuqe.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihy48qgcs0qdhlo7kuqe.jpg" alt="How we run Ghost on Docker with subdirectory routing" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deciding on the right blog platform is always a bit of a hassle, whether it's for my personal blog or my company. I often have to resist the urge to build something from scratch, which inevitably means sinking the next two weeks into coding yet another blog from the ground up.&lt;/p&gt;

&lt;p&gt;When it came to setting up a blog for &lt;a href="https://phare.io" rel="noopener noreferrer"&gt;Phare.io&lt;/a&gt;, I made a conscious effort to minimize the time spent on setup. After some research, I decided on &lt;a href="https://ghost.org" rel="noopener noreferrer"&gt;Ghost&lt;/a&gt;, a well-regarded content platform that seemed to meet all our needs. Self-hosting looked straightforward, and the documentation mentioned support for subdirectory routing, which was a key requirement for our SEO strategy.&lt;/p&gt;

&lt;p&gt;But as is often the case, things weren't quite as simple as they first appeared. Hence, this blog post to guide anyone looking to do something similar.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Ghost on Docker
&lt;/h2&gt;

&lt;p&gt;To keep things organized, the plan was to isolate Ghost on its own server. For this, I spun up a new VPS instance on &lt;a href="https://www.hetzner.com/cloud/" rel="noopener noreferrer"&gt;Hetzner&lt;/a&gt; running a Docker-CE image.&lt;/p&gt;

&lt;p&gt;This instance runs on a private network without a public IP, and the firewall is configured to accept traffic only from Phare's NGINX server on port 8080.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26av6vvcue3uw3uilapc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26av6vvcue3uw3uilapc.png" alt="How we run Ghost on Docker with subdirectory routing" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup might be a bit over the top for hosting a blog, but it was quick to implement and significantly reduces the attack surface, so there’s no reason not to do it.&lt;/p&gt;

&lt;p&gt;With the server ready, the next step was to write a Docker Compose file to configure Ghost's &lt;a href="https://hub.docker.com/_/ghost" rel="noopener noreferrer"&gt;Docker image&lt;/a&gt; on port &lt;code&gt;8080&lt;/code&gt; along with a MySQL database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ghost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghost:5-alpine&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;8080:2368&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;database__client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
      &lt;span class="na"&gt;database __connection__ host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db&lt;/span&gt;
      &lt;span class="na"&gt;database __connection__ user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;database __connection__ password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_db_password&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;database __connection__ database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghost&lt;/span&gt;
      &lt;span class="na"&gt;mail__transport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;smtp&lt;/span&gt;
      &lt;span class="na"&gt;mail __options__ host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_mail_host&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;mail __options__ port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_mail_port&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;mail __options__ auth__user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_mail_user&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;mail __options__ auth__pass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_mail_password&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;mail __options__ secure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://phare.io/blog&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ghost:/var/lib/ghost/content&lt;/span&gt;

  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8.0&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;ghost_db_password&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;db:/var/lib/mysql&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ghost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are some key points to note in that file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;ghost&lt;/code&gt; service binds to port 8080, which is the one we opened on the firewall.&lt;/li&gt;
&lt;li&gt;Both services use persistent storage, making backups straightforward.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;url&lt;/code&gt; environment variable should be set to the public URL where your blog will be hosted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the configuration is complete, you can start the services with Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our case this step is automated with an Ansible playbook task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;community.docker.docker_compose_v2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;project_src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/docker/ghost&lt;/span&gt;
    &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker-compose-ghost.yml&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And just like that, we have a running Ghost instance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Subdirectory Routing with NGINX
&lt;/h2&gt;

&lt;p&gt;Phare.io uses an NGINX server to manage load balancing, headers, and a few other tasks. Our setup involves complex routing to allow users to &lt;a href="https://phare.io/products/uptime" rel="noopener noreferrer"&gt;create status pages&lt;/a&gt; on &lt;code&gt;*.status.phare.io&lt;/code&gt; or their own domains.&lt;/p&gt;

&lt;p&gt;For the blog, we wanted it to be accessible only on our main &lt;code&gt;phare.io&lt;/code&gt; domain, so the first step was to adjust our configuration to ensure only &lt;code&gt;phare.io&lt;/code&gt; was served, excluding any subdomains.&lt;/p&gt;

&lt;p&gt;With that in place, I created a location block to route all &lt;code&gt;/blog&lt;/code&gt; traffic to the Ghost instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt; &lt;span class="s"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="s"&gt;[::]:443&lt;/span&gt; &lt;span class="s"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;http2&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;phare.io&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;# location / {&lt;/span&gt;
        &lt;span class="c1"&gt;# Configuration for our Laravel app&lt;/span&gt;
    &lt;span class="c1"&gt;# }&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;^~&lt;/span&gt; &lt;span class="n"&gt;/blog&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;client_max_body_size&lt;/span&gt; &lt;span class="mi"&gt;10G&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-For&lt;/span&gt; &lt;span class="nv"&gt;$proxy_add_x_forwarded_for&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$http_host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-Proto&lt;/span&gt; &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://10.0.1.2:8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;.(jpg|jpeg|webp|png|svg|gif|ico|css|js|eot|ttf|woff)&lt;/span&gt;$ &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;gzip&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;expires&lt;/span&gt; &lt;span class="mi"&gt;1M&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;add_header&lt;/span&gt; &lt;span class="s"&gt;Cache-Control&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I removed a few irrelevant lines; here are the important details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As recommended by the Ghost documentation, set a high &lt;code&gt;client_max_body_size&lt;/code&gt; to allow large file uploads via the Ghost admin panel.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;^~&lt;/code&gt; modifier on the location block ensures no regular-expression location takes precedence, which is crucial here: without it, the caching rules further down would intercept requests and break asset loading.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;proxy_pass&lt;/code&gt; directive points to our Docker server's private IP &lt;code&gt;10.0.1.2&lt;/code&gt; and the previously opened port &lt;code&gt;8080&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Accessing the blog
&lt;/h2&gt;

&lt;p&gt;With everything set up, the blog is now accessible at &lt;code&gt;phare.io/blog&lt;/code&gt;, and the admin panel at &lt;code&gt;phare.io/blog/ghost&lt;/code&gt;. Our Ghost Docker instance runs securely on a private network.&lt;/p&gt;

&lt;p&gt;To speed up asset loading and caching, we use &lt;a href="https://bunny.net/" rel="noopener noreferrer"&gt;bunny.net&lt;/a&gt; on the &lt;code&gt;phare.io&lt;/code&gt; domain. Most of our existing rules worked seamlessly on the blog, but I hit a snag when Ghost couldn’t create a session cookie, preventing me from signing in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchclm07286sjpzm66dcd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchclm07286sjpzm66dcd.png" alt="How we run Ghost on Docker with subdirectory routing" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem was that cookies were disabled on the domain. Changing this setting solved the issue without affecting the rest of the site, as Phare only uses session cookies on the &lt;code&gt;app.phare.io&lt;/code&gt; domain. However, a potential improvement could be moving Ghost's admin panel to its own subdomain, which would allow this setting to be re-enabled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hosting a Ghost blog on a &lt;code&gt;/blog&lt;/code&gt; subdirectory path using NGINX is a practical solution when you want to seamlessly integrate your blog with your main website. While it requires some configuration, the benefits for SEO and branding make the effort worthwhile.&lt;/p&gt;

&lt;p&gt;I hope this post helps you in setting up your own Ghost blog. The Phare team is delighted with the platform so far, and I’m glad I didn’t spend weeks building a half-baked in-house solution.&lt;/p&gt;




&lt;p&gt;Would you like to make sure your blog or any other part of your website stays online? &lt;a href="https://app.phare.io/register" rel="noopener noreferrer"&gt;Create a Phare account for free&lt;/a&gt; and start monitoring your website today.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ghost</category>
      <category>nginx</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Downsampling time series data</title>
      <dc:creator>Nicolas Beauvais</dc:creator>
      <pubDate>Mon, 29 Jul 2024 09:50:00 +0000</pubDate>
      <link>https://dev.to/phare/downsampling-time-series-data-4e0p</link>
      <guid>https://dev.to/phare/downsampling-time-series-data-4e0p</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs49jwu4iineussi8ycza.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs49jwu4iineussi8ycza.jpg" alt="Downsampling time series data" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://phare.io/products/uptime" rel="noopener noreferrer"&gt;Phare uptime&lt;/a&gt; we allow you to view your monitor's performance data with up to 90 days of history. While this might not seem like a lot, a monitor running every minute will produce about 130 thousand data points over that time frame, for a single region.&lt;/p&gt;

&lt;p&gt;Showing that amount of data in a graph would be slow and impossible to understand due to the sheer number of data points. Finding the best solution requires the right balance between user experience, data quality, and performance. We need to tell you the story of your monitor's performance in a way that is easy to understand, fast, and accurate.&lt;/p&gt;

&lt;p&gt;In this article, we dive into a few techniques that we used to downsample the data by 99.4% while keeping the most important information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Raw data
&lt;/h2&gt;

&lt;p&gt;We start with the raw data collected from monitoring the Phare.io dashboard in the last 90 days. The performance varied a lot during this time frame thanks to a noisy CPU neighbor, which made it the perfect candidate for this article.&lt;/p&gt;

&lt;p&gt;If we plot the raw data with 130 thousand points, we get a chart that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuswd7742mdzavlfhhk5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbuswd7742mdzavlfhhk5.webp" alt="Downsampling time series data" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The performance is not terrible: loading the data takes about 230ms, and rendering the chart finishes in under 100ms, which is already better than a lot of other charts out there. But the real problem is that the chart is unreadable: we can't see any patterns, and it's hard to understand what's going on.&lt;/p&gt;

&lt;p&gt;The confusion comes not only from the number of data points but also from the difference in scale within the data. The vast majority of requests complete in under 200ms, but about 0.1% of them are much slower. No matter how fast your website and our monitoring infrastructure are, there will always be a few requests slowed down by network latency, congestion, or other factors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Removing anomalies
&lt;/h2&gt;

&lt;p&gt;Our first step is to remove the anomalies from our data set. A single 4-second request among a few thousand will skew the data and make it hard to read. We need to remove these anomalies in a way that will keep any sustained decline in performance visible.&lt;/p&gt;

&lt;p&gt;To solve this problem, we can use one of two techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard deviation: We can calculate the standard deviation of the dataset and remove any data points that are outside a certain multiple of the standard deviation.&lt;/li&gt;
&lt;li&gt;Percentile: We can calculate the 99th percentile of the data set and remove any data points that are above that value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both techniques offer similar results, but they need to be applied over a rolling window: a global threshold would either remove a sustained decline in performance or keep anomalies that occur during periods of high performance.&lt;/p&gt;

&lt;p&gt;We use the formula &lt;code&gt;rolling mean + (3 x rolling standard deviation)&lt;/code&gt; to remove any data points that sit more than three standard deviations above the rolling mean, with a rolling window of roughly 30 points (15 before and 15 after the current point):&lt;/p&gt;
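&lt;p&gt;As a rough illustration of that filter (not our production code, which runs in ClickHouse), here is a minimal Python sketch; the function name and parameters are of my choosing:&lt;/p&gt;

```python
from statistics import mean, stdev

def remove_anomalies(values, half_window=15, multiplier=3.0):
    """Drop points sitting more than `multiplier` rolling standard
    deviations above the rolling mean of their surrounding window."""
    kept = []
    for i, value in enumerate(values):
        # Window of up to 15 points on each side of the current point.
        window = values[max(0, i - half_window):i + half_window + 1]
        threshold = mean(window) + multiplier * stdev(window)
        if value > threshold:
            continue  # isolated spike: drop it
        kept.append((i, value))
    return kept
```

A sustained slowdown raises the rolling mean itself, so its points stay under the threshold and survive the filter; only isolated spikes are removed.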

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7azqcbgmthygj3fnfk24.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7azqcbgmthygj3fnfk24.webp" alt="Downsampling time series data" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By zooming in on the few remaining spikes, we can see that they are not isolated anomalies but sustained periods of lower performance which we want to keep in our data set.&lt;/p&gt;

&lt;p&gt;The following is a zoomed-in view of the tallest spike on the right; we can see that the spike is followed by a series of slower requests over 1h30, which is exactly the kind of information we want to keep:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s1ahy7fu8snyf07uogc.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s1ahy7fu8snyf07uogc.webp" alt="Downsampling time series data" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we did not calculate the deviation or percentile in a rolling window, we would get the following chart, which removes everything above 600ms while keeping many anomalies in the lower performance range:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpecxgqucqp1tjlw0thyx.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpecxgqucqp1tjlw0thyx.webp" alt="Downsampling time series data" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rolling window average
&lt;/h2&gt;

&lt;p&gt;The next step is to smooth the data with a rolling window average. This technique will slightly reduce the gap between two adjacent data points and make the chart more readable while allowing us to detect trends in the data.&lt;/p&gt;

&lt;p&gt;For this example, we use a rolling window of roughly 10 data points (5 before and 5 after the current point) to smooth the curve without losing too much information:&lt;/p&gt;
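&lt;p&gt;The smoothing pass itself fits in a few lines of Python (again an illustration, with names of my choosing):&lt;/p&gt;

```python
from statistics import mean

def rolling_average(values, half_window=5):
    """Replace each point with the mean of its surrounding window
    (up to 5 points on each side)."""
    return [
        mean(values[max(0, i - half_window):i + half_window + 1])
        for i in range(len(values))
    ]
```

A sharp jump between two adjacent points is spread across its neighbors, which is what makes the curve easier to read without hiding a real trend.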

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk07a7uq1wayuqtwr34rw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk07a7uq1wayuqtwr34rw.webp" alt="Downsampling time series data" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Downsampling
&lt;/h2&gt;

&lt;p&gt;Our data is now clean and smooth, but we still have 130 thousand data points to display, which wastes bandwidth and hurts rendering performance. To reduce the number of data points without losing too much information, we can use the largest triangle three buckets (LTTB) algorithm.&lt;/p&gt;

&lt;p&gt;The LTTB algorithm is a downsampling technique that finds the most important points in the data set by dividing the data into buckets and selecting the point with the largest triangle area in each bucket. In simpler words, the algorithm will only keep the points that are the most representative of the data set so that the overall shape of the curve is preserved.&lt;/p&gt;
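&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of LTTB over a list of (timestamp, value) pairs; it illustrates the algorithm rather than the implementation we actually run, which lives in ClickHouse:&lt;/p&gt;

```python
def lttb(points, threshold):
    """Downsample `points` (a list of (x, y) pairs sorted by x) to
    `threshold` points (assumed to be 3 or more), keeping endpoints."""
    n = len(points)
    if threshold >= n:
        return list(points)

    sampled = [points[0]]                    # always keep the first point
    bucket_size = (n - 2) / (threshold - 2)  # interior points per bucket
    prev = points[0]

    for i in range(threshold - 2):
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        # The third triangle corner is the average of the *next* bucket.
        nxt = points[end:min(int((i + 2) * bucket_size) + 1, n)]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Keep the bucket's point forming the largest triangle with the
        # previously kept point and the next bucket's average point.
        best, best_area = None, -1.0
        for point in points[start:end]:
            # Proportional to the area of that triangle.
            area = abs(
                (prev[0] - avg_x) * (point[1] - prev[1])
                - (prev[0] - point[0]) * (avg_y - prev[1])
            )
            if area > best_area:
                best, best_area = point, area
        sampled.append(best)
        prev = best

    sampled.append(points[-1])               # always keep the last point
    return sampled
```

In production we delegate this to ClickHouse's built-in largestTriangleThreeBuckets function, shown in the implementation section.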

&lt;p&gt;By applying the LTTB algorithm to our data set, we can reduce the number of data points from 130 thousand to 750, a reduction of about 99.4%. In the form of a JSON payload, we go from 1.53 MB to 13 KB, which is a significant reduction in bandwidth.&lt;/p&gt;
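&lt;p&gt;The reduction ratio is easy to verify:&lt;/p&gt;

```python
# Sanity-check the reduction figure: 130,000 points down to 750.
raw_points, sampled_points = 130_000, 750
reduction = 1 - sampled_points / raw_points
print(f"{reduction:.1%}")  # 99.4%
```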

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9dzmwejiku3b9cq4upp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9dzmwejiku3b9cq4upp.webp" alt="Downsampling time series data" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the chart looks almost identical to the full data set, but uses only a fraction of the data points.&lt;/p&gt;

&lt;p&gt;It is important to carefully prepare the data before applying the LTTB algorithm: because it is specifically designed to preserve the overall shape of the curve, it will carry any remaining anomalies into the final data set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation with ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse is a powerful column-oriented database optimized for analytical queries that offers unparalleled performance for time series data. We extensively use it at Phare to store and analyze the performance data of your monitors.&lt;/p&gt;

&lt;p&gt;All the techniques described in this article can be implemented in a single ClickHouse query using the &lt;a href="https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/largestTriangleThreeBuckets" rel="noopener noreferrer"&gt;largestTriangleThreeBuckets&lt;/a&gt; function, as well as &lt;a href="https://clickhouse.com/docs/en/sql-reference/window-functions" rel="noopener noreferrer"&gt;window functions&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Apply the LTTB algorithm to the data set&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;largestTriangleThreeBuckets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;
    &lt;span class="nv"&gt;`cleaned_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`timestamp`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`cleaned_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`time`&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- Smooth the remaining data with a rolling window average&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="nv"&gt;`raw_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`timestamp`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`raw_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`time`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;PRECEDING&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;FOLLOWING&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`time`&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;-- Select the raw data&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="nv"&gt;`timestamp`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nv"&gt;`time`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="c1"&gt;-- Calculate the rolling window average and standard deviation&lt;/span&gt;
          &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`time`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
              &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nv"&gt;`timestamp`&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;PRECEDING&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;FOLLOWING&lt;/span&gt;
          &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;stddevSamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`time`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nv"&gt;`timestamp`&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;PRECEDING&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;FOLLOWING&lt;/span&gt;
          &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;anomalies&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="nv"&gt;`performance_table`&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`raw_results`&lt;/span&gt;
    &lt;span class="c1"&gt;-- Filter out anomalies&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt;
      &lt;span class="nv"&gt;`raw_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`time`&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;`raw_results`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`anomalies`&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`cleaned_results`&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Drawing the data
&lt;/h2&gt;

&lt;p&gt;Phare uses &lt;a href="https://github.com/leeoniya/uPlot" rel="noopener noreferrer"&gt;uPlot&lt;/a&gt; to draw charts in the frontend. uPlot is a small, fast, and flexible charting library, which perfectly fits our needs. It allows us to draw charts with a large number of data points with the best possible performance, where other libraries would struggle.&lt;/p&gt;

&lt;p&gt;Keep in mind that uPlot is a low-level library, which means that you will need to spend a good amount of time configuring it to get the desired result. But the performance and flexibility it offers are worth the effort.&lt;/p&gt;

&lt;p&gt;Because we already processed the data with ClickHouse, we only need to split the data into two arrays, one for the x-axis and one for the y-axis, and pass them to uPlot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;uPlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="nx"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// x-axis values&lt;/span&gt;
  &lt;span class="nx"&gt;times&lt;/span&gt; &lt;span class="c1"&gt;// y-axis values&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, Phare's implementation is more complex than that: we need to handle responsiveness, live streaming, and the display of uptime incidents, but that goes beyond the scope of this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By removing anomalies, smoothing the data with a rolling window average, and downsampling the data with the LTTB algorithm, we were able to reduce the amount of data by 99.4% while keeping the most important information.&lt;/p&gt;

&lt;p&gt;The chart is now readable, fast to load, and easy to understand.&lt;/p&gt;




&lt;p&gt;Would you like to see this performance chart for your own website? &lt;a href="https://app.phare.io/register" rel="noopener noreferrer"&gt;Sign up for free&lt;/a&gt; and start monitoring your website today.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>sql</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
