As developers, we all know the struggle of wading through irrelevant search results to find that one golden line of code. So, I thought, why not build a search engine tailored for us devs? With Rust, Actix, Elasticsearch, React, and Next.js, I created a search engine for developers.
Here is what I made:
https://dev-search.com/
I am not a senior dev, so if I am doing something stupid, please let me know.
The Mission
The goal was simple: build a search engine focused on information for developers, with:
- Frontend: React + Next.js (SSG for speed and SEO)
- Backend: Rust (Actix) and Elasticsearch for robust, scalable search
Challenges Faced
Elasticsearch search is slow
Because the index holds more than 10 million documents, searching it with Elasticsearch was slow.
I found that what slowed it down was a large track_total_hits value, for example:
"track_total_hits": 10000
The Solution
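Counting hits accurately up to a large number like 10000 was, in my case, about as slow as actually fetching 10000 documents, because Elasticsearch has to keep counting matches up to that limit instead of skipping documents that cannot make it into the top results. (10000 is also Elasticsearch's default limit.) Changing the setting to
"track_total_hits": false
made the search a lot faster. The trade-off is that the response no longer tells you how many documents matched the search (hits.total), so consider carefully whether that is acceptable for your use case.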
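For reference, here is a minimal TypeScript sketch of what the request body and the response handling might look like. The function names (buildSearchBody, describeTotal) and the searched fields "title" and "body" are hypothetical, not taken from the actual dev-search.com code; only track_total_hits and the shape of hits.total ({ value, relation }) come from the Elasticsearch search API.

```typescript
// Hypothetical shapes: only the parts of the Elasticsearch request/response used here.
interface SearchBody {
  query: { multi_match: { query: string; fields: string[] } };
  size: number;
  // false: skip hit counting; true: count exactly; N: count exactly up to N.
  track_total_hits: boolean | number;
}

interface SearchResponse {
  // hits.total is omitted when track_total_hits is false.
  hits: { total?: { value: number; relation: "eq" | "gte" }; hits: unknown[] };
}

// Build a search body. Hit counting is disabled by default, which lets
// Elasticsearch skip documents that cannot make it into the top `size` results.
function buildSearchBody(
  userQuery: string,
  trackTotalHits: boolean | number = false,
  size = 10,
): SearchBody {
  return {
    // "title" and "body" are hypothetical field names; use your own mapping.
    query: { multi_match: { query: userQuery, fields: ["title", "body"] } },
    size,
    track_total_hits: trackTotalHits,
  };
}

// Turn hits.total into display text, coping with a missing total.
function describeTotal(res: SearchResponse): string {
  const total = res.hits.total;
  if (total === undefined) return "unknown";
  // relation "gte" means counting stopped at the limit, so value is a lower bound.
  return total.relation === "gte" ? `at least ${total.value}` : String(total.value);
}

console.log(JSON.stringify(buildSearchBody("rust borrow checker")));
```

With the default arguments the body contains "track_total_hits":false; passing 10000 as the second argument reproduces the original, slower setting. If the UI still wants to show some count, describeTotal illustrates why a capped value should be displayed as "at least N" rather than as an exact number.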
Too Many Malicious Users Scanning the Website
Ah, the joys of running a public-facing site! Within days of launching, I noticed strange requests in my server logs. From bots pretending to be browsers to outright weird payloads like \x00\x00SMB (probes for the SMB file-sharing protocol, which have no business hitting a web server), my site became a playground for malicious users. Here's a gem from my logs:
35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\x00\x00\x00\xAC\xFESMB..."
The Solution: Fail2Ban
Fail2Ban came to the rescue! This nifty tool monitors log files and dynamically bans IPs that show malicious behavior. Here's how I set it up:
Defined a Fail2Ban Jail for Nginx:
[nginx-malicious]
enabled = true
port = http,https
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 300
bantime = 600
action = iptables[name=nginx-malicious, port="http,https", protocol=tcp]
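Filter to Detect Malicious Patterns (since the jail has no filter = line, Fail2Ban looks for a filter named after the jail, i.e. /etc/fail2ban/filter.d/nginx-malicious.conf):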
[Definition]
failregex = ^<HOST> - - .*SMB.*
ignoreregex =
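To see why the sample log line above is caught, here is a rough TypeScript approximation of how the failregex is applied. The assumption to be aware of: Fail2Ban's real <HOST> tag expands to a much broader pattern (IPv6, hostnames and so on), while this sketch replaces it with a plain IPv4 pattern; compileFailregex and matchHost are hypothetical helpers. To test the real filter against a real log, Fail2Ban ships the fail2ban-regex tool, for example fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-malicious.conf.

```typescript
// Assumption: a simplified stand-in for Fail2Ban's <HOST> tag (IPv4 only).
const HOST_PATTERN = "(\\d{1,3}(?:\\.\\d{1,3}){3})";

// Replace <HOST> in a failregex with the capture group above (hypothetical helper).
function compileFailregex(failregex: string): RegExp {
  return new RegExp(failregex.replace("<HOST>", HOST_PATTERN));
}

// Return the IP that a matching line would be blamed on, or null.
function matchHost(re: RegExp, line: string): string | null {
  const m = re.exec(line);
  return m?.[1] ?? null;
}

const smbFilter = compileFailregex("^<HOST> - - .*SMB.*");

// The probe from the access log (nginx writes non-printable bytes as \xNN text).
const probe = '35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\\x00\\x00\\x00\\xAC\\xFESMB..."';
// An ordinary search request, using a documentation-range IP.
const normal = '203.0.113.7 - - [30/Dec/2024:05:16:01 +0000] "GET /search?q=rust HTTP/1.1" 200 512';

console.log(matchHost(smbFilter, probe)); // 35.203.211.8
console.log(matchHost(smbFilter, normal)); // null
```

One thing the sketch makes visible: because .*SMB.* matches anywhere in the line, a harmless request such as GET /search?q=SMB would also count as a failure, which on a search engine for developers may be worth tightening.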
Dynamic Blocking in Action:
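When Fail2Ban detects malicious requests, it adds a firewall rule that blocks the offending IP. You can list the current rules like this (depending on the configured block type, the rules may show REJECT instead of DROP):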
sudo iptables -L -n | grep DROP
With Fail2Ban, malicious IPs were swiftly banned, and my server logs became much cleaner. Lesson learned: bots will come, but so will the ban hammer.
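When exactly an IP gets banned is decided by the jail's maxretry, findtime and bantime settings. The TypeScript class below is a deliberately simplified model of how they interact (JailModel is a hypothetical name, and this is not Fail2Ban's actual implementation): with the values used in the jail above, an IP that produces 5 matching log lines within 300 seconds is banned for 600 seconds.

```typescript
// Jail settings in seconds, named as in the jail config.
interface JailConfig {
  maxretry: number;
  findtime: number;
  bantime: number;
}

// Hypothetical, simplified model of a Fail2Ban jail; not the real implementation.
class JailModel {
  private readonly cfg: JailConfig;
  private readonly failures = new Map<string, number[]>(); // ip -> failure times
  private readonly bannedUntil = new Map<string, number>(); // ip -> unban time

  constructor(cfg: JailConfig) {
    this.cfg = cfg;
  }

  // Record one matching log line for `ip` at time `now` (in seconds).
  // Returns true if this failure triggers a new ban.
  recordFailure(ip: string, now: number): boolean {
    if (this.isBanned(ip, now)) return false; // already banned: nothing to do
    // Keep only failures that are still inside the findtime window.
    const recent = (this.failures.get(ip) ?? []).filter((t) => now - t < this.cfg.findtime);
    recent.push(now);
    if (recent.length >= this.cfg.maxretry) {
      this.bannedUntil.set(ip, now + this.cfg.bantime);
      this.failures.delete(ip);
      return true;
    }
    this.failures.set(ip, recent);
    return false;
  }

  isBanned(ip: string, now: number): boolean {
    const until = this.bannedUntil.get(ip);
    return until !== undefined && now < until;
  }
}

const jail = new JailModel({ maxretry: 5, findtime: 300, bantime: 600 });
// Five probes 10 seconds apart: only the fifth one triggers the ban.
for (let i = 0; i < 5; i++) {
  console.log(i, jail.recordFailure("35.203.211.8", i * 10));
}
console.log(jail.isBanned("35.203.211.8", 100)); // true
console.log(jail.isBanned("35.203.211.8", 700)); // false
```

In this model, a scanner that spreads its requests out so that fewer than maxretry of them fall inside any findtime window is never banned, which is the usual knob to turn if slow scanners slip through.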
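Please note that if you are using Docker or Docker Compose, you might need the following. Docker routes traffic for published container ports through its own iptables chains, so bans added to the default INPUT chain may not take effect for your containers: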
https://github.com/fail2ban/fail2ban/issues/2376#issuecomment-2565534465
AdSense not showing
As you can see in the screenshot, even though AdSense is set up, the ad often doesn't show up.
I looked into why, and I suspect there are two reasons:
- My website's reputation is low
- Google cannot find an ad for the specified ad size
Well, I cannot change the first reason, but maybe I can do something about the second one. Here is what I did.
The Solution
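At first, I tried a fixed-size ad because I didn't want the ad to be too large. In JSX, the class and style attributes from AdSense's HTML snippet have to become className and a style object:
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'inline-block', width: '300px', height: '90px' }}
    data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
    data-ad-slot="{slot id}" // Replace with your Ad slot ID
  />
</GoogleAdUnit>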
But this often fails to show the ad.
- Please note that I am using nextjs13_google_adsense because the site is built with Next.js.
Next, I tried a responsive ad. The default code for a responsive ad is:
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'block', width: '100%' }}
    data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
    data-ad-slot='{slot id}' // Replace with your Ad slot ID
    data-ad-format="auto"
    data-full-width-responsive="true"
  />
</GoogleAdUnit>
This works best because the ad unit adapts its size to the ad that is served. But to me, the auto-sized ad looked too big.
So I limited the height like this. Note that I set data-ad-format to "horizontal", because I wanted a horizontal ad that is not too big.
<GoogleAdUnit>
  <ins
    className="adsbygoogle"
    style={{ display: 'block', width: '100%', height: '50px' }} // limit height
    data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
    data-ad-slot='{slot id}' // Replace with your Ad slot ID
    data-ad-format="horizontal" // horizontal
    data-full-width-responsive="true"
  />
</GoogleAdUnit>
It still sometimes fails to show an ad, but ads now appear on my website more often, because the width is no longer fixed.
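Putting the three variants next to each other makes the differences easier to see. The sketch below is a hypothetical TypeScript helper (adInsProps is not part of AdSense or nextjs13_google_adsense) that returns the props for the <ins> element in each variant; the attribute names and values are the ones from the snippets above, and the client and slot IDs remain placeholders.

```typescript
// The three <ins> variants tried above, as a discriminated union.
type AdVariant =
  | { kind: "fixed"; width: number; height: number } // e.g. 300 x 90
  | { kind: "responsive" } // data-ad-format="auto"
  | { kind: "horizontal"; height: number }; // full width, capped height

interface InsProps {
  className: string;
  style: { display: string; width: string; height?: string };
  "data-ad-client": string;
  "data-ad-slot": string;
  "data-ad-format"?: string;
  "data-full-width-responsive"?: string;
}

// Hypothetical helper: builds the props for an AdSense <ins> element.
function adInsProps(client: string, slot: string, variant: AdVariant): InsProps {
  const base = { className: "adsbygoogle", "data-ad-client": client, "data-ad-slot": slot };
  switch (variant.kind) {
    case "fixed":
      // Fixed size: an ad that fits this exact box has to be available.
      return {
        ...base,
        style: { display: "inline-block", width: `${variant.width}px`, height: `${variant.height}px` },
      };
    case "responsive":
      // Responsive: AdSense chooses the size, which can end up large.
      return {
        ...base,
        style: { display: "block", width: "100%" },
        "data-ad-format": "auto",
        "data-full-width-responsive": "true",
      };
    case "horizontal":
      // Compromise: the width stays flexible, only the height is limited.
      return {
        ...base,
        style: { display: "block", width: "100%", height: `${variant.height}px` },
        "data-ad-format": "horizontal",
        "data-full-width-responsive": "true",
      };
  }
}

console.log(adInsProps("ca-pub-{ad-client-id}", "{slot id}", { kind: "horizontal", height: 50 }));
```

In a component, the result could be spread onto the element, e.g. <ins {...adInsProps(client, slot, { kind: "horizontal", height: 50 })} />, so that every ad slot on the site uses the same tested combination of attributes.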
Unsolved Problems
- The website design is too simple
- The search accuracy is low
- The results are almost always from Stack Overflow, because most of the records in the database come from Stack Overflow. I'm not sure whether this is OK.