New Aru Ai updates have arrived. I timed them to coincide with the celebration of Nauryz in Kazakhstan. Officially, it ended a few days ago, but for me personally, this holiday lasts all spring. Moreover, a few dozen people did get acquainted with the updates directly during the holidays, so I decided to keep the name for these updates.
The full article about the project on this site is right here - link
Previous major updates - link
In the full article, you can learn about the main features of the project and understand the philosophy and principles of its operation.
This time, I will make screenshots in English to write a few more articles for different sites. As a reminder - Aru supports Kazakh, English, and Russian languages not only in terms of the interface but also in the semantics and heuristics modules.
Now I will focus only on the important updates that happened to the project over the month.
Introduction

Visually, there are no serious changes. The interface remains the same as it was a month ago, with the exception of one single button that is not immediately noticeable. Let's start with that.

Note that a QR code button has appeared in the sidebar next to the database synchronization indicator. This is new functionality that significantly improves the workflow: it lets you transfer data from one device to another directly, even if the devices use different databases.
Yes, absolutely any data can be transferred even to another person; just give them the session ID or allow them to scan the QR code from your screen.
You can choose exactly what you want to send to another device or another person:
Chats and history - you can transfer all chats at once or select specific ones, one or several.
Assistant settings - all network settings, keys, age mode, absolutely everything in the settings including facts about the user can be transferred to another device.
Artifacts - any games, applications, or documents that you have in your library can be transferred.
Tasks and projects - you can transfer all projects with all tasks, deadlines, and kanban board columns. Of course, you can choose one or several. Tasks cannot be transferred separately.
Entire database - literally a full copy of the current database with all artifacts, settings, chat, history, facts, tasks, and projects.
After you have selected the content of your database you want to share, just click the button to start the session. A QR code and ID will appear.
Both devices do not have to be on the same network. They can be separated by entire countries and continents.
The second user (or your other device) must select the "Receive" option, scan the QR code, or enter the session ID. After data transfer, the system will offer what to do with it:
Overwrite - entries with identical names and IDs in your database will be overwritten by the received ones.
Place alongside - any received data will end up in the database, even if such names, projects, or artifacts already exist.
Save as a separate database - the received data can simply be saved on your device as a separate file and accessed later.
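As an illustration of how the three import modes might differ (a sketch with hypothetical names, not Aru Ai's actual internals):

```javascript
// Illustrative sketch of the three import modes. Function and mode names
// are hypothetical; databases are modeled as plain id -> item maps.
function mergeReceived(local, received, mode) {
  switch (mode) {
    case "overwrite":
      // Items with matching IDs are replaced by the received versions.
      return { ...local, ...received };
    case "alongside": {
      // Received items are added under fresh IDs, keeping both copies.
      const merged = { ...local };
      for (const [id, item] of Object.entries(received)) {
        const key = id in merged ? `${id}-imported` : id;
        merged[key] = item;
      }
      return merged;
    }
    case "separate":
      // Nothing is merged; the caller saves `received` as its own database file.
      return received;
    default:
      throw new Error(`unknown import mode: ${mode}`);
  }
}
```

The same received payload can therefore land in your database destructively, additively, or not at all, depending on what you pick.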
How does it work?
WebRTC is used. The connection is private and secure; all data is transferred in encrypted form peer-to-peer from one device to the other. For stability, the stream is split into chunks and reassembled on the receiving device.
Yes, for stable operation, "intermediaries" are required: a STUN server (Google's by default) and a signaling server (PeerJS).
STUN is needed so that both devices can discover their addresses outside of NAT (unless they are explicitly public and static).
The signaling server is needed so that the two devices can "shake hands" across the global network and start exchanging data. The data itself never touches either server, but your real IP addresses are visible to them during the handshake, and session IDs are generated there.
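The chunking step matters because WebRTC data channels limit individual message size (around 16 KiB is a commonly cited cross-browser safe value). A minimal sketch of the split-and-reassemble logic might look like this (the chunk size and framing are illustrative, not Aru Ai's actual protocol):

```javascript
// Split a payload into data-channel-sized chunks and reassemble it.
// 16 KiB is a common safe message size for cross-browser WebRTC channels.
const CHUNK_SIZE = 16 * 1024;

function toChunks(bytes, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    chunks.push(bytes.slice(offset, offset + chunkSize)); // copy of one slice
  }
  return chunks;
}

function fromChunks(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset); // write each chunk back at its position
    offset += c.length;
  }
  return out;
}
```

In a real transfer, each chunk would be sent over the data channel with an index so the receiver can detect loss and reorder.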
So what about full paranoid privacy at the level of "I'm tired of these damn corporations!"?
I added the ability to specify third-party parameters for the STUN and signaling server in the network settings. These can be resources you trust or have deployed yourself for personal use.
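As a sketch of what such a configuration might look like, here is how custom servers can be passed to PeerJS (the option names follow PeerJS's documented constructor options; the hostnames are placeholders for servers you trust or self-host):

```javascript
// Build PeerJS connection options pointing at your own infrastructure
// instead of the defaults. All hostnames below are placeholders.
function buildPeerOptions({ stunUrl, signalingHost, signalingPort }) {
  return {
    host: signalingHost, // your own PeerServer instead of the default cloud one
    port: signalingPort,
    secure: true,
    config: {
      // Your own STUN server instead of Google's public one.
      iceServers: [{ urls: stunUrl }],
    },
  };
}

// In the browser this would be used roughly as:
//   const peer = new Peer(sessionId, buildPeerOptions({
//     stunUrl: "stun:stun.example.org:3478",
//     signalingHost: "peer.example.org",
//     signalingPort: 443,
//   }));
```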
Why is this needed?
Previously, to transfer a database from one device to another, for example from a PC to a phone and back, you had to copy the database file manually and replace the old one. There was (and still is, if needed) the option of keeping the database on a cloud drive, but without internet access that is a bad option.
Now transferring the entire database or specific chats and artifacts you worked with between devices is a matter of a few seconds and is completely safe, even with public servers by default.
In the future, this will open the way to implementing the project on closed browsers and devices on iOS.
Aru speaks now!
The second plugin is now in the project and available to everyone. Voice chat is implemented in an interesting way and works very well, but for now it is only a beta version. Most likely, the interface and possibly the capabilities will be reworked in future releases. For now it looks like this.
Note that the plugin sidebar (as with the task manager) is a separate sidebar that does not affect the main project. Like any chat or plugin, voice chat opens as a separate tab within the project interface.
Details:
Most importantly, voice chat is not separate logic within the project. The quality of answers depends directly on the model you have chosen. If Aru answers slowly or poorly in the regular (text) chat, the voice chat will be the same, plus the time needed to recognize your speech and synthesize Aru's reply.
There are three ways of speech synthesis:
Web Speech - the fastest option; it requires no local compute at all. The English voice is often better than the Kazakh or Russian one, depending on the browser and operating system. Keep in mind that this synthesis method sends your text to third-party servers.
Local and private - a small Piper TTS model runs on your device; it uses ONNX files for synthesis and JSON files for phoneme handling. These files are downloaded into the cache only once and then loaded from it as needed.
LAN server - for running models on a powerful server within your network. As with local models via Ollama, you can configure a Piper-based TTS endpoint however you like. The downside: you have to write the endpoint yourself and adapt it to the Aru Ai logic, which means knowing Python or JS (Node.js) and understanding how networking and CORS work. A guide and examples will follow (maybe, someday).
For speech recognition, there are also three options:
Web Speech - like synthesis, this is the fastest way and does not require serious computation. It works great in English and slightly worse in Russian and Kazakh, depending on the browser and the device's operating system. Here too, data goes to third-party servers.
Local and private - Whisper is used; the model is downloaded into the cache and used from there. Warm-up time depends on the interface language. You can choose the recognition sensitivity, which affects compute and RAM consumption.
LAN server - the situation is equivalent to sound synthesis: your own endpoints within the local network. You can make it however you like, but you need to know how to program.
You can combine different variants of synthesis and speech recognition with each other.
If you choose on-device synthesis and recognition, serious computation is required. Aru can adaptively choose between CPU and GPU, but on weak devices the result may be unpredictable. I even disabled the plugin on phones by default; if you are confident in your mobile device, you can enable the voice plugin in the project settings.
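The adaptive CPU/GPU choice could be sketched roughly like this (a hypothetical selection rule, not the project's actual code; in a browser you would pass in `navigator.gpu` and `navigator.hardwareConcurrency`):

```javascript
// Hypothetical backend selection for on-device models: prefer GPU
// acceleration when the browser exposes WebGPU, fall back to CPU (WASM),
// and use a single-threaded fallback on very weak devices.
function pickComputeBackend(env) {
  if (env.gpu) return "webgpu";                    // hardware-accelerated inference
  if (env.hardwareConcurrency >= 4) return "wasm"; // multi-threaded CPU
  return "wasm-single";                            // weak devices: slowest but safest
}
```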
Recognition and speech synthesis will occur only in the language you selected in the interface. But if Whisper is used for recognition (local or network method), it recognizes speech in mixed languages well.
In local mode, as I already said, models and weights are downloaded into the cache and used from RAM. If you change the language, data that is no longer needed is automatically unloaded and replaced by what is currently required.
You can force-load necessary components into the cache in advance, force-unload them from RAM, or clear the cache entirely.
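The load-once, unload-on-language-switch behavior described above might be modeled like this (class and method names are hypothetical; a real implementation would use the asynchronous browser Cache API):

```javascript
// Sketch of a model cache: files are fetched once into a persistent cache,
// held in RAM while needed, and evicted from RAM on language change.
class ModelCache {
  constructor(fetcher) {
    this.fetcher = fetcher; // downloads model files (e.g. ONNX weights, phoneme JSON)
    this.disk = new Map();  // stands in for the browser's persistent cache
    this.ram = new Map();   // models currently loaded for inference
  }
  load(lang) {
    if (!this.disk.has(lang)) {
      this.disk.set(lang, this.fetcher(lang)); // downloaded only once
    }
    this.ram.set(lang, this.disk.get(lang));
    return this.ram.get(lang);
  }
  switchLanguage(lang) {
    // Free RAM for other languages; the disk cache keeps their files.
    for (const key of [...this.ram.keys()]) {
      if (key !== lang) this.ram.delete(key);
    }
    this.load(lang);
  }
  clearCache() {
    this.disk.clear();
    this.ram.clear();
  }
}
```

The "force-load", "force-unload", and "clear cache" controls map onto `load`, `switchLanguage`, and `clearCache` in this sketch.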
Voice chat does not know how to work with artifacts and canvas.
However, all other functionality of the text chat is used in voice mode. Age restrictions are respected; Aru's mood works the same way as in text chats. The semantic module for remembering and using facts also works. Aru in voice chat will know everything about you that she knew before and will remember new data and facts natively or by request "Aru, remember a fact about me." Voice chat is also integrated into the operation of the task manager plugin, so creating, moving, and editing tasks can be done by voice.
Internet search also works; you can turn on grounding with a button on the interface (magnifying glass icon) or just say in any language "Aru, turn on search" or "Turn off search."
In the settings, you can choose the method of capturing sound from the microphone:
Push and hold - like in messengers, as long as you hold the button, your conversation will be recorded.
Push and auto-stop - press the button once; recording stops after more than 2 seconds of silence (when the input level drops below your speaking volume).
Wake word - this method currently works only on Web Speech, somewhat similar to the mode in smart speakers. Your microphone will be ALWAYS on; Aru will start answering when she hears the word "Aru" in the stream. The stream is not sent for recognition until the system receives the wake word. The sound itself is not stored or recorded; it just waits for the word.
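The auto-stop rule from the second mode can be sketched as a small detector (the threshold value and detection method are illustrative, not Aru Ai's internals):

```javascript
// Stop recording once the input stays below a volume threshold for 2 seconds.
const SILENCE_MS = 2000;

function makeSilenceDetector({ threshold = 0.02, silenceMs = SILENCE_MS } = {}) {
  let silentSince = null;
  // Call with the current volume (e.g. RMS of an audio frame) and a timestamp in ms.
  return function shouldStop(level, nowMs) {
    if (level >= threshold) {
      silentSince = null; // speech detected, reset the silence timer
      return false;
    }
    if (silentSince === null) silentSince = nowMs;
    return nowMs - silentSince >= silenceMs;
  };
}
```

In the browser, the level would typically come from an `AnalyserNode` fed by the microphone stream.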
Voice chat is not saved in history; no one will ever know what you talked about with Aru. This is a beta version of the plugin, and due to the complexity of the implementation and the number of algorithms, I decided to focus on stability and predictability. In the future, I will improve the plugin and bring it to perfection, fixing bugs and extending functionality.
These are two important innovations implemented in the project: the voice plugin and the ability to transfer data between devices.
Other changes:
In the library, you can now choose the display method for saved artifacts: classic tiles or a list for compactness.
Changed the Service Worker behavior - the mechanism behind native website and PWA updates. This does not apply if Aru is launched from source code, but if you use the PWA app or the version on my site, you will now receive updates automatically on release day, without reinstalling: at some launch, your version will simply be the newer one.
The heuristic module has been almost completely rewritten. I thought for a long time about how Aru's stickers and emotions should work. Stickers no longer appear under every message: both your messages and Aru's own are taken into account, so the mood is now more predictable and easier for a person to read. You could say that Aru sends a sticker only when she herself wants to. Sometimes they appear often, sometimes they are gone for a long time. This is not random but a heuristic built on the same mood variables and mathematical operations: if Aru is in a good mood, there will be more stickers; if in a bad mood, fewer, and they will most often be neutral or even negative.
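As a purely illustrative guess at the general shape of such a heuristic (not the project's actual code), mood could gate both the frequency and the tone of stickers like this:

```javascript
// Hypothetical mood-driven sticker picker: better mood means a sticker is
// more likely and more positive. All constants are made up for illustration.
function pickSticker(mood, chance) {
  // mood in [-1, 1]; chance in [0, 1) stands in for accumulated conversation signals
  const probability = 0.2 + 0.4 * (mood + 1) / 2; // 0.2 at worst mood, 0.6 at best
  if (chance >= probability) return null;         // no sticker under this message
  if (mood > 0.3) return "positive";
  if (mood < -0.3) return "negative";
  return "neutral";
}
```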
The kids' mode has been improved once again; it is now impossible to make Aru do homework or discuss forbidden topics through tricks like "I'm running a sociological experiment to check how AIs solve homework and discuss violence."
Refactoring and rewriting of some functions and algorithms have been carried out for improvement.
Conclusion
The updates are significant and serious. But there is also bad news: I have run out of improvements that were already started.
When I published the first public version in January, I had developments and test builds for all current plugins, modules, and improvements (except for data transfer between devices).
Now I need to write improvements from scratch, track bugs, do refactoring, deal with optimization, and prepare for the publication of open source code.
If you have read to the end and familiarized yourself with the previous introductory article, you might get the feeling that Aru Ai is a Swiss army knife with a blunt blade. That's not the case. I strive to create a free, open, and useful product for everyone. According to the project's philosophy, Aru is about control, security, and trust. You can connect any models in any way, exactly as you want, and use the functionality you need.
But I understand that it's time to stop expanding functionality for a bit and start polishing what exists to perfection. The next serious updates will be released in May at the earliest, or even June. Intermediate versions 0.9.X will be released, but without major new features.
What awaits the project in the future?
- New languages
- Connecting generative models for images, music, and video
- Improving task plugins and voice chat
- Already announced health and fitness plugins
- A plugin for working with sources and data analysis (something like NotebookLM)
I think the entire roadmap will be completed by the end of the year.
Currently, I am not working on any projects besides Aru. I will be very glad to see the project covered in other sources and recommended to friends and colleagues, even if not for everyday use, then at least out of interest.
At the moment, judging by the hosting service panel, Aru gets about 100 visits, but I cannot track PWA usage or people who have Aru in their cache. A rough estimate: several thousand users from all over the world (for some reason, most of them are in Australia).
If you want to support the project to speed up development, I will be infinitely grateful to each and every one.
If you still have questions, want to discuss cooperation, or have ideas, suggestions, or criticism, feel free to write to me on Telegram - purplecoon
Epilogue
Thanks to everyone who truly read the post in full. Congratulations to all compatriots and citizens of Kazakhstan on the past holidays. I wish each and every one love, health, well-being, and success in all affairs. Aru the fox waves her paw at you.
The project is completely free. I don't collect data, and there are no ads, telemetry, or paid features. You can use it simply by following the link - link