DEV Community

tyler grant


Reflections on Autonomous Computing

Light takes 3 to 22 minutes to travel from Earth to Mars, while an IT system can collapse in under a minute. Because the speed of light is an absolute limit, real-time human intervention in the fault handling of deep-space systems is impossible. Even on Earth, the mismatch between a system's failure modes and human response mechanisms means faults are often not handled in time. Systems need the ability to respond to and optimize for environmental changes using their own perception, memory, cognition, and execution capabilities, without relying on continuous external intervention: this is autonomous computing. An undeniable fact is that current continuous availability (CA) technologies focus more on minimizing repair time than on maximizing stable uptime, much as people often focus on medical treatment rather than health maintenance and early prevention.

Mapping: Everything is mapped to information, and rationality is mapped to data and logic. Even AI requires ordered datasets. A database, as an open system, can be seen as a "dynamic container" that tolerates the expansion, inflow, and arrangement of data. The one thing it cannot tolerate is "disorder." Reducing entropy requires active resistance.

Confrontation: Setting aside value judgments of "emotional justice," rationality remains the cornerstone of how we understand the world, make decisions, and form moral judgments, and it follows the natural laws of the universe. Understand, however, that everything beyond our control is unstable, slowly depleting and disintegrating. The nature of things and human will are locked in a perpetual "confrontation" that shows no sign of stopping. This confrontation comes at a price.

Price: The cost of maintaining order in a system includes every expenditure involved: knowledge-based manual intervention, semi-automatic operation along preset paths, and trend-based autonomous computation. Inspired by bionics, systems use adaptive technology to manage themselves with little or no human interaction. Autonomous configuration, optimization, recovery, and protection enumerate the possibilities and trends that lead to disorder, developing response plans and strategies that minimize human intervention, accomplish the task, and reduce cost.

Fatigue: Software itself does not experience "fatigue," but its operating environment and resource management can lead to performance degradation or instability. Anticipate this "fatigue" and avoid it by optimizing code, managing resources effectively, and regularly maintaining the operating environment. The lifecycle of the media the system relies on must also be considered. Strategies include, but are not limited to, ROM hardening, redundant Flash, write control and minimal writes, ECC, read-only systems, careful logging, and radiation hardening, all aimed at maximizing the service life of the media.
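As one illustration of the "minimal writes" strategy, here is a hedged sketch (the class name `CoalescingLog` and its threshold are invented for illustration, not from the text): batching many small log entries into one physical write reduces erase/program cycles on wear-limited media such as Flash.

```python
class CoalescingLog:
    """Buffers log entries in RAM and flushes them to media in batches."""

    def __init__(self, flush_threshold=8):
        self.flush_threshold = flush_threshold
        self.buffer = []          # entries held in volatile memory
        self.flush_count = 0      # number of physical writes performed

    def write(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            # In a real system this would be one append to persistent media.
            self.flush_count += 1
            self.buffer.clear()

log = CoalescingLog(flush_threshold=8)
for i in range(20):
    log.write(f"event {i}")
log.flush()                 # final flush of the partial batch
print(log.flush_count)      # 20 entries -> 2 full batches + 1 partial = 3 writes
```

The trade-off is the usual one: entries still in the RAM buffer are lost on sudden power failure, so the threshold must balance media wear against acceptable data loss.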

A distributed system that mimics biological evolution yet remains unified by a single will struggles to provide positive feedback to every subsystem, inevitably shortening its lifecycle and undermining sustainability. Truly robust systems stem from distributed structure and localized control: each node is autonomous, and each subsystem can recover independently. The center is no longer the command source but a coordinator. A backbone-based distributed system can maintain overall direction while tolerating errors and corruption, serving its goals for longer. Robustness depends on decentralized functions, a loosely coupled structure, and localized decision-making. By contrast, completely distributed systems such as atmospheric circulation, thunderstorms, and tides are entirely devoid of will and order.

Redundancy acts as a buffer against unpredictable risks and harbors vitality; eliminating it is tantamount to suicide. Yet "everything is redundant" is self-harm: redundancy complicates the system and introduces new problems, so rationalizing and managing it is fundamental to availability. Engineered systems pursuing self-managed high availability cannot simply copy the redundancy strategies of biological systems, which rely on finite energy regulation, dynamic adjustment, and gradual compensation. In backbone-based distributed systems, the partition tolerance (P) of the CAP theorem for the core system, its subsystems, and their pipelines can be preserved through multiple paths across the system network. On that basis, strong consistency or eventual consistency is adopted per system, depending on the scenario.
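The scenario-dependent choice between strong and eventual consistency can be sketched with the classic quorum inequality (a standard result for replicated storage, not something the text specifies): with n replicas, write quorum w, and read quorum r, every read is guaranteed to overlap the latest write only when w + r > n.

```python
# Quorum arithmetic for a value replicated on n nodes.
# If w + r > n, every read quorum intersects every write quorum,
# so a read always sees the latest acknowledged write (strong
# consistency); smaller quorums trade that guarantee for availability.

def is_strongly_consistent(w, r, n=3):
    """True if read and write quorums are forced to overlap."""
    return w + r > n

print(is_strongly_consistent(2, 2))  # True: majority quorums overlap
print(is_strongly_consistent(1, 1))  # False: a read may miss the write
```

Per-system tuning then becomes a matter of choosing w and r per scenario: large quorums where correctness dominates, small ones where latency and availability dominate.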

Adaptability: In biological systems such as neural networks or vascular systems, there is no single fully backed-up "backup organ"; instead, compensation and self-healing are achieved through multiple paths and dynamic adjustment within the network. Engineering systems can introduce similar dynamic routing, load balancing, and adaptive control mechanisms. For example, distributed monitoring and real-time feedback can redirect traffic and reassign tasks away from faulty nodes or services, achieving automatic "isolation-recovery-scheduling." Positive feedback drives growth; negative feedback maintains stability; without feedback, control is lost. Static parameters are a curse; dynamic rules are life. A system that can perceive external changes in real time and adjust internally maintains optimal operation. The ability lies not in predicting all changes but in adapting to them: a system that can adapt to change is more powerful than one that can only complete a specific task.
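The "isolation-recovery-scheduling" loop can be sketched as a health-aware router (the `AdaptiveRouter` name and node labels are illustrative assumptions, not a real library): failed nodes are isolated from the routing pool and readmitted on recovery, so traffic only ever flows to healthy paths.

```python
class AdaptiveRouter:
    """Routes requests round-robin across the currently healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(nodes)

    def report_failure(self, node):
        self.healthy.discard(node)       # isolation: stop sending traffic

    def report_recovery(self, node):
        if node in self.nodes:
            self.healthy.add(node)       # recovery: readmit to the pool

    def route(self, request_id):
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        return candidates[request_id % len(candidates)]  # scheduling

router = AdaptiveRouter(["a", "b", "c"])
router.report_failure("b")
print([router.route(i) for i in range(4)])  # ['a', 'c', 'a', 'c']
```

In a real deployment the `report_*` calls would be driven by the distributed monitoring and heartbeat feedback the paragraph describes, rather than invoked manually.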

Self-Healing: Organisms repair local damage through metabolism, repair, and regeneration, ensuring the long-term stability of the whole. Systems need to automatically detect failed instances, then restart, replace, or reschedule services, and can use machine learning to predict failures and intervene early, similar to "preventive immunity" in biological systems. Critical services can run periodic self-diagnosis, with abnormal behavior triggering self-recovery logic. Minor errors are addressed with micro-reboots to prevent catastrophic failures; medium errors with recovery-oriented approaches, coupled with checkpoint rollback ("time travel") within controllable thresholds, preventing any irreparable mistake.
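A minimal sketch of the escalation policy described above (the `Supervisor` class and its threshold are hypothetical): repeated missed heartbeats first trigger in-place micro-reboots, then escalate to the heavier recovery-oriented path.

```python
class Supervisor:
    """Escalating self-healing: micro-reboot first, full recovery later."""

    def __init__(self, max_micro_reboots=2):
        self.max_micro_reboots = max_micro_reboots
        self.failures = {}                 # service -> consecutive failures

    def heartbeat_missed(self, service):
        count = self.failures.get(service, 0) + 1
        self.failures[service] = count
        if count <= self.max_micro_reboots:
            return "micro-reboot"          # minor error: restart in place
        return "recovery"                  # repeated failure: roll back state

    def heartbeat_ok(self, service):
        self.failures[service] = 0         # healthy again: reset the counter

sup = Supervisor(max_micro_reboots=2)
actions = [sup.heartbeat_missed("cache") for _ in range(3)]
print(actions)  # ['micro-reboot', 'micro-reboot', 'recovery']
```

The reset on `heartbeat_ok` matters: without it, a service that fails rarely but over a long lifetime would eventually be escalated for unrelated incidents.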

Self-Defense: You may need to consider periodic autonomous reboots, hot updates, resource leak monitoring and automatic recycling, log control and automatic archiving/cleanup, and prioritizing stateless design or the ability to "reset to the initial state" for all modules. Review the code, limit long-lived external dependencies, and have the capability for remote upgrades/repairs before anomalies occur. OTA modules must be independent and reliable, even designed for dual-system redundancy switching, with remote command channels ensuring "doomsday recovery" capabilities.
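The periodic-reboot and leak-monitoring ideas can be sketched as a single watchdog policy (the `Watchdog` class and its thresholds are invented for illustration): reclaim resources when a leak is suspected, and reset to the initial state on a schedule before fatigue accumulates.

```python
class Watchdog:
    """Self-defense policy: recycle leaked resources, reboot on schedule."""

    def __init__(self, max_uptime_s=86_400, max_open_handles=1_000):
        self.max_uptime_s = max_uptime_s
        self.max_open_handles = max_open_handles

    def decide(self, uptime_s, open_handles):
        if open_handles > self.max_open_handles:
            return "recycle"            # suspected leak: reclaim resources now
        if uptime_s >= self.max_uptime_s:
            return "scheduled-reboot"   # periodic reset to the initial state
        return "ok"

wd = Watchdog(max_uptime_s=86_400, max_open_handles=1_000)
print(wd.decide(uptime_s=3_600, open_handles=120))    # 'ok'
print(wd.decide(uptime_s=3_600, open_handles=5_000))  # 'recycle'
print(wd.decide(uptime_s=90_000, open_handles=120))   # 'scheduled-reboot'
```

Stateless modules make the "scheduled-reboot" branch cheap, which is exactly why the paragraph prioritizes stateless design.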

Maintaining robustness: Strong coupling of system submodules can ensure robustness, but strong coupling built on unrepairable subsystems is a disaster. Redundancy is the bottom line for keeping submodules running, yet it inevitably adds management complexity. The more complex the tools, the more maintenance they require, and each additional component can multiply cost; the system must therefore keep a minimalist design under strong dynamic equilibrium to avoid collapse. Static perfection is not as good as dynamic evolution.

Boundaries: Autonomous computing must be limited to avoid disorderly expansion. Boundaries are the self-protection mechanism of autonomous systems. We should clearly define their perception boundaries, execution boundaries, and responsibility boundaries to prevent the system from expanding infinitely and becoming out of control. The opposite of freedom is not control, but boundlessness. Every autonomous system must know what it "does not do."
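"Knowing what it does not do" maps naturally onto a deny-by-default allowlist for the execution boundary. A sketch, with invented action names:

```python
# Execution boundary as a deny-by-default allowlist.
# Anything not explicitly permitted is refused, so the system
# cannot expand its behavior beyond its defined boundary.

ALLOWED_ACTIONS = {"read_sensor", "adjust_fan", "log_event"}

def execute(action):
    if action not in ALLOWED_ACTIONS:
        return "refused"               # the system knows what it does not do
    return f"executed:{action}"

print(execute("adjust_fan"))    # 'executed:adjust_fan'
print(execute("open_airlock"))  # 'refused'
```

The same pattern applies to perception and responsibility boundaries: enumerate what is in scope, and refuse everything else by construction rather than by case-by-case judgment.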

Minimalist architecture design: Trim and merge the indispensable components, placing them in unchangeable tight coupling; add only necessary, irreplaceable combinations, placing them in loose coupling within the system. The engine core is unified, supported by auxiliary redundancy; edge computing removes the need for constant feedback to the center, avoiding unnecessary load; the system differentiates hot from cold computation and stores data filtered, compressed, or trimmed.

Balance: A multi-layered defense system is built for stable operation, establishing dependencies between subsystems. System modules, down to the smallest viable unit, operate autonomously and support micro-reboots. The center does not undertake every task; it only coordinates local operations toward the objective, with coordination strategies dynamically adjusted to scenario and time. The most stable system comes from mutually cooperative, complementary units.

Energy: Does system survival require seizing energy from others or generating its own? Wherever possible, a mechanism for continuously drawing energy from outside to sustain service is wise, alongside a dual-source supply for availability; more importantly, low power consumption is always crucial. If continuous external supply is infeasible, the system must budget long-term energy from its own reserves, scheduling around task priorities and minimizing consumption during non-task-intensive periods.
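The priority-and-budget trade-off can be sketched as a simple power-mode policy (the thresholds and task fields are assumptions for illustration): critical work overrides energy saving, while idle or low-battery periods drop to minimal consumption.

```python
# Duty-cycling sketch: choose a power mode from battery level and
# the priorities of pending tasks. Thresholds are illustrative.

def power_mode(pending_tasks, battery_pct):
    critical = any(t["priority"] == "critical" for t in pending_tasks)
    if critical:
        return "active"        # critical work overrides energy saving
    if battery_pct < 20 or not pending_tasks:
        return "deep-sleep"    # conserve during non-task-intensive periods
    return "low-power"         # background work at minimal consumption

print(power_mode([{"priority": "critical"}], battery_pct=15))  # 'active'
print(power_mode([{"priority": "low"}], battery_pct=80))       # 'low-power'
print(power_mode([], battery_pct=80))                          # 'deep-sleep'
```

A dual-source supply would slot in as a second input to the same policy, selecting which source feeds the chosen mode.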

Willpower

Currently, even the world's most popular systems rely on memory, not imagination, to function. For humans, unlike systems, the idea that feedback stems from memory alone seems untenable. What can be assumed is the finite desires of living beings and the limitations behind those desires. If a system could create "infinity," it would inevitably become meaningless for lack of feeling and will. "Finiteness" therefore becomes the law of general intelligent systems.

Consensus

The "autonomy" of a subsystem refers to the self-management and evolutionary capabilities of the developing entity, not the "control" and "ownership" of the entity by humans. The former is based on the allure of automation, the latter on the fear of the weak. Different paths lead to different destinations; they are incomparable and evolve independently. System complexity stems from the iterative interaction of simple rules; capabilities arise from the collaboration between underlying rules, not top-level design. Autonomy is not isolation; collaboration is the anchor of order. Systems should be designed to achieve consensus and cooperation with other subsystems while operating autonomously. Consensus is not obedience, but the result of negotiation; collaboration is the only way for complex systems to maintain long-term operation.

Seeds
The future is unpredictable. When a system must end, plant a seed.

The End

Do not try to control everything, but learn to build systems that can evolve, grow, and maintain order on their own. Seek the seeds of order in a world of increasing entropy, and construct a self-consistent path within chaos.

If the end is inevitable, the meaning of this journey will not die. It will endure in the light of moral justice and reason, and will continue to thrive.
