DEV Community

susanayi

Robotic Brain for Elder Care 3

Part 3: The Scoring Engine — How a Robot Selects the Perfect Viewpoint

In the previous post, we discussed the "Single Camera + 12 Virtual Nodes" strategy to overcome simulation lag. But with four candidate nodes in a single room, how does the system decide which one provides the best data for our AI backend?

This is where the StaticCameraManager comes in. Instead of random selection, we use a Heuristic Scoring Algorithm to rank viewpoints based on three physical constraints: Visibility, Angle, and Distance.


The Scoring Formula

To quantify the quality of each viewpoint, the system evaluates all registered nodes in the room using a weighted heuristic:

FinalScore = (Visibility × 0.5) + (AngleFactor × 0.3) + (DistanceFactor × 0.2)

By assigning the highest weight (50%) to Visibility, we ensure the robot never prioritizes a "perfect" angle if the person is obscured by furniture or walls.
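Expressed in code, the weighted combination is a one-liner. The sketch below mirrors the factor names used in the snippets throughout this post; the helper itself is illustrative, not the exact StaticCameraManager API:

```csharp
// Illustrative sketch of the weighted heuristic. The factor names
// (vis, angleFactor, distFactor) mirror the snippets below; the
// weights come from the formula above.
static float ScoreNode(float vis, float angleFactor, float distFactor)
{
    return (vis * 0.5f) + (angleFactor * 0.3f) + (distFactor * 0.2f);
}
```

Note that because Visibility is a hard 0/1 gate, an occluded node (vis = 0) can never score above 0.5 from the angle and distance terms combined, so it always falls below the 0.5 "Green" threshold used later for debugging.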


1. Visibility: The Raycast Test (50%)

The most fundamental requirement is a clear line of sight. We use Unity’s Physics.Linecast to check for obstacles between the camera node and the user.

// Step 2: Visibility (Linecast Occlusion)
float vis = 1f;
if (Physics.Linecast(nodePos, aimPos, out RaycastHit hit))
{
    // Check if the hit object is the user or a part of the user
    bool hitUser = hit.transform == user.transform || hit.transform.IsChildOf(user.transform);
    if (!hitUser) vis = 0f; // Blocked by furniture or walls
}

If the raycast is blocked, the visibility score drops to 0, effectively disqualifying the node regardless of other factors.
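As a side note, if occluders (walls, furniture) live on dedicated layers, the user check can be skipped entirely by restricting the linecast with a layer mask. This is a variant sketch, where `occluderMask` is an assumed Inspector-configured field rather than part of the code above:

```csharp
[SerializeField] private LayerMask occluderMask; // walls, furniture, etc.

// Variant: linecast only against occluder layers, so the user's own
// colliders can never register as a hit.
float Visibility(Vector3 nodePos, Vector3 aimPos)
{
    return Physics.Linecast(nodePos, aimPos, occluderMask) ? 0f : 1f;
}
```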


2. Angle Factor: Semantic Clarity (30%)

For action recognition, front or side views are more informative than back views. We normalize the angle relative to the FOV center:

// Step 3: Angle Factor (normalized against the FOV center)
// 'angle' is the offset between the node's forward direction and the user
float angle = Vector3.Angle(nodeForward, aimPos - nodePos);
float angleFactor = Mathf.Clamp01(1f - angle / halfFov);

Case Study: Drinking Behavior

While multiple nodes might have visibility, our algorithm selects the one that best captures the drinking gesture.


Note: Candidate A (Side-Back) - The hand-to-mouth action is partially obscured by the user's shoulder.


Note: Candidate B (Side-Front) - Higher Angle Score. The interaction with the bottle is clearly visible for the VLM.


3. Distance Factor: The Golden Range (20%)

A camera too far away loses the pixel density the VLM needs. We apply a simple linear decay over 10 meters, which in practice favors nodes that keep the user within the "Golden Range" of 2 to 5 meters.

// Step 4: Distance Factor (10 m Linear Decay)
float dist = Vector3.Distance(nodePos, aimPos);
float distFactor = Mathf.Clamp01(1f - dist / 10f);
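The linear decay is deliberately simple. If you wanted the curve to reward the 2–5 meter Golden Range explicitly, a piecewise plateau would be one possible refinement; this is a sketch with illustrative thresholds, not values from our implementation:

```csharp
// Sketch of a plateau-style distance factor: full score inside the
// Golden Range, ramping down on either side. Thresholds are illustrative.
static float GoldenRangeFactor(float dist)
{
    if (dist < 2f)  return Mathf.Clamp01(dist / 2f);      // too close: ramp up
    if (dist <= 5f) return 1f;                            // inside the Golden Range
    return Mathf.Clamp01(1f - (dist - 5f) / 5f);          // decays to 0 at 10 m
}
```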

Case Study: Typing Interaction

At the desk, the distance and angle combined determine the best viewpoint to capture hand-to-keyboard interaction.


Note: Candidate C - Although the angle is okay, the distance reduces the semantic detail of the typing action.


Note: Candidate D - Optimal Distance & Angle. The high-angle perspective provides a clear view of the hands on the keyboard.


Visualizing the Logic: Debugging with Gizmos

As an engineer, I need to verify the math in real time, so I implemented a custom OnDrawGizmos overlay that color-codes each node:

  • Green: High Score (> 0.5) — Ready for capture.
  • Red/Grey: Low Score or Out of FOV — Disqualified.

This visual feedback allowed us to fine-tune our thresholds, ensuring the VirtualCameraBrain only teleports to locations that provide high-quality data.
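For reference, the color logic can be sketched roughly like this; the `lastScore` and `inFov` fields are illustrative stand-ins, and the actual StaticCameraManager wiring differs:

```csharp
using UnityEngine;

// Illustrative sketch of the color-coded debug overlay. Assumes each
// node caches its latest score and an "in FOV" flag from the scoring pass.
public class CameraNodeGizmo : MonoBehaviour
{
    public float lastScore; // written by the scoring pass
    public bool inFov;      // is the user inside this node's FOV?

    private void OnDrawGizmos()
    {
        if (!inFov)
            Gizmos.color = Color.grey;   // Out of FOV — disqualified
        else if (lastScore > 0.5f)
            Gizmos.color = Color.green;  // High score — ready for capture
        else
            Gizmos.color = Color.red;    // Visible but low quality

        Gizmos.DrawWireSphere(transform.position, 0.25f);
    }
}
```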


What’s Next?

Now that we have selected the "Best Viewpoint," the final step is execution. In the next post, we will look at the VirtualCameraBrain implementation: Base64 encoding and REST API transmission.

Stay tuned!
