DEV Community

Mariano Gobea Alcoba

Posted on • Originally published at mgatc.com

OpenYak – An open-source Cowork that runs any model and owns your filesystem!

OpenYak presents itself as an open-source "cowork" environment designed to seamlessly integrate artificial intelligence models with a user's local filesystem. The core value proposition revolves around two principal axes: providing a unified runtime for "any model" and granting these models deep, programmatic access to the user's filesystem. This design fundamentally redefines the interaction paradigm between developers, their codebase, and AI assistants, moving beyond chat interfaces to a more agentic, integrated workflow.

Architectural Overview

The OpenYak architecture is envisioned as a multi-component system, designed for extensibility and deep system integration. At a high level, it comprises a desktop client, a robust model orchestration layer, and a critical filesystem integration subsystem, all bound together by a sophisticated context and tooling management system.

The Desktop Client: The User's Gateway

The desktop client serves as the primary interface for user interaction. While the specific framework (e.g., Electron, Tauri) is not explicitly detailed as a core component in the provided repository, the nature of a "desktop application" implies a graphical frontend responsible for displaying content, accepting user input, and mediating commands. This client facilitates:

  1. Command Input: A central command palette or integrated terminal interface where users articulate their requests.
  2. Context Visualization: Displaying relevant project files, active terminal sessions, and model outputs.
  3. Action Confirmation: Presenting proposed AI actions (especially those involving filesystem modifications or command execution) for user review and approval, a critical security boundary.

The client's role extends beyond mere display; it acts as an intelligent intermediary, gathering implicit context from the user's current environment (e.g., active file, selected text, terminal history) and forwarding it to the backend for model processing.

Consider a simplified command routing mechanism within the client:

// Conceptual client-side command processing
interface Command {
    id: string;
    type: 'model_prompt' | 'filesystem_action' | 'terminal_exec';
    payload: any;
}

interface UserContext {
    currentFile: string | null;
    selectedText: string | null;
    cwd: string;
    terminalHistory: string[];
    // ... other contextual data
}

class OpenYakClient {
    private websocket: WebSocket; // Communication channel to backend

    constructor() {
        this.websocket = new WebSocket("ws://localhost:8080/yak");
        this.websocket.onmessage = this.handleBackendMessage.bind(this);
    }

    public sendCommand(commandText: string) {
        const context: UserContext = this.gatherUserContext();
        const parsedCommand: Command = this.parseInput(commandText, context); // AI or rule-based parsing
        this.websocket.send(JSON.stringify(parsedCommand));
    }

    private handleBackendMessage(event: MessageEvent) {
        const response = JSON.parse(event.data);
        // Render model output, file changes, terminal output, etc.
        console.log("Backend response:", response);
    }

    private gatherUserContext(): UserContext {
        // Implementation to get current editor state, terminal state, etc.
        return {
            currentFile: "/path/to/current_file.js",
            selectedText: "const foo = 'bar';",
            cwd: "/path/to/project",
            terminalHistory: ["git status", "npm install"],
        };
    }

    private parseInput(input: string, context: UserContext): Command {
        // Simple heuristic for demonstration, real system uses AI or sophisticated NLP
        if (input.startsWith("create file")) {
            const path = input.split(" ")[2];
            return {
                id: "cmd-123",
                type: "filesystem_action",
                payload: { action: "create", path: path, content: "" }
            };
        } else if (input.startsWith("run")) {
            const cmd = input.substring(4);
            return {
                id: "cmd-124",
                type: "terminal_exec",
                payload: { command: cmd, cwd: context.cwd }
            };
        } else {
            return {
                id: "cmd-125",
                type: "model_prompt",
                payload: { prompt: input, context: context }
            };
        }
    }
}

This client-side logic demonstrates the initial interpretation and routing of user intent, setting the stage for more complex backend processing.

Model Agnosticism: The Universal Model Interface

A cornerstone of OpenYak is its claim to run "any model." This necessitates a robust abstraction layer that decouples the application's core logic from the specific APIs and idiosyncrasies of various large language models (LLMs) and other AI models. The challenge lies in harmonizing interaction patterns across diverse model providers, including commercial APIs (e.g., OpenAI, Anthropic) and local inference engines (e.g., Ollama, Llama.cpp).

The Model Adapter Pattern

To achieve model agnosticism, OpenYak likely employs a Model Adapter pattern. Each supported model or model provider would have a dedicated adapter that conforms to a universal interface. This interface would define methods for:

  • Text Generation: Sending a prompt and receiving a generated response, potentially with streaming capabilities.
  • Function Calling / Tool Use: A crucial capability where models can "call" predefined external functions or tools based on their understanding of the prompt and available context.
  • Context Window Management: Handling token limits and structuring context effectively.

Consider a simplified Go interface for a generic model adapter:

// model_interface.go
package models

import (
    "context"
)

// ToolCall represents a function/tool that the model suggests calling.
type ToolCall struct {
    Name      string                 `json:"name"`
    Arguments map[string]interface{} `json:"arguments"`
}

// ModelOutput encapsulates the model's response.
type ModelOutput struct {
    Content   string     `json:"content,omitempty"`
    ToolCalls []ToolCall `json:"tool_calls,omitempty"`
    // Other metadata like token usage, finish reason
}

// ModelInput defines the structure for model requests.
type ModelInput struct {
    Prompt         string            `json:"prompt"`
    Context        map[string]string `json:"context,omitempty"` // e.g., file content, current directory
    AvailableTools []ToolDefinition  `json:"available_tools,omitempty"`
}

// ToolDefinition describes a tool available to the model.
type ToolDefinition struct {
    Name        string                 `json:"name"`
    Description string                 `json:"description"`
    Parameters  map[string]interface{} `json:"parameters"` // JSON Schema for parameters
}

// ModelAdapter defines the universal interface for interacting with any AI model.
type ModelAdapter interface {
    Generate(ctx context.Context, input ModelInput) (ModelOutput, error)
    StreamGenerate(ctx context.Context, input ModelInput, outputChan chan<- ModelOutput) error
    // Potentially methods for model-specific capabilities like embedding, fine-tuning etc.
}

// ModelConfig holds configuration for a specific model instance.
type ModelConfig struct {
    ID        string            `json:"id"`
    Provider  string            `json:"provider"` // e.g., "openai", "ollama"
    ModelName string            `json:"model_name"`
    APIKey    string            `json:"api_key,omitempty"`
    BaseURL   string            `json:"base_url,omitempty"`
    Params    map[string]string `json:"params,omitempty"` // Model specific parameters
}

Each specific model implementation (e.g., OpenAIAdapter, OllamaAdapter) would then implement this ModelAdapter interface, translating OpenYak's generic requests into the model's native API calls.

// openai_adapter.go
package models

import (
    "context"
    "encoding/json"
    "fmt"

    "github.com/sashabaranov/go-openai" // Example OpenAI client library
)

type OpenAIAdapter struct {
    client *openai.Client
    model  string
}

func NewOpenAIAdapter(config ModelConfig) (*OpenAIAdapter, error) {
    clientConfig := openai.DefaultConfig(config.APIKey)
    if config.BaseURL != "" {
        clientConfig.BaseURL = config.BaseURL
    }
    client := openai.NewClientWithConfig(clientConfig)
    return &OpenAIAdapter{
        client: client,
        model:  config.ModelName,
    }, nil
}

func (a *OpenAIAdapter) Generate(ctx context.Context, input ModelInput) (ModelOutput, error) {
    // Translate OpenYak's input to OpenAI's ChatCompletionRequest
    messages := []openai.ChatCompletionMessage{
        {Role: openai.ChatMessageRoleUser, Content: input.Prompt},
    }
    // Inject context entries (file contents, cwd, etc.) as system messages
    for key, value := range input.Context {
        messages = append(messages, openai.ChatCompletionMessage{
            Role:    openai.ChatMessageRoleSystem,
            Content: fmt.Sprintf("%s:\n```\n%s\n```", key, value),
        })
    }

    // Convert OpenYak ToolDefinitions to OpenAI's tool format
    var oaTools []openai.Tool
    for _, toolDef := range input.AvailableTools {
        oaTools = append(oaTools, openai.Tool{
            Type: openai.ToolTypeFunction,
            Function: &openai.FunctionDefinition{
                Name:        toolDef.Name,
                Description: toolDef.Description,
                Parameters:  toolDef.Parameters,
            },
        })
    }

    req := openai.ChatCompletionRequest{
        Model:    a.model,
        Messages: messages,
        Tools:    oaTools, // The Tools field supersedes the deprecated Functions field
    }

    resp, err := a.client.CreateChatCompletion(ctx, req)
    if err != nil {
        return ModelOutput{}, fmt.Errorf("failed to call OpenAI API: %w", err)
    }

    output := ModelOutput{}
    if len(resp.Choices) > 0 {
        choice := resp.Choices[0].Message
        output.Content = choice.Content
        for _, toolCall := range choice.ToolCalls {
            args := make(map[string]interface{})
            if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &args); err != nil {
                return ModelOutput{}, fmt.Errorf("failed to parse tool arguments for '%s': %w", toolCall.Function.Name, err)
            }
            output.ToolCalls = append(output.ToolCalls, ToolCall{
                Name:      toolCall.Function.Name,
                Arguments: args,
            })
        }
    }

    return output, nil
}

func (a *OpenAIAdapter) StreamGenerate(ctx context.Context, input ModelInput, outputChan chan<- ModelOutput) error {
    // Similar logic for streaming, using CreateChatCompletionStream
    return fmt.Errorf("streaming not implemented for OpenAIAdapter example")
}

This adapter-based approach ensures that the core application logic can interact with any model provider uniformly, abstracting away API versioning, authentication, and specific request/response formats. A Model Manager component would then be responsible for loading, configuring, and switching between these adapters based on user preferences or dynamic requirements.

Deep Filesystem Integration: Ownership and Agency

The assertion that OpenYak "owns your filesystem" is a profound technical and security claim. It signifies a level of access and agency far beyond typical AI assistants, which are usually confined to text generation or web searches. This capability is central to OpenYak's vision of an AI agent that can directly manipulate the developer's workspace.

Technical Implementation Approaches

Achieving this level of filesystem "ownership" requires specific operating system interactions. Common approaches include:

  1. Direct System Calls (Standard Library I/O): This is the most straightforward method. The OpenYak backend, running as a desktop application process, can use standard library functions (e.g., os.ReadFile and os.WriteFile in Go, open() and write() in Python, std::fs::read and std::fs::write in Rust) to interact with the filesystem. The critical aspect here is the privilege level at which the OpenYak process runs. If it runs with the user's standard permissions, it can do anything the user can do. If it requires elevated privileges (e.g., sudo on Linux, administrator on Windows), the security implications are even higher.
  2. Dedicated Agent/Daemon: For more sophisticated control or cross-platform consistency, OpenYak might employ a lightweight agent or daemon that runs in the background. This agent could potentially be granted specific permissions or even operate with elevated privileges, managing filesystem operations on behalf of the main application. This pattern is common in tools that require deeper system integration (e.g., Docker Desktop, security software).
  3. FUSE (Filesystem in Userspace): While FUSE is primarily used to create virtual filesystems or intercept filesystem calls for specific paths, it could theoretically be employed to monitor or mediate OpenYak's own filesystem interactions, although this is less about "owning" and more about observing or transforming. For direct manipulation, standard I/O is more likely.

Given the phrasing "owns your filesystem," the most probable implementation involves the OpenYak application process (or a subprocess/daemon it controls) directly performing standard filesystem operations using the operating system's native APIs, inheriting the permissions of the user running the application.

Capabilities and Semantic Understanding

With direct filesystem access, OpenYak's AI models can be empowered to:

  • Read File Contents: Access source code, configuration files, documentation, and data files to build comprehensive context.
  • Write/Modify Files: Generate new code, refactor existing code, fix bugs, update configuration, create new markdown files.
  • Create/Delete Directories and Files: Manage project structure, scaffold new components, clean up temporary files.
  • Execute Shell Commands: Run tests, compile code, manage dependencies (e.g., npm install, go build, pip install), interact with version control systems (e.g., git commit).
  • List Directory Contents: Understand project structure, discover relevant files.

The true power comes from combining these low-level capabilities with the AI model's semantic understanding. An AI model, when provided with the correct tools, can translate a high-level request like "implement a user authentication flow" into a series of file creations, modifications, and terminal commands.

Here's a conceptual representation of the filesystem tool available to models:

// filesystem_tools.go
package tools

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// FilesystemTool provides methods for AI models to interact with the local filesystem.
type FilesystemTool struct {
    baseDir string // Optional: restrict operations to a base directory
}

func NewFilesystemTool(baseDir string) *FilesystemTool {
    return &FilesystemTool{baseDir: baseDir}
}

// resolvePath joins the requested path with the base directory and rejects
// attempts to escape it via ".." segments (path traversal).
func (f *FilesystemTool) resolvePath(path string) (string, error) {
    absPath := filepath.Clean(filepath.Join(f.baseDir, path))
    if f.baseDir != "" && absPath != f.baseDir &&
        !strings.HasPrefix(absPath, f.baseDir+string(filepath.Separator)) {
        return "", fmt.Errorf("path '%s' escapes the allowed base directory", path)
    }
    return absPath, nil
}

// ReadFile reads the content of a file.
func (f *FilesystemTool) ReadFile(path string) (string, error) {
    resolvedPath, err := f.resolvePath(path)
    if err != nil {
        return "", err
    }
    content, err := os.ReadFile(resolvedPath)
    if err != nil {
        return "", fmt.Errorf("failed to read file '%s': %w", path, err)
    }
    return string(content), nil
}

// WriteFile writes content to a file. If the file does not exist, it creates it.
func (f *FilesystemTool) WriteFile(path string, content string) error {
    resolvedPath, err := f.resolvePath(path)
    if err != nil {
        return err
    }
    if err := os.WriteFile(resolvedPath, []byte(content), 0644); err != nil { // Default permissions
        return fmt.Errorf("failed to write file '%s': %w", path, err)
    }
    return nil
}

// CreateDirectory creates a new directory.
func (f *FilesystemTool) CreateDirectory(path string) error {
    resolvedPath, err := f.resolvePath(path)
    if err != nil {
        return err
    }
    if err := os.MkdirAll(resolvedPath, 0755); err != nil { // Create all necessary parent directories
        return fmt.Errorf("failed to create directory '%s': %w", path, err)
    }
    return nil
}

// ListDirectory lists the names of files and directories within a given path.
func (f *FilesystemTool) ListDirectory(path string) ([]string, error) {
    resolvedPath, err := f.resolvePath(path)
    if err != nil {
        return nil, err
    }
    entries, err := os.ReadDir(resolvedPath)
    if err != nil {
        return nil, fmt.Errorf("failed to list directory '%s': %w", path, err)
    }
    var names []string
    for _, entry := range entries {
        names = append(names, entry.Name())
    }
    return names, nil
}

// DeletePath removes a file or an empty directory.
func (f *FilesystemTool) DeletePath(path string) error {
    resolvedPath, err := f.resolvePath(path)
    if err != nil {
        return err
    }
    if err := os.Remove(resolvedPath); err != nil {
        return fmt.Errorf("failed to delete path '%s': %w", path, err)
    }
    return nil
}

These functions would be exposed to the AI model orchestration layer as part of the AvailableTools in the ModelInput, allowing the model to generate ToolCall objects that invoke these operations.

Context Management and AI Tooling

For an AI model to be effective in a development environment, it requires rich, contextual information. OpenYak's "cowork" paradigm implies an agent that understands the project state as deeply as a human developer.

Context Providers

OpenYak's backend would integrate various context providers:

  • Filesystem Context: Content of currently open files, project tree structure, recent changes, commit history (via Git).
  • Terminal Context: History of executed commands, their outputs, current working directory, environment variables.
  • Editor Context: Selected text, cursor position, syntax highlighting information (though this might be too granular).
  • External Context: Search results (web), documentation, API specifications.

This context is dynamically aggregated and selectively provided to the AI model based on the user's prompt and the model's token window limits. A sophisticated ranking and summarization engine would be essential to ensure relevant information is prioritized.

The Tool-Use Framework

The ability for AI models to invoke external tools is a critical enabler of agentic behavior. OpenYak's design leverages this by allowing models to call functions like filesystem.ReadFile, filesystem.WriteFile, or terminal.ExecuteCommand. The workflow is typically:

  1. User Prompt: Developer enters a request (e.g., "Refactor this component to use hooks").
  2. Context Gathering: OpenYak gathers relevant code, project structure, and other information.
  3. Model Inference: The prompt and context are sent to the AI model.
  4. Tool Call Generation: The model responds with a plan, which might include specific ToolCall suggestions (e.g., "read file src/component.js", then "write file src/component.js with new content").
  5. Tool Execution (with User Approval): OpenYak intercepts the ToolCall, executes the corresponding tool function (e.g., filesystem.ReadFile), and presents the proposed changes (e.g., diff of src/component.js) to the user for explicit approval before committing them.
  6. Observation and Iteration: The output of the tool (e.g., the content of the file, the result of a terminal command) is fed back to the model as part of the ongoing conversation, allowing for iterative refinement.

This cycle of prompt -> model -> tool call -> execution -> observation is fundamental to OpenYak's agentic capabilities.


// orchestrator.go
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"

    "your_module/models"
    "your_module/tools"
)

// Simplified orchestrator for demonstrating model-tool interaction
type Orchestrator struct {
    modelAdapter models.ModelAdapter
    fsTool       *tools.FilesystemTool
    // Add other tools like TerminalTool, WebSearchTool etc.
}

func NewOrchestrator(modelAdapter models.ModelAdapter, baseDir string) *Orchestrator {
    return &Orchestrator{
        modelAdapter: modelAdapter,
        fsTool:       tools.NewFilesystemTool(baseDir),
    }
}

func (o *Orchestrator) ProcessRequest(ctx context.Context, userPrompt string, userContext map[string]string) (string, error) {
    // 1. Define available tools for the model
    fsToolDef := models.ToolDefinition{
        Name:        "filesystem",
        Description: "Interact with the local filesystem (read, write, list, create directory, delete).",
        Parameters: map[string]interface{}{
            "type": "object",
            "properties": map[string]interface{}{
                "action": map[string]interface{}{
                    "type": "string",
                    "enum": []string{"read_file", "write_file", "create_directory", "list_directory", "delete_path"},
                },
                "path": map[string]interface{}{"type": "string"},
                "content": map[string]interface{}{"type": "string", "description": "Required for write_file"},
            },
            "required": []string{"action", "path"},
        },
    }
    // Add other tool definitions here

    modelInput := models.ModelInput{
        Prompt:         userPrompt,
        Context:        userContext,
        AvailableTools: []models.ToolDefinition{fsToolDef},
    }

    // 2. Initial model call
    modelOutput, err := o.modelAdapter.Generate(ctx, modelInput)
    if err != nil {
        return "", fmt.Errorf("model generation failed: %w", err)
    }

    // 3. Process model output: content or tool calls
    if modelOutput.Content != "" {
        return modelOutput.Content, nil // Direct text response
    }

    if len(modelOutput.ToolCalls) > 0 {
        var toolResults []string
        for _, toolCall := range modelOutput.ToolCalls {
            log.Printf("Model requested tool: %s with args: %+v", toolCall.Name, toolCall.Arguments)

            // 4. Execute tool (with implicit user approval in this simplified example)
            result, err := o.executeTool(ctx, toolCall)
            if err != nil {
                return "", fmt.Errorf("tool execution failed: %w", err)
            }
            toolResults = append(toolResults, result)
        }

        // 5. Feed tool results back to the model for further processing/summarization
        toolFeedbackPrompt := fmt.Sprintf("I executed the following tools and got these results:\n%s\nWhat should I do next or what is the final answer?",
            jsonStringify(toolResults)) // Helper to convert slice to JSON string

        // Append tool results to userContext for the next model call
        userContext["tool_results"] = toolFeedbackPrompt 

        // Recursive call or new model call with updated context
        return o.ProcessRequest(ctx, toolFeedbackPrompt, userContext) // (a real implementation would cap recursion depth)
    }

    return "", fmt.Errorf("model returned neither content nor tool calls")
}

// executeTool dispatches a ToolCall to the corresponding tool implementation.
func (o *Orchestrator) executeTool(ctx context.Context, call models.ToolCall) (string, error) {
    if call.Name != "filesystem" {
        return "", fmt.Errorf("unknown tool: %s", call.Name)
    }
    action, _ := call.Arguments["action"].(string)
    path, _ := call.Arguments["path"].(string)
    switch action {
    case "read_file":
        return o.fsTool.ReadFile(path)
    case "write_file":
        content, _ := call.Arguments["content"].(string)
        return "ok", o.fsTool.WriteFile(path, content)
    case "create_directory":
        return "ok", o.fsTool.CreateDirectory(path)
    case "list_directory":
        names, err := o.fsTool.ListDirectory(path)
        if err != nil {
            return "", err
        }
        return jsonStringify(names), nil
    case "delete_path":
        return "ok", o.fsTool.DeletePath(path)
    default:
        return "", fmt.Errorf("unknown filesystem action: %s", action)
    }
}

// jsonStringify renders a value as JSON for inclusion in a prompt.
func jsonStringify(v interface{}) string {
    b, err := json.Marshal(v)
    if err != nil {
        return fmt.Sprintf("%v", v)
    }
    return string(b)
}

---
*Originally published in Spanish at [www.mgatc.com/blog/openyak-open-source-cowork-runs-any-model-owns-filesystem/](https://www.mgatc.com/blog/openyak-open-source-cowork-runs-any-model-owns-filesystem/)*
