DEV Community: Rafael Pierre

AI Development Roundup: Plugin Distribution, Custom Chips, and Cinematic Video Control

Rafael Pierre — Sun, 19 Oct 2025 13:33:46 +0000

This week brought major updates across AI tooling, infrastructure, and creative capabilities, signaling shifts in how developers build and deploy AI systems.

Streamlined Plugin Distribution

Anthropic launched Claude Code Plugins, enabling developers to package slash commands, subagents, MCP servers, and hooks into a single JSON file for marketplace distribution. The system eliminates the manual configuration process that previously required copying setups from GitHub repositories. Teams can now import standardized agent behaviors with one click, ensuring reproducible workflows across multiple development environments. Since these plugins are JSON-based, they remain portable across different CLI tools without vendor lock-in.

Custom Silicon Race Heats Up

OpenAI announced a multi-year partnership with Broadcom to develop 10GW of custom AI accelerators. The chips, designed in-house by OpenAI and manufactured by Broadcom, will incorporate the company's Ethernet, PCIe, and optical connectivity technology. Initial deployment begins in late 2026, with full rollout targeted for end of 2029. This move diversifies OpenAI's compute infrastructure beyond existing partnerships with Nvidia and AMD, aiming to reduce supply chain dependencies and optimize performance specifically for OpenAI's models.

AI Video Gains Precision Camera Work

Higgsfield AI released DoP I2V-01, a model that adds cinematic motion control to AI-generated video. The system converts single images into 3-5 second clips with over 50 camera style presets, including dolly shots, whip-pans, and bullet time effects. Beyond their web studio and mobile app Diffuse, Higgsfield now integrates with Kling, Google Veo, and Sora 2/Pro, allowing users to combine scene generation from other models with Higgsfield's camera choreography. Current limitations include a 5-second maximum clip length and 720p resolution cap.

Additional Developments

Meta acquired Thinking Machines co-founder Andrew Tulloch for its Superintelligence Lab, reinforcing the company's $72B infrastructure investment this year. Meanwhile, content creators are adopting tools like Wisprflow to transform voice notes into publish-ready posts, streamlining the path from spoken ideas to written content through AI transcription and existing workflows.

Implications

The convergence of easier plugin distribution, custom silicon development, and specialized creative tools suggests AI infrastructure is maturing rapidly. Developers gain more modular tooling, major players are hedging against supply constraints, and creative workflows are becoming increasingly automated. Teams should evaluate how these shifts affect their tech stack dependencies and content production pipelines.

This article was originally published on Lighthouse Newsletter. Subscribe for weekly AI engineering and product updates.

AI Development Update: Skills, Security, and Parallel Workflows

Rafael Pierre — Sun, 19 Oct 2025 13:31:32 +0000

Three major developments are reshaping how teams build and deploy AI agents in production.

Modular Agent Intelligence

Anthropic introduced Agent Skills, a framework for packaging procedural knowledge into discoverable modules. Instead of overloading system prompts or maintaining separate agents for each workflow, Skills let Claude load instructions contextually through SKILL.md files. The system supports progressive disclosure - starting with metadata, expanding to full instructions when needed, and bundling executable code for deterministic operations. This approach works across Claude.ai, Claude Code, and the API, turning specialized knowledge into portable, composable assets.

Persistent Security Threats

Security researchers have identified memory poisoning and goal hijacking as emerging threats to agentic systems. Unlike single-shot prompt injections, these attacks exploit persistence. Memory poisoning involves injecting malicious content into an agent's long-term storage (vector databases, conversation logs), causing every future session to recall corrupted data. Goal hijacks gradually redirect an agent's objectives toward an attacker's agenda. Both attacks unfold across workflows rather than surfacing in isolated responses, requiring teams to treat memory as untrusted input and monitor complete task flows.

Parallelized Development Workflows

At DevDay 2025, OpenAI demonstrated Codex handling multiple simultaneous development tasks - seven parallel terminal sessions building arcade games, porting Streamlit apps to FastAPI + Next.js, and generating MCP servers for legacy protocols. The key pattern was delegation at scale: teams launched 3-4 independent jobs, context-switched freely, and reviewed results asynchronously. This approach compressed timelines by treating agentic tools as parallel collaborators rather than sequential assistants.

The Bottom Line

Production AI is becoming simultaneously more modular, more vulnerable, and more capable of parallel execution. Teams shipping agents should modularize workflows, red-team memory stores proactively, and experiment with parallel task delegation for multi-workstream projects. The infrastructure exists - the challenge is building for both velocity and durability.

This article was originally published on Lighthouse Newsletter. Subscribe for weekly AI insights and development updates.

Surviving the LLM Jungle: When to use Prompt Engineering, Retrieval Augmented Generation or Fine Tuning?

Rafael Pierre — Mon, 25 Sep 2023 18:28:24 +0000

Introduction

Navigating the complex world of Large Language Models (LLMs) utilization can sometimes feel like wandering through an uncharted jungle. With a myriad of techniques at your disposal, choosing the right path can be daunting. In this blog, we explore three key strategies for harnessing the power of LLMs: Prompt Engineering, Retrieval Augmented Generation, and Fine Tuning. By the end of this article, you'll have a clearer understanding of when and how to employ these techniques to achieve your Generative AI goals.

Prompt Engineering: Crafting the Right Query

Prompt engineering is a technique used in the context of Large Language Models (LLMs) to design and craft effective prompts or input queries. The goal of prompt engineering is to optimize the input provided to the model to achieve desired outcomes, improve model performance, and guide the model to produce more accurate or contextually relevant responses.

Imagine the process of interacting with an LLM as a conversation between you and a highly knowledgeable but somewhat literal-minded expert. In this scenario, prompt engineering is akin to formulating the right question. This technique involves designing precise and effective prompts to elicit the desired responses from the model.

For example, if you want to generate a creative piece of writing, your prompt should be open-ended and encourage creativity. Conversely, if you seek specific factual information, your prompt should be clear and structured. Effective prompt engineering not only requires an understanding of your task but also a grasp of how language models interpret and respond to prompts.

When to Use Prompt Engineering

When you need fine-grained control over the output.
When generating specific, structured content.
When exploring creative possibilities by carefully designing prompts.

Prompt Engineering: Key Aspects and Considerations

Task Definition: Here, you define a specific task or question you want the LLM to perform. This task could be anything from language translation and text summarization to question answering or even more specialized tasks like image captioning (although LLMs are primarily text-based).
Prompt or Query: You then formulate a prompt or query for the model that specifies the task. The prompt serves as the input to the model.
Inference: The LLM processes the prompt and generates an output based on the provided examples and task description. It leverages its pre-trained language understanding capabilities and generalizes from the limited examples to produce a response.
Evaluation: You evaluate the model's output to determine if it successfully performed the task according to your requirements.

Example

Text Generation Without Prompt Engineering

Prompt: "Write a product description for the new smartphone."

In this case, the prompt is relatively vague, and the LLM might generate a generic or less informative response because it lacks specific details about the smartphone.

Text Generation With Prompt Engineering

Prompt: "Write a compelling product description for the new XYZ Phone, highlighting its key features such as the 6.5-inch AMOLED display, Snapdragon 855 processor, dual-camera setup for stunning photography, and long-lasting battery life of up to 2 days."

In this engineered prompt:

The model is explicitly instructed to write a compelling product description with a clear task.
Specific details about the smartphone are provided, such as the display size, processor, camera features, and battery life.
By mentioning compelling, you convey the expectation of persuasive and engaging language.

Retrieval Augmented Generation: Expanding the Horizon

Retrieval augmented generation (RAG) is a technique that combines the strengths of large language models with external knowledge sources. It involves retrieving relevant information from a vast corpus of data and then using it to enhance the generation capabilities of the LLM. This approach can lead to more accurate and contextually rich responses.

For instance, when generating medical advice, you can retrieve the latest research papers and clinical guidelines to ensure that the information provided is up-to-date and evidence-based. This strategy allows LLMs to function as dynamic encyclopedias, offering insights and recommendations grounded in real-world data.

When to Use Retrieval Augmented Generation

When your task requires access to external knowledge.
When you need to provide accurate and current information.
When you want to enhance the contextuality of generated content.

Fine Tuning: Tailoring the Model to Your Needs

Fine tuning involves training a pre-trained LLM on a specific dataset or task to adapt it to your unique requirements. This technique allows you to specialize a general-purpose large language model for a particular domain, making it more efficient and proficient in a specific area.

For example, if you are building a chatbot for customer support in the fashion industry, fine tuning can help the model understand and respond to fashion-related queries with greater accuracy. It refines the model's knowledge and behavior to align with the nuances of the domain in question.

When to Use Fine Tuning

When you have access to domain-specific data.
When you want the model to excel in a particular field.
When you need to optimize the model's performance for a specific task.

Conclusion

In the vast LLM jungle, understanding when to use prompt engineering, retrieval augmented generation, or fine tuning is crucial for achieving your goals. These techniques offer versatile tools for tailoring large language models to your specific needs, whether you require precise responses, access to external knowledge, or domain expertise.

Remember: the choice between these techniques often depends on the unique demands of your project. Each of these approaches has distinct requirements - in terms of volume and quality of data, as well as costs - and also particular advantages and caveats. But that is the topic for another post.

PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows

Rafael Pierre — Fri, 12 May 2023 13:01:40 +0000

PyJaws enables declaring Databricks Jobs and Workflows as Python code, allowing for:

Code Linting (e.g. with Flake or Ruff)
Formatting (e.g. with Black)
Parameter Validation
Modularity and reusability

In addition to those, PyJaws also provides some nice features such as cycle detection out of the box.

Folks who have used Python-based orchestration tools such as Apache Airflow, Luigi and Mage will be familiar with the concepts and the API if PyJaws.

PyJaws leverages some existing libraries in order to allow for modularisation, reusability and validation, such as:

Click - for providing a rich CLI functionality
Pydantic - for efficient parameter validation
NetworkX - for Graph and Cycle Detection features
Jinja2 - for templating

Check it out: https://github.com/rafaelpierre/pyjaws

Keeping Your Machine Learning Models on the Right Track: Getting Started with MLflow, Part 2

Rafael Pierre — Thu, 21 Jul 2022 09:24:48 +0000

This post was originally published at MLOpsHowTo.com

TLDR; MLflow Model Registry allows you to keep track of different Machine Learning models and their versions, as well as tracking their changes, stages and artifacts. Companion Github Repo for this post

In our last post, we discussed the importance of tracking Machine Learning experiments, metrics and parameters. We also showed how easy it is to get started in these topics by leveraging the power of MLflow (for those who are not aware, MLflow is currently the de-facto standard platform for machine learning experiment and model management).

In particular, Databricks makes it even easier to leverage MLflow, since it provides you with a completely managed version of the platform.

This means you don’t need to worry about the underlying infrastructure to run MLflow, and it is completely integrated with other Machine Learning features from Databricks Workspaces, such as Feature Store, AutoML and many others.

Coming back to our experiment and model management discussion, although we covered the experiment part in the last post, we still haven’t discussed how to manage the models that we obtain as part of running our experiments. This is where MLflow Model Registry comes in.

The Case for Model Registry

As the processes to create, manage and deploy machine learning models evolve, organizations need to have a central platform that allows different personas such as data scientists and machine learning engineers to collaborate, share code, artifacts and control the stages of machine learning models. Breaking this down in terms of functional requirements, we are talking about the following desired capabilities:

discovering models, visualizing experiment runs and the code associated with models
transitioning models across different deployment stages, such as Staging, Production and Archived
deploying different versions of a registered model in different stages, offering Machine Learning engineers and MLOps engineers the ability to deploy and conduct testing of different model versions (for instance, A/B testing, Multi-Armed Bandits etc)
archiving older models for traceability and compliance purposes
enriching model metadata with textual descriptions and tags
managing authorization and governance for model transitions and modifications with access control lists (ACLs)

A typical MLOps Model Lifecycle using MLflow

Getting Started with MLflow Model Registry

Now to the practical part. We will run some code to train a model and showcase MLflow Model Registry capabilities. Hereby we present two possible options for running the notebooks from this quickstarter: you can choose to run them on Jupyter Notebooks with a local MLflow instance, or in a Databricks workspace.

Jupyter Notebooks

If you want to run these examples using Jupyter Notebooks, please follow these steps:

Clone this Github repo to your local machine
Make sure you are running Python 3.8.7 (quick hint: you can run multiple Python versions on a single machine by installing pyenv)
Once you have a working Python 3.8.7 installation, create a virtual environment by running python -m venv .venv
Configure your virtual environment by running make env. Alternatively, you can do it manually by running the following from the terminal:

export SYSTEM_VERSION_COMPAT=1 && \
source .venv/bin/activate && \
pip install --upgrade pip && \
pip install wheel && \
pip install -r requirements.txt && \
pre-commit install

Run the first notebook jupyter/01_train_model.ipynb. This will create an experiment and multiple runs with different hyperparameters for a diabetes prediction model.
Run the second notebook jupyter/02_register_model.ipynb. By doing so, we will register our model artifact into MLflow model registry. We will also do some basic sanity checks in order to confirm that our model can be promoted to Staging.
For this example we are running a simple, local instance of MLflow with a SQLite backend — which is good enough for a toy example, but not recommended for a test or production setup. It is also possible to run MLflow locally or remotely as a standalone web application, and also with a Postgresql backend. For more details on how to achieve this, please refer to the different scenarios presented in this link.

Databricks

Running the same code with Databricks Managed MLflow is even simpler, since the environment is already configured when you use an LTS ML cluster. Please make sure that you have such a cluster available, and then clone the repo into your workspace. For more details on Databricks integration with repos, please refer to this article.
Run the first notebook databricks/01_train_model.py.
Run the second notebook databricks/02_register_model.py.
Bonus: if you run these notebooks on a Databricks Workspace, you will be able to visualize the different runs associated with your experiment:

Experiments, runs and models

Looking at the screenshot above, you might notice that on the first row of our table, in the models column, we have an icon which differs from the other rows. This is due to the fact that the model artifact for that specific run was registered as a model, and a new model version was created (version 1).

If we click on its link, we get redirected to the following window.

On the window above, we have an overview of the model version that was registered. We can see that it has the tag prediction_works = true. We can also see that it is in Staging. Depending on which persona is accessing this data, it might be possible to manually change the stage (to promote the model to Production, for instance), or reverting it back to None.

Moreover, with Workspace Object Access Control Lists, you could limit the permissions for each type of user. Let’s say that you wish to block data scientists from transitioning model stages, while you want to allow team manager to do so. In such scenario, Data Scientists would have to request transitions to a given stage.

These transitions would then need to be approved by someone with the right permissions. Finally, all of the requests and model stage transitions are tracked in the same window (and of course, they are also available programatically).

Once a model has been transitioned to Production, it is quite simple to deploy it either as an automated job or as a Real time REST API Endpoint. But that is the topic for another post.

All the code used in this post is available in this Github Repo.