I have seen a similar library in the Laravel ecosystem, and I think your library has the same problem: it assumes the HTML content is the right content for an LLM.
HTML pages are meant for humans and search bots. We could get away with adding extra information for search bots because their main goal is to make the pages discoverable in a search.
AI scrapers use the content as knowledge for their output, which is a totally different use case.
Converting HTML output to markdown is only a partial solution. The content needs to be as meaningful for the LLM as possible because you don't want it to fill the context window with useless information.
I saw you added a preprocessor. For me, manipulating content should not be the task of a Markdown output library; it should be the task of the controller.
You can still have a single controller for multiple output types. That is how API endpoint output has worked for years.
Thanks for the thoughtful feedback — I agree that HTML designed for humans is not automatically optimal for LLM consumption.
That said, this bundle does not aim to optimise or redesign domain content for LLMs. Its responsibility is intentionally narrow: handling content negotiation and transforming an already-rendered HTML response into a Markdown representation.
It does not rewrite, summarise, or semantically restructure content. Controllers remain fully responsible for deciding what data is exposed and how it is structured.
If an application requires LLM-specific shaping or restructuring, that logic should live at the controller or domain level. The bundle simply provides extension points for cases where additional processing is explicitly desired.
I appreciate the architectural perspective — it’s a valuable distinction.
The main reason I reacted is that this solution is becoming mainstream: Cloudflare has functionality to do it, and so does Laravel Cloud. Because it is getting more of the spotlight, I think developers are going to assume it is the best way forward.
I think the better option could be as simple as return $this->render($markdownOutput ? 'home.md.twig' : 'home.html.twig', $data);. This gives you the most options.
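That line leaves open how $markdownOutput gets set. Here is a minimal sketch in plain PHP, assuming the client signals its preference with an Accept: text/markdown header. The helper names wantsMarkdown and templateFor are made up for illustration and are not part of Symfony or the bundle. It only checks whether text/markdown is listed with a non-zero q-value; it does not rank media types by preference.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when the Accept header lists text/markdown
 * with a q-value greater than zero (or with no q-value at all).
 */
function wantsMarkdown(string $acceptHeader): bool
{
    foreach (explode(',', $acceptHeader) as $part) {
        // Split "text/markdown;q=0.8" into the media type and its parameters.
        $params = array_map('trim', explode(';', $part));
        $mediaType = strtolower(array_shift($params));

        if ($mediaType !== 'text/markdown') {
            continue;
        }

        // An explicit q=0 means "not acceptable".
        foreach ($params as $param) {
            if (preg_match('/^q=([0-9.]+)$/i', $param, $m) === 1 && (float) $m[1] <= 0.0) {
                return false;
            }
        }

        return true;
    }

    return false;
}

/**
 * Hypothetical helper: picks the template name for a page,
 * following the home.md.twig / home.html.twig naming used above.
 */
function templateFor(string $page, string $acceptHeader): string
{
    return wantsMarkdown($acceptHeader) ? "$page.md.twig" : "$page.html.twig";
}

echo templateFor('home', 'text/markdown, text/html;q=0.8'), PHP_EOL; // home.md.twig
echo templateFor('home', 'text/html'), PHP_EOL;                      // home.html.twig
```

In a controller, the Accept header of the incoming request would be passed in and the result handed to $this->render() together with $data. The point is that the Markdown template is written on purpose for the LLM, not derived from the HTML.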