<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alessandro T.</title>
    <description>The latest articles on DEV Community by Alessandro T. (@trincadev).</description>
    <link>https://dev.to/trincadev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1407028%2F35ebea9b-3ec3-40b4-88eb-6b2041cd8814.jpeg</url>
      <title>DEV Community: Alessandro T.</title>
      <link>https://dev.to/trincadev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trincadev"/>
    <language>en</language>
    <item>
      <title>My Ghost Writer and lite.koboldai.net, an overview</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Thu, 03 Jul 2025 20:01:14 +0000</pubDate>
      <link>https://dev.to/trincadev/my-ghost-writer-e-litekoboldainet-una-panoramica-4a8g</link>
      <guid>https://dev.to/trincadev/my-ghost-writer-e-litekoboldainet-una-panoramica-4a8g</guid>
      <description>&lt;h1&gt;
  
  
  Integrating My Ghost Writer with lite.koboldai.net: An In-Depth Technical Analysis
&lt;/h1&gt;

&lt;p&gt;Some time ago I started drafting a text. Partly out of professional curiosity, partly out of sheer boredom, I started thinking about which AI applications might be feasible, beyond the obvious text generation by prompting an LLM.&lt;/p&gt;

&lt;p&gt;In particular, I noticed that "small" &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" rel="noopener noreferrer"&gt;LLMs (Large Language Models)&lt;/a&gt; tend to repeat themselves and to insert duplicate words. For this reason I looked for an open-source project I could run on my PC to spot duplicate words: I found nothing useful, or at least nothing that did what I wanted.&lt;/p&gt;

&lt;p&gt;This led to the creation of &lt;strong&gt;&lt;a href="https://github.com/trincadev/my_ghost_writer" rel="noopener noreferrer"&gt;My Ghost Writer&lt;/a&gt;&lt;/strong&gt;, an open-source project that I am now integrating into &lt;strong&gt;lite.koboldai.net&lt;/strong&gt; – a dependency-free web interface written in JS and HTML for &lt;a href="https://github.com/LostRuins/koboldcpp" rel="noopener noreferrer"&gt;KoboldCpp&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://github.com/lostruins/lite.koboldai.net/" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lostruins/lite.koboldai.net/" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt; è un'interfaccia web senza dipendenze progettata per l'uso come backend per &lt;a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/" rel="noopener noreferrer"&gt;modelli linguistici di grandi dimensioni (LLM)&lt;/a&gt; come KoboldCpp.&lt;br&gt;
Funziona interamente nel browser (non richiede installazione) ed è confezionata come un singolo file HTML statico:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple modes: Story mode, Chat mode, Instruction mode, and Adventure mode for different types of AI interaction.&lt;/li&gt;
&lt;li&gt;Broad compatibility: Works with KoboldAI Client, KoboldCpp, and AI Horde; supports both local and remote models.&lt;/li&gt;
&lt;li&gt;Creative tools: Includes a text editor, image generation via Stable Diffusion, and support for character sheets and scenarios.&lt;/li&gt;
&lt;li&gt;User-friendly: Easy to use, with customizable UI styles and features like auto-save, text-to-speech, and repeat/edit options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a good option if you want a lightweight, flexible interface for storytelling, role-playing, or AI-assisted writing.&lt;br&gt;
However, the code structure is a bit messy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A monolithic &lt;code&gt;index.html&lt;/code&gt; with over 26,000 lines of JS, CSS, and HTML.&lt;/li&gt;
&lt;li&gt;Only plain JS, no TypeScript of course.&lt;/li&gt;
&lt;li&gt;The embedded third-party JS code is outdated.&lt;/li&gt;
&lt;li&gt;E2E tests are missing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem with WordSearch in lite.koboldai.net
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;WordSearch&lt;/code&gt; (based on my &lt;a href="https://github.com/LostRuins/lite.koboldai.net/pull/115" rel="noopener noreferrer"&gt;first implementation&lt;/a&gt;) in lite.koboldai.net simply performs a text search to detect duplicates, which has significant limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It also matches irrelevant text fragments (e.g. the single letter "a", even where it appears inside other words).&lt;/li&gt;
&lt;li&gt;It cannot distinguish between semantically different words (e.g. "the" vs. "they").&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Solution: NLP Stemming with My Ghost Writer
&lt;/h2&gt;

&lt;p&gt;To solve this problem, I reimplemented the duplicate detection logic using &lt;strong&gt;NLP stemming&lt;/strong&gt; (via the &lt;a href="https://tartarus.org/martin/PorterStemmer/" rel="noopener noreferrer"&gt;Porter Stemming&lt;/a&gt; algorithm, already included in lite.koboldai.net), which reduces words to their &lt;strong&gt;root form&lt;/strong&gt; (e.g. "running" → "run"). This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Groups &lt;strong&gt;semantically related words&lt;/strong&gt; (e.g. "run", "running", "ran").
&lt;/li&gt;
&lt;li&gt;Reduces false positives by focusing on &lt;strong&gt;real duplicates&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Supports both &lt;strong&gt;manual input&lt;/strong&gt; and &lt;strong&gt;file upload&lt;/strong&gt; for flexibility.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Current Features and Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Main Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate word detection&lt;/strong&gt; via stemming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thesaurus (work in progress)&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Powered by calls to &lt;a href="https://www.wordsapi.com/" rel="noopener noreferrer"&gt;WordsAPI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Optional data persistence with a local &lt;strong&gt;MongoDB&lt;/strong&gt; database.
&lt;/li&gt;
&lt;li&gt;Limited to common terms ⚠️; it does not (yet) support proper nouns or multi-word expressions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; with &lt;strong&gt;FastAPI&lt;/strong&gt; to run the web application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structlog&lt;/strong&gt; for logging and error handling.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poetry&lt;/strong&gt; for dependency management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; for containerization.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Frontend&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vanilla JavaScript&lt;/strong&gt; (no framework, because of the integration with lite.koboldai.net).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; for end-to-end (E2E) testing.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/trincadev/my_ghost_writer" rel="noopener noreferrer"&gt;Repository GitHub di My Ghost Writer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/lostruins/lite.koboldai.net" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wordsapi.com/" rel="noopener noreferrer"&gt;WordsAPI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>nlp</category>
      <category>writing</category>
    </item>
    <item>
      <title>My Ghost Writer and lite.koboldai.net, an overview</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Thu, 03 Jul 2025 19:59:40 +0000</pubDate>
      <link>https://dev.to/trincadev/my-ghost-writer-and-litekoboldainet-an-overview-ol0</link>
      <guid>https://dev.to/trincadev/my-ghost-writer-and-litekoboldainet-an-overview-ol0</guid>
      <description>&lt;h1&gt;
  
  
  My Ghost Writer and lite.koboldai.net, an overview
&lt;/h1&gt;

&lt;p&gt;Some time ago I started drafting a text. Out of professional curiosity and sheer boredom, I wondered what kind of AI applications were feasible beyond the obvious text generation via prompts to LLMs.&lt;/p&gt;

&lt;p&gt;In particular, I noticed that smaller &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" rel="noopener noreferrer"&gt;LLMs (Large Language Models)&lt;/a&gt; tend to repeat themselves and insert duplicate words. This led me to search for an open-source project I could run on my PC to identify duplicate words – but I found nothing useful, or at least nothing that did exactly what I wanted.&lt;/p&gt;

&lt;p&gt;This ultimately led to the creation of &lt;strong&gt;&lt;a href="https://github.com/trincadev/my_ghost_writer" rel="noopener noreferrer"&gt;My Ghost Writer&lt;/a&gt;&lt;/strong&gt;, an open-source project now being integrated into &lt;strong&gt;lite.koboldai.net&lt;/strong&gt; – a lightweight, dependency-free web interface for &lt;a href="https://github.com/LostRuins/koboldcpp" rel="noopener noreferrer"&gt;KoboldCpp&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://github.com/lostruins/lite.koboldai.net/" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lostruins/lite.koboldai.net/" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt; is a dependency-free web interface designed as a backend for &lt;a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/" rel="noopener noreferrer"&gt;large language models (LLM)&lt;/a&gt; like KoboldCpp.&lt;br&gt;
It runs entirely in the browser (no installation required) and is packaged as a single static HTML file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple modes&lt;/strong&gt;: Story mode, Chat mode, Instruction mode, and Adventure mode for different types of AI interaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad compatibility&lt;/strong&gt;: Works with KoboldAI Client, KoboldCpp, and AI Horde; supports both local and remote models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative tools&lt;/strong&gt;: Includes a text editor, image generation via Stable Diffusion, and support for character sheets and scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-friendly&lt;/strong&gt;: Easy to use, customizable UI styles, and features like auto-save, text-to-speech, and repeat/edit options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a great option if you want a lightweight, flexible interface for storytelling, role-playing, or AI-assisted writing.&lt;br&gt;
However, the code structure is somewhat messy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A monolithic &lt;code&gt;index.html&lt;/code&gt; with over 26,000 lines of JS, CSS, and HTML.&lt;/li&gt;
&lt;li&gt;Only vanilla JS, no TypeScript obviously.&lt;/li&gt;
&lt;li&gt;Outdated third-party JS code.&lt;/li&gt;
&lt;li&gt;Missing E2E tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem with WordSearch in lite.koboldai.net
&lt;/h2&gt;

&lt;p&gt;The initial version of &lt;code&gt;WordSearch&lt;/code&gt; (based on my &lt;a href="https://github.com/LostRuins/lite.koboldai.net/pull/115" rel="noopener noreferrer"&gt;first implementation&lt;/a&gt;) in lite.koboldai.net used simple text search to detect duplicates, but had significant limitations:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identified irrelevant text fragments (e.g., the single letter "a" even when embedded in other words).&lt;/li&gt;
&lt;li&gt;Couldn't distinguish between semantically different words (e.g., "the" vs. "they").&lt;/li&gt;
&lt;/ul&gt;
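&lt;p&gt;The contrast between the two behaviours can be illustrated with a few lines of Python (a standalone sketch, not the actual WordSearch code):&lt;/p&gt;

```python
import re

text = "They said the theme of the story was they themselves."

# Naive substring search: also matches "the" inside "they", "theme", "themselves"
naive_hits = text.lower().count("the")   # 6 matches

# Word-level search: extract whole words first, then compare exact tokens
words = re.findall(r"[a-z']+", text.lower())
word_hits = words.count("the")           # 2 matches

print(naive_hits, word_hits)  # → 6 2
```

&lt;p&gt;Splitting on word boundaries before comparing is what removes the single-letter and embedded-word false positives described above.&lt;/p&gt;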

&lt;h2&gt;
  
  
  The Solution: NLP Stemming with My Ghost Writer
&lt;/h2&gt;

&lt;p&gt;To solve this, I reimplemented the duplicate detection logic using &lt;strong&gt;NLP stemming&lt;/strong&gt; (via the &lt;a href="https://tartarus.org/martin/PorterStemmer/" rel="noopener noreferrer"&gt;Porter Stemming&lt;/a&gt; algorithm, already included in lite.koboldai.net), which reduces words to their &lt;strong&gt;root form&lt;/strong&gt; (e.g., "running" → "run"). This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Groups &lt;strong&gt;semantically related words&lt;/strong&gt; (e.g., "run", "running", "ran").&lt;/li&gt;
&lt;li&gt;Reduces false positives by focusing on &lt;strong&gt;real duplicates&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Supports both &lt;strong&gt;manual input&lt;/strong&gt; and &lt;strong&gt;file upload&lt;/strong&gt; for flexibility.&lt;/li&gt;
&lt;/ul&gt;
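&lt;p&gt;As an illustration of the idea, here is a deliberately crude suffix-stripper – not the real Porter algorithm the project relies on – used to group repeated word forms under one stem:&lt;/p&gt;

```python
import re
from collections import defaultdict

def crude_stem(word):
    # Toy suffix stripping for illustration; a proper Porter stemmer handles
    # many more cases, and irregular forms like "ran" need lemmatization.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            if len(word) > 2 and word[-1] == word[-2]:
                word = word[:-1]  # e.g. "running" -> "runn" -> "run"
            break
    return word

def duplicate_groups(text, min_count=2):
    # Map each stem to the character offsets where its word forms appear
    groups = defaultdict(list)
    for match in re.finditer(r"[a-z']+", text.lower()):
        groups[crude_stem(match.group())].append(match.start())
    return {stem: hits for stem, hits in groups.items() if len(hits) >= min_count}

text = "He was running fast. He runs every day, and running keeps him fit."
print(duplicate_groups(text))  # "running"/"runs"/"running" land under one stem
```

&lt;p&gt;Keeping the character offsets per stem is what lets a UI highlight every occurrence of a duplicated word family in the draft.&lt;/p&gt;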

&lt;h2&gt;
  
  
  Current Features and Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Main Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate word detection&lt;/strong&gt; via stemming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thesaurus (work in progress)&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Powered by calls to &lt;a href="https://www.wordsapi.com/" rel="noopener noreferrer"&gt;WordsAPI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Optional data persistence with a local &lt;strong&gt;MongoDB&lt;/strong&gt; database.&lt;/li&gt;
&lt;li&gt;Limited to common terms ⚠️, doesn't support (for now) proper nouns or multi-word expressions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; with &lt;strong&gt;FastAPI&lt;/strong&gt; to run the webapp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structlog&lt;/strong&gt; for logging and error handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poetry&lt;/strong&gt; for dependency management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; for containerization.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Frontend&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vanilla JavaScript&lt;/strong&gt; (no framework due to integration with lite.koboldai.net).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; for end-to-end (E2E) testing.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/trincadev/my_ghost_writer" rel="noopener noreferrer"&gt;GitHub Repository for My Ghost Writer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/lostruins/lite.koboldai.net" rel="noopener noreferrer"&gt;lite.koboldai.net&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wordsapi.com/" rel="noopener noreferrer"&gt;WordsAPI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>nlp</category>
      <category>ai</category>
      <category>writing</category>
    </item>
    <item>
      <title>AI Pronunciation Trainer</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Mon, 16 Dec 2024 20:33:20 +0000</pubDate>
      <link>https://dev.to/trincadev/ai-pronunciation-trainer-4ep6</link>
      <guid>https://dev.to/trincadev/ai-pronunciation-trainer-4ep6</guid>
      <description>&lt;p&gt;In questo articolo presento progetto a cui sto lavorando attualmente: &lt;a href="https://github.com/trincadev/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; (online &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;qui&lt;/a&gt;), uno strumento progettato per aiutarvi a migliorare la vostra pronuncia utilizzando la potenza dell'intelligenza artificiale. Questo progetto è un refactor dell'originale &lt;a href="https://github.com/Thiagohgl/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; di &lt;a href="https://github.com/Thiagohgl" rel="noopener noreferrer"&gt;Thiagohgl&lt;/a&gt; a cui ho fatto diversi miglioramenti per rendere lo strumento più efficace e facile da usare.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is and what it does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; è uno strumento che utilizza l'intelligenza artificiale per valutare la vostra pronuncia e fornire feedback, aiutandovi a migliorare e a essere compresi più chiaramente. Utilizza i modelli &lt;a href="https://github.com/snakers4/silero-models" rel="noopener noreferrer"&gt;Silero STT / TTS&lt;/a&gt;, &lt;a href="https://openai.com/index/whisper/" rel="noopener noreferrer"&gt;openai whisper&lt;/a&gt; e &lt;a href="https://github.com/SYSTRAN/faster-whisper" rel="noopener noreferrer"&gt;faster whisper&lt;/a&gt; per le funzionalità di speech-to-text (Silero permette anche di fare text-to-speech), garantendo una valutazione della pronuncia accurata e affidabile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refactor: updated frontend and backend libraries
&lt;/h3&gt;

&lt;p&gt;About the backend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt; is now at version 2.6.x&lt;/li&gt;
&lt;li&gt;Updated the Silero German Speech-to-Text model to fix a bug that prevented the use of PyTorch versions later than 1.13.x.&lt;/li&gt;
&lt;li&gt;Improved the Python backend tests using the &lt;a href="https://en.wikipedia.org/wiki/Mutation_testing" rel="noopener noreferrer"&gt;mutation testing&lt;/a&gt; suite &lt;a href="https://cosmic-ray.readthedocs.io" rel="noopener noreferrer"&gt;Cosmic Ray&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a &lt;a href="https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14" rel="noopener noreferrer"&gt;bug&lt;/a&gt; where &lt;a href="https://huggingface.co/docs/transformers/model_doc/whisper" rel="noopener noreferrer"&gt;whisper&lt;/a&gt; did not read the end timestamp of the last word in the recording correctly (in the end I solved it by using the &lt;a href="https://pypi.org/project/openai-whisper/" rel="noopener noreferrer"&gt;openai whisper pip package&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Added support for the &lt;a href="https://pypi.org/project/faster-whisper/" rel="noopener noreferrer"&gt;faster whisper pip package&lt;/a&gt;:

&lt;ul&gt;
&lt;li&gt;it avoids &lt;code&gt;None&lt;/code&gt; values on the &lt;code&gt;end_ts&lt;/code&gt; timestamp of the last word in the recording, unlike the whisper output produced with the HuggingFace pipeline&lt;/li&gt;
&lt;li&gt;it can detect long stretches of silence via &lt;a href="https://github.com/snakers4/silero-vad" rel="noopener noreferrer"&gt;silero-vad&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Furthermore, regarding the frontend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updated the JavaScript libraries to the latest versions of jQuery (3.7.1) and Bootstrap (5.3.3)&lt;/li&gt;
&lt;li&gt;New frontend based on &lt;a href="https://gradio.app" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt; 5.x&lt;/li&gt;
&lt;li&gt;Added E2E tests with &lt;a href="https://playwright.dev" rel="noopener noreferrer"&gt;Playwright&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Added the ability to write, read and of course evaluate a custom sentence&lt;/li&gt;
&lt;li&gt;Guided tour for new users with &lt;a href="https://github.com/kamranahmedse/driver.js/" rel="noopener noreferrer"&gt;driver.js&lt;/a&gt; and &lt;a href="https://www.gradio.app/guides/custom-CSS-and-JS" rel="noopener noreferrer"&gt;custom css/javascript inside Gradio blocks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Playback of individual words in the recording, followed by the 'ideal' pronunciation of the same word read by the Text-to-Speech engine&lt;/li&gt;
&lt;li&gt;Also added an in-browser Text-to-Speech feature (on Windows 11 it only works if the English and German language packs are installed)&lt;/li&gt;
&lt;li&gt;Custom webApp frontend - improved the CSS style on mobile devices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Online version: the HuggingFace Space demo
&lt;/h3&gt;

&lt;p&gt;You can try my project online on my &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;HuggingFace Space&lt;/a&gt;. This online demo lets you experience the tool's capabilities without any installation or configuration. The HuggingFace Space provides a convenient and accessible way to test AI Pronunciation Trainer and see how it can help you improve your pronunciation. Please be patient: sometimes it is a bit slow, or asleep if nobody has used it for a while (locally it is much faster, especially if you have a powerful computer). There is also an &lt;a href="https://aletrn-ai-pronunciation-trainer.hf.space" rel="noopener noreferrer"&gt;embedded version of the HuggingFace Space&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;Although it works quite well, there is of course room for improvement. Here are some of the future enhancements I plan to implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get feedback from the author of the original project on my documentation and changes&lt;/li&gt;
&lt;li&gt;Ask the author of the original project for some explanations about the architectural and functional choices he made&lt;/li&gt;
&lt;li&gt;Evaluate switching from PyTorch to ONNX Runtime&lt;/li&gt;
&lt;li&gt;Add more E2E tests with Playwright&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I believe &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; is a useful tool for anyone who wants to improve their pronunciation on their own. With the power of AI and the improvements made during the refactor, this tool provides accurate and reliable feedback to help you speak more clearly and confidently. I invite you to try the &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;HuggingFace Space demo&lt;/a&gt; and see how this project can help you on your journey to better pronunciation.&lt;/p&gt;

</description>
      <category>python</category>
      <category>javascript</category>
      <category>pytorch</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>AI Pronunciation Trainer</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Mon, 16 Dec 2024 20:31:01 +0000</pubDate>
      <link>https://dev.to/trincadev/ai-pronunciation-trainer-3nbm</link>
      <guid>https://dev.to/trincadev/ai-pronunciation-trainer-3nbm</guid>
      <description>&lt;h1&gt;
  
  
  AI Pronunciation Trainer
&lt;/h1&gt;

&lt;p&gt;In this article, I present the project I am working on: &lt;a href="https://github.com/trincadev/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; (online &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;here&lt;/a&gt;), a tool designed to help you improve your pronunciation using the power of artificial intelligence. This project is a refactor of the original &lt;a href="https://github.com/Thiagohgl/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; by &lt;a href="https://github.com/Thiagohgl" rel="noopener noreferrer"&gt;Thiagohgl&lt;/a&gt; to which I have made several improvements to make the tool more effective and easier to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is and what it does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; is a tool that uses AI to evaluate your pronunciation and provide feedback, helping you to improve and be understood more clearly. It leverages the &lt;a href="https://github.com/snakers4/silero-models" rel="noopener noreferrer"&gt;Silero STT / TTS&lt;/a&gt;, &lt;a href="https://openai.com/index/whisper/" rel="noopener noreferrer"&gt;openai whisper&lt;/a&gt; and &lt;a href="https://github.com/SYSTRAN/faster-whisper" rel="noopener noreferrer"&gt;faster whisper&lt;/a&gt; models for speech-to-text functionalities (Silero does also text-to-speech), ensuring accurate and reliable pronunciation assessment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refactor: upgraded frontend and backend libraries
&lt;/h3&gt;

&lt;p&gt;About the backend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updated &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt; at version 2.6.x&lt;/li&gt;
&lt;li&gt;Updated Silero German Speech-to-Text model to resolve a bug that prevented the use of PyTorch versions later than 1.13.x&lt;/li&gt;
&lt;li&gt;Improved backend tests with the &lt;a href="https://en.wikipedia.org/wiki/Mutation_testing" rel="noopener noreferrer"&gt;mutation test suite&lt;/a&gt; &lt;a href="https://cosmic-ray.readthedocs.io" rel="noopener noreferrer"&gt;Cosmic Ray&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fixed a &lt;a href="https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14" rel="noopener noreferrer"&gt;bug&lt;/a&gt; with &lt;a href="https://huggingface.co/docs/transformers/model_doc/whisper" rel="noopener noreferrer"&gt;whisper&lt;/a&gt; not properly transcribing the end timestamp for the last word in the recorded audio (in the end I solved it switching to the &lt;a href="https://pypi.org/project/openai-whisper/" rel="noopener noreferrer"&gt;openai whisper python pip package&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Added &lt;a href="https://pypi.org/project/faster-whisper/" rel="noopener noreferrer"&gt;faster whisper&lt;/a&gt; model support:

&lt;ul&gt;
&lt;li&gt;it avoids &lt;code&gt;None&lt;/code&gt; values on &lt;code&gt;end_ts&lt;/code&gt; timestamps for the last elements, unlike the HuggingFace Whisper's output&lt;/li&gt;
&lt;li&gt;it uses &lt;a href="https://github.com/snakers4/silero-vad" rel="noopener noreferrer"&gt;silero-vad&lt;/a&gt; to detect long silences within the audio&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
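&lt;p&gt;The last-word timestamp workaround can be sketched as follows; the word-dict layout and field names (&lt;code&gt;start_ts&lt;/code&gt;, &lt;code&gt;end_ts&lt;/code&gt;) are assumptions for illustration, not the project's actual data structures:&lt;/p&gt;

```python
def fill_last_end_ts(words, audio_duration):
    # `words` is a list of dicts like {"word": ..., "start_ts": ..., "end_ts": ...},
    # a hypothetical layout standing in for a transcription pipeline's output
    if words and words[-1].get("end_ts") is None:
        last = words[-1]
        # Fall back to the clip duration, never earlier than the word's start
        last["end_ts"] = max(audio_duration, last["start_ts"])
    return words

transcript = [
    {"word": "hello", "start_ts": 0.0, "end_ts": 0.4},
    {"word": "world", "start_ts": 0.5, "end_ts": None},  # the problematic last word
]
print(fill_last_end_ts(transcript, audio_duration=1.2)[-1]["end_ts"])  # → 1.2
```

&lt;p&gt;A fallback like this keeps per-word playback working even when the final word's end timestamp is missing from the model output.&lt;/p&gt;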

&lt;p&gt;Furthermore, regarding the frontend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updated the JavaScript libraries using the latest versions of jQuery (3.7.1) and Bootstrap (5.3.3)&lt;/li&gt;
&lt;li&gt;New frontend based on &lt;a href="https://gradio.app" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt; 5.x&lt;/li&gt;
&lt;li&gt;Added E2E tests with &lt;a href="https://playwright.dev" rel="noopener noreferrer"&gt;Playwright&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Added the ability to insert custom sentences to read and evaluate&lt;/li&gt;
&lt;li&gt;Onboarding tour for new users made with &lt;a href="https://github.com/kamranahmedse/driver.js/" rel="noopener noreferrer"&gt;driver.js&lt;/a&gt; and &lt;a href="https://www.gradio.app/guides/custom-CSS-and-JS" rel="noopener noreferrer"&gt;custom css/javascript in Gradio blocks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Playback of individual words in the recording followed by the 'ideal' pronunciation of the same word read by the Text-to-Speech engine&lt;/li&gt;
&lt;li&gt;Also added an in-browser Text-to-Speech functionality (on Windows 11, it only works if the English and German language packs are installed)&lt;/li&gt;
&lt;li&gt;Custom webApp frontend - improved CSS style on mobile devices&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Online version: the HuggingFace Space Demo
&lt;/h2&gt;

&lt;p&gt;You can try it online using the &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;HuggingFace Space&lt;/a&gt;. This online demo allows you to experience the tool's capabilities without any installation or configuration. The HuggingFace Space provides a convenient and accessible way to test the AI Pronunciation Trainer and see how it can help you improve your pronunciation. Please be patient: sometimes it is a bit slow, or asleep if nobody has used it for a while (locally it is much faster, especially if you have a powerful computer). There is also an &lt;a href="https://aletrn-ai-pronunciation-trainer.hf.space" rel="noopener noreferrer"&gt;embedded version of my HuggingFace Space&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;Although this tool works pretty well, there are still some areas for improvement. Here are some of the future enhancements I plan to implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receive feedback from the original project author (&lt;a href="https://github.com/Thiagohgl" rel="noopener noreferrer"&gt;Thiago Lobato&lt;/a&gt;) on my documentation and changes&lt;/li&gt;
&lt;li&gt;Ask the original author for explanations on the architectural and functional choices he made&lt;/li&gt;
&lt;li&gt;Explore transitioning &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt; to &lt;a href="https://onnxruntime.ai/" rel="noopener noreferrer"&gt;onnxruntime&lt;/a&gt; (if possible)&lt;/li&gt;
&lt;li&gt;Re-add the docker container (if possible)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I believe &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;AI Pronunciation Trainer&lt;/a&gt; is a valuable tool for anyone looking to improve their pronunciation. With the power of AI and the improvements made in the refactoring project, this tool provides accurate and reliable feedback to help you speak more clearly and confidently. I invite you to try the &lt;a href="https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer" rel="noopener noreferrer"&gt;HuggingFace Space demo&lt;/a&gt; and understand how this little project can help you on your journey to better pronunciation.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>javascript</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>LISA+SamGIS adapted for HuggingFace ZeroGPU hardware</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Wed, 14 Aug 2024 21:15:40 +0000</pubDate>
      <link>https://dev.to/trincadev/lisasamgis-adattato-ad-hardware-huggingface-zerogpu-f11</link>
      <guid>https://dev.to/trincadev/lisasamgis-adattato-ad-hardware-huggingface-zerogpu-f11</guid>
      <description>&lt;h1&gt;
  
  
  LISA+SamGIS adapted for HuggingFace ZeroGPU hardware
&lt;/h1&gt;

&lt;p&gt;For a basic understanding of my project, see &lt;a href="https://trinca.tornidor.com/it/projects/samgis-segment-anything-applied-to-GIS" rel="noopener noreferrer"&gt;this page&lt;/a&gt; and &lt;a href="https://trinca.tornidor.com/it/projects/lisa-adapted-for-samgis" rel="noopener noreferrer"&gt;this one&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, instead, I am writing about my new demo running on &lt;a href="https://huggingface.co/zero-gpu-explorers" rel="noopener noreferrer"&gt;ZeroGPU&lt;/a&gt; hardware. Note that &lt;a href="https://huggingface.co/zero-gpu-explorers" rel="noopener noreferrer"&gt;ZeroGPU Spaces&lt;/a&gt; are currently in beta. &lt;a href="https://huggingface.co/subscribe/pro" rel="noopener noreferrer"&gt;PRO&lt;/a&gt; users and &lt;a href="https://huggingface.co/enterprise" rel="noopener noreferrer"&gt;Enterprise organizations&lt;/a&gt; can create their own ZeroGPU Spaces under their own name. A monthly payment is also required to keep the right to use ZeroGPU hardware.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I initially ran into problems caused by the &lt;code&gt;spaces.GPU&lt;/code&gt; decorator applied to an inappropriate function whose execution time was too long, causing timeouts. I solved this by debugging and applying the decorator only to the functions that actually needed the GPU.&lt;/li&gt;
&lt;li&gt;Custom frontend: I don't like &lt;a href="https://svelte.dev/" rel="noopener noreferrer"&gt;svelte&lt;/a&gt; (the js library chosen by the Gradio team) very much, and above all I already have a well-established project written in &lt;a href="https://vuejs.org/" rel="noopener noreferrer"&gt;vuejs&lt;/a&gt; and &lt;a href="https://vitejs.dev/" rel="noopener noreferrer"&gt;vite&lt;/a&gt; that I want to re-use. I solved this by &lt;a href="https://huggingface.co/docs/hub/spaces-dependencies" rel="noopener noreferrer"&gt;installing the Debian package&lt;/a&gt; for nodejs 18, then installing the dependencies and building the nodejs project directly from within the &lt;code&gt;app.py&lt;/code&gt; file using &lt;code&gt;subprocess.run()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
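The build-from-app.py trick above can be sketched like this (a minimal sketch: the directory name and npm scripts are assumptions, not the actual Space layout):

```python
import subprocess


def run_step(cmd, cwd="."):
    """Run one build step, raising on failure (check=True)."""
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)


def build_frontend(frontend_dir="static"):
    # hypothetical directory and scripts; adapt to the real vuejs/vite project
    run_step(["npm", "install"], cwd=frontend_dir)
    run_step(["npm", "run", "build"], cwd=frontend_dir)
```

In a Space this would run once at startup, before the Gradio app is launched.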

&lt;p&gt;Note that I'm using a 48-hour timeout period before pausing my Space. Any interaction after that could take a while before the Space restarts.&lt;/p&gt;

&lt;p&gt;Last but not least, the demo page is online &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-zero" rel="noopener noreferrer"&gt;here (Gradio interface)&lt;/a&gt; and &lt;a href="https://aletrn-samgis-lisa-on-zero.hf.space/lisa" rel="noopener noreferrer"&gt;here (my custom SPA page)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>llm</category>
    </item>
    <item>
      <title>LISA+SamGIS on ZeroGPU HuggingFace hardware</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Wed, 14 Aug 2024 21:15:25 +0000</pubDate>
      <link>https://dev.to/trincadev/lisasamgis-on-zerogpu-huggingface-hardware-23k0</link>
      <guid>https://dev.to/trincadev/lisasamgis-on-zerogpu-huggingface-hardware-23k0</guid>
      <description>&lt;h1&gt;
  
  
  LISA+SamGIS on ZeroGPU HuggingFace hardware
&lt;/h1&gt;

&lt;p&gt;See &lt;a href="https://trinca.tornidor.com/projects/samgis-segment-anything-applied-to-GIS" rel="noopener noreferrer"&gt;this&lt;/a&gt; and &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis" rel="noopener noreferrer"&gt;this page&lt;/a&gt; for a basic understanding of what my project is about.&lt;/p&gt;

&lt;p&gt;Today instead I'm writing about my new demo on a &lt;a href="https://huggingface.co/zero-gpu-explorers" rel="noopener noreferrer"&gt;ZeroGPU&lt;/a&gt; Space. Note that &lt;a href="https://huggingface.co/zero-gpu-explorers" rel="noopener noreferrer"&gt;ZeroGPU Spaces&lt;/a&gt; is currently in beta. &lt;a href="https://huggingface.co/subscribe/pro" rel="noopener noreferrer"&gt;PRO&lt;/a&gt; users and &lt;a href="https://huggingface.co/enterprise" rel="noopener noreferrer"&gt;Enterprise organizations&lt;/a&gt; can host their own ZeroGPU Spaces under their namespaces. A monthly subscription is also required to keep the right to use ZeroGPU hardware.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I solved some problems caused by the &lt;code&gt;spaces.GPU&lt;/code&gt; decorator applied to a function whose execution time was too long, causing a timeout. After some debugging I ended up using &lt;code&gt;spaces.GPU&lt;/code&gt; only on the functions that really needed GPU acceleration.&lt;/li&gt;
&lt;li&gt;I don't like &lt;a href="https://svelte.dev/" rel="noopener noreferrer"&gt;svelte&lt;/a&gt; (the js library chosen by the Gradio team) very much, and I already have a &lt;a href="https://vuejs.org/" rel="noopener noreferrer"&gt;vuejs&lt;/a&gt;/&lt;a href="https://vitejs.dev/" rel="noopener noreferrer"&gt;vite&lt;/a&gt; frontend project that I can re-use. I solved this by installing the nodejs 18 &lt;a href="https://huggingface.co/docs/hub/spaces-dependencies" rel="noopener noreferrer"&gt;Debian package&lt;/a&gt; and starting the nodejs build from within the &lt;code&gt;app.py&lt;/code&gt; file using &lt;code&gt;subprocess.run()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
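The decorator fix above can be sketched in a few lines (the &lt;code&gt;spaces.GPU&lt;/code&gt; decorator is the real HuggingFace one; the function names and the local no-op fallback are illustrative):

```python
try:
    import spaces  # available inside HuggingFace ZeroGPU Spaces
    gpu = spaces.GPU
except ImportError:  # local development fallback: a no-op decorator
    def gpu(fn):
        return fn


def prepare_request(payload: dict) -> dict:
    """CPU-only pre-processing: left undecorated so it cannot hit the GPU timeout."""
    return {"inputs": payload}


@gpu  # the GPU is reserved only while this function runs
def run_inference(model_inputs: dict) -> str:
    # the heavy model call would go here (placeholder body)
    return f"processed {len(model_inputs)} field(s)"
```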

&lt;p&gt;Note that I'm using a timeout period of 48h before pausing my Space. Any interaction after that could take a while until the Space restarts.&lt;/p&gt;

&lt;p&gt;Last but not least, my online demo is &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-zero" rel="noopener noreferrer"&gt;here (Gradio interface)&lt;/a&gt; and &lt;a href="https://aletrn-samgis-lisa-on-zero.hf.space/lisa" rel="noopener noreferrer"&gt;here (my custom SPA page)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>llm</category>
    </item>
    <item>
      <title>SamGIS - Some notes on Segment Anything</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Mon, 27 May 2024 18:27:17 +0000</pubDate>
      <link>https://dev.to/trincadev/samgis-alcuni-appunti-su-segment-anything-144p</link>
      <guid>https://dev.to/trincadev/samgis-alcuni-appunti-su-segment-anything-144p</guid>
      <description>&lt;h1&gt;
  
  
  SamGIS - Some notes on Segment Anything
&lt;/h1&gt;

&lt;p&gt;I refer you to my English notes on &lt;a href="https://dev.to/projects/notes-about-segment-anything"&gt;Segment Anything&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  About image embedding re-use and SamGIS
&lt;/h2&gt;

&lt;p&gt;After re-reading this paper I understood that I could improve SamGIS's efficiency by storing and re-using the image embeddings.&lt;/p&gt;

&lt;p&gt;I implemented this change in &lt;a href="https://docs.ml-trinca.tornidor.com/#version-1-3-0" rel="noopener noreferrer"&gt;SamGIS version 1.3.0&lt;/a&gt;. Some test data from the &lt;a href="https://huggingface.co/spaces/aletrn/samgis" rel="noopener noreferrer"&gt;SamGIS demo&lt;/a&gt; I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first request: 5.42s

&lt;ul&gt;
&lt;li&gt;instantiated &lt;a href="https://github.com/CASIA-IVA-Lab/FastSAM" rel="noopener noreferrer"&gt;fastsam&lt;/a&gt; model&lt;/li&gt;
&lt;li&gt;created image from the webmap (I'm using OpenStreetMap as tiles provider and Mapnik as webmap layer)&lt;/li&gt;
&lt;li&gt;created the image embedding&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;second request: 0.41s&lt;/li&gt;

&lt;li&gt;from third to seventh request: ~0.34s&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Keep in mind that making one request immediately after another keeps the duration low, probably because of caching during tiles download on the backend side. Waiting more than 10 minutes seems to invalidate the cache: in that case &lt;a href="https://github.com/geopandas/contextily" rel="noopener noreferrer"&gt;contextily&lt;/a&gt; (the GeoPandas library I use as tiles client) took from 0.5s to 1.5s, during my tests, to download the tiles.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Espandere qui per il dettaglio del payload delle chiamate di test.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bbox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ne"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.236615111857255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.519996643066408&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"sw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.13405108959001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.29821014404297&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;146&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.18483299780137&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.418864745562386&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"zoom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenStreetMap"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
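The embedding re-use described above can be sketched as a small in-memory cache keyed by the request extent (the key shape and names are illustrative, not the actual SamGIS implementation):

```python
# cache of image embeddings, keyed by the map extent of the request
_embedding_cache: dict = {}


def get_image_embedding(bbox: tuple, zoom: int, compute_fn):
    """Return a cached embedding when the same map extent was already
    processed; otherwise compute it once and store it."""
    key = (bbox, zoom)
    if key not in _embedding_cache:
        _embedding_cache[key] = compute_fn(bbox, zoom)
    return _embedding_cache[key]
```

This is why only the first request pays the full encoder cost, while the following ones on the same extent stay fast.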
&lt;h2&gt;
  
  
  About zero-shot text-to-mask conversion: LISA and SamGIS
&lt;/h2&gt;

&lt;p&gt;The original version of SAM can also use simple free-form natural language text prompts. For a practical use of this feature, see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/IDEA-Research/Grounded-Segment-Anything" rel="noopener noreferrer"&gt;Grounded-SAM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dvlab-research/LISA" rel="noopener noreferrer"&gt;LISA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course you may also be interested in my &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis" rel="noopener noreferrer"&gt;work integrating LISA with SamGIS&lt;/a&gt; and the corresponding &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;demo&lt;/a&gt;. I have to keep it paused because of costs, but I am requesting the use of a free GPU from HuggingFace.&lt;/p&gt;

&lt;p&gt;If you find my &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;project&lt;/a&gt; interesting, please like or comment on the &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda/discussions/1" rel="noopener noreferrer"&gt;HuggingFace GPU resource request thread&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>maps</category>
    </item>
    <item>
      <title>SamGIS - Some notes about Segment Anything</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Mon, 27 May 2024 18:27:09 +0000</pubDate>
      <link>https://dev.to/trincadev/samgis-some-notes-about-segment-anything-5a3</link>
      <guid>https://dev.to/trincadev/samgis-some-notes-about-segment-anything-5a3</guid>
      <description>&lt;h1&gt;
  
  
  SamGIS - Some notes about Segment Anything
&lt;/h1&gt;

&lt;h2&gt;
  
  
  From the &lt;a href="https://arxiv.org/abs/2304.02643" rel="noopener noreferrer"&gt;Segment Anything paper&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;"&lt;a href="https://github.com/facebookresearch/segment-anything" rel="noopener noreferrer"&gt;SAM&lt;/a&gt;" is a &lt;a href="https://aws.amazon.com/what-is/foundation-models/" rel="noopener noreferrer"&gt;foundation model&lt;/a&gt; aiming for performing "zero-shot" image segmentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it's built and trained on a large image dataset with a massive amount of segmentation masks&lt;/li&gt;
&lt;li&gt;the SAM team proposes the "promptable" segmentation task, where the goal is to return a valid segmentation mask given any segmentation prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since this model should perform "zero-shot" segmentation, it must support flexible prompts, needs to compute masks in amortized real-time to allow interactive use, and must be ambiguity-aware. This is the model architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;source 1: an image encoder computes an image embedding&lt;/li&gt;
&lt;li&gt;source 2: a fast prompt encoder embeds prompts&lt;/li&gt;
&lt;li&gt;output: a fast mask decoder combines these two sources to predict segmentation masks&lt;/li&gt;
&lt;/ol&gt;
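The three components above can be sketched as a toy data flow (shapes and operations are illustrative only, not the real SAM code):

```python
import numpy as np

rng = np.random.default_rng(0)


def image_encoder(image: np.ndarray) -> np.ndarray:
    # source 1: one embedding per image, computed once (toy reduction)
    return image.mean(axis=(0, 1))


def prompt_encoder(points: np.ndarray) -> np.ndarray:
    # source 2: fast, runs once per prompt (toy reduction)
    return points.mean(axis=0)


def mask_decoder(image_emb: np.ndarray, prompt_emb: np.ndarray) -> np.ndarray:
    # output: combines the two sources into per-location scores (toy outer product)
    return np.outer(image_emb, prompt_emb)


image = rng.random((8, 8, 3))
emb = image_encoder(image)  # heavy step, reusable across many prompts
mask = mask_decoder(emb, prompt_encoder(rng.random((2, 3))))
```

The key property the sketch preserves is that the expensive image embedding is computed once and then combined with many cheap prompt embeddings.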

&lt;p&gt;Because annotation masks are not abundant online, especially high-quality ones, the SAM developers opted for a "data engine", developing both the model and the dataset annotations (from a manual stage to semi-automated to fully automated). Images in SA-1B span a geographically and economically diverse set of countries, and the authors found that SAM performs similarly across different groups of people.&lt;/p&gt;

&lt;h3&gt;
  
  
  Segment Anything Tasks
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Task
&lt;/h4&gt;

&lt;p&gt;Here the SAM team translates prompts from NLP to segmentation (selecting/de-selecting points, boxes, masks, free-form text). Just like a language model should output a coherent response to an ambiguous prompt, the promptable segmentation task should return a valid segmentation mask given any prompt.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre-Training
&lt;/h4&gt;

&lt;p&gt;The promptable segmentation task suggests a natural pre-training algorithm that simulates a sequence of prompts (e.g., points, boxes, masks) for each training sample and compares the model’s mask predictions against the ground truth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Segment Anything Model
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Image encoder
&lt;/h4&gt;

&lt;p&gt;The algorithm uses a MAE (&lt;a href="https://arxiv.org/abs/2111.06377" rel="noopener noreferrer"&gt;"Masked Autoencoders Are Scalable Vision Learners"&lt;/a&gt;) pre-trained Vision Transformer (&lt;a href="https://arxiv.org/abs/2010.11929" rel="noopener noreferrer"&gt;ViT&lt;/a&gt;) minimally adapted to process &lt;a href="https://arxiv.org/abs/2203.16527" rel="noopener noreferrer"&gt;high resolution inputs&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prompt encoder
&lt;/h4&gt;

&lt;p&gt;SAM supports two sets of prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sparse (points, boxes, text)&lt;/li&gt;
&lt;li&gt;dense (masks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SAM &lt;a href="https://arxiv.org/abs/2006.10739" rel="noopener noreferrer"&gt;handles points and boxes via positional encodings&lt;/a&gt; summed with &lt;a href="https://arxiv.org/abs/2103.00020" rel="noopener noreferrer"&gt;learned embeddings for each prompt type&lt;/a&gt;. Dense prompts (i.e., masks) are embedded using convolutions and summed element-wise with the image embedding.&lt;/p&gt;

&lt;h4&gt;
  
  
  Mask decoder
&lt;/h4&gt;

&lt;p&gt;The mask decoder efficiently maps the image embedding, prompt embeddings, and an output token to a mask. This design employs a modification of a &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;Transformer decoder block&lt;/a&gt; followed by a dynamic mask prediction head. The decoder block uses prompt self-attention and cross-attention in two directions (prompt-to-image embedding and vice versa) to update all embeddings. After running two blocks, the procedure upsamples the image embedding and an MLP maps the output token to a dynamic linear classifier, which then computes the mask foreground probability at each image location.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resolving ambiguity
&lt;/h4&gt;

&lt;p&gt;With a single output the model would merge masks together on an ambiguous prompt; to avoid this, it can predict more than one output mask for a single prompt. Three masks address most common cases (nested masks are often at most three deep: whole, part, and subpart). During training, the procedure backprops only the minimum loss over the masks. To rank masks, the model predicts a confidence score (i.e., estimated IoU) for each mask.&lt;/p&gt;
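The minimum-loss trick can be sketched in a few lines (a toy L1 loss on flat mask lists; in SAM the loss is a real mask loss and the ranking score an estimated IoU):

```python
def l1(mask, target):
    """Toy per-pixel loss on flat 0/1 lists."""
    return sum(abs(a - b) for a, b in zip(mask, target))


def best_of_n_loss(predicted_masks, target, loss_fn=l1):
    """Backprop target for an ambiguous prompt: keep only the loss of the
    closest of the (typically 3) predicted masks."""
    return min(loss_fn(m, target) for m in predicted_masks)
```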

&lt;h2&gt;
  
  
  About image embedding re-use and SamGIS
&lt;/h2&gt;

&lt;p&gt;After reading this paper I understood that I could improve the SamGIS software design by storing and re-using the image embeddings.&lt;/p&gt;

&lt;p&gt;I implemented this change in &lt;a href="https://docs.ml-trinca.tornidor.com/#version-1-3-0" rel="noopener noreferrer"&gt;SamGIS version 1.3.0&lt;/a&gt;. Some test data from the &lt;a href="https://huggingface.co/spaces/aletrn/samgis" rel="noopener noreferrer"&gt;SamGIS demo&lt;/a&gt; I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first request: 5.42s

&lt;ul&gt;
&lt;li&gt;instantiated &lt;a href="https://github.com/CASIA-IVA-Lab/FastSAM" rel="noopener noreferrer"&gt;fastsam&lt;/a&gt; model&lt;/li&gt;
&lt;li&gt;created image from webmap (I'm using &lt;a href="https://www.openstreetmap.org/" rel="noopener noreferrer"&gt;OpenStreetMap&lt;/a&gt; as tiles provider and Mapnik as map layer)&lt;/li&gt;
&lt;li&gt;created image embedding&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;second request: 0.41s&lt;/li&gt;

&lt;li&gt;from third to seventh request: ~0.34s&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Note that making one request immediately after another keeps the request duration low, probably because of caching during tiles download on the backend side. Waiting more than 10 minutes instead seems to invalidate the cache: in that case &lt;a href="https://github.com/geopandas/contextily" rel="noopener noreferrer"&gt;contextily&lt;/a&gt; (the GeoPandas library that I use as a tiles client) added from 0.5s to 1.5s, during my tests, to download the tiles.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Click here to show my test request payload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bbox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ne"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.236615111857255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.519996643066408&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"sw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.13405108959001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.29821014404297&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;146&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;46.18483299780137&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.418864745562386&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"zoom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenStreetMap"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
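In plain Python, the embedding re-use described above can be sketched with functools.lru_cache over a function whose arguments identify the map extent (an illustrative stand-in, not the actual SamGIS code):

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def image_embedding(ne: tuple, sw: tuple, zoom: int) -> str:
    # stand-in for the expensive encoder pass over the webmap image;
    # the arguments mirror the bbox/zoom fields of the request payload
    return f"embedding for {ne}/{sw} at zoom {zoom}"
```

Repeated requests on the same bbox and zoom hit the cache and skip the encoder, matching the first-vs-subsequent request timings measured above.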
&lt;h2&gt;
  
  
  About Zero-Shot Text-to-Mask: LISA and SamGIS
&lt;/h2&gt;

&lt;p&gt;SAM can use also simple free-form text prompts. For a practical use of this feature, see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/IDEA-Research/Grounded-Segment-Anything" rel="noopener noreferrer"&gt;Grounded-SAM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dvlab-research/LISA" rel="noopener noreferrer"&gt;LISA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course you might also be interested in my &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis" rel="noopener noreferrer"&gt;integration work of LISA with SamGIS&lt;/a&gt; and its &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;demo&lt;/a&gt;. I need to keep it paused because of cost, but I am requesting the use of a free GPU from HuggingFace.&lt;/p&gt;

&lt;p&gt;If you like my &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;project&lt;/a&gt;, please like or comment on the &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda/discussions/1" rel="noopener noreferrer"&gt;HuggingFace GPU resource request thread&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>maps</category>
    </item>
    <item>
      <title>LISA integrated into SamGIS</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Sun, 26 May 2024 14:38:43 +0000</pubDate>
      <link>https://dev.to/trincadev/lisa-integrato-in-samgis-1f82</link>
      <guid>https://dev.to/trincadev/lisa-integrato-in-samgis-1f82</guid>
      <description>&lt;h1&gt;
  
  
  LISA integrated into SamGIS
&lt;/h1&gt;

&lt;p&gt;Image segmentation is a crucial task in computer vision, where the goal is to perform &lt;a href="https://www.ibm.com/topics/instance-segmentation" rel="noopener noreferrer"&gt;"instance segmentation"&lt;/a&gt; of a given object. I have already worked on a project about this, &lt;a href="https://trinca.tornidor.com/it/projects/samgis-segment-anything-applied-to-GIS" rel="noopener noreferrer"&gt;SamGIS&lt;/a&gt;. A logical next step would be integrating the ability to recognize objects through text prompts. This apparently simple activity actually differs from what SamGIS does with &lt;a href="https://segment-anything.com/" rel="noopener noreferrer"&gt;Segment Anything&lt;/a&gt; (the machine learning backend used by SamGIS). While "SAM" does not categorize what it identifies, starting from a written prompt requires knowing which classes of objects exist in the image under analysis. A &lt;a href="https://arxiv.org/abs/2305.11175" rel="noopener noreferrer"&gt;visual language model&lt;/a&gt; (or VLM) that works well for this task is &lt;a href="https://github.com/dvlab-research/LISA" rel="noopener noreferrer"&gt;LISA&lt;/a&gt;. LISA's authors based their work on &lt;a href="https://segment-anything.com/" rel="noopener noreferrer"&gt;Segment Anything&lt;/a&gt; and &lt;a href="https://llava-vl.github.io/" rel="noopener noreferrer"&gt;Llava&lt;/a&gt;, an LLM with multimodal capabilities (it can process both text instructions and images). By leveraging LISA's "reasoned segmentation" abilities, SamGIS can perform "zero-shot" analyses, i.e. without specific, specialized prior training in geological, geomorphological or photogrammetric fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  Input text prompts and their geojson outputs
&lt;/h2&gt;

&lt;p&gt;I can't show this part here on dev.to, so I refer you to the &lt;a href="https://trinca.tornidor.com/it/projects/lisa-adapted-for-samgis#prompts-testuali-d-input-e-relativi-geojson-di-output" rel="noopener noreferrer"&gt;dedicated page on my blog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Duration of segmentation tasks
&lt;/h2&gt;

&lt;p&gt;At the moment, a prompt that also asks for an explanation of what was identified in the image slows down the analysis considerably. The same analysis prompt run on the same image but without explanation requests is processed much faster. Tests containing explanation requests run in more than 60 seconds, while without them the duration is around or below 4 seconds, using the HuggingFace "Nvidia T4 Small" hardware profile with 4 vCPU, 15 GB RAM and 16 GB VRAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software architecture
&lt;/h2&gt;

&lt;p&gt;From a technical and architectural point of view, the &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;demo&lt;/a&gt; consists of a frontend similar to the one in the &lt;a href="https://huggingface.co/spaces/aletrn/samgis" rel="noopener noreferrer"&gt;SamGIS&lt;/a&gt; demo. There is no drawing toolbar: it is replaced by a text box for natural language requests. The backend uses a FastAPI-based API that invokes an ad hoc function based on LISA.&lt;/p&gt;

&lt;p&gt;I had to pause the demo because of the GPU cost, but I am requesting the use of a free GPU from HuggingFace. Feel free to contact me on LinkedIn for a live demonstration, to ask for more information or further clarifications.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>computervision</category>
    </item>
    <item>
      <title>LISA adapted to SamGIS</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Sun, 26 May 2024 14:38:30 +0000</pubDate>
      <link>https://dev.to/trincadev/lisa-adapted-to-samgis-c5k</link>
      <guid>https://dev.to/trincadev/lisa-adapted-to-samgis-c5k</guid>
      <description>&lt;h1&gt;
  
  
  LISA adapted to SamGIS
&lt;/h1&gt;

&lt;p&gt;Image segmentation is a crucial task in computer vision, where the goal is to extract the &lt;a href="https://www.ibm.com/topics/instance-segmentation" rel="noopener noreferrer"&gt;instance segmentation mask&lt;/a&gt; for a desired object within the image. I've already worked on a project, &lt;a href="https://trinca.tornidor.com/projects/samgis-segment-anything-applied-to-GIS" rel="noopener noreferrer"&gt;SamGIS&lt;/a&gt;, that focuses on this particular application of computer vision. A logical progression now would be incorporating the ability to recognize objects through text prompts. This apparently simple activity is actually quite different from what &lt;a href="https://segment-anything.com/" rel="noopener noreferrer"&gt;Segment Anything&lt;/a&gt; (the ML backend used by SamGIS) does. In fact "SAM" outputs neither descriptions nor categorizations for its input images; starting from a written prompt, on the contrary, requires understanding which classes of objects exist in the image under analysis. A &lt;a href="https://arxiv.org/abs/2305.11175" rel="noopener noreferrer"&gt;visual language model&lt;/a&gt; (or VLM) that performs well for this task is &lt;a href="https://github.com/dvlab-research/LISA" rel="noopener noreferrer"&gt;LISA&lt;/a&gt;. LISA's authors built their work on top of &lt;a href="https://segment-anything.com/" rel="noopener noreferrer"&gt;Segment Anything&lt;/a&gt; and &lt;a href="https://llava-vl.github.io/" rel="noopener noreferrer"&gt;Llava&lt;/a&gt;, a large language model with multimodal capabilities (it can process both text prompts and images). By leveraging LISA's "reasoned segmentation" abilities, SamGIS can now conduct "zero-shot" analyses, meaning it can operate without specific or specialist prior training in geological, geomorphological, or photogrammetric fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some input text prompts with their geojson outputs
&lt;/h2&gt;

&lt;p&gt;I can't show this part on dev.to, so I refer you to my &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis#some-input-text-prompts-with-their-geojson-outputs" rel="noopener noreferrer"&gt;blog page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Duration of segmentation tasks
&lt;/h2&gt;

&lt;p&gt;At the moment, a prompt that also asks for an explanation of the segmentation task slows down the analysis considerably. The same prompt on the same image, without "descriptive" or "explanatory" questions, finishes much faster: tests with explanatory text take more than 60 seconds, while without it the duration is between 3 and 8 seconds, using the HuggingFace hardware profile "Nvidia T4 Small" with 4 vCPU, 15 GB RAM and 16 GB VRAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software architecture
&lt;/h2&gt;

&lt;p&gt;Technically and architecturally, the &lt;a href="https://huggingface.co/spaces/aletrn/samgis-lisa-on-cuda" rel="noopener noreferrer"&gt;demo&lt;/a&gt; consists of a frontend page similar to the &lt;a href="https://huggingface.co/spaces/aletrn/samgis" rel="noopener noreferrer"&gt;SamGIS&lt;/a&gt; demo. Instead of the drawing toolbar there is a text prompt for natural-language requests, with some selectable examples displayed at the top of the page. The backend is a FastAPI-based API that calls a custom LISA function wrapper.&lt;/p&gt;

&lt;p&gt;Unfortunately I had to pause my demo because of GPU costs, but I am requesting the use of a free GPU from HuggingFace. Please feel free to reach out to me on LinkedIn for a live demonstration or to ask for more information.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>computervision</category>
    </item>
    <item>
      <title>What I learned while developing SamGIS with LISA (so far)</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Sun, 26 May 2024 13:57:08 +0000</pubDate>
      <link>https://dev.to/trincadev/cosa-ho-imparato-durante-lo-sviluppo-di-samgis-con-lisa-finora-40m</link>
      <guid>https://dev.to/trincadev/cosa-ho-imparato-durante-lo-sviluppo-di-samgis-con-lisa-finora-40m</guid>
      <description>&lt;h1&gt;
  
  
  What I learned while developing SamGIS with LISA (so far)
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Reading the publications related to the projects I work on
&lt;/h2&gt;

&lt;p&gt;To improve my understanding of my machine learning project, I decided to read the papers on which LISA and Segment Anything are based. Besides some theoretical background on LLMs, I noticed that the modular architecture of "SAM" makes it possible to create and re-use image embeddings. Since SamGIS didn't work this way initially, I formulated a hypothesis about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging, measurement and optimization: a hypothesis about the image embedding
&lt;/h2&gt;

&lt;p&gt;At this point I continued my debugging work by measuring the duration of the individual steps during the execution of SamGIS functions. Creating an image embedding is a fairly expensive operation, so it is advantageous to save and re-use it (I verified that implementing my hypothesis would improve the software's performance). Using the HuggingFace hardware profile "Nvidia T4 Small" (with 4 vCPU, 15 GB RAM and 16 GB VRAM), it's possible to save about 1 second on every inference after the first when the same image is used (i.e., without changing the tile provider or the geographical area).&lt;/p&gt;

&lt;h2&gt;
  
  
  The role of LLMs with prompts having different characteristics
&lt;/h2&gt;

&lt;p&gt;LISA inherits the language generation capabilities of multimodal LLMs such as &lt;a href="https://llava-vl.github.io/" rel="noopener noreferrer"&gt;Llava&lt;/a&gt;. These models excel at handling complex reasoning, world knowledge, explanatory answers and multi-turn conversations. They are powerful tools for bridging the gap between text and visual understanding.&lt;/p&gt;

&lt;p&gt;LISA makes it possible to perform &lt;a href="https://trinca.tornidor.com/it/projects/lisa-adapted-for-samgis#prompts-testuali-d-input-e-relativi-geojson-di-output" rel="noopener noreferrer"&gt;rather complex reasoning&lt;/a&gt; during image segmentation (e.g. "identify the houses near the trees..." vs "identify the houses...") without any particular performance degradation. On the contrary, requests asking for an explanation of why ("explain why") the segmentation task is done in a certain way will have much higher execution times (on the order of minutes).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://trinca.tornidor.com/it/projects/lisa-adapted-for-samgis#durata-dei-task-di-segmentazione" rel="noopener noreferrer"&gt;More details are available here&lt;/a&gt; about these improvements following the changes described, and about the performance differences across the various cases when using SamGIS with LISA.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>learning</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>What I learnt from development on LISA with SamGIS (So far)</title>
      <dc:creator>Alessandro T.</dc:creator>
      <pubDate>Sun, 26 May 2024 13:52:16 +0000</pubDate>
      <link>https://dev.to/trincadev/what-i-learnt-from-development-on-lisa-with-samgis-so-far-5eon</link>
      <guid>https://dev.to/trincadev/what-i-learnt-from-development-on-lisa-with-samgis-so-far-5eon</guid>
      <description>&lt;h1&gt;
  
  
  What I learnt from development on LISA with SamGIS (So far)
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Read publications related to the projects I work on
&lt;/h2&gt;

&lt;p&gt;To improve my understanding of my machine learning project, I decided to read the papers on which &lt;a href="https://arxiv.org/abs/2308.00692" rel="noopener noreferrer"&gt;LISA&lt;/a&gt; and &lt;a href="https://arxiv.org/abs/2304.02643" rel="noopener noreferrer"&gt;Segment Anything&lt;/a&gt; are based. Besides some theoretical background on LLMs, I noticed that the modular architecture of "SAM" makes it possible to save and re-use image embeddings. Since SamGIS didn't work this way initially, I formulated a hypothesis about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging, measurement and optimization: an image embedding hypothesis
&lt;/h2&gt;

&lt;p&gt;At this point I continued my debugging work by measuring the duration of individual steps during the execution of SamGIS functions. Creating an image embedding is quite an expensive operation, so it is advantageous to save and re-use it (I verified that implementing my hypothesis would improve the software's performance). Using the HuggingFace hardware profile "Nvidia T4 Small" (with 4 vCPU, 15 GB RAM and 16 GB VRAM), it's possible to save almost 1 second on every inference after the first when the same image is used (i.e., without changing the tile provider or the geographical area).&lt;/p&gt;

&lt;h2&gt;
  
  
  The role of LLMs with prompts having different characteristics
&lt;/h2&gt;

&lt;p&gt;LISA inherits the language generation capabilities of multi-modal LLMs such as &lt;a href="https://llava-vl.github.io/" rel="noopener noreferrer"&gt;Llava&lt;/a&gt;. These models excel at handling complex reasoning, world knowledge, explanatory answers and multi-turn conversations. They’re powerful tools for bridging the gap between text and visual understanding.&lt;/p&gt;

&lt;p&gt;LISA allows you to perform &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis#some-input-text-prompts-with-their-geojson-outputs" rel="noopener noreferrer"&gt;rather complex reasoning&lt;/a&gt; during image segmentation (e.g. "identify the houses near the trees..." vs "identify the houses...") without any particular performance degradation. On the contrary, requests asking for an explanation of why ("explain why") the segmentation task is done in a certain way will have much higher execution times (on the order of minutes).&lt;/p&gt;

&lt;p&gt;There are &lt;a href="https://trinca.tornidor.com/projects/lisa-adapted-for-samgis#duration-of-segmentation-tasks" rel="noopener noreferrer"&gt;more details here&lt;/a&gt; about these improvements following the changes described, and about the performance differences across the various cases when using SamGIS with LISA.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>learning</category>
      <category>programming</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
