<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jack</title>
    <description>The latest articles on DEV Community by Jack (@saintpetejackboy).</description>
    <link>https://dev.to/saintpetejackboy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1341103%2F548dcdc5-6a53-4a98-8ca8-5efb11bda86a.jpeg</url>
      <title>DEV Community: Jack</title>
      <link>https://dev.to/saintpetejackboy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saintpetejackboy"/>
    <language>en</language>
    <item>
      <title>How I discovered Named Entity Recognition while trying to remove gibberish from a string.</title>
      <dc:creator>Jack</dc:creator>
      <pubDate>Tue, 07 May 2024 02:05:45 +0000</pubDate>
      <link>https://dev.to/saintpetejackboy/how-i-discovered-named-entity-recognition-while-trying-to-remove-gibberish-from-a-string-3243</link>
      <guid>https://dev.to/saintpetejackboy/how-i-discovered-named-entity-recognition-while-trying-to-remove-gibberish-from-a-string-3243</guid>
      <description>&lt;p&gt;During a project designed to split singular .pdf files (that had been condensed by a third party) back into their constituent parts, I ran across a problem I'm surprised I'd never encountered before...&lt;/p&gt;

&lt;p&gt;I intentionally did the segmenting into sections &lt;em&gt;sans-AI&lt;/em&gt;. All of the logic is handled through traditional pattern matching, with an extra step of "ignore words" to help correct false positives. Surprisingly, this worked incredibly well. The pesky problem appeared when I was also trying to generate thumbnail preview images of the new .pdf components.&lt;/p&gt;

&lt;p&gt;Each .pdf could arrive with an arbitrary name. Many included UUIDs, some had dates... there was zero standardization, and lots of other "words" might appear anywhere in the string.&lt;/p&gt;

&lt;p&gt;I quickly discovered that a massive list of words to remove and diligent pattern matching wasn't going to get me very far. At one point I stubbornly did a Google on 'how to remove gibberish from a string, Python'. In 20 years, I'd never had this &lt;strong&gt;exact&lt;/strong&gt; problem. When I've encountered it in some form before, the patterns were easy to match or I was able to correct the problem further back in the pipeline, nullifying the issue.&lt;/p&gt;
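
&lt;p&gt;To make the "classical" approach concrete, here is a minimal sketch using only the Python standard library. The function name &lt;code&gt;strip_noise&lt;/code&gt;, the two patterns, and the ignore words are all assumptions made up for this example, not the actual rules from the project.&lt;/p&gt;

```python
import re

# Hypothetical patterns and ignore words, chosen only for this example.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
DATE_RE = re.compile(r"\d{4}[-_.]\d{2}[-_.]\d{2}")
IGNORE_WORDS = {"scan", "copy", "final", "signed"}


def strip_noise(filename):
    """Remove the .pdf extension, UUIDs, dates, and ignore words from a filename."""
    stem = re.sub(r"\.pdf$", "", filename, flags=re.IGNORECASE)
    stem = UUID_RE.sub(" ", stem)
    stem = DATE_RE.sub(" ", stem)
    # Split on separators, then drop ignore words and purely numeric tokens.
    tokens = re.split(r"[\s_\-.]+", stem)
    kept = [
        token
        for token in tokens
        if token and token.lower() not in IGNORE_WORDS and not token.isdigit()
    ]
    return " ".join(kept)


clean = strip_noise("John_Smith_2024-05-07_3f2b8c1e-9d4a-4c6b-8f1e-2a7d9c0b5e41_signed.pdf")
leaky = strip_noise("xkq7 Acme Holdings lease v2.pdf")
```

&lt;p&gt;The first name cleans up to &lt;code&gt;John Smith&lt;/code&gt;, but in the second one &lt;code&gt;xkq7&lt;/code&gt; and &lt;code&gt;v2&lt;/code&gt; pass straight through, because no pattern or word list anticipated them. That is the wall this kind of approach tends to hit.&lt;/p&gt;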

&lt;p&gt;This time, those options were not producing desirable results. Far from it. The best I could do with "classical" methods was to get around 70% accuracy on removing the correct parts of the string. As I was also appending this string to the 'sections' of various .pdf being generated, the conundrum eventually led me to &lt;u&gt;Named Entity Recognition&lt;/u&gt;. &lt;/p&gt;

&lt;p&gt;If you're like me, you may have a surface interest in AI and ML, but have never really pondered Named Entity Recognition. The solution to my problem was something people had been working on for decades. &lt;/p&gt;

&lt;p&gt;While you could easily download a &lt;em&gt;corpus&lt;/em&gt; of the English language, you'd be taking a step in the &lt;strong&gt;wrong&lt;/strong&gt; direction if your intention is to pull the names of people, places or things out of text: you'd be excluding them in almost every instance.&lt;/p&gt;

&lt;p&gt;What is it called when you're looking for a word... that is possibly a name, but definitely not NOT a word? Named Entity Recognition. Lisa F. Rau is credited as being the first person to actually implement one of these solutions.&lt;/p&gt;

&lt;p&gt;At the very outset of this project, I had the idea that I could just convert all the .pdf to text (or at least the first 150 characters of each page) and feed it into an LLM (or even Local LLM) for page ranges to extract. While this is a valid solution to the problem, it consumes a lot of resources. Some pages are just images and require OCR, and we're also doing Named Entity Recognition on each incoming file. There are also many instances where, even with advanced parsing, the first 150 characters of any given document could be nearly identical to an unrelated document (or worse, nonsensical encoding garbage from Docusign).&lt;/p&gt;

&lt;p&gt;Other developers who are miserly might appreciate the solution I used in Python, called spaCy. &lt;/p&gt;

&lt;p&gt;A warning: spaCy took an act of congress for me to get working properly. At one point, I was compiling various dependencies (been a while since I used cmake). The results were well worth it: barely added any processing time and performed almost flawlessly.&lt;/p&gt;

&lt;p&gt;The downside is that there were two other server environments (one older, one newer, go figure, all three Ubuntu), where I &lt;strong&gt;could NOT&lt;/strong&gt; get spaCy and other dependencies of this project to work properly ("could not" as in, gave up after squandering several hours trying to complete a task I'd just done hours prior). &lt;/p&gt;

&lt;p&gt;Don't let that scare you, however: I'm fairly n00bish with Python in general. If you're using a proper virtual environment and don't normally have problems with these types of things, you should be fine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://spacy.io/"&gt;https://spacy.io/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;I'm able to use spaCy on an $8 per month unmanaged VPS - meaning the resources required to utilize it are pretty minimal. &lt;/p&gt;
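
&lt;p&gt;For anyone curious what this looks like in practice, below is a rough sketch of pulling named entities out of a string with spaCy. It assumes spaCy and the small English model &lt;code&gt;en_core_web_sm&lt;/code&gt; are installed. The helper names and the set of labels to keep are hypothetical choices for this example, not the exact code from the project.&lt;/p&gt;

```python
# Entity labels treated as "real" names here. This set is an assumption for the example.
KEEP_LABELS = {"PERSON", "ORG", "GPE"}


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline. Assumes spaCy and the named model are installed."""
    import spacy  # imported lazily so named_entities() has no hard dependency

    return spacy.load(model_name)


def named_entities(text, nlp):
    """Return the text of every entity whose label is in KEEP_LABELS."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in KEEP_LABELS]
```

&lt;p&gt;Usage would be something like &lt;code&gt;named_entities("xkq7 Acme Holdings lease John Smith", load_pipeline())&lt;/code&gt;. What comes back depends on the model and its version, so treat the labels as a starting point and check the results against your own filenames.&lt;/p&gt;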

&lt;p&gt;NER is a rabbit hole that leads you down to how machine learning and large language models depend on rather arbitrary rules and semantics to tag and understand the world around them. This was rather serendipitous for me and a great ride.&lt;/p&gt;

&lt;p&gt;"How do I remove gibberish from a string?"&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>spacy</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Having problems completing your project or tasks? Curtis Mayfield is here to help!</title>
      <dc:creator>Jack</dc:creator>
      <pubDate>Sun, 10 Mar 2024 04:15:01 +0000</pubDate>
      <link>https://dev.to/saintpetejackboy/having-problems-completing-your-project-or-tasks-curtis-mayfield-is-here-to-help-56el</link>
      <guid>https://dev.to/saintpetejackboy/having-problems-completing-your-project-or-tasks-curtis-mayfield-is-here-to-help-56el</guid>
      <description>&lt;p&gt;Almost any goal, large or small, can be chunked into smaller "projects". If you are anything like most of us, you sometimes find it difficult to complete a project. The zodiac signs are a complete mismatch and for &lt;em&gt;whatever reason&lt;/em&gt;, it just doesn't come together.&lt;/p&gt;

&lt;p&gt;This happened to me recently. And by recently, I mean for around a year or so. Around May of 2023, Google announced support for Passkey and I quickly rolled out a proprietary implementation over a single weekend for a large project. Drunk off my success, I decided to tackle another triviality: push notifications.&lt;/p&gt;

&lt;p&gt;While this may seem extremely easy for some people, I'm about two decades into this "hobby", and I just couldn't get it to click. I tried several different implementations, even venturing away from my native language (PHP), in the hopes that I could get something, anything, to work.&lt;/p&gt;

&lt;p&gt;Fortunately, I always have a dozen or more things going on at once. A failure today is no big deal: I'll go work on a few other things and feel better about myself after I shoot some fish in a barrel. Days turned to weeks and weeks into months - sitting in my backlog was this project: push notifications.&lt;/p&gt;

&lt;p&gt;Why couldn't I get them to work reliably? What was so difficult about doing this thing that everybody else seemed to have no problem with? Against my ethos, I'd even decided it might be worth a few dollars if I could find a good third party solution - which also spawned several other failed attempts.&lt;/p&gt;

&lt;p&gt;Tenth time is the charm, right? Wrong.&lt;/p&gt;

&lt;p&gt;Instead, I finally got fed up with not having push notifications. A real-world scenario came up where I was no longer just toying in my spare time; rather, we absolutely &lt;strong&gt;needed&lt;/strong&gt; push notifications, and &lt;em&gt;yesterday&lt;/em&gt;. Turns out, a key user was having issues getting SMS and some other notifications, where they needed to be able to respond in real-time via a phone call. Push notifications were an obvious solution: their phone could even initiate the required call (or so you'd think; Chrome removing +tel links is a different post... I ended up having to solve that with Twilio).&lt;/p&gt;

&lt;p&gt;Before I stray too far off track, I'm writing this to share a very useful technique that anybody can use which helps you to complete projects. It can help you do most anything, even beyond programming, as it is just a creative exercise.&lt;/p&gt;

&lt;p&gt;I didn't finally get push notifications to work because I read the right documentation or tutorial: I got them to work by making a directory called "curtis" and imagining that Curtis Mayfield was the 'Pusherman', here to help me send push notifications. Stay with me here, because I know this sounds outrageous. Why would a sane human do this?&lt;/p&gt;

&lt;p&gt;This is a way to not anthropomorphize the &lt;em&gt;problem&lt;/em&gt;, but instead, to manifest the solution through will and intent. It helps occupy the mind and create an integrated distraction from the drudgery. I was no longer upset I couldn't get push notifications to work: I was sad that Curtis wasn't singing.&lt;/p&gt;

&lt;p&gt;As I plopped through various implementations (quickly landing on a successful integration with node, express, pm2 and some other magic), "Diamond in the back, sun roof top" was playing as a medley in my head. A 1972 song from Superfly was the key I needed to unlock my brain.&lt;/p&gt;

&lt;p&gt;This isn't the first time I've done something similar, nor will it be the last. Just don't get carried away and start littering your namespaces with obscure references - keep those in the comments. Trust me, you and other people will later come to regret trying to figure out what sunRoofTop() actually does.&lt;/p&gt;

&lt;p&gt;Do you have an unpleasant task that has been giving you difficulty? Maybe you need a way to bring back "deleted"/hidden entries from a database, for example. It sounds like a real drudgery. What if you had a Lazarus to "revive" them, instead? &lt;/p&gt;

&lt;p&gt;When you give your projects an identity and a purpose, no matter how small the segment may be, it can help to breathe life into them. The external concepts and ideas can assist your brain in forming meaningful connections between abstractions and legitimate purpose - carefully discerning between the two. While I did use the word anthropomorphize earlier (attributing human qualities to the project), there is no real limitation - any substitution is sufficient. The overall meta of it is that you are taking two often equally intangible things and conflating them mentally.&lt;/p&gt;

&lt;p&gt;Production just went down? Wait until you hear about my dragon balls...&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Oracle Free Tier Setup Conundrums</title>
      <dc:creator>Jack</dc:creator>
      <pubDate>Sat, 09 Mar 2024 22:43:53 +0000</pubDate>
      <link>https://dev.to/saintpetejackboy/oracle-free-tier-setup-conundrums-3hlm</link>
      <guid>https://dev.to/saintpetejackboy/oracle-free-tier-setup-conundrums-3hlm</guid>
      <description>&lt;p&gt;If you haven't explored &lt;a href="https://www.oracle.com/cloud/free/"&gt;Oracle's free tier&lt;/a&gt;, you're missing out on some truly unbelievable offerings. The moment I stumbled upon their deal, I was whisked back to the days of receiving a beta invitation for Gmail from Google, marveling, "How can they afford to offer this?" But I digress—let's dive into the meat of setting up their services and some of the problems I had along the way.&lt;/p&gt;

&lt;p&gt;The allure of not one, but two free, unmanaged VPSs was irresistible—a siren call for someone whose DNA is practically coded for pressing the "unmanaged VPS" option. I self-hosted for far too many years and graduated to more professional projects where redundancy and uptime became more of a priority. The fact I ever even paid for a VPS is still astonishing, let alone that something might try to compete with that value - for the ~$18 I spend per month on VPS, they've paid for themselves innumerable times over.&lt;/p&gt;

&lt;p&gt;The opportunity to start afresh with a service from &lt;em&gt;frickin' ORACLE&lt;/em&gt; was also just too good to pass up. Despite my historical grievances with Oracle since their acquisition of MySQL, which had me firmly on the "fsck Oracle" bandwagon, I wondered if this offer could thaw my frosty disposition. &lt;/p&gt;

&lt;p&gt;I'd originally learned of this some months ago and refused to complete the sign-up process. There would be excuses like: "I just don't have the time", but somewhere deep down, I think I just hated Oracle. Even before they bought MySQL. The same way most of us recoil when we see a snake or an insect.&lt;/p&gt;

&lt;p&gt;And so, with mixed emotions, I plunged into the setup process, only to be greeted by a labyrinth of steps far removed from the simplicity of launching a VPS elsewhere. My initial foray into creating a compute instance was a comedy of errors, leading to a cluttered graveyard of terminated attempts that refused to initialize.&lt;/p&gt;

&lt;p&gt;Oracle's web interface seemed to whimsically reset my progress and steer me away from my chosen Ubuntu setup towards Oracle's preferred configurations several times. Yet the most pivotal revelation was the necessity of setting up block storage before initiating the compute instance, and never once having to set foot inside the "Create Dedicated Virtual Machine Hosts" area. While the compute instance creation offers an option to create block storage during setup, my compute instances would never actually initialize until there was a pre-existing block. I also had to juggle which area had available resources - this means you create a block likely not knowing if it is going to be in an area with an available server shape for what you are trying to do... among other things.&lt;/p&gt;

&lt;p&gt;The exact sequence in which to approach the steps never really seems clearly defined, and the scope of all the documentation is either too narrow or unrelated to this specific ask. Even if it were clearer, the GUI itself offers options that, when used, mean your instance is never going to initialize.&lt;/p&gt;

&lt;p&gt;The journey was similarly strewn with cryptic error messages and a barrage of unfamiliar terms and syntax, leaving me to wonder about the state of my VNIC default route table, NSG, or security list - all things I'd never considered when configuring a VPS before. It was only after a series of missteps and a forgotten private key for SSH access that I finally managed to establish a connection, naively believing the hardest part was behind me.&lt;/p&gt;

&lt;p&gt;However, next time on Dragon Ball Z: unanticipated issues related to running UFW alongside Oracle's setup led to a frustrating game of cat and mouse with my httpd configuration, iptables rules, and the Oracle GUI. The resolution lay in the intricate dance of VNIC, Network Security Group, and Security List settings: a puzzle that eventually succumbed to my Goku-like hardheadedness... and a dash of luck.&lt;/p&gt;

&lt;p&gt;You can have Network Security Groups and/or Security Lists (which seem to be the older way of doing it) - you can also have Ingress and Egress rules, stateful or stateless.&lt;/p&gt;

&lt;p&gt;I toggled every single conceivable option. I got so tired of making my own rules that I looked at what was already there - and at last, I saw that under the SSH rule, the source port was set to "any", so I figured that was causing me the problems: nobody could connect since everything was being sent in as SSH. Eureka!&lt;/p&gt;

&lt;p&gt;Except, not. That wasn't even it. Worse, my SSH connection terminated. It didn't take me long to add the "any" back to the source port and regain my SSH, but folks, when I tell you that it took me a second to figure out that I should just do that with my http port configurations as well, it was definitely not a New York second.&lt;/p&gt;
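
&lt;p&gt;A likely explanation for why the source port has to stay on "any": a client connects &lt;em&gt;to&lt;/em&gt; port 22 or 80, but it connects &lt;em&gt;from&lt;/em&gt; an arbitrary ephemeral port on its own side. The toy model below sketches that matching logic. The class and function names are hypothetical and this is not Oracle's API, just the general idea of how an ingress rule is checked.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PortRange = Optional[Tuple[int, int]]  # None stands for "any"


@dataclass
class IngressRule:
    """A simplified TCP ingress rule. This is a toy model, not Oracle's API."""

    source_ports: PortRange
    destination_ports: PortRange


def in_range(port, port_range):
    """Return True when the range is "any" or the port falls inside it."""
    if port_range is None:
        return True
    low, high = port_range
    return port in range(low, high + 1)


def allows(rule, source_port, destination_port):
    """Check an incoming connection against both halves of the rule."""
    return in_range(source_port, rule.source_ports) and in_range(
        destination_port, rule.destination_ports
    )


ssh_any_source = IngressRule(source_ports=None, destination_ports=(22, 22))
ssh_locked_source = IngressRule(source_ports=(22, 22), destination_ports=(22, 22))

# A client dialing in from ephemeral port 51544 to port 22 on the server.
works = allows(ssh_any_source, 51544, 22)
blocked = allows(ssh_locked_source, 51544, 22)
```

&lt;p&gt;With the source port left as "any" the connection is allowed. Pin the source port to 22 and the same connection is rejected, which matches the dropped SSH session above. The same reasoning applies to the http rules.&lt;/p&gt;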

&lt;p&gt;So, heed my call and claim your free servers from Oracle—before the winds of change sweep this opportunity into the annals of internet history (and we all have to get grandfathered in on some different plan). Embark now, and you might just conquer the setup in a little over an hour... or two... maybe more, armed with your trusty patience and a willingness to decode a barrage of new jargon. Good luck!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
