<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rafael Buzzi de Andrade</title>
    <description>The latest articles on DEV Community by Rafael Buzzi de Andrade (@buojira).</description>
    <link>https://dev.to/buojira</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F179187%2F21c84f82-ee8b-4967-8262-5190f09e81a7.jpg</url>
      <title>DEV Community: Rafael Buzzi de Andrade</title>
      <link>https://dev.to/buojira</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/buojira"/>
    <language>en</language>
    <item>
      <title>I ain't afraid of no regex</title>
      <dc:creator>Rafael Buzzi de Andrade</dc:creator>
      <pubDate>Fri, 14 Jun 2019 12:06:49 +0000</pubDate>
      <link>https://dev.to/buojira/i-ain-t-afraid-of-no-regex-54ig</link>
      <guid>https://dev.to/buojira/i-ain-t-afraid-of-no-regex-54ig</guid>
      <description>&lt;p&gt;Sometimes knowing what you do &lt;strong&gt;not&lt;/strong&gt; want is as important as knowing what you do want. This is true for regular expressions &lt;em&gt;(regex)&lt;/em&gt;, and it is also true for this article, which is not meant for someone who already uses regex on a daily basis.&lt;/p&gt;

&lt;p&gt;But if you want some approachable examples so that you can use regex more often, I hope this series of articles helps you. &lt;em&gt;(&lt;a href="https://en.wikipedia.org/wiki/Regular_expression"&gt;here&lt;/a&gt; is some history if you want)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;All examples below are available in my &lt;a href="https://github.com/buojira/samples/tree/master/regexing"&gt;GitHub repository&lt;/a&gt;. If you want to test the expressions as you read, I recommend using a page that evaluates them in real time. I enjoy using &lt;a href="https://rubular.com/"&gt;https://rubular.com/&lt;/a&gt;, but feel free to choose your own. Now let us code:&lt;br&gt;
Import these classes in your class:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;import java.util.regex.Matcher;&lt;br&gt;
import java.util.regex.Pattern;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let us say you want to find all words ending in &lt;em&gt;"thing"&lt;/em&gt; within a text. You could do it this way:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pattern pattern = Pattern.compile("\\w+thing");
Matcher matcher = pattern.matcher("A thing I want is to find something, or anything. I do not really care, but I do no want go with nothing at hand.");
while (matcher.find()) {
    System.out.println("Found " + matcher.group());
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you wanted to achieve the exact same result without any regex, it would look something like this:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;String text = "A thing I want is to find something, or anything. I do not really care, but I do no want go with nothing at hand.";
String suffix = "thing";
for (String word : text.split(" ")) {
    // Strip punctuation first; otherwise endsWith would miss "something," and "anything."
    String cleaned = word.replace(".", "").replace(",", "");
    // You want words ending with "thing", but not the word "thing" itself
    if (cleaned.endsWith(suffix) &amp;amp;&amp;amp; cleaned.length() != suffix.length()) {
        System.out.println("Found " + cleaned);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that this is a simple example. In a more complex scenario you would need to manually check many other things.&lt;/p&gt;

&lt;p&gt;But let us continue, shall we?&lt;/p&gt;

&lt;p&gt;What does &lt;em&gt;"\\w+thing"&lt;/em&gt; mean? Well, &lt;em&gt;"thing"&lt;/em&gt; is the suffix you want; I believe that part is pretty obvious. So let us take a look at &lt;em&gt;"\\w+"&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  "\\"
&lt;/h3&gt;

&lt;p&gt;When you see two backslashes in a Java string literal, it merely means one escape character escaping another escape character. So read it as if it were only one backslash (&lt;em&gt;"\w+thing"&lt;/em&gt;).&lt;/p&gt;
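
&lt;p&gt;Here is a minimal sketch of that escaping at work. The class name &lt;em&gt;BackslashDemo&lt;/em&gt; and the constant &lt;em&gt;REGEX&lt;/em&gt; are just names I made up for illustration; the point is only that the string handed to the regex engine holds a single backslash.&lt;/p&gt;

```java
class BackslashDemo {
    // In Java source code the literal is written with two backslashes...
    static final String REGEX = "\\w+thing";

    public static void main(String[] args) {
        // ...but the string the regex engine receives contains only one.
        System.out.println(REGEX);          // prints \w+thing
        System.out.println(REGEX.length()); // prints 8
    }
}
```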

&lt;h3&gt;
  
  
  \w
&lt;/h3&gt;

&lt;p&gt;It means any word character: any letter from a to z &lt;em&gt;(and A to Z)&lt;/em&gt;, any digit and &lt;em&gt;"_".&lt;/em&gt; Could you write it in a different way? Yes, the regex &lt;em&gt;"[a-zA-Z0-9_]+thing"&lt;/em&gt; has the exact same result &lt;em&gt;(we will talk about the brackets in a moment)&lt;/em&gt;. If you believe this more explicit variant will be easier to maintain, go for it. Regex, like most things, has many ways to get the same result. So, the brackets...&lt;/p&gt;
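
&lt;p&gt;If you want to see which characters count as word characters, here is a small sketch. The sample text &lt;em&gt;"room_42!"&lt;/em&gt;, the class &lt;em&gt;WordCharDemo&lt;/em&gt; and the helper &lt;em&gt;firstMatch&lt;/em&gt; are my own assumptions for illustration, and it relies on Java's default (non-Unicode) matching mode.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharDemo {
    // Returns the first match of regex in text, or an empty string if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String text = "room_42!"; // hypothetical sample text
        System.out.println(firstMatch("\\w+", text));          // prints room_42
        System.out.println(firstMatch("[a-zA-Z0-9_]+", text)); // prints room_42
        System.out.println(firstMatch("[a-zA-Z_]+", text));    // prints room_ (no digits in this class)
    }
}
```

&lt;p&gt;The last line shows why the digits matter: leave them out of the brackets and the match stops at the first number.&lt;/p&gt;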

&lt;h3&gt;
  
  
  [  ]
&lt;/h3&gt;

&lt;p&gt;They list the characters you are willing to accept at that position. If you want only an &lt;em&gt;"a"&lt;/em&gt; or a &lt;em&gt;"b"&lt;/em&gt;, you would write &lt;em&gt;[ab]&lt;/em&gt;. If you want lowercase letters from a to z, you would write &lt;em&gt;[a-z]&lt;/em&gt;; if you want only lowercase a to h, write &lt;em&gt;[a-h]&lt;/em&gt;, and so on. Oh... you will notice that the samples have way more brackets than needed, but the results are the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  +
&lt;/h3&gt;

&lt;p&gt;This is not an append operation. The &lt;em&gt;"+"&lt;/em&gt; means that whatever is on its left is mandatory: it must appear one or more times. If you replace it with a &lt;em&gt;"*"&lt;/em&gt;, which means optional (zero or more times), you will see that the word &lt;em&gt;"thing"&lt;/em&gt; is now displayed as well.&lt;/p&gt;

&lt;p&gt;Go on. Try it. I will wait here...&lt;/p&gt;
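
&lt;p&gt;In case you would rather compare the two side by side, here is a rough sketch using the same text as before. The class &lt;em&gt;QuantifierDemo&lt;/em&gt; and the helper &lt;em&gt;findAll&lt;/em&gt; are hypothetical names of mine, not something from the repository.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Joins every match of regex in text into one comma-separated string.
    static String findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (matcher.find()) {
            if (found.length() != 0) {
                found.append(", ");
            }
            found.append(matcher.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "A thing I want is to find something, or anything. "
                + "I do not really care, but I do no want go with nothing at hand.";
        // "+" demands at least one word character before "thing"
        System.out.println(findAll("\\w+thing", text)); // prints something, anything, nothing
        // "*" also accepts zero word characters, so the bare "thing" shows up
        System.out.println(findAll("\\w*thing", text)); // prints thing, something, anything, nothing
    }
}
```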

&lt;p&gt;Now, if you take off this quantifier character &lt;em&gt;("\wthing")&lt;/em&gt;, you will see a different result. It returns the following values:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ething&lt;br&gt;
ything&lt;br&gt;
othing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is because the matcher understands that you want a word character before &lt;em&gt;"thing"&lt;/em&gt;, but only one. Do you want two? Use &lt;em&gt;"\w{2}thing"&lt;/em&gt; and you will get:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;mething&lt;br&gt;
nything&lt;br&gt;
nothing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Do you want at least three characters before the suffix, without limiting the size? Use &lt;em&gt;"\w{3,}thing"&lt;/em&gt; and &lt;em&gt;"nothing"&lt;/em&gt; will not be returned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;something&lt;br&gt;
anything&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Do you want at least one character but no more than three? Try &lt;em&gt;"\w{1,3}thing"&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;omething&lt;br&gt;
anything&lt;br&gt;
nothing&lt;/p&gt;
&lt;/blockquote&gt;
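
&lt;p&gt;The counted variants above can be checked in one go with a sketch like the one below. Again, &lt;em&gt;CountedQuantifierDemo&lt;/em&gt; and &lt;em&gt;findAll&lt;/em&gt; are names I invented for this illustration.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountedQuantifierDemo {
    // Joins every match of regex in text into one comma-separated string.
    static String findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (matcher.find()) {
            if (found.length() != 0) {
                found.append(", ");
            }
            found.append(matcher.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "A thing I want is to find something, or anything. "
                + "I do not really care, but I do no want go with nothing at hand.";
        String[] regexes = {"\\wthing", "\\w{2}thing", "\\w{3,}thing", "\\w{1,3}thing"};
        for (String regex : regexes) {
            System.out.println(regex + " finds " + findAll(regex, text));
        }
        // prints:
        // \wthing finds ething, ything, othing
        // \w{2}thing finds mething, nything, nothing
        // \w{3,}thing finds something, anything
        // \w{1,3}thing finds omething, anything, nothing
    }
}
```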

&lt;p&gt;And now you might be thinking that you do not want broken words in your results. Try &lt;em&gt;"\W\w{1,3}thing"&lt;/em&gt;. This &lt;em&gt;"\W"&lt;/em&gt; means any non-word character, the exact opposite of &lt;em&gt;"\w"&lt;/em&gt;. Keep in mind that each match now also includes that leading character (here, a space). The result will be:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;anything&lt;br&gt;
nothing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It could have been written &lt;em&gt;"\s\w{1,3}thing"&lt;/em&gt; as well. &lt;em&gt;"\s"&lt;/em&gt; means any whitespace character (yes, &lt;em&gt;"\S"&lt;/em&gt; means any non-whitespace character).&lt;/p&gt;
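
&lt;p&gt;To make the leading character visible, this sketch wraps every match in square brackets. The class &lt;em&gt;LeadingCharDemo&lt;/em&gt; and the helper &lt;em&gt;findAllBracketed&lt;/em&gt; are assumptions of mine, used only for illustration.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingCharDemo {
    // Wraps every match in square brackets so leading spaces become visible.
    static String findAllBracketed(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (matcher.find()) {
            found.append('[').append(matcher.group()).append(']');
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "A thing I want is to find something, or anything. "
                + "I do not really care, but I do no want go with nothing at hand.";
        // Both patterns consume the space in front of the word as part of the match.
        System.out.println(findAllBracketed("\\W\\w{1,3}thing", text)); // prints [ anything][ nothing]
        System.out.println(findAllBracketed("\\s\\w{1,3}thing", text)); // prints [ anything][ nothing]
    }
}
```

&lt;p&gt;If the extra space bothers you, calling &lt;em&gt;trim()&lt;/em&gt; on each match is one simple way to drop it.&lt;/p&gt;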

&lt;p&gt;As a developer, you probably thought &lt;em&gt;"What would happen if there were a target word at the beginning of the text?"&lt;/em&gt; &lt;del&gt;(In the repository there is a solution. It does not rely purely on regex, but hey, we are not confined to one pure solution, right?)&lt;/del&gt;&lt;/p&gt;
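
&lt;p&gt;One possible answer, which is not necessarily what the repository does, is the word boundary &lt;em&gt;"\b"&lt;/em&gt;. It matches a position instead of a character, so it also works at the very start of the text. The sample sentence, the class &lt;em&gt;WordBoundaryDemo&lt;/em&gt; and the helper &lt;em&gt;findAllBracketed&lt;/em&gt; below are hypothetical, just a sketch to compare both patterns.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Wraps every match in square brackets so leading spaces become visible.
    static String findAllBracketed(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (matcher.find()) {
            found.append('[').append(matcher.group()).append(']');
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "Anything goes, but nothing sticks."; // hypothetical sample text
        // \W needs a real character before the word, so the first word is missed.
        System.out.println(findAllBracketed("\\W\\w{1,3}thing", text)); // prints [ nothing]
        // \b only needs a boundary, which also exists at the start of the text.
        System.out.println(findAllBracketed("\\b\\w{1,3}thing", text)); // prints [Anything][nothing]
    }
}
```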

&lt;p&gt;Try as many variations as you want.&lt;/p&gt;

&lt;p&gt;See you soon with more complex situations involving regex.&lt;/p&gt;

</description>
      <category>regex</category>
      <category>java</category>
    </item>
  </channel>
</rss>
