Attempting to Learn Go - Listing Files By Extension

Steve Layton ・6 min read

Hello World

Near the end of the last post, I noted we would put the static site generator project aside for the time being. I decided that to keep things moving forward I'd change up what we're looking at every few posts. @ladydascalie suggested a couple of exercises that I thought would be good to tackle. This time around we are going to take a swing at the first idea.


End Goal

  • Write a program to sort files within a folder by their extension
    • Later make it sort them in logical folders ex: .txt in Documents, .jpg in Images etc...

We are going to focus on the first point this time around. The idea is that we'll take a bunch of file names (strings), group them by extension, and print them to standard out in alphabetical order. With that in mind, I decided to start with a slice of filename-like strings, that is, strings with a period (.) in there somewhere. We can then range through this slice. In each step of our range, we will strings.Split() the string at each period. If Split() returns more than one element, we have an extension. Extensions are often two to four characters but could be any length. We're not judging, and will take anything after the last period. The extension and the filename will go into a map[string][]string, keyed by extension. We can imagine our final map as JSON which looks something like:

{
    "epub": [
        "lil-go-book.epub"
    ],
    "jpg": [
        "as23dsd.jpg"
    ],
    "md": [
        "README.md"
    ],
    "mp3": [
        "something.mp3"
    ],
    "pdf": [
        "go-in-action.pdf"
    ],
    "txt": [
        "asdf.txt",
        "qwerwe.txt"
    ]
}

In fact, I'll add in a feature to print the list as either plain text or a JSON object. Then you could pipe it to jq, which might be useful.


Let's Go

Let's take a look at our first iteration of the code! It follows the pattern I laid out in my head and works as expected - which was a nice touch.

package main

import (
  "fmt"
  "strings"
)

func main() {
  var m = make(map[string][]string)
  list := []string{"no-ext", "README.md", "asdf.txt", "qwe.rwe.txt", "as23dsd.jpg", "something.mp3", "go-in-action.pdf", "lil-go-book.epub"}
  for _, s := range list {
    ext := strings.Split(s, ".")
    if len(ext) > 1 {
      m[ext[len(ext)-1]] = append(m[ext[len(ext)-1]], s)
    }
  }
  fmt.Printf("%v", m)
}

From here I added an if statement to account for files with no extensions. While we're at it, let's add a sort.Strings() call (which also means importing sort) so we print each group in alphabetical order. I'm not sorting the extensions themselves at this point; that comes later. You can see our small tweaks in the snippet below.

...
  for _, s := range list {
    ext := strings.Split(s, ".")
    if len(ext) > 1 {
      m[ext[len(ext)-1]] = append(m[ext[len(ext)-1]], s)
    }
    if len(ext) == 1 {
      m["no-ext"] = append(m["no-ext"], s)
    }
    sort.Strings(m[ext[len(ext)-1]])
  }
  fmt.Printf("%v", m)
}
...

Edit: As pointed out by @detunized, the sort.Strings() call is not in the best spot. As written in these examples, it runs on every loop iteration, which is not what we want in the end.



Hah. You caught me! The sort should have been moved up into the print function(s) at the very least. It's a bad design decision - I put it there at first just for the sake of simplicity and never got around to cleaning it up. It doesn't really matter in a directory of a few files but would really impact performance in a larger directory. Something like the following might be fine, and still pretty simple to follow.

func plainList(m map[string][]string, v []string) {
    for _, value := range v {
        sort.Strings(m[value])
        for _, file := range m[value] {
            fmt.Println(file)
        }
    }
}

I think I may update the article to make sure it's called out for clarity.


Do It For Real-ish

We have the basic program done! Now we need to be able to run it against the actual file system. To do this we are going to use the os and io/ioutil packages from the standard library, as well as reflect. We're going to add a couple of different pieces in this iteration of the code, so let's dive in.

package main

import (
  "fmt"
  "io/ioutil"
  "os"
  "reflect"
  "sort"
  "strings"
)

In main() we're adding os.Getwd() to grab the user's current working directory. If we can't determine it for some reason, we print a slightly more detailed error message and exit. Note that we don't panic() but instead call os.Exit(1). Why? Exiting with an error code felt better in this situation than a wordy panic(). If that succeeds, we try to read the directory, again exiting if we can't. We also check whether each entry is a directory and skip it if so, since we're only looking at files for now. We could sort them into a "directory" group next time, I suppose.

func main() {
  wd, err := os.Getwd()
  if err != nil {
    msg := fmt.Sprintf("An error occurred getting the current working directory.\n%s", err)
    fmt.Println(msg)
    os.Exit(1)
  }

  dir, err := ioutil.ReadDir(wd)
  if err != nil {
    msg := fmt.Sprintf("An error occurred reading the current working directory.\n%s", err)
    fmt.Println(msg)
    os.Exit(1)
  }

  var m = make(map[string][]string)
  for _, file := range dir {
    if !file.IsDir() {
      fileName := file.Name()
      ext := strings.Split(fileName, ".")
      if len(ext) > 1 {
        m[ext[len(ext)-1]] = append(m[ext[len(ext)-1]], fileName)
      }
      if len(ext) == 1 {
        m["no-ext"] = append(m["no-ext"], fileName)
      }
      sort.Strings(m[ext[len(ext)-1]])
    }
  }

We're using reflect to get the keys of our map, which are the extension strings. Thank goodness for Go Docs! This will let us print them out as a nested list, with each extension followed by the files in its group. Note that MapKeys() returns the keys in no particular order.

  values := reflect.ValueOf(m).MapKeys()

  for i, k := range values {
    fmt.Println(values[i])
    for _, val := range m[k.String()] {
      fmt.Println(" -", val)
    }
  }
}

That seems to fulfill the base program requirements...


But Wait There's More

We're not done yet! We need to do one more iteration. Since this post is already getting a bit long, we're going to skip forward. I'm going to add in several things we mentioned above: JSON output, a "plain" ls style, and the nested style we've been printing so far. We'll read the output format from the command line and use a simple switch statement to choose the right one. I wasn't very explicit with the variable names, but it should be easy enough to follow.

package main

import (
  "encoding/json"
  "fmt"
  "io/ioutil"
  "os"
  "reflect"
  "sort"
  "strings"
)

The first thing I did in this iteration was pull the print routine out of the main loop and into its own function. I then made two more print functions, one for each of the other output types. I was going to try to be clever and overcomplicate things by having only one "print" function. In the end, I decided they were different enough that it would be fine to have each routine on its own.

func plainList(m map[string][]string, v []string) {
  for _, value := range v {
    for _, file := range m[value] {
      fmt.Println(file)
    }
  }
}

func nestedList(m map[string][]string, v []string) {
  for i, value := range v {
    fmt.Println(v[i])
    for _, file := range m[value] {
      fmt.Println(" - ", file)
    }
  }
}

If you look at the next three err sections you'll see that they are more or less the same. If I extend this program any further beyond the basics it may be worth pulling these bits out. We could make an isOK() type of function I suppose. This function would check the error and either exit or return as needed at the time.

func jsonList(m map[string][]string) {
  j, err := json.Marshal(m)
  if err != nil {
    msg := fmt.Sprintf("An error occurred formatting the JSON.\n%s", err)
    fmt.Println(msg)
    os.Exit(1)
  }
  fmt.Printf("%s", j)
}

func main() {
  wd, err := os.Getwd()
  if err != nil {
    msg := fmt.Sprintf("An error occurred getting the current working directory.\n%s", err)
    fmt.Println(msg)
    os.Exit(1)
  }

  dir, err := ioutil.ReadDir(wd)
  if err != nil {
    msg := fmt.Sprintf("An error occurred reading the current working directory.\n%s", err)
    fmt.Println(msg)
    os.Exit(1)
  }

  var m = make(map[string][]string)
  for _, file := range dir {
    if !file.IsDir() {
      fileName := file.Name()
      ext := strings.Split(fileName, ".")
      if len(ext) > 1 {
        m[ext[len(ext)-1]] = append(m[ext[len(ext)-1]], fileName)
      }
      if len(ext) == 1 {
        m["no-ext"] = append(m["no-ext"], fileName)
      }
      sort.Strings(m[ext[len(ext)-1]])
    }
  }
  values := reflect.ValueOf(m).MapKeys()

To print the extensions in alphabetical order, I've added this quick loop. We use the keys that we got from reflect to build an ordered list of the extensions. The now-sorted extensions are passed into our print functions.

  var extensions []string
  for _, value := range values {
    extensions = append(extensions, value.String())
  }
  sort.Strings(extensions)

When the program executes, we check the number of command line arguments. If one was supplied (os.Args has more than one element, since the first is always the program name), we check whether it matches one of the cases. If not, we print the usage instructions. If there are no command line arguments, we print out the nested-style file list.

  if len(os.Args) > 1 {
    switch arg := os.Args[1]; arg {
    case "plain":
      plainList(m, extensions)
    case "nested":
      nestedList(m, extensions)
    case "json":
      jsonList(m)
    default:
      fmt.Println("Usage: gls [plain|nested|json]")
    }
  } else {
    nestedList(m, extensions)
  }
}

Next time

And there we go! The post is getting a bit long so we'll hold off on the "bonus goal" of sorting files into directories. This code will become the base for that next time around. In the meantime, how would you have written a similar program? Let me know in the comments!


You can find the code for this and most of the other Attempting to Learn Go posts in the repo on GitHub.



Discussion


Steve, why do you sort inside the loop on every iteration?

for _, file := range dir {
    if !file.IsDir() {
        ...
        sort.Strings(m[ext[len(ext)-1]]) // <-- HERE
    }
}

And here's my take on it. You can sort by predicate. It's not exactly very efficient though, since the extension is recalculated every time. But come on, Go could be really annoying sometimes. Look at this verbosity:

type ByExt []string

func (a ByExt) Len() int           { return len(a) }
func (a ByExt) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByExt) Less(i, j int) bool { return filepath.Ext(a[i]) < filepath.Ext(a[j]) }

...

func main() {
    ...
    files := []string{}
    for _, file := range dir {
        if !file.IsDir() {
            files = append(files, file.Name())
        }
    }
    sort.Sort(ByExt(files))
    ...
}

In Ruby that would be:

filenames.sort_by { |f| File.extname f }
 

Nah, no need for the boilerplate to define a custom sort. This would do for sorting the map:

var m = make(map[string][]string)
for _, file := range dir {
    if !file.IsDir() {
        fileName := file.Name()
        ext := strings.Split(fileName, ".")
        switch {
        case len(ext) > 1:
            m[ext[len(ext)-1]] = append(m[ext[len(ext)-1]], fileName)
        case len(ext) == 1:
            m["no-ext"] = append(m["no-ext"], fileName)
        }
    }
}
for ext := range m { sort.Strings(m[ext]) }
 

Edit: sorting the []string within the map[string][]string. The map itself can't be sorted.

I don't have a map in my version. I sort an array by a predicate.

Yes, I have seen it. Your approach is different in that it just sorts a list of filenames. The original intent is to sort files by extension into different buckets.

Your use of filepath.Ext() is quite clever. I hadn't thought of that.

This would make the example even shorter:

var m = make(map[string][]string)
for _, file := range dir {
    if !file.IsDir() {
        fileName := file.Name()
        ext := filepath.Ext(fileName)
        m[ext] = append(m[ext], fileName)
    }
}
for ext := range m { sort.Strings(m[ext]) }

@detunized @dirkolbrich

Thanks for the replies! filepath.Ext()! Didn't occur to me to try that. It goes to show that the standard library really is pretty complete.

Dirk, I like the for ext := range m { sort.Strings(m[ext]) } solution. With it I wouldn't need to have a separate sort in each "print" function, and it's much clearer that way.

 


You don't mind?

func plainlist(m map[string][]string, order string) string {
    // 1. get all keys of the map
    var keys []string
    for k := range m {
        keys = append(keys, k)
    }

    // 2. sort by order type
    switch order {
    case "desc", "Desc", "DESC":
        for ext := range m {
            sort.Sort(sort.Reverse(sort.StringSlice(m[ext])))
        }
        sort.Sort(sort.Reverse(sort.StringSlice(keys)))
    default:
        for ext := range m {
            sort.Strings(m[ext])
        }
        sort.Strings(keys)
    }

    // 3. build a concatenated string
    var list string
    for _, k := range keys {
        list = fmt.Sprintf("%v\n%v", list, m[k])
    }
    return list
}

use it with:

fmt.Println(plainlist(m, "asc"))
fmt.Println(plainlist(m, "desc"))
 

Hey Steve, fantastic start.

You've come up with some great solutions in there, so I thought I'd share my own.

I restricted myself to fitting a subset of what you've solved thus far, that is to say, get all the files organised by category, and print them out as JSON. I've ignored plain / nested output, since that is somewhat trivial/not business logic.

Here's my solution:

package main

import (
    "encoding/json"
    "flag"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

var directory string

func main() {
    flag.StringVar(&directory, "dir", ".", "sorter -dir ./path/to/dir")
    flag.Parse()

    files, err := ioutil.ReadDir(directory)
    if err != nil {
        log.Fatal(err)
    }

    var categories = make(map[string][]string)
    for _, file := range files {
        // skip directories.
        if file.IsDir() {
            continue
        }

        ext := filepath.Ext(file.Name())
        name := strings.TrimSuffix(file.Name(), ext)

        // empty name signified a dotfile, skip that.
        if name == "" {
            continue
        }

        // get the absolute path to the file, or error out
        fpath, err := filepath.Abs(filepath.Join(directory, file.Name()))
        if err != nil {
            log.Fatalf("failed building absolute path: %v", err)
        }

        // trim dots before adding to the map.
        ext = strings.TrimPrefix(ext, ".")
        categories[ext] = append(categories[ext], fpath)
    }

    if err := json.NewEncoder(os.Stdout).Encode(categories); err != nil {
        panic(err)
    }
}

As you can see, I've drastically cut down on the number of operations needed to get there, as well as corrected for a few problems you weren't looking out for yet. These are mainly:

  • You should skip dotfiles or hidden files, which start with a . character (at least by default), as these are frequently config files or important somehow.

  • You're expending a lot of effort sorting / printing your data, when really all you need is a map to handle the listing

output from my program (against a sample directory):

usage: sorter -dir ./sample | jq

{
  "jpg": [
    "/Users/bc/code/Personal/sorter/samples/3.jpg"
  ],
  "pdf": [
    "/Users/bc/code/Personal/sorter/samples/2.pdf"
  ],
  "txt": [
    "/Users/bc/code/Personal/sorter/samples/1.txt",
    "/Users/bc/code/Personal/sorter/samples/2.txt",
    "/Users/bc/code/Personal/sorter/samples/3.txt"
  ]
}

If I wanted plain output, with a map I could do something like this:

for key, category := range categories {
    fmt.Println("kind:", key)
    for _, file := range category {
        fmt.Println("\t", file)
    }
}

which would output like so:

kind: txt
     /Users/bc/code/Personal/sorter/samples/1.txt
     /Users/bc/code/Personal/sorter/samples/2.txt
     /Users/bc/code/Personal/sorter/samples/3.txt
kind: pdf
     /Users/bc/code/Personal/sorter/samples/2.pdf
kind: jpg
     /Users/bc/code/Personal/sorter/samples/3.jpg

This is somewhat trite and gross but you get the point, dealing with one map makes this much easier to handle!

Looking forward to seeing what you come up with next!

 

I think the core of my issue is that I'm not leaning on the standard library as much as I should. I didn't realize filepath.Ext() was a thing. :/ Yeah, I read "sorting files by ext" as just that: sorting alphabetically. Had I left that out, I would have been done quite a bit quicker. I suppose that made me go off the rails a bit, so to speak. The different printing methods were not needed at all, but what are you gonna do lol.