DEV Community

Bruce Axtens


Linq Love

I've been struggling with my photo collection. I'm back to using Shotwell on Linux, but for a while I was fiddling with a Windows solution.

In the process, I tried to reduce the collection to just the unique items. I had my "Pictures" folder, plus various copies and mis-catalogued items in places like "MasterPictures" and "Transfer".

So I tried writing a tool that gets a list of the photos (and everything else) in one folder tree, sorts it biggest first, and then hunts through another folder tree for files that match on both size and content, generating batch statements to delete the copies.

For those unfamiliar with Urdu, NaqlTalash (نقل تلاش) means "Duplicate Search". (Whether an Urdu speaker would actually say that is another question entirely.)

There are a number of unaddressed inefficiencies and gaps. For example, since a source file may be compared against many targets, one might read it into a byte array once, up front. The program also doesn't deal with being given the wrong number of arguments (it just exits silently), with arguments that don't resolve to real directories, or with folders you don't have adequate permissions to read.
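Reading the source file only once might look something like this. It's a sketch, not code from the repository: CachedCompare and FindIdentical are hypothetical names, and the span-based SequenceEqual assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch only: CachedCompare and FindIdentical are hypothetical names,
// not part of the NaqlTalash source.
public static class CachedCompare
{
    // Returns the candidates whose contents are byte-for-byte identical to
    // the file at sourcePath. The source is read from disk exactly once.
    public static List<string> FindIdentical(string sourcePath, IEnumerable<string> candidatePaths)
    {
        byte[] sourceBytes = File.ReadAllBytes(sourcePath);
        var identical = new List<string>();

        foreach (var path in candidatePaths)
        {
            // Cheap size check first; only read candidates of the same length.
            if (new FileInfo(path).Length != sourceBytes.Length)
                continue;

            byte[] candidateBytes = File.ReadAllBytes(path);
            if (sourceBytes.AsSpan().SequenceEqual(candidateBytes))
                identical.Add(path);
        }
        return identical;
    }
}
```

Each candidate is still read in full, but the source's bytes are loaded once rather than once per comparison.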

The source is up on GitHub.

```csharp
using System;
using System.Linq;
using System.IO;

namespace NaqlTalash
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 2)
            {
                var sourceList = from file in Directory.GetFiles(args[0], "*.*", SearchOption.AllDirectories)
                                 let finfo = new FileInfo(file)
                                 where finfo.Length > 0
                                 orderby finfo.Length descending
                                 select finfo;
```

sourceList now holds a query over FileInfo objects representing all the files in args[0]'s folder tree, keeping only files longer than zero bytes and ordering them by size, largest first.
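For comparison, here is roughly the same query in LINQ method syntax, extended to cover two of the gaps mentioned earlier: it rejects an argument that isn't a real directory, and it skips folders it can't read instead of throwing an UnauthorizedAccessException. This is a sketch assuming .NET Core 2.1 or later, where Directory.EnumerateFiles accepts an EnumerationOptions; FileListing and ListFilesLargestFirst are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch assuming .NET Core 2.1+ (EnumerationOptions). FileListing and
// ListFilesLargestFirst are hypothetical names.
public static class FileListing
{
    public static List<FileInfo> ListFilesLargestFirst(string root)
    {
        // Reject arguments that don't name an existing directory.
        if (!Directory.Exists(root))
            throw new DirectoryNotFoundException("Not a directory: " + root);

        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true, // same effect as SearchOption.AllDirectories
            IgnoreInaccessible = true,    // skip unreadable folders instead of throwing
            AttributesToSkip = 0          // include hidden/system files, as GetFiles does
        };

        return Directory.EnumerateFiles(root, "*", options) // "*" matches every name
            .Select(path => new FileInfo(path))
            .Where(fi => fi.Length > 0)          // drop empty files first...
            .OrderByDescending(fi => fi.Length)  // ...then sort, largest first
            .ToList();                           // run the query once
    }
}
```

EnumerationOptions skips hidden and system files by default, which is why AttributesToSkip is cleared here.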

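```csharp
                // ToList() runs the query once. Left as a query, destList would be
                // rebuilt (new FileInfos, re-sorted) every time it is enumerated below.
                var destList = (from file in Directory.GetFiles(args[1], "*.*", SearchOption.AllDirectories)
                                let finfo = new FileInfo(file)
                                where finfo.Length > 0
                                orderby finfo.Length descending
                                select finfo).ToList();
```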

Same again, but for the folder tree under args[1].
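LINQ query expressions use deferred execution: the source expression (here Directory.GetFiles) is evaluated when the query is declared, but the let, where, orderby and select steps run again every time the query is enumerated. If destList is left as a query, the `from item in destList` inside the loop creates fresh FileInfo objects, re-reads their sizes and re-sorts them for every source file; calling ToList() materializes the results once. This small demo, using a hypothetical DeferredDemo class and an in-memory range in place of files, counts how often a Select lambda runs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo only: DeferredDemo is a hypothetical name, and an in-memory range
// stands in for the list of files.
public static class DeferredDemo
{
    // Enumerates a three-element query `passes` times and returns how many
    // times its Select lambda ran.
    public static int CountSelectorCalls(int passes, bool materialize)
    {
        int calls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 3).Select(n => { calls++; return n * 10; });

        if (materialize)
            query = query.ToList(); // the lambda runs here, once per element

        for (int i = 0; i < passes; i++)
        {
            foreach (var item in query) { } // a deferred query re-runs the lambda on every pass
        }
        return calls;
    }
}
```

With four passes, the deferred query runs the lambda 12 times; the materialized one runs it 3 times.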

```csharp
                foreach (var finfo in sourceList)
                {
                    var matchFound = false;
                    var matching = from item in destList where item.Length == finfo.Length select item;
```

For each FileInfo in sourceList, select the items in destList that are the same size as the source file.
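That size filter scans the whole of destList once per source file. One alternative, sketched here on the assumption that the file lists fit comfortably in memory, is to group the destination files by length once with ToLookup and then fetch each source file's same-sized candidates directly. SizeIndex and SameSizePairs are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch: SizeIndex and SameSizePairs are hypothetical names.
public static class SizeIndex
{
    // Yields every (source, candidate) pair whose file lengths are equal.
    public static IEnumerable<(FileInfo Source, FileInfo Candidate)> SameSizePairs(
        IEnumerable<FileInfo> sourceFiles, IEnumerable<FileInfo> destFiles)
    {
        // Built once: maps each length to the destination files of that length.
        ILookup<long, FileInfo> bySize = destFiles.ToLookup(fi => fi.Length);

        foreach (var source in sourceFiles)
        {
            // Indexing a lookup with a missing key returns an empty sequence.
            foreach (var candidate in bySize[source.Length])
                yield return (source, candidate);
        }
    }
}
```

Each pair would still need a content comparison such as FileEquals before anything is marked for deletion.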

```csharp
                    foreach (var match in matching)
                    {
```

Iterate through the matching FileInfos.

```csharp
                        if (FileEquals(match.FullName, finfo.FullName))
                        {
                            if (!matchFound)
                            {
                                Console.WriteLine("REM " + finfo.FullName);
                                Console.WriteLine("REM " + finfo.Length);
                            }
                            matchFound = true;
                            Console.WriteLine("\tDEL \"" + match.FullName + "\"");
                        }
                    }
```

Check whether the two files match on contents; their names are not compared. The first time a match is found for a given source file, we output REM statements giving that file's name and size. For that match and every subsequent one, we output a DEL statement to remove the duplicate file.
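One thing worth guarding against once this output becomes a .CMD file: cmd.exe treats `%` in a batch file as the start of a variable reference, so a file name such as `100% sunset.jpg` could be mangled when the batch file runs. Doubling it to `%%` yields a literal percent sign. The hypothetical BatchLines helper below reproduces the REM/DEL output as a list of strings (which also makes it easy to test) and applies that escaping:

```csharp
using System;
using System.Collections.Generic;

// Sketch: BatchLines is a hypothetical helper. Percent signs are doubled
// because cmd.exe treats % as the start of a variable reference in a batch
// file, and %% produces a literal %.
public static class BatchLines
{
    public static string EscapePercent(string path) => path.Replace("%", "%%");

    public static List<string> ForSource(string sourcePath, long length, IEnumerable<string> duplicatePaths)
    {
        var lines = new List<string>();
        foreach (var dup in duplicatePaths)
        {
            if (lines.Count == 0)
            {
                // Header lines, written only once and only if there is a duplicate.
                lines.Add("REM " + EscapePercent(sourcePath));
                lines.Add("REM " + length);
            }
            lines.Add("\tDEL \"" + EscapePercent(dup) + "\"");
        }
        return lines;
    }
}
```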

```csharp
                }
            }
        }

        static bool FileEquals(string path1, string path2)
        {
            var file1 = File.ReadAllBytes(path1);
            var file2 = File.ReadAllBytes(path2);
            if (file1.Length == file2.Length)
            {
                for (int i = 0; i < file1.Length; i++)
                {
                    if (file1[i] != file2[i])
                    {
                        return false;
                    }
                }
                return true;
            }
            return false;
        }
```

A generic byte-wise file comparison function.
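FileEquals reads both files completely into memory, which is fine for photos but wasteful for large videos. A chunked comparison, sketched here as a hypothetical StreamCompare.FilesEqual (assuming C# 8 or later for `using var`), stops at the first differing block and never holds more than two buffers in memory:

```csharp
using System;
using System.IO;

// Sketch: StreamCompare is a hypothetical name, not part of NaqlTalash.
public static class StreamCompare
{
    // Compares two files in fixed-size chunks rather than loading both into memory.
    public static bool FilesEqual(string path1, string path2, int bufferSize = 81920)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        using var s1 = File.OpenRead(path1);
        using var s2 = File.OpenRead(path2);
        if (s1.Length != s2.Length) return false;

        var b1 = new byte[bufferSize];
        var b2 = new byte[bufferSize];
        while (true)
        {
            int n1 = ReadFully(s1, b1);
            int n2 = ReadFully(s2, b2);
            if (n1 != n2) return false;
            if (n1 == 0) return true; // both streams exhausted with no difference
            if (!b1.AsSpan(0, n1).SequenceEqual(b2.AsSpan(0, n2))) return false;
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadFully(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = s.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```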

```csharp
    }
}
```

Output from a run is redirected to a .CMD file so that it can be checked before being executed.
