Tool tips: Find
The find utility is so useful for automation that it deserves a special mention on its own. I will not attempt to cover all of its features, only the ones that I use on a daily basis.
I typically use find for two purposes:
- To create a list of files matching some search terms.
- To execute a batch operation on those files using the -exec argument.
In Shell Scripting: Becoming a Wizard, I laid out a recipe for productive and useful bash scripts that uses a while loop over some sort of multi-line input. This construct is more powerful than find's built-in -exec flag, but that power is not always needed.
Here is how I use find:
Stage 1: Build up a list of files to operate on.
By default, find just prints the list of matching files, which is really useful because you can check the list of files you'll be working with before you move on to actually doing something to them. It also makes find useful as a general purpose file searching tool that's available on most Unix systems. Note that find does not inspect the contents of a file directly, only its external attributes. If you want to search through file contents from the command line, check out grep/fgrep or ack. (fgrep -r searchterm file1 [file2 ...])
The general form for find looks something like this:
find directory argument1 [argument2 ...]
Typically, I want to run a recursive search from my current directory (which is '.' by Unix convention). I'll include a bash prompt, signified by the $ character, and example output from the find command to better illustrate what's going on:
$ find .
.
./bar.JPG
./baz.png
./foo
./foo/foo2.jpg
./foo.jpg
This command (find .) is the simplest working invocation of find, and produces a full listing of every file and directory inside your current directory (including the current directory itself). I rarely want this, though, so I will begin by limiting my search. A common way I do this is by searching for filename matches. For instance:
$ find . -name '*.jpg'
./foo/foo2.jpg
./foo.jpg
This uses wildcard/glob syntax to search only for files with names ending in .jpg. If we also wanted to include files ending in .JPG or any other case combination, we could use the -iname (case insensitive name) expression instead.
$ find . -iname '*.jpg'
./bar.JPG
./foo/foo2.jpg
./foo.jpg
This is working well so far. Suppose we wanted to limit our recursive search to a certain depth. We can do that using the -maxdepth flag. To search only the immediate children of the current directory, we use -maxdepth 1. To search immediate children and the children of immediate subdirectories, use -maxdepth 2, and so on. (GNU find treats -maxdepth as a global option and prints a warning unless it comes before tests such as -iname, so it is safest to put it first.)
$ find . -maxdepth 1 -iname '*.jpg'
./bar.JPG
./foo.jpg
Let's say, for the sake of example, that we wanted to search on a partial filename instead, for files with 'foo' in their name:
$ find . -name '*foo*'
./foo
./foo/foo2.jpg
./foo.jpg
Whoops! We don't want any directories in this case, so let's narrow our search criteria to files only and not directories:
$ find . -name '*foo*' -type f
./foo/foo2.jpg
./foo.jpg
Perfect! -type f selects regular files and -type d selects directories. In this example we could also have narrowed the glob itself, as in -name '*foo*.jpg'. However, -type is still useful since Unix systems allow file 'extensions' on directory names and allow files without extensions.
How about if we only want to find files in a certain subfolder? We can use -path:
$ find . -path '*foo/*.jpg'
./foo/foo2.jpg
A trivial example, but this is really useful for searching large source code trees for a particular file. For instance, if I know there's a class that has Twitter in its filename and resides in some subfolder of Services, a search for -path '*Services*Twitter*' should turn it up. The case insensitive version of this flag is -ipath (GNU find also accepts -iwholename as a synonym).
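As a rough sketch of that kind of search, the snippet below builds a throwaway source tree and looks for the class file. The directory layout and file names (src/Services/Social/TwitterClient.php and so on) are hypothetical, made up purely for illustration. In -path patterns the * wildcard also matches the / separator, which is why one pattern can span several directory levels.

```shell
# Hypothetical project layout, created in a temporary directory.
tree=$(mktemp -d)
mkdir -p "$tree/src/Services/Social" "$tree/src/Models"
touch "$tree/src/Services/Social/TwitterClient.php" \
      "$tree/src/Services/Mailer.php" \
      "$tree/src/Models/Twitter.php"

# Match on the whole path: "Services" somewhere, then "Twitter" later on.
hits=$(cd "$tree" && find . -type f -path '*Services*Twitter*')
printf '%s\n' "$hits"

rm -rf "$tree"
```

Only the file under Services is reported; src/Models/Twitter.php is skipped because its path never mentions Services.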
What if we want to use some logic in our search? find supports that. For instance, we can search for all jpg files that do NOT contain 'foo' in their name using the ! operator:
$ find . -iname '*.jpg' ! -name '*foo*'
./bar.JPG
How about if we want to find all jpg and png files? Use -o for logical or. It is true if the expression on either side of it is true (or both, since this is an ordinary inclusive or):
$ find . -iname '*.jpg' -o -iname '*.png'
./bar.JPG
./baz.png
./foo/foo2.jpg
./foo.jpg
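To check how -o treats a file that satisfies both sides, here is a small sketch that recreates the example tree in a temporary directory (the demo variable name is my own). foo.jpg matches both '*foo*' and '*.jpg', and it should still be listed exactly once.

```shell
# Recreate the article's example files in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/foo"
touch "$demo/bar.JPG" "$demo/baz.png" "$demo/foo.jpg" "$demo/foo/foo2.jpg"

# Either test may match; entries matching both are printed only once.
either=$(cd "$demo" && find . -name '*foo*' -o -iname '*.jpg' | LC_ALL=C sort)
printf '%s\n' "$either"

rm -rf "$demo"
```

The listing contains the foo directory as well, since nothing here restricts the search to regular files.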
Tons of other functionality exists in find, although the above are all the commands I use on a daily basis. find can filter on timestamps such as modification time, on ownership, and on more unusual file types like FIFOs, and it can combine tests with an explicit AND (-a, which is also what find assumes between adjacent tests). Run man find for the full list.
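For a taste of those other filters, here is a minimal sketch using -mtime (age in days) and -size (with the c suffix for bytes), both standard find tests. The file names and thresholds are arbitrary choices for the demonstration, not recommendations.

```shell
logs=$(mktemp -d)
touch "$logs/new.log"
touch -t 200001010000 "$logs/old.log"   # back-date the modification time to 2000-01-01
printf '%200s' '' > "$logs/big.log"     # a 200-byte file of spaces

# Modified less than 7 days ago.
recent=$(cd "$logs" && find . -type f -mtime -7 | LC_ALL=C sort)
# Modified more than 365 days ago.
stale=$(cd "$logs" && find . -type f -mtime +365)
# Larger than 100 bytes.
large=$(cd "$logs" && find . -type f -size +100c)

printf 'recent:\n%s\nstale: %s\nlarge: %s\n' "$recent" "$stale" "$large"
rm -rf "$logs"
```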
Stage 2: Execute a command across all selected files.
Now that we have a list of files we want to use, we can take advantage of find's most useful and powerful flag, -exec. The syntax for -exec seems cryptic at first, but comes to make sense with repeated use.
There are two forms I typically use when calling -exec:
-exec somecommand {} \;
and
-exec somecommand {} +
The former will execute some command once for each file in our search results. The latter will run the command as few times as possible, passing many files as arguments to each invocation. For example:
$ find . -maxdepth 1 -iname '*.jpg' -exec echo file {} \;
file ./bar.JPG
file ./foo.jpg
The command echo file {} is being run for each result, with the path to the result being substituted where the {} is. The ; is just a terminator to let find know that that's the end of the command we want to run, and the backslash keeps bash from treating it as its own command separator. If we get rid of the echo (a debugging trick I use to preview a list of batch commands) the file command will actually be run against all these files, printing out their true type for us:
$ find . -maxdepth 1 -iname '*.jpg' -exec file {} \;
./bar.JPG: PNG image, 64 x 32, 8-bit/color RGBA, non-interlaced
./foo.jpg: JPEG image data, EXIF standard
Good thing I checked - looks like bar.JPG is really a PNG with an incorrect extension!
The file utility is even kind enough to print out the name of the file it's talking about. It's also spiffy enough to accept several files as separate arguments (as do many command line utilities), so we can improve the performance of this batch operation by reducing the number of invocations of file:
$ find . -maxdepth 1 -iname '*.jpg' -exec echo file {} +
file ./bar.JPG ./foo.jpg
Rather than file getting called twice (once for each result) it's only called once, with each result being passed as an argument. In this trivial example this behaves the same. But if you need to run a batch operation over many files, it can make a difference in speed.
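One way to see the difference in invocation count is to count output lines, since each echo prints exactly one line. This is only a sketch on five empty, hypothetical files; the speed gain on a real batch depends on how expensive the command is to start.

```shell
batch=$(mktemp -d)
touch "$batch/a.jpg" "$batch/b.jpg" "$batch/c.jpg" "$batch/d.jpg" "$batch/e.jpg"

# One line of output per echo invocation, so wc -l counts invocations.
per_file=$(find "$batch" -name '*.jpg' -exec echo file {} \; | wc -l | tr -d ' ')
batched=$(find "$batch" -name '*.jpg' -exec echo file {} + | wc -l | tr -d ' ')

printf 'per-file invocations: %s, batched invocations: %s\n' "$per_file" "$batched"
rm -rf "$batch"
```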
Also, if your batch operation involves something like gluing a bunch of files together (like with cat to concatenate files) or opening them all at once in one editor, the + style invocation is necessary. It basically serves the same function as xargs, so there's no reason to pipe the output of find to xargs. Using -exec {} + is safer and faster. xargs is still useful when taking input from another source such as a text file, but beware of paths with spaces in them: xargs splits its input on whitespace by default, so they need to be escaped properly.
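If the list of paths does come from a text file, one way around the escaping problem is to convert the newlines to NUL bytes and use xargs -0, which is widely supported in GNU and BSD xargs. The sketch below assumes one path per line and no newlines inside file names; the file names are hypothetical.

```shell
work=$(mktemp -d)
printf 'hello\n' > "$work/with space.txt"
printf 'world\n' > "$work/plain.txt"

# A newline-separated list of paths, as might come from another tool.
printf '%s\n' "$work/with space.txt" "$work/plain.txt" > "$work/list.txt"

# Convert newlines to NUL bytes so xargs -0 treats each line as one argument.
combined=$(tr '\n' '\0' < "$work/list.txt" | xargs -0 cat)
printf '%s\n' "$combined"

rm -rf "$work"
```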
Gotchas
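If you are combining -exec with logical operations such as -o, you may see some odd results. find is capable of running different -exec commands for different groups of matched files, and this is the default behavior: the implicit AND between adjacent expressions binds more tightly than -o, so an -exec attaches only to the group of tests it directly follows. So for instance: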
$ find . -iname '*.jpg' -o -iname '*.png' -exec echo {} \;
./baz.png
The jpg files don't show up, since the -exec is only running on files that match the second search term (png). The way to fix this is by grouping the logical searches using parentheses. These have to be escaped, otherwise bash will try to interpret them. So now:
$ find . \( -iname '*.jpg' -o -iname '*.png' \) -exec echo {} \;
./bar.JPG
./baz.png
./foo/foo2.jpg
./foo.jpg
Much better!
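The same grouping can be turned to your advantage when you really do want a different command per group. Here is a small sketch against a recreated copy of the example tree; the jpg: and png: labels are just illustrative echo arguments.

```shell
demo=$(mktemp -d)
mkdir "$demo/foo"
touch "$demo/bar.JPG" "$demo/baz.png" "$demo/foo.jpg" "$demo/foo/foo2.jpg"

# Each parenthesized group pairs one test with its own -exec action.
labelled=$(cd "$demo" && find . \
    \( -iname '*.jpg' -exec echo jpg: {} \; \) -o \
    \( -iname '*.png' -exec echo png: {} \; \) | LC_ALL=C sort)
printf '%s\n' "$labelled"

rm -rf "$demo"
```

Because echo succeeds for every jpg, the left group is true for those files and find never evaluates the png group for them.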
Limitations
find's -exec flag is only designed to run one command at a time, and doesn't support fancy bash tricks such as command substitution and piping. If you need those, use the bash while loop outlined in Shell Scripting: Becoming a Wizard. You can still use find to create the list of files you want to operate on; just pipe the output of find into the bash while loop.
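A minimal sketch of that pattern follows. The count_lines helper and the file names are hypothetical, and the loop reads newline-separated paths, so it assumes no file name contains a newline (in bash, find -print0 with read -r -d '' lifts that restriction).

```shell
# count_lines DIR: report the line count of every .txt file under DIR.
# (count_lines is a made-up helper name for this illustration.)
count_lines() {
    (
        cd "$1" || exit 1
        find . -type f -name '*.txt' | while IFS= read -r path; do
            # Command substitution plus a pipe: things a bare -exec cannot express.
            lines=$(wc -l < "$path" | tr -d ' ')
            printf '%s: %s\n' "$path" "$lines"
        done
    )
}

notes=$(mktemp -d)
printf 'one\ntwo\n' > "$notes/a.txt"
printf 'single\n' > "$notes/b c.txt"

report=$(count_lines "$notes" | LC_ALL=C sort)
printf '%s\n' "$report"
rm -rf "$notes"
```

Using IFS= and read -r keeps leading spaces and backslashes in the paths intact, which is why the file with a space in its name comes through unharmed.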
Note that the -exec {} + style invocation may not be present in older versions of find. If that's the case and you have no choice but to use xargs, avoid nasty escaping issues by telling both tools to use the null character as a filename delimiter instead of newlines using find -print0 and xargs -0. Because -print0 is an action, it goes after the search expressions; placed before them, it would print every file before the tests are applied. Like so:
find directory [argument1 ...] -print0 | xargs -0 somecommand
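As a concrete sketch of that pipeline, the snippet below searches the contents of every .txt file for a word, including a file with a space in its name. The file names and the search term needle are hypothetical.

```shell
docs=$(mktemp -d)
printf 'a needle here\n' > "$docs/has needle.txt"
printf 'nothing to see\n' > "$docs/plain.txt"

# -print0 comes after the tests; xargs -0 splits on the NUL bytes it emits.
matches=$(cd "$docs" && find . -type f -name '*.txt' -print0 | xargs -0 grep -l 'needle')
printf '%s\n' "$matches"

rm -rf "$docs"
```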
Conclusion
I hope you learned a thing or two about find today. Remember, practice! I had to look up the -exec syntax the first several times I used it, and again whenever I hadn't used it in a while (which is no longer likely, since I now use it almost every day!)
After reading this article and experimenting a bit, you should now be able to understand the find examples for doing useful work I mentioned in my article on The Shell and Coreutils. I hope this is enough to spark your imagination for the massively powerful automation techniques available by coupling find with other command line utilities.
Other resources
Here are some great resources to learn more:
- man find - The find manpage on your system has a definitive list of valid expressions and operators.
- Greg's Wiki - UsingFind - This is where I learned most of what I know about find along with the man page.