Thursday, February 5, 2009

Total size of all files with a certain extension

In our product, we have code that traverses a directory and its subdirectories and totals the sizes of all files with a particular extension. I wrote a unit test for this code, but I needed to independently verify the expected total to make sure the code was giving me the right answer. Our code uses the File.length() method, which returns a long holding the length of the file in bytes, so I needed to calculate the total in bytes.

After some digging around on Google, I came up with the following bash command line expression:

find . -name '*.ext' | xargs du -b | cut -f 1 | awk '{total+=$0}END{print total}'

See the man pages for details, but as a quick summary: find lists every file whose name ends in .ext; xargs passes those names to du, whose -b switch prints each file's size in bytes followed by its name; cut -f 1 keeps only the size column; and awk adds each line to a running total and prints that total once the input ends (I am, admittedly, not initiated into the mysteries of awk, I just know that it works in this case ;).
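The same pipeline can be wrapped in a small function. This is only a sketch, and it assumes the GNU versions of find, xargs and du; the function name total_bytes_by_ext is made up for illustration. It adds -type f so that only regular files are counted, uses -print0 with xargs -0 so that names containing spaces survive the trip, and uses xargs -r so that du is not run at all when nothing matches (otherwise du would report on the current directory).

```shell
# Sum the sizes, in bytes, of all regular files under a directory that
# have the given extension. Assumes GNU find, xargs and du; the function
# name is hypothetical.
total_bytes_by_ext() {
    dir=$1
    ext=$2
    # -print0 / -0 keep file names with spaces intact.
    # -r skips du entirely when find matches nothing.
    find "$dir" -type f -name "*.$ext" -print0 |
        xargs -0 -r du -b |
        cut -f 1 |
        awk '{ total += $0 } END { print total + 0 }'
}
```

Calling total_bytes_by_ext . ext reproduces the one-liner above for the current directory. The "+ 0" in the END block makes awk print 0 rather than an empty line when no files match.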

Works like a charm! Except on Solaris, where du doesn't have a -b switch, so the best you get is a coarser figure (Solaris du reports sizes in 512-byte blocks rather than bytes).
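One possible workaround where du has no -b switch is to read the byte count from the size column of ls -l instead. The sketch below relies only on options that POSIX specifies for find, ls and awk, but it has not been checked on Solaris itself, so treat it as an assumption rather than a tested recipe. The function name total_bytes_by_ext_portable is hypothetical.

```shell
# Sum file sizes in bytes without du -b, by reading the fifth field of
# "ls -ln" output (the size column). The function name is hypothetical.
total_bytes_by_ext_portable() {
    dir=$1
    ext=$2
    # -exec ... {} + batches the file names, much as xargs would.
    # -n prints numeric owner and group IDs, so those columns can never
    # contain spaces that would shift the size out of field 5.
    find "$dir" -type f -name "*.$ext" -exec ls -ln {} + |
        awk '{ total += $5 } END { print total + 0 }'
}
```

Because the size column comes before the file name, names containing spaces do not disturb the total. Names containing newlines would still confuse it.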