
Re: [LUG] filtering recovered jpeg files

 

On Mon, 20 Aug 2018, Pentiddy wrote:

Hello all...
A few years back I made the novice mistake of losing all my photos due to a disk crash... I did however manage to run a recovery program on the disk and get images off it. I am wanting to finally sort through these and delete spurious thumbnails, internet cached images etc. and wondered if there is a convenient way of for instance filtering just full sized pictures out of the folders, or deleting files below a certain size... I am Xfce based so thunar scripts an option... Any help with this greatly appreciated- trying to sort out some memorable photo's for my Daughter leaving home.

How good is your shell-fu?

You can use the jpeginfo command to get the size:

  jpeginfo *.jpg > /tmp/foo

then edit /tmp/foo and look for the ones that are not thumbnails.

A quick example output:

  IMG_20180116_193807.jpg 4160 x 3120 24bit Exif  N 3832530
  j.jpg  495 x 811  24bit Exif  N  104293
  loaf.jpg 2000 x 2694 24bit JFIF  P  753672

So the first is a camera image; the 2nd is an odd size, but probably something scaled for the web; the last is another (probably) processed image. Thumbnails might be 128 pixels wide or smaller, so you can manually get rid of them by sifting through the file...
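For the "deleting files below a certain size" part of the question, a blunt but quick alternative is find's -size test - no jpeginfo needed. A self-contained sketch on scratch files (the /tmp/jpegdemo directory and the 50k cutoff are just assumptions to tune; always dry-run before -delete):

```shell
# Demo on throwaway files; the 50k threshold is a guess - thumbnails
# and web junk are usually well under it, camera JPEGs well over it.
mkdir -p /tmp/jpegdemo
truncate -s 10k  /tmp/jpegdemo/thumb.jpg   # small: likely a thumbnail
truncate -s 900k /tmp/jpegdemo/photo.jpg   # large: likely a camera image

# Dry run first: list what WOULD go
find /tmp/jpegdemo -type f -size -50k

# When you're happy with the list, actually delete
find /tmp/jpegdemo -type f -size -50k -delete
```

File size is a cruder filter than pixel dimensions, but it needs no extra tools and works on non-JPEG debris too.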

However something like:

  sort -k2 -nr /tmp/foo > /tmp/foo2

will reverse sort the file on the 2nd field (the X size) and write it to /tmp/foo2 - using the example above yields:

  IMG_20180116_193807.jpg 4160 x 3120 24bit Exif  N 3832530
  loaf.jpg 2000 x 2694 24bit JFIF  P  753672
  j.jpg  495 x 811  24bit Exif  N  104293

... you can then manually edit the file and cull the lines with an X size smaller than some threshold you pick (this sorts largest at the top, so go to the end of the file) - or further edit commands into the file, turning it into a script by putting e.g. rm at the start of the lines for the files you want to delete.
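The "turn the file into a script" step can be done with sed rather than by hand. A sketch, assuming the lines look like the jpeginfo output above and the filenames contain no spaces (/tmp/foo2 and /tmp/cull.sh are just example paths; the sample data here stands in for your real listing):

```shell
# Fake up a couple of listing lines standing in for /tmp/foo2
printf '%s\n' \
  'IMG_20180116_193807.jpg 4160 x 3120 24bit Exif N 3832530' \
  'j.jpg 495 x 811 24bit Exif N 104293' > /tmp/foo2

# Keep only the filename (the first space-separated field) and prefix "rm "
sed 's/^\([^ ]*\).*/rm \1/' /tmp/foo2 > /tmp/cull.sh

cat /tmp/cull.sh    # review it carefully first, then: sh /tmp/cull.sh
```

Reviewing the generated script before running it is the whole point of this approach - you get one last chance to spot a keeper.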

And so on.

You can get more clever with extra tools like awk, which is better at finding lines below a numeric threshold, but then you're into diminishing returns for a one-off task. Faced with a few directories of a few thousand random files, this is how I'd do it.
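For completeness, the awk version of the threshold test looks like this - a sketch on jpeginfo-style lines, where the 800-pixel cutoff is an arbitrary assumption:

```shell
# Fake jpeginfo output standing in for /tmp/foo
printf '%s\n' \
  'IMG_20180116_193807.jpg 4160 x 3120 24bit Exif N 3832530' \
  'thumb.jpg 128 x 96 8bit JFIF N 4103' > /tmp/foo

# Field 2 is the X size; print filenames narrower than the cutoff
awk '$2 < 800 { print $1 }' /tmp/foo
# prints: thumb.jpg
```

Once you trust the list it prints, pipe it into rm, e.g. awk '$2 < 800 { print $1 }' /tmp/foo | xargs -r rm - again assuming no spaces in the filenames.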

Gordon

--
The Mailing List for the Devon & Cornwall LUG
https://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq