my package of the day: file – classify (unknown) files and mime-types on the console

You know this? Somebody just sent you a mail with attachments that don’t have usable file extensions, so you don’t really know how to handle them. Audio file? PDF? What is it? The same problem might occur after a file recovery, on web pages with upload features, or just when you are under time pressure and have no time for messing around with file type guessing.

While you can try to give the file an extension and open it with software you think might be suitable, the more sophisticated way is to let your computer find out what it is all about. As a GNU/Linux user you probably already think „There is surely a command line tool for this“. Of course there is: the package „file“, which often gets installed automatically as a dependency, or otherwise with a simple „aptitude install file“, will help you out.

„file“ depends on „libmagic“, which provides the patterns for the so-called „magic number“ detection. You don’t have to know what that is, but if you want to, see this Wikipedia article for reference. So all you have to know is how to handle the file command, and actually there is not much to learn. Let’s assume we have the following directory with unknown files:

[Screenshot file1.png: directory listing of the unknown files]

Now we want to know what’s inside those black boxes. Therefore we just call „file *“ on the console:

[Screenshot file2.png: output of „file *“]
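If you want to try this yourself, here is a minimal sketch. It assumes „file“ is installed and builds a scratch directory with three extensionless files; the names attachment1 to attachment3 are made up for this example and are not the files from the screenshot.

```shell
# Create a scratch directory with three extensionless sample files.
# All file names here are hypothetical stand-ins.
demo_dir=$(mktemp -d)
printf '%%PDF-1.4\n' > "$demo_dir/attachment1"               # PDF header only
printf '#!/bin/zsh\necho hello\n' > "$demo_dir/attachment2"  # zsh script
printf 'just some plain text\n' > "$demo_dir/attachment3"    # plain text

# Let file classify everything in the directory.
listing=$(cd "$demo_dir" && file *)
printf '%s\n' "$listing"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Each output line starts with the file name, followed by a colon and a description. The exact wording of the descriptions depends on the installed version of „file“ and its pattern database.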

Hey, that’s all. Pretty impressive, isn’t it? „file“ not only distinguishes binary from text files, it even tries to guess what programming language a text file is written in. And the magic is not that much magic: in the case of the zsh file it just sees a shebang pointing to zsh in the first line of the file, a PDF file typically starts with „%PDF“, and so on. It’s all about patterns.
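To make the pattern idea tangible, the following rough sketch peeks at the first bytes of a file by hand. It is only an illustration of the principle, not how libmagic is implemented, and the sample file is a stand-in that contains nothing but a PDF header.

```shell
# Write a stand-in file that only carries a PDF header.
sample=$(mktemp)
printf '%%PDF-1.4\n' > "$sample"

# Read the first four bytes, which is where the PDF pattern lives.
magic=$(head -c 4 "$sample")

# Compare the leading bytes against two well-known patterns.
case "$magic" in
  '%PDF') guess='looks like a PDF' ;;
  '#!'*)  guess='looks like a script with a shebang' ;;
  *)      guess='unknown' ;;
esac
printf '%s\n' "$guess"

rm -f "$sample"
```

libmagic does essentially this kind of comparison, only against a large database of patterns instead of two hard-coded ones.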

„file“ provides some command line options that make its usage even more helpful. The most interesting is „-i“, as it prints out MIME types instead of verbose file type descriptions. If you are a web developer and want to know the exact MIME type for a file download, this can save you a lot of time:

[Screenshot file3.png: output of „file -i“]
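A small sketch of the MIME options, assuming the GNU/Linux build of „file“ (on some other systems, macOS for example, the MIME flag is „-I“ instead of „-i“). The sample file is again a made-up stand-in.

```shell
# Create a plain text sample file.
sample=$(mktemp)
printf 'just some plain text\n' > "$sample"

# -i prints the MIME type and charset; -b drops the leading file name.
mime=$(file -b -i "$sample")
printf '%s\n' "$mime"

# --mime-type prints only the type, which is handy in scripts.
mime_type=$(file -b --mime-type "$sample")
printf '%s\n' "$mime_type"

rm -f "$sample"
```

For a plain text file like this one, both commands should report text/plain, with „-i“ additionally appending the detected charset.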

Great, isn’t it? The Apache web server can do the same kind of magic number detection through its mod_mime_magic module. With „file“ you just use a command line wrapper around libmagic for the same task.

That’s all about „file“ for today. Happy file detection – and feel free to report back.

3 thoughts on “my package of the day: file – classify (unknown) files and mime-types on the console”

  1. @Schalken: This prompt is a … ZSH prompt. ZSH provides you with the possibility to split your prompt to the left and the right side, while the right prompt disappears when your cursor hits it. Very neat. If you still want the code, let me know.
