| Z | Ô | I | O | N |
Some complementary material on interpreter directives can be found on Wikipedia.
Herein, a problem with filename extensions is described in a manner perhaps more pragmatic than, yet inspired by, the well-known Go To Statement Considered Harmful by Edsger W. Dijkstra (Communications of the ACM, Vol. 11, No. 3, March 1968). Dijkstra's work addresses the issue of how the use of the go to statement largely abridges the ability to parametrically describe the progress of a process, engendering an unnecessary impediment to the code's clarity and manageability. This new document details, based on practical experience under Unix-like operating systems, how filename extensions, particularly but not limited to those on files implementing commands, create a secondary set of semantic tags in the interfaces between programs which are demonstrably both superfluous and treacherous.
Consider the following example, in which a file is named with a .sh extension, to indicate the type of the file as well as to make it easy to list all files of the same type (shell scripts).
$ ./frob.sh
hello world
$ ls *sh
frob.sh
$ sh frob.sh
frob.sh: line 2: use: command not found
hello world
$ cat frob.sh
#!/usr/bin/perl -w
use strict;
printf "hello world\n";
$
Such a file is typical of scripts written by relatively inexperienced users of Unix, where the code has later been reimplemented but the filename left unaltered for backwards compatibility. The surprises in the second and third commands should be self-evident, and are the focus of what follows.
Three common mechanisms exist within Unix to determine how a file should be processed as a set of directives:
Interpreter directives can only be changed by modifying the files' contents, whereas file extensions can be changed arbitrarily using general filesystem commands like mv. File extensions also have a disturbing tendency to get lost in some contexts, in contrast with interpreter directives, which are quite stable. With scripts, interpreter directives are typically changed in the same manner as the other contents, by using an editor of some kind, usually vi or emacs. Modern editors can usually recognize scripts by their interpreter directive, although historically special handling of certain types of text files was usually done based on the file extension. It's noteworthy that the file extension model applied almost exclusively to non-script, non-executable text files lacking interpreter directives (C files ending in .c and .h), which use extensions to trigger special handling by an application such as the C compiler rather than by the kernel.
Now, so far, extensions might look like no more than triggers for special handling for editing sessions, or as human-readable metadata allowing easy categorization without the effort of viewing the files' contents. But there is a more insidious problem with them, in that using them breaks part of the mechanism by which the implementation details are hidden from the user, from the kernel, and from other programs which might call the script.
Programs in Unix often start their lives as quickly written, inefficient, underfeatured shell scripts. Later, they get converted to something faster, like Perl or Python. Finally, they are often rewritten in C, C++, or something else fully compiled. If the author violates encapsulation by exposing the underlying language in a spurious extension, the command name may change from name.sh, to name.pl, to name, breaking all existing coded calls to the program each time, as well as adding to the cognitive load of human users. The more effective the user base has been at script-based factoring and reuse, the more treacherous the extensions become (i.e., proficient users often build more readily on preëxisting programs, increasing the number of dependencies on the names of those programs).
In fact, what usually happens is that the name.sh script ends up being rewritten in something like Perl, yet with the now-misleading old name retained to keep from breaking other programs which refer to it. The resulting mismatch causes extra maintenance hassles, principally to users trying to maintain the extensions, who naïvely type things like ls -l *.sh without realizing some of the listed files aren't shell scripts anymore. Such semantic dissonance leads easily to more insidious issues, with scripts called by the wrong interpreters in error-suppressed contexts, truncated processing due to the resulting errors, and the resulting arbitrarily disastrous problems.
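One way to see the mismatch is to compare a listing by extension with a listing by interpreter directive. The sketch below is illustrative only: scripts_for is a hypothetical helper written for this example, the two files are created in a scratch directory, and the match is deliberately simple (it does not handle a space after "#!" or directives that go through env).

```shell
#!/bin/sh
# Sketch: group files by the interpreter named on their first line,
# instead of trusting the filename. All file names here are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# frob.sh was rewritten in Perl but kept its old name; tidy is a shell script.
printf '#!/usr/bin/perl -w\nuse strict;\nprint "hello world\\n";\n' > frob.sh
printf '#!/bin/sh\necho tidy\n' > tidy

# Print the regular files in the current directory whose first line is
# "#!" followed by the given interpreter path (optionally with arguments).
scripts_for() {
    for f in *; do
        [ -f "$f" ] || continue
        first=
        # Read only the first line; tolerate empty or unterminated files.
        IFS= read -r first < "$f" || true
        case $first in
            "#!$1"|"#!$1 "*) echo "$f" ;;
        esac
    done
}

by_extension=$(ls *.sh)
by_directive=$(scripts_for /bin/sh)

echo "ls *.sh finds:            $by_extension"
echo "directive /bin/sh finds:  $by_directive"
```

Here the extension-based listing reports the Perl script as a shell script and misses the real one, while the directive-based listing reflects what the kernel will actually run.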
The issue of using the wrong interpreter can be subtle, since a user seeing a name.py program may enter python name.py, not realizing that the program only works with Python 2.5 when 2.4 is still the system default (the script itself would carry a directive like #!/usr/bin/python2.5). Scripts also often make delicate use of interpreter directives to have the PATH used or ignored, or special options passed in.
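The same effect can be shown without two Python versions at hand, using the "special options" case instead. This is a sketch, not a reproduction of the Python scenario: strict is a hypothetical script whose directive passes -e to the shell, and it assumes a Unix-like kernel that hands the single directive argument to /bin/sh.

```shell
#!/bin/sh
# Sketch: options in the interpreter directive apply only when the file is
# executed by name. "strict" is a hypothetical script name.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# The -e option on the directive makes the shell exit at the first failure.
cat > strict <<'EOF'
#!/bin/sh -e
false
echo "reached"
EOF
chmod +x strict

# Executed by name: the kernel starts "/bin/sh -e ./strict",
# so the script stops at "false" and prints nothing.
by_name=$(./strict || true)

# Interpreter given explicitly: the directive is only a comment here,
# so -e is not applied and execution continues past "false".
explicit=$(sh strict)

echo "by name:  [$by_name]"
echo "explicit: [$explicit]"
```

A user who types sh strict because the file "looks like a shell script" silently gets different error handling from the one the author chose.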
There are cases where scripts are executed as a result of special extensions, such as the model currently used by most webservers, where file handling is cued by filename extensions. However, even such subsystems often have other, more sophisticated approaches allowing those same extensions to be hidden, thus protecting URIs from a variant of the script filename extension problem, namely, how to keep all links to your website from breaking when you switch from *.html files to *.cgi, *.php, or something else. Furthermore, of the extensions just listed, note that .html files aren't scripts, .php files use a webserver builtin, and .cgi scripts themselves require interpreter directives as well as the .cgi extension.
Commands should never have filename extensions.
Interpreter directives should always be used for scripts, and no filename extension used at all. But most importantly, lies don't have to be introduced, and then maintained, in the pursuit of backwards compatibility.
So define the interpreter for your scripts using only the interpreter directive - the others lie on the path to darkness :-)