Perl - Modules
From LXF Wiki
Perl tutorial part 4
(Original version written by Marco Fioretti for Linux Format magazine issue 72.)
Perl functions and modules: what they are and how to use them.
Alas, we have come to the end of this Perl series. We'll make good use of this last part, however, by looking at how some of the most powerful features of Perl work. We'll start with functions and then introduce modules, with a final example showing how to use the latter to access a database.
As in any good programming language, reusable chunks of code can be wrapped in functions, which in Perl are called subroutines. This is their basic syntax:
sub print_array
{
    my $COUNTER = 0;  # auxiliary counter
    foreach my $ITEM (@_) {
        $COUNTER++;
        print "# $COUNTER: $ITEM\n";
    }
    $COUNTER;  # the value of this last expression is what the subroutine returns
}
# Let's use it:
my @FRIENDS = ('John', 'Jill', 'Kelly', 'Martin');
my $NUMBER = &print_array(@FRIENDS);
print "These are my $NUMBER friends\n";
OK, let's look at the easiest parts first. As you can see, subroutines are defined by enclosing their code, however complex, in curly braces, and preceding it with the “sub” Perl keyword and the subroutine name. When you call them, the & character may be prepended to the name, as done here; in Perl 5 it is optional whenever the call uses parentheses. Arguments are passed, inside parentheses, right after that same name.
The two most esoteric bits are that arcane symbol @_ and the fact that I “assigned” the subroutine to $NUMBER. @_ is nothing bad, just the special array into which Perl places all the arguments passed to the subroutine by the caller.
The assignment to $NUMBER relies on the fact that, unless an explicit return statement says otherwise, all Perl subroutines return the value of the last expression they evaluate. What? $COUNTER is just a variable and not an instruction? Yes, but that's perfectly fine with Perl: a lone variable is a valid expression.
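To make the two points above concrete, here is a minimal sketch of a subroutine that copies its arguments out of @_ and relies on the implicit return value. The subroutine name longest_name is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the longest string among its arguments.
sub longest_name {
    my @names   = @_;    # copy the arguments out of the special array @_
    my $longest = '';
    foreach my $name (@names) {
        $longest = $name if length($name) > length($longest);
    }
    $longest;            # last expression evaluated, hence the return value
}

my $winner = longest_name('John', 'Jill', 'Kelly', 'Martin');
print "Longest name: $winner\n";
```

Writing "return $longest;" on the last line would behave the same way and is often easier to read in longer subroutines.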
Subroutines can be combined with one of the most powerful Perl structures, hashes, in a really cool, although a bit voodoo, way. For all practical purposes, you can store references to Perl subroutines as hash values, that is, you can look them up by key. Check this out:
%FUNZ = (
    'L', {'D', 'LIST',    'F', \&MONTHLY_LIST}, # List all calls in this month
    'P', {'D', 'PLOT_2D', 'F', \&GEN_PLOT },    # Generate a plot of all the calls
    'B', {'D', 'BACKUP',  'F', \&BACKUP_ALL }   # Complete backup
    # many other similar elements, omitted here for brevity
);
# Let's display all choices to the user, one per line
foreach (sort keys %FUNZ) {
    printf "%2.2s %-20.20s\n", $_, $FUNZ{$_}{'D'};
}
# ...and launch the function he requested
if ($USER_INPUT eq 'B') { # Backup everything
    my $rsub = $FUNZ{'B'}{'F'};
    &$rsub(('All', 'the', 'arguments', 'for', 'this', 'subroutine'));
}
Yes, it looks weird but it isn't. This comes from an actual script of mine, which asks the user which tasks should be performed. %FUNZ is a hash with single letter keys (L, P, B...) whose values are other, unnamed, hashes (a list inside curly braces creates a reference to an anonymous hash). Each of these hashes has two keys, D (Description) and F (Function). The value associated with D is a normal string. The one linked to F is (as far as we are concerned) a pointer to a subroutine, which Perl calls a reference: this is what the \& notation means. In other words, $FUNZ{'B'}{'F'} is a reference to the subroutine BACKUP_ALL. When the script is run on the command line, it prints something like this to the terminal, with the keys in sorted order:
 B BACKUP
 L LIST
 P PLOT_2D
The letter that the user types in response is stored in $USER_INPUT. Whenever 'B' is typed, the last snippet of code stores in $rsub the reference to BACKUP_ALL, and the last line launches it, with any arguments it might need. &$rsub stands (again, as far as we are concerned) for “execute the code at the address contained in $rsub”; the same call can also be written $rsub->(...). Cool, huh?
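Here is a self-contained sketch of the same idea that you can run as is. The subroutines list_calls and backup_all, the hash %TASKS and the wrapper run_task are stand-ins invented for this example, not the ones from the original script; the wrapper also shows one possible way to handle a letter that is not in the table.

```perl
use strict;
use warnings;

# Hypothetical task subroutines, standing in for the real ones.
sub list_calls { return "listing: @_" }
sub backup_all { return "backing up: @_" }

# Each entry holds a description (D) and a subroutine reference (F).
my %TASKS = (
    L => { D => 'LIST',   F => \&list_calls },
    B => { D => 'BACKUP', F => \&backup_all },
);

# Look up the user's choice and call it, guarding against unknown keys.
sub run_task {
    my ($choice, @args) = @_;
    my $entry = $TASKS{$choice}
        or return "unknown choice '$choice'";
    return $entry->{F}->(@args);    # same effect as &{ $entry->{F} }(@args)
}

print run_task('B', 'mail', 'photos'), "\n";
print run_task('X'), "\n";
```

Adding a new task then only means adding one line to the hash, with no change to the code that dispatches the call.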
The online mother of all modules
Do you need Perl modules for your script? Do you wonder if somebody else on the planet already wrote Perl code to process that arcane data format? Stop worrying and go to THE central online repository for these and many other Perl things. The Comprehensive Perl Archive Network (www.cpan.org) already contains almost 8000 modules covering every conceivable area of computing. Of course, they'll also be happy to add your own modules to the database.
Perl modules
Functions are good for reusing code, but they have some limits. They are usually made to work only for a given set of scripts, interacting (some would say messing) with the rest of the code in a way that is too tightly coupled and not portable. For example, it is very likely that you couldn't run any function of the previous example as is. I didn't write them with global reuse in mind, so they are full of references to external variables like $BACKUP_SUBFOLDER which in your script will have a different name and meaning, if they exist at all. If you want real black boxes of code which, with the proper inputs, will work properly in any program without nasty side effects, you must go for Perl modules. These all contain a package declaration whose name, by convention, is the same as the name of the file itself minus the .pm extension:
package My_Mail_Filter; # Package declaration
sub filter_incoming_messages {
    # proper code here
}
sub remove_duplicate_messages {
    # more code here
}
1;
The effect, and purpose, of a package declaration is to put all the code which follows in its own separate namespace: this is simply the list of symbols (names of functions, variables etc.) which are immediately accessible to the code. The content of a package still has ways to look at other namespaces, but you have to type their name on purpose. This is a feature: it guarantees that, unless you knowingly do something extra, what you write will not mess with other modules or some part of the main script. The final “1” statement is simply the standard trick used to signal that all the package code was properly loaded: a module file must end by returning a true value, and 1 is the simplest one. The main script will load the module, and use its functions, in a way similar to this:
use My_Mail_Filter;
My_Mail_Filter::remove_duplicate_messages($CURRENT_MAILBOX);
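If you want to see namespaces at work without creating a separate .pm file, the following sketch declares a package inside the script itself. The package name My_Counter and its contents are invented for this illustration; the point is only that the two $COUNT variables never collide.

```perl
use strict;
use warnings;

{
    package My_Counter;         # hypothetical package, for illustration only
    our $COUNT = 0;             # this is $My_Counter::COUNT
    sub bump { return ++$COUNT }
}

our $COUNT = 100;               # a different variable: $main::COUNT

# Code outside the package must spell out the package name on purpose.
My_Counter::bump() for 1 .. 3;
print "main: $COUNT, My_Counter: $My_Counter::COUNT\n";
```

The call to My_Counter::bump changes only the counter inside the package, while the $COUNT of the main script keeps its own value.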
Modules on CPAN normally come with at least enough documentation to let you know what the needed parameters of each subroutine are. Now, the Perl interpreter can find by itself any module stored in one of the folders listed in the special array @INC; a name containing ::, such as a hypothetical Mail::Filter, is looked up in the matching subfolder (Mail/Filter.pm). To print the content of @INC, just type at the command prompt:
perl -e 'print join("\n",@INC)."\n"'
Sometimes (typically on an office or school computer), you have no write access to the system folders listed in @INC and must therefore install the modules you need somewhere else. Luckily, even in these cases it is still possible to make Perl find and use them. One easy way is the “use lib” construct. If you place My_Mail_Filter.pm in /home/my_account/.perl_modules/ and then start your script with this:
use lib qw(/home/my_account/.perl_modules/);
use My_Mail_Filter;
Perl will still find it. Many Perl scripts accomplish an (apparently) equivalent result with the require command:
require "/home/my_account/.perl_modules/My_Mail_Filter.pm";
The main difference between this and the “use” keyword is that “require” loads the module code when the script, already running, reaches that line. This might be just what you need, as it makes it possible to load this or that module depending on some conditions. However, it also means that if you aren't careful, a missing module may make a script abort after it has already started to alter your data...
“use”, on the other hand, executes the module content before even starting the actual script (at “compile time” in programmer lingo), with the overall effect of changing how all the following code is interpreted.
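One way to get the flexibility of “require” without the nasty surprise is to test for the module before touching any data. The sketch below assumes a helper called try_load, a name made up for this example, which wraps require in an eval block so that a missing module produces a false value instead of aborting the script.

```perl
use strict;
use warnings;

# Try to load a module at run time; returns 1 on success, 0 otherwise.
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # My::Module -> My/Module.pm
    return eval { require $file; 1 } ? 1 : 0;  # eval traps the error from require
}

# List::Util ships with Perl; the second name is deliberately made up.
for my $module ('List::Util', 'No::Such::Module::Hopefully') {
    print "$module: ", (try_load($module) ? "loaded" : "not available"), "\n";
}
```

A script could run such checks at the very beginning and stop with a clear message before it has opened or modified anything.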
One example: Database access
Computers and the Internet are all about collecting, storing and retrieving information. Databases play an essential part in this. They are usually made of several files which contain data in some custom format and of a program, called server or engine, which actually writes and reads data from those files. The server does this on request, when it is contacted by client programs which, you guessed it, can also be written in Perl using the module for the corresponding database. Using modules, the basic structure of a Perl interface to the most common Open Source databases becomes pretty simple, as shown in this short example based on MySQL:
use strict;
use Mysql;
######################################################################
my $DBHOST = "the_machine_where_the_database_server_is_running";
my $DBNAME = "name_of_database_we_want_to_query";
my $DBUSER = "our_MySql_account"; # might be different from the Linux one
my $DBPASS = "the_password_for_that_MySql_account";
my @RESULTS;
######################################################################
# Connect to the MySql database server
my $DB = Mysql->connect($DBHOST, $DBNAME, $DBUSER, $DBPASS);
# Query the database
my $qry = "SELECT * FROM appdb WHERE P_ID < 100";
my $res = $DB->query($qry);
# Fetch and print the matching records, one per iteration
while (@RESULTS = $res->fetchrow) {
    print "$RESULTS[0], $RESULTS[1], $RESULTS[2]\n";
}
The code above assumes that the database has already been created and filled with data. Since this is nothing specific to Perl, we'll not cover it here. The first interesting command is the one with which we connect to the MySQL server. Since we told the script to “use Mysql”, this operation is as simple as calling Mysql->connect with the proper parameters. Isn't it great? Regardless of who wrote and maintains that Mysql module and how, you load it and it just works. The next two lines are where we actually query the database. The one used in that script had a table called appdb, with a numeric index called P_ID. The first command creates the SQL query and stores it into the $qry scalar. This particular query is constant, and simply requests all fields (SELECT *) from all appdb records whose P_ID index is less than 100. In the most general case, you could obviously generate it on the fly, inserting the current values of any other variable in the $qry string.
The following instruction executes, so to speak, the actual query and links it to the $res handle. This is what makes it possible to access from the script, in a standard way, all the records matching the query. The insertion of another record into the database would have the same structure, obviously with the proper SQL code inside $qry. Have you noticed the while loop at the end? $res->fetchrow applies the fetchrow subroutine (or “method”) of the Mysql module to the $res handle. The net result is to return, every time it is called, the next record among those that matched the current query. Every field of that record is loaded, in the same order as returned, in a different element of the @RESULTS array. This is why the print instruction inside the loop can directly print the content of those fields.
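The Mysql module is not the only way to do this. Much Perl code today talks to databases through the generic DBI module, with a driver such as DBD::mysql underneath. The sketch below is an alternative to the script above, not a description of it: it assumes both modules are installed, reuses the appdb table and P_ID column from the example, and the helper names mysql_dsn and fetch_below are made up. It also shows how a value generated on the fly can be passed as a placeholder (the ? in the query) instead of being pasted into the SQL string.

```perl
use strict;
use warnings;

# Build the DBI data source name for the MySQL driver from its parts.
sub mysql_dsn {
    my ($dbname, $dbhost) = @_;
    return "DBI:mysql:database=$dbname;host=$dbhost";
}

# Return all appdb records whose P_ID is below $limit, as array references.
sub fetch_below {
    my ($dbh, $limit) = @_;
    my $sth = $dbh->prepare('SELECT * FROM appdb WHERE P_ID < ?');
    $sth->execute($limit);                # the driver fills in the placeholder
    my @rows;
    while (my @row = $sth->fetchrow_array) {
        push @rows, [@row];
    }
    return @rows;
}

# Usage, assuming DBI and DBD::mysql are installed and the server is reachable:
#   use DBI;
#   my $dbh = DBI->connect(mysql_dsn($DBNAME, $DBHOST), $DBUSER, $DBPASS,
#                          { RaiseError => 1 });
#   print "$_->[0], $_->[1], $_->[2]\n" for fetch_below($dbh, 100);
```

With placeholders the driver takes care of quoting, so a value typed by a user cannot change the structure of the query.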
Conclusion
I hope that this short tutorial series has shown you enough to get started with (and enjoy) all the great features of Perl. On to coding your own scripts now: have fun!

