Variables and data structures
Andrew Emerson, High Performance Systems, CINECA
The “Hello World” program
Consider the following:
Hello World
$message = "Ciao, Mondo";
print "$message \n";
Perl Variables
$message is called a variable, something with a
name used to hold one or more pieces of information.
All computer languages have the ability to create
variables to store and manipulate data.
Perl differs from other languages because you do
not specify the “type” (i.e. integer, real, character,
etc.) only the “complexity” of the data.
Perl Variables
Perl has 3 ways of storing data:
1. Scalar
 For single data items, like numbers or strings.
2. Arrays
 For ordered lists of scalars. Scalars are indexed by number.
3. Associative arrays or “hashes”
 Like arrays, but uses “keys” to identify the scalars.
Scalar Variables
# integer
# also integer
# redefined as real
$pi = 3.1415926535; # floating point (real)
# using scientific notation
$dna = "GCCTACCGTTCCACCAAAAAAAA"; # string - double quotes
$dna = 'GCCTACCGTTCCACCAAAAAAAA'; # string - single quotes
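The scalar assignments described above can be sketched as follows. The variable names and values ($count, $offset, $avogadro) are illustrative assumptions, not the ones from the original slide, and use strict / my are additions that the slides do not use.

```perl
use strict;
use warnings;

# All names and values here are illustrative assumptions.
my $count  = 42;          # integer
my $offset = -7;          # also integer
$count = 42.5;            # same variable, now holding a real number
my $avogadro = 6.022e23;  # floating point, scientific notation
my $dna = "GCCTACCG";     # string

print "$count $offset $avogadro $dna\n";
```

No type is declared anywhere: the $ only says that each variable holds a single value.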
Scalar Variables
CASE is important, $DNA ≠ $dna;
(true for all variables)
Scalars must be prefixed with a $ whenever they are used (is
there a $? Yes → it is a scalar). The next character should
be a letter or underscore, not a digit (true for all variables you define).
Scalars can be happily redefined at any time (e.g. integer →
real → string):
# unlikely example
$dna = 0; # integer
$dna = "GGCCTCGAACGTCCAGAAA"; # now it's a string
Doing things with scalars..
$a = 1.5;
$b = 2.0; $c = 3;
$sum = $a+$b*$c; # multiply $b by $c, then add $a
while ($j<100) {
    $j++; # means $j=$j+1, i.e. add 1 to $j
    print "$j\n";
}
$dna1 .= $polyA; # add one string to another
# (equiv. $dna1 = $dna1.$polyA)
$no_of_bases = length($dna2); # length of a scalar
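A self-contained version of the same operations, as a sketch: the variables are renamed $x, $y, $z (an assumption, chosen because $a and $b are also used by Perl's sort), the loop is shortened to three passes, and the sequence strings are made up.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1.5, 2.0, 3);
my $sum = $x + $y * $z;    # * binds tighter than +: 1.5 + (2.0 * 3)

my $j = 0;
while ($j < 3) {
    $j++;                  # same as $j = $j + 1
}

my $dna1  = "GCCTAC";      # made-up sequence
my $polyA = "AAAA";
$dna1 .= $polyA;           # append one string to another
my $no_of_bases = length($dna1);

print "$sum $j $dna1 $no_of_bases\n";
```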
More about strings..
There is a difference between strings in single quotes (') and double quotes ("):
$nchr = 24;
# double quotes
$message = "chromosomes in human cell =$nchr";
print $message; # prints: chromosomes in human cell =24
# single quotes
$message = 'chromosomes in human cell =$nchr';
print $message; # prints: chromosomes in human cell =$nchr
More about strings
Double quotes (") interpolate variables, single quotes (')
do not:
print "sequence=$dna"; # prints the value of $dna
print 'sequence=$dna'; # prints the literal text sequence=$dna
Normally you would want double quotes
when using print.
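A minimal sketch of the quoting difference, storing both forms in variables so they can be compared. The names $double and $single and the short sequence are assumptions made for this example.

```perl
use strict;
use warnings;

my $dna = "GCCTAC";               # made-up sequence

my $double = "sequence=$dna";     # $dna is replaced by its value
my $single = 'sequence=$dna';     # $dna stays as literal text

print "$double\n";                # prints sequence=GCCTAC
print "$single\n";                # prints sequence=$dna
```

Interpolation happens once, when the quoted string is evaluated, so printing $single does not expand the $dna stored inside it.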
Collections of numbers, strings etc can be stored in arrays.
In Perl arrays are defined as ordered lists of scalars and
are represented with the @ character.
@days_of_the_week = ('mon', 'tue', 'wed', ...);
@bases = ('adenine', 'guanine', 'thymine', 'cytosine', ...);
@GenBank_fields = ('LOCUS', ...);
Initializing arrays with lists
Arrays - elements
To access the individual array elements you use [ and ] :
# now mutate the peptide
# print out what we have
while ($i<8) {
    print "$poly_peptide[$i] ";
    $i++;
}
The numbers used to identify the elements are
called indices.
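Element access with [ and ] can be sketched as below. The eight residues in @poly_peptide are a hypothetical list, since the original initialisation is not shown, and the output is collected in $line (an assumed name) before printing.

```perl
use strict;
use warnings;

# Hypothetical peptide; the original list is not shown in the slides.
my @poly_peptide = ('gly', 'ser', 'gly', 'pro', 'pro', 'lys', 'ser', 'phe');

# now mutate the peptide: replace the first element (index 0)
$poly_peptide[0] = 'val';

# print out what we have
my $i = 0;
my $line = '';
while ($i < 8) {
    $line .= "$poly_peptide[$i] ";
    $i++;
}
print "$line\n";
```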
Arrays - elements
When accessing array elements you use $ - why ?
Because array elements are scalar and scalars must
have $;
$poly_peptide[0] = 'val';
This means that you can have a separate variable
called $poly_peptide because $poly_peptide[0] is part
of @poly_peptide, NOT $poly_peptide.
This may seem a bit weird, but that's
okay, because it is weird.
Unix Perl Manual
Array elements
Array indices start from 0, not 1.
The last index of the array can be found from
$#name_of_array, e.g. $#poly_peptide. You can
also use negative indices, which count back from
the end of the array. Therefore
$poly_peptide[$#poly_peptide] and $poly_peptide[-1]
both refer to the last element.
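A short sketch of the last index and negative indices, using a hypothetical four-element peptide. The scalar names ($last_index, $last, $also_last, $second_last) are assumptions for illustration.

```perl
use strict;
use warnings;

my @poly_peptide = ('val', 'ser', 'gly', 'pro');   # hypothetical contents

my $last_index  = $#poly_peptide;                  # index of the last element
my $last        = $poly_peptide[$#poly_peptide];   # last element
my $also_last   = $poly_peptide[-1];               # same element, counted from the end
my $second_last = $poly_peptide[-2];

print "$last_index $last $also_last $second_last\n";
```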
Array properties
Length of an array:
$len = $#poly_peptide+1;
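The size of the array does not need to be defined – it can grow as elements are added:
# begin program
while ($i<100) {
    $values[$i] = $i; # hypothetical body: assigning to a new index extends @values
    $i++;
}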
Useful Array functions
Functions commonly used for manipulating a stack: push and pop.
F.I.L.O. = First In, Last Out
Very common in computer programs.
Array functions – PUSH and POP
# part of a program that reads a database into an array
# open database etc first..
@dblines = (); # resets @dblines
while ($line=<DB>) {
    push @dblines,$line; # push $line onto array
}
while (@dblines) {
    $record = pop @dblines; # pop line off and use it
    # .... do something
}
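The stack behaviour of push and pop can be tried without a database file. In this sketch an in-memory list @lines stands in for the DB filehandle, and @popped records the order in which lines come off the stack; both names and the three sample lines are assumptions.

```perl
use strict;
use warnings;

# Stand-in for lines read from a database file (assumed sample data).
my @lines = ("LOCUS\n", "DEFINITION\n", "ORIGIN\n");

my @dblines = ();
foreach my $line (@lines) {
    push @dblines, $line;          # push $line onto the end of the array
}

my @popped;
while (@dblines) {                 # true while the array is not empty
    my $record = pop @dblines;     # take the most recently pushed line
    chomp $record;                 # remove the trailing newline
    push @popped, $record;
}

print join(' ', @popped), "\n";    # prints ORIGIN DEFINITION LOCUS
```

The lines come back in reverse order: the first line pushed is the last one popped.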
Scalar Contexts
If you provide an expression (e.g. an array) when Perl
expects a scalar, Perl attempts to evaluate the expression
in a scalar context. For an array this is the length of the
array:
$len = @poly_peptide;
This is equivalent to
$len = $#poly_peptide+1;
Similarly, in
while (@dblines) {
the array is evaluated in scalar context, giving the length of
the array; the loop ends when the length reaches 0.
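A minimal sketch of an array in scalar context, with hypothetical array contents. The explicit scalar() call and the counter $passes are additions for illustration.

```perl
use strict;
use warnings;

my @poly_peptide = ('val', 'ser', 'gly');   # hypothetical contents

my $len  = @poly_peptide;            # array in scalar context: number of elements
my $same = $#poly_peptide + 1;       # last index + 1 gives the same number
my $n    = scalar(@poly_peptide);    # scalar() forces scalar context explicitly

my @dblines = ('line 1', 'line 2');
my $passes = 0;
while (@dblines) {                   # loop condition is a scalar context
    pop @dblines;
    $passes++;
}

print "$len $same $n $passes\n";
```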
Special variables
Perl defines some variables for special purposes, including:
$_     Set in many situations such as reading from a file or in a foreach loop.
$0     Name of the file currently being executed.
$]     Version of Perl being used.
@_     Contains the parameters passed to a subroutine.
@ARGV  Contains the command line arguments passed to the program.
Some are read-only and cannot be changed: see man
perlvar for more details.
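The special variables $_, $0, $], @_ and @ARGV can be seen in a small script such as the one below. The subroutine count_args and the array @seen are hypothetical names introduced for this sketch.

```perl
use strict;
use warnings;

# @_ holds the arguments passed to a subroutine.
sub count_args {
    return scalar(@_);
}

# $_ is set to each element in turn when foreach has no loop variable.
my @seen;
foreach ('A', 'C', 'G', 'T') {
    push @seen, $_;
}

my $n = count_args('x', 'y', 'z');

print "script: $0\n";                 # name of the file being executed
print "perl version: ", $], "\n";     # numeric version of the interpreter
print "arguments: @ARGV\n";           # command line arguments, if any
print "bases: @seen, args counted: $n\n";
```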
Associative Arrays (Hashes)
Similar to normal arrays but the elements are identified by
keys and not indices. The keys can be more complicated,
such as strings of characters.
Hashes are indicated by % and can be initialized with lists
like arrays:
%hash = (key1,val1,key2,val2,key3,val3..);
Associative Arrays (Hashes)
%months = ('jan' => 31,
           'feb' => 28,
           'mar' => 31,
           'apr' => 30);
=> is a synonym for ,
Associative Arrays (Hashes)
Further examples
%classification = ('dog' => 'mammal', 'robin' => 'bird', 'snake' => 'reptile');
%genetic_code = (
    'TCA' => 'ser',
    'TTC' => 'phe',
    'TTA' => 'leu',
    'TGA' => 'STOP',
    'CCC' => 'pro',
    ...);
Associative Arrays (Hashes) - elements
The elements of a hash are accessed using curly
brackets, { and } :
$genetic_code{TCA} = 'ser';
$genetic_code{CCC} = 'pro';
$genetic_code{TGA} = 'STOP';
Note the $ sign: the elements are scalars
and so must be preceded by $, even
though they belong to a % (just as for arrays).
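Reading and writing hash elements can be sketched as follows, using a small subset of the genetic code. The variables $codon, $amino_acid and $n_codons are assumed names.

```perl
use strict;
use warnings;

# A small subset of the genetic code, initialised from a list.
my %genetic_code = ('TCA' => 'ser', 'TTC' => 'phe', 'TTA' => 'leu');

# Add further elements one at a time; each element is a scalar, hence the $.
$genetic_code{CCC} = 'pro';
$genetic_code{TGA} = 'STOP';

# Look up an element, here with the key held in a variable.
my $codon = 'TTC';
my $amino_acid = $genetic_code{$codon};

my $n_codons = scalar(keys %genetic_code);   # number of keys in the hash
print "$codon codes for $amino_acid ($n_codons codons stored)\n";
```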
Associative Arrays (Hashes) – useful functions
exists
indicates whether a key exists in the hash
if (exists $genetic_code{$codon}) {
    # .... do something
} else {
    print "Bad codon $codon\n";
}
Associative Arrays (Hashes) – useful functions
keys and values
make arrays from the keys and values of a hash
@codons = keys %genetic_code;
@amino_acids = values %genetic_code;
Often you will see code like the following:
foreach $codon (keys %genetic_code) {
    if ($genetic_code{$codon} eq 'STOP') {
        last; # i.e. stop translating
    } else {
        $protein .= $genetic_code{$codon};
    }
}
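A runnable sketch combining exists, keys and values. It makes one deliberate change: the order in which keys returns the keys of a hash is not defined, so this version translates a hypothetical list of codons, @reading_frame, in sequence order instead of looping over the hash keys. The names @reading_frame and @bad, the codon XYZ, and the use of sort and next are assumptions added for illustration.

```perl
use strict;
use warnings;

my %genetic_code = (
    TCA => 'ser', TTC => 'phe', TTA => 'leu',
    CCC => 'pro', TGA => 'STOP',
);

my @codons      = sort keys %genetic_code;   # sorted so the order is predictable
my @amino_acids = values %genetic_code;

# Hypothetical codon sequence; XYZ is deliberately not a valid codon.
my @reading_frame = ('TTC', 'TCA', 'XYZ', 'CCC', 'TGA', 'TTA');

my $protein = '';
my @bad;
foreach my $codon (@reading_frame) {
    if (!exists $genetic_code{$codon}) {
        push @bad, $codon;                       # remember bad codons and skip them
        next;
    }
    last if $genetic_code{$codon} eq 'STOP';     # stop translating
    $protein .= $genetic_code{$codon};
}

print "protein: $protein, bad codons: @bad\n";
```

Translation ends at TGA, so the final TTA is never reached.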