Wednesday, 18 January 2017

Character Sequences - C++ Tutorials



As you may already know, the C++ Standard Library implements a powerful string class, which is very useful to handle and manipulate strings of characters. However, because strings are in fact sequences of characters, we can represent them also as plain arrays of char elements.
For example, the following array:
char jenny [20];
is an array that can store up to 20 elements of type char. It can be represented as:





Therefore, in this array, in theory, we can store sequences of characters up to 20 characters long. But we can also store shorter sequences. For example, jenny could store at some point in a program either the sequence "Hello" or the sequence "Merry christmas", since both are shorter than 20 characters.
Therefore, since the array of characters can store shorter sequences than its total length, a special character is used to signal the end of the valid sequence: the null character, whose literal constant can be written as '\0' (backslash, zero).
Our array of 20 elements of type char, called jenny, can be represented storing the characters sequences "Hello" and "Merry Christmas" as:







Notice how after the valid content a null character ('\0') has been included in order to indicate the end of the sequence. The panels in gray color represent char elements with undetermined values.

Initialization of null-terminated character sequences

Because arrays of characters are ordinary arrays they follow all their same rules. For example, if we want to initialize an array of characters with some predetermined sequence of characters we can do it just like any other array:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' }; 
In this case we would have declared an array of 6 elements of type char initialized with the characters that form the word "Hello" plus a null character '\0' at the end.
But arrays of
char elements have an additional method to initialize their values: using string literals.
In the expressions we have used in some examples in previous chapters, constants that represent entire strings of characters have already showed up several times. These are specified enclosing the text to become a string literal between double quotes ("). For example:
"the result is: "
is a constant string literal that we have probably used already.
Double quoted strings (") are literal constants whose type is in fact a null-terminated array of characters. So string literals enclosed between double quotes always have a null character ('\0') automatically appended at the end.
Therefore we can initialize the array of char elements called myword with a null-terminated sequence of characters by either one of these two methods:
char myword [] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword [] = "Hello"; 
In both cases the array of characters myword is declared with a size of 6 elements of type char: the 5 characters that compose the word "Hello" plus a final null character ('\0') which specifies the end of the sequence and that, in the second case, when using double quotes (") it is appended automatically.
Please notice that we are talking about initializing an array of characters in the moment it is being declared, and not about assigning values to them once they have already been declared. In fact because this type of null-terminated arrays of characters are regular arrays we have the same restrictions that we have with any other array, so we are not able to copy blocks of data with an assignation operation.
Assuming mytext is a char[] variable, expressions within a source code like:
mystext = "Hello";
mystext[] = "Hello"; 
would not be valid, like neither would be:
mystext = { 'H', 'e', 'l', 'l', 'o', '\0' };
The reason for this may become more comprehensible once you know a bit more about pointers, since then it will be clarified that an array is in fact a constant pointer pointing to a block of memory.

Using null-terminated sequences of characters

Null-terminated sequences of characters are the natural way of treating strings in C++, so they can be used as such in many procedures. In fact, regular string literals have this type (char[]) and can also be used in most cases.
For example, cin and cout support null-terminated sequences as valid containers for sequences of characters, so they can be used directly to extract strings of characters from cin or to insert them into cout.


For example:
// null-terminated sequences of characters
#include <iostream>
using namespace std;
 
int main ()
{
  char question[] = "Please, enter your first name: ";
  char greeting[] = "Hello, ";
  char yourname [80];
  cout << question;
  cin >> yourname;
  cout << greeting << yourname << "!";
  return 0;
}
Please, enter your first name: John
Hello, John!
As you can see, we have declared three arrays of char elements. The first two were initialized with string literal constants, while the third one was left uninitialized. In any case, we have to speficify the size of the array: in the first two (question and greeting) the size was implicitly defined by the length of the literal constant they were initialized to. While for yourname we have explicitly specified that it has a size of 80 chars.
Finally, sequences of characters stored in char arrays can easily be converted into string objects just by using the assignation operator:
string mystring;
char myntcs[]="some text";
mystring = myntcs;

Arrays - C++ Tutorials

An array is a series of elements of the same type placed in contiguous memory locations that can be individually referenced by adding an index to a unique identifier.
That means that, for example, we can store 5 values of type int in an array without having to declare 5 different variables, each one with a different identifier. Instead of that, using an array we can store 5 different values of the same type, int for example, with a unique identifier.
For example, an array to contain 5 integer values of type int called billy could be represented like this:





where each blank panel represents an element of the array, that in this case are integer values of type int. These elements are numbered from 0 to 4 since in arrays the first index is always 0, independently of its length.
Like a regular variable, an array must be declared before it is used. A typical declaration for an array in C++ is:
type name [elements];
where type is a valid type (like int, float...), name is a valid identifier and the elements field (which is always enclosed in square brackets []), specifies how many of these elements the array has to contain.
Therefore, in order to declare an array called billy as the one shown in the above diagram it is as simple as: 

int billy [5];

NOTE: The elements field within brackets [], which represents the number of elements, the array is going to hold, must be a constant value, since arrays are blocks of non-dynamic memory whose size must be determined before execution. In order to create arrays with a variable length dynamic memory is needed, which is explained later in these tutorials.

Initializing arrays.

When declaring a regular array of local scope (within a function, for example), if we do not specify otherwise, its elements will not be initialized to any value by default, so their content will be undetermined until we store some value in them. Global and static arrays, on the other hand, are automatically initialized filled with zeros.
In both cases, local and global, when we declare an Array, we have the possibility to assign initial values to each one of its elements by enclosing the values in braces { }. For example:

int billy [5] = { 16, 2, 77, 40, 12071 }; 

This declaration would have created an array like this:




The amount of values between braces { } must not be larger than the number of elements that we declare for the array between square brackets [ ]. For example, in the example of array billy we have declared that it has 5 elements and in the list of initial values within braces { } we have specified 5 values, one for each element.
When an initialization of values is provided for an array, C++ allows the possibility of leaving the square brackets empty [ ]. In this case, the compiler will assume a size for the Array that matches the number of values included between braces { }:

int billy [] = { 16, 2, 77, 40, 12071 };

After this declaration, Array billy would be 5 ints long, since we have provided 5 initialization values.

Accessing the values of an Array.

In any point of a program in which an array is visible, we can access the value of any of its elements individually as if it was a normal variable, thus being able to both read and modify its value. The format is as simple as:
name[index]
Following the previous examples in which billy had 5 elements and each of those elements was of type int, the name which we can use to refer to each element is the following:




For example, to store the value 75 in the third element of billy, we could write the following statement:

billy[2] = 75;

and, for example, to pass the value of the third element of billy to a variable called a, we could write:

a = billy[2];

Therefore, the expression billy[2] is for all purposes like a variable of type int.
Notice that the third element of billy is specified billy[2], since the first one is billy[0], the second one is billy[1], and therefore, the third one is billy[2]. By this same reason, its last element is billy[4]. Therefore, if we write billy[5], we would be accessing the sixth element of billy and therefore exceeding the size of the array.
In C++ it is syntactically correct to exceed the valid range of indices for an Array. This can create problems, since accessing out-of-range elements do not cause compilation errors but can cause runtime errors. The reason why this is allowed will be seen farther ahead when we begin to use pointers.
At this point it is important to be able to clearly distinguish between the two uses that brackets [ ] have related to arrays. They perform two different tasks: one is to specify the size of arrays when they are declared; and the second one is to specify indices for concrete array elements. Do not confuse these two possible uses of brackets [ ] with arrays.

int billy[5];         // declaration of a new Array
billy[2] = 75;        // access to an element of the Array.

If you read carefully, you will see that a type specifier always precedes a variable or array declaration, while it never precedes an access.
Some other valid operations with arrays:

billy[0] = a;
billy[a] = 75;
b = billy [a+2];
billy[billy[a]] = billy[2] + 5;

// arrays example
#include <iostream>
using namespace std;
 
int billy [] = {16, 2, 77, 40, 12071};
int n, result=0;
 
int main ()
{
  for ( n=0 ; n<5 ; n++ )
  {
    result += billy[n];
  }
  cout << result;
  return 0;
}
12206


Multidimensional Arrays

Multidimensional arrays can be described as "arrays of arrays". For example, a bidimensional array can be imagined as a bidimensional table made of elements, all of them of a same uniform data type.







jimmy represents a bidimensional array of 3 per 5 elements of type int. The way to declare this array in C++ would be:

int jimmy [3][5];

and, for example, the way to reference the second element vertically and fourth horizontally in an expression would be:

jimmy[1][3]








(remember that array indices always begin by zero).
Multidimensional arrays are not limited to two indices (i.e., two dimensions). They can contain as many indices as needed. But be careful! The amount of memory needed for an array rapidly increases with each dimension. For example:

char century [100][365][24][60][60];

declares an array with a char element for each second in a century, that is more than 3 billion chars. So this declaration would consume more than 3 gigabytes of memory!
Multidimensional arrays are just an abstraction for programmers, since we can obtain the same results with a simple array just by putting a factor between its indices:

int jimmy [3][5];   // is equivalent to
int jimmy [15];     // (3 * 5 = 15)

With the only difference that with multidimensional arrays the compiler remembers the depth of each imaginary dimension for us. Take as example these two pieces of code, with both exactly the same result. One uses a bidimensional array and the other one uses a simple array:


multidimensional array
pseudo-multidimensional array
#define WIDTH 5

#define HEIGHT 3
int jimmy [HEIGHT][WIDTH];
int n,m;
int main ()
{
  for (n=0;n<HEIGHT;n++)
    for (m=0;m<WIDTH;m++)
    {
      jimmy[n][m]=(n+1)*(m+1);
    }
  return 0;
}
#define WIDTH 5

#define HEIGHT 3
int jimmy [HEIGHT * WIDTH];
int n,m;
int main ()
{
  for (n=0;n<HEIGHT;n++)
    for (m=0;m<WIDTH;m++)
    {
      jimmy[n*WIDTH+m]=(n+1)*(m+1);
    }
  return 0;
}

None of the two source codes above produce any output on the screen, but both assign values to the memory block called jimmy in the following way:







We have used "defined constants" (#define) to simplify possible future modifications of the program. For example, in case that we decided to enlarge the array to a height of 4 instead of 3 it could be done simply by changing the line:

#define HEIGHT 3
to:
#define HEIGHT 4

with no need to make any other modifications to the program.

Arrays as parameters

At some moment we may need to pass an array to a function as a parameter. In C++ it is not possible to pass a complete block of memory by value as a parameter to a function, but we are allowed to pass its address. In practice this has almost the same effect and it is a much faster and more efficient operation.
In order to accept arrays as parameters the only thing that we have to do when declaring the function is to specify in its parameters the element type of the array, an identifier and a pair of void brackets []. For example, the following function:

void procedure (int arg[])

accepts a parameter of type "Array of int" called arg. In order to pass to this function an array declared as:

int myarray [40];

it would be enough to write a call like this:

procedure (myarray);

Here you have a complete example:

// arrays as parameters
#include <iostream>
using namespace std;
 
void printarray (int arg[], int length) {
  for (int n=0; n<length; n++)
    cout << arg[n] << " ";
  cout << "\n";
}
 
int main ()
{
  int firstarray[] = {5, 10, 15};
  int secondarray[] = {2, 4, 6, 8, 10};
  printarray (firstarray,3);
  printarray (secondarray,5);
  return 0;
}
5 10 15
2 4 6 8 10

As you can see, the first parameter (int arg[]) accepts any array whose elements are of type int, wathever its length. For that reason we have included a second parameter that tells the function the length of each array that we pass to it as its first parameter. This allows the for loop that prints out the array to know the range to iterate in the passed array without going out of range.
In a function declaration it is also possible to include multidimensional arrays. The format for a tridimensional array parameter is:

base_type[][depth][depth]

for example, a function with a multidimensional array as argument could be:

void procedure (int myarray[][3][4])

Notice that the first brackets [] are left blank while the following ones are not. This is so because the compiler must be able to determine within the function which is the depth of each additional dimension.

Arrays, both simple or multidimensional, passed as function parameters are a quite common source of errors for novice programmers. I recommend the reading of the chapter about Pointers for a better understanding on how arrays operate.