Sunday, 5 February 2017

Preprocessor directives & Ascii Codes - C++ Tutorials

Preprocessor directives

Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor. These lines are always preceded by a pound sign (#). The preprocessor is executed before the actual compilation of code begins, therefore the preprocessor digests all these directives before any code is generated by the statements.
These preprocessor directives extend only across a single line of code. As soon as a newline character is found, the preprocessor directive is considered to end. No semicolon (;) is expected at the end of a preprocessor directive. The only way a preprocessor directive can extend through more than one line is by preceding the newline character at the end of the line by a backslash (\).

macro definitions (#define, #undef)

To define preprocessor macros we can use #define. Its format is:
#define identifier replacement
When the preprocessor encounters this directive, it replaces any occurrence of identifier in the rest of the code by replacement. This replacement can be an expression, a statement, a block or simply anything. The preprocessor does not understand C++, it simply replaces any occurrence of identifier by replacement.
#define TABLE_SIZE 100
int table1[TABLE_SIZE];
int table2[TABLE_SIZE]; 
After the preprocessor has replaced TABLE_SIZE, the code becomes equivalent to:
int table1[100];
int table2[100]; 
This use of #define as constant definer is already known by us from previuos tutorials, but #define can work also with parameters to define function macros:
#define getmax(a,b) a>b?a:b
This would replace any occurrence of getmax followed by two arguments by the replacement expression, but also replacing each argument by its identifier, exactly as you would expect if it was a function:
// function macro
#include <iostream>
using namespace std;
 
#define getmax(a,b) ((a)>(b)?(a):(b))
 
int main()
{
  int x=5, y;
  y= getmax(x,2);
  cout << y << endl;
  cout << getmax(7,x) << endl;
  return 0;
}
5
7
Defined macros are not affected by block structure. A macro lasts until it is undefined with the #undef preprocessor directive:
#define TABLE_SIZE 100
int table1[TABLE_SIZE];
#undef TABLE_SIZE
#define TABLE_SIZE 200
int table2[TABLE_SIZE];
This would generate the same code as:
int table1[100];
int table2[200];
Function macro definitions accept two special operators (# and ##) in the replacement sequence:
If the operator
# is used before a parameter is used in the replacement sequence, that parameter is replaced by a string literal (as if it were enclosed between double quotes)
#define str(x) #x
cout << str(test);
This would be translated into:
cout << "test";
The operator ## concatenates two arguments leaving no blank spaces between them:
#define glue(a,b) a ## b
glue(c,out) << "test";
This would also be translated into:
cout << "test";
Because preprocessor replacements happen before any C++ syntax check, macro definitions can be a tricky feature, but be careful: code that relies heavily on complicated macros may result obscure to other programmers, since the syntax they expect is on many occasions different from the regular expressions programmers expect in C++.

Conditional inclusions (#ifdef, #ifndef, #if, #endif, #else and #elif)

These directives allow to include or discard part of the code of a program if a certain condition is met.
#ifdef allows a section of a program to be compiled only if the macro that is specified as the parameter has been defined, no matter which its value is. For example:
#ifdef TABLE_SIZE
int table[TABLE_SIZE];
#endif 
In this case, the line of code int table[TABLE_SIZE]; is only compiled if TABLE_SIZE was previously defined with #define, independently of its value. If it was not defined, that line will not be included in the program compilation.
#ifndef serves for the exact opposite: the code between #ifndef and #endif directives is only compiled if the specified identifier has not been previously defined. For example:
#ifndef TABLE_SIZE
#define TABLE_SIZE 100
#endif
int table[TABLE_SIZE];
In this case, if when arriving at this piece of code, the TABLE_SIZE macro has not been defined yet, it would be defined to a value of 100. If it already existed it would keep its previous value since the #define directive would not be executed.
The #if, #else and #elif (i.e., "else if") directives serve to specify some condition to be met in order for the portion of code they surround to be compiled. The condition that follows #if or #elif can only evaluate constant expressions, including macro expressions. For example:
#if TABLE_SIZE>200
#undef TABLE_SIZE
#define TABLE_SIZE 200
 
#elif TABLE_SIZE<50
#undef TABLE_SIZE
#define TABLE_SIZE 50
 
#else
#undef TABLE_SIZE
#define TABLE_SIZE 100
#endif
 
int table[TABLE_SIZE]; 
Notice how the whole structure of #if, #elif and #else chained directives ends with #endif.
The behavior of #ifdef and #ifndef can also be achieved by using the special operators defined and !defined respectively in any #if or #elif directive:
#if !defined TABLE_SIZE
#define TABLE_SIZE 100
#elif defined ARRAY_SIZE
#define TABLE_SIZE ARRAY_SIZE
int table[TABLE_SIZE];

Line control (#line)

When we compile a program and some error happen during the compiling process, the compiler shows an error message with references to the name of the file where the error happened and a line number, so it is easier to find the code generating the error.
The #line directive allows us to control both things, the line numbers within the code files as well as the file name that we want that appears when an error takes place. Its format is:
#line number "filename"
Where number is the new line number that will be assigned to the next code line. The line numbers of successive lines will be increased one by one from this point on.
"filename" is an optional parameter that allows to redefine the file name that will be shown. For example:
#line 20 "assigning variable"
int a?; 
This code will generate an error that will be shown as error in file "assigning variable", line 20.

Error directive (#error)

This directive aborts the compilation process when it is found, generating a compilation the error that can be specified as its parameter:


#ifndef __cplusplus
#error A C++ compiler is required!
#endif
This example aborts the compilation process if the macro name __cplusplus is not defined (this macro name is defined by default in all C++ compilers).

Source file inclusion (#include)

This directive has also been used assiduously in other sections of this tutorial. When the preprocessor finds an #include directive it replaces it by the entire content of the specified file. There are two ways to specify a file to be included:
#include "file"
#include <file>
The only difference between both expressions is the places (directories) where the compiler is going to look for the file. In the first case where the file name is specified between double-quotes, the file is searched first in the same directory that includes the file containing the directive. In case that it is not there, the compiler searches the file in the default directories where it is configured to look for the standard header files.  If the file name is enclosed between angle-brackets <> the file is searched directly where the compiler is configured to look for the standard header files. Therefore, standard header files are usually included in angle-brackets, while other specific header files are included using quotes.

Pragma directive (#pragma)

This directive is used to specify diverse options to the compiler. These options are specific for the platform and the compiler you use. Consult the manual or the reference of your compiler for more information on the possible parameters that you can define with #pragma.
If the compiler does not support a specific argument for #pragma, it is ignored - no error is generated.

Predefined macro names

The following macro names are defined at any time:
macro
Value
__LINE__
Integer value representing the current line in the source code file being compiled.
__FILE__
A string literal containing the presumed name of the source file being compiled.
__DATE__
A string literal in the form "Mmm dd yyyy" containing the date in which the compilation process began.
__TIME__
A string literal in the form "hh:mm:ss" containing the time at which the compilation process began.
__cplusplus
An integer value. All C++ compilers have this constant defined to some value. If the compiler is fully compliant with the C++ standard its value is equal or greater than 199711L depending on the version of the standard they comply.
For example:
// standard macro names
#include <iostream>
using namespace std;
 
int main()
{
  cout << "This is the line number " << __LINE__;
  cout << " of file " << __FILE__ << ".\n";
  cout << "Its compilation began " << __DATE__;
  cout << " at " << __TIME__ << ".\n";
  cout << "The compiler gives a __cplusplus value of " << __cplusplus;
  return 0;
}
This is the line number 7 of file /home/jay/stdmacronames.cpp.
Its compilation began Nov  1 2005 at 10:12:29.
The compiler gives a __cplusplus value of 1
C++ provides the following classes to perform output and input of characters to/from files:
  • ofstream: Stream class to write on files
  • ifstream: Stream class to read from files
  • fstream: Stream class to both read and write from/to files.
These classes are derived directly or indirectly from the classes istream, and ostream. We have already used objects whose types were these classes: cin is an object of class istream and cout is an object of class ostream. Therfore, we have already been using classes that are related to our file streams. And in fact, we can use our file streams the same way we are already used to use cin and cout, with the only difference that we have to associate these streams with physical files.
Let's see an example:
// basic file operations
#include <iostream>
#include <fstream>
using namespace std;
 
int main () {
  ofstream myfile;
  myfile.open ("example.txt");
  myfile << "Writing this to a file.\n";
  myfile.close();
  return 0;
}
[file example.txt]
Writing this to a file
This code creates a file called example.txt and inserts a sentence into it in the same way we are used to do with cout, but using the file stream myfile instead.
But let's go step by step:

Open a file

The first operation generally performed on an object of one of these classes is to associate it to a real file. This procedure is known as to open a file. An open file is represented within a program by a stream object (an instantiation of one of these classes, in the previous example this was myfile) and any input or output operation performed on this stream object will be applied to the physical file associated to it.
In order to open a file with a stream object we use its member function open():
open (filename, mode);
Where filename is a null-terminated character sequence of type const char * (the same type that string literals have) representing the name of the file to be opened, and mode is an optional parameter with a combination of the following flags:
ios::in
Open for input operations.
ios::out
Open for output operations.
ios::binary
Open in binary mode.
ios::ate
Set the initial position at the end of the file.
If this flag is not set to any value, the initial position is the beginning of the file.
ios::app
All output operations are performed at the end of the file, appending the content to the current content of the file. This flag can only be used in streams open for output-only operations.
ios::trunc
If the file opened for output operations already existed before, its previous content is deleted and replaced by the new one.
All these flags can be combined using the bitwise operator OR (|). For example, if we want to open the file example.bin in binary mode to add data we could do it by the following call to member function open():
ofstream myfile;
myfile.open ("example.bin", ios::out | ios::app | ios::binary); 
Each one of the open() member functions of the classes ofstream, ifstream and fstream has a default mode that is used if the file is opened without a second argument:
class
default mode parameter
ofstream
ios::out
ifstream
ios::in
fstream
ios::in | ios::out
For ifstream and ofstream classes, ios::in and ios::out are automatically and respectivelly assumed, even if a mode that does not include them is passed as second argument to the open() member function.
The default value is only applied if the function is called without specifying any value for the mode parameter. If the function is called with any value in that parameter the default mode is overridden, not combined.
File streams opened in binary mode perform input and output operations independently of any format considerations. Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters).
Since the first task that is performed on a file stream object is generally to open a file, these three classes include a constructor that automatically calls the open() member function and has the exact same parameters as this member. Therefor, we could also have declared the previous myfile object and conducted the same opening operation in our previous example by writing:
ofstream myfile ("example.bin", ios::out | ios::app | ios::binary);
Combining object construction and stream opening in a single statement. Both forms to open a file are valid and equivalent.
To check if a file stream was successful opening a file, you can do it by calling to member is_open() with no arguments. This member function returns a bool value of true in the case that indeed the stream object is associated with an open file, or false otherwise:
if (myfile.is_open()) { /* ok, proceed with output */ }

Closing a file

When we are finished with our input and output operations on a file we shall close it so that its resources become available again. In order to do that we have to call the stream's member function close(). This member function takes no parameters, and what it does is to flush the associated buffers and close the file:
myfile.close();
Once this member function is called, the stream object can be used to open another file, and the file is available again to be opened by other processes.
In case that an object is destructed while still associated with an open file, the destructor automatically calls the member function close().

Text files

Text file streams are those where we do not include the ios::binary flag in their opening mode. These files are designed to store text and thus all values that we input or output from/to them can suffer some formatting transformations, which do not necessarily correspond to their literal binary value.
Data output operations on text files are performed in the same way we operated with cout:
// writing on a text file
#include <iostream>
#include <fstream>
using namespace std;
 
int main () {
  ofstream myfile ("example.txt");
  if (myfile.is_open())
  {
    myfile << "This is a line.\n";
    myfile << "This is another line.\n";
    myfile.close();
  }
  else cout << "Unable to open file";
  return 0;
}
[file example.txt]
This is a line.
This is another line.
Data input from a file can also be performed in the same way that we did with cin:
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
 
int main () {
  string line;
  ifstream myfile ("example.txt");
  if (myfile.is_open())
  {
    while (! myfile.eof() )
    {
      getline (myfile,line);
      cout << line << endl;
    }
    myfile.close();
  }
 
  else cout << "Unable to open file"; 
 
  return 0;
}
This is a line.
This is another line.  
This last example reads a text file and prints out its content on the screen. Notice how we have used a new member function, called eof() that returns true in the case that the end of the file has been reached. We have created a while loop that finishes when indeed myfile.eof() becomes true (i.e., the end of the file has been reached).

Checking state flags

In addition to eof(), which checks if the end of file has been reached, other member functions exist to check the state of a stream (all of them return a bool value):
bad()
Returns true if a reading or writing operation fails. For example in the case that we try to write to a file that is not open for writing or if the device where we try to write has no space left.
fail()
Returns true in the same cases as bad(), but also in the case that a format error happens, like when an alphabetical character is extracted when we are trying to read an integer number.
eof()
Returns true if a file open for reading has reached the end.
good()
It is the most generic state flag: it returns false in the same cases in which calling any of the previous functions would return true.
In order to reset the state flags checked by any of these member functions we have just seen we can use the member function clear(), which takes no parameters.

get and put stream pointers

All i/o streams objects have, at least, one internal stream pointer:
ifstream, like istream, has a pointer known as the get pointer that points to the element to be read in the next input operation.
ofstream, like ostream, has a pointer known as the put pointer that points to the location where the next element has to be written.
Finally, fstream, inherits both, the get and the put pointers, from iostream (which is itself derived from both istream and ostream).
These internal stream pointers that point to the reading or writing locations within a stream can be manipulated using the following member functions:

tellg() and tellp()

These two member functions have no parameters and return a value of the member type pos_type, which is an integer data type representing the current position of the get stream pointer (in the case of tellg) or the put stream pointer (in the case of tellp).

seekg() and seekp()

These functions allow us to change the position of the get and put stream pointers. Both functions are overloaded with two different prototypes. The first prototype is:
seekg ( position );
seekp ( position );

Using this prototype the stream pointer is changed to the absolute position position (counting from the beginning of the file). The type for this parameter is the same as the one returned by functions tellg and tellp: the member type pos_type, which is an integer value.
The other prototype for these functions is:
seekg ( offset, direction );
seekp ( offset, direction );

Using this prototype, the position of the get or put pointer is set to an offset value relative to some specific point determined by the parameter direction. offset is of the member type off_type, which is also an integer type. And direction is of type seekdir, which is an enumerated type (enum) that determines the point from where offset is counted from, and that can take any of the following values:
Ios::beg
offset counted from the beginning of the stream
Ios::cur
offset counted from the current position of the stream pointer
Ios::end
offset counted from the end of the stream
The following example uses the member functions we have just seen to obtain the size of a file:
// obtaining file size
#include <iostream>
#include <fstream>
using namespace std;
 
int main () {
  long begin,end;
  ifstream myfile ("example.txt");
  begin = myfile.tellg();
  myfile.seekg (0, ios::end);
  end = myfile.tellg();
  myfile.close();
  cout << "size is: " << (end-begin) << " bytes.\n";
  return 0;
}
size is: 40 bytes.

Binary files

In binary files, to input and output data with the extraction and insertion operators (<< and >>) and functions like getline is not efficient, since we do not need to format any data, and data may not use the separation codes used by text files to separate elements (like space, newline, etc...).
File streams include two member functions specifically designed to input and output binary data sequentially: write and read. The first one (write) is a member function of ostream inherited by ofstream. And read is a member function of istream that is inherited by ifstream. Objects of class fstream have both members. Their prototypes are:
write ( memory_block, size );
read ( memory_block, size );

Where memory_block is of type "pointer to char" (char*), and represents the address of an array of bytes where the read data elements are stored or from where the data elements to be written are taken. The size parameter is an integer value that specifies the number of characters to be read or written from/to the memory block.
// reading a complete binary file
#include <iostream>
#include <fstream>
using namespace std;
 
ifstream::pos_type size;
char * memblock;
 
int main () {
  ifstream file ("example.txt", ios::in|ios::binary|ios::ate);
  if (file.is_open())
  {
    size = file.tellg();
    memblock = new char [size];
    file.seekg (0, ios::beg);
    file.read (memblock, size);
    file.close();
 
    cout << "the complete file content is in memory";
 
    delete[] memblock;
  }
  else cout << "Unable to open file";
  return 0;
}
the complete file content is in memory
In this example the entire file is read and stored in a memory block. Let's examine how this is done:
First, the file is open with the ios::ate flag, which means that the get pointer will be positioned at the end of the file. This way, when we call to member tellg(), we will directly obtain the size of the file. Notice the type we have used to declare variable size:
ifstream::pos_type size;
ifstream::pos_type is a specific type used for buffer and file positioning and is the type returned by file.tellg(). This type is defined as an integer type, therefore we can conduct on it the same operations we conduct on any other integer value, and can safely be converted to another integer type large enough to contain the size of the file. For a file with a size under 2GB we could use int:
Int size;
size = (int) file.tellg();
Once we have obtained the size of the file, we request the allocation of a memory block large enough to hold the entire file:
memblock = new char[size];
Right after that, we proceed to set the get pointer at the beginning of the file (remember that we opened the file with this pointer at the end), then read the entire file, and finally close it:
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
At this point we could operate with the data obtained from the file. Our program simply announces that the content of the file is in memory and then terminates.

Buffers and Synchronization

When we operate with file streams, these are associated to an internal buffer of type streambuf. This buffer is a memory block that acts as an intermediary between the stream and the physical file. For example, with an ofstream, each time the member function put (which writes a single character) is called, the character is not written directly to the physical file with which the stream is associated. Instead of that, the character is inserted in that stream's intermediate buffer.
When the buffer is flushed, all the data contained in it is written to the physical medium (if it is an output stream) or simply freed (if it is an input stream). This process is called synchronization and takes place under any of the following circumstances:

  • When the file is closed: before closing a file all buffers that have not yet been flushed are synchronized and all pending data is written or read to the physical medium.
  • When the buffer is full: Buffers have a certain size. When the buffer is full it is automatically synchronized.
  • Explicitly, with manipulators: When certain manipulators are used on streams, an explicit synchronization takes place. These manipulators are: flush and endl.
  • Explicitly, with member function sync(): Calling stream's member function sync(), which takes no parameters, causes an immediate synchronization. This function returns an int value equal to -1 if the stream has no associated buffer or in case of failure. Otherwise (if the stream buffer was successfully synchronized) it returns 0



Ascii Codes

It is a very well-known fact that computers can manage internally only 0s (zeros) and 1s (ones). This is true, and by means of sequences of 0s and 1s the computer can express any numerical value as its binary translation, which is a very simple mathematical operation (as explained in the paper numerical bases).

Nevertheless, there is no such evident way to represent letters and other non-numeric characters with 0s and 1s. Therefore, in order to do that, computers use ASCII tables, which are tables or lists that contain all the letters in the roman alphabet plus some additional characters. In these tables each character is always represented by the same order number. For example, the ASCII code for the capital letter "A" is always represented by the order number 65, which is easily representable using 0s and 1s in binary: 65 expressed as a binary number is 1000001.

The standard ASCII table defines 128 character codes (from 0 to 127), of which, the first 32 are control codes (non-printable), and the remaining 96 character codes are representable characters:


*
0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
0
NUL
SOH
STX
ETX
EOT
ENQ
ACK
BEL
BS
TAB
LF
VT
FF
CR
SO
SI
1
DLE
DC1
DC2
DC3
DC4
NAK
SYN
ETB
CAN
EM
SUB
ESC
FS
GS
RS
US
2

!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
3
0
1
2
3
4
5
6
7
8
9
:
;
< 
=
> 
?
4
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
5
P
Q
R
S
T
U
V
W
X
Y
Z
[
\
]
^
_
6
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
7
p
q
r
s
t
u
v
w
x
y
z
{
|
}
~



* This panel is organized to be easily read in hexadecimal: row numbers represent the first digit and the column numbers represent the second one. For example, the "A" character is located at the 4th row and the 1st column, for that it would be represented in hexadecimal as 0x41 (65).

Because most systems nowadays work with 8bit bytes, which can represent 256 different values, in addition to the 128 standard ASCII codes there are other 128 that are known as extended ASCII, which are platform- and locale-dependent. So there is more than one extended ASCII character set.

The two most used extended ASCII character sets are the one known as OEM, that comes from the default character set incorporated by default in the IBM-PC and the other is the ANSI extend ASCII which is used by most recent operating systems.

The first of them, the OEM character set, is the one used by the hardware of the immense majority of PC compatible machines, and was also used under the old DOS system. It includes some foreign signs, some marked characters and pieces to represent panels.




The ANSI character set is a standard that many systems incorporate, like W
indows, some UNIX platforms and many standalone applications. It includes many more local symbols and marked letters so that it can be used with no need of being redefined in many more languages:



No comments:

Post a Comment