There's strictly no warranty for the correctness of this text. You use any of the information provided here at your own risk.
The programming-language C was created at the beginning of the 1970s by Dennis Ritchie (1941-2011) to (re-)implement the operating-system UNIX.
It is a somewhat laconic programming-language that has only a minimal set of built-in instructions. Everything else has to be imported from libraries, even functions for string-manipulation.
C-code is close to the system and runs fast. Usually, only hand-written assembler-code can be faster. Programs can be linked together from numerous files, so huge applications can be written in C by large teams of programmers.
C is not object-oriented (although we will see later, that it's possible to emulate something like classes in C).
Mainly to add classes, the language C++ was designed by Bjarne Stroustrup in the 1980s, but C++, though largely compatible with C, can be seen as a different language. This text is just about C.
Quite a classic is the book "The C Programming Language" by Kernighan and Ritchie (often referred to as "K&R").
C is a compiled language. That means, you write some code with an editor (like vim) or with an IDE (like Geany) in a file called "h.c", for example. Then you run a compiler on that file. If everything is fine with your code, this creates an executable file, which is called "a.out", if you don't tell the compiler to use another name.
I use gcc, the C-compiler of the GNU-project. It is free software. It comes with basically every GNU/Linux-distribution.
So, if you have a file "h.c", you can run
gcc h.c
and get a file "a.out", which you can run then with
./a.out
As very large and complicated applications can be written in C, and the executable often has to be linked against many libraries, there are lots of options to gcc. This is a topic of its own.
But concerning the very small example-programs on this page, you should be fine with the simple gcc-command above. It may be a good idea to use the "-Wall"-option, so "gcc -Wall h.c", to get the most common warnings.
Here's a "Hello World" in C:
#include <stdio.h>

int main(void) {
    puts("Hello World!");
    return 0;
}
The first line imports the standard library for input/output. How this line works, will be explained later.
The function "main()" is found in every C-program. It is executed, when the program is run.
The datatypes and names of the arguments that are passed to a function are written in round brackets after the name of the function. The term "void" represents "nothing". So in this case, no argument will be passed to "main()".
The datatype of the value, the function returns, is written before the name of the function, in this case, it's "int" before the name "main", as the function will return the integer 0 at the end, "return 0;" (notice, that "return" is not a function).
C is case-sensitive.
Here's an example of a for-loop in C:
#include <stdio.h>

int main(void) {
    int i;
    for (i = 1; i <= 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}
Variables, like the integer "i" here, have to be declared, before they can be used.
To print integers, the function "printf()" can be used. Its first argument is a "format-string", like the one in the example: "%d" means "print the next argument as an integer". "\n" is the newline-character.
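The term in brackets of the for-loop "(i = 1; i <= 10; i++)" has the following meaning:
"i = 1": The initialization, executed once before the loop starts.
"i <= 10": The condition, checked before every round. The loop runs as long as it is met.
"i++": The step, executed at the end of every round. Here, it adds 1 to "i".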
while-loops work similarly. The while-loop runs as long as the condition is met:
#include <stdio.h>

int main(void) {
    int x = 1;
    while (x <= 10) {
        printf("%d\n", x);
        x++;
    }
    return 0;
}
Comments are written between "/*" and "*/". They can be longer than one line.
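In general, additional whitespace characters between the elements of the code are ignored by the C-compiler (but not inside string literals, and preprocessor directives like "#include" have to stay on a line of their own).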
#include <stdio.h>

int main(void) {
    int a = 1;
    int b = 2;
    /* If there is just one line after the condition,
       the curly brackets can be left out. Use with care: */
    if (a == 1)
        printf("a is %d.\n", a);
    if (b > a)
        printf("b is greater than %d.\n", a);
    /* Meanings of symbols for logical operators:
       == : equal
       != : not equal
       && : and
       || : or
       !  : not */
    if (a < b && b == 2)
        puts("a is less than b and b is 2.");
    if (a == 10) {
        puts("a is 10.");
    } else {
        puts("a is not 10.");
    }
    if (a != 10)
        puts("a is really not 10.");
    return 0;
}
When debugging, you often want to comment out large ranges of code.
But when there's already a comment inside that range, it won't work, because the first "*/" inside the range ends the comment (comments of this kind can't be nested).
Since the language standard "C99", you can also write "// ..." like in C++, and the rest of the line is a comment then. The compiler "gcc" knows this syntax. I suggest using it.
There is also "else if":
if ( ... ) {
    ...
} else if ( ... ) {
    ...
}
Notice that every statement inside the curly brackets has to be terminated by a semicolon, including the last one before the right curly bracket.
As already shown in the example above, the curly brackets can be left out altogether, if just a single statement follows the if-condition. This somewhat breaks the general rule of writing code blocks. But if there are many conditions followed by just one line, it makes code more readable. Obviously, this exception to the rule should be used with care.
With "continue", the rest of the current round of a loop (a for-loop or a while-loop) is skipped, and the loop goes on with the next round.
With "break", a loop is left entirely. Example:
#include <stdio.h>

int main(void) {
    int i;
    puts("");
    for (i = 1; i < 1000; i++) {
        if (i == 3) {
            puts("");
            continue;
        }
        printf("%d\n", i);
        if (i == 10) {
            puts("End.\n");
            break;
        }
    }
    return 0;
}
If a variable needs to be compared to several values (integers or single chars, which can be seen as integers), you could write many if/else if-statements. There's an alternative construction in C, called the "switch"-statement:
#include <stdio.h>

int main(void) {
    int a = 3;
    int i;
    switch (a) {
        case 1:
            puts("a is 1.");
            break;
        case 2:
            puts("a is 2.");
            break;
        case 3:
            for (i = 0; i < 5; i++) {
                printf("%d\n", i);
            }
            puts("a is 3.");
            break;
        default:
            puts("a is none of these.");
            break;
    }
    return 0;
}
Notice the colons (":") at the end of the "case"-lines.
The "default"-part is executed, if no condition of any case is met (like the "else" in "if, else if, else").
Built-in datatypes in C are: "int", "float", "double", "char" and "void".
Actually, there are also integer types of other sizes: "short", "long" and "long long".
And also the attributes: "signed" and "unsigned".
int
"printf()"'s format-string: "%d".
To specify "long int", use "%ld".
(For "long long int", "printf()"'s format-string is "%lld".)
float, double
Datatypes for floating-point numbers. Often, "double" is what you want.
"printf()"'s format-string: "%f". "%.2f" rounds to two decimal places.
If you pass a "float" to "printf()", it becomes a "double" anyway.
char
Used to store single characters, for example ASCII-characters. Use single quotes for the character (like 'a'). It can also be interpreted as a small number (0 to 255 for "unsigned char", typically -128 to 127 for "signed char").
"printf()"'s format-string: "%c" as a character, "%d" as a number.
It depends on the compiler, whether a plain "char" is signed or unsigned. Therefore, you should write "signed char" or "unsigned char" explicitly, if you use it to store numbers.
void
"void" means just "nothing". It can be used as the datatype of a return-value of functions (when nothing is returned). It can't be used for variables.
Let's see:
#include <stdio.h>

int main(void) {
    int a = 10;
    double pi = 3.14159265359;
    char c = 'm';
    printf("%d\n", a);
    printf("%f \t %.3f \n", pi, pi);
    printf("%c \t %d \n", c, c);
    puts("");
    /* The result of sizeof is printed with "%zu", not with "%d": */
    printf("%zu\n", sizeof(a));
    printf("%zu\n", sizeof(pi));
    printf("%zu\n", sizeof(c));
    return 0;
}
"\t" in a format-string of "printf" means "tabulator", so it moves text away a bit in the line.
Notice, there are certain limits to the datatypes of numbers. So, if you want to use very large integers or floating point numbers with many decimal places, you'll have to read more about this.
"sizeof()" is useful sometimes: It returns the amount of memory, a variable occupies, in bytes. (Actually, "sizeof ()" is not a function, but an operator. But it works kind-of like a function.)
This amount (and therefore the output of "sizeof()") depends on the platform and on the compiler.
"sizeof()" returns a special datatype called "size_t".
Often this is just an "unsigned int" (it is on my system). On 64-bit systems, it is typically an "unsigned long". It's defined by a "typedef" (I'll explain later). It's used to store the size of something in memory. As it's unsigned, it can't be negative.
Stand-alone numbers and strings in the code like '5' or '"Hello"' are called "literals". They are constants, so they cannot be changed.
An integer literal without a prefix is decimal, with the prefix '0x' or '0X', it's hexadecimal, and with '0', it's octal.
So for example '0xA5' is a hexadecimal integer literal meaning '165' (as decimal).
An integer literal can have a suffix 'U', 'L' or 'UL', to indicate that it is 'unsigned', 'long' or both. So '0xAAB5UL' would be a hexadecimal integer literal that is explicitly 'unsigned long'.
String literals are valid C-code, although C doesn't have a datatype "string" for variables (only "arrays of char"). This has a few consequences, that are described later, on the second page.
Notice, that literals are also used, when initializing variables, like in "int a = 5;".
Mathematically, "10 / 3" would be 3.33333.... . And if you use the datatype "double" for the division, you will get that:
#include <stdio.h>

int main(void) {
    double a = 10;
    double b = 3;
    double c = a / b;
    printf("%f\n", c);
    return 0;
}
However, if you use the datatype "int", the decimal places are cut, and "10 / 3" would be just "3" then (which is mathematically incorrect):
#include <stdio.h>

int main(void) {
    int a = 10;
    int b = 3;
    int c = a / b;
    printf("%d\n", c);
    return 0;
}
When using literal numbers (without variables), there's a difference between for example "9" (an int) and "9." (a double):
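#include <stdio.h>

int main(void) {
    printf("%d\n", 9 / 5);
    printf("%f\n", 9 / 5);    /* Wrong: "%f" expects a double, but "9 / 5" is an int. */
    printf("%f\n", 9. / 5.);
    return 0;
}
gives this output (the second line may differ on other systems):
1
0.000000
1.800000
The second line is the result of a mismatch between the format specifier "%f" and the int argument. Strictly speaking, the behaviour is undefined in this case, and "gcc -Wall" warns about it.
So actually, in the first example above, there is an internal conversion, when the int number is assigned to a double variable.
You have to take care of that. In some situations, you may even make use of it though.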
With the modulo-operator "%", you can get the remainder of the division: "10 / 3" is "3, remainder 1". With modulo, you get this "1":
#include <stdio.h>

int main(void) {
    int a = 10;
    int b = 3;
    int c = a % b;
    printf("%d\n", c);
    return 0;
}
When using "printf()", a so-called "format string" (which is technically a string literal) has to be passed as the first argument. It typically contains one or more "format specifiers", like for example "%d" to print integers.
Here's an overview of the format specifiers that are recognized by "printf()":
datatype | format specifier |
int | %d |
long int | %ld |
long long int | %lld |
unsigned int | %u |
unsigned long int | %lu |
int, with leading zeros ("005" instead of "5") | %03d |
size_t (an unsigned integer type) | %zu |
ssize_t (the signed counterpart of size_t on POSIX systems) | %zd |
unsigned int, as hexadecimal | %x |
unsigned int, as octal | %o |
float and double | %f |
float and double, rounded to two decimal places | %.2f |
float and double, in exponential form | %e |
char | %c |
string (argument is a pointer to char) | %s |
pointer-address | %p |
How "%s" and "%p" are used, is explained on the second page.
More on printf()'s format strings can be found here and on the Wiki-page.
You get power and square root of a number like this:
#include <stdio.h>
#include <math.h>

int main(void) {
    int a = 2;
    int b = 3;
    int c = 49;
    double d = pow(a, b);
    double e = sqrt(c);
    printf("%f\n", d);
    printf("%f\n", e);
    return 0;
}
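Notice that you have to include the header "math.h" here. And you need to tell gcc to link the "math"-library with the option "-lm" (which means: Link the program with the library "m"). The option should be written after the name of the source-file:
gcc -Wall prog.c -lm
Output is then: 8.000000 and 7.000000 ("%f" prints six decimal places).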
Macros
In word processor programs like "Microsoft Word" or "LibreOffice Writer", there is a function, that automatically replaces text. You can tell it for example to replace "hl" to "hello", and the next time you write "hl", it is automatically turned into "hello".
In the process of compiling C programs, there is a stage where a similar simple text replacement is done by the so-called "preprocessor".
In your program, you can write certain instructions for the compiler's preprocessor, and it will replace the texts as defined. These instructions are called "preprocessor directives". If you write for example
#define TRUE 1
every occurrence of "TRUE" in your code will be changed to "1" during compilation by the preprocessor.
You won't notice it though, because the texts won't be changed in the source files, and you usually don't look into the compiled executables as they are hardly readable for humans (you can only take a look with a hex editor like "ht").
These "#define"-lines, that are called "macros", are used, because you can then write for example the word "TRUE" in your source code instead of "1", when you want to check, if an expression evaluates to "true".
These macros are also used to define constants, that should be visible in the whole program, like for example:
#define GRAVITY 9.81
So, when the word "GRAVITY" appears later on in the source code, it means "9.81". But not because "GRAVITY" is a variable (which it is not), but because it is replaced with the term "9.81" by the preprocessor. It's also not necessary then to think about the way that value is passed into functions. It's simply written directly into these functions as a number by the preprocessor.
When defining macros for strings, quotation marks have to be used, like:
#define MESSAGE "Hello World"
According to Wikipedia, the word "macro" is an abbreviation of the general term "macroinstruction". When a macro(instruction) is applied to a text, the text is changed to another text.
The C compiler's preprocessor just does simple text substitution. (While a "macro" in "Microsoft Word" is a small program, that can change a text in a more complicated way. So that's something slightly different.)
The C macro is written with the keyword "#define", followed by the word that is to be replaced and the text it is replaced with, separated by a space character.
Unlike regular lines of C code, lines with preprocessor directives like macros are not terminated by a semicolon.
Conditions for the preprocessor can be programmed with:
#if
#if defined
#ifdef
#ifndef
#elif
#else
#endif
This is especially useful in large programs: Often, macros are defined only under the condition, that they aren't defined already.
Macros can be unset with the keyword "#undef".
When using the compiler gcc, a list of the defined macros of file "hello.c" (for example) can be printed with:
gcc -dM -E hello.c
#include-Statements
There is another kind of preprocessor directive, that uses the keyword "#include". The "#include" directive is followed by the name of a text-file, either in "< ... >" or in quotation marks (" ... "). When the preprocessor reaches that line, it searches for the file of the given filename. When the file is found, the preprocessor replaces the line of the directive with the whole content of that file. That way, (the header files of) libraries are imported into the source code.
For example, the line
#include <stdio.h>
is replaced by the preprocessor with the content of the file "stdio.h", which is usually "/usr/include/stdio.h" on Linux.
With the content of the header file "stdio.h", the declarations of commonly needed I/O routines like "printf()" are imported into the program.
Back from the preprocessor to ordinary C code.
Variables can be declared as "const". That means, they can't be altered afterwards. Therefore, the initialization has to be part of the definition:
#include <stdio.h>

int main(void) {
    const int a = 10;
    /* a = 15; Wouldn't work */
    printf("%d\n", a);
    return 0;
}
Depending on the compiler and on where they are defined, variables declared that way may be stored in a read-only-area of the program's memory.
If "const" doesn't behave as expected, it may be due to the problem described here.
Constant integers can also be defined with the datatype "enum":
#include <stdio.h>

enum colours { black, blue, red, magenta, green, cyan, yellow, white };
enum months { january = 1, february };

int main(void) {
    printf("The number of cyan is: %d.\n", cyan);
    printf("February is month number %d.\n", february);
    return 0;
}
You can use these enum statements to define boolean values (like in Python):
enum bool {False, True};
enum none {None};
Then, in your code, "False" and "None" will be evaluated to 0, "True" to 1.
This kind of "new datatype" can then also be used like other datatypes, for example as a return-value of a function:
#include <stdio.h>

enum bool {False, True};

enum bool test(int a) {
    if (a == 5) {
        return True;
    }
    /* Without this line, nothing would be returned, if a is not 5: */
    return False;
}

int main(void) {
    enum bool r = test(5);
    printf("%d\n", r);
    return 0;
}
The core of C-programming, which may have been new back in 1970, is structured programming.
The functionality of a large program is separated into small pieces, that deal with more specific problems.
These small pieces are code-blocks called "functions".
Functions take some arguments, process them and return a return-value.
Consider this program:
#include <stdio.h>

int addTen(int b) {
    b += 10;
    return b;
}

int main(void) {
    int a = 5;
    a = addTen(a);
    printf("%d\n", a);
    return 0;
}
Now, this may be a bit complicated, but it is important, to understand it:
"main()" and "addTen()" are different functions, that are completely separated. With the line
a = addTen(a);
"addTen()" is called and variable "a" is passed to it by value (there's another possible way to pass arguments to functions in C, but for now, it's by value).
So variable "b" in "addTen()" gets the value of variable "a" from "main()".
But "addTen()" doesn't know "a" itself. On the other hand, "main()" is totally unaware of variable "b" in "addTen()".
Variable "b" in "addTen()" is created, when "addTen()" is called. When "addTen()" is finished, variable "b" is destroyed.
All "local variables" (these are the variables inside a function) are destroyed, when the function is finished.
The declaration of a variable is the statement that there is a variable of a certain name and type in the program.
Definition of a variable means having memory allocated for it.
Declaration and definition of a variable "a" would just be
int a;
Usually, the two are only separated when several source-code-files are used.
Initialization of a variable means giving it a value, like in:
a = 10;
In other words:
A declaration provides basic attributes of a symbol: Its type and its name. Memory isn't allocated for the variable yet.
A definition provides all of the details of that symbol; if it's a variable, where that variable is stored. Memory is allocated for the variable.
All three can be combined in one line: "int a = 10;". Since the more modern C-language definition "C99", declarations may also be written anywhere inside a function. If the compiler doesn't support C99, it is required that all declarations are written at the beginning of a block, before any other statements. The initializations can then follow separately after the declarations:
int main(void) {
    int a;
    int b;
    float c;
    char d[50];
    a = 5;
    b = 10;
    return 0;
}
There can also be declarations of functions. They look like the first line of the function, but without the curly brackets, instead the line is terminated by a semicolon:
int addTen(int b);
If you write the function "addTen()" below the function "main()" and try to compile with "gcc -Wall", you'll get a warning about an implicit declaration of the function (newer versions of gcc even treat this as an error).
If you declare the function before it is called, it compiles without warning:
#include <stdio.h>

int addTen(int b);

int main(void) {
    int a = 5;
    a = addTen(a);
    printf("%d\n", a);
    return 0;
}

int addTen(int b) {
    b += 10;
    return b;
}
There can be several declarations, but just one definition.
This is quite obvious for functions. For variables, it becomes important, if you compile from several files of source-code.
Then, all declarations outside functions are usually written into a central file, which is called "header-file" and has the suffix ".h". It is imported into the other source-files with an "#include"-directive.
If you want to use a variable across these files, you write a definition of it into one of the source-files. Then, you write a declaration of the variable with the keyword "extern" into the header-file. That means, this is just a declaration, the definition is somewhere else (i.e. in the source-file previously mentioned). Then you import the header-file into all source-files. Then, the variable is known everywhere in the program.
It is possible to use global variables in C.
If you use just one source-file, you can declare (and define) such variables at the beginning of the file outside any function. Then they are known in every function, without any further declarations inside the functions.
If the global variable is for some reason declared below the function, where it is used, it has to be declared again inside the function using the keyword "extern". Although this case might be rare.
But this declaration with "extern" is also required, if the global variable is declared in another source-file. And this case is rather common.
#include <stdio.h>

int main(void) {
    extern int a;
    printf("%d\n", a);
    return 0;
}

int a = 15;
Or here's another example, declaring (and defining) the variable outside a function, but assigning its value inside a function. It is then known in other functions too, as it's a global variable after all:
#include <stdio.h>

int a;

void showA(void) {
    printf("%d\n", a);
}

int main(void) {
    a = 5;
    showA();
    return 0;
}
Of course, you should prefer local variables, but global variables are for example useful, if complex data shall be easily made available to several functions.
It's also alright to use global variables, in case you should write C-code for very small systems, like a vintage Sinclair ZX Spectrum with just 48K or an Atari 800 XL with 64K for example.
The program can't become so long then, that you get confused by your variables.
If the keyword "static" is put in front of the declaration of a function, the function is hidden from code outside the concrete source-file.
If "static" is put in front of the declaration of global variables (outside any functions), the global variable is hidden from code outside the source-file.
If "static" is put in front of the declaration of a local variable, something totally different happens: The local variable doesn't get destroyed when the function returns. That means, when the function is called again, the static variable "remembers" the value it had when the function returned last time. That is pretty weird. Better not use it.
With "typedef", you can define your own names of datatypes. "typedef" can create new names for existing datatypes, but not create new datatypes. If you encounter strange words in code, that look like unknown datatypes, look out for a typedef-declaration somewhere.
#include <stdio.h>

typedef int Number;

Number main(void) {
    Number n = 10;
    printf("%d\n", n);
    return 0;
}
The source-code of C-programs can be separated into multiple files.
Declarations of structures and functions should go into a "header"-file with the suffix ".h". It is imported with "#include" into the file "main.c".
The quotation-marks in the #include-statement make the compiler search for the ".h"-file in the current directory.
Other ".c"-files are used for compilation by passing their filenames to the compiler together with the "main.c"-file's name:
gcc main.c second.c -o test
This works with these three files:
#include <stdio.h>
#include "second.h"

/* 'main.c': Compile with 'gcc main.c second.c -o test' */
int main(void) {
    printf("%d\n", blue);
    test();
    return 0;
}
------------------------------------
#include <stdio.h>

/* 'second.c': */
void test(void) {
    puts("Hello from function 'test()'.");
}
------------------------------------
/* 'second.h' */
#ifndef second_headers
#define second_headers

enum colors {red, green, blue};

/* Function declarations: */
void test(void);

#endif
Then, a so-called "Makefile" can also be written, to compile using the GNU-tool "make". How this can be done, I've described here.
The journey continues on the second page.