Writing and structuring programs
You should keep the following principles in mind as you develop larger and more complex programs:
- Choose descriptive names for your variables and functions
Self-documenting code is easier to read and interpret. Code tells you how, comments tell you why.
- Don't repeat yourself
Refactor common code into functions so you don't need to repeat yourself many times.
- Avoid creating large functions
Split up large blocks of code into smaller functions. The Unix philosophy comes into play here: you should aim to create small, concise functions that focus on doing one thing and doing that one thing well.
- Prefer concise code
Bigger functions and programs generally take more effort for a human to interpret.
- Prefer immutability
Use const liberally. Delegate as much work to the compiler as possible and let it check invariants for you.
- Don't reinvent the wheel, use the standard library
Do you know which functions come with the C standard library? Spend some time looking through the documentation for the C standard library so you don't end up recreating a function that already exists.
- Use widely accepted naming and coding conventions for the language you are working in
For example, i, j and k are typically reserved for loop variables. Functions that take non-const pointers are expected to mutate the data they point to, so mark pointer parameters as const if your function only needs read access.
- Be consistent
Use a single standard naming and indentation convention throughout your entire codebase.
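As a rough illustration of several of these principles together, here is a minimal sketch of two small, single-purpose functions with descriptive names and const parameters. The names count_char and count_words are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts how many times target occurs in text.
 * The const qualifier tells callers and the compiler that text is only read. */
size_t count_char(const char *text, char target)
{
    assert(text != NULL);

    size_t count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == target) {
            count++;
        }
    }
    return count;
}

/* Counts whitespace-separated words in text. */
size_t count_words(const char *text)
{
    assert(text != NULL);

    size_t words = 0;
    int in_word = 0; /* 1 while the loop is inside a word */
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (isspace((unsigned char)text[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

Because both parameters are const, passing a string literal is safe, and the compiler will reject any accidental write through text inside either function.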
Reading list
- Code Complete by Steve McConnell
- The Art of Unix Programming by Eric Raymond
- The Practice of Programming by Brian Kernighan and Rob Pike
- The C Programming Language (K & R) by Brian Kernighan and Dennis Ritchie
- Computer Systems: A Programmer's Perspective by Randal Bryant and David O'Hallaron
String parsing in C
The C standard library provides a range of string functions, declared mainly in <string.h>, with conversions such as atoi in <stdlib.h>. Remember to compile with -std=c11.
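One standard library function that is handy when parsing input is strtol from <stdlib.h>. The sketch below wraps it in a hypothetical helper, parse_long (not part of the original material), that accepts a string only if all of it is a valid base-10 number that fits in a long.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses text as a base-10 long. Returns true and stores the value in *out
 * only if the whole string is a valid number that fits in a long. */
bool parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0; /* strtol reports overflow through errno */
    long value = strtol(text, &end, 10);

    if (end == text) {
        return false; /* no digits were consumed */
    }
    if (*end != '\0') {
        return false; /* trailing characters after the number */
    }
    if (errno == ERANGE) {
        return false; /* value does not fit in a long */
    }

    *out = value;
    return true;
}
```

Unlike atoi, strtol lets you tell the difference between the input "0" and input that is not a number at all, which matters when you are reading commands typed by a user.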
The GNU extensions provide some additional functions. Remember to compile with -std=gnu11.
ssize_t getline(char **lineptr, size_t *linecap, FILE *stream);
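A minimal sketch of how getline is typically used: it allocates and grows the line buffer for you, so lines of any length can be read. The function name longest_line is hypothetical. The _POSIX_C_SOURCE definition is an assumption that lets the example also build with -std=c11 on glibc; with -std=gnu11 it should not be needed.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: exposes getline under -std=c11 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Returns the length of the longest line in stream, excluding the newline. */
size_t longest_line(FILE *stream)
{
    assert(stream != NULL);

    char *line = NULL;   /* getline allocates and resizes this buffer */
    size_t capacity = 0; /* current size of the buffer, managed by getline */
    size_t longest = 0;
    ssize_t length;

    while ((length = getline(&line, &capacity, stream)) != -1) {
        if (length > 0 && line[length - 1] == '\n') {
            length--; /* do not count the trailing newline */
        }
        if ((size_t)length > longest) {
            longest = (size_t)length;
        }
    }

    free(line); /* the same buffer is reused across calls, so free it once */
    return longest;
}
```

Note that the buffer belongs to the caller: getline may reallocate it on every call, and it must be released with free when you are done.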
Exercise 1: String function implementations
Write your own implementation of the atoi, strlen, strcpy, strtok and strcasecmp functions.
When you have implemented these functions, you can compare your code to the implementations in glibc.
The C compiler pipeline
Let's explore what the compiler does behind the scenes when we create a more complex program.
Makefile - builds the program
CC=clang
tasks.c - the scaffold code for the task list application
list.c - the implementation of the circular linked list
list.h - function prototypes for a circular linked list
Running make creates the object files tasks.o and list.o and then finally the tasks program.
$ make
clang -c -g -std=c11 -Wall -Werror tasks.c -o tasks.o
clang -c -g -std=c11 -Wall -Werror list.c -o list.o
clang -g -std=c11 -Wall -Werror tasks.o list.o -o task
Preprocessor
Your code is first processed through the C preprocessor. This executes all of the preprocessor directives.
You can examine the raw output of the preprocessor by calling it directly:
$ cpp tasks.c
Or by instructing the compiler to only perform the preprocessing step.
$ clang -E tasks.c
This output is very helpful when debugging problems related to macros and other preprocessor features.
Exercise 2: C preprocessor
- What does the #include directive do?
- What are include guards and when should they be used?
- We have seen how the #define directive can be used to create compile time constants.
The #define directive can also be used to create macros.
Like #define constants, macros are substituted into their call site in much the same way as a textual search and replace. Why are the extra brackets around a, b and a < b necessary in the macro definition for MIN? For example, what happens with MIN(a++, 1)?
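To see the search-and-replace behaviour in action, here is a minimal sketch using a different, hypothetical macro, SQUARE, rather than MIN. Neither macro below comes from the original material. Running clang -E on a file like this should show the expansions written out in the comments.

```c
#include <assert.h>

/* Hypothetical macros for illustration only. */
#define SQUARE_UNSAFE(x) x * x
#define SQUARE(x) ((x) * (x))

int square_unsafe_of_sum(int a, int b)
{
    /* Expands to: a + b * a + b, so only the middle product is multiplied. */
    return SQUARE_UNSAFE(a + b);
}

int square_of_sum(int a, int b)
{
    /* Expands to: ((a + b) * (a + b)), which is what was intended. */
    return SQUARE(a + b);
}

void check_square_macros(void)
{
    assert(square_unsafe_of_sum(1, 2) == 5); /* 1 + 2 * 1 + 2 */
    assert(square_of_sum(1, 2) == 9);        /* (1 + 2) * (1 + 2) */
}
```

The preprocessor knows nothing about operator precedence. It pastes the argument text in as written, and the compiler then parses the result.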
Code generation and assembly
The -c flag on clang asks the compiler to preprocess the C code, generate assembly and finally assemble the result into an object file. The object files contain machine code - assembly in binary format for the target CPU. We need to create an object file for every translation unit in our source code (every .c file is a translation unit).
You can ask the compiler to stop after assembly generation with the following command:
$ clang -S -g -std=c11 -Wall -Werror list.c
This command produces list.s, the assembly generated from list.c. clang calls the assembler behind the scenes to turn this into machine code for the object file.
You can also extract assembly from object files with objdump. x86 assembly has two common syntaxes that are equivalent in functionality. objdump defaults to the AT&T syntax but can also output the Intel syntax.
$ clang -c -g -std=c11 -Wall -Werror list.c
$ objdump -M intel -S list.o
Since we compiled with -g to include debugging symbols, objdump can annotate the assembly with the source. Remember that compiling with AddressSanitizer will affect the source code annotation and the output of objdump.
Linker
Now we have two compiled object files, one for each translation unit. The linking stage merges these object files together to generate the executable. Behind the scenes, clang calls the ld linker to perform this task.
Since we often need to use variables and functions that are defined in another translation unit, C defines the concept of linkage. The job of the linker is to connect these translation units together.
- A variable or function has internal linkage if its name is visible only within the translation unit that defines it.
- A variable or function has external linkage if its name can be referenced from other translation units. Functions and file-scope variables have external linkage by default.
- Any file-scope variable or function that is declared static has internal linkage. It is good practice to declare every file-scope variable or function as static unless it needs to be accessible from another translation unit.
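The rules above can be sketched in a single translation unit. All of the names here (call_count, clamp_to_zero, next_task_id, next_task_id_calls) are hypothetical and are not taken from the scaffold code.

```c
#include <assert.h>
#include <limits.h>

/* Internal linkage: only this translation unit can see call_count. */
static int call_count = 0;

/* Internal linkage: a private helper that other .c files cannot call. */
static int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}

/* External linkage (the default for functions): another file can call this
 * after declaring `int next_task_id(int previous_id);`, usually via a header. */
int next_task_id(int previous_id)
{
    assert(previous_id < INT_MAX); /* avoid signed overflow on the increment */
    call_count++;
    return clamp_to_zero(previous_id) + 1;
}

/* External linkage: exposes the private counter through a function
 * instead of making the variable itself visible to other files. */
int next_task_id_calls(void)
{
    return call_count;
}
```

If another translation unit defined its own static call_count, the two would not clash, because neither name is visible to the linker outside its own file.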
Exercise 3: Declarations, definitions and linkage
- Which of these are declarations and which are definitions?
- Classify the linkages in the above declarations as internal or external.
- Which definitions are accessible from another translation unit in the above C file?
- What happens if the linker can't find a function that has external linkage?
- Header files often contain only declarations. There is nothing stopping us from putting definitions into the header as well. When would this be useful?
Exercise 4: Task list application
Create an interactive task list application from the provided scaffold.
- Your application should load tasks.txt from the current directory and present each line as a task.
- Your application should prompt for commands (help, new, delete, move, undo) which manipulate the list.
- Your application should save the updated task list and exit once it encounters EOF on stdin.
- Your application should be able to handle lines of any length.
Note: since C does not have generics, we have edited the linked list to store the void* data type, so you can use it to store any pointer type. However, this means that you now have to do more than one allocation for every element stored in the list, which isn't very efficient. You can upgrade list.h to the style used in the Linux kernel, which uses some preprocessor tricks to avoid the double allocation.
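The kernel-style approach embeds the link fields inside the stored structure and recovers the enclosing structure with a container_of macro. The following is a simplified sketch of that idea, not the kernel's actual list.h: the names list_node, list_init, list_append, list_length and struct task are assumptions made for this example, and this container_of uses only standard offsetof instead of the compiler extensions the kernel relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Link fields only; embedded directly inside the structure being stored. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Recovers a pointer to the enclosing structure from a pointer to its member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type: the node lives inside the task itself,
 * so one allocation holds both the data and the links. */
struct task {
    const char *description;
    struct list_node node;
};

/* An empty circular list is a head node that points at itself. */
void list_init(struct list_node *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Inserts node just before head, which is the end of the circular list. */
void list_append(struct list_node *head, struct list_node *node)
{
    assert(head != NULL && node != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Counts the elements by walking until we arrive back at head. */
size_t list_length(const struct list_node *head)
{
    assert(head != NULL);
    size_t length = 0;
    for (const struct list_node *n = head->next; n != head; n = n->next) {
        length++;
    }
    return length;
}
```

Given a list_node pointer n from the list, container_of(n, struct task, node) yields the task that contains it. The list code never needs to know the element type, which is how this design stays generic without void*.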