scanf is evil

... when used improperly

Consider the example:

#include <stdio.h>

#define SIZE 6

int main(){    
    int password;
    int valid = 0;
    char userName[SIZE];

    scanf("%s", userName);
    scanf("%d", &password);

    if(password == 1234)
        valid = 1;

    if(valid != 0){
        printf("You are logged!!!\n");
        return 0;
    }else{
        printf("Login Fail!!!\n");
        return 1;
    }
}

Compile the program using the command: gcc main.c -fno-stack-protector

If we use any login and the password 1234, we can log in. For example:
Paulo
1234
The program prints: "You are logged!!!".
But for "long" logins, things become strange. For instance, you can use the following:
abcdefghijk
12
The program shows: "You are logged!!!". WTF!!!
Well … scanf with a plain %s (like some similar functions, gets for instance) does not limit how many characters it stores. If we read a long enough string, we overrun the buffer and start writing into the space allocated to other variables.

When the userName buffer is full, the overflowing characters are written into the neighboring variables (we are corrupting memory). For example, in one possible stack layout where valid sits right after userName, the seventh character of the input, 'g' (ASCII code 103), is written into the neighboring valid variable, which is therefore no longer 0.
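The corruption can be reproduced without any undefined behavior by modeling the stack as a plain byte array. The sketch below is only an illustration: the layout (valid placed immediately after the 6 bytes of userName) is an assumption, and the names MODEL_SIZE and first_byte_in_valid are hypothetical. A real compiler is free to arrange the variables differently.

```c
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 6    /* same size as userName in the example */
#define MODEL_SIZE 32  /* hypothetical: bytes of "stack" being modeled */

/* Models an unbounded "%s" store into userName.
 * Assumed layout: bytes 0..5 are userName, byte 6 is the first byte of valid.
 * Returns the value that ends up in that first byte of valid. */
unsigned char first_byte_in_valid(const char *input)
{
    unsigned char model[MODEL_SIZE] = {0};   /* valid starts as 0 */
    size_t n = strlen(input) + 1;            /* "%s" also stores the '\0' */

    if (n > MODEL_SIZE)
        n = MODEL_SIZE;                      /* keep the model itself in bounds */

    memcpy(model, input, n);                 /* no check against NAME_SIZE */
    return model[NAME_SIZE];
}
```

Under this assumed layout, first_byte_in_valid("abcdefghijk") yields 103 (the letter 'g'), while a short login such as "Paulo" leaves the byte at 0.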

We can only pray for a canary injected by the compiler to detect the stack smashing (we made sure that no canary was present by using the option -fno-stack-protector), or for the operating system to trigger a segmentation fault. Worse, many operating systems may not detect stack smashing, and injecting canaries may be prohibitively expensive on some hardware, for instance, a microcontroller.

As a side note, things are not so simple in the real world, especially on x86-64. For example, the compiler aligns the stack to a multiple of 16 bytes. Thus, we often need to input more than 7 characters to trigger the error, since the variables may be "more distant than we expect" in memory. To learn more, check the references and https://prlalmeida.com.br/2021/11/18/assembly.

How to read a string

We should use a function that never writes past the end of the buffer. The options include fgets, getline, and a bounded scanf. Let us use fgets as an example. Its prototype is char *fgets(char *s, int size, FILE *stream);

fgets reads at most size-1 characters, stopping early after a newline or at end-of-file, and appends a '\0' character at the end of the string. The remaining characters (if any) stay in the input stream, where the next read will find them. See a correct example:

#include <stdio.h>

#define SIZE 6

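int main(void){
    int password = 0;  /* stays 0 if scanf cannot parse a number */
    int valid = 0;
    char userName[SIZE];

    /* stores at most SIZE-1 characters plus the terminating '\0' */
    fgets(userName, SIZE, stdin);
    scanf("%d", &password);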

    if(password == 1234)
        valid = 1;

    if(valid != 0){
        printf("You are logged!!!\n");
        return 0;
    }else{
        printf("Login Fail!!!\n");
        return 1;
    }
}

We could also check the return value of fgets (it returns NULL on error, or if end-of-file is reached before any character is read). Notice that the newline character ('\n') is stored in the string if it fits.
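One possible way to combine these points is a small wrapper around fgets. The helper below is a sketch, not part of the original example: the name read_line and its return convention are hypothetical. It checks the return value, removes the newline when it was stored, and discards the part of the line that did not fit, so that leftovers do not disturb the next read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (capacity `size`).
 * Returns 1 on success, 0 if nothing could be read. */
int read_line(FILE *in, char *buf, int size)
{
    if (size <= 0 || fgets(buf, size, in) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");   /* index of '\n', or of '\0' if none */

    if (buf[len] == '\n') {
        buf[len] = '\0';               /* the whole line fit: drop the newline */
    } else {
        int c;                         /* line was truncated: skip the rest */
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
    }
    return 1;
}
```

With this helper, a call such as read_line(stdin, userName, SIZE) on the input abcdefghijk keeps only "abcde" and leaves the stream positioned at the start of the next line, so a following scanf("%d", &password) sees the password rather than the tail of the login.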

The POSIX getline function serves the same purpose, but instead of truncating the input it grows a heap-allocated buffer as needed (check the documentation). scanf can also be used in a bounded manner. For example, scanf("%5s", userName); stores at most 5 characters followed by the '\0', so the buffer must hold at least 6 bytes.
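The width in the format string must be a literal, so it cannot be written as SIZE-1 directly. A minimal sketch of one common workaround follows. The macros MAX_CHARS, STR, and STR_ and the function read_user_name are hypothetical names introduced here, and the function takes a FILE * only so it can be tried on any stream (pass stdin to match the example).

```c
#include <stdio.h>

#define SIZE 6
#define MAX_CHARS 5   /* must be a literal so it can be pasted into the format */

/* Two-step stringification: STR(MAX_CHARS) expands to "5" */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Fails to compile if someone changes SIZE and forgets MAX_CHARS (C11) */
_Static_assert(MAX_CHARS == SIZE - 1, "MAX_CHARS must be SIZE - 1");

/* Reads one whitespace-delimited word of at most MAX_CHARS characters.
 * Returns 1 if a word was stored, 0 otherwise. */
int read_user_name(FILE *in, char userName[SIZE])
{
    /* the adjacent literals concatenate to "%5s" */
    return fscanf(in, "%" STR(MAX_CHARS) "s", userName) == 1;
}
```

As with fgets, the characters beyond the limit are not thrown away: with the input abcdefghijk, a first call stores "abcde" and a second call would pick up "fghij".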

This was just a simple example of a buffer overflow. Leaving this problem unmanaged can have severe consequences, such as random crashes and security vulnerabilities that let attackers execute arbitrary code. If you want to learn more, check the references.

References

Seacord, R. C. Effective C: An Introduction to Professional C Programming. No Starch Press. 2020.

Seacord, R. C. Secure Coding in C and C++. United Kingdom: Pearson Education. 2013.

C ISO Standard. ISO/IEC 9899:2018, 2018.

Plantz. Introduction to Computer Organization: A Guide to X86-64 Assembly Language and GNU/Linux. 2011.

https://prlalmeida.com.br/2021/11/18/assembly
