This is just a curiosity question. Yesterday, searching about a question on Stack Overflow, I experimented with this small program:

File test.c:

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
   /* File descriptor 1 is standard output (STDOUT_FILENO). */
   bool is_terminal = isatty(1);
   printf("is_terminal: %d ", is_terminal);

   return 0;
}

(We wanted to see its behavior when run with ./a.out |& tee b.txt: since the output is displayed on the console and written to a file at the same time, would it be is_terminal: 0 or 1?

Dumb, I was dreaming of seeing it print is_terminal: 1 is_terminal: 0 ... It's is_terminal: 0 only, of course. I'm getting old, really...)

I compiled it with a simple gcc test.c, and it produced an a.out file: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, [...], not stripped

The program works fine, but why is the a.out file already 16,008 bytes long?
A hexdump shows me that there aren't any blocks filled with zeroes, and there's a good density of values.

So, isatty(..) + printf(...) shouldn't take so much room. Unless everything declared in the header files is pulled into the a.out file, even functions that the program doesn't use?

Or does this a.out file format have, by default, a structure that gathers a lot of information about, let's say, the options used for compilation or some properties of the system environment?

  • Note that int main() { return 0; } with cc -O x.c -s -o x is 14kB on x86_64 but 64kB or more on aarch64 (Pi) and Cygwin. Commented May 20 at 10:14
  • Two lines? What about the #include files? Do wc -l /usr/include/{stdbool,stdio,unistd}.h Commented May 20 at 22:48
  • Nitpick: this is not an a.out file, it is an ELF file. In fact, if it were an a.out file, it would likely be smaller since the a.out format is a lot simpler than the ELF format. a.out was superseded by ELF in Linux 1.2 in 1995 and support was completely removed in kernel 5.19 in 2022. Commented yesterday
  • @JörgWMittag if I attempt from here (without having done an ld, at the direct output of the gcc) a mv a.out a.o && ./a.o will it run the same? Commented yesterday
  • Yes. The name of the file has no effect on its interpretation. It's the contents - the first few bytes of the file are what determines how it's treated. Commented 11 hours ago

1 Answer

The Executable and Linkable Format defines the structure used for programs on Linux by default, and this is what your compiler produces. It sets out minimum requirements: every program includes a header (64 bytes on 64-bit x86), several program headers (56 bytes each on 64-bit x86), and several sections with their own headers (64 bytes each on 64-bit x86). On my system, a default build of your test program has 13 program headers and 32 sections, for a total of 2840 bytes of headers!

All those sections have their uses (some of them debatable), and explain all the contents of the program.

There is of course the “text” of the program itself, the compiled code; that isn’t very big, and doesn’t include functions that aren’t used! With a dynamic linker it doesn’t even include the functions that are used, only pointers to them.

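A fair amount of space is used, by default, for information that helps debugging; that is to say, information that allows a debugger to match addresses in the binary back to names in the source code. In a build without -g, this is essentially the static symbol table (the .symtab and .strtab sections). You can discard it with strip, saving around 1.5KiB on my system.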

The biggest occupier of space after that is the information on symbols used by the program, which the dynamic linker needs to link in the various libraries at runtime, and various data structures used to manage pointers to those symbols. Thus when your program calls printf, that is changed to an indirect call that is resolved by the dynamic linker, using a couple of data structures.

Then there are many sections containing build information: a unique build identifier, annobin information on some systems, the version of the C compiler used, compilation options etc.

As Peter Cordes points out however, the largest contributor to program file size, at least with small programs such as this one, is padding. Programs are loaded by mapping their files into memory; but sections can have varying purposes and request different mapping attributes. Thus current compilers and linkers produce binaries where code sections must be mapped executable, read-only data sections must be mapped read-only and non-executable, and of course read-write data sections must be mapped read-write and non-executable. Since such mapping attributes are managed per page, file contents, organised in segments (one per program header) each containing one or more sections, tend to end up aligned to the architecture’s page size so that they can be loaded without requiring too many mappings or risking leaking data from one section into another. (ELF distinguishes on-disk alignment and in-memory alignment, so most of this is optional.)

Thus on 64-bit x86 you’ll find all the code for your program (along with the name of the dynamic linker, the Procedure Linkage Table and the C library’s initialisation code) in a single 4KiB page, itself aligned on a 4KiB boundary. Since the program’s code is much smaller than this, this leaves most of that page empty (nearly 3KiB unused on my system); the same applies to the header page (again, nearly 3KiB unused on my system). This is much worse on 64-bit ARM where the ABI requires 64KiB pages, although some page-packing is allowed (so the minimum binary size using GCC defaults ends up being slightly over 64KiB, not 128KiB or even 192KiB).

You can explore all this using readelf: readelf -a test. See also A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux, The Art of Creating Minimal ELF64 Executables by Unconventional Methods, and How programs get run: ELF binaries. Architecture requirements are defined in “System V ABI” documents, also known as processor supplement ABIs (64-bit ARM, 64-bit x86).

Other interesting Q&As on this topic include Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?, Why an ELF executable could have 4 LOAD segments?, What is a reasonable minimum number of assembly instructions for a small C program including setup?, Investigating the size of an extremely small C program, and GCC is generating binaries filled with zeroes.
