Why binary instead of decimal?

Because binary maps cleanly to electronic switches. A switch is either on or off — two states. Anything that has a definite on/off can store a bit. Decimal would require 10 distinguishable states per digit, which is much harder to engineer reliably. The choice of binary is for electronic-engineering reasons, not because there's anything mathematically special about base 2.

A byte is 8 bits. That size was historically standardized because 8 bits can encode any of 256 different values (2^8 = 256), which covered the 128-character ASCII set with room to spare. A byte is the smallest unit most software reads and writes; addresses in memory typically point to byte locations. Larger units include the kilobyte (1,024 bytes in the binary convention, 1,000 in the decimal one), megabyte (about 1 million bytes), gigabyte, terabyte, and petabyte.

How much information is a bit, really?

Mathematically: one bit is the amount of information needed to distinguish between two equally-likely outcomes. Choosing between heads and tails is one bit. Choosing between 4 equally-likely things is 2 bits (you need 2 bits to identify which one). This is Shannon's foundational definition of information from 1948 — bits aren't just a computer convention, they're the natural unit of information in mathematics.

What Is a Bit? — The Atom of Information

The simplest possible piece of information

A bit is the smallest unit of information possible. It represents one binary choice: 0 or 1, on or off, yes or no, true or false.

That's it. A stored value can't be smaller than a bit: there is no such thing as half a switch setting. (Fractional bits do appear in information theory as averages over many decisions, but individual stored bits are discrete.)

Inside your computer, every piece of information — every photo, song, text message, video, program — is stored as a long sequence of bits. Everything at the storage and transmission level is bits. The meaning of those bits depends entirely on the software that interprets them.

Why binary?

Why use just 0 and 1, when humans use the ten digits 0-9?

Because 0 and 1 map cleanly to physical states. A switch is either on or off. A capacitor either has charge or doesn't. A magnetic spot is either polarized one way or the other. These are easy to read and easy to set, with high reliability.

A 10-state physical system, by contrast, is hard to build reliably. How would you distinguish "30% full" from "40% full" billions of times per second without errors? Binary is the engineering compromise that won.

That said, binary isn't mathematically privileged. You could build computers in base 3 (ternary) or base 10. The Soviet Setun computer in the 1950s used balanced ternary (three states: -1, 0, +1). It was elegant but the economics of binary electronics won out.

How bits combine

A single bit can represent 2 things. Two bits can represent 4 things (00, 01, 10, 11). Three bits can represent 8 things. In general, n bits can represent 2ⁿ different states.

This exponential growth is enormously useful. With just 10 bits, you can represent 1,024 different things. With 16 bits, 65,536. With 32 bits, about 4 billion. With 64 bits, about 18 quintillion.
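As a quick check on the numbers above, here is a minimal Python sketch. The helper name `states` is made up for this illustration; it simply evaluates 2ⁿ and lists every pattern for a small n.

```python
def states(n_bits: int) -> int:
    """Number of distinct values that n_bits bits can represent (2**n)."""
    return 2 ** n_bits


# Print the counts quoted in the text.
for n in (1, 2, 3, 10, 16, 32, 64):
    print(f"{n:>2} bits -> {states(n):,} states")

# Enumerate every 3-bit pattern, from 000 to 111.
patterns = [format(i, "03b") for i in range(states(3))]
print(patterns)
```

Each extra bit doubles the count, which is why the jump from 32 to 64 bits is so much larger than it sounds.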

It's why memory addressing scales the way it does. A 32-bit computer can address 2³² ≈ 4 billion memory locations (≈ 4 gigabytes). 64-bit computers can address about 18 quintillion locations, far more than current memory technology can deliver.

The byte

A byte is 8 bits. With 8 bits, you have 2⁸ = 256 possible values.

Why 8? Mostly historical. Early computers used various character and word sizes (6 bits, 7 bits, 12 bits, etc.). IBM's System/360 in 1964 standardized on 8 bits per byte, partly because 8 bits could comfortably encode a character (English letters, digits, punctuation, plus some control codes). System/360 itself used IBM's 8-bit EBCDIC code; the 7-bit ASCII character set, first published in 1963, also fits comfortably in an 8-bit byte.

8 bits stuck because it was big enough for letters, was a power of 2 (which made addressing convenient), and was small enough to be efficient. Virtually every modern general-purpose computer uses 8-bit bytes.

Common multiples:

1 kilobyte (KB) = 1,024 bytes ≈ a short text document.
1 megabyte (MB) = 1,048,576 bytes ≈ a small image.
1 gigabyte (GB) ≈ 10⁹ bytes ≈ roughly an hour of standard-definition video.
1 terabyte (TB) ≈ 10¹² bytes ≈ a typical consumer hard drive in 2024.
1 petabyte (PB) ≈ 10¹⁵ bytes ≈ a small data center.

Note: there's a confusing tradition where some sources use "1 KB = 1,000 bytes" and others "1 KB = 1,024 bytes." The IEC tried to fix this by defining "kibibyte (KiB) = 1,024 bytes" and reserving "kilobyte (kB) = 1,000 bytes", but the convention isn't universally followed.
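The gap between the two conventions grows with each prefix. The sketch below is only an illustration: the function names are invented here, and the "1 TB" drive is a hypothetical example of a capacity quoted in decimal units.

```python
def decimal_units(n_bytes: int, power: int) -> float:
    """Size in decimal (SI) units: power=1 is kB, 2 is MB, 3 is GB, 4 is TB."""
    return n_bytes / 1000 ** power


def binary_units(n_bytes: int, power: int) -> float:
    """Size in binary (IEC) units: power=1 is KiB, 2 is MiB, 3 is GiB, 4 is TiB."""
    return n_bytes / 1024 ** power


# Hypothetical drive sold as "1 TB" in the decimal sense.
drive_bytes = 10 ** 12
print(f"{decimal_units(drive_bytes, 4):.3f} TB")
print(f"{binary_units(drive_bytes, 4):.3f} TiB")
```

The same number of bytes comes out as 1 TB under one convention and about 0.909 TiB under the other, which is one reason a new drive can appear smaller than advertised.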

How everything ends up as bits

Text. Each character is assigned a number; the number is stored as bits. ASCII uses 7 bits per character (fits in a byte with one bit unused). Unicode encodings use between 8 and 32 bits per character (UTF-8, for example, uses one to four bytes) to cover the scripts of nearly all written languages.
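To make the character-to-number-to-bits chain concrete, here is a small Python sketch. It uses UTF-8 as one example of a Unicode encoding, and the helper `utf8_bits` is a name invented for this illustration.

```python
def utf8_bits(ch: str) -> int:
    """Number of bits one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8")) * 8


# Each character maps to a number, and the number to a bit pattern.
for ch in "Hi":
    code = ord(ch)
    print(ch, code, format(code, "08b"))

# Characters outside ASCII need more than one byte in UTF-8.
for ch in ("A", "é", "€", "😀"):
    print(ch, utf8_bits(ch), "bits")
```

Plain English letters take 8 bits each, while accented letters, currency symbols, and emoji take 16, 24, or 32 bits in this encoding.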

Images. Each pixel has a color, encoded as bits. A common scheme is 24 bits per pixel (8 for red, 8 for green, 8 for blue). A 1920×1080 image has ~2 million pixels × 24 bits = ~50 megabits = ~6 megabytes uncompressed.
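The image arithmetic can be checked directly. This is a minimal sketch with an invented helper name, assuming 24 bits per pixel and no compression or file-format overhead.

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of an image in bits (pixels times bits per pixel)."""
    return width * height * bits_per_pixel


bits = raw_image_bits(1920, 1080)
size_bytes = bits // 8  # 8 bits per byte

print(f"{1920 * 1080:,} pixels")
print(f"{bits:,} bits")
print(f"{size_bytes:,} bytes")
```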

Audio. The sound waveform is sampled many times per second, and each sample is encoded as a number in bits. CD-quality audio is 44,100 samples per second × 16 bits per sample × 2 channels = ~1.4 megabits per second.
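The same kind of multiplication gives the audio figure. The sketch below uses an invented helper name and assumes uncompressed samples, as on a CD.

```python
def audio_bits_per_second(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Raw bit rate of uncompressed sampled audio."""
    return sample_rate * bits_per_sample * channels


cd_rate = audio_bits_per_second(44_100, 16, 2)
one_minute_bytes = cd_rate * 60 // 8  # seconds per minute, then bits to bytes

print(f"{cd_rate:,} bits per second")
print(f"{one_minute_bytes:,} bytes per minute")
```

At this rate one minute of stereo audio takes a little over 10 million bytes, which is why compressed formats are used for most distribution.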

Video. Just a sequence of images (frames) at, say, 30 per second, plus audio. Raw video is enormous; compression (H.264, AV1, HEVC) finds patterns to encode it more efficiently.

Programs. Machine instructions for the CPU, each encoded as a specific bit pattern.

Files in general. Just sequences of bytes, with the meaning determined by software. A .jpg and a .txt file with the same bytes would be interpreted entirely differently.
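A short Python sketch shows the same bytes read three different ways. The four byte values are arbitrary, chosen here only because they happen to be printable ASCII.

```python
# Four arbitrary bytes; nothing in the data says what they "are".
data = bytes([72, 105, 33, 33])

# Read as ASCII text.
as_text = data.decode("ascii")

# Read as one 32-bit unsigned integer, most significant byte first.
as_integer = int.from_bytes(data, "big")

# Read the first three bytes as a red/green/blue pixel.
as_pixel = tuple(data[:3])

print(as_text)
print(as_integer)
print(as_pixel)
```

The bits never change; only the interpretation applied by the program does.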

Shannon's deep insight

In 1948, Claude Shannon published "A Mathematical Theory of Communication," which founded information theory. He showed that a bit isn't just a computer convention — it's the mathematically natural unit of information.

Shannon's definition: the information content of an outcome is the negative base-2 logarithm of its probability, −log₂(p). An outcome with 50% probability carries 1 bit of information. With 25% probability, 2 bits. With 12.5% probability, 3 bits.
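The definition translates into one line of Python. This is a minimal sketch; the function name `self_information` is chosen for this illustration, and the logarithm is taken in base 2 so that the result is measured in bits.

```python
import math


def self_information(p: float) -> float:
    """Information content, in bits, of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(p)


for p in (0.5, 0.25, 0.125):
    print(f"p = {p}: {self_information(p)} bits")
```

The rarer the outcome, the more bits it carries: each halving of the probability adds exactly one bit.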

This formalism turned out to apply far beyond computing:

Compression. A source can be encoded losslessly using, on average, approximately its entropy (average bits per symbol), and no fewer. This sets the lower bound on lossless compression.
Error correction. You can compute how much redundancy is needed to recover messages corrupted by a given level of noise.
Channel capacity. Every communication channel has a maximum information rate, set by its bandwidth and noise.
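The entropy mentioned under compression can be estimated for a short message. The sketch below is a simplification: it treats each character as an independent symbol and uses the observed frequencies as probabilities, which real compressors improve on by modelling context. The function name is invented for this illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Average information per character, using observed character frequencies."""
    counts = Counter(message)
    total = len(message)
    # Sum of p * (-log2 p) over every distinct character.
    return sum((c / total) * -math.log2(c / total) for c in counts.values())


for msg in ("aaaa", "abab", "abcd", "aaab"):
    print(msg, round(entropy_bits_per_symbol(msg), 3))
```

A message of one repeated character has zero entropy and compresses to almost nothing, while four equally likely characters need 2 bits each under this model.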

All of modern telecommunications — Wi-Fi, mobile networks, fibre internet, deep-space communication with spacecraft — uses Shannon's mathematics. The bit isn't arbitrary; it's the unit that makes information quantifiable.

Quantum bits (qubits) — a brief mention

The classical bit is binary: 0 or 1. The quantum bit is a two-state quantum system that can be in superposition: a weighted combination of 0 and 1 at once, until it is measured. (See the superposition article for more.)

A quantum computer with n qubits can be in a superposition of 2ⁿ classical states simultaneously, manipulate them collectively, and use interference to make the right answer more likely to appear when measured. For specific problems this offers a large speedup over the best known classical methods: exponential for factoring (and, it is hoped, for some quantum-chemistry simulations), quadratic for unstructured search.

But qubits are extremely fragile — easily disturbed by environmental noise — so building practical quantum computers has been a multi-decade engineering project. It's still in early days.

For everyday computing, the classical bit is what matters.

The takeaway

A bit is one binary choice. Eight bits make a byte. Everything a computer stores or sends is just very long sequences of bits, with meaning supplied by software. The bit is also the mathematically natural unit of information, as Shannon showed in 1948 — not just an engineering convention but the right unit for talking about information itself. Knowing this changes how you read claims about file sizes, transmission rates, encryption strength, and information storage.
