Internet Protocol

Internet Protocol (IP) is the fundamental protocol on the Internet that provides connectionless, packet-based communication between computers globally or within isolated networks. It works at the network layer and allows data to flow seamlessly through different networks from one end user to another.

AI generated definition based on: Computers as Components (Fourth Edition), 2017

2003, MCSA/MCSE (Exam 70-291) Study Guide, Deborah Littlejohn Shinder, ... Laura Hunter

Internet Protocol

The Internet Protocol (IP) is probably the best known of the TCP/IP protocols. Many people, especially those who have even a passing familiarity with computer technology, have heard or used the term IP address. Later in this chapter, we’ll take an in-depth look at how the IP protocol works and you’ll learn the intricacies of IP addressing.

With regard to the TCP/IP architecture, IP is a routable protocol (meaning it can be sent across networks) that handles addressing, routing, and the process of putting data into or taking data out of packets. IP is considered to be connectionless because it does not establish a session with a remote computer before sending data. Data sent via connectionless methods are called datagrams. An IP packet can be lost, delayed, duplicated, or delivered out of sequence and there is no attempt to recover from these errors. Recovery is the responsibility of higher layer protocols including Transport layer protocols such as TCP.

IP packets contain data that include:

Source IP address: The IP address of the source of the datagram.

Destination IP address: The IP address of the destination for the datagram.

Identification: Identifies a specific IP datagram as well as all fragments of a specific IP datagram if the datagram becomes fragmented.

Protocol: Identifies the upper-layer protocol (for example, TCP or UDP) to which the receiving IP layer should pass the payload.

Checksum: A simple method of error control that performs a mathematical calculation to verify the integrity of the IP header.

Time-to-Live (TTL): Limits the number of router hops the datagram can traverse before it is discarded; each router that forwards the datagram decrements the value. This prevents datagrams from circling endlessly on the network.
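As a rough illustration of where these fields sit, the following Python sketch unpacks the fixed 20-byte part of an IPv4 header. The function name, the sample addresses (taken from documentation ranges), and the hand-built sample header are assumptions made for this example; the checksum in the sample is left at zero rather than computed.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (_ver_ihl, _tos, total_len, ident, _flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "identification": ident,
        "ttl": ttl,
        "protocol": proto,            # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "total_length": total_len,
    }

# Hypothetical header built by hand for illustration (checksum left as 0).
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0x1C46, 0x4000, 64, 6, 0,
    socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"),
)
fields = parse_ipv4_header(sample)
print(fields)
```

The network byte order prefix ("!") matters here because all multi-byte header fields are transmitted most significant byte first.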

URL: https://www.sciencedirect.com/science/article/pii/B978193183692050007X

18.1 Basics of Internet Protocol (IP)

Internet protocol is a communication system. Unlike the telephone system, it does not require a connection between the sender and receiver. Instead, the information is broken up into packets, and each packet finds its own path over the IP network from source to destination. These networks can be public, such as the internet, or private, such as a corporate network. The source, destination and every node in the network has an address, which is 32 bits, or eight hexadecimal digits. When expressed in decimal, it takes the familiar form xxx.xxx.xxx.xxx, or up to 12 decimal digits. The packets have two major components: the header and the data. The IP header is 20 bytes when no options are present, and the data is of variable length, up to 65,515 bytes with a 20-byte header (the total packet length is limited to 65,535 bytes).
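The address and size arithmetic can be checked with a few lines of Python. This is only a sketch: the address 192.0.2.33 is an arbitrary documentation-range value chosen for illustration, and the size limit assumes the 16-bit total length field of the IPv4 header and a header without options.

```python
import ipaddress

# Arbitrary example address from a documentation range (an assumption).
addr = ipaddress.IPv4Address("192.0.2.33")
as_int = int(addr)            # the address as a single 32-bit number
as_hex = f"{as_int:08X}"      # the same value as eight hexadecimal digits
print(addr, as_int, as_hex)

# Size limits, assuming a 16-bit total length field and no header options.
MAX_TOTAL_LENGTH = 2**16 - 1  # 65,535 bytes for header plus data
MIN_HEADER = 20
max_payload = MAX_TOTAL_LENGTH - MIN_HEADER
print(max_payload)
```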

IP addresses can be constant or temporary. This is best illustrated by example: an organization or corporation may have a unique, constant IP address, used to communicate over the public IP network. However, within the corporation there’s a private IP network not intended to be accessible to outsiders, which may have hundreds or thousands of different computers or devices with IP addresses. Using the Dynamic Host Configuration Protocol (DHCP), a router within the corporation can assign temporary IP addresses to these nodes on the private network. Certain ranges of IP addresses are reserved for private networks, and DHCP servers typically assign addresses from these ranges. These addresses are assigned whenever a computer tries to connect to the private network. The dynamically assigned addresses need only be unique within the private network; other private networks can use the same range of addresses within their own private network. A function in the router bridging the private network to the external network, called network address translation (NAT), is used to translate between the address space of the public internet and the DHCP-assigned addresses within the private network.
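One common way to realize this translation is to map each private address and port pair to a distinct port on the single public address. The toy Python class below sketches that idea only; the class name, the starting port number, and all addresses are hypothetical, and real NAT implementations also track protocols, timeouts, and connection state.

```python
import itertools

class SimpleNat:
    """Toy port-translating NAT: (private_ip, private_port) <-> public port."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            port = next(self._next_port)
            self.outbound[key] = port
            self.inbound[port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        """Find the private destination for a reply; None if no mapping exists."""
        return self.inbound.get(public_port)

nat = SimpleNat("203.0.113.5")
a = nat.translate_out("192.168.1.10", 5000)
b = nat.translate_out("192.168.1.11", 5000)
print(a, b, nat.translate_in(40001))
```

Two private hosts using the same local port end up with different public ports, which is what lets replies be routed back to the right device.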

Routers are the key components that allow anyone to transmit data to anyone else over the internet. Routers are distributed throughout the IP networks; they examine the headers and, using the destination address, forward the packets towards their destinations. Since a packet may pass through many routers, each router must decide how best to forward the packet to get it closer to its destination. This is done by maintaining large routing tables, monitoring the status of various network connections for traffic levels, and sometimes determining the priority of a given packet.

Figure 18.1. IP packet formatting.

URL: https://www.sciencedirect.com/science/article/pii/B9780124157606000180

The Internet protocols

In order for various computers to talk to each other via any network, there must be a common language of understanding, a common protocol. In networking, the term protocol refers to a set of rules that govern communications. Protocols are to computers what language is to humans. Since this book is in English, to understand it you must be able to read English. Similarly, for two devices on a network to communicate successfully, they must both understand the same protocols.

Various protocols belonging to various OSI layers (see next chapter), are used in today’s world of the Internet, and this book would not be complete without listing most popular ones and describing them briefly.

TCP/IP − Transmission Control Protocol / Internet Protocol

Two of the most popular protocols used on the Internet today. They were introduced in the mid-1970s by Stanford University and Bolt Beranek and Newman (BBN) after funding by DARPA (Defense Advanced Research Projects Agency) and appeared under the Berkeley Software Distribution (BSD) Unix.

TCP is reliable; that is, packets are guaranteed to wind up at their target, in the correct order.

IP is the underlying protocol for all the other protocols in the TCP/IP protocol suite. IP defines the means to identify and reach a target computer on the network. Computers in the IP world are identified by unique numbers, which are known as IP addresses (explained further in this chapter).

PPP − Point-to-Point Protocol

A protocol for creating a TCP/IP connection over both synchronous and asynchronous systems. PPP provides connections for host to network or between two routers. It also has a security mechanism. PPP is well known as a protocol for connections over regular telephone lines using modems on both ends. This protocol is widely used for connecting personal computers to the Internet.

SLIP − Serial Line Internet Protocol

A point-to-point protocol to be used over a serial connection, a predecessor of PPP. There is also an advanced version of this protocol known as CSLIP (Compressed Serial Line Internet Protocol), which reduces overhead on a SLIP connection by compressing header information when possible, thus increasing packet throughput.

FTP − File Transfer Protocol

A protocol that enables the transfer of text and binary files over a TCP connection. FTP allows for file transfer according to a strict mechanism of ownership and access restrictions. It is one of the most commonly used protocols over the Internet today.

Telnet

A terminal emulation protocol, defined in RFC854, for use over a TCP connection. It enables users to log in to remote hosts and use their resources from the local host.

SMTP − Simple Mail Transfer Protocol

A protocol dedicated for sending e-mail messages originating on a local host over a TCP connection to a remote server. SMTP defines a set of rules that allows two programs to send and receive mail over the network. The protocol defines the data structure that would be delivered with information regarding the sender, the recipient (or several recipients), and, of course, the mail’s body.

HTTP − Hypertext Transfer Protocol

A protocol used to transfer hypertext pages across the World Wide Web.

SNMP − Simple Network Management Protocol

A simple protocol that defines messages related to network management. Through the use of SNMP, network devices such as routers can be configured by any host on the LAN.

UDP − User Datagram Protocol

A simple protocol that transfers packets of data to a remote computer. UDP does not guarantee that packets will be received in the same order they were sent. In fact, it does not guarantee delivery at all. UDP is one of the most common protocols used in multicasting.

ARP − Address Resolution Protocol

In order to map an IP address into a hardware MAC address the computer uses the ARP protocol which broadcasts a request message that contains an IP address, to which the target computer replies with both the original IP address and the hardware MAC address.

NNTP − Network News Transfer Protocol

A protocol used to carry USENET postings between News clients and USENET servers.

URL: https://www.sciencedirect.com/science/article/pii/B9780124045576500112

Glossary

    Client

    A device that requests a service from a remote computer.

    Internet

    An internet is a collection of networks connected together. The Internet is a global collection of interlinked networks. Computers connected to the Internet communicate via the Internet Protocol (IP).

    Internet Protocol

    A protocol that was designed to provide a mechanism for transmitting blocks of data called datagrams from sources to destinations, where sources and destinations are hosts identified by fixed length addresses.

    Protocol

    A description of the messages and rules for interchanging messages in intercomputer communication.

    Server

    A device (normally a computer) that provides a service when it receives a remote request.

    TCP

    Transmission control protocol. A protocol that provides reliable connections across the Internet. Protocols other than TCP may be used to send messages across the Internet.

    TCP/IP

    Transmission control protocol implemented on top of the Internet protocol. The most commonly used protocol combination on the Internet.

URL: https://www.sciencedirect.com/science/article/pii/B0122274105008607
2004, Communication Networking, Anurag Kumar, ... Joy Kuri

2.3.5 The Internet

The Internet refers to the worldwide interconnection of packet networks that all use a suite of protocols that originated in the famous ARPANET project of the 1970s. In this protocol suite, IP (Internet Protocol) is the network layer protocol, and TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are the most commonly used protocols at the transport layer. The common noun “internet” is often used to connote a network that uses the Internet protocol suite. The IP protocol can operate over any link layer (and, by implication, any physical layer) that can transport IP packets. Because it simply requires the implementation of a packet driver to carry packets over any bit carrier, an internet can be operated over essentially any bit-carrier infrastructure. The Internet protocol suite also does not define the layers above the transport layer. Thus, for the Internet, Figure 2.21 simplifies to the depiction in Figure 2.27 (we show many more packet switches, and the physical layer is implicit in the links). We will see later that this is a simplified representation; for example, an application may run directly over IP (thus taking care of its transport needs itself).

Figure 2.27. The Internet protocol architecture; “Apps” means applications. In this simplified depiction, the packet switches are shown to have only IP and the link layer. The end nodes have the transport protocol in addition to IP and the link layer.

The most widely deployed version of IP is version 4, which uses 32-bit addresses. The network address of an entity is also called its IP address. As in most communication networks, the addresses in the Internet are hierarchically assigned. The address of each device comprises some contiguous high-order bits that identify the subnetwork in which the device resides; this is also called the network prefix. The remaining bits identify the device uniquely in the subnetwork. So, for example, all the addresses in a campus may be of the form 10010000.00010000.010xxxxx.xxxxxxxx, where each x can be 0 or 1. In such a case the network prefix is 10010000.00010000.010, and it is of length 19 bits.
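The campus example can be reproduced with Python's standard ipaddress module. In dotted decimal the 19-bit prefix 10010000.00010000.010 corresponds to 144.16.64.0/19; the two test addresses below are chosen only to illustrate the boundary of the block.

```python
import ipaddress

# The 19-bit prefix 10010000.00010000.010 written in dotted decimal.
campus = ipaddress.ip_network("144.16.64.0/19")

# Recover the prefix bits from the network address.
prefix_bits = format(int(campus.network_address), "032b")[:campus.prefixlen]
print(prefix_bits, campus.num_addresses)

# Addresses sharing the prefix are inside the subnetwork; others are not.
inside = ipaddress.ip_address("144.16.95.255") in campus
outside = ipaddress.ip_address("144.16.96.0") in campus
print(inside, outside)
```

The remaining 13 bits identify the device, so the block holds 2^13 addresses.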

Unlike circuit-multiplexed networks or the packet-multiplexed X.25 and ATM networks, internets do not fix a path for the packet flow on a connection. The network simply provides connectivity between end points. Every packet carries the full network address of its destination end point. Each packet switch looks at the arriving packets, consults a routing table (which actually deals with network prefixes), and forwards the packet to an outgoing link that, hopefully, carries it closer to its destination. By rejecting the virtual-circuit approach in favor of per-packet, hop-by-hop, best-effort routing, the Internet gains the advantages of (1) quick delivery of small amounts of data, (2) automatic resilience to link failures, and (3) ease of multicast (i.e., the transmission of a packet to multiple destinations by replicating it at appropriate points in the network, rather than by the source sending multiple copies of the packet).
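A routing table keyed by network prefixes can be sketched as follows. The sketch assumes the usual longest-prefix-match rule (the most specific matching prefix wins), which the text does not spell out; the table entries and link names are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (network prefix, outgoing link).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "link-default"),
    (ipaddress.ip_network("144.16.0.0/16"), "link-A"),
    (ipaddress.ip_network("144.16.64.0/19"), "link-B"),
]

def next_link(destination: str) -> str:
    """Pick the outgoing link for a destination by longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, link) for net, link in ROUTES if addr in net]
    # The default route 0.0.0.0/0 matches everything, so matches is never empty.
    return max(matches)[1]

print(next_link("144.16.70.1"))   # most specific match is the /19
print(next_link("144.16.1.1"))    # only the /16 and the default match
print(next_link("8.8.8.8"))       # only the default route matches
```

Each packet is looked up independently, which is why consecutive packets can take different links if the table changes between them.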

Notice that because IP routes each packet as a separate entity, it is possible for consecutive packets of the same session to follow different routes and then, owing to different delays on the routes, arrive out of order. The IP layer at the destination simply delivers the packets out of order! Measurements showing that there can be significant packet reordering in the Internet have been reported in [29]. In addition, a link layer may discard a packet after unsuccessfully attempting it a few times. A packet may also be discarded at the queue of a physical link because of exhaustion of buffer space or some packet scheduling decision. Hence the packet delivery service that the IP layer provides is unreliable and nonsequential. This is known as a datagram delivery service.

Figure 2.28 shows a fragment of an internet's topology. Each router is attached to physical links by its interfaces. Note that a multipoint link can have many routers attached to it. All the devices attached to a link are each other's neighbors. The Internet is equipped with routing protocols that permit routers to identify good paths on which to forward packets. These protocols work on the basis of metrics assigned to the network links and a distributed algorithm for determining shortest paths under these metrics. Note that a routing protocol is also an application running on the network! Hence it needs to use the packet transport services of the network. Fortunately there are distributed algorithms that can learn shortest paths through the network, and, in the execution of these algorithms, nodes need only exchange packets with their neighbors. Hence routing in an internet can bootstrap itself. A simple protocol (aptly called the Hello protocol) is used by routers to discover neighbors. After neighbors are discovered they begin to exchange routing protocol packets, which are used in computations that gradually lead all the routers to learn shortest paths to network prefixes. One such distributed algorithm works by each router informing its neighbors about the status of the links to which the router is attached. This information is flooded through the network. Eventually every router obtains these link state advertisements (LSAs), which can be put together to obtain a full topology view in each router. A shortest path computation can then be locally performed and routing tables built up in each router. This algorithm is implemented by the currently most popular routing protocol, OSPF (Open Shortest Path First). The OSPF protocol is a routing application protocol, but it does not utilize the services of a transport protocol, instead running directly on the IP layer in routers.
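Once a router holds the full topology, the local shortest-path step can be sketched with Dijkstra's algorithm, which is the computation typically associated with link-state routing. The four-router topology and its link metrics below are invented for illustration, and the function returns both path costs and the first hop to use toward each destination.

```python
import heapq

def shortest_paths(links, source):
    """Dijkstra over a topology given as {router: {neighbor: metric}}.

    Returns (cost to each router, first hop from source toward each router).
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        if hop is not None:
            first_hop.setdefault(node, hop)
        for nbr, metric in links[node].items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Neighbors of the source are their own first hop.
                heapq.heappush(heap, (nd, nbr, nbr if hop is None else hop))
    return dist, first_hop

# Hypothetical topology assembled from link state advertisements.
links = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 5},
    "R3": {"R1": 4, "R2": 2, "R4": 1},
    "R4": {"R2": 5, "R3": 1},
}
dist, first_hop = shortest_paths(links, "R1")
print(dist, first_hop)
```

The first-hop map is what would populate the routing table of R1 in this sketch.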

Figure 2.28. A fragment of an internet, showing routers (labeled R), hosts, and links (including multipoint “links,” such as LANs).

The basic Internet packet transport does not promise any QoS to the traffic streams it carries. The network provides connectivity and routing; any user attached to the Internet can initiate multiple flows. There is no connection admission control (CAC); all the flows end up sharing the network resources. Thus the Internet transport provides a highly variable quality of service that can vary widely depending on geographical location and time of day. To bring some sanity to the situation and to enforce some fairness in bandwidth sharing, the end-to-end Transmission Control Protocol (TCP) implements an adaptive window protocol. This protocol reacts to packet losses by reducing its window size and then slowly building it up again. This function of congestion control and bandwidth sharing is in addition to the two other functions that TCP performs: (1) reliable and sequential packet transport over IP's unreliable and nonsequential packet transport service, and (2) sender–receiver flow control (which prevents, for example, a fast computer from flooding a slow network printer). Chapter 7 discusses TCP's congestion control and bandwidth-sharing function at length.
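The "reduce on loss, then slowly build up" behavior can be illustrated with a simplified additive-increase, multiplicative-decrease loop. This is a sketch under stated assumptions, not TCP itself: the parameter values, the one-update-per-round model, and the function name are choices made for this example, and real TCP adds slow start, timeouts, and other mechanisms.

```python
def aimd_window(loss_events, rounds, start=1.0, increase=1.0, decrease=0.5):
    """Track a congestion window over a number of rounds.

    loss_events: set of round indices in which a packet loss is detected.
    On loss the window is multiplied by `decrease`; otherwise it grows by
    `increase`. The window never drops below one packet.
    """
    window = start
    history = []
    for r in range(rounds):
        if r in loss_events:
            window = max(1.0, window * decrease)
        else:
            window += increase
        history.append(window)
    return history

trace = aimd_window(loss_events={5}, rounds=8)
print(trace)
```

Plotting such a trace over many rounds gives the familiar sawtooth shape of a window that probes for bandwidth and backs off on loss.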

UDP (User Datagram Protocol) is another popular layer 4 protocol. This protocol simply permits a user above IP to utilize the basic datagram delivery service, thereby multiplexing several flows into one IP address. Note that this is the logical multiplexing of several flows originating and terminating at a common IP address. It should be distinguished from the physical multiplexing of flows into a bit carrier, something that is a link layer function. We distinguish the different flows by assigning them different UDP port numbers. UDP is used by applications, such as packet voice telephony, that must receive guaranteed service rates in the network and cannot deal with packet loss by retransmission, and hence cannot use the services of TCP. Other mechanisms, above the transport layer, are used to facilitate such applications.
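The logical multiplexing by port number can be sketched without any networking, using the 8-byte UDP header (source port, destination port, length, checksum). The port numbers and payloads below are arbitrary examples, and the checksum is left at zero, which in UDP over IPv4 signals that it was not computed.

```python
import struct

def make_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build a UDP datagram; checksum 0 means "not computed" over IPv4."""
    length = 8 + len(payload)          # header is always 8 bytes
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def demultiplex(datagrams):
    """Group payloads by destination port, as a receiving host would."""
    flows = {}
    for dg in datagrams:
        _src, dst, length, _checksum = struct.unpack("!HHHH", dg[:8])
        flows.setdefault(dst, []).append(dg[8:length])
    return flows

# Three datagrams arriving at one IP address, belonging to two flows.
arrivals = [
    make_udp_datagram(5000, 5004, b"voice-1"),
    make_udp_datagram(5001, 53, b"query"),
    make_udp_datagram(5000, 5004, b"voice-2"),
]
flows = demultiplex(arrivals)
print(flows)
```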

We have stated that the Internet's packet transport is not designed to provide specific QoS to the flows it carries. The service model is quite simple. Nodes that have valid IP addresses can attach themselves to the network and send IP packets back and forth between themselves. The network provides an unreliable, nonsequential packet transport service, with no guarantee of transfer rate or delay, often called best-effort service. The network does not distinguish between the various traffic flows; it does the best it can, treating everyone alike. The idea is that the applications best know what they need and should adapt to the transport that the network provides by using end-to-end mechanisms. TCP's mechanisms for achieving a reliable and sequential packet transport service over the Internet, and some sort of fair sharing of network bandwidth, are a prototypical example of the end-to-end approach. Over the past decade, however, the Internet has become the packet transport network of choice for all kinds of store-and-forward data transfer and increasingly is being used by applications that require some minimal quality of packet transport. Broadly there are two approaches that can be followed to provide some level of QoS over the Internet: new QoS architectures and traffic engineering.

Two QoS architectures have been proposed and extensively studied: the Integrated Services Architecture, abbreviated as IntServ, and the Differentiated Services Architecture, abbreviated as DiffServ. The proposals in the IntServ architecture essentially allow each session arriving to the network to request QoS guarantees; it must declare its traffic characteristics to the network, and the network has the choice of rejecting the request or accepting it at some lower level of QoS. This architecture requires signaling protocols to be put in place, and packet-scheduling mechanisms are needed at the packet-multiplexed links. Evidently, these protocols and scheduling mechanisms need to be implemented in every router in parts of the network over which such QoS guarantees are needed.

In the high-speed core of the network, the session arrival rates and the packet rates are too high to permit session-by-session analysis and packet-by-packet scheduling. Hence, as in other transport systems (e.g., airlines and railways), it has been argued that only a few levels of differentiation may suffice (e.g., first class, business class, and economy class). The classes are, of course, priced differently. At the simplest level, there may be one priority class of traffic (reserved, for example, for interactive packet telephony), and another class for the remaining traffic. There could be one additional class for the premium store-and-forward traffic, and the remaining traffic could be handled by the default best-effort packet transport. This idea has led to the DiffServ proposals. The scheduling at the links distinguishes a few classes of packets (e.g., 2, 3, or 8, depending on the choice of the network operator). The class of a packet is identified by the contents of its header; such classification can be based on six special bits in the IP header, called the DS code (DiffServ code), and could also be based on source and destination addresses and even the source and destination transport protocol port numbers. The schedulers are aware only of such packet aggregates; there is no awareness of individual so-called microflows. A DiffServ core network may put a limit on the amount of traffic of each class it is willing to accept from each customer network that transports traffic through it. If a network violates such restrictions, the DiffServ core can reject the excess traffic or handle it at lower levels of service. It is up to the edge nodes of the customer network to police the traffic that it offers to the DiffServ core.
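Classification on the DS code can be sketched in a few lines. In IPv4 the six DS bits are the upper six bits of the second header byte. The two-queue policy and queue names below are hypothetical; code point 46 is the value commonly used for expedited (for example, voice) traffic.

```python
def dscp_of(header: bytes) -> int:
    """Extract the six-bit DS code from the second byte of an IPv4 header."""
    return header[1] >> 2

def queue_for(header: bytes, high_priority_codes=frozenset({46})) -> str:
    """Map a packet to one of two hypothetical queues based on its DS code."""
    return "high" if dscp_of(header) in high_priority_codes else "best-effort"

# Minimal 20-byte headers; only the first two bytes matter for this sketch.
voice_header = bytes([0x45, 46 << 2]) + bytes(18)
data_header = bytes([0x45, 0x00]) + bytes(18)

print(dscp_of(voice_header), queue_for(voice_header))
print(dscp_of(data_header), queue_for(data_header))
```

Because the scheduler only inspects this small field, it handles aggregates of packets and needs no per-flow state.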

It has been argued that in conjunction with an IntServ architecture in the lower-speed edges of the network, a DiffServ architecture would suffice in the core to provide an overall end-to-end QoS to applications.

The other approach for providing QoS over the Internet is traffic engineering. Clearly, if there is sufficient bandwidth the best-effort packet transport suffices.

With the rapid deployment of optical networks, even to within a few hundred meters of network access points, it is becoming easier and less expensive to quickly deploy additional bandwidth where there are bottlenecks. Hence, it has been argued that it suffices to manage QoS by network and traffic engineering. The network topology (node placement, node interconnection, and link capacities) should be properly designed with the expected traffic in mind, and it should be tolerant of link failures. It also should be possible to deploy new bandwidth when needed. Furthermore, in the operating network the traffic should be carefully routed to prevent the formation of bottlenecks, and the routing should be monitored and revised from time to time. Traffic engineering alone, of course, cannot address the problem of congestion in access networks.

Finally, to end this section, we turn to the components of Internet hosts and routers. Figure 2.29 shows the protocols implemented in a typical Internet host. This host can handle email because it has SMTP (Simple Mail Transfer Protocol) and can browse the Web because it has HTTP (Hypertext Transfer Protocol); both of these protocols run over TCP. The host can also participate in a packet voice call, something that requires RTP (Real-time Transport Protocol), which in turn runs on UDP's simple datagram service.

Figure 2.29. The typical protocols in an end system (or host) attached to the Internet.

Now let us look at the components of an Internet router (see Figure 2.30). A router is a packet switch and hence moves packets between several links. In the figure, the router shown has three links: a link to a wide area network (WAN), a link into a LAN, and a dial-up link (into which a connection can be made over the telephone network). There is a link protocol for each physical link: HDLC (High-Level Data Link Control) for the WAN link, the IEEE standard link protocol (IEEE 802.2) for the LAN link, and PPP (Point to Point Protocol) for the dial-up link. The IP layer runs across all these link layers and forwards packets between them. The packet switching could simply be done by reading into and copying out of the processor memory, or there could be a hardware switching fabric in a high-capacity router; the switch is not shown. In addition to these components (which were also depicted in Figure 2.27), there is a routing protocol (shown here directly over IP, as is the case for OSPF). There is telnet, a protocol for permitting an administrator to log in to the router to configure its parameters; because telnet requires TCP, TCP must also be implemented on a typical router. Furthermore, there is a network management protocol that is used to monitor the traffic in the router and the status of its links. As mentioned earlier, SNMP is the commonly used protocol for network monitoring and management in the Internet, and, as shown in Figure 2.30, it operates with UDP as the transport protocol. The bottom part of Figure 2.30 shows what is involved in routing a packet from one port to another. Not only must IP do forwarding, but there is also the need to classify the packets into QoS classes to appropriately queue them at the output port; two queues are shown in the figure, perhaps a high-priority queue for voice packets and a low-priority one for data packets.

Figure 2.30. The protocols in a router (top) and the way an IP packet is forwarded from one interface of a router to another (bottom).

URL: https://www.sciencedirect.com/science/article/pii/B9780124287518500021

8.4.2 IP

The Internet Protocol (IP) [Los97, Sta97A] is the fundamental protocol on the Internet. It provides connectionless, packet-based communication. Industrial automation has long been a good application area for Internet-based embedded systems. Information appliances that use the Internet are rapidly becoming another use of IP in embedded computing. The term Internet generally refers to the global network of computers connected by the IP. However, it is possible to build an isolated network not connected to the global Internet that uses IP.

Internetworking

IP is not defined over a particular physical implementation; it is an internetworking standard. Internet packets are assumed to be carried by some other network, such as Ethernet. Generally, an Internet packet travels over several different networks from source to destination. The IP allows data to flow seamlessly through these networks from one end user to another. The relationship between IP and individual networks is illustrated in Fig. 8.5. IP works at the NWK layer. When node A wants to send data to node B, the application’s data pass through several layers of the protocol stack to get to the Internet Protocol, which then creates packets for routing to the destination. These are then sent to the data link and PHY layers. A node that transmits data among different types of networks is known as a router. The router’s functionality must go up to the IP layer, but because it does not run applications, it does not need to go to higher levels of the OSI model. In general, a packet may go through several routers to reach its destination. At the destination, the IP layer provides data to the transport layer and ultimately to the receiving application. As the data pass through several layers of the protocol stack, the IP packet data are encapsulated in packet formats appropriate to each layer.

FIGURE 8.5. Protocol utilization in Internet communication.

IP packets

The basic format of an IP packet is shown in Fig. 8.6. The header and data payload are both variable in length. The maximum total length of the header and data payload is 65,535 bytes.

FIGURE 8.6. Internet protocol packet structure.

An Internet address is a number (32 bits in IPv4, 128 bits in IPv6). An IPv4 address is typically written in the form xxx.xxx.xxx.xxx. The names by which users and applications typically refer to Internet nodes, such as foo.baz.com, are translated into IP addresses via calls to a Domain Name System (DNS) server, one of the higher-level services built on top of IP.

The fact that IP works at the network layer tells us that it does not guarantee that a packet is delivered to its destination. Furthermore, packets that do arrive may come out of order. This is referred to as best-effort routing. Because routes for data may change quickly with subsequent packets being routed along different paths with different delays, the real-time performance of IP can be hard to predict. When a small network is contained totally within the embedded system, performance can be evaluated through simulation or other methods because the possible inputs are limited. Because the performance of the Internet may depend on worldwide usage patterns, its real-time performance is inherently harder to predict.

IP services

The Internet also provides higher-level services built on top of IP. The Transmission Control Protocol (TCP) is one such example. It provides a connection-oriented service that ensures that data arrive in the appropriate order, and it uses an acknowledgment protocol to ensure that packets arrive. Because many higher-level services are built on top of TCP, the basic protocol is often referred to as TCP/IP.

Fig. 8.7 shows the relationships between IP and higher-level Internet services. Using IP as the foundation, TCP is used to provide File Transfer Protocol (FTP) for batch file transfers, Hypertext Transfer Protocol (HTTP) for World Wide Web service, Simple Mail Transfer Protocol (SMTP) for email, and Telnet for virtual terminals. A separate transport protocol, the User Datagram Protocol (UDP), is used as the basis for the network management services provided by the Simple Network Management Protocol (SNMP).

FIGURE 8.7. Internet service stack.

URL: https://www.sciencedirect.com/science/article/pii/B9780323851282000086

5.3.3 Internet Protocol (IP)

Internet Protocol (IP) is not just the Internet. It is a very important connectionless layer 3 protocol used to move packets around on the Internet as well as on a lot of other networks. IP allows any computer on the Internet (or another IP network) to locate and exchange packets with any other computer. You often see TCP/IP together, where TCP is a layer 4 protocol often used with IP (as is, for instance, UDP). TCP and UDP are discussed later.

IP addresses are expressed in the dotted decimal format. In version 4 of IP, addresses are 32 bits long. Table 5.4 illustrates the dotted decimal address. The top row gives the dotted decimal representation; the second row expresses this as individual decimal numbers, the third row shows the corresponding binary numbers. This IP address was assigned to the WAN port on the writer's gateway connected to his cable modem at the time of writing. The assignment came from his cable operator. As shown later, the IP address may change from time to time.

Table 5.4. Dotted Decimal Notation for IPv4 Addresses

Dotted decimal   24.98.160.78
Base 10          24          98          160         78
Base 2           00011000    01100010    10100000    01001110
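The rows of Table 5.4 can be derived mechanically. The short Python sketch below splits the dotted decimal address into its four octets and renders each as eight binary digits; the function name is chosen for this example only.

```python
def dotted_to_rows(address: str):
    """Return the base-10 octets and their 8-bit binary forms."""
    octets = [int(part) for part in address.split(".")]
    binary = [f"{octet:08b}" for octet in octets]
    return octets, binary

octets, binary = dotted_to_rows("24.98.160.78")
print(octets)
print(binary)
```

Joining the four binary groups gives the full 32-bit address.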

The contents of an IP packet are called a datagram. Each datagram, as it traverses the Internet, retains the same source and destination IP addresses, even as it traverses different layer 2 networks. For example, a packet may travel part of the distance on a token ring network, partway on ATM, and partway on Ethernet. These are layer 2 protocols, and each will put on its own layer 2 header as the IP datagram enters that network and strip its layer 2 header when it is handed to another network. But the IP header stays with the datagram during the whole process (though it will be modified in each router through which it passes), defining the ultimate destination of the datagram. (An exception is that a firewall between networks or between a network and a computer may substitute its own IP address for that of the ultimate process using the datagram.)
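One header modification made by each router is to decrement the time-to-live field and recompute the header checksum. The sketch below assumes a 20-byte IPv4 header without options, uses the standard one's-complement checksum over 16-bit words, and works on a hand-written sample header; the function names are hypothetical.

```python
import struct
from typing import Optional

def header_checksum(header: bytes) -> int:
    """One's-complement checksum over the header's 16-bit words.

    To compute a fresh checksum, the checksum field must be zero.
    Over a header with a valid checksum in place, the result is 0.
    """
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytes) -> Optional[bytes]:
    """Decrement TTL (byte 8) and refresh the checksum (bytes 10-11).

    Returns None when the TTL would reach zero, i.e. the packet is dropped.
    """
    ttl = header[8]
    if ttl <= 1:
        return None
    updated = bytearray(header)
    updated[8] = ttl - 1
    updated[10:12] = b"\x00\x00"
    struct.pack_into("!H", updated, 10, header_checksum(bytes(updated)))
    return bytes(updated)

# Sample header with TTL 64 and the checksum field zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
original_checksum = header_checksum(sample)
forwarded = forward(sample)
print(hex(original_checksum), forwarded.hex())
```

The source and destination addresses in bytes 12 to 19 are untouched by this step, matching the point that they stay the same end to end.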

When switching in a network is done by looking at the layer 3 (IP) address, the device doing the switching is usually called a router. Routers technically have hosts built in, because they themselves can be addressed (they have their own IP addresses), for management purposes.

IPv4 Versus IPv6

IP version 4 (IPv4) is in common use today. For years there has been a desire to replace IPv4 with IPv6, but the difficulty of managing such a change has prevented widespread deployment of IPv6. There are several new features in IPv6, the most notable being the expansion from a 32-bit address to a 128-bit address. This will simplify address management, removing the need for assigning temporary private IP addresses to devices. In addition, the header format is streamlined. A flow label is added to denote a stream of traffic with special QoS needs. Support for both authentication and confidentiality is mandatory, and there are a few other changes. Unless otherwise noted, we are describing IPv4 in this book.

IP Embedded in Ethernet

Figure 5.14 illustrates an IP datagram inside an Ethernet frame. This figure is laid out with four bytes (32 bits) in a horizontal row. Each horizontal row follows serially the one above it.


Figure 5.14. IP Datagram Inside Ethernet Frame.

Compare Figure 5.14 with the illustration of an Ethernet frame in Figure 5.13, which identified all the elements of the Ethernet frame consistent with Figure 5.14. The “Data and pad” of Figure 5.13 are expanded in Figure 5.14 to show the elements of the IP datagram. The elements of the IP header are summarized next.

The IP Header

The first four bits, vers, contain the IP version number, currently 4 (0100) and possibly going to 6 (0110) at some point. The next four bits, hlen, count the number of 32-bit units in the IP header. The basic IP header as shown is 20 bytes, or five units of 32 bits. We show later that some IP headers are longer as a result of options being embedded. IP defines a number of options that may create a longer header. These are covered in the endnotes. Options include, but are not limited to, the ability to record the route taken by the datagram, the ability to force a particular route, and time stamping.

The next field is a 6-bit field labeled diffserv, for differentiated service. The 6 bits contain a so-called diffserv codepoint (DSCP) value, which tells a router how to handle the packet. Special treatment may include expedited forwarding or extra protection from loss. Most values for this field are not defined in IP but rather are used by different networks in accordance with each network's policies. This field is an important enhancement to IP to bring it closer to parity with the QoS capabilities of ATM or similar protocols. It allows datagrams to be given higher priority if the datagram is something, such as voice, that needs to get through a network quickly. (Note, however, that just because ATM has good QoS features does not mean that they are always used.)

The final two bits in this field, ECN, are the explicit congestion notification field, which provides cooperating routers with a way to improve management of congestion on a link. When a router handling the packet notices that a link is getting congested, it sets these two bits to 11 before forwarding the packet on the congested link. When the packet is ultimately received, the receiving host sees that one of the routes the packet took is congested, so it slows down the sender, using methods shown later in the section on Transmission Control Protocol.

The next two bytes are the length field, labeled payload length in the figure. In IPv4 this field (total length) gives the length of the whole datagram in bytes, header included; in IPv6 the corresponding field counts only the payload. IP datagrams can be quite long, though they may well have to be segmented into several packets or cells (depending on the protocol on which they are carried) on any particular transport segment. For example, if the datagram is being carried on Ethernet, as shown in Figure 5.14, the maximum amount of data in the Ethernet frame is 1500 bytes (this includes the IP header and the datagram payload). If the datagram goes through an ATM network, it will have to be segmented into cells carrying no more than 48 bytes each, as described earlier. If the datagram is too long to fit, it will have to be segmented into several packets. The fragment identifier field is added at the sender, to be used to uniquely identify the packet to which a fragment belongs so that the fragments can be reassembled at the destination. This field is set to a value even if the sender is not fragmenting the packet, because it is possible that a later network will have to fragment it.

The next three bits are flags that instruct routers on segmentation. The first bit is always 0. The next bit, DF, indicates that the packet may not be fragmented. If a router encounters a datagram that is too long and the DF bit is set, then the router has no recourse but to discard the packet. The next bit, MF, indicates that more fragments follow. It is set except on the last segment of a fragmented datagram. Finally, the fragment offset field specifies how far into the datagram the fragment is so that the fragments can be reassembled as one datagram by the time they reach their destination (if not before). It is quite possible that the segments will arrive out of order, so segments received before predecessor segments will have to be held until all segments are received.
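Fragmentation and reassembly can be sketched as follows. This is a simplified illustration rather than a full implementation: it ignores the header itself and the DF bit, and the function names are made up for the example. One detail not stated above is that IPv4 expresses the fragment offset in units of 8 bytes, so every fragment except the last must carry a multiple of 8 bytes of data.

```python
def fragment(payload: bytes, max_data: int):
    """Split a payload into (offset_in_8_byte_units, more_fragments, data)."""
    if max_data < 8:
        raise ValueError("max_data must allow at least 8 bytes per fragment")
    # Round the chunk size down to a multiple of 8 so offsets stay exact.
    chunk = max_data - (max_data % 8)
    fragments = []
    for start in range(0, len(payload), chunk):
        data = payload[start:start + chunk]
        more = start + chunk < len(payload)  # MF is clear only on the last one
        fragments.append((start // 8, more, data))
    return fragments


def reassemble(fragments):
    """Rebuild the payload from fragments that may arrive in any order."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return b"".join(data for _, _, data in ordered)


payload = bytes(range(256)) * 10             # 2560 bytes of sample data
pieces = fragment(payload, max_data=1480)    # e.g. 1500-byte frame less 20-byte header
print([(offset, more, len(data)) for offset, more, data in pieces])
```

Sorting by offset is what allows the receiver to cope with fragments that arrive out of order.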

The next field is the hop limit. (This was originally called the time to live (TTL) field). The sender sets this field to some value; every time the datagram passes through a router, the field is decremented by 1. If the value reaches 0, the packet is discarded. The hop limit protects the Internet against an undeliverable packet, which would otherwise hop around a network forever if it could not be delivered. This can happen if a routing loop is created; that is, an error in a routing table sends a packet back to the previous router that handled it, which sends it to the router with the errored routing table, which sends it back, etc. If nothing caused the packet to be discarded, it could bounce around forever in the Internet, tying up resources needed by packets that can be delivered.
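The effect of the hop limit on a routing loop can be illustrated with a toy model. In this sketch each router is reduced to a single next-hop entry in a dictionary, which is an assumption made purely for illustration.

```python
def forward(next_hop: dict, start: str, destination: str, hop_limit: int):
    """Follow next-hop entries until delivery or until the hop limit hits 0."""
    router = start
    hops = 0
    while router != destination:
        hop_limit -= 1                 # each router decrements the field
        if hop_limit == 0:
            return ("discarded", hops)
        router = next_hop[router]
        hops += 1
    return ("delivered", hops)


good_path = {"A": "B", "B": "C"}
looping = {"A": "B", "B": "A"}          # B's table wrongly points back at A

print(forward(good_path, "A", "C", hop_limit=8))  # ('delivered', 2)
print(forward(looping, "A", "C", hop_limit=8))    # ('discarded', 7)
```

Without the decrement, the second call would never return.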

Next is the next header field, also called the protocol field. This indicates the next higher header, which may be a layer 4 header or may be intimately related to IP. We shall discuss some of these protocols later. After it comes the header checksum, which provides protection against corruption of the header. The sender computes the checksum as the 16-bit one's complement of the one's complement sum of the header words, so carries out of the top bit are wrapped around and added back in rather than discarded. Each router that handles the datagram repeats the computation on the received header; if the result disagrees with the checksum carried in the header, the header has been corrupted, so the entire packet is discarded. Because the hop limit changes at each router, the router recomputes the checksum before forwarding the datagram.
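The standard IPv4 header checksum is the one's complement of the one's complement sum of the header's 16-bit words, with the checksum field itself taken as zero while computing. A minimal sketch follows; the function name and the sample header bytes are illustrative.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Return the 16-bit one's complement checksum of an IPv4 header."""
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold any carries out of the low 16 bits back into the sum.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A sample 20-byte header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_checksum(header)))  # 0xb861
```

A convenient property of this scheme is that running the same function over a header with the correct checksum already filled in yields 0, which is how a receiver can verify it.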

Finally, the IP header includes the source and destination IP addresses. These don't change as the datagram traverses the network. Each router handling the packet reads the destination address to know where to send the packet (controlled by a massive and continuously changing routing table in the router) and reads the source address to update its routing table for that address.
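The fields described in this section can be pulled out of a raw 20-byte header as sketched below. The dictionary keys follow the field names used in the text; the function name and the sample bytes are illustrative assumptions, and options (headers longer than 20 bytes) are not parsed.

```python
import ipaddress
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_hlen, ds_ecn, length, ident, flags_frag,
     hop_limit, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", raw[:20])
    return {
        "vers": ver_hlen >> 4,
        "hlen_bytes": (ver_hlen & 0x0F) * 4,   # hlen counts 32-bit units
        "diffserv": ds_ecn >> 2,               # upper 6 bits
        "ecn": ds_ecn & 0x03,                  # lower 2 bits
        "length": length,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,
        "hop_limit": hop_limit,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(parse_ipv4_header(sample))
```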

Types of IP Addresses

There are three types of IP addresses: unicast, multicast, and broadcast. Unicast addresses are the most straightforward. They are assigned to identify a single interface (for most practical purposes this equates to a single host, though there is a difference). A unicast message is intended for one and only one interface. Multicast addresses identify a set of interfaces. This allows a message to be generated once and sent to a number of interfaces. A router with receiving hosts attached to two or more ports will replicate the packets on each port. Multicast addressing is useful for video conferencing, in which the same signal is to be sent to several participants. Similarly, it is used in IP distribution of video (IPTV). If one router needs to pass messages to several other routers, it can do so using multicast addresses. Broadcast addresses are a special case of multicast addresses. They identify all interfaces on a network. The use of broadcast addresses is discouraged.
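Python's standard `ipaddress` module can tell the three types apart for IPv4. In the sketch below, the helper name `classify` and the sample network are assumptions; the multicast test relies on the reserved IPv4 multicast range (224.0.0.0/4), and the broadcast test checks both the network's own broadcast address and the limited broadcast address 255.255.255.255.

```python
import ipaddress


def classify(address: str, network: str) -> str:
    """Label an IPv4 address as unicast, multicast, or broadcast."""
    addr = ipaddress.IPv4Address(address)
    net = ipaddress.IPv4Network(network)
    if addr.is_multicast:
        return "multicast"
    if addr == net.broadcast_address or addr == ipaddress.IPv4Address("255.255.255.255"):
        return "broadcast"
    return "unicast"


for candidate in ("192.168.1.10", "224.0.0.1", "192.168.1.255"):
    print(candidate, classify(candidate, "192.168.1.0/24"))
```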

Internet Control Message Protocol (ICMP)

Internet Control Message Protocol (ICMP) is a layer 3 (network) protocol, but it sends its messages inside IP datagrams. The next header field (see Figure 5.14) identifies ICMP. ICMP coordinates many aspects of the operation of a network, including controlling address assignments, reporting errors, and providing diagnostic support.

Router discovery is one important function it performs. This allows hosts (any device on a network) to discover what routers are connected to the network. Periodically, routers send router advertisement messages to all systems on the network. The router advertisement message includes the address of every router known to the sending router and a relative preference value. When possible, all hosts should select the router with the highest preference value. The message also includes a lifetime field. If a host doesn't hear from a router within that lifetime (typically 30 minutes), it must assume that the router is no longer available. The router advertisement messages are sent more frequently than this.

Just knowing the addresses of all routers on a network does not ensure that a host will direct a message to the router that is the best path to the destination. It selects a router to which to send the message. If it selects the wrong router, that router redirects the message to the correct router and also sends an ICMP redirect message to the host, saying, in essence, “Next time you want to send a message to this host, send it to this other router, not to me.”

Internet Group Management Protocol (IGMP)

Like ICMP, Internet Group Management Protocol (IGMP) is a protocol in its own right, but it is an integral part of any IP implementation. Hosts use IGMP to announce (and later renounce) their membership in groups. Routers listen to these messages to track group membership. They then know how to forward datagrams addressed to groups.18

Dynamic Host Configuration Protocol (DHCP)

IP addresses are assigned to individual hosts, that is, all computers on a network. These IP addresses may be taken from public IP addresses, or, since IPv4 has a limited address space, most machines that don't have need for a public address are assigned private IP addresses owned by the network to which the host is attached. This allows reusing addresses on the Internet and simplifies the management of the network. IP addresses may be assigned manually to hosts, or they may be assigned automatically. Automatic assignment is usually preferable when possible.

The protocol used to assign IP addresses is the Dynamic Host Configuration Protocol (DHCP). When a host, such as your PC, comes up on a network, it asks for an IP address if it has not been assigned a permanent one. The host sends a DHCPDISCOVER packet. Connected somewhere to the network is at least one DHCP server, whose job it is to assign IP addresses to hosts. When a DHCP server receives a DHCPDISCOVER packet, it returns a DHCPOFFER packet containing an IP address and other information. It is possible for more than one DHCP server to be resident on a network, in which case the host receives more than one offer. It selects one and sends a DHCPREQUEST message to that server. Assuming the offer is still valid (for example, the IP address has not been assigned to another host in the meantime), the DHCP server returns a DHCPACK (DHCP acknowledgment) message.

Once the host receives the DHCPACK message, that IP address belongs to it for the duration of the lease, that is, the time for which the DHCP server has said that the address is valid. Before the end of a lease, the host must request another IP address by starting the process again. The lease allows an IP address to be placed back in the pool of available addresses if the host goes away for any reason, such as a power-down.19
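The four-message exchange can be modeled in memory as below. This is a toy sketch, not real DHCP: no packets are sent, lease expiry is not timed, and the class and function names are invented for the example. The DHCPNAK reply used when an offer is no longer valid is part of the real protocol, although it is not discussed above.

```python
class DhcpServer:
    """Toy DHCP server that hands out addresses from a fixed pool."""

    def __init__(self, name, pool, lease_seconds=3600):
        self.name = name
        self.free = list(pool)
        self.leases = {}                 # address -> client id
        self.lease_seconds = lease_seconds

    def discover(self, client_id):
        """Answer a DHCPDISCOVER with a DHCPOFFER, or None if the pool is empty."""
        if not self.free:
            return None
        return {"type": "DHCPOFFER", "server": self.name,
                "address": self.free[0], "lease": self.lease_seconds}

    def request(self, client_id, address):
        """Answer a DHCPREQUEST with DHCPACK if the address is still free."""
        if address not in self.free:
            return {"type": "DHCPNAK", "server": self.name}
        self.free.remove(address)
        self.leases[address] = client_id
        return {"type": "DHCPACK", "server": self.name,
                "address": address, "lease": self.lease_seconds}

    def release(self, address):
        """Return an address to the pool, as happens when a lease ends."""
        if self.leases.pop(address, None) is not None:
            self.free.append(address)


def acquire_address(client_id, servers):
    """Run DISCOVER/OFFER/REQUEST/ACK and return the ACK, or None."""
    offers = []
    for server in servers:
        offer = server.discover(client_id)
        if offer is not None:
            offers.append((server, offer))
    if not offers:
        return None
    server, offer = offers[0]            # this host simply takes the first offer
    reply = server.request(client_id, offer["address"])
    return reply if reply["type"] == "DHCPACK" else None


servers = [DhcpServer("s1", ["192.168.1.10", "192.168.1.11"]),
           DhcpServer("s2", ["192.168.1.50"])]
print(acquire_address("pc-1", servers))
```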

URL: https://www.sciencedirect.com/science/article/pii/B9781558608283500072

1. Internet Protocol Architecture

The Internet was designed to create standardized communication between computers. Computers communicate by exchanging messages. The Internet supports message exchange through a mechanism called protocols. Protocols are very detailed and stereotyped rules explaining exactly how to exchange a particular set of messages. Each protocol is defined as a set of finite state automata and a set of message formats. Each protocol specification defines one automaton for sending a message and another for receiving a message. The automata specify the message timing; they play the role of grammar, indicating whether any particular message is meaningful or is interpreted by the receiver as gibberish. The protocol formats restrict the information that the protocol can express.

Security has little utility as an abstract, disembodied concept. What the word security should mean depends very much on the context in which it is applied. The architecture, design, and implementation of a system each determine the kind of vulnerabilities and opportunities for exploits that exist and which features are easy or hard to attack or defend.

It is fairly easy to understand why this is true. An attack on a system is an attempt to make the system act outside its specification. An attack is different from “normal” bugs that afflict computers and that occur through random interactions between the system’s environment and undetected flaws in the system architecture, design, or implementation. An attack, on the other hand, is an explicit and systematic attempt by a party to search for flaws that make the computer act in a way its designers did not intend.

Computing systems consist of a large number of blocks or modules assembled together, each of which provides an intended set of functions. The system architecture hooks the modules together through interfaces, through which the various modules exchange information to activate the functions provided by each module in a coordinated way. An attacker exploits the architecture to compromise the computing system by interjecting inputs into these interfaces that do not conform to the specification for inputs of a specific module. If the targeted module has not been carefully crafted, unexpected inputs can cause it to behave in unintended ways. This implies that the security of a system is determined by its decomposition into modules, which an adversary exploits by injecting messages into the interfaces the architecture exposes. Accordingly, no satisfying discussion of any system is feasible without an understanding of the system architecture. Our first goal, therefore, is to review the architecture of the Internet communication protocols in an effort to gain a deeper understanding of its vulnerabilities.

Communications Architecture Basics

Since communication is an extremely complex activity, it should come as no surprise that the system components providing communication decompose into modules. One standard way to describe each communication module is as a black box with a well-defined service interface. A minimal communications service interface requires four primitives:

A send primitive, which an application using the communications module uses to send a message via the module to a peer application executing on another networked device. The send primitive specifies a message payload and a destination. The communication module responding to the send transmits the message to the specified destination, reporting its requester as the message source.

A confirm primitive, to report that the module has sent a message to the designated destination in response to a send request or to report when the message transmission failed, along with any failure details that might be known. It is possible to combine the send and confirm primitives, but network architectures rarely take this approach. The send primitive is normally defined to allow the application to pass a message to the communications module for transmission by transferring control of a buffer containing the message. The confirm primitive then releases the buffer back to the calling application when the message has indeed been sent. This scheme effects “a conservation of buffers” and enables the communications module and the application using it to operate in parallel, thus enhancing the overall communication performance.

A listen primitive, which the receiving application uses to provide the communications module with buffers into which it should put messages arriving from the network. Each buffer the application posts must be large enough to receive a message of the maximum expected size.

A receive primitive, to deliver a received message from another party to the receiving application. This releases a posted buffer back to the application and usually generates a signal to notify the application of message arrival. The released buffer contains the received message and the (alleged) message source.

Sometimes the listen primitive is replaced with a release primitive. In this model the receive buffer is owned by the receiving communications module instead of the application, and the application must recycle buffers containing received messages back to the communication module upon completion. In this case the buffer size selected by the receiving module determines the maximum message size. In a moment we will explain how network protocols work around this restriction.

It is customary to include a fifth service interface primitive for communications modules:

A status primitive, to report diagnostic and performance information about the underlying communications. This might report statistics, the state of active associations with other network devices, and the like.
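The five primitives can be made concrete with a small sketch. The class below is a hypothetical best-effort module that can only deliver messages to itself (a loopback), which keeps the example self-contained; the callback names `on_confirm` and `on_receive` are assumptions chosen for illustration.

```python
from collections import deque


class LoopbackModule:
    """Toy communications module exposing send, confirm, listen, receive, status."""

    def __init__(self, address, on_confirm, on_receive):
        self.address = address
        self.on_confirm = on_confirm      # confirm primitive (callback)
        self.on_receive = on_receive      # receive primitive (callback)
        self.posted = deque()             # buffers posted via listen
        self.counters = {"sent": 0, "delivered": 0, "dropped": 0}

    def listen(self, buffer: bytearray):
        """Post a buffer into which an arriving message may be placed."""
        self.posted.append(buffer)

    def send(self, destination, payload: bytes):
        """Transmit a message, then confirm success or failure to the caller."""
        self.counters["sent"] += 1
        reachable = destination == self.address
        self.on_confirm(payload, reachable)
        if reachable:
            self._arrive(self.address, payload)

    def _arrive(self, source, payload: bytes):
        # Drop the message if no posted buffer is large enough to hold it.
        if not self.posted or len(self.posted[0]) < len(payload):
            self.counters["dropped"] += 1
            return
        buffer = self.posted.popleft()
        buffer[:len(payload)] = payload
        self.counters["delivered"] += 1
        self.on_receive(source, buffer, len(payload))

    def status(self):
        """Report simple diagnostic counters."""
        return dict(self.counters)


module = LoopbackModule(
    "host-a",
    on_confirm=lambda payload, ok: print("confirm:", ok),
    on_receive=lambda src, buf, n: print("receive from", src, bytes(buf[:n])),
)
module.listen(bytearray(16))
module.send("host-a", b"hello")
print(module.status())
```

The second send in a sequence would be dropped unless the application posts another buffer first, which mirrors the buffer ownership rules described above.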

Communications is effected by providing a communications module black box on systems, connected by a signaling medium. The medium connecting the two devices constitutes the network communications path. The media can consist of a direct link between the devices or, more commonly, several intermediate relay systems between the two communicating endpoints. Each relay system is itself a communicating device with its own communications module, which receives and then forwards messages from the initiating system to the destination system.

Under this architecture, a message is transferred from an application on one networked system to an application on a second networked system as follows:

First the application sourcing the message invokes the send primitive exported by its communications module. This causes the communications module to attempt to transmit the message to a destination provided by the application in the send primitive.

The communications module encodes the message onto the network’s physical medium representing a link to another system. If the communications module implements a best-effort message service, it generates the confirm primitive as soon as the message has been encoded onto the medium. If the communications module implements a reliable message service, the module delays generation of the confirm until it receives an acknowledgment from the message destination. If it has not received an acknowledgment from the receiver after some period of time, it generates a confirm indicating that the message delivery failed.

The encoded message traverses the network medium and is placed into a buffer by the receiving communications module of another system attached to the medium. This communications module then examines the destination specified by the message. If the module’s local system is not the destination, the module reencodes the message onto the medium representing another link; otherwise the module uses the receive primitive to pass the message to the receiving application.

Getting More Specific

This stereotyped description of networked communications is overly simplified. Communications are actually torturously more difficult in real network modules. To tame this complexity, communications modules are themselves partitioned further into layers, each providing a different networking function. The Internet decomposes communications into five layers of communications modules:

The PHY layer

The MAC layer

The network layer

The transport layer

The sockets layer

These layers are also augmented by a handful of cross-layer coordination modules. The Internet depends on the following cross-layer modules:

ARP

DHCP

DNS

ICMP

Routing

An application using networking is also part of the overall system design, and the way it uses the network has to be taken into consideration to understand system security.

We next briefly describe each of these in turn.

The PHY Layer

The PHY (pronounced fie) layer is technically not part of the Internet architecture per se, but Ethernet jacks and cables, modems, Wi-Fi adapters, and the like represent the most visible aspect of networking, and no security treatment of the Internet can ignore the PHY layer entirely.

The PHY layer module is medium dependent, with a different design for each type of medium: Ethernet, phone lines, Wi-Fi, cellular phone, OC-48, and the like are based on different PHY layer designs. It is the job of the PHY layer to translate between digital bits as represented on a computing device and the analog signals crossing the specific physical medium used by the PHY. This translation is a physics exercise.

To send a message, the PHY layer module encodes each bit of each message from the sending device as a media-specific signal, representing the bit value 1 or 0. Once encoded, the signal propagates along the medium from the sender to the receiver. The PHY layer module at the receiver decodes the medium-specific signal back into a bit.

It is possible for the encoding step at the transmitting PHY layer module to fail, for a signal to be lost or corrupted while it crosses the medium, and for the decoding step to fail at the receiving PHY layer module. It is the responsibility of higher layers to detect and recover from these potential failures.

The MAC Layer

Like the PHY layer, the MAC (pronounced mack) layer is not properly a part of the Internet architecture, but no satisfactory security discussion is possible without considering it. The MAC module is the “application” that uses and controls a particular PHY layer module. A MAC layer is always designed in tandem with a specific PHY (or vice versa), so a PHY-MAC pair together is often referred to as the data link layer.

MAC is an acronym for media access control. As its name suggests, the MAC layer module determines when to send and receive frames, which are messages encoded in a media-specific format. The job of the MAC is to pass frames over a link between the MAC layer modules on different systems.

Although not entirely accurate, it is useful to think of a MAC module as creating links, each of which is a communication channel between different MAC modules. It is further useful to distinguish physical links and virtual links. A physical link is a direct point-to-point channel between the MAC layers in two endpoint devices. A virtual link can be thought of as a shared medium to which more than two devices can connect at the same time. There are no physical endpoints per se; the medium acts as though it is multiplexing links between each pair of attached devices. Some media such as Ethernet are implemented as physical point-to-point links but act more like virtual links in that more than a single destination is reachable via the link. This is accomplished by MAC layer switching, which is also called bridging. Timing requirements for coordination among communicating MAC layer modules make it difficult to build worldwide networks based on MAC layer switching, however.

A MAC frame consists of a header and a data payload. The frame header typically specifies information such as the source and destination for the link endpoints. Devices attached to the medium via their MAC + PHY modules are identified by MAC addresses. Each MAC module has its own MAC address, which is assigned by its manufacturer and is supposed to be a globally unique identifier. The destination MAC address in a frame allows a particular MAC module to identify frames intended for it, and the source address allows it to identify the purported frame source. The frame header also usually includes a preamble, which is a set of special PHY timing signals used to synchronize the interpretation of the PHY layer data signals representing the frame bits.

The payload portion of a frame is the data to be transferred across the network. The maximum payload size is always fixed by the medium type. It is becoming customary for most MACs to support a maximum payload size of 1500 bytes = 12,000 bits, but this is not universal. The maximum fixed size allows the MAC to make efficient use of the underlying physical medium. Since messages can be of an arbitrary length exceeding this fixed size, a higher-layer function is needed to partition messages into segments of the appropriate length.

As we have seen, it is possible for bit errors to creep into communications as signals representing bits traverse the PHY medium. MAC layers differ a great deal in how they respond to errors. Some PHY layers, such as the Ethernet PHY, experience exceedingly low error rates, and for this reason, the MAC layers for these PHYs make no attempt to more than detect errors and discard the mangled frames. Indeed, with these MACs it is cheaper for the Internet to resend message segments at a higher layer than at the MAC layer. These are called best-effort MACs. Others, such as the Wi-Fi MAC, experience high error rates due to the shared nature of the channel and natural interference among radio sources, and experience has shown that these MACs can deliver better performance by retransmitting damaged or lost frames. It is customary for most MAC layers to append a checksum computed over the entire frame, called a frame check sequence (FCS). The FCS allows the receiver to detect bit errors accumulated due to random noise and other physical phenomena during transmission and due to decoding errors. Most MACs discard frames with FCS errors. Some MAC layers also perform error correction on the received bits to remove random bit errors rather than relying on retransmissions.
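Error detection with a frame check sequence can be sketched with the CRC-32 function in Python's standard `zlib` module, which uses the same generator polynomial as the Ethernet FCS. The 4-byte little-endian placement of the checksum and the helper names are assumptions made for this illustration, not a byte-exact model of any particular MAC.

```python
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Append a 4-byte CRC-32 computed over the whole frame."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")


def check_fcs(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the trailing 4 bytes."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs


sent = add_fcs(b"example frame payload")
print(check_fcs(sent))                        # True

# Flip one bit in transit; a best-effort MAC would discard this frame.
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]
print(check_fcs(damaged))                     # False
```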

The Network Layer

The purpose of the network layer module is to represent messages in a media-independent manner and forward them between various MAC layer modules representing different links. The media-independent message format is called an Internet Protocol, or IP, datagram. The network layer implements the IP layer and is the lowest layer of the Internet architecture per se.

As well as providing media independence, the network layer provides a vital forwarding function that works even for a worldwide network like the Internet. It is impractical to form a link directly between each communicating system on the planet; indeed, the cabling costs alone are prohibitive—no one wants billions, or even dozens, of cables connecting their computer to other computers—and too many MAC + PHY interfaces can quickly exhaust the power budget for a single computing system. Hence, each machine is attached by a small number of links to other devices, and some of the machines with multiple links comprise a switching fabric. The computing systems constituting the switching fabric are called routers.

The forwarding function supported by the network layer module is the key component of a router and works as follows: When a MAC module receives a frame, it passes the frame payload to the network layer module. The payload consists of an IP datagram, which is the media-independent representation of the message. The receiving network layer module examines the datagram to see whether to deliver it locally or to pass it on toward the datagram’s ultimate destination. To accomplish the latter, the network layer module consults a forwarding table to identify some neighbor router closer to the ultimate destination than itself. The forwarding table also identifies the MAC module to use to communicate with the selected neighbor and passes the datagram to that MAC layer module. The MAC module in turn retransmits the datagram as a frame encoded for its medium across its link to the neighbor. This process happens recursively until the datagram is delivered to its ultimate destination.
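A forwarding table lookup can be sketched as follows. The text does not say how entries are matched; this example assumes the longest-prefix-match rule commonly used by IP routers, and the table contents, neighbor addresses, and interface names are all invented for illustration.

```python
import ipaddress


def next_hop(table, destination: str):
    """Return (neighbor, interface) for the most specific matching prefix."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, neighbor, interface in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, neighbor, interface)
    return None if best is None else (best[1], best[2])


# Each entry: (destination prefix, neighbor router, MAC module / interface).
forwarding_table = [
    ("0.0.0.0/0", "203.0.113.1", "eth0"),    # default route
    ("10.0.0.0/8", "10.255.0.1", "eth1"),
    ("10.1.0.0/16", "10.1.255.1", "eth2"),
]

print(next_hop(forwarding_table, "10.1.2.3"))    # ('10.1.255.1', 'eth2')
print(next_hop(forwarding_table, "192.0.2.7"))   # ('203.0.113.1', 'eth0')
```

Each router on the path repeats this lookup, which is the recursion the paragraph above describes.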

The network layer forwarding function is based on IP addresses, a concept that is critical to understanding the Internet architecture. An IP address is a media-independent name for one of the MAC layer modules within a computing system. Each IP address is structured to represent the “location” of the MAC module within the entire Internet. This notion of location is relative to the graph comprising routers and their interconnecting links, called the network topology, not to actual geography. Since this name represents a location, the forwarding table within each IP module can use the IP address of the ultimate destination as a sort of signpost pointing at the MAC module with the greatest likelihood of leading to the ultimate destination of a particular datagram.

An IP address is different from the corresponding MAC address already described. A MAC address is a permanent, globally unique identifier, whereas an IP address can be dynamic due to device mobility; an IP address cannot be assigned by the equipment manufacturer, since a computing device can change locations frequently. Hence, IP addresses are administered and blocks allocated to different organizations with an Internet presence. It is common, for instance, for an Internet service provider (ISP) to acquire a large block of IP addresses for use by its customers.

An IP datagram has a structure similar to that of a frame: It consists of an IP header, which is “extra” overhead used to control the way a datagram passes through the Internet, and a data payload, which contains the message being transferred. The IP header indicates the ultimate source and destinations, represented as IP addresses.

The IP header format limits the size of an IP datagram payload to 64K (2^16 = 65,536) bytes. It is common to limit datagram sizes to the underlying media size, although datagrams larger than this do occur. This means that normally each MAC layer frame can carry a single IP datagram as its data payload. IP version 4, still the dominant version deployed on the Internet today, allows fragmentation of larger datagrams, to split large datagrams into chunks small enough to fit the limited frame size of the underlying MAC layer medium. IPv4 reassembles any fragmented datagrams at the ultimate destination.

Network layer forwarding of IP datagrams is best effort, not reliable. Network layer modules along the path taken by any message can lose and reorder datagrams. It is common for the network layer in a router to recover from congestion—that is, when the router is overwhelmed by more receive frames than it can process—by discarding late-arriving frames until the router has caught up with its forwarding workload. The network layer can reorder datagrams when the Internet topology changes, because a new path between source and destination might be shorter or longer than an old path, so datagrams in flight before the change can arrive after frames sent after the change. The Internet architecture delegates recovery from these problems to high-layer modules.

The Transport Layer

The transport layer is implemented by TCP and similar protocols. Not all transport protocols provide the same level of service as TCP, but a description of TCP will suffice to help us understand the issues addressed by the transport layer. The transport layer provides a multitude of functions.

First, the transport layer creates and manages instances of two-way channels between communication endpoints. These channels are called connections. Each connection represents a virtual endpoint between a pair of communication endpoints. A connection is named by a pair of IP addresses and port numbers. Two devices can support simultaneous connections using different port numbers for each connection. It is common to differentiate applications on the same host through the use of port numbers.

A second function of the transport layer is to support delivery of messages of arbitrary length. The 64K byte limit of the underlying IP module is too small to carry really large messages, and the transport layer module at the message source chops messages into pieces called segments that are more easily digestible by lower-layer communications modules. The segment size is negotiated between the two transport endpoints during connection setup. The segment size is chosen by discovering the smallest maximum frame size supported by any MAC + PHY link on the path through the Internet used by the connection setup messages. Once this is known, the transmitter typically partitions a large message into segments small enough that each one, together with its transport and IP headers, fits within this size. The transport layer module passes each segment to the network layer module, where it becomes the payload for a single IP datagram. The destination network layer module extracts the payload from the IP datagram and passes it to the transport layer module, which interprets the information as a message segment. The destination transport layer reassembles the segments into the original message once all the necessary segments arrive.
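
As a rough sketch of the segmentation described above, the following Python derives a segment size from the smallest frame size on a path and then splits and rebuilds a message. The function names are hypothetical, and the 20-byte header sizes are the minimum IPv4 and TCP headers without options, assumed here for simplicity.

```python
def path_segment_size(link_frame_sizes, ip_header=20, transport_header=20):
    """Largest segment that fits in the smallest frame on the path, headers included."""
    return min(link_frame_sizes) - ip_header - transport_header


def segment_message(message, segment_size):
    """Chop a message into (sequence_number, data) pairs of at most segment_size bytes."""
    return [(seq, message[seq:seq + segment_size])
            for seq in range(0, len(message), segment_size)]


def reassemble_message(segments, total_len):
    """Rebuild the message once every byte has arrived; return None if gaps remain."""
    buf = bytearray(total_len)
    seen = set()
    received = 0
    for seq, data in segments:
        if seq in seen:
            continue  # ignore duplicate segments
        seen.add(seq)
        buf[seq:seq + len(data)] = data
        received += len(data)
    return bytes(buf) if received == total_len else None
```

The sequence number used here is simply the byte offset of the segment within the message, which lets the receiver place segments correctly regardless of arrival order.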

Of course, as noted, MAC frames and IP datagrams can be lost in transit, so some segments can be lost. It is the responsibility of the transport layer module to detect this loss and retransmit the missing segments. This is accomplished by a sophisticated acknowledgment algorithm defined by the transport layer. The destination sends a special acknowledgment message, often piggybacked with a data segment being sent in the opposite direction, for each segment that arrives. Acknowledgments can be lost as well, and if the message source does not receive the acknowledgment within a time window, the source retransmits the unacknowledged segment. This process is repeated some number of times, and if the failure continues, the transport layer tears down the connection because it cannot fulfill its reliability commitment.
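
The retransmit-until-acknowledged loop can be illustrated with a toy model. Everything here is an assumption made for illustration: `channel` is a hypothetical callable that returns True when an acknowledgment comes back within the time window, and the retry limit of 5 is arbitrary.

```python
def send_with_retransmit(segment, channel, max_attempts=5):
    """Send a segment until it is acknowledged; return the number of attempts used.

    channel(segment) is a stand-in for "transmit and wait for the timer":
    it returns True if an acknowledgment arrived in time, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if channel(segment):
            return attempt
    # Repeated failure: the connection cannot meet its reliability commitment.
    raise ConnectionError(f"no acknowledgment after {max_attempts} attempts")


def scripted_channel(outcomes):
    """Build a channel that replays a fixed list of ack/no-ack outcomes."""
    it = iter(outcomes)
    return lambda segment: next(it)
```

A channel that drops the first two transmissions, `scripted_channel([False, False, True])`, causes the segment to be delivered on the third attempt.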

One reason for message loss is congestion at routers, something blind retransmission of unacknowledged segments will only exacerbate. The transport layer is also responsible for implementing congestion control algorithms as part of its transmit function. TCP, for instance, lowers its transmit rate whenever it fails to receive an acknowledgment message in time, and it slowly increases its rate of transmission until another acknowledgment is lost. This allows TCP to adapt to congestion in the network, helping to minimize frame loss.
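
The back-off-then-probe behavior can be sketched as additive increase, multiplicative decrease (AIMD). This is a deliberately simplified model, not TCP's actual algorithm, which also includes phases such as slow start; the function name and all parameter values are illustrative assumptions.

```python
def aimd(ack_events, initial_rate=1.0, increase=1.0, decrease_factor=0.5, floor=1.0):
    """Return the send rate after each event.

    ack_events is a sequence of booleans: True means the acknowledgment
    arrived in time, False means it was missed (taken as a congestion signal).
    """
    rate = initial_rate
    history = []
    for acked in ack_events:
        if acked:
            rate += increase                          # probe gently for more capacity
        else:
            rate = max(floor, rate * decrease_factor)  # back off sharply
        history.append(rate)
    return history
```

For the event sequence ack, ack, ack, loss, ack, the rate climbs from 1 to 4, halves to 2 on the loss, and then resumes climbing.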

It can happen that segments arrive at the destination out of order, since some IP datagrams for the same connection could traverse the Internet through different paths due to dynamic changes in the underlying network topology. The transport layer is responsible for delivering the segments in the order sent, so the receiver caches any segments that arrive out of order prior to delivery. The TCP reordering algorithm is closely tied to the acknowledgment and congestion control scheme so that the receiver never has to buffer too many out-of-order received segments and the sender not too many sent but unacknowledged segments.
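
A receiver-side cache for out-of-order segments might look like the following sketch. The `ReorderBuffer` class is hypothetical; it assumes sequence numbers are byte offsets starting at 0 and leaves out the window limits that a real transport protocol would enforce.

```python
class ReorderBuffer:
    """Hold segments that arrive early and release data strictly in order."""

    def __init__(self):
        self.next_seq = 0   # byte offset of the next data to deliver
        self.pending = {}   # early segments, keyed by their starting offset

    def receive(self, seq, data):
        """Accept one segment; return the payloads that can now be delivered."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate of something already seen
        self.pending[seq] = data
        deliverable = []
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            deliverable.append(chunk)
            self.next_seq += len(chunk)
        return deliverable
```

If the segment at offset 3 arrives before the segment at offset 0, the first call delivers nothing and the second call delivers both in order.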

Segment data arriving at the receiver can be corrupted due to undetected bit errors on the data link and copy errors within routers and the sending and receiving computing systems. Accordingly, transport protocols use a checksum algorithm to detect such errors; TCP and UDP use a 16-bit ones' complement checksum rather than a true cyclic redundancy check (CRC). The receiving transport layer module typically discards segments with errors detected by the checksum, and recovery occurs through retransmission by the sender when it fails to receive an acknowledgment from the receiver for a particular segment.
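
In TCP and UDP specifically, the error-detecting code is the 16-bit ones' complement Internet checksum described in RFC 1071. A minimal Python sketch of that computation follows; the function name is our own, and the real protocols also include a pseudo-header in the sum, which is omitted here.

```python
def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a segment by running the same function over the data with the checksum field included: an undamaged segment yields 0.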

The Sockets Layer

The top layer of the Internet, the sockets layer, does not per se appear in the architecture at all. The sockets layer provides a set of sockets, each of which represents a logical communications endpoint. An application can use the sockets layer to create, manage, and destroy connection instances using a socket as well as send and receive messages over the connection. The sockets layer has been designed to hide much of the complexity of utilizing the transport layer. The sockets layer has been highly optimized over the years to deliver as much performance as possible, but it does impose a performance penalty. Applications with very demanding performance requirements tend to utilize the transport layer directly instead of through the sockets layer module, but this comes with a very high cost in terms of software maintenance.
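
To make the create, send, receive, and destroy cycle concrete, here is a small sketch using Python's standard `socket` module over the loopback interface. The function names and the upper-casing "echo" behavior are invented for the example; only the `socket` and `threading` calls are real library APIs.

```python
import socket
import threading


def echo_upper_once(server_sock):
    """Accept one connection, read one message, and reply with it upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def loopback_echo_demo(message=b"hello"):
    """Open a TCP connection to a local server, exchange one message, close it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_upper_once, args=(server,))
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply
```

The application never sees segments, acknowledgments, or retransmissions; it only hands bytes to the socket and reads bytes back.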

In most implementations of these communications modules, each message is copied twice, at the sender and the receiver. Most operating systems are organized into user space, which is used to run applications, and kernel space, where the operating system itself runs. The sockets layer occupies the boundary between user space and kernel space. The sockets layer’s send function copies a message from memory controlled by the sending application into a buffer controlled by the kernel for transmission. This copy prevents the application from changing a message it has posted to send, but it also permits the application and kernel to continue their activities in parallel, thus better utilizing the device’s computing resources. The sockets layer invokes the transport layer, which partitions the message buffer into segments and passes the address of each segment to the network layer. The network layer adds its headers to form datagrams from the segments and invokes the right MAC layer module to transmit each datagram to its next hop. A second copy occurs at the boundary between the network layer and the MAC layer, since the data link must be able to asynchronously match transmit requests from the network layer to available transmit slots on the medium provided by its PHY. This process is reversed at the receiver, with a copy of datagrams across the MAC-network layer boundary and of messages between the socket layer and application.

Address Resolution Protocol

The network layer uses Address Resolution Protocol, or ARP, to translate IP addresses into MAC addresses, which it needs to give to the MAC layer in order to deliver frames to the appropriate destination.

The ARP module asks the question, “Who is using IP address X?” The requesting ARP module uses a request/response protocol, with the MAC layer broadcasting the ARP module’s requests to all the other devices on the same physical medium segment. A receiving ARP module generates a response only if its network layer has assigned the IP address to one of its MAC modules. Responses are addressed to the requester’s MAC address. The requesting ARP module inserts the response received in an address translation table used by the network layer to identify the next hop for all datagrams it forwards.
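
The request/response exchange can be modeled without any real network traffic. In this sketch the `Device` class, its methods, and the addresses are all hypothetical; a "broadcast" is simulated by offering the request to every other device on the same segment.

```python
class Device:
    """A toy host with one MAC address, one IP address, and an ARP table."""

    def __init__(self, mac, ip):
        self.mac = mac
        self.ip = ip
        self.arp_table = {}   # IP address -> MAC address

    def handle_arp_request(self, target_ip):
        """Answer only if the requested IP address is ours."""
        return self.mac if target_ip == self.ip else None

    def resolve(self, target_ip, segment):
        """Look up a MAC address, broadcasting a request if it is not cached."""
        if target_ip in self.arp_table:
            return self.arp_table[target_ip]
        for device in segment:              # simulated broadcast
            if device is self:
                continue
            mac = device.handle_arp_request(target_ip)
            if mac is not None:
                self.arp_table[target_ip] = mac   # remember the answer
                return mac
        return None                          # nobody on this segment owns the address
```

After one successful resolution, later lookups for the same IP address are served from the table without another broadcast.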

Dynamic Host Configuration Protocol

Remember that unlike MAC addresses, IP addresses cannot be assigned in the factory, because they are dynamic and must reflect a device’s current location within the Internet’s topology. A MAC module uses Dynamic Host Configuration Protocol, or DHCP, to acquire an IP address for itself, to reflect the device’s current location with respect to the Internet topology.

DHCP makes the request: “Please configure my MAC module with an IP address.” When one of a device’s MAC layer modules connects to a new medium, it invokes DHCP to make this request. The associated DHCP module generates such a request that conveys the MAC address of the MAC module, which the MAC layer module broadcasts to the other devices attached to the same physical medium segment. A DHCP server responds with a unicast DHCP response binding an IP address to the MAC address. When it receives the response, the requesting DHCP module passes the assigned IP address to the network layer to configure in its address translation table.

In addition to binding an IP address to the MAC module used by DHCP, the response also contains a number of network configuration parameters, including the address of one or more routers, to enable reaching arbitrary destinations, the maximum datagram size supported, and the addresses of other servers, such as DNS servers, that translate human-readable names into IP addresses.

Domain Naming Service

IP and MAC addresses are efficient means for identifying different network interfaces, but human beings are incapable of using these as reliably as computing devices can. Instead, human beings rely on names to identify the computing devices with which they want to communicate. These names are centrally managed and called domain names. The Domain Naming Service, or DNS, is a mechanism for translating human-readable names into IP addresses.

The translation from human-readable names to IP addresses happens within the socket layer module. An application opens a socket with the name of the intended destination. As the first step of opening a connection to that destination, the socket sends a request to a DNS server, asking the server to translate the name into an IP address. When the server responds, the socket can open the connection to the right destination, using the IP address provided.
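
In Python, this name-to-address step is exposed through the standard library's `socket.getaddrinfo`, which consults the system resolver (and therefore DNS where configured). The wrapper function below is a hypothetical convenience, and the port default is arbitrary.

```python
import socket


def resolve_name(name, port=80):
    """Return the distinct IP addresses the system resolver gives for a name."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as text.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_name("localhost")` typically returns loopback addresses such as 127.0.0.1, without contacting any external server.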

It is becoming common for devices to register their IP addresses under their names with DNS once DHCP has completed. This permits other devices to locate the registering device so that they can send messages to it.

Internet Control Message Protocol

Internet Control Message Protocol, or ICMP, is an important diagnostic tool for troubleshooting the Internet. Though ICMP provides many specialized message services, three are particularly important:

Ping. Ping is a request/response protocol designed to determine reachability of another IP address. The requester sends a ping request message to a designated IP address. If it’s delivered, the destination IP address sends a ping response message to the IP address that sourced the request. The responding ICMP module copies the contents of the ping request into the ping response so that the requester can match responses to requests. The requester uses pings to measure the roundtrip time to a destination.

Traceroute. Traceroute is another request/response protocol. An ICMP module generates a traceroute request to discover the path it is using to traverse the Internet to a destination IP address. The requesting ICMP module transmits the request toward the destination. Each router that handles the traceroute request adds a description of its own IP address that received the message and then forwards the updated traceroute request. The destination sends all this information back to the message source in a traceroute response message.

Destination unreachable. When a router receives a datagram for which it has no next hop, it generates a “destination unreachable” message and sends it back to the datagram source. When the message is delivered, the ICMP module marks the forwarding table of the message source so that its network layer will reject further attempts to send messages to the destination IP address. An analogous process happens at the ultimate destination when a message is delivered to a network layer, but the application targeted to receive the message is no longer on line. The purpose of “destination unreachable” messages is to suppress messages that will never be successfully delivered, to reduce network congestion.
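
The ping exchange described above can be sketched by building the messages themselves. An ICMP echo request has type 8 and an echo reply has type 0, each with an 8-byte header (type, code, checksum, identifier, sequence number) followed by the payload. The function names are our own, and nothing is actually transmitted, since sending raw ICMP normally requires elevated privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def internet_checksum(data):
    """16-bit ones' complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(msg_type, identifier, sequence, payload):
    """Build an ICMP echo message with a valid checksum."""
    header = struct.pack("!BBHHH", msg_type, 0, 0, identifier, sequence)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", msg_type, 0, csum, identifier, sequence) + payload


def make_reply(request):
    """Turn an echo request into the matching reply, copying id, sequence, payload."""
    msg_type, _code, _csum, identifier, sequence = struct.unpack("!BBHHH", request[:8])
    if msg_type != ICMP_ECHO_REQUEST:
        raise ValueError("not an echo request")
    return build_echo(ICMP_ECHO_REPLY, identifier, sequence, request[8:])
```

Because the reply carries the same identifier, sequence number, and payload as the request, the requester can pair each response with the request that caused it and time the round trip.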

Routing

The last cross-layer module we’ll discuss is routing. Routing is a middleware application to maintain the forwarding tables used by the network layer. Each router advertises itself by periodically broadcasting “hello” messages through each of its MAC interfaces. This allows routers to discover the presence or loss of all neighboring routers, letting them construct the one-hop topology of the part of the Internet directly visible through their directly attached media. The routing application in a router then uses a sophisticated gossiping mechanism to exchange this information with its neighbors. Since some of a neighbor’s neighbors are not the router’s own direct neighbors, this allows each router to learn the two-hop topology of the Internet. This process repeats recursively until each router knows the entire topology of the Internet. The cost of using each link is part of the information gossiped. A routing module receiving this information uses all of it to compute a lowest-cost route to each destination. Once this is accomplished, the routing module reconfigures the forwarding table maintained by its network layer module. The routing module updates the forwarding table whenever the Internet topology changes, so each network layer can make optimal forwarding decisions in most situations and at the very worst at least reach any other device that is also connected to the Internet.
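
The lowest-cost route computation is not named in the text; one common choice, used by link-state protocols such as OSPF, is Dijkstra's shortest-path algorithm. The sketch below applies it to a topology given as a dictionary of link costs and produces a forwarding table that maps each destination to a next hop. The function name and the data layout are assumptions made for illustration.

```python
import heapq


def forwarding_table(topology, source):
    """Compute {destination: (next_hop, total_cost)} from a link-cost map.

    topology has the form {node: {neighbor: link_cost}}.
    """
    best = {}
    heap = [(0, source, None)]   # entries are (cost so far, node, first hop used)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue             # already settled with a cheaper route
        best[node] = (first_hop, cost)
        for neighbor, link_cost in topology.get(node, {}).items():
            if neighbor not in best:
                # Leaving the source, the first hop is the neighbor itself;
                # further out, routes inherit the first hop of their parent.
                hop = neighbor if node == source else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbor, hop))
    del best[source]
    return best
```

Only the next hop is stored for each destination, because that is all the network layer needs in order to forward a datagram one step closer to its target.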

There are many different routing protocols, each of which is based on a different gossiping mechanism. The most widely deployed routing protocol between different administrative domains within the Internet is the Border Gateway Protocol (BGP). The most widely deployed routing protocols within wired networks controlled by a single administrative domain are OSPF and RIP. AODV, OLSR, and TBRPF are commonly used in Wi-Fi meshes. Different routing protocols are used in different environments because each one addresses different scaling and administrative issues.

Applications

Applications are the ultimate reason for networking, and the Internet architecture has been shaped by applications’ needs. All communicating applications define their own language in which to express what they need to say. Applications generally use the sockets layer to establish communication channels, which they then use for their own purposes.

It is worth emphasizing that since the network modules have been designed to be a generic communications vehicle, that is, designed to meet the needs of all (or at least most) applications, it is rarely meaningful for the network to attempt to make statements on behalf of the applications. There is widespread confusion on this point around authentication and key management, which are the source of many exploitable security flaws.

URL: https://www.sciencedirect.com/science/article/pii/B9780123743541000078

Networking software

The Internet protocols

In order for various computers to talk to each other via any network, there must be a common language of understanding, a common protocol. In networking, the term protocol refers to a set of rules that govern communications. Protocols are to computers what language is to humans. Since this book is in English, to understand it you must be able to read English. Similarly, for two devices on a network to communicate successfully, they must both understand the same protocols.

Various protocols, belonging to various OSI layers (see next chapter), are used in today’s world of the Internet, and this book would not be complete without listing the most popular ones and describing them briefly.

TCP/IP − Transmission Control Protocol / Internet Protocol

Two of the most popular protocols used in the Internet today. They were introduced in the mid-1970s by Stanford University and Bolt Beranek and Newman (BBN) after funding by DARPA (Defense Advanced Research Projects Agency) and appeared in Berkeley Software Distribution (BSD) Unix.

TCP is reliable; that is, packets are guaranteed to wind up at their target, in the correct order.

IP is the underlying protocol for all the other protocols in the TCP/IP protocol suite. IP defines the means to identify and reach a target computer on the network. Computers in the IP world are identified by unique numbers, which are known as IP addresses (explained further in this chapter).

PPP − Point-to-Point Protocol

A protocol for creating a TCP/IP connection over both synchronous and asynchronous systems. PPP provides connections for host to network or between two routers. It also has a security mechanism. PPP is well known as a protocol for connections over regular telephone lines using modems on both ends. This protocol is widely used for connecting personal computers to the Internet.

SLIP − Serial Line Internet Protocol

A point-to-point protocol to be used over a serial connection, a predecessor of PPP. There is also an advanced version of this protocol known as CSLIP (Compressed Serial Line Internet Protocol), which reduces overhead on a SLIP connection by compressing TCP/IP header information when possible, thus increasing packet throughput.

FTP − File Transfer Protocol

A protocol that enables the transfer of text and binary files over a TCP connection. FTP allows for file transfer according to a strict mechanism of ownership and access restrictions. It is one of the most commonly used protocols over the Internet today.

Telnet

A terminal emulation protocol, defined in RFC 854, for use over a TCP connection. It enables users to log in to remote hosts and use their resources from the local host.

SMTP − Simple Mail Transfer Protocol

A protocol dedicated for sending e-mail messages originating on a local host over a TCP connection to a remote server. SMTP defines a set of rules that allows two programs to send and receive mail over the network. The protocol defines the data structure that would be delivered with information regarding the sender, the recipient (or several recipients), and, of course, the mail’s body.

HTTP − Hypertext Transfer Protocol

A protocol used to transfer hypertext pages across the World Wide Web.

SNMP − Simple Network Management Protocol

A simple protocol that defines messages related to network management. Through the use of SNMP, network devices such as routers can be configured by any host on the LAN.

UDP − User Datagram Protocol

A simple protocol that transfers packets of data to a remote computer. UDP does not guarantee that packets will be received in the same order they were sent. In fact, it does not guarantee delivery at all. UDP is one of the most common protocols used in multicasting.

ARP − Address Resolution Protocol

In order to map an IP address into a hardware MAC address the computer uses the ARP protocol which broadcasts a request message that contains an IP address, to which the target computer replies with both the original IP address and the hardware MAC address.

NNTP − Network News Transfer Protocol

A protocol used to carry USENET postings between news clients and USENET servers.

The OSI seven-layer model of networking

The basics of networking revolve around understanding the so-called seven-layer OSI model.

Proposed by the ISO (International Organization for Standardization) in 1984, the OSI acronym could be read as ISO backwards, but it actually means Open Systems Interconnection reference model.

The OSI model describes how information from a software application in one computer moves through a network medium to a software application in another computer. The OSI model is considered the primary architectural model for inter-computer communications.

The idea behind such a layered model is to simplify the task of moving information between networked computers and make it manageable. Within each layer, one or more entities implement its functionality. Each entity interacts directly only with the layer immediately beneath it, and provides facilities for use by the layer above it. A task, or group of tasks, is then assigned to each of the seven OSI layers. Each layer is reasonably self-contained, so that the tasks assigned to each layer can be implemented independently.

OSI has two major components:

An abstract model of networking (the Basic Reference Model, or seven-layer model)

A set of concrete protocols

Parts of OSI have influenced Internet protocol development, but none more than the abstract model itself, documented in ISO 7498 and its various addenda. In this model, a networking system is divided into layers. Within each layer, one or more entities implement its functionality. Each entity interacts directly only with the layer immediately beneath it, and provides facilities for use by the layer above it. Protocols enable an entity in one host to interact with a corresponding entity at the same layer in a remote host.

The seven layers of the OSI Basic Reference Model are (from top to bottom):

Layer 7 − Application

Layer 6 − Presentation

Layer 5 − Session

Layer 4 − Transport

Layer 3 − Network

Layer 2 − Data link

Layer 1 − Physical

Many prefer to list the seven layers starting from layer one down to layer seven, but it does not really matter, as long as they are remembered as the basic building blocks of the whole networking technology. A handy way to remember the layers is the sentence “All people seem to need data processing” and each first letter of that sentence corresponds to the first letter of the layers starting from layer seven going to layer one.

The seven layers can be grouped into two main groups: upper or host layers and lower or media layers. The upper layers of the OSI model deal with application issues and generally are implemented in software only. The top layer, seven, is the closest to the computer user as it represents the software application passing the information to the user. Basically, both the user and the application layer processes interact with a software application that contains a communication component.

The seven layers model illustrated

As we go down through the layers, we get closer to the physical medium. So, the lower layers of the OSI model are closer to the hardware (although they do not exclude software) and handle the data transport issues. The lowest layer is closest to the physical medium, that is, network cards and network cables, and is responsible for actually placing information on the network medium.

Let us now explain the meaning of each layer.

1. The Physical layer

The Physical layer describes the physical properties of the various communications media, as well as the electrical properties and interpretation of the exchanged signals. For example, this layer defines the size of Ethernet cable, the type of connectors used, and the termination method. The Physical layer is concerned with transmitting raw bits over a communication channel. The design issues have to do with making sure that when one side sends a 1 bit, it is received by the other side as a 1 bit, not as a 0 bit. Typical questions here are how many volts should be used to represent a 1 and how many for a 0, how many microseconds a bit lasts, whether transmission may proceed simultaneously in both directions, how the initial connection is established, how it is torn down when both sides are finished, how many pins the network connector has, and what each pin is used for. The design issues here deal largely with mechanical, electrical, and procedural interfaces and the physical transmission medium, which lies below the Physical layer. Physical layer design can properly be considered to be within the electrical engineer’s domain.

Network Interface Card (NIC)

2. The Data link layer

The Data link layer describes the logical organization of data bits transmitted on a particular medium. This layer defines the framing, addressing, and check-summing of Ethernet packets. The main task of the Data link layer is to transform a raw transmission facility into a line that appears free of transmission errors to the Network layer. It accomplishes this task by having the sender break the input data up into data frames (typically, a few hundred bytes), transmit the frames sequentially, and process the acknowledgment frames sent back by the receiver. Since the Physical layer merely accepts and transmits a stream of bits without any regard to meaning or structure, it is up to the Data link layer to create and recognize frame boundaries. This can be accomplished by attaching special bit patterns to the beginning and end of the frame. If there is a chance that these bit patterns might occur in the data, special care must be taken to avoid confusion. The Data link layer should provide error control between adjacent nodes.
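
One way to take that "special care" is byte stuffing: any payload byte that looks like the frame delimiter is preceded by an escape byte so the receiver does not mistake it for a boundary. The sketch below is a simplified illustration, not the framing of any specific standard; the `FLAG` and `ESC` values and the function names are assumptions chosen for the example.

```python
FLAG = 0x7E   # marks the start and end of a frame (illustrative value)
ESC = 0x7D    # marks the next byte as literal data (illustrative value)


def frame(payload):
    """Wrap a payload in FLAG bytes, escaping any FLAG or ESC inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out.append(ESC)
        out.append(b)
    out.append(FLAG)
    return bytes(out)


def deframe(framed):
    """Recover the payload from one framed message."""
    if len(framed) < 2 or framed[0] != FLAG or framed[-1] != FLAG:
        raise ValueError("missing frame delimiters")
    out = bytearray()
    escaped = False
    for b in framed[1:-1]:
        if escaped:
            out.append(b)       # byte after ESC is always literal data
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends with a dangling escape byte")
    return bytes(out)
```

The cost of this scheme is a slightly longer frame whenever the payload happens to contain the reserved values.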

Another issue that arises in the Data link layer (and most of the higher layers as well) is how to keep a fast transmitter from “drowning” a slow receiver in data. Some traffic regulation mechanism must be employed in order to let the transmitter know how much buffer space the receiver has at the moment. Frequently, flow regulation and error handling are integrated for convenience.

If the line can be used to transmit data in both directions, this introduces a new complication for the Data link layer software. The problem is that the acknowledgment frames for A to B traffic compete for use of the line with data frames for the B to A traffic. A clever solution in the form of piggybacking has been devised.

3. The Network layer

The Network layer describes how a series of exchanges over various data links can deliver data between any two nodes in a network. This layer defines the addressing and routing structure of the Internet. The Network layer is concerned with controlling the operation of the subnet. A key design issue is determining how packets are routed from source to destination. Routes could be based on static tables that are “wired into” the network and rarely changed. They could also be determined at the start of each conversation, for example, a terminal session. Finally, they could be highly dynamic, being newly determined for each packet, to reflect the current network load.

If too many packets are present in the subnet at the same time, they will get in each other’s way, forming bottlenecks. The control of such congestion also belongs to the Network layer.

Since the operators of the subnet may well expect remuneration for their efforts, often some accounting function is built into the Network layer. At the very least, the software must count how many packets or characters or bits are sent by each customer, to produce billing information. When a packet crosses a national border, with different rates on each side, the accounting can become complicated.

When a packet has to travel from one network to another to get to its destination, many problems can arise. The addressing used by the second network may be different from that of the first one; the second one may not accept the packet at all because it is too large; the protocols may differ; and so on. It is up to the Network layer to overcome all these problems to allow the interconnecting of the heterogeneous networks.

In broadcast networks, the routing problem is simple, so the network layer is often thin or even nonexistent.

A Layer 3 network switch

4. The Transport layer

The Transport layer describes the quality and nature of the data delivery. This layer ensures that messages are delivered error-free, in sequence, and with no losses or duplications. This layer defines if and how retransmissions will be used to ensure data delivery. The basic function of the Transport layer is to accept data from the session layer, split it up into smaller units if need be, pass these to the Network layer, and ensure that all the pieces arrive correctly at the other end. Furthermore, all this must be done efficiently and in a way that isolates the Session layer from the inevitable changes in the hardware technology.

Under normal conditions, the Transport layer creates a distinct network connection for each transport connection required by the Session layer. If the transport connection requires a high throughput, however, the Transport layer might create multiple network connections, dividing the data among the network connections to improve throughput. On the other hand, if creating or maintaining a network connection is expensive, the Transport layer might multiplex several transport connections onto the same network connection to reduce the cost. In all cases, the Transport layer is required to make the multiplexing transparent to the Session layer.

The transport layer also determines what type of service to provide to the Session layer, and ultimately, the users of the network. The most popular type of transport connection is an error-free point-to-point channel that delivers messages in the order in which they were sent. However, other kinds of transport service exist, such as the transport of isolated messages with no guarantee about the order of delivery, and the broadcasting of messages to multiple destinations. The type of service is determined when the connection is established.

The Transport layer is a true source-to-destination or end-to-end layer. In other words, a program on the source machine carries on a conversation with a similar program on the destination machine, using the message headers and control messages.

Many hosts are multi-programmed, which implies that multiple connections will be entering and leaving each host. There needs to be some way to tell which message belongs to which connection. The transport header is one place where this information could be added.

In addition to multiplexing several message streams onto one channel, the Transport layer must establish and delete connections across the network. This requires some kind of naming mechanism, so that the process on one machine has a way of describing with whom it wishes to converse. There must also be a mechanism to regulate the flow of information, so that a fast host cannot overrun a slow one. Flow control between hosts is distinct from flow control between switches, although similar principles apply to both.

5. The Session layer

The Session layer allows session establishment between processes running on different stations. This layer describes how request and reply packets are paired in a remote procedure call. The Session layer allows users on different machines to establish sessions between them. A session allows ordinary data transport, as does the transport layer, but it also provides some enhanced services useful in some applications. A session might be used to allow a user to log into a remote time-sharing system or to transfer a file between two machines.

One service provided by the Session layer is to manage dialogue control. Sessions can allow traffic to go in both directions at the same time or in only one direction at a time. If traffic can only go one way at a time, the Session layer can help keep track of whose turn it is.

A related Session service is token management. For some protocols it is essential that both sides do not attempt the same operation at the same time. To manage these activities, the Session layer provides tokens that can be exchanged. Only the side holding the token may perform the critical operation.

Another Session service is synchronization. Consider the problems that might occur when trying to complete a two-hour file transfer between two machines on a network with a one-hour mean time between crashes. After each transfer is aborted, the whole transfer will have to start over again, and will probably fail again with the next network crash. To eliminate this problem, the Session layer provides a way to insert checkpoints into the data stream, so that after a crash, only the data after the last checkpoint has to be repeated.

6. The Presentation layer

The Presentation layer describes the syntax of data being transferred. This layer describes how floating point numbers can be exchanged between hosts with different math formats. The Presentation layer performs certain functions that are requested sufficiently often to warrant finding a general solution for them, rather than letting each user solve the problems. In particular, unlike all the lower layers, which are just interested in moving bits reliably from here to there, the Presentation layer is concerned with the syntax and semantics of the information transmitted.

A typical example of a Presentation service is encoding data in a standard, agreed-upon way. Most user programs do not exchange random binary bit strings; they exchange things such as people’s names, dates, amounts of money, and invoices. These items are represented as character strings, integers, floating point numbers, and data structures composed of several simpler items. Different computers have different codes for representing character strings, integers, and so on. In order to make it possible for computers with different representation to communicate, the data structures to be exchanged can be defined in an abstract way, along with a standard encoding to be used “on the wire.” The Presentation layer handles the job of managing these abstract data structures and converting from the representation used inside the computer to the network standard representation.
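
As an illustration of a standard "on the wire" encoding, the sketch below uses Python's `struct` module with the `!` prefix, which selects network (big-endian) byte order and fixed field sizes regardless of the host machine. The invoice record layout and the function names are invented for this example.

```python
import struct

# Wire layout (an assumption for this sketch): 4-byte unsigned invoice number,
# 8-byte signed amount in cents, 2-byte unsigned name length, then UTF-8 name.
HEADER = "!IqH"
HEADER_SIZE = struct.calcsize(HEADER)


def encode_invoice(invoice_id, amount_cents, customer):
    """Convert the local representation into the agreed wire format."""
    name = customer.encode("utf-8")
    return struct.pack(HEADER, invoice_id, amount_cents, len(name)) + name


def decode_invoice(wire):
    """Convert the wire format back into local Python values."""
    invoice_id, amount_cents, name_len = struct.unpack(HEADER, wire[:HEADER_SIZE])
    name = wire[HEADER_SIZE:HEADER_SIZE + name_len].decode("utf-8")
    return invoice_id, amount_cents, name
```

Both sides agree only on the wire layout; each is free to use whatever internal representation suits its own hardware and language.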

The Presentation layer is also concerned with other aspects of information representation. For example, data compression can be used here to reduce the number of bits that have to be transmitted, and cryptography is frequently required for privacy and authentication.
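A minimal sketch of the compression case, using Python's standard `zlib` module. The helper names `compress_for_wire` and `restore` are hypothetical, and real protocols negotiate whether and how compression is applied.

```python
import zlib


def compress_for_wire(data: bytes) -> bytes:
    """Reduce the number of bytes to transmit (DEFLATE via zlib)."""
    return zlib.compress(data)


def restore(payload: bytes) -> bytes:
    """Recover the original bytes on the receiving side."""
    return zlib.decompress(payload)
```

Highly repetitive data, such as many similar invoice records, shrinks considerably, and `restore(compress_for_wire(data))` returns the original bytes unchanged.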

7. The Application layer

The Application layer describes how real work actually gets done. This layer would implement file system operations. The Application layer contains a variety of protocols that are commonly needed. For example, there are hundreds of incompatible terminal types in the world. Consider the plight of a full-screen editor that is supposed to work over a network with many different terminal types, each with different screen layouts, escape sequences for inserting and deleting text, moving the cursor, and so on. One way to solve this problem is to define an abstract network virtual terminal for which editors and other programs can be written. To handle each terminal type, a piece of software must be written to map the functions of the network virtual terminal onto the real terminal. For example, when the editor moves the virtual terminal’s cursor to the upper left-hand corner of the screen, this software must issue the proper command sequence to the real terminal to get its cursor there too. All the virtual terminal software is in the Application layer.
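The mapping from virtual terminal functions to real terminal commands can be sketched as a lookup table per terminal type. This is an illustrative assumption about how such software might be organized: the operation names `HOME` and `CLEAR`, the `DRIVERS` table, and the `render` function are hypothetical, while the escape sequences shown are the usual VT100 and VT52 ones.

```python
# Abstract operations of a hypothetical network virtual terminal.
HOME = "home"     # move the cursor to the upper left-hand corner
CLEAR = "clear"   # erase the whole screen

# One small driver per real terminal type: each abstract operation is
# mapped to the command sequence that terminal understands.
DRIVERS = {
    "vt100": {HOME: "\x1b[H", CLEAR: "\x1b[H\x1b[2J"},
    "vt52": {HOME: "\x1bH", CLEAR: "\x1bH\x1bJ"},
}


def render(terminal_type, operations):
    """Translate virtual terminal operations for one real terminal type."""
    driver = DRIVERS[terminal_type]
    return "".join(driver[op] for op in operations)
```

The editor only ever issues `HOME` or `CLEAR`; supporting a new terminal type means adding one entry to the table rather than changing the editor.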

Another Application layer function is file transfer. Different file systems have different file naming conventions, different ways of representing text lines, and so on. Transferring a file between two different systems requires handling these and other incompatibilities. This work, too, belongs to the Application layer, as do e-mail, remote job entry, directory lookup, and various other general-purpose and special-purpose facilities.

Read full chapter
URL: https://www.sciencedirect.com/science/article/pii/B9780124045576500112

Chapter

Networks

Internet Protocol

IP is a protocol within the Internet layer of the TCP/IP model or the Network layer of the OSI model, which defines addressing and how individual messages are routed to their intended destination. IP addresses in IPv4 (the prevailing numbering system) follow a format of xxx.xxx.xxx.xxx, where each decimal value (0–255) translates into 8 binary bits called an octet. For example, 10.5.0.1 translates into 00001010.00000101.00000000.00000001. You’ll typically deal with IP addresses in the decimal format, but knowing the binary translation becomes important when dealing with subnetting, which we’ll discuss shortly.
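The decimal-to-binary translation can be checked with a few lines of Python. This is a minimal sketch; the function name `to_binary` is hypothetical.

```python
def to_binary(address):
    """Convert a dotted-decimal IPv4 address to dotted binary octets."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("expected four decimal values in the range 0-255")
    # Each octet is written as exactly 8 binary digits, zero-padded.
    return ".".join(format(octet, "08b") for octet in octets)


print(to_binary("10.5.0.1"))  # 00001010.00000101.00000000.00000001
```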

One important thing to note about IP addresses is that every machine on a TCP/IP network will have one or more IP addresses assigned to it. Unlike MAC addresses, where only one address can be associated with a device, multiple IP addresses can be assigned to a single device because they are logical rather than physical addresses. This relies on the ARP protocol in the Link layer to translate between IP addresses and MAC addresses appropriately.

IPv6 is a newer addressing scheme and was created to address the shortage of IP addresses under the IPv4 scheme. IPv6 uses a 128-bit address, compared to IPv4's 32-bit addressing scheme. This change increases the number of available addresses tremendously (from 2^32 to 2^128) and changes quite a few things about how IP works. Due to these changes, IPv4 and IPv6 are not interoperable, which has slowed the transition between the two versions of the protocol. Most enterprise applications as of the time of this writing work on networks that are still using the IPv4 protocol, so most of our focus will be on how IPv4 works rather than IPv6.
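To get a feel for the difference in scale, the size of each address space can be computed with Python's standard `ipaddress` module. This is a small illustrative sketch; the variable names are arbitrary.

```python
import ipaddress

# The /0 network of each family covers its entire address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                # 4294967296, i.e. 2**32
print(ipv6_total == 2 ** 128)    # True
print(ipv6_total // ipv4_total)  # 2**96 IPv6 addresses per IPv4 address
```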

Read full chapter
URL: https://www.sciencedirect.com/science/article/pii/B9780124077737000028