File Transfer Protocol (FTP)

What is File Transfer Protocol 

File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client-server model architecture using separate control and data connections between the client and the server. FTP users may authenticate themselves with a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that protects the username and password, and encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with SSH File Transfer Protocol (SFTP).

Communication and data transfer

Figure: Illustration of starting a passive connection using port 21

FTP may run in active or passive mode, which determines how the data connection is established. (This sense of “mode” is different from that of the MODE command in FTP.)

  • In active mode, the client starts listening for incoming data connections from the server on port M. It sends the FTP command PORT M to inform the server on which port it is listening. The server then initiates a data channel to the client from its port 20, the FTP server data port.
  • In situations where the client is behind a firewall and unable to accept incoming TCP connections, passive mode may be used. In this mode, the client uses the control connection to send a PASV command to the server and then receives a server IP address and server port number from the server, which the client then uses to open a data connection from an arbitrary client port to the server IP address and server port number received.
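
The two handshakes above can be sketched in code. On the wire, RFC 959 does not send the port as a single number: the PORT argument and the address in the 227 reply to PASV are six comma-separated decimal fields, four for the IPv4 address and two for the high and low bytes of the port. The following is a minimal sketch under that assumption; the function names are hypothetical and not part of any FTP library.

```python
import re


def format_port_argument(ip: str, port: int) -> str:
    """Build the argument of a PORT command: h1,h2,h3,h4,p1,p2."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    p1, p2 = divmod(port, 256)  # high byte, low byte
    return ",".join([str(int(o)) for o in octets] + [str(p1), str(p2)])


# Six decimal fields inside parentheses, as found in a 227 reply.
_PASV_FIELDS = re.compile(r"\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)")


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (host, port) from a reply such as
    '227 Entering Passive Mode (192,168,1,2,195,80)'."""
    if not reply.startswith("227"):
        raise ValueError(f"not a 227 reply: {reply!r}")
    match = _PASV_FIELDS.search(reply)
    if match is None:
        raise ValueError(f"no address found in reply: {reply!r}")
    fields = [int(group) for group in match.groups()]
    if any(value > 255 for value in fields):
        raise ValueError(f"field out of range in reply: {reply!r}")
    host = ".".join(str(value) for value in fields[:4])
    port = fields[4] * 256 + fields[5]
    return host, port
```

A client in active mode would send "PORT " followed by the formatted argument; a client in passive mode would open its data connection to the host and port returned by the parser.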

Both modes were updated in September 1998 to support IPv6. Further changes were introduced to the passive mode at that time, updating it to the extended passive mode.
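
The extended commands can be illustrated in the same way. This sketch assumes the formats defined in RFC 2428: EPRT takes a delimited argument of the form |protocol|address|port| (protocol 1 for IPv4, 2 for IPv6), and the 229 reply to EPSV carries only a port, as in "(|||6446|)". The function names are hypothetical.

```python
import ipaddress
import re


def format_eprt_argument(ip: str, port: int) -> str:
    """Build the argument of an EPRT command for an IPv4 or IPv6 address."""
    address = ipaddress.ip_address(ip)  # raises ValueError on bad input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    protocol = 1 if address.version == 4 else 2
    return f"|{protocol}|{address}|{port}|"


# "(<d><d><d>port<d>)" where <d> is one repeated delimiter character.
_EPSV_FIELDS = re.compile(r"\((.)\1\1([0-9]{1,5})\1\)")


def parse_epsv_reply(reply: str) -> int:
    """Extract the port from a reply such as
    '229 Entering Extended Passive Mode (|||6446|)'."""
    if not reply.startswith("229"):
        raise ValueError(f"not a 229 reply: {reply!r}")
    match = _EPSV_FIELDS.search(reply)
    if match is None:
        raise ValueError(f"no port found in reply: {reply!r}")
    port = int(match.group(2))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

Because the 229 reply contains no address, the client connects to the same server address it already uses for the control connection.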

The server responds over the control connection with three-digit status codes in ASCII with an optional text message. For example, “200” (or “200 OK”) means that the last command was successful. The numbers represent the code for the response and the optional text represents a human-readable explanation or request (e.g. <Need account for storing file>). An ongoing transfer of file data over the data connection can be aborted using an interrupt message sent over the control connection.
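
As an illustration of the reply format, here is a minimal sketch that splits a single-line reply into its code and text and classifies it by its first digit, following the reply groups in RFC 959. It deliberately ignores multi-line replies, and the names Reply, parse_reply_line and reply_category are hypothetical.

```python
import re
from typing import NamedTuple


class Reply(NamedTuple):
    code: int
    text: str


# Three ASCII digits, optionally followed by a space or hyphen and free text.
_REPLY_LINE = re.compile(r"([0-9]{3})(?:[ -](.*))?")

# Meaning of the first digit of a reply code (RFC 959, section 4.2).
_CATEGORIES = {
    1: "positive preliminary",
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def parse_reply_line(line: str) -> Reply:
    """Split one control-connection reply line into (code, text)."""
    match = _REPLY_LINE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed reply line: {line!r}")
    return Reply(int(match.group(1)), match.group(2) or "")


def reply_category(reply: Reply) -> str:
    """Classify a reply by the first digit of its code."""
    return _CATEGORIES.get(reply.code // 100, "unknown")
```

For example, parsing "200 OK\r\n" yields code 200 with the text "OK", which falls in the "positive completion" group.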

FTP needs two ports (one for sending and one for receiving) because it was originally designed to operate on Network Control Program (NCP), a simplex protocol that used two port addresses, establishing two connections, for two-way communication. An odd and an even port were reserved for each application-layer application or protocol. The standardization of TCP and UDP reduced the need for two simplex ports per application to a single duplex port, but FTP was never altered to use only one port and continued to use two for backward compatibility.

NAT and firewall traversal

FTP normally transfers data by having the server connect back to the client after the PORT command is sent by the client. This is problematic for both NATs and firewalls, which do not allow connections from the Internet to internal hosts. For NATs, an additional complication is that the representation of the IP addresses and port number in the PORT command refers to the internal host’s IP address and port, rather than the public IP address and port of the NAT.

There are two approaches to solving this problem. One is that the FTP client and FTP server use the PASV command, which causes the data connection to be established from the FTP client to the server. This is widely used by modern FTP clients. Another approach is for the NAT to alter the values of the PORT command, using an application-level gateway for this purpose.
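
The second approach can be sketched as a function that an application-level gateway might apply to each control-connection line. It assumes the RFC 959 PORT argument format (h1,h2,h3,h4,p1,p2) and that the NAT has already chosen a public address and port for the mapping; the function name and parameters are hypothetical. A real gateway must also adjust TCP sequence numbers when the rewritten line changes length, which is omitted here.

```python
def rewrite_port_command(line: str, public_ip: str, public_port: int) -> str:
    """Replace the internal address in a PORT command with the NAT's
    public address and port. Other commands pass through unchanged."""
    verb, _, argument = line.rstrip("\r\n").partition(" ")
    if verb.upper() != "PORT":
        return line
    if len(argument.split(",")) != 6:
        raise ValueError(f"malformed PORT argument: {argument!r}")
    if not 0 < public_port < 65536:
        raise ValueError(f"port out of range: {public_port}")
    high, low = divmod(public_port, 256)  # port is sent as two bytes
    fields = public_ip.split(".") + [str(high), str(low)]
    return "PORT " + ",".join(fields) + "\r\n"
```

With an internal client at 10.0.0.5 and a public mapping of 203.0.113.7:40000, the line "PORT 10,0,0,5,195,80" would leave the gateway as "PORT 203,0,113,7,156,64".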

File structures

File organization is specified using the STRU command. The following file structures are defined in section 3.1.1 of RFC 959:

  • F or FILE structure (stream-oriented). Files are viewed as an arbitrary sequence of bytes, characters, or words. This is the usual file structure on Unix systems and other systems such as CP/M, MS-DOS, and Microsoft Windows. (Section 3.1.1.1)
  • R or RECORD structure (record-oriented). Files are viewed as divided into records, which may be fixed or variable length. This file organization is common on mainframe and midrange systems, such as MVS, VM/CMS, OS/400, and VMS, which support record-oriented filesystems.
  • P or PAGE structure (page-oriented). Files are divided into pages, which may either contain data or metadata; each page may also have a header giving various attributes. This file structure was specifically designed for TENEX systems and is generally not supported on other platforms. RFC 1123 section 4.1.2.3 recommends that this structure not be implemented.

Most contemporary FTP clients and servers only support STRU F. STRU R is still in use in mainframe and minicomputer file transfer applications.

Data transfer modes

Data transfer can be done in any of three modes:

  • Stream mode (MODE S): Data is sent as a continuous stream, relieving FTP from doing any processing. Rather, all processing is left up to TCP. No end-of-file indicator is needed unless the data is divided into records.
  • Block mode (MODE B): Designed primarily for transferring record-oriented files (STRU R), although it can also be used to transfer stream-oriented (STRU F) text files. FTP divides the data into blocks, each consisting of a header (a descriptor code and a byte count) followed by a data field, and then passes them on to TCP.
  • Compressed mode (MODE C): Extends MODE B with data compression using run-length encoding.
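
The block framing used by MODE B can be sketched as follows. This assumes the layout given in RFC 959: a three-byte header made of a one-byte descriptor code and a two-byte byte count in network byte order, followed by the data. The function names are hypothetical, and the run-length encoding of MODE C is not shown.

```python
import struct
from typing import Iterator, Tuple

# Descriptor code bits from RFC 959, section 3.4.2.
DESC_EOR = 0x80      # end of data block is end of record
DESC_EOF = 0x40      # end of data block is end of file
DESC_ERRORS = 0x20   # suspected errors in data block
DESC_RESTART = 0x10  # data block is a restart marker

_HEADER = struct.Struct("!BH")  # descriptor (1 byte), byte count (2 bytes)


def encode_block(data: bytes, descriptor: int = 0) -> bytes:
    """Prefix data with a block-mode header."""
    if len(data) > 0xFFFF:
        raise ValueError("a block can carry at most 65535 bytes")
    return _HEADER.pack(descriptor, len(data)) + data


def decode_blocks(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (descriptor, data) for each block in a byte string."""
    offset = 0
    while offset < len(stream):
        if offset + _HEADER.size > len(stream):
            raise ValueError("truncated block header")
        descriptor, count = _HEADER.unpack_from(stream, offset)
        offset += _HEADER.size
        data = stream[offset:offset + count]
        if len(data) != count:
            raise ValueError("truncated block data")
        offset += count
        yield descriptor, data
```

A sender transferring a record-oriented file might mark the last block of each record with DESC_EOR and finish with a block carrying DESC_EOF.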

Most contemporary FTP clients and servers do not implement MODE B or MODE C; FTP clients and servers for mainframe and minicomputer operating systems are the exceptions to that.

Some FTP software also implements a DEFLATE-based compressed mode, sometimes called “Mode Z” after the command that enables it. This mode was described in an Internet-Draft, but not standardized.

GridFTP defines additional modes, MODE E and MODE X, as extensions of MODE B.