Linux I/O port programming mini-HOWTO

        Author: rjs@spider.compart.fi (Riku Saikkonen)
        Last modified: Aug 26 1996

This document is Copyright 1995-1996 Riku Saikkonen. See the normal Linux
HOWTO COPYRIGHT for details.


This HOWTO document describes programming hardware I/O ports and waiting for
small periods of time in user-mode Linux programs running on an Intel x86
processor. This document is a descendant of the very small IO-Port
mini-HOWTO by the same author.

If you have corrections or something to add, feel free to e-mail me
(rjs@spider.compart.fi)...

Changes from the previous version (Dec 26 1995):
Added information on game port programming, non-C languages, nanosleep(),
and Pentium timing. Added example code. Lots of minor corrections.


        I/O ports in C programs, the normal way

Routines for accessing I/O ports are in /usr/include/asm/io.h (or
linux/include/asm-i386/io.h in the kernel source distribution). The routines
there are inline macros, so it is enough to #include <asm/io.h>; you do not
need any additional libraries.

Because of a limitation in gcc (present at least in 2.7.2 and below), you
_have to_ compile any source code that uses these routines with optimisation
turned on (gcc -O1 or higher), or alternatively #define extern to be
empty before #including asm/io.h.

For debugging, you can use "gcc -g -O" (at least with modern versions of
gcc), though optimised code can make the debugger behave a bit strangely. If
this bothers you, put the routines that use I/O port access in a separate
source file and compile only that with optimisation turned on.

Before you access any ports, you must give your program permission to do
that. This is done by calling the ioperm(2) function (declared in unistd.h,
and defined in the kernel) somewhere near the start of your program (before
any I/O port accesses). The syntax is ioperm(from,num,turn_on), where from
is the first port number to give access to, and num the number of
consecutive ports to give access to. For example, ioperm(0x300,5,1); would
give access to ports 0x300 through 0x304 (a total of 5 ports). The last
argument is a Boolean value specifying whether to give access to the program
to the ports (true (1)) or to remove access (false (0)). You may call
ioperm() multiple times to enable multiple non-consecutive ports. See the
ioperm(2) manual page for details on the syntax.

The ioperm() call requires your program to have root privileges; thus you
need to either run it as the root user, or make it setuid root. You can drop
the root privileges after you have called ioperm() to enable the ports you
want to use. You are not required to explicitly drop your port access
privileges with ioperm(...,0); at the end of your program, it is done
automatically as the program exits.

Ioperm() privileges are transferred across fork()s and exec()s, and across
a setuid to a non-root user.

Ioperm() can only give access to ports 0x000 through 0x3ff; for higher
ports, you need to use iopl(2) (which gives you access to all ports at
once). Use the level argument 3 (i.e. "iopl(3);") to give your program
access to all I/O ports. Again, you need root privileges to call iopl().

Then, to actually access the ports: to input a byte from a port, call
inb(port); it returns the byte it read. To output a byte, call outb(value,
port); (notice the order of the parameters). To input a word from ports x
and x+1 (one byte from each to form the word, just like the assembler
instruction INW), call inw(x);. To output a word to the two ports, call
outw(value,x);.

The inb_p(), outb_p(), inw_p(), and outw_p() macros work otherwise
identically to the ones above, but they do a short (about one microsecond)
delay after the port access; you can make the delay four microseconds by
#defining REALLY_SLOW_IO before including asm/io.h. These macros normally
(unless you #define SLOW_IO_BY_JUMPING, which probably isn't accurate) use a
port output to port 0x80 for their delay, so you need to give access to
port 0x80 with ioperm() first (outputs to port 0x80 should not affect any
part of the system). For more versatile methods of delaying, read on.

There are man pages for these macros in reasonably recent releases of the
Linux man-pages distribution.


Troubleshooting:

Q1. I get segmentation faults when accessing ports.

A1. Either your program does not have root privileges, or the ioperm() call
    failed for some other reason. Check the return value of ioperm().

Q2. I can't find the in*(), out*() functions defined anywhere, gcc complains
    about undefined references.

A2. You did not compile with optimisation turned on (-O), and thus gcc could
    not resolve the macros in asm/io.h. Or you did not #include <asm/io.h>
    at all.

Q3. out*() doesn't do anything, or does something weird.

A3. Check the order of the parameters; it should be outb(value,port), not
    outportb(port,value) as is common in MS-DOS.


        An alternate method for I/O port access

Another way is to open() /dev/port (a character device, major number 1,
minor 4) for reading and/or writing (the stdio f*() functions have internal
buffering, so avoid them). Then lseek() to the appropriate byte in the file
(file position 0 = port 0, file position 1 = port 1, and so on), and read()
or write() a byte or word from or to it.

Of course, for this your program needs read/write access to /dev/port. This
method is probably slower than the normal method above, but does not need
optimisation or ioperm().


        Interrupts (IRQs) and DMA access

You cannot use IRQs or DMA directly from a user-mode program. You need to
make a kernel driver; see the Linux Kernel Hacker's Guide for details and
the kernel source code for examples.

You also cannot disable interrupts from within a user-mode program.


        High-resolution timing: Delays

First of all, I should say that you cannot guarantee that user-mode programs
have exact control of timing, because of the multi-tasking, pre-emptive
nature of Linux. Your process might be scheduled out at any time for
anything from about 20 milliseconds to a few seconds (on a system with very
high load). However, for most applications using I/O ports, this does not
really matter. To minimise this, you may want to use nice to give your
process a higher priority (a negative nice value, which requires root
privileges).

There are plans to include real-time support in the Linux kernel; see
http://luz.cs.nmt.edu/~rtlinux for more information on this.

Now, let me start with the easier timing calls. For delays of multiple
seconds, your best bet is probably to use sleep(3). For delays of at least
tens of milliseconds (about 20 ms seems to be the minimum delay), usleep(3)
should work. These functions give the CPU to other processes, so CPU time
isn't wasted. See the manual pages for details.

For delays of under about 20 milliseconds (probably depending on the speed
of your processor and machine, and the system load), giving up the CPU
doesn't work because the Linux scheduler usually takes at least about 20
milliseconds before it returns control to your process. Due to this, in
small delays, usleep(3) usually delays somewhat more than the amount that
you specify in the parameters, and at least 20 ms.

For short delays (tens of microseconds to 20 ms or so), the easiest method is to
use udelay(), defined in /usr/include/asm/delay.h (linux/include/asm-i386/
delay.h). Udelay() takes the number of microseconds to delay (an unsigned
long) as its sole parameter, and returns nothing. It may take up to a few
microseconds more time than the parameter specifies because of the overhead
in the calculation of how long to wait (see delay.h for details).

To use udelay() outside of the kernel, you need to have the unsigned long
variable loops_per_sec defined with the correct value. As far as I know, the
only way to get this value from the kernel is to read /proc/cpuinfo for the
BogoMips value and multiply that by 500000 to get (an imprecise)
loops_per_sec.

In the 2.0.x series of Linux kernels, there is a new system call,
nanosleep(2), that should allow you to sleep or delay for short times. It
currently appears to use udelay() for delays <= 2 ms and sleep otherwise, so
its resolution is probably only a couple of microseconds. I suspect that
this will be improved in later kernels.

I cannot find documentation for nanosleep(2), but from
/usr/src/linux/kernel/sched.c it seems that the syntax is
   int nanosleep(struct timespec *rqtp, struct timespec *rmtp);
where rqtp is the time you want to wait, and if rmtp is non-NULL and
nanosleep() is aborted (setting errno to EINTR and returning -1), the
system call writes the time left to sleep in rmtp. The return value is 0 if
the delay was successful, and -1 if it failed (in which case errno will be
set appropriately).

For even shorter delays, there are a few methods. Outputting any byte to
port 0x80 (see above for how to do it) should wait for almost exactly 1
microsecond independent of your processor type and speed. You can do this
multiple times to wait a few microseconds. The port output should have no
harmful side effects on any standard machine (and some kernel drivers use
it). This is how {in|out}[bw]_p() normally do the delay (see asm/io.h).

If you know the processor type and clock speed of the machine the program
will be running on, you can hard-code shorter delays by running certain
assembler instructions (but remember, your process might be scheduled out at
any time, so the delays might well be longer every now and then). In the
table below, the internal processor clock speed determines how long each
clock cycle takes; e.g. for a 50 MHz processor (e.g. 486DX-50 or
486DX2-50), one clock cycle takes 1/50000000 seconds, i.e. 20 nanoseconds.

Instruction   i386 clock cycles   i486 clock cycles
nop                   3                   1
xchg %ax,%ax          3                   3
or %ax,%ax            2                   1
mov %ax,%ax           2                   1
add $0,%ax            2                   1
[source: Borland Turbo Assembler 3.0 Quick Reference]
(sorry, I don't know about Pentiums; probably close to the i486)
(I cannot find an instruction which would use one clock cycle on an i386)

The instructions nop and xchg in the table should have no side effects. The
rest may modify the flags register, but this shouldn't matter, since gcc on
x86 assumes that any asm() statement may change the flags.

To use these, call asm("instruction"); in your program. Write the
instructions in the syntax used in the table above; to put multiple
instructions in one asm(), separate them with semicolons:
asm("instruction ; instruction ; instruction");. The asm() is translated
into inline assembler code by gcc, so there is no function call overhead.

For Pentiums, you can get the number of clock cycles elapsed since the last
reboot with the following C code:
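   static __inline__ unsigned long long int rdtsc(void)
   {
     unsigned int lo, hi;
     /* 0x0f 0x31 is RDTSC; it returns the 64-bit count in EDX:EAX */
     __asm__ volatile (".byte 0x0f, 0x31" : "=a" (lo), "=d" (hi));
     return ((unsigned long long int) hi << 32) | lo;
   }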

Shorter delays than one clock cycle are impossible in the Intel x86
architecture.


        High-resolution timing: Measuring time

For times accurate to one second, it is probably easiest to use time(2). For
more accurate times, gettimeofday(2) is accurate to about a microsecond (but
see above about scheduling). For Pentiums, the code fragment above is
accurate to one clock cycle.

If you want your process to get a signal after some amount of time, use
setitimer(2). See the manual pages of the functions for details.


        Other programming languages

The description above concentrates on the C programming language. It should
apply directly to C++ and Objective C. In assembler, you have to call
ioperm() or iopl() as in C, but after that you can use the I/O port
read/write instructions directly.

In other languages, unless you can insert inline assembler or C code into
the program, it is probably easiest to write a simple C source file with
functions for the I/O port access you need, and compile and link it in with
the rest of your program.


        Some useful ports

Here is some programming information for common ports that can be directly
used for general-purpose TTL logic I/O.


The parallel port (BASE = 0x3bc for /dev/lp0, 0x378 for /dev/lp1, and 0x278
for /dev/lp2): [source: IBM PS/2 model 50/60 Technical Reference, and some
experimentation] (I do not know how the ECP/EPP mode in newer parallel
ports works, but they should also support the ports described below)
(if you only want to control something that acts like a normal printer, see
the Printing-HOWTO)

In addition to the standard output-only mode, there is an `extended'
bidirectional mode in most parallel ports. This mode has a direction bit
that can be set to either read or write mode. However, I don't know how the
extended mode can be turned on (it should be off by default)... Please
e-mail me if you know.

Port BASE+0 (Data port) controls the data signals of the port (D0 to D7 for
bits 0 to 7, respectively; states: 0 = low (0 V), 1 = high (5 V)). A write
to this port latches the data on the pins. A read returns the data last
written in standard or extended write mode, or the data in the pins from
another device in extended read mode.

Port BASE+1 (Status port) is read-only, and returns the state of the
following input signals:
Bits 0 and 1 are reserved.
Bit 2 IRQ status (not a pin, I don't know how this works)
Bit 3 ERROR (1=high)
Bit 4 SLCT (1=high)
Bit 5 PE (1=high)
Bit 6 ACK (1=high)
Bit 7 -BUSY (0=high)
(I'm not sure about the high and low states.)

Port BASE+2 (Control port) is write-only (a read returns the data last
written), and controls the following output signals:
Bit 0 -STROBE (0=high)
Bit 1 AUTO_FD_XT (1=high)
Bit 2 -INIT (0=high)
Bit 3 SLCT_IN (1=high)
Bit 4 enables the parallel port IRQ (which occurs on the low-to-high
transition of ACK) when set to 1.
Bit 5 controls the extended mode direction (0 = write, 1 = read), and is
completely write-only (a read returns nothing useful for this bit).
Bits 6 and 7 are reserved.
(Again, I am not sure about the high and low states.)

Pinout (a 25-pin female D-shell connector on the port) (i=input, o=output):
1io -STROBE, 2io D0, 3io D1, 4io D2, 5io D3, 6io D4, 7io D5, 8io D6, 9io D7,
10i ACK, 11i -BUSY, 12i PE, 13i SLCT, 14o AUTO_FD_XT, 15i ERROR, 16o -INIT,
17o SLCT_IN, 18-25 Ground

The IBM specifications say that pins 1, 14, 16, and 17 (the control outputs)
have open collector drivers pulled to 5 V through 4.7 kiloohm resistors
(sink 20 mA, source 0.55 mA, high-level output 5.0 V minus pullup). The rest
of the pins sink 24 mA, source 15 mA, and their high-level output is min.
2.4 V. The low state for both is max. 0.5 V. Non-IBM parallel ports probably
deviate from this standard.

Finally, a warning: Be careful with grounding. I've broken several parallel
ports by connecting to them while the computer is turned on. It might be a
good thing to use a parallel port not integrated on the motherboard for
things like this.


The game (joystick) port (ports 0x200-0x207): (for controlling normal
joysticks, there is a kernel-level joystick driver, see
ftp://sunsite.unc.edu/pub/Linux/kernel/patches/joystick-*)

Pinout (a 15-pin female D-shell connector on the port):
1,8,9,15: +5 V (power)
4,5,12: Ground
2,7,10,14: Digital inputs BA1, BA2, BB1, and BB2, respectively
3,6,11,13: Analog inputs AX, AY, BX, and BY, respectively

The +5 V pins seem to be connected directly to the power lines in the
motherboard, so they should be able to supply quite a lot of current,
depending on the motherboard and power supply.

The digital inputs are used for the buttons of the two joysticks (joystick A
and joystick B, with two buttons each) that you can connect to the port.
They should be normal TTL-level inputs, and you can read their status
directly from the status port (see below). A real joystick returns a low (0
V) status when the button is pressed and a high (the 5 V from the power pins
through a 1 Kohm resistor) status otherwise.

The so-called analog inputs actually measure resistance. The game port has a
quad one-shot multivibrator (a 558 chip) connected to the four inputs. In
each input, there is a 2.2 Kohm resistor between the input pin and the
multivibrator output, and a 0.01 uF timing capacitor between the
multivibrator output and the ground. A real joystick has a potentiometer for
each axis (X and Y), wired between +5 V and the appropriate input pin (AX or
AY for joystick A, or BX or BY for joystick B).

The multivibrator, when activated, sets its output lines high (5 V) and
waits for each timing capacitor to reach 3.3 V before lowering the
respective output line. Thus the high period duration of the multivibrator
is proportional to the resistance of the potentiometer in the joystick (i.e.
the position of the joystick in the appropriate axis), as follows:
   R = (t - 24.2) / 0.011,
where R is the resistance (ohms) of the potentiometer and t the high period
duration (microseconds).
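The constants in this formula follow from the component values given earlier. Charging the 0.01 uF capacitor through the potentiometer plus the 2.2 Kohm resistor from 0 V to 3.3 V (two thirds of 5 V) takes RC ln 3, or about 1.1 RC, which gives the following (t in microseconds, with a worked value for a 100 Kohm potentiometer):

```latex
\begin{aligned}
t &= 1.1\,C\,(R + 2200\,\Omega), \qquad C = 0.01\,\mu\mathrm{F} \\
  &= 24.2\,\mu\mathrm{s} + 0.011\,\mu\mathrm{s}/\Omega \cdot R \\
R &= \frac{t - 24.2}{0.011}\,\Omega \qquad (t \text{ in } \mu\mathrm{s}) \\
t = 1124.2\,\mu\mathrm{s} \;&\Rightarrow\; R = \frac{1100}{0.011}\,\Omega = 100\,000\,\Omega
\end{aligned}
```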

Thus, to read the analog inputs, you first activate the multivibrator (with
a port write; see below), then poll the state of the four axes (with
repeated port reads) until they drop from high to low state, measuring their
high period duration. This polling uses quite a lot of CPU time, and on a
non-realtime multitasking system like Linux, the result is not very accurate
because you cannot poll the port constantly (unless you use a kernel-level
driver and disable interrupts while polling, but this wastes even more CPU
time).

The only I/O port you need to access is port 0x201 (the other ports either
behave identically or do nothing). Any write to this port (it doesn't matter
what you write) activates the multivibrator. A read from this port returns
the state of the input signals:
Bit 0: AX (status (1=high) of the multivibrator output)
Bit 1: AY (status (1=high) of the multivibrator output)
Bit 2: BX (status (1=high) of the multivibrator output)
Bit 3: BY (status (1=high) of the multivibrator output)
Bit 4: BA1 (digital input, 1=high)
Bit 5: BA2 (digital input, 1=high)
Bit 6: BB1 (digital input, 1=high)
Bit 7: BB2 (digital input, 1=high)


If you want good analog I/O, you can wire up ADC and/or DAC chips to the
parallel port (hint: for power, use the game port connector or a spare disk
drive power connector wired to outside the computer case, unless you have a
low-power device and can use the parallel port itself for power), or buy an
AD/DA card (most are controlled by I/O ports). Or, if you're satisfied with
1 or 2 channels, inaccuracy, and (probably) bad zeroing, a cheap sound card
supported by the Linux sound driver should do (and it's pretty fast).


        Software example

Here's a piece of simple example code for I/O port access:

/*
 * example.c: very simple example of port I/O
 *
 * This code does nothing useful, just a port write, a pause,
 * and a port read. Compile with `gcc -O2 -o example example.c'.
 */

#include <stdio.h>
#include <stdlib.h> /* exit() */
#include <unistd.h>
#include <asm/io.h>

#define BASEPORT 0x378 /* lp1 */

int main(void)
{
  /* Get access to the ports */
  if (ioperm(BASEPORT,3,1)) {perror("ioperm");exit(1);}
  
  /* Set the data signals (D0-7) of the port to all low (0) */
  outb(0,BASEPORT);
  
  /* Sleep for a while (100 ms) */
  usleep(100000);
  
  /* Read from the status port (BASE+1) and display the result */
  printf("status: %d\n",inb(BASEPORT+1));

  /* We don't need the ports anymore */
  if (ioperm(BASEPORT,3,0)) {perror("ioperm");exit(1);}

  exit(0);
}

/* end of example.c */


        Credits

Too many people have contributed for me to list, but thanks a lot, everyone.
I have not replied to all the contributions that I've received; sorry for
that, and thanks again for the help.


End of the Linux I/O port programming mini-HOWTO.