<HTML>
<HEAD>
<TITLE>State Threads for Internet Applications</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
<H2>State Threads for Internet Applications</H2>

<H3>Introduction</H3>
<P>
State Threads is an application library which provides a
foundation for writing fast and highly scalable Internet Applications
on UNIX-like platforms. It combines the simplicity of the multithreaded
programming paradigm, in which one thread supports each simultaneous
connection, with the performance and scalability of an event-driven
state machine architecture.</P>
<H3>1. Definitions</H3>
<P>
<A NAME="IA">
<H4>1.1 Internet Applications</H4>
</A>
<P>
An <I>Internet Application</I> (IA) is either a server or client network
application that accepts connections from clients and may or may not
connect to servers. In an IA the arrival or departure of network data
often controls processing (that is, an IA is a <I>data-driven</I> application).
For each connection, an IA does some finite amount of work
involving data exchange with its peer, where its peer may be either
a client or a server.
The typical transaction steps of an IA are to accept a connection,
read a request, do some finite and predictable amount of work to
process the request, then write a response to the peer that sent the
request. One example of an IA is a Web server;
the most general example of an IA is a proxy server, because it both
accepts connections from clients and connects to other servers.</P>
<P>
We assume that the performance of an IA is constrained by available CPU
cycles rather than network bandwidth or disk I/O (that is, CPU
is a bottleneck resource).
<P>
<A NAME="PS">
<H4>1.2 Performance and Scalability</H4>
</A>
<P>
The <I>performance</I> of an IA is usually evaluated as its
throughput measured in transactions per second or bytes per second (one
can be converted to the other, given the average transaction size). There are
several benchmarks that can be used to measure throughput of Web serving
applications for specific workloads (such as
<A HREF="http://www.spec.org/osg/web96/">SPECweb96</A>,
<A HREF="http://www.mindcraft.com/webstone/">WebStone</A>,
<A HREF="http://www.zdnet.com/zdbop/webbench/">WebBench</A>).
Although there is no common definition for <I>scalability</I>, in general it
expresses the ability of an application to sustain its performance when some
external condition changes. For IAs this external condition is either the
number of clients (also known as "users," "simultaneous connections," or "load
generators") or the underlying hardware system size (number of CPUs, memory
size, and so on). Thus there are two types of scalability: <I>load
scalability</I> and <I>system scalability</I>, respectively.
<P>
The figure below shows how the throughput of an idealized IA changes with
the increasing number of clients (solid blue line). Initially the throughput
grows linearly (the slope represents the maximal throughput that one client
can provide). Within this initial range, the IA is underutilized and CPUs are
partially idle. Further increase in the number of clients leads to system
saturation, and the throughput gradually stops growing as all CPUs become fully
utilized. After that point, the throughput stays flat because there are no
more CPU cycles available.
In the real world, however, each simultaneous connection
consumes some computational and memory resources, even when idle, and this
overhead grows with the number of clients. Therefore, the throughput of a
real-world IA starts dropping after some point (dashed blue line in the figure
below). The rate at which the throughput drops depends, among other things, on
application design.
<P>
We say that an application has good <I>load scalability</I> if it can
sustain its throughput over a wide range of loads.
Interestingly, the <A HREF="http://www.spec.org/osg/web99/">SPECweb99</A>
benchmark somewhat reflects a Web server's load scalability because it
measures the number of clients (load generators) given a mandatory minimal
throughput per client (that is, it measures the server's <I>capacity</I>).
This is unlike <A HREF="http://www.spec.org/osg/web96/">SPECweb96</A> and
other benchmarks that use throughput as their main metric (see the figure
below).
<P>
<CENTER><IMG SRC="fig.gif" ALT="Figure: Throughput vs. Number of clients">
</CENTER>
<P>
<I>System scalability</I> is the ability of an application to sustain its
performance per hardware unit (such as a CPU) with the increasing number of
these units. In other words, good system scalability means that doubling the
number of processors will roughly double the application's throughput (dashed
green line). We assume here that the underlying operating system also scales
well. Good system scalability allows you to initially run an application on
the smallest system possible, while retaining the ability to move that
application to a larger system if necessary, without excessive effort or
expense. That is, an application need not be rewritten or even undergo a
major porting effort when changing system size.
<P>
Although scalability and performance are more important in the case of server
IAs, they should also be considered for some client applications (such as
benchmark load generators).
<P>
<A NAME="CONC">
<H4>1.3 Concurrency</H4>
</A>
<P>
Concurrency reflects the parallelism in a system. The two unrelated types
are <I>virtual</I> concurrency and <I>real</I> concurrency.
<UL>
<LI>Virtual (or apparent) concurrency is the number of simultaneous
connections that a system supports.
<BR><BR>
<LI>Real concurrency is the number of hardware devices, including
CPUs, network cards, and disks, that actually allow a system to perform
tasks in parallel.
</UL>
<P>
An IA must provide virtual concurrency in order to serve many users
simultaneously.
To achieve maximum performance and scalability in doing so, the number of
programming entities that an IA creates to be scheduled by the OS kernel
should be kept close to (within an order of magnitude of) the real concurrency
found on the system. These programming entities scheduled by the kernel are
known as <I>kernel execution vehicles</I>. Examples of kernel execution
vehicles include Solaris lightweight processes and IRIX kernel threads.
In other words, the number of kernel execution vehicles should be dictated by
the system size and not by the number of simultaneous connections.
<P>
<H3>2. Existing Architectures</H3>
<P>
There are a few different architectures that are commonly used by IAs.
These include the <I>Multi-Process</I>,
<I>Multi-Threaded</I>, and <I>Event-Driven State Machine</I>
architectures.
<P>
<A NAME="MP">
<H4>2.1 Multi-Process Architecture</H4>
</A>
<P>
In the Multi-Process (MP) architecture, an individual process is
dedicated to each simultaneous connection.
A process performs all of a transaction's initialization steps
and services a connection completely before moving on to service
a new connection.
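<P>
For illustration, a minimal sketch of an MP-style server in C. The
echo-style <TT>serve_connection()</TT> body and the port number are
placeholders, and error handling is omitted for brevity:
<PRE>
/* A minimal MP-style sketch: the parent accepts a connection and
 * forks a child process dedicated to it (error handling omitted). */
#include &lt;sys/socket.h&gt;
#include &lt;netinet/in.h&gt;
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;

static void serve_connection(int sock)  /* one transaction: echo a request */
{
    char buf[4096];
    ssize_t n = read(sock, buf, sizeof(buf));
    if (n &gt; 0)
        write(sock, buf, (size_t)n);
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin = { 0 };
    sin.sin_family = AF_INET;
    sin.sin_port = htons(8000);
    bind(listen_fd, (struct sockaddr *)&amp;sin, sizeof(sin));
    listen(listen_fd, 128);
    signal(SIGCHLD, SIG_IGN);    /* reap children automatically */

    for (;;) {
        int sock = accept(listen_fd, NULL, NULL);
        if (fork() == 0) {       /* child: serves this connection only */
            close(listen_fd);
            serve_connection(sock);
            close(sock);
            _exit(0);
        }
        close(sock);             /* parent: keeps accepting */
    }
}
</PRE>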
<P>
User sessions in IAs are relatively independent; therefore, no
synchronization between processes handling different connections is
necessary. Because each process has its own private address space,
this architecture is very robust. If a process serving one of the connections
crashes, the other sessions will not be affected. However, to serve many
concurrent connections, an equal number of processes must be employed.
Because processes are kernel entities (and are in fact the heaviest ones),
the number of kernel entities will be at least as large as the number of
concurrent sessions. On most systems, good performance will not be achieved
when more than a few hundred processes are created because of the high
context-switching overhead. In other words, MP applications have poor load
scalability.
<P>
On the other hand, MP applications have very good system scalability, because
no resources are shared among different processes and there is no
synchronization overhead.
<P>
The Apache Web Server 1.x (<A HREF=#refs1>[Reference 1]</A>) uses the MP
architecture on UNIX systems.
<P>
<A NAME="MT">
<H4>2.2 Multi-Threaded Architecture</H4>
</A>
<P>
In the Multi-Threaded (MT) architecture, multiple independent threads
of control are employed within a single shared address space. Like a
process in the MP architecture, each thread performs all of a
transaction's initialization steps and services a connection completely
before moving on to service a new connection.
<P>
Many modern UNIX operating systems implement a <I>many-to-few</I> model when
mapping user-level threads to kernel entities. In this model, an
arbitrarily large number of user-level threads is multiplexed onto a
lesser number of kernel execution vehicles. Kernel execution
vehicles are also known as <I>virtual processors</I>. Whenever a user-level
thread makes a blocking system call, the kernel execution vehicle it is using
will become blocked in the kernel. If there are no other non-blocked kernel
execution vehicles and there are other runnable user-level threads, a new
kernel execution vehicle will be created automatically. This prevents the
application from blocking when it can continue to make useful forward
progress.
<P>
Because IAs are by nature network I/O driven, all concurrent sessions block on
network I/O at various points. As a result, the number of virtual processors
created in the kernel grows close to the number of user-level threads
(or simultaneous connections). When this occurs, the many-to-few model
effectively degenerates to a <I>one-to-one</I> model. Again, as in
the MP architecture, the number of kernel execution vehicles is dictated by
the number of simultaneous connections rather than by the number of CPUs. This
reduces an application's load scalability. However, because kernel threads
(lightweight processes) use fewer resources and are more lightweight than
traditional UNIX processes, an MT application should scale better with load
than an MP application.
<P>
Perhaps unexpectedly, the small number of virtual processors sharing the same
address space in the MT architecture destroys an application's system
scalability because of contention among the threads on various locks. Even if
the application itself is carefully
optimized to avoid lock contention around its own global data (a non-trivial
task), there are still standard library functions and system calls
that use common resources hidden from the application. For example,
on many platforms thread safety of the memory allocation routines
(<TT>malloc(3)</TT>, <TT>free(3)</TT>, and so on) is achieved by using a single
global lock. Another example is the per-process file descriptor table.
This common resource table is shared by all kernel execution vehicles within
the same process and must be protected when one modifies it via
certain system calls (such as <TT>open(2)</TT>, <TT>close(2)</TT>, and so on).
In addition, keeping caches coherent among CPUs on multiprocessor systems
hurts performance when different threads
running on different CPUs modify data items on the same cache line.
<P>
In order to improve load scalability, some applications employ a different
type of MT architecture: they create one or more threads <I>per task</I>
rather than one thread <I>per connection</I>. For example, one small group
of threads may be responsible for accepting client connections, another
for request processing, and yet another for serving responses. The main
advantage of this architecture is that it eliminates the tight coupling
between the number of threads and the number of simultaneous connections.
However,
in this architecture, different task-specific thread groups must share common
work queues that must be protected by mutual exclusion locks (a typical
producer-consumer problem). This adds synchronization overhead that causes an
application to perform badly on multiprocessor systems. In other words, in
this architecture, the application's system scalability is sacrificed for the
sake of load scalability.
<P>
Of course, the usual nightmares of threaded programming, including data
corruption, deadlocks, and race conditions, also make the MT architecture (in
any form) far from simple to use.
<P>

<A NAME="EDSM">
<H4>2.3 Event-Driven State Machine Architecture</H4>
</A>
<P>
In the Event-Driven State Machine (EDSM) architecture, a single process
is employed to concurrently process multiple connections. The basics of this
architecture are described in Comer and Stevens
<A HREF=#refs2>[Reference 2]</A>.
The EDSM architecture performs one basic data-driven step associated with
a particular connection at a time, thus multiplexing many concurrent
connections. The process operates as a state machine that receives an event
and then reacts to it.
<P>
In the idle state the EDSM calls <TT>select(2)</TT> or <TT>poll(2)</TT> to
wait for network I/O events. When a particular file descriptor is ready for
I/O, the EDSM completes the corresponding basic step (usually by invoking a
handler function) and starts the next one. This architecture uses
non-blocking system calls to perform asynchronous network I/O operations.
For more details on non-blocking I/O see Stevens
<A HREF=#refs3>[Reference 3]</A>.
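<P>
A skeletal dispatch loop illustrates the idea. This is a simplified sketch;
a real EDSM maintains the <TT>pollfd</TT> array and the per-connection
handler table dynamically as connections come and go:
<PRE>
/* Skeletal EDSM dispatch loop built around poll(2). Each handler
 * performs one non-blocking basic step and returns to the loop. */
#include &lt;poll.h&gt;

#define MAX_CONNS 1024

struct conn {
    void (*handler)(struct conn *c);  /* next basic step for this connection */
    /* ... per-connection state: buffers, parse position, and so on ... */
};

static struct pollfd fds[MAX_CONNS];
static struct conn conns[MAX_CONNS];
static int nconns;

void event_loop(void)
{
    for (;;) {
        poll(fds, nconns, -1);                 /* idle state: wait for events */
        for (int i = 0; i &lt; nconns; i++) {
            if (fds[i].revents &amp; (POLLIN | POLLOUT))
                conns[i].handler(&amp;conns[i]);   /* one basic step, no blocking */
        }
    }
}
</PRE>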
<P>
To take advantage of hardware parallelism (real concurrency), multiple
identical processes may be created. This is called Symmetric Multi-Process
EDSM and is used, for example, in the Zeus Web Server
(<A HREF=#refs4>[Reference 4]</A>). To more efficiently multiplex disk I/O,
special "helper" processes may be created. This is called Asymmetric
Multi-Process EDSM and was proposed for Web servers by Druschel
and others <A HREF=#refs5>[Reference 5]</A>.
<P>
EDSM is probably the most scalable architecture for IAs.
Because the number of simultaneous connections (virtual concurrency) is
completely decoupled from the number of kernel execution vehicles (processes),
this architecture has very good load scalability. It requires only minimal
user-level resources to create and maintain an additional connection.
<P>
Like MP applications, Multi-Process EDSM has very good system scalability
because no resources are shared among different processes and there is no
synchronization overhead.
<P>
Unfortunately, the EDSM architecture is monolithic rather than based on the
concept of threads, so new applications generally need to be implemented from
the ground up. In effect, the EDSM architecture simulates threads and their
stacks the hard way.
<P>

<A NAME="ST">
<H3>3. State Threads Library</H3>
</A>
<P>
The State Threads library combines the advantages of all of the above
architectures. The interface preserves the programming simplicity of the
thread abstraction, allowing each simultaneous connection to be treated as a
separate thread of execution within a single process. The underlying
implementation is close to the EDSM architecture, as the state of each
particular concurrent session is saved in a separate memory segment.
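<P>
The sketch below shows what this looks like in practice: a minimal
thread-per-connection echo server written against the library's
<TT>st.h</TT> interface (the port number is arbitrary and error handling
is omitted for brevity):
<PRE>
/* A minimal thread-per-connection echo server using State Threads.
 * Each accepted connection gets its own lightweight user-level thread. */
#include &lt;sys/socket.h&gt;
#include &lt;netinet/in.h&gt;
#include "st.h"

static void *handle_conn(void *arg)
{
    st_netfd_t cli = (st_netfd_t)arg;
    char buf[4096];
    ssize_t n;

    /* st_read()/st_write() are scheduling points: instead of blocking
     * the process, they may switch to another runnable thread. */
    while ((n = st_read(cli, buf, sizeof(buf), ST_UTIME_NO_TIMEOUT)) &gt; 0)
        st_write(cli, buf, n, ST_UTIME_NO_TIMEOUT);
    st_netfd_close(cli);
    return NULL;
}

int main(void)
{
    st_init();                                /* initialize the library */

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin = { 0 };
    sin.sin_family = AF_INET;
    sin.sin_port = htons(8000);
    bind(sock, (struct sockaddr *)&amp;sin, sizeof(sin));
    listen(sock, 128);

    st_netfd_t srv = st_netfd_open_socket(sock);
    for (;;) {
        st_netfd_t cli = st_accept(srv, NULL, NULL, ST_UTIME_NO_TIMEOUT);
        st_thread_create(handle_conn, cli, 0, 0);  /* 0 = default stack size */
    }
}
</PRE>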
<P>

<H4>3.1 State Changes and Scheduling</H4>
<P>
The state of each concurrent session includes its stack environment
(stack pointer, program counter, CPU registers) and its stack. Conceptually,
a thread context switch can be viewed as a process changing its state. There
are no kernel entities involved other than processes.
Unlike other general-purpose threading libraries, the State Threads library
is fully deterministic. The thread context switch (process state change) can
only happen in a well-known set of functions (at I/O points or at explicit
synchronization points). As a result, process-specific global data does not
have to be protected by mutual exclusion locks in most cases. The entire
application is free to use all the static variables and non-reentrant library
functions it wants, greatly simplifying programming and debugging while
increasing performance. This is somewhat similar to a <I>co-routine</I> model
(co-operatively multitasked threads), except that no explicit yield is
needed -- sooner or later, a thread performs a blocking I/O operation and thus
surrenders control. All threads of execution (simultaneous connections) have
the same priority, so scheduling is non-preemptive, as in the EDSM
architecture. Because IAs are data-driven (processing is limited by the size
of network buffers and data arrival rates), scheduling is non-time-slicing.
<P>
Only two types of external events are handled by the library's
scheduler, because only these events can be detected by
<TT>select(2)</TT> or <TT>poll(2)</TT>: I/O events (a file descriptor is ready
for I/O) and time events
(some timeout has expired). However, other types of events (such as
a signal sent to a process) can also be handled by converting them to I/O
events. For example, a signal handling function can perform a write to a pipe
(<TT>write(2)</TT> is reentrant/asynchronous-safe), thus converting a signal
event to an I/O event.
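<P>
A sketch of this signal-to-I/O conversion using the library's I/O functions
(the choice of <TT>SIGTERM</TT> and the helper names are illustrative only;
the setup function is assumed to run once at startup, after
<TT>st_init()</TT>):
<PRE>
/* Converting signals to I/O events: an async-signal-safe handler writes
 * the signal number to a pipe; a dedicated State Thread reads the pipe
 * and handles the signal synchronously, in normal thread context. */
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;
#include "st.h"

static int sig_pipe[2];

static void on_signal(int signo)     /* runs in signal context */
{
    char c = (char)signo;
    write(sig_pipe[1], &amp;c, 1);       /* write(2) is async-signal-safe */
}

static void *signal_thread(void *arg)
{
    st_netfd_t p = st_netfd_open(sig_pipe[0]);
    char c;
    for (;;) {
        st_read(p, &amp;c, 1, ST_UTIME_NO_TIMEOUT);   /* now just an I/O event */
        /* ... handle signal number c here ... */
    }
    return NULL;
}

void init_signal_handling(void)      /* called once, after st_init() */
{
    pipe(sig_pipe);
    signal(SIGTERM, on_signal);
    st_thread_create(signal_thread, NULL, 0, 0);
}
</PRE>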
<P>
To take advantage of hardware parallelism, as in the EDSM architecture,
multiple processes can be created in either a symmetric or asymmetric manner.
Process management is not in the library's scope but instead is left up to the
application.
<P>
There are several general-purpose threading libraries that implement a
<I>many-to-one</I> model (many user-level threads to one kernel execution
vehicle), using the same basic techniques as the State Threads library
(non-blocking I/O, event-driven scheduler, and so on). For an example, see GNU
Portable Threads (<A HREF=#refs6>[Reference 6]</A>). Because they are
general-purpose, these libraries have different objectives than the State
Threads library. The State Threads library is <I>not</I> a general-purpose
threading library,
but rather an application library that targets only certain types of
applications (IAs) in order to achieve the highest possible performance and
scalability for those applications.
<P>

<H4>3.2 Scalability</H4>
<P>
State threads are very lightweight user-level entities, and therefore creating
and maintaining user connections requires minimal resources. An application
using the State Threads library scales very well with the increasing number
of connections.
<P>
On multiprocessor systems an application should create multiple processes
to take advantage of hardware parallelism. Using multiple separate processes
is the <I>only</I> way to achieve the highest possible system scalability.
This is because duplicating per-process resources is the only way to avoid
significant synchronization overhead on multiprocessor systems. Creating
separate UNIX processes naturally offers resource duplication. Again,
as in the EDSM architecture, there is no connection between the number of
simultaneous connections (which may be very large and may vary over a wide
range) and the number of kernel entities (which is usually small and constant).
In other words, the State Threads library makes it possible to multiplex a
large number of simultaneous connections onto a much smaller number of
separate processes, thus allowing an application to scale well with both
the load and system size.
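<P>
For example, a symmetric arrangement might fork one worker process per CPU,
with each worker running its own State Threads scheduler on a shared
listening socket. A minimal sketch, assuming a hypothetical
<TT>worker()</TT> function that calls <TT>st_init()</TT> and runs an accept
loop like the echo server above:
<PRE>
/* Symmetric multi-process State Threads: one worker per CPU, all
 * accepting from the same listening socket. worker() is an assumed
 * helper (st_init() plus an accept loop); it never returns. */
#include &lt;unistd.h&gt;

extern void worker(int listen_sock);   /* assumed: st_init() + accept loop */

void spawn_workers(int listen_sock, int ncpus)
{
    for (int i = 0; i &lt; ncpus - 1; i++) {
        if (fork() == 0) {
            worker(listen_sock);       /* child never returns */
            _exit(0);
        }
    }
    worker(listen_sock);               /* parent becomes a worker too */
}
</PRE>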
<P>

<H4>3.3 Performance</H4>
<P>
Performance is one of the library's main objectives. The State Threads
library is implemented to minimize the number of system calls and
to make thread creation and context switching as fast as possible.
For example, there is no per-thread signal mask (unlike
POSIX threads), so there is no need to save and restore a process's
signal mask on every thread context switch. This eliminates two system
calls per context switch. Signal events can be handled much more
efficiently by converting them to I/O events (see above).
<P>

<H4>3.4 Portability</H4>
<P>
The library uses the same general, underlying concepts as the EDSM
architecture, including non-blocking I/O, file descriptors, and
I/O multiplexing. These concepts are available in some form on most
UNIX platforms, making the library very portable across many
flavors of UNIX. There are only a few platform-dependent sections in the
source.
<P>

<H4>3.5 State Threads and NSPR</H4>
<P>
The State Threads library is a derivative of the Netscape Portable
Runtime library (NSPR) <A HREF=#refs7>[Reference 7]</A>. The primary goal of
NSPR is to provide a platform-independent layer for system facilities,
where system facilities include threads, thread synchronization, and I/O.
Performance and scalability are not the main concern of NSPR. The
State Threads library addresses performance and scalability while
remaining much smaller than NSPR. It is contained in 8 source files
as opposed to more than 400, but provides all the functionality that
is needed to write efficient IAs on UNIX-like platforms.
<P>

<TABLE CELLPADDING=3>
<TR>
<TD></TD>
<TH>NSPR</TH>
<TH>State Threads</TH>
</TR>
<TR>
<TD><B>Lines of code</B></TD>
<TD ALIGN=RIGHT>~150,000</TD>
<TD ALIGN=RIGHT>~3000</TD>
</TR>
<TR>
<TD><B>Dynamic library size<BR>(debug version)</B></TD>
<TD></TD>
<TD></TD>
</TR>
<TR>
<TD>IRIX</TD>
<TD ALIGN=RIGHT>~700 KB</TD>
<TD ALIGN=RIGHT>~60 KB</TD>
</TR>
<TR>
<TD>Linux</TD>
<TD ALIGN=RIGHT>~900 KB</TD>
<TD ALIGN=RIGHT>~70 KB</TD>
</TR>
</TABLE>
<P>
<H3>Conclusion</H3>
<P>
State Threads is an application library which provides a foundation for
writing <A HREF=#IA>Internet Applications</A>. To summarize, it has the
following <I>advantages</I>:
<P>
<UL>
<LI>It allows the design of fast and highly scalable applications. An
application will scale well with both load and number of CPUs.
<P>
<LI>It greatly simplifies application programming and debugging because, as a
rule, no mutual exclusion locking is necessary and the entire application is
free to use static variables and non-reentrant library functions.
</UL>
<P>
The library's main <I>limitation</I>:
<P>
<UL>
<LI>All I/O operations on sockets must use the State Threads library's I/O
functions because only those functions perform thread scheduling and prevent
the application's processes from blocking.
</UL>
<P>
<H3>References</H3>
<OL>
<A NAME="refs1">
<LI> Apache Software Foundation,
<A HREF="http://www.apache.org">http://www.apache.org</A>.
<A NAME="refs2">
<LI> Douglas E. Comer, David L. Stevens, <I>Internetworking With TCP/IP,
Vol. III: Client-Server Programming And Applications</I>, Second Edition,
Ch. 8, 12.
<A NAME="refs3">
<LI> W. Richard Stevens, <I>UNIX Network Programming</I>, Second Edition,
Vol. 1, Ch. 15.
<A NAME="refs4">
<LI> Zeus Technology Limited,
<A HREF="http://www.zeus.co.uk/">http://www.zeus.co.uk</A>.
<A NAME="refs5">
<LI> Peter Druschel, Vivek S. Pai, Willy Zwaenepoel,
<A HREF="http://www.cs.rice.edu/~druschel/usenix99flash.ps.gz">
Flash: An Efficient and Portable Web Server</A>. In <I>Proceedings of the
USENIX 1999 Annual Technical Conference</I>, Monterey, CA, June 1999.
<A NAME="refs6">
<LI> GNU Portable Threads,
<A HREF="http://www.gnu.org/software/pth/">http://www.gnu.org/software/pth/</A>.
<A NAME="refs7">
<LI> Netscape Portable Runtime,
<A HREF="http://www.mozilla.org/docs/refList/refNSPR/">http://www.mozilla.org/docs/refList/refNSPR/</A>.
</OL>
<H3>Other resources covering various architectural issues in IAs</H3>
<OL START=8>
<LI> Dan Kegel, <I>The C10K problem</I>,
<A HREF="http://www.kegel.com/c10k.html">http://www.kegel.com/c10k.html</A>.</LI>
<LI> James C. Hu, Douglas C. Schmidt, Irfan Pyarali, <I>JAWS: Understanding
High Performance Web Systems</I>,
<A HREF="http://www.cs.wustl.edu/~jxh/research/research.html">http://www.cs.wustl.edu/~jxh/research/research.html</A>.</LI>
</OL>
<P>
<HR>
<P>
<CENTER><FONT SIZE=-1>Portions created by SGI are Copyright © 2000
Silicon Graphics, Inc. All rights reserved.</FONT></CENTER>
<P>
</BODY>
</HTML>