<HTML>
<HEAD>
<TITLE>State Threads Library Programming Notes</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
<H2>Programming Notes</H2>
<P>
<B>
<UL>
<LI><A HREF=#porting>Porting</A></LI>
<LI><A HREF=#signals>Signals</A></LI>
<LI><A HREF=#intra>Intra-Process Synchronization</A></LI>
<LI><A HREF=#inter>Inter-Process Synchronization</A></LI>
<LI><A HREF=#nonnet>Non-Network I/O</A></LI>
<LI><A HREF=#timeouts>Timeouts</A></LI>
</UL>
</B>
<P>
<HR>
<P>
<A NAME="porting"></A>
<H3>Porting</H3>
The State Threads library uses OS concepts that are available in some
form on most UNIX platforms, making the library very portable across
many flavors of UNIX. However, several parts of the library
rely on platform-specific features:
<P>
<UL>
<LI><I>Thread context initialization</I>: Two ingredients of the
<TT>jmp_buf</TT>
data structure (the program counter and the stack pointer) have to be
manually set in the thread creation routine. The <TT>jmp_buf</TT> data
structure is defined in the <TT>setjmp.h</TT> header file and differs from
platform to platform. Usually the program counter is a structure member
with <TT>PC</TT> in the name and the stack pointer is a structure member
with <TT>SP</TT> in the name. One can also look in
<A HREF="http://www.mozilla.org/source.html">Netscape's NSPR library source</A>,
which already has this code for many UNIX-like platforms
(<TT>mozilla/nsprpub/pr/include/md/*.h</TT> files).
<P>
Note that on some BSD-derived platforms <TT>_setjmp(3)/_longjmp(3)</TT>
calls should be used instead of <TT>setjmp(3)/longjmp(3)</TT> (that is,
the calls that manipulate only the stack and registers and do <I>not</I>
save and restore the process's signal mask).
<P>
Starting with glibc 2.4 on Linux the opacity of the <TT>jmp_buf</TT> data
structure is enforced by <TT>setjmp(3)/longjmp(3)</TT>, so the
<TT>jmp_buf</TT> ingredients cannot be accessed directly anymore (unless
the special environment variable <TT>LD_POINTER_GUARD</TT> is set before
application execution). To avoid a dependency on a custom environment, the
State Threads library provides <TT>setjmp/longjmp</TT> replacement functions
for all Intel CPU architectures. Other CPU architectures can also be easily
supported (the <TT>setjmp/longjmp</TT> source code is widely available for
many CPU architectures).</LI>
<P>
<LI><I>High resolution time function</I>: Some platforms (IRIX, Solaris)
provide a high resolution time function based on a free-running hardware
counter. This function returns the time counted since some arbitrary
moment in the past (usually machine power-up time). It is not correlated in
any way to the time of day, and thus is not subject to resetting,
drifting, etc. This type of time is ideal for tasks where cheap, accurate
interval timing is required. If such a function is not available on a
particular platform, the <TT>gettimeofday(3)</TT> function can be used
(though on some platforms it involves a system call).</LI>
<P>
<LI><I>The stack growth direction</I>: The library needs to know whether the
stack grows toward lower (down) or higher (up) memory addresses.
One can write a simple test program that detects the stack growth direction
on a particular platform.</LI>
<P>
<LI><I>Non-blocking attribute inheritance</I>: On some platforms (e.g. IRIX)
the socket created as a result of the <TT>accept(2)</TT> call inherits the
non-blocking attribute of the listening socket. One needs to consult the manual
pages or write a simple test program to see if this applies to a specific
platform.</LI>
<P>
<LI><I>Anonymous memory mapping</I>: The library allocates memory segments
for thread stacks by doing anonymous memory mapping (<TT>mmap(2)</TT>). This
mapping is somewhat different on SVR4 and BSD4.3 derived platforms.
<P>
The memory mapping can be avoided altogether by using <TT>malloc(3)</TT> for
stack allocation. In this case the <TT>MALLOC_STACK</TT> macro should be
defined.</LI>
</UL>
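<P>
The two "simple test programs" mentioned in the list above could look roughly
like the following sketch. It is not part of the library: the
<TT>demo_</TT> names are hypothetical, the stack probe relies on comparing
addresses of unrelated locals (a common heuristic that the C standard does not
guarantee), and the <TT>accept(2)</TT> probe assumes a working loopback
interface.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#if defined(__GNUC__)
#define DEMO_NOINLINE __attribute__((noinline))
#else
#define DEMO_NOINLINE
#endif

/* Compare a local of the callee with a local of its caller. */
static DEMO_NOINLINE int demo_callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Heuristic: returns 1 if the stack appears to grow down, 0 if up. */
DEMO_NOINLINE int demo_stack_grows_down(void)
{
    volatile char caller_local = 0;
    return demo_callee_is_lower(&caller_local);
}

/* Returns 1 if an accepted socket inherits O_NONBLOCK from the
   listening socket, 0 if it does not, -1 if the probe itself failed. */
int demo_accept_inherits_nonblock(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    int lfd, cfd = -1, afd = -1, flags, result = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* kernel picks a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* Make only the listening socket non-blocking */
    flags = fcntl(lfd, F_GETFL, 0);
    if (flags < 0 || fcntl(lfd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto done;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* Wait until the connection is ready to be accepted */
    pfd.fd = lfd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 1000) != 1)
        goto done;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto done;

    flags = fcntl(afd, F_GETFL, 0);
    if (flags >= 0)
        result = (flags & O_NONBLOCK) ? 1 : 0;

done:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    close(lfd);
    return result;
}
```

<P>
A small <TT>main()</TT> that prints both results is enough to decide which
settings a new port needs.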
<P>
All machine-dependent feature test macros should be defined in the
<TT>md.h</TT> header file. The assembly code for <TT>setjmp/longjmp</TT>
replacement functions for all CPU architectures should be placed in
the <TT>md.S</TT> file.
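<P>
As an illustration of how a feature test macro can select between the
SVR4-style and BSD-style anonymous mappings mentioned earlier, here is a
minimal sketch. The macro <TT>DEMO_USE_DEV_ZERO_MMAP</TT> and the
<TT>demo_</TT> functions are hypothetical names chosen for this example, not
the library's own; the real stack allocator does more work than this.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical feature test macro: define it to force the SVR4-style
   mapping of /dev/zero even where MAP_ANON is available. */
/* #define DEMO_USE_DEV_ZERO_MMAP */

#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Allocate 'size' bytes of zero-filled memory suitable for a stack.
   Returns NULL on failure. */
void *demo_stack_alloc(size_t size)
{
    void *p;

#if defined(MAP_ANON) && !defined(DEMO_USE_DEV_ZERO_MMAP)
    /* BSD style: anonymous mapping, no file descriptor needed */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
#else
    /* SVR4 style: private mapping of /dev/zero */
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);      /* the mapping stays valid after close */
#endif

    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a stack obtained from demo_stack_alloc(). Returns 0 on success. */
int demo_stack_free(void *stack, size_t size)
{
    return munmap(stack, size);
}
```

<P>
Keeping the choice behind one macro means the rest of the code never needs to
know which mapping style the platform uses.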
<P>
The current version of the library is ported to:
<UL>
<LI>IRIX 6.x (both 32 and 64 bit)</LI>
<LI>Linux (kernel 2.x and glibc 2.x) on x86, Alpha, MIPS and MIPSEL,
SPARC, ARM, PowerPC, 68k, HPPA, S390, IA-64, and Opteron (AMD-64)</LI>
<LI>Solaris 2.x (SunOS 5.x) on x86, AMD64, SPARC, and SPARC-64</LI>
<LI>AIX 4.x</LI>
<LI>HP-UX 11 (both 32 and 64 bit)</LI>
<LI>Tru64/OSF1</LI>
<LI>FreeBSD on x86, AMD64, and Alpha</LI>
<LI>OpenBSD on x86, AMD64, Alpha, and SPARC</LI>
<LI>NetBSD on x86, Alpha, SPARC, and VAX</LI>
<LI>MacOS X (Darwin) on PowerPC (32 bit) and Intel (both 32 and 64 bit) [universal]</LI>
<LI>Cygwin</LI>
</UL>
<P>

<A NAME="signals"></A>
<H3>Signals</H3>
Signal handling in an application using State Threads should be treated the
same way as in a classical UNIX process application. There is no such
thing as a per-thread signal mask, all threads share the same signal handlers,
and only async-signal-safe functions can be used in signal handlers.
However, there is a way to process signals synchronously by converting a
signal event to an I/O event: a signal catching function does a write to
a pipe which will be processed synchronously by a dedicated signal handling
thread. The following code demonstrates this technique (error handling is
omitted for clarity):
<PRE>

/* Per-process pipe which is used as a signal queue. */
/* Up to PIPE_BUF/sizeof(int) signals can be queued up. */
int sig_pipe[2];

/* Signal catching function. */
/* Converts signal event to I/O event. */
void sig_catcher(int signo)
{
  int err;

  /* Save errno to restore it after the write() */
  err = errno;
  /* write() is reentrant/async-safe */
  write(sig_pipe[1], &signo, sizeof(int));
  errno = err;
}

/* Signal processing function. */
/* This is the "main" function of the signal processing thread. */
void *sig_process(void *arg)
{
  st_netfd_t nfd;
  int signo;

  nfd = st_netfd_open(sig_pipe[0]);

  for ( ; ; ) {
    /* Read the next signal from the pipe */
    st_read(nfd, &signo, sizeof(int), ST_UTIME_NO_TIMEOUT);

    /* Process signal synchronously */
    switch (signo) {
    case SIGHUP:
      /* do something here - reread config files, etc. */
      break;
    case SIGTERM:
      /* do something here - cleanup, etc. */
      break;
      /*      .
              .
         Other signals
              .
              .
      */
    }
  }

  return NULL;
}

int main(int argc, char *argv[])
{
  struct sigaction sa;
        .
        .
        .

  /* Create signal pipe */
  pipe(sig_pipe);

  /* Create signal processing thread */
  st_thread_create(sig_process, NULL, 0, 0);

  /* Install sig_catcher() as a signal handler */
  sa.sa_handler = sig_catcher;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGHUP, &sa, NULL);

  sa.sa_handler = sig_catcher;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGTERM, &sa, NULL);

        .
        .
        .

}

</PRE>
<P>
Note that if multiple processes are used (see below), the signal pipe should
be initialized after the <TT>fork(2)</TT> call so that each process has its
own private pipe.
<P>

<A NAME="intra"></A>
<H3>Intra-Process Synchronization</H3>
Due to the event-driven nature of the library scheduler, the thread context
switch (process state change) can only happen in a well-known set of
library functions. This set includes functions in which a thread may
"block": I/O functions (<TT>st_read()</TT>, <TT>st_write()</TT>, etc.),
sleep functions (<TT>st_sleep()</TT>, etc.), and thread synchronization
functions (<TT>st_thread_join()</TT>, <TT>st_cond_wait()</TT>, etc.). As a
result, process-specific global data need not be protected by locks since a
thread cannot be rescheduled while in a critical section (and only one thread
at a time can access the same memory location). By the same token,
functions that are not thread-safe (in the traditional sense) can be safely
used with State Threads. The library's mutex facilities are practically useless
for a correctly written application (no blocking functions in critical
sections) and are provided mostly for completeness. This absence of locking
greatly simplifies application design and provides a foundation for
scalability.
<P>

<A NAME="inter"></A>
<H3>Inter-Process Synchronization</H3>
The State Threads library makes it possible to multiplex a large number
of simultaneous connections onto a much smaller number of separate
processes, where each process uses a many-to-one user-level threading
implementation (<B>N</B> of <B>M:1</B> mappings rather than one <B>M:N</B>
mapping used in native threading libraries on some platforms). This design
is key to the application's scalability. One can think of it as
the set of all threads being partitioned into separate groups (processes) where
each group has a separate pool of resources (virtual address space, file
descriptors, etc.). An application designer has full control of how many
groups (processes) an application creates and what resources, if any,
are shared among different groups via standard UNIX inter-process
communication (IPC) facilities.
<P>
There are several reasons for creating multiple processes:
<P>
<UL>
<LI>To take advantage of multiple hardware entities (CPUs, disks, etc.)
available in the system (hardware parallelism).</LI>
<P>
<LI>To reduce the risk of losing a large number of user connections when one of
the processes crashes. For example, if <B>C</B> user connections (threads)
are multiplexed onto <B>P</B> processes and one of the processes crashes,
only a fraction (<B>C/P</B>) of all connections will be lost.</LI>
<P>
<LI>To overcome per-process resource limitations imposed by the OS. For
example, if <TT>select(2)</TT> is used for event polling, the number of
simultaneous connections (threads) per process is
limited by the <TT>FD_SETSIZE</TT> parameter (see <TT>select(2)</TT>).
If <TT>FD_SETSIZE</TT> is equal to 1024 and each connection needs one file
descriptor, then an application should create at least 10 processes to support
10,000 simultaneous connections.</LI>
</UL>
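<P>
The arithmetic behind the <TT>FD_SETSIZE</TT> example in the list above can be
written down as a small helper. This is only a sketch: the function name and
the <TT>reserved_fds</TT> parameter (descriptors each process keeps for
listening sockets, log files, and so on) are assumptions made for
illustration.

```c
/* Hypothetical helper: smallest number of processes needed so that
   'connections' connections fit, given that each connection uses
   'fds_per_conn' descriptors and each process can use at most
   'fd_limit' descriptors, 'reserved_fds' of which are taken by
   non-connection use. Returns -1 if the inputs make no sense. */
long demo_processes_needed(long connections, long fds_per_conn,
                           long fd_limit, long reserved_fds)
{
    long usable = fd_limit - reserved_fds;
    long needed;

    if (connections < 0 || fds_per_conn <= 0 || usable < fds_per_conn)
        return -1;

    /* Whole connections that fit into one process */
    usable /= fds_per_conn;

    /* Round up: a partially filled process is still a process */
    needed = (connections + usable - 1) / usable;
    return needed;
}
```

<P>
With 10,000 connections, one descriptor per connection, a limit of 1024, and
nothing reserved, this yields 10 processes, matching the example in the text.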
<P>
Ideally all user sessions are completely independent, so there is no need for
inter-process communication. It is always better to have several separate
smaller process-specific resources (e.g., data caches) than to have one large
resource shared (and modified) by all processes. Sometimes, however, there
is a need to share a common resource among different processes. In that case,
standard UNIX IPC facilities can be used. In addition, there is a way
to synchronize different processes so that only the thread accessing the
shared resource is suspended (not the entire process) if that resource
is unavailable. In the following code fragment a pipe is used as a counting
semaphore for inter-process synchronization:
<PRE>
#ifndef PIPE_BUF
#define PIPE_BUF 512  /* POSIX */
#endif

/* Semaphore data structure */
typedef struct ipc_sem {
  st_netfd_t rdfd;  /* read descriptor */
  st_netfd_t wrfd;  /* write descriptor */
} ipc_sem_t;

/* Create and initialize the semaphore. Should be called before fork(2). */
/* 'value' must be less than PIPE_BUF. */
/* If 'value' is 1, the semaphore works as mutex. */
ipc_sem_t *ipc_sem_create(int value)
{
  ipc_sem_t *sem;
  int p[2];
  char b[PIPE_BUF];

  /* Error checking is omitted for clarity */
  sem = malloc(sizeof(ipc_sem_t));

  /* Create the pipe */
  pipe(p);
  sem->rdfd = st_netfd_open(p[0]);
  sem->wrfd = st_netfd_open(p[1]);

  /* Initialize the semaphore: put 'value' bytes into the pipe */
  write(p[1], b, value);

  return sem;
}

/* Try to decrement the "value" of the semaphore. */
/* If "value" is 0, the calling thread blocks on the semaphore. */
int ipc_sem_wait(ipc_sem_t *sem)
{
  char c;

  /* Read one byte from the pipe */
  if (st_read(sem->rdfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
    return -1;

  return 0;
}

/* Increment the "value" of the semaphore. */
int ipc_sem_post(ipc_sem_t *sem)
{
  char c;

  if (st_write(sem->wrfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
    return -1;

  return 0;
}

</PRE>
<P>

Generally, the following steps should be followed when writing an application
using the State Threads library:
<P>
<OL>
<LI>Initialize the library (<TT>st_init()</TT>).</LI>
<P>
<LI>Create resources that will be shared among different processes:
create and bind listening sockets, create shared memory segments, IPC
channels, synchronization primitives, etc.</LI>
<P>
<LI>Create several processes (<TT>fork(2)</TT>). The parent process should
either exit or become a "watchdog" (e.g., it starts a new process when
an existing one crashes, does a cleanup upon application termination,
etc.).</LI>
<P>
<LI>In each child process create a pool of threads
(<TT>st_thread_create()</TT>) to handle user connections.</LI>
</OL>
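<P>
The process-creation part of these steps (step 3, with the parent acting as a
simple watchdog) might be sketched as follows in plain POSIX. All
<TT>demo_</TT> names are hypothetical, and the State Threads calls from steps
1, 2 and 4 appear only as comments because their details depend on the
application.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body of one worker process; its return value becomes the exit status */
typedef int (*demo_worker_fn)(int worker_index);

/* Placeholder worker. A real one would create its thread pool with
   st_thread_create() here (step 4) and serve connections. */
int demo_noop_worker(int worker_index)
{
    (void)worker_index;
    return 0;
}

/* Step 3: fork 'nworkers' children. Assumes st_init() and creation of
   shared resources (steps 1 and 2) already happened in the caller.
   Returns the number of children actually started. */
int demo_spawn_workers(int nworkers, demo_worker_fn worker, pid_t *pids)
{
    int i;

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* could not fork; report what we have */
        if (pid == 0)
            _exit(worker(i));       /* child never returns to the caller */
        pids[i] = pid;
    }
    return i;
}

/* Watchdog: wait for 'nworkers' children and count those that did not
   exit cleanly. A real watchdog would restart them instead of counting. */
int demo_reap_workers(int nworkers)
{
    int reaped = 0, failures = 0, status;

    while (reaped < nworkers) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                  /* no more children to wait for */
        }
        reaped++;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```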
<P>

<A NAME="nonnet"></A>
<H3>Non-Network I/O</H3>

The State Threads architecture uses non-blocking I/O on
<TT>st_netfd_t</TT> objects for concurrent processing of multiple user
connections. This architecture has a drawback: the entire process and
all its threads may block for the duration of a <I>disk</I> or other
non-network I/O operation, whether through State Threads I/O functions,
direct system calls, or standard I/O functions. (This applies
mostly to disk <I>reads</I>; disk <I>writes</I> are usually performed
asynchronously -- data goes to the buffer cache to be written to disk
later.) Fortunately, disk I/O (unlike network I/O) usually takes a
finite and predictable amount of time, but this may not be true for
special devices or user input devices (including stdin). Nevertheless,
such I/O reduces the throughput of the system and increases response times.
There are several ways to design an application to overcome this
drawback:

<P>
<UL>
<LI>Create several identical main processes as described above (symmetric
architecture). This will improve CPU utilization and thus improve the
overall throughput of the system.</LI>
<P>
<LI>Create multiple "helper" processes in addition to the main process that
will handle blocking I/O operations (asymmetric architecture).
This approach was suggested for Web servers in a
<A HREF="http://www.cs.rice.edu/~vivek/flash99/">paper</A> by Peter
Druschel et al. In this architecture the main process communicates with
a helper process via an IPC channel (<TT>pipe(2), socketpair(2)</TT>).
The main process instructs a helper to perform the potentially blocking
operation. Once the operation completes, the helper returns a
notification via IPC.</LI>
</UL>
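<P>
A helper process of the kind described in the second item could be sketched
like this, using <TT>socketpair(2)</TT> as the IPC channel. The
<TT>demo_</TT> names and the fixed-size request format are assumptions made
for this example. Here the main process waits with a blocking
<TT>read(2)</TT>; in a State Threads application its end of the channel would
instead be wrapped in an <TT>st_netfd_t</TT> (for example with
<TT>st_netfd_open_socket()</TT>) so that only the requesting thread waits.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PATH_MAX 256   /* fixed request size keeps framing trivial */

/* Read exactly 'len' bytes; returns 0 on success, -1 on EOF or error */
static int demo_read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Helper side: for each requested path, read the whole file (the
   potentially blocking disk I/O) and reply with its size or -1. */
static void demo_helper_loop(int fd)
{
    char path[DEMO_PATH_MAX], buf[4096];

    while (demo_read_full(fd, path, sizeof(path)) == 0) {
        long total = -1;
        int in;

        path[sizeof(path) - 1] = '\0';
        in = open(path, O_RDONLY);
        if (in >= 0) {
            ssize_t n;
            total = 0;
            while ((n = read(in, buf, sizeof(buf))) > 0)
                total += n;
            if (n < 0)
                total = -1;
            close(in);
        }
        if (write(fd, &total, sizeof(total)) != (ssize_t)sizeof(total))
            break;
    }
}

/* Fork a helper; returns the main process's end of the channel or -1 */
int demo_helper_start(void)
{
    int sv[2];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* helper process */
        close(sv[0]);
        demo_helper_loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);                   /* main process */
    return sv[0];
}

/* Main side: ask the helper to read 'path'; returns bytes read or -1 */
long demo_helper_read_file(int fd, const char *path)
{
    char req[DEMO_PATH_MAX];
    long total;

    if (strlen(path) >= sizeof(req))
        return -1;
    memset(req, 0, sizeof(req));
    strcpy(req, path);
    if (write(fd, req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;
    if (demo_read_full(fd, &total, sizeof(total)) < 0)
        return -1;
    return total;
}
```

<P>
Closing the descriptor returned by <TT>demo_helper_start()</TT> makes the
helper see end-of-file and exit; a real application would also reap it with
<TT>waitpid(2)</TT>.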
<P>

<A NAME="timeouts"></A>
<H3>Timeouts</H3>

The <TT>timeout</TT> parameter to <TT>st_cond_timedwait()</TT> and the
I/O functions, and the arguments to <TT>st_sleep()</TT> and
<TT>st_usleep()</TT> specify a maximum time to wait <I>since the last
context switch</I>, not since the beginning of the function call.

<P>The State Threads time resolution is actually the time interval
between context switches. That time interval may be large in some
situations, for example, when a single thread does a lot of work
continuously. Note that a steady, uninterrupted stream of network I/O
qualifies for this description; a context switch occurs only when a
thread blocks.

<P>If a specified I/O timeout is less than the time interval between
context switches, the function may return with a timeout error before
that amount of time has elapsed since the beginning of the function
call. For example, if eight milliseconds have passed since the last
context switch and an I/O function with a timeout of 10 milliseconds
blocks, causing a switch, the call may return with a timeout error as
little as two milliseconds after it was called. (On Linux,
<TT>select()</TT>'s timeout is an <I>upper</I> bound on the amount of
time elapsed before select returns.) Similarly, if 12 ms have passed
already, the function may return immediately.
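<P>
The arithmetic in this example can be captured in a one-line model. The
function below is a hypothetical illustration, not library code: it computes
the earliest moment, measured from the start of the call, at which a timeout
error could be reported when the timeout is counted from the last context
switch.

```c
/* Hypothetical model. All values are in microseconds.
   timeout_us:      timeout passed to the blocking call
   since_switch_us: time already elapsed since the last context switch
   Returns the shortest time after the call begins at which the call
   may report a timeout. */
long long demo_earliest_timeout_us(long long timeout_us,
                                   long long since_switch_us)
{
    long long remaining = timeout_us - since_switch_us;

    /* The budget may already be used up before the call starts */
    return (remaining > 0) ? remaining : 0;
}
```

<P>
For the figures in the text, a 10 ms timeout with 8 ms already elapsed gives
2 ms, and with 12 ms already elapsed it gives 0, which is the "return
immediately" case.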

<P>In almost all cases I/O timeouts should be used only for detecting a
broken network connection or for preventing a peer from holding an idle
connection for too long. Therefore, for most applications realistic I/O
timeouts should be on the order of seconds. Furthermore, there's
probably no point in retrying operations that time out. Rather than
retrying, simply use a larger timeout in the first place.

<P>The largest valid timeout value is platform-dependent and may be
significantly less than <TT>INT_MAX</TT> seconds for <TT>select()</TT>
or <TT>INT_MAX</TT> milliseconds for <TT>poll()</TT>. Generally, you
should not use timeouts exceeding several hours. Use
<tt>ST_UTIME_NO_TIMEOUT</tt> (<tt>-1</tt>) as a special value to
indicate infinite timeout or indefinite sleep. Use
<tt>ST_UTIME_NO_WAIT</tt> (<tt>0</tt>) to indicate no waiting at all.

<P>
<HR>
<P>
</BODY>
</HTML>