[Cover Feature]
RISE OF MULTIPROCESSING/MULTITHREADING SHARPENS FOCUS ON INTERRUPTS
Using multiprocessing and multithreading architectures together helps generate higher performance in a range of applications.
Staff
ED Online ID #17861
October 11, 2007
by William E. Lamie, Express Logic Inc.
Potentially substantial performance
gains from the
use of multithreading and
multiprocessing architectures
have captured the attention
of designers of consumer devices
and other electronic products.
Multithreading uses cycles when
the processor would otherwise sit
idle to process instructions from
other threads. Multiprocessing,
on the other hand, introduces
additional independent processing
elements in order to execute
threads or applications concurrently.
Embedded applications
running in multiprocessor and
multithreading architectures,
just like those running in conventional
applications, require
interrupt service routines (ISRs)
to handle interrupts generated
by external events.
One key challenge for designers
implementing these new technologies
is avoiding the situation
where one thread is interrupted
while modifying a critical data
structure, leaving a different
thread free to make conflicting
changes to the same structure.
Conventional applications overcome
this problem by briefly locking
out interrupts while an ISR or
system service modifies the crucial
data structures.
In a multithreaded or multiprocessing
application, this
approach isn’t sufficient because
of the potential for a switch to a
different thread context (TC), or
access by a different processing
element that’s not impeded by the
interrupt lockout. A more comprehensive
approach is required,
such as disabling multithreading
or halting other processing elements
while the data structure is
being modified.
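
The two-level protection described above can be sketched in C. This is only an illustration under stated assumptions: the article mentions disabling multithreading or halting other processing elements, whereas this sketch substitutes a spinlock built on C11 atomics, a common alternative for excluding other contexts or cores. The names irq_lockout, irq_restore, and ready_list_insert are hypothetical, and the interrupt functions are stubs that only record state so the code can run on any host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the platform's interrupt-control primitives.
   On real hardware these would mask and restore interrupts on the local
   core; here they only record state so the sketch can run anywhere. */
static bool irq_enabled = true;

static bool irq_lockout(void)            /* returns the previous state */
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;
}

/* One lock shared by every thread context or core that touches the list. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

struct ready_list {
    int count;
    int checksum;   /* should always equal the sum of the inserted values */
};

static struct ready_list list;

/* Insert a value while excluding both local interrupts and other
   thread contexts or cores. */
void ready_list_insert(int value)
{
    bool was_enabled = irq_lockout();    /* step 1: keep local ISRs out */

    while (atomic_flag_test_and_set_explicit(&list_lock,
                                             memory_order_acquire)) {
        /* step 2: spin until no other context holds the lock */
    }

    assert(!irq_enabled);                /* still locked out while updating */
    list.count += 1;
    list.checksum += value;

    atomic_flag_clear_explicit(&list_lock, memory_order_release);
    irq_restore(was_enabled);
}

int ready_list_count(void)    { return list.count; }
int ready_list_checksum(void) { return list.checksum; }
```

The interrupt lockout alone covers only the local context; the lock is what keeps a second thread context or core from seeing the list between the two field updates.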
IMPROVING PERFORMANCE
Manufacturers of consumer
devices and other embedded
computing products are eagerly
adding new features, such as
Wi-Fi, VoIP, Bluetooth, and video.
Historically, increased feature sets
have been accommodated by
ramping up the processor’s clock
speed. In the embedded space,
this approach rapidly loses viability
because most devices are
already running up against
power consumption and real-estate
constraints that limit additional
processor speed increases.
Clock-speed increases drive disproportionately
greater power consumption,
making high clock speeds
unmanageable for more and
more embedded applications.
In addition, processors are
already so much faster than
memory that more than half the
cycles in many applications are
spent waiting while the cache
line is refilled. Each time there’s
a cache miss or another condition
that requires off-chip memory
access, the processor needs to
load a cache line from memory,
write those words into the cache,
update the translation lookaside
buffer (TLB), write the old cache
line into memory, and resume
the thread. MIPS Technologies
stated that a high-end synthesizable
core taking 25 cache misses per
thousand instructions (a plausible
value for multimedia
code) could be stalled
more than 50% of the time
if it must wait 50 cycles for
a cache fill.
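
The stall figure can be checked with a short calculation. The sketch below assumes the miss rate is expressed per thousand instructions and that the core otherwise retires one instruction per cycle; the function name stall_fraction and the base_cpi parameter are illustrative and do not come from MIPS Technologies.

```c
#include <assert.h>

/* Fraction of all cycles spent stalled on cache fills.
   base_cpi is an assumption: cycles per instruction when nothing misses. */
double stall_fraction(double misses_per_kilo_instr,
                      double fill_penalty_cycles,
                      double base_cpi)
{
    assert(base_cpi > 0.0);
    double busy_cycles  = 1000.0 * base_cpi;   /* per 1,000 instructions */
    double stall_cycles = misses_per_kilo_instr * fill_penalty_cycles;
    return stall_cycles / (busy_cycles + stall_cycles);
}
```

With 25 misses per thousand instructions and a 50-cycle fill, the stall time is 1,250 cycles against 1,000 busy cycles, so stall_fraction(25.0, 50.0, 1.0) comes to about 0.56, in line with the "more than 50%" claim.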
MULTITHREADING
APPROACH
Multithreading solves this
problem by using the cycles
that the processor would
otherwise waste while waiting
for memory access to run
other concurrent threads of program
execution. When one
thread stalls waiting for
memory, another thread
immediately presents itself
to the processor. This helps
keep computing resources
fully occupied.
Notably, conventional
processors can’t use this
approach because it takes
a large number of cycles to
switch from one TC to
another. Multiple application
threads must be immediately
available and “ready-to-run” on
a cycle-by-cycle basis for this
approach to work. MIPS accommodates
this requirement through
its incorporation of multiple TCs,
each of which can retain the
context of a distinct application
thread (Fig. 1).
In a multithreaded environment
such as the MIPS 34K processor,
performance can be substantially
improved—when one thread
waits for a memory access,
another thread can use that
processor cycle that would
otherwise be wasted.
Figure 1 shows how multithreading
can speed up
an application. With just
Thread0 running, only five
out of 13 processor cycles
are used for instruction
execution and the rest are
spent waiting for the word
to be loaded into cache
from memory. In this case,
when using conventional
processing, the efficiency
is only 38%. Adding
Thread1 makes it possible
to use five additional
processor cycles that were
previously wasted. With
10 out of 13 processor
cycles now used, efficiency
improves to 77%, thus
providing a 100%
speedup over the base
case. By adding Thread2,
it becomes possible to fully load
the processor. Instructions are
able to be executed on 13 out of
13 cycles for 100% efficiency.
All told, this represents a 160%
speedup (2.6 times the throughput)
when compared to the base case.
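
The efficiency and speedup figures for the 13-cycle window follow from simple ratios. The sketch below is a minimal illustration; the function names are hypothetical, and speedup is taken as the percentage gain over the base case, so that doubling the number of useful cycles counts as a 100% speedup.

```c
#include <assert.h>

/* Share of processor cycles that execute instructions. */
double efficiency(int busy_cycles, int total_cycles)
{
    assert(total_cycles > 0 && busy_cycles >= 0 && busy_cycles <= total_cycles);
    return (double)busy_cycles / total_cycles;
}

/* Speedup as a percentage gain over the base case: 2x throughput -> 100%. */
double speedup_percent(int busy_cycles, int base_busy_cycles)
{
    assert(base_busy_cycles > 0);
    return ((double)busy_cycles / base_busy_cycles - 1.0) * 100.0;
}
```

For the single-thread case, efficiency(5, 13) is about 0.38. Two threads give efficiency(10, 13) of about 0.77 and speedup_percent(10, 5) of 100, while three threads fill all 13 cycles, which is 13/5 = 2.6 times the base throughput.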
MULTIPROCESSING
APPROACH
Multiprocessing, on the other
hand, combines multiple processing
units (each capable of
running a separate concurrent
thread) into a single system.
Often, they’re combined on
a single die, as is the case in
ARM’s MPCore multiprocessor.
In the MPCore’s symmetric multiprocessing
(SMP) configuration,
the individual processor cores are
connected using a high-speed
bus. They share memory and
peripherals using a common bus
interface. Generally, the SMP system
runs a single instance of the
real-time operating system (RTOS)
that manages all “n” of the
processor cores. The RTOS
ensures that the n highest-priority
ready threads are running at any given time.
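
The scheduling rule for an SMP system can be sketched as a selection of the highest-priority ready threads, one per core. This is a simplified illustration, not the scheduler of any particular RTOS; the struct thread layout, the pick_running function, and the convention that a lower number means higher priority are all assumptions, and thread ids are assumed to be unique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct thread {
    int  id;
    int  priority;   /* assumption: lower number means higher priority */
    bool ready;
};

/* Choose up to n_cores ready threads with the highest priority and write
   their ids to running[]. Returns how many cores were given a thread. */
size_t pick_running(const struct thread *threads, size_t n_threads,
                    int *running, size_t n_cores)
{
    assert(threads != NULL && running != NULL);
    size_t picked = 0;

    while (picked < n_cores) {
        const struct thread *best = NULL;

        for (size_t i = 0; i < n_threads; i++) {
            const struct thread *t = &threads[i];
            if (!t->ready)
                continue;

            /* Skip threads that already hold a core. */
            bool already = false;
            for (size_t j = 0; j < picked; j++)
                if (running[j] == t->id)
                    already = true;
            if (already)
                continue;

            if (best == NULL || t->priority < best->priority)
                best = t;
        }

        if (best == NULL)
            break;   /* fewer ready threads than cores */
        running[picked++] = best->id;
    }
    return picked;
}
```

A real RTOS would typically keep per-priority ready queues rather than scanning every thread, but the outcome is the same: with two cores and ready threads at priorities 5, 1, and 2, the threads at priorities 1 and 2 are the ones that run.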
The primary software challenge
in a multiprocessor system is partitioning
the design and adding
tasks. The primary hardware challenge
is finding the right infrastructure
to ensure high-bandwidth
communications among processors,
memory, and peripherals.