Multi-core

A multicore processor is a chip with more than one processing unit (core). Typically, each core can execute multiple instructions at the same time and has its own cache.

The limitations of single-processor architecture


- Narrow data bandwidth and a big gap between CPU speed and memory speed: it has been reported that, on average, 75% of CPU time is spent waiting for memory access results.
- High frequencies limit chip size: at 100 GHz a clock cycle lasts 0.01 ns, during which light (~300 mm/ns) travels only 3 mm, so a signal cannot cross a larger chip within one cycle.
- Long pipelines introduce a big penalty for misprediction or wrong speculation
- Higher energy consumption increases heat output and makes cooling difficult
- Poor cost/performance ratio
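As a quick check of the arithmetic in the second bullet, the sketch below computes the cycle time and the distance light covers in one cycle at a few clock frequencies. It treats the speed of light in vacuum as the signal speed, which is an optimistic assumption: on-chip wires propagate signals noticeably slower, so the real limit is tighter.

```python
# Upper bound on how far a signal can travel in one clock cycle, assuming it
# moves at the speed of light in vacuum (real on-chip signals are slower).
SPEED_OF_LIGHT_MM_PER_NS = 299.792458  # c expressed in mm/ns


def cycle_time_ns(freq_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds for a clock given in GHz."""
    return 1.0 / freq_ghz


def max_reach_mm(freq_ghz: float) -> float:
    """Distance light covers during one clock cycle, in millimetres."""
    return SPEED_OF_LIGHT_MM_PER_NS * cycle_time_ns(freq_ghz)


for f in (1.0, 10.0, 100.0):
    print(f"{f:5.0f} GHz: cycle {cycle_time_ns(f):.3f} ns, reach {max_reach_mm(f):.1f} mm")
```

At 100 GHz the reach comes out at about 3 mm, matching the figure in the list.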

Multicore architecture is a solution

A multicore architecture is essentially an SMP system implemented on a single VLSI circuit. The goal is to allow greater utilization of thread-level parallelism (TLP), especially for applications that lack sufficient instruction-level parallelism (ILP) to make good use of superscalar processors. Exploiting TLP at the chip level is usually called chip-level multiprocessing (CMP) or chip-level multithreading (CMT).

The characteristics of a CMP system:
- A slow but wide approach: improves the throughput of the whole computer system.
  - Good for transaction processing, database and scientific computing applications.
  - No benefit for a single application that cannot be parallelized (divided up and run as several tasks or threads).
- Better data locality than regular multi-processor architectures
- Better communication between processing units
- Saves space and energy
- Better cost/performance ratio than a single-core processor
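The sub-point about applications that cannot be parallelized can be quantified with Amdahl's law, a standard model not named in the original text: if only a fraction p of a program's single-core run time can be spread across n cores, the best possible speedup is 1 / ((1 - p) + p/n). The sketch below assumes ideal scaling of the parallel part and ignores memory and communication overheads.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup of one program on `cores` cores (Amdahl's law).

    parallel_fraction: share of the single-core run time that can be split
    across cores; the remaining (1 - parallel_fraction) stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# A fully serial program (p = 0) gains nothing from extra cores.
for p in (0.0, 0.5, 0.95):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8)])
```

The serial share caps the speedup at 1 / (1 - p) no matter how many cores are added, which is why CMP helps throughput across many tasks more than it helps a single unparallelized application.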

Example multicore processors

The whole microprocessor industry is moving to multicore designs. The latest versions of most RISC architectures use CMP, and x86 processors are following; examples include:
- PA-RISC (PA-8800)
- IBM POWER (POWER4 and POWER5)
- Sun Microsystems SPARC (UltraSPARC IV, UltraSPARC T1)
- AMD Opteron (shipping in May 2005)
- Intel Pentium D (shipping in May 2005)
- Cradle Technologies MDSP (CT3400, CT3600)
- Cavium Networks OCTEON (CN3XXX)

Other microprocessor families are also expected to use CMP in future versions. Intel's Itanium is expected to do so in the middle of 2006, with a release codenamed Montecito, and then even more extensively in 2007 with a product codenamed Tukwila.

See also


- Dual-core
- CMP, SMP
- SMT

Processor

Processor can mean:
- A central processing unit of a computer.
- A digital signal processor (DSP).
- An analog audio processor, often used in studios and radio stations.
- A preprocessor, a kind of code generator.
- A person at a processing facility (such as for sorting mail) may also be called a processor.

Cache

In computer science, a cache (pronounced kăsh) is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data are expensive (usually in terms of access time) to fetch or compute relative to reading the cache. Once the data are stored in the cache, future use can be made by accessing the cached copy rather than refetching or recomputing the original data, so that the average access time is lower. Caches have proven extremely effective in many areas of computing, because access patterns in typical computer applications have locality of reference. There are several kinds of locality, but the main ones are that the same data are often used several times, with accesses close together in time (temporal locality), and that data near each other are accessed close together in time (spatial locality).

Operation

A cache is a pool of entries. Each entry has a datum, which is a copy of the datum in some backing store. Each entry also has a tag, which specifies the identity of the datum in the backing store of which the entry is a copy.

When the cache client (a CPU, web browser, or operating system) wishes to access a datum presumably in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired datum, the datum in the entry is used instead. This situation is known as a cache hit. For example, a web browser might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the contents of the web page are the datum. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.

The alternative situation, when the cache is consulted and found not to contain a datum with the desired tag, is known as a cache miss. The datum fetched from the backing store during miss handling is usually inserted into the cache, ready for the next access. If the cache has limited storage, it may have to eject some other entry to make room. The heuristic used to select the entry to eject is known as the replacement policy. One popular replacement policy, least recently used (LRU), replaces the entry that was accessed longest ago.

When a datum is written to the cache, it must at some point be written to the backing store as well. The timing of this write is controlled by what is known as the write policy. In a write-through cache, every write to the cache causes a write to the backing store. Alternatively, in a write-back cache, writes are not immediately mirrored to the store. Instead, the cache tracks which of its locations have been written over (these locations are marked dirty). The data in these locations are written back to the backing store when they are evicted from the cache. For this reason, a miss in a write-back cache will often require two memory accesses to service: one to write back the evicted dirty entry and one to fetch the requested datum. Data write-back may be triggered by other policies as well. The client may make many changes to a datum in the cache, and then explicitly notify the cache to write back the datum.

The data in the backing store may be changed by entities other than the cache, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers that keep the data consistent are known as coherency protocols.
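The hit/miss handling, LRU replacement and write policies described above can be sketched in a few lines of Python. This is a minimal model assuming a plain dict stands in for the backing store; the class `LRUCache` and its methods are illustrative names, not a standard API, and the model ignores coherency between multiple caches.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative cache over a dict backing store (LRU, write-back or write-through)."""

    def __init__(self, backing_store, capacity, write_back=True):
        self.store = backing_store      # the slower "original" data
        self.capacity = capacity
        self.write_back = write_back
        self.entries = OrderedDict()    # tag -> datum, least recently used first
        self.dirty = set()              # tags written to the cache but not yet to the store
        self.hits = 0
        self.misses = 0
        self.store_writes = 0           # writes that actually reached the store

    def _evict_if_full(self):
        if len(self.entries) >= self.capacity:
            tag, datum = self.entries.popitem(last=False)  # least recently used entry
            if tag in self.dirty:       # write-back: flush dirty data on eviction
                self.store[tag] = datum
                self.store_writes += 1
                self.dirty.discard(tag)

    def read(self, tag):
        if tag in self.entries:         # cache hit
            self.hits += 1
            self.entries.move_to_end(tag)
            return self.entries[tag]
        self.misses += 1                # cache miss: fetch from the backing store
        datum = self.store[tag]
        self._evict_if_full()
        self.entries[tag] = datum
        return datum

    def write(self, tag, datum):
        if tag not in self.entries:
            self._evict_if_full()
        self.entries[tag] = datum
        self.entries.move_to_end(tag)
        if self.write_back:
            self.dirty.add(tag)         # defer the store write until eviction
        else:
            self.store[tag] = datum     # write-through: update the store immediately
            self.store_writes += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


store = {"a": 1, "b": 2, "c": 3}
cache = LRUCache(store, capacity=2, write_back=True)
cache.read("a")
cache.read("a")        # hit
cache.read("b")
cache.write("a", 10)   # "a" becomes dirty; the store still holds 1
cache.read("c")        # evicts "b", the least recently used entry (clean)
cache.read("b")        # evicts dirty "a", so 10 is written back before "b" is fetched
print(store, cache.hits, cache.misses, round(cache.hit_rate(), 2))
```

The final read illustrates why a write-back miss can cost two memory accesses: the dirty entry is written to the store and the requested datum is then fetched from it.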

Applications

CPU caches

Small memories on or close to the CPU chip can be made faster than the much larger main memory. Most CPUs since the 1980s have used one or more caches, and modern general-purpose CPUs inside personal computers may have as many as half a dozen, each specialized to a different part of the problem of executing programs.

Disk buffer

Hard disks have historically often been packaged with embedded computers used for control and interface protocols. Since the late 1980s, nearly all disks sold have these embedded computers and either an ATA, SCSI, or Fibre Channel interface. The embedded computer usually has a small amount of memory, the disk buffer (also known as disk cache or cache buffer), which it uses to store the bits going to and coming from the disk platter. The disk buffer is physically distinct from, and used differently than, the page cache typically kept by the operating system in the computer's main memory. The disk buffer is controlled by the embedded computer in the disk drive, and the page cache is controlled by the computer to which that disk is attached. The disk buffer is usually quite small, 2 to 8 MB, whereas the page cache is generally all unused physical memory, which in a 2004 PC may be between 20 and 2000 MB. While data in the page cache are reused multiple times, the data in the disk buffer are typically never reused. In this sense, the phrases disk cache and cache buffer are misnomers, and the embedded computer's memory is more appropriately called the disk buffer. The disk buffer has multiple uses:
- Readahead / readbehind: When executing a read from the disk, the disk arm moves the read/write head to (or near) the correct track, and after some settling time the read head begins to pick up bits. Usually, the first sectors to be read are not the ones that have been requested by the operating system. The disk's embedded computer typically saves these unrequested sectors in the disk buffer, in case the operating system requests them later.
- Speed matching: The speed of the disk's I/O interface to the computer almost never matches the speed at which the bits are transferred to and from the hard disk platter. The disk buffer is used so that both the I/O interface and the disk read/write head can operate at full speed.
- Write acceleration: The disk's embedded computer may signal the main computer that a disk write is complete immediately after receiving the write data, before the data are actually written to the platter. This early signal allows the main computer to continue working, but is somewhat dangerous because, if power is lost before the data are permanently fixed in the magnetic media, the data will be lost from the disk buffer, and the filesystem on the disk may be left in an inconsistent state. Write acceleration is controversial, and for this reason can usually be turned off. On some disks, this vulnerable period between signaling the write complete and fixing the data can be arbitrarily long, as the write can be deferred indefinitely by newly arriving requests. Write acceleration is very rarely used on database servers or other machines where the integrity of the data on the disks is very important. In some cases, write acceleration caching is done by a RAID controller, which uses a battery-backed memory system for caching data.
- Command queueing: Newer SATA and most SCSI disks can accept multiple commands while any one command is in operation. These commands are stored by the disk's embedded computer until they are completed. Should a read reference the data at the destination of a queued write, the write's data will be returned. Command queueing is different from write acceleration in that the main computer's operating system is notified when data are actually written onto the magnetic media. The OS can use this information to keep the filesystem consistent through rescheduled writes.
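The command-queueing rule that a read of a block with a pending write returns the queued data can be modelled as below. This is a hypothetical toy model, not drive firmware or an ATA/SCSI interface; for simplicity it commits queued writes strictly in arrival order, whereas real drives may reorder commands.

```python
from collections import deque


class CommandQueue:
    """Toy model of a drive's command queue (hypothetical, not a real drive API)."""

    def __init__(self, platter):
        self.platter = platter      # block number -> data actually on the media
        self.pending = deque()      # queued (block, data) writes, oldest first

    def queue_write(self, block, data):
        self.pending.append((block, data))

    def read(self, block):
        # Search newest-first so the most recent queued write to this block wins.
        for queued_block, data in reversed(self.pending):
            if queued_block == block:
                return data
        return self.platter.get(block)  # None if the block was never written

    def complete_one(self):
        """Commit the oldest queued write to the media and return its block number.

        With command queueing, the host would be notified only at this point.
        """
        block, data = self.pending.popleft()
        self.platter[block] = data
        return block


q = CommandQueue({7: b"old"})
q.queue_write(7, b"new")
print(q.read(7), q.platter[7])  # read sees the queued data; media not yet updated
q.complete_one()
print(q.platter[7])
```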

Other caches

CPU caches are generally managed entirely by hardware. Other caches are managed by a variety of software. The cache of disk sectors in main memory is usually managed by the operating system kernel or file system. The BIND DNS daemon caches a mapping of domain names to IP addresses, as does a resolver library. Write-through operation is common when operating over unreliable networks (like an Ethernet LAN), because of the enormous complexity of the coherency protocol required between multiple write-back caches when communication is unreliable. For instance, web page caches and client-side network file system caches (like those in NFS or SMB) are typically read-only or write-through specifically to keep the network protocol simple and reliable. A cache of recently visited web pages can be managed by a web browser. Some browsers are configured to use an external proxy web cache, a server program through which all web requests are routed so that it can cache frequently accessed pages for everyone in an organization. Many internet service providers use proxy caches to save bandwidth on frequently accessed web pages. The Google search engine keeps a cached copy of each page it examines on the web. These copies are used by the Google indexing software, but they are also made available to Google users in case the original page is unavailable. If you click on the "Cached" link in a Google search result, you will see the web page as it looked when Google indexed it. Another type of caching is storing computed results that will likely be needed again, or memoization. An example of this type of caching is ccache, a program that caches the output of compilation to speed up subsequent compilations of the same code.
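Memoization inside a single program can be illustrated with Python's standard `functools.lru_cache` decorator, which caches a function's return values keyed by its arguments. The Fibonacci function and the `evaluations` counter are illustrative choices, not taken from the text; ccache applies the same idea to compiler output stored on disk.

```python
from functools import lru_cache

evaluations = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)  # unbounded cache keyed by the argument n
def fib(n: int) -> int:
    """Naively recursive Fibonacci; the cache means each n is computed once."""
    global evaluations
    evaluations += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30), evaluations)  # only 31 evaluations, one per n from 0 to 30
print(fib.cache_info())
```

Without the decorator the same recursion would evaluate the body over a million times for n = 30.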

See also


- Cache algorithms
- Cache coloring
- CPU cache
- Web cache


Speed

Speed (symbol: v) is the rate of motion, or equivalently the rate of change of position, expressed as distance d moved per unit of time t. Speed is a scalar quantity with dimensions distance/time; the equivalent vector quantity to speed is known as velocity. Speed is measured in the same physical units of measurement as velocity, but does not contain the element of direction that velocity has. Speed is thus the magnitude component of velocity. Units of speed include:
- metres per second, (symbol m/s), the SI derived unit
- kilometres per hour, (symbol km/h)
- miles per hour, (symbol mph)
- knots (nautical miles per hour, symbol kt)
- Mach, where Mach 1 is the speed of sound and Mach n is n times as fast: Mach 1 ≈ 343 m/s ≈ 1235 km/h ≈ 768 mph (see the speed of sound for more detail)
- the speed of light in vacuum (symbol c), one of the natural units: c = 299,792,458 m/s

Other important conversions: 1 m/s = 3.6 km/h; 1 mph = 1.609 km/h; 1 knot = 1.852 km/h = 0.514 m/s.

Vehicles often have a speedometer to measure speed. The rate of change of velocity with respect to time is termed acceleration.
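The conversion factors above can be checked with a small sketch that expresses every unit in metres per second. The constant names are illustrative; Mach 1 is taken as 343 m/s to match the approximate value in the list, although the speed of sound actually depends on the medium and its temperature.

```python
# Size of each unit expressed in metres per second (m/s), the SI unit of speed.
MPS = 1.0
KMH = 1000 / 3600        # 1 km/h
MPH = 1609.344 / 3600    # 1 mile per hour (international mile)
KNOT = 1852 / 3600       # 1 knot = 1 nautical mile per hour
MACH1 = 343.0            # approximate speed of sound in air
C = 299_792_458          # speed of light in vacuum (exact by definition)


def convert(value: float, from_unit: float, to_unit: float) -> float:
    """Convert a speed between units given as their size in m/s."""
    return value * from_unit / to_unit


print(round(convert(1, MPS, KMH), 3))      # m/s  -> km/h
print(round(convert(1, MPH, KMH), 3))      # mph  -> km/h
print(round(convert(1, KNOT, MPS), 3))     # knot -> m/s
print(round(convert(MACH1, MPS, KMH), 1))  # Mach 1 -> km/h
```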

Average speed

Speed as a physical property primarily means instantaneous speed. In real life we often use average speed (denoted $\tilde{v}$), which is the ratio of the total distance (or length) travelled to the time interval. For example, if you go 60 miles in 2 hours, your average speed during that time is 60/2 = 30 miles per hour, but your instantaneous speed may have varied. In mathematical notation:

$$\tilde{v} = \frac{\Delta l}{\Delta t}.$$

Instantaneous speed defined as a function of time $v(t)$ on the interval $[t_0, t_1]$ gives the average speed

$$\tilde{v} = \frac{1}{t_1 - t_0}\int_{t_0}^{t_1} v(t)\,dt,$$

while instantaneous speed defined as a function of distance (or length) $v(l)$ on the interval $[l_0, l_1]$ gives the average speed

$$\tilde{v} = \frac{l_1 - l_0}{\int_{l_0}^{l_1} \frac{dl}{v(l)}}.$$

It is often intuitively expected that covering half a distance at speed $v_1$ and the second half at speed $v_2$ produces a total average speed of $\tilde{v} = \frac{v_1 + v_2}{2}$. The correct value is

$$\tilde{v} = \frac{2}{\frac{1}{v_1} + \frac{1}{v_2}} = \frac{2 v_1 v_2}{v_1 + v_2}.$$

(Note that the first is the arithmetic mean, while the second is the harmonic mean.)

Average speed can also be derived from a speed distribution function (either in time or over distance):

$$v \sim D_t \;\Rightarrow\; \tilde{v} = \int v\, D_t(v)\, dv, \qquad v \sim D_l \;\Rightarrow\; \tilde{v} = \frac{1}{\int \frac{D_l(v)}{v}\, dv}.$$
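The half-distance example can be checked numerically. The sketch below, with illustrative function names, computes the average speed from the harmonic-mean formula and directly as total distance over total time, and contrasts both with the intuitive arithmetic mean.

```python
def average_speed_equal_distances(v1: float, v2: float) -> float:
    """Average speed when two equal distances are covered at speeds v1 and v2."""
    return 2 * v1 * v2 / (v1 + v2)  # harmonic mean of v1 and v2


def average_speed_from_legs(legs):
    """General case: legs is a list of (distance, speed) pairs."""
    total_distance = sum(d for d, _ in legs)
    total_time = sum(d / v for d, v in legs)
    return total_distance / total_time


# 60 miles at 30 mph (2 h), then 60 miles at 60 mph (1 h): 120 miles in 3 h.
print(average_speed_equal_distances(30, 60))          # 40.0
print(average_speed_from_legs([(60, 30), (60, 60)]))  # 40.0
print((30 + 60) / 2)                                  # 45.0, the intuitive but wrong value
```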

Cultural significance

Speed or swiftness of motion plays a significant role in human culture; see racing. It is complementary to grace, precision and strength, e.g. in dancing or martial arts. Animals symbolizing speed are the horse (PIE *ek'vos is etymologically derived from *ok'u- "swift"), birds, especially raptors such as the hawk, and cats, e.g. the lynx (see e.g. Flos Duellatorum). The swiftest land animal is the cheetah, reaching running speeds of up to 110 km/h.

See also


- Orders of magnitude (speed)
- Paul Virilio

External links


- [http://calc.skyrocket.de/en/ Online Unit Converter - Conversion of many different units]

Memory

Memory is the ability to retain information, a faculty of the brain. Memory is studied extensively in the fields of cognitive psychology and neuroscience. There are several ways of classifying memories, based on duration, nature and retrieval of information. From an information processing perspective there are three main stages in the formation and retrieval of memory:
- Encoding (processing and combining of received information)
- Storage (creation of a permanent record of the encoded information)
- Retrieval/Recall (calling back the stored information in response to some cue for use in some process or activity)

Classification by duration

A basic and generally accepted classification of memory is based on the duration of memory retention, and identifies three distinct types of memory: sensory memory, short-term memory, and long-term memory. Sensory memory corresponds approximately to the initial moment that an item is perceived. Some of the information in the sensory area proceeds to a further store, referred to as short-term memory. Sensory memory retains information for milliseconds to seconds, and short-term memory for seconds to minutes. These stores are generally characterized as having strictly limited capacity and duration, whereas information that can be retrieved over periods ranging from days to years is held in what is called long-term memory. It may be that short-term memory is supported by transient changes in neuronal communication, whereas long-term memories are maintained by more stable and permanent changes in neural structure that are dependent on protein synthesis. Some psychologists, however, argue that the distinction between long- and short-term memories is arbitrary, and merely reflects differing levels of activation within a single store. If we are given a random seven-digit number, we may remember it for only a few seconds and then forget it (short-term memory). On the other hand, we can remember telephone numbers for many years (assuming we use them often enough). Those long-lasting memories are said to be stored in long-term memory. Additionally, the term working memory is used to refer to the short-term store needed for certain mental tasks; it is not a synonym for short-term memory, since it is defined not in terms of duration but in terms of purpose. Some theories consider working memory to be the combination of short-term memory and some attentional control.
For instance, when we are asked to mentally multiply 45 by 4, we have to perform a series of simple calculations (additions and multiplications) to arrive at the final answer. The ability to store the information regarding the instructions and intermediate results is what is referred to as working memory.

Classification by information type

Long-term memory, the largest part of any model, can be divided into declarative (explicit) and procedural (implicit) memories. Declarative memory requires conscious recall, in that some conscious process must call back the information. It is sometimes called explicit memory, since it consists of information that is explicitly stored and retrieved. Declarative memory can be further subdivided into semantic memory, which concerns facts taken independent of context, and episodic memory, which concerns information specific to a particular context, such as a time and place. Semantic memory allows the encoding of abstract knowledge about the world, such as "Paris is the capital of France". Episodic memory, on the other hand, is used for more personal memories, such as the sensations, emotions, and personal associations of a particular place or time. Autobiographical memory (memory for particular events within one's own life) is generally viewed as either equivalent to, or a subset of, episodic memory. Visual memory is the part of memory that preserves some characteristics of our senses pertaining to visual experience. We are able to place in memory information that resembles objects, places, animals or people as a sort of mental image. Visual memory can result in priming, and it is assumed that some kind of perceptual representational system (PRS) underlies this phenomenon. [http://moodle.ed.uiuc.edu/wiked/index.php/Memory%2C_visual] In contrast, procedural memory (or implicit memory) is not based on the conscious recall of information, but on implicit learning. Procedural memory is primarily employed in learning motor skills and should be considered a subset of implicit memory. It is revealed when we do better in a given task due only to repetition: no new explicit memories have been formed, but we are unconsciously accessing aspects of those previous experiences. Procedural memory involved in motor learning depends on the cerebellum and basal ganglia.
So far, no one has succeeded in isolating the time dependence of these suggested memory structures.

Classification by temporal direction

A further major way to distinguish different memory functions is whether the content to be remembered is in the past, retrospective memory, or whether the content is to be remembered in the future, prospective memory. Thus, retrospective memory as a category includes semantic memory and episodic/autobiographical memory. In contrast, prospective memory is memory for future intentions, or remembering to remember (Winograd, 1988). Prospective memory can be further broken down into event- and time-based prospective remembering. Time-based prospective memories are triggered by a time cue, such as going to the doctor (action) at 4pm (cue). Event-based prospective memories are intentions triggered by cues, such as remembering to post a letter (action) after seeing a mailbox (cue). Cues do not need to be related to the action (as the mailbox example is), and lists, sticky notes, knotted handkerchiefs, or string around the finger are all examples of cues that are produced by people as a strategy to enhance prospective memory.

Physiology

Overall, the mechanisms of memory are not well understood. Brain areas such as the hippocampus, the amygdala, or the mammillary bodies are thought to be involved in certain kinds of memory. For example, the hippocampus is believed to be involved in spatial learning and declarative learning. Studies of damage to certain areas in patients and animal models, and of the memory deficits that follow, are a primary source of information. However, rather than implicating a specific area, the observed deficit could actually be caused by damage to adjacent areas or to a pathway traveling through the area. Further, it is not sufficient to describe memory, and its counterpart, learning, as solely dependent on specific brain regions. Learning and memory are attributed to changes in neuronal synapses, thought to be mediated by long-term potentiation and long-term depression.

Disorders

Much of the current knowledge of memory has come from studying memory disorders. Loss of memory is known as amnesia. There are many sorts of amnesia, and by studying their different forms, it has become possible to observe apparent defects in individual sub-systems of the brain's memory systems, and thus hypothesize their function in the normally working brain.

Artistic connections

Artworks often explore the nature of memory. The film Memento, about a man afflicted with anterograde amnesia, reflects on the nature and meaning of memory, and implications of its loss. The paintings of Howard Hodgkin, while apparently abstract, are said by the artist to be representations of his memories and their emotional associations. The late works of the 20th-century composer Morton Feldman explore the nature of memory and methods through which it can be disorientated. Several works of the Czech author Milan Kundera explore the nature of personal memory in relation to social or historical memory, especially the novels Ignorance, The Book of Laughter and Forgetting, and Immortality.

See also


- Amnesia
- Attention versus memory in prefrontal cortex
- Hebbian learning
- Long-term potentiation
- The Memory-Prediction Framework - an acclaimed "unifying theory of memory"
- Mnemonic
- Muscle memory or proprioception - the sense and memory of how parts of the body are trained to move
- Neural adaptation
- Synaptic plasticity
- List of memory biases

Further reading

Draaisma D (2005) Why Life Speeds Up As You Get Older: how memory shapes our past (CUP, Cambridge).

External links


- [http://www.sense-think-act.org Memory Exercises], a memory wiki
- [http://plato.stanford.edu/entries/memory/ Stanford Encyclopedia of Philosophy entry]
- [http://www.nlm.nih.gov/medlineplus/memory.html Memory-related resources] from the National Institutes of Health.

Symmetric multiprocessing

Symmetric multiprocessing, or SMP, is a multiprocessor computer architecture where two or more identical processors are connected to a single shared main memory. Most common multiprocessor systems today use an SMP architecture. SMP systems allow any processor to work on any task no matter where the data for that task are located in memory; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently. On the downside, memory is much slower than the processors accessing it, and even single-processor machines tend to spend a considerable amount of time waiting for data to arrive from memory. SMP makes this worse, as only one processor can access memory at a time, so several processors may be starved. SMP is only one style of multiprocessor machine; others include NUMA, which dedicates different memory banks to different processors. This allows processors to access memory in parallel, which can dramatically improve memory throughput if the data are localized to specific processes (and thus processors). On the downside, NUMA makes moving data from one processor to another more expensive, meaning that balancing a workload is more expensive. The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users. Other systems include asymmetric multiprocessing (ASMP), in which separate specialized processors are used for specific tasks, and clustered multiprocessing (e.g. Beowulf), in which not all memory is available to all processors. The former is not widely used or supported (though the high-powered 3D chipsets in modern video cards could be considered a form of asymmetric multiprocessing), while the latter is used fairly extensively to build very large supercomputers.
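Because every processor in an SMP system sees the same memory, any processor can pick up the next pending task. A minimal way to mimic that scheduling style in Python is a single shared work queue drained by several threads, as sketched below. The function `run_shared_queue` and the sum-of-squares task are hypothetical stand-ins for real work; note also that in CPython the global interpreter lock keeps pure-Python threads from computing in parallel, so this shows the work-sharing pattern rather than a real speedup.

```python
import queue
import threading


def run_shared_queue(tasks, num_workers):
    """Workers take tasks from one shared queue, so idle workers absorb the load.

    Returns ({task: result}, per-worker task counts).
    """
    work = queue.Queue()
    for t in tasks:
        work.put(t)

    results = {}
    per_worker = [0] * num_workers
    lock = threading.Lock()  # protects the shared results and counters

    def worker(worker_id):
        while True:
            try:
                t = work.get_nowait()  # whichever worker is free takes the next task
            except queue.Empty:
                return
            value = sum(i * i for i in range(t))  # stand-in for real work
            with lock:
                results[t] = value
                per_worker[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results, per_worker


results, per_worker = run_shared_queue(range(100), num_workers=4)
print(len(results), per_worker)
```

How the 100 tasks are split among the four workers varies from run to run, which is exactly the dynamic balancing that a shared memory makes cheap.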
SMP has many uses in science, industry, and business, where software is usually custom programmed for multithreaded processing. However, most consumer products such as word processors and computer games are not written in such a manner that they can gain large benefits from SMP systems. For games this is usually because writing a program to increase performance on SMP systems can produce a performance loss on uniprocessor systems, which make up the largest share of the market. Because the programming methods differ, supporting both uniprocessor and SMP systems with maximum performance would generally require a separate project for each. Programs running on SMP systems do, however, experience a performance increase even when they have been written for uniprocessor systems. This is because hardware interrupts, which usually suspend program execution while the kernel handles them, can be serviced on an idle processor instead. The effect in most applications (e.g. games) is not so much a performance increase as the appearance that the program is running much more smoothly. In some applications, particularly software compilers and some distributed computing projects, one will see an improvement by a factor of (nearly) the number of processors. This of course assumes that the OS supports SMP; if it does not, the additional processors remain idle and the system functions as a uniprocessor system. In cases where many jobs are being processed in an SMP environment, administrators often experience a loss of hardware efficiency. Software programs have been developed to schedule jobs so that processor utilization reaches its maximum potential. Good software packages can achieve this by scheduling each CPU separately, as well as by integrating multiple SMP machines and clusters.
One software package that can do this is [http://www.clusterresources.com/products/moabclustersuite.shtml Moab Cluster Suite], created by Cluster Resources, Inc.

Entry-level servers and workstations with two processors dominate the SMP market today. The most popular entry-level SMP systems use the x86 instruction set architecture and are based on Intel’s Xeon processors or AMD’s Athlon MP, Athlon 64 X2 or Opteron 200 series processors. Other readily available non-x86 processor choices in the same market are the Sun Microsystems UltraSPARC, Fujitsu SPARC64, SGI MIPS, Intel Itanium, Hewlett-Packard PA-RISC, Hewlett-Packard (formerly Compaq, formerly Digital Equipment Corporation) DEC Alpha, IBM POWER and Apple Computer PowerPC processors. In all cases, these systems are available in uniprocessor versions as well. Mid-level servers using between four and eight processors can be found using the Intel Xeon MP, AMD Opteron 800 series and the above-mentioned UltraSPARC, SPARC64, MIPS, Itanium, PA-RISC, Alpha and POWER processors. High-end systems with sixteen or more processors are also available with all of the above processors.

SMP had been around for some years in the RISC market before penetrating the x86 market. With the exception of a few rare 486 systems, the x86 SMP market began with the Intel Pentium processor supporting up to two processors; later the Intel Pentium Pro expanded SMP support to up to four processors natively, and systems with as many as two thousand Pentium Pro processors were built. Later the Intel Pentium II, Intel Pentium III and AMD Athlon MP processors could be used with up to two processors in a system, and the Intel Pentium II Xeon and Intel Pentium III Xeon processors could be used with up to four processors in a system natively. Although several much larger systems were built, they were all limited by the physical memory addressing limit of 64 GiB.
The introduction of 64-bit memory addressing on the recently released AMD64- and EM64T-capable processors allows systems to address much larger amounts of memory; their addressable limit of 16 EiB will not be reached in the foreseeable future.
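The two address-space limits mentioned above follow from powers of two. The sketch assumes byte-addressable memory and that the 64 GiB figure corresponds to 36-bit physical addresses, as provided by Intel's Physical Address Extension (PAE); it treats 16 EiB as the full 2^64-byte space, although actual AMD64 and EM64T processors implement fewer physical address bits.

```python
GiB = 2**30  # gibibyte
EiB = 2**60  # exbibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for the given address width."""
    return 2**address_bits


for bits, unit, name in ((32, GiB, "GiB"), (36, GiB, "GiB"), (64, EiB, "EiB")):
    print(f"{bits}-bit addresses: {addressable_bytes(bits) // unit} {name}")
```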

See also


- Non-Uniform Memory Access

Thread-level parallelism

Thread-level parallelism (TLP) is the parallelism inherent in an application that runs multiple threads at once. This type of parallelism is found largely in applications written for commercial servers such as databases. By running many threads at once, these applications are able to tolerate the high amounts of I/O and memory system latency their workloads can incur.
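The latency-tolerance argument can be illustrated with simulated requests that mostly wait, standing in for disk or network I/O. The sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` to compare one thread with eight; the 50 ms latency and the `fake_request` function are arbitrary assumptions. Sleeping threads release the interpreter lock, so their waits overlap even in CPython.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i, latency=0.05):
    """Stand-in for an I/O-bound request: mostly waiting, very little computing."""
    time.sleep(latency)  # like waiting on a disk or network
    return i * 2


def run(n_requests, n_threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(fake_request, range(n_requests)))
    return results, time.perf_counter() - start


serial_results, serial_time = run(8, 1)
threaded_results, threaded_time = run(8, 8)
print(f"1 thread: {serial_time:.2f} s, 8 threads: {threaded_time:.2f} s")
```

With one thread the eight waits add up; with eight threads they overlap, so total time approaches a single request's latency.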

Superscalar

A superscalar CPU architecture implements a form of parallelism on a single chip, thereby allowing the system as a whole to run much faster than it would otherwise be able to at a given clock speed. The simplest processors are scalar processors. A scalar processor processes one data item at a time. In a vector processor, by contrast, a single instruction operates simultaneously on multiple data items. The difference is analogous to the difference between scalar and vector arithmetic. A superscalar processor is a kind of mixture of the two. Each instruction processes one data item, but there are multiple processing units so that multiple instructions can be processing separate data items at the same time. A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But just processing multiple instructions at the same time does not make an architecture superscalar. Simple pipelining, where a CPU may be loading an instruction while doing arithmetic for the previous one and storing the results from the one before that (thus executing three instructions at the same time), is not superscalar processing. In a superscalar CPU, there are several functional units of the same type, along with additional circuitry to dispatch instructions to the units. For instance, most superscalar designs include more than one integer unit (typically referred to as an ALU). The dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to the available units. Performance of the dispatcher is key to the overall performance of a superscalar design. The task is not a simple one. The instructions a = b + c; d = e + f can be run in parallel because neither result depends on the other calculation. However, the instructions a = b + c; d = a + f cannot simply be issued together, because the second instruction needs the value of a produced by the first.
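The dispatcher's decision for the two examples above can be sketched as a greedy, in-order grouping of instructions into issue cycles. This is a simplified model under stated assumptions: the three-operand `Instr` format and the `dispatch` function are hypothetical, every instruction takes one cycle, and there is no forwarding or out-of-order execution. An instruction joins the current cycle only if it does not read or overwrite a register written earlier in that same cycle.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    """A three-operand instruction of the form dest = src1 + src2."""
    dest: str
    src1: str
    src2: str


def dispatch(program: List[Instr], width: int = 2) -> List[List[Instr]]:
    """Group instructions, in order, into cycles of at most `width` instructions."""
    cycles: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()  # registers written by instructions already in this cycle
    for ins in program:
        hazard = ins.src1 in written or ins.src2 in written or ins.dest in written
        if current and (len(current) == width or hazard):
            cycles.append(current)  # close the cycle and start a new one
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        cycles.append(current)
    return cycles


independent = [Instr("a", "b", "c"), Instr("d", "e", "f")]  # a = b + c; d = e + f
dependent = [Instr("a", "b", "c"), Instr("d", "a", "f")]    # a = b + c; d = a + f
for prog in (independent, dependent):
    print([[f"{i.dest} = {i.src1} + {i.src2}" for i in cycle] for cycle in dispatch(prog)])
```

The independent pair fits in one cycle of a two-wide machine, while the dependent pair needs two; real dispatchers add register renaming and out-of-order issue to find more parallelism than this greedy model does.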
Much of modern CPU design is dedicated to improving the accuracy of the dispatcher and allowing it to keep the multiple units busy at all times. This has become increasingly important as the number of units has grown. Early superscalar CPUs had two ALUs and a single FPU, while a modern design like the PowerPC 970 includes four ALUs and two FPUs, as well as two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system as a whole will suffer greatly.

Seymour Cray's CDC 6600 from 1965 is often mentioned as the first superscalar design, and RISC CPUs took the concept to microcomputers. This was because the RISC design results in a simple core, which leaves room for several execution units on a single chip. This was the reason that RISC designs were faster than CISC designs through the 1980s and into the 1990s. As chip manufacturing processes improved, however, even "complex" designs like the IA-32 were able to go superscalar. Essentially all general-purpose CPUs developed since about 1998 are superscalar.

Dramatic improvements in the quality of the control unit's dispatch logic now appear unlikely, which limits future improvements in the speed of the basic superscalar design. One potential solution is to move the dispatch logic off the chip and into the compiler, which can spend considerably more time and effort on making the best decisions possible. This is the basic premise of very long instruction word (VLIW) CPU designs, also known as static superscalar or compile-time scheduling.
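The VLIW idea of moving scheduling into the compiler can be illustrated with a toy static scheduler that packs an instruction list into fixed-width bundles. This is a hypothetical greedy sketch, not the algorithm of any particular VLIW compiler. It keeps program order, ignores operation latencies, and only starts a new bundle when the current one is full or when an instruction depends on a result produced inside the same bundle.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str
    srcs: tuple[str, ...]

def pack_bundles(program: list[Instr], width: int = 2) -> list[list[Instr]]:
    """Greedy, in-order packing of instructions into fixed-width bundles.

    A new bundle is started when the current one is full, or when the next
    instruction reads or overwrites a register written earlier in the same
    bundle (that result is not available until the bundle completes).
    Unfilled slots would become no-ops in a real long instruction word.
    """
    bundles: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()
    for ins in program:
        depends = ins.dest in written or any(src in written for src in ins.srcs)
        if current and (len(current) == width or depends):
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    Instr("a", ("b", "c")),  # a = b + c
    Instr("d", ("e", "f")),  # d = e + f  (independent of a)
    Instr("g", ("a", "d")),  # g = a + d  (needs both results)
    Instr("h", ("x", "y")),  # h = x + y  (independent)
]
for word in pack_bundles(program):
    print([ins.dest for ins in word])
# ['a', 'd']
# ['g', 'h']
```

Because the scheduling happens once at compile time, the hardware needs no dispatcher to find this parallelism. A real compiler would also reorder instructions and account for latencies to fill more slots.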

See also


- Super-threading
- Simultaneous multithreading
- Speculative execution/Eager execution

External links


- http://www.cs.clemson.edu/~mark/eager.html

Chip-level multiprocessing

Chip-level multiprocessing (also known as chip multiprocessor, or CMP) is SMP implemented on a single VLSI integrated circuit. Multiple processor cores (multicore) typically share a common second- or third-level cache and interconnect. The goal of a CMP system is to allow greater utilization of thread-level parallelism (TLP), especially for applications that lack sufficient instruction-level parallelism (ILP) to make good use of superscalar processors.

The latest versions of most RISC architectures use CMP, including PA-RISC (PA-8800), IBM POWER (POWER4 and POWER5), and SPARC (UltraSPARC IV). Other microprocessor families are also expected to use CMP in future versions. Intel's Itanium is expected to do so in the second half of 2005 with a release codenamed Montecito, and then even more extensively in 2007 with a product codenamed Tukwila. Intel's Pentium is expected to start using CMP in 2006, while AMD's Opteron is expected to incorporate the technique in mid-2005.

There is some controversy as to whether multiple cores on a chip are the same thing as multiple processors, and major technology providers are divided on this issue. IBM considers its dual-core POWER4 and POWER5 to be two processors, just packaged together. Sun Microsystems, in contrast, considers its UltraSPARC IV to be a multi-threaded rather than a multi-processor chip. Intel considers its CMP designs to be a single processor. This is not an idle debate, because software is often more expensive when licensed for more processors. In October 2004, Microsoft announced that it would treat multicore chips as "1 processor" rather than "N processor" chips for purposes of licensing its server software. Microsoft made no specific declaration regarding its desktop software (including its enormously popular Microsoft Windows and Microsoft Office products), but presumably it would license them similarly.
Microsoft's decision is the opposite of that made by important competitors such as Oracle Corporation and IBM's Software Group. Many other independent software vendors have yet to declare their approach, and many are likely to follow Microsoft's lead given its bellwether status. Whatever a processor manufacturer, software vendor, or user chooses for licensing and marketing purposes, a symmetric multiprocessing (SMP) operating system is required to take full advantage of a CMP chip. That is, the OS (the entity responsible for scheduling and coordinating system resources) must do the same work for CMP chips as it does for discrete multiple processors.

See also


- SMT, a complementary technique.
- multicore processors

Computer system

A computer system consists of a set of hardware and software that processes data in a meaningful way. The personal computer, or PC, exemplifies a relatively simple computer system, and the Internet exemplifies a relatively complex one. A computer is a machine that accepts data, performs operations on it, and presents the results. Even the simplest computer qualifies as a computer system, because at least two components (hardware and software) have to work together. But the real meaning of "computer system" comes with interconnection. Many computer systems can interconnect, that is, join to become a bigger system. Interconnecting computer systems can prove difficult due to incompatibilities, sometimes between differing hardware and sometimes between different software suites. Designers of individual computer systems do not necessarily aim to interconnect their product with any other system. Systems administrators can nevertheless often configure even disparate computers to communicate using a set of rules and constraints known as protocols. These precisely define the "outside view" of the system, which effectively determines how one system connects with another. If two systems define the same "outside view", they can interconnect and become a larger computer system. This "outside view" usually takes the form of a standard, that is, a document explaining all of the rules a device or a program must follow. International bodies such as the IETF or IEEE normally set up or endorse such standards. If an individual system obeys all of the rules, systems designers say it "complies with" the standard.

See also


- Computer
- IETF
- IEEE standards
- Legacy system
- Embedded system

Database

A database is an organized collection of data. The term originated within the computer industry, but its meaning has been broadened by popular use, to the extent that the European Database Directive (which creates intellectual property rights for databases) includes non-electronic databases within its definition. This article is confined to a more technical use of the term; though even amongst computing professionals, some attach a much wider meaning to the word than others. One possible definition is that a database is a collection of records stored in a computer in a systematic way, such that a computer program can consult it to answer questions. For better retrieval and sorting, each record is usually organized as a set of data elements (facts). The items retrieved in answer to queries become information that can be used to make decisions. The computer program used to manage and query a database is known as a database management system (DBMS). The properties and design of database systems are included in the study of information science. The central concept of a database is that of a collection of records, or pieces of knowledge. Typically, for a given database, there is a structural description of the type of facts held in that database: this description is known as a schema. The schema describes the objects that are represented in the database, and the relationships among them. There are a number of different ways of organizing a schema, that is, of modelling the database structure: these are known as database models (or data models). The model in most common use today is the relational model, which in layman's terms represents all information in the form of multiple related tables each consisting of rows and columns (the true definition uses mathematical terminology). This model represents relationships by the use of values common to more than one table. 
Other models such as the hierarchical model and the network model use a more explicit representation of relationships. Strictly speaking, the term database refers to the collection of related records, and the software should be referred to as the database management system or DBMS. When the context is unambiguous, however, many database administrators and programmers use the term database to cover both meanings. Many professionals would consider a collection of data to constitute a database only if it has certain properties: for example, if the data is managed to ensure its integrity and quality, if it allows shared access by a community of users, if it has a schema, or if it supports a query language. However, there is no agreed definition of these properties. Database management systems are usually categorized according to the data model that they support: relational, object-relational, network, and so on. The data model will tend to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

History

The earliest known use of the term data base was in June 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (Databank, a comparable term, had been used in the Washington Post newspaper as early as 1966.) The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product. The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL systems and IMS were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley, and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. 
The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems. During the 1980s, research activity focused on distributed database systems and database machines, but these developments had little effect on the market. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice. In the 1990s, attention shifted to object-oriented databases. These had some success in fields that needed to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software engineering repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result, and the independent object database vendors largely disappeared from the scene. In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of startup companies, while the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, so that all of an organization's information resources can be held in one place, whether they are highly structured or not.

Database models

Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known. A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as selection, projection, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.
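Selection, projection, and join can be sketched directly over in-memory rows. The function names and the sample `employees`/`locations` data below are hypothetical, and rows are modelled as Python dictionaries. These are toy versions of the relational operations, not how any DBMS implements them internally.

```python
from typing import Callable

Row = dict[str, object]

def select(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows: list[Row], columns: list[str]) -> list[Row]:
    """Projection: keep only the named columns and drop duplicate rows
    (a relation is a set, so repeated tuples collapse into one)."""
    seen: set[tuple] = set()
    result: list[Row] = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result

def join(left: list[Row], right: list[Row], column: str) -> list[Row]:
    """Equijoin on a shared column: conceptually the cross product,
    filtered to the pairs whose values in `column` agree."""
    return [{**a, **b} for a in left for b in right if a[column] == b[column]]

employees = [{"name": "Ada", "loc": 1}, {"name": "Bob", "loc": 2}, {"name": "Cy", "loc": 1}]
locations = [{"loc": 1, "city": "Oslo"}, {"loc": 2, "city": "Lima"}]

# "Names of employees based in Oslo": join, then select, then project.
oslo_staff = project(select(join(employees, locations, "loc"),
                            lambda row: row["city"] == "Oslo"), ["name"])
print(oslo_staff)  # [{'name': 'Ada'}, {'name': 'Cy'}]
```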

Flat model

Some would disagree that this qualifies as a data model, as defined above. The flat (or table) model consists of a single, two-dimensional array of data elements. All members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, a system security database might have columns for name and password, with each row holding the password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating-point numbers. This model is, incidentally, a basis of the spreadsheet.

Network model

The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets. The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values. Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.
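The navigational style of the network model can be sketched with records that hold direct links to one another. Everything below is hypothetical: `Record`, `connect`, `Cursor`, the `DEPT_EMP` set name, and the data. The cursor methods are only loosely modelled on CODASYL's "find next member" and "find owner" navigation, and Python object references stand in for on-disk pointers.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    """A network-model record: its fields plus its links in named sets."""
    fields: dict
    members: dict = field(default_factory=dict)                          # set name -> member records (as owner)
    owners: dict = field(default_factory=dict, repr=False, compare=False)  # set name -> owner record (as member)

def connect(owner: Record, set_name: str, member: Record) -> None:
    """Insert `member` into the occurrence of set `set_name` owned by `owner`."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner

class Cursor:
    """Navigational access: a current position within one set occurrence."""

    def __init__(self, owner: Record, set_name: str) -> None:
        self.owner = owner
        self.set_name = set_name
        self.position = -1  # before the first member

    def find_next(self) -> Optional[Record]:
        """Advance to the next member, or return None at the end of the set."""
        members = self.owner.members.get(self.set_name, [])
        self.position += 1
        return members[self.position] if self.position < len(members) else None

def find_owner(member: Record, set_name: str) -> Optional[Record]:
    """Follow the link from a member back to its owner in the given set."""
    return member.owners.get(set_name)

sales = Record({"dept": "Sales"})
ada, bob = Record({"name": "Ada"}), Record({"name": "Bob"})
connect(sales, "DEPT_EMP", ada)
connect(sales, "DEPT_EMP", bob)

cursor = Cursor(sales, "DEPT_EMP")
names = []
while (emp := cursor.find_next()) is not None:
    names.append(emp.fields["name"])
print(names)                                      # ['Ada', 'Bob']
print(find_owner(bob, "DEPT_EMP").fields["dept"])  # Sales
```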

Relational model

The relational model was introduced by E. F. Codd in a 1970 academic paper (http://www.acm.org/classics/nov95/toc.html) as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory. The products generally referred to as relational databases (for example, Ingres, Oracle, DB2, and SQL Server) in fact implement a model that only approximates the mathematical model defined by Codd. The data structures in these products are tables rather than relations. The main differences are that tables can contain duplicate rows, and that the rows (and columns) can be treated as ordered. The same criticism applies to the SQL language, which is the primary interface to these products. There has been considerable controversy, raised mainly by Codd himself, as to whether it is correct to describe SQL implementations as "relational". Nevertheless, the world does so, and the following description uses the term in its popular sense. A relational database contains multiple tables, each similar to the one in the "flat" database model. Relationships between tables are not defined explicitly; instead, keys are used to match up rows of data in different tables. A key is a collection of one or more columns in one table whose values match corresponding columns in other tables. For example, an Employee table may contain a column named Location whose value matches the key of a Location table. Any column can be a key, or multiple columns can be grouped together into a single key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one. A key that can be used to uniquely identify a row in a table is called a unique key. Typically, one of the unique keys is the preferred way to refer to a row; this is defined as the table's primary key.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number), is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)
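The key concepts above can be tried out with SQLite through Python's standard `sqlite3` module. The schema and data are hypothetical, following the Employee/Location example. `INTEGER PRIMARY KEY` gives each row a generated key, `UNIQUE` enforces the natural key, and the `Location` column of `Employee` matches the key of the `Location` table. SQLite records the `REFERENCES` clause but only enforces it when `PRAGMA foreign_keys = ON` is set.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Location (
            location_id INTEGER PRIMARY KEY,  -- generated key, assigned by SQLite
            city        TEXT NOT NULL UNIQUE  -- natural key, also unique
        );
        CREATE TABLE Employee (
            employee_id INTEGER PRIMARY KEY,  -- generated key (an employee ID number)
            name        TEXT NOT NULL,        -- natural attribute, not unique
            location    INTEGER REFERENCES Location(location_id)
        );
    """)
    conn.executemany("INSERT INTO Location (city) VALUES (?)", [("Oslo",), ("Lima",)])
    conn.executemany("INSERT INTO Employee (name, location) VALUES (?, ?)",
                     [("Brown", 1), ("Brown", 2), ("Smith", 1)])
    conn.commit()
    return conn

conn = make_db()
# Two employees named Brown remain distinguishable through the generated key.
print(conn.execute(
    "SELECT employee_id, name FROM Employee WHERE name = 'Brown' ORDER BY employee_id"
).fetchall())  # [(1, 'Brown'), (2, 'Brown')]
# The Location column of Employee matches the key of the Location table.
print(conn.execute("""
    SELECT e.name, l.city FROM Employee AS e
    JOIN Location AS l ON e.location = l.location_id
    WHERE e.employee_id = 2""").fetchall())  # [('Brown', 'Lima')]
```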

Relational operations

Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. (Many web sites — including MediaWiki which is the engine that runs Wikipedia — perform SQL queries when generating pages.) In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables gets combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the "cross-product"), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques. The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made the idea and implementation of relational databases very popular with businesses.
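The "cross product, then filter" description of a join can be checked against SQLite's ordinary `JOIN` syntax. The tables and rows are hypothetical. Both queries return the same result set, and the engine's optimizer is free to compute either one without materializing the full cross product.

```python
import sqlite3

def join_demo() -> tuple[int, list, list]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Location (location_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE Employee (name TEXT, location INTEGER);
        INSERT INTO Location VALUES (1, 'Oslo'), (2, 'Lima');
        INSERT INTO Employee VALUES ('Ada', 1), ('Bob', 2), ('Cy', 1);
    """)
    # The cross product pairs every employee with every location: 3 x 2 rows.
    (pairs,) = conn.execute("SELECT COUNT(*) FROM Employee, Location").fetchone()
    # Conceptual join: take the cross product, keep only the matching pairs.
    filtered = conn.execute("""
        SELECT e.name, l.city FROM Employee AS e, Location AS l
        WHERE e.location = l.location_id ORDER BY e.name""").fetchall()
    # Usual JOIN syntax: the same answer.
    joined = conn.execute("""
        SELECT e.name, l.city FROM Employee AS e
        JOIN Location AS l ON e.location = l.location_id ORDER BY e.name""").fetchall()
    conn.close()
    return pairs, filtered, joined

pairs, filtered, joined = join_demo()
print(pairs)               # 6
print(filtered)            # [('Ada', 'Oslo'), ('Bob', 'Lima'), ('Cy', 'Oslo')]
print(joined == filtered)  # True
```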

Dimensional model

The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It's important that measures can be meaningfully aggregated - for example, the revenue from different locations can be added together. In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary. The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema. A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling.
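A minimal star schema and an OLAP-style summary can be sketched with SQLite. The table names (`fact_sales`, `dim_location`, `dim_date`) and all figures are made up for illustration. The query chooses two dimensions, groups the facts by them, and adds up the revenue measure.

```python
import sqlite3

def sales_summary() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: the context of each fact.
        CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY,
                                   building TEXT, state TEXT, country TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        -- Fact table: key columns pointing at the dimensions plus an additive measure.
        CREATE TABLE fact_sales (loc_id INTEGER, date_id INTEGER, revenue REAL);

        INSERT INTO dim_location VALUES (1, 'B1', 'CA', 'US'), (2, 'B2', 'NY', 'US'),
                                        (3, 'B3', 'JAL', 'MX');
        INSERT INTO dim_date VALUES (1, 2004), (2, 2005);
        INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 70.0),
                                      (1, 2, 30.0), (3, 2, 20.0);
    """)
    # OLAP-style query: choose dimensions (country, year), group, and sum the measure.
    rows = conn.execute("""
        SELECT l.country, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_location AS l USING (loc_id)
        JOIN dim_date AS d USING (date_id)
        GROUP BY l.country, d.year
        ORDER BY l.country, d.year""").fetchall()
    conn.close()
    return rows

for row in sales_summary():
    print(row)
# ('MX', 2004, 70.0)
# ('MX', 2005, 20.0)
# ('US', 2004, 150.0)
# ('US', 2005, 30.0)
```

Summarizing at a coarser level (by country only, say) just means grouping by fewer dimension columns. This works because revenue can be meaningfully added.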

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example, as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases. A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages cannot find objects based on their information content. Others have attacked the problem from the database end. They define an object-oriented data model for the database, together with a database programming language that offers full programming capabilities as well as traditional query facilities. Object databases suffered from a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications, usually specialized ones such as engineering or molecular biology databases rather than mainstream commercial data processing. Object database ideas were also picked up by the relational vendors and influenced extensions made to their products and to the SQL language itself.
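The impedance mismatch can be seen in the hand-written mapping code an application needs when its objects live in a relational table. This sketch uses a hypothetical `Employee` class and `employee` table with SQLite. It is not an object database; it shows the object-to-row and row-to-object conversion that object databases aim to make unnecessary.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class Employee:
    employee_id: int
    name: str
    salary: float

    def give_raise(self, percent: float) -> None:
        # Behaviour lives on the object; the table row only stores its state.
        self.salary = self.salary * (1 + percent / 100)

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
    return conn

def save(conn: sqlite3.Connection, emp: Employee) -> None:
    # object -> row: flatten the attributes into a tuple of column values
    conn.execute("INSERT OR REPLACE INTO employee VALUES (?, ?, ?)", astuple(emp))

def load(conn: sqlite3.Connection, employee_id: int) -> Employee:
    # row -> object: rebuild an object from the column values
    row = conn.execute(
        "SELECT employee_id, name, salary FROM employee WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()
    if row is None:
        raise KeyError(employee_id)
    return Employee(*row)

conn = connect()
ada = Employee(1, "Ada", 1000.0)
ada.give_raise(50)
save(conn, ada)
print(load(conn, 1))  # Employee(employee_id=1, name='Ada', salary=1500.0)
```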

Database Internals

Indexing

All of these kinds of database can take advantage of indexing to increase their speed, and this technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Various methods of indexing are commonly used; B-trees, hashes, and linked lists are all common indexing techniques. Relational DBMSs have the advantage that indices can be created or dropped without changing existing applications. This is because applications do not use the indices directly; instead, the database software decides on the application's behalf which indices to use. The database chooses between many different strategies based on which one it estimates will run the fastest. Relational DBMSs use many different algorithms to compute the result of an SQL statement. The RDBMS produces a plan of how to execute the query by estimating the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are Nested Loops Join, Sort-Merge Join and Hash Join.
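The index and join algorithms named above can be sketched in their textbook in-memory form. `build_index` is a hypothetical stand-in for a B-tree that uses a sorted list with binary search. The three join functions are simplified versions of Nested Loops Join, Hash Join, and Sort-Merge Join; real systems work on disk pages, spill to disk when memory runs out, and choose among these algorithms using cost estimates.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def build_index(rows: list[dict], column: str) -> list[tuple]:
    """A minimal sorted index: (value, row position) pairs ordered by value."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def index_lookup(index: list[tuple], value) -> list[int]:
    """Positions of rows whose indexed column equals `value`, found by binary search."""
    lo = bisect_left(index, (value, -1))
    hi = bisect_right(index, (value, float("inf")))
    return [pos for _, pos in index[lo:hi]]

def nested_loops_join(left, right, lkey, rkey):
    """Compare every left row with every right row."""
    return [(a, b) for a in left for b in right if a[lkey] == b[rkey]]

def hash_join(left, right, lkey, rkey):
    """Build a hash table on one input, then probe it with each row of the other."""
    buckets = defaultdict(list)
    for b in right:
        buckets[b[rkey]].append(b)
    return [(a, b) for a in left for b in buckets.get(a[lkey], [])]

def sort_merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then advance through them in step."""
    ls = sorted(left, key=lambda row: row[lkey])
    rs = sorted(right, key=lambda row: row[rkey])
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        a, b = ls[i][lkey], rs[j][rkey]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Collect the run of equal keys on each side and pair them all up.
            i_end, j_end = i, j
            while i_end < len(ls) and ls[i_end][lkey] == a:
                i_end += 1
            while j_end < len(rs) and rs[j_end][rkey] == a:
                j_end += 1
            out.extend((x, y) for x in ls[i:i_end] for y in rs[j:j_end])
            i, j = i_end, j_end
    return out

employees = [{"name": "Ada", "loc": 1}, {"name": "Bob", "loc": 2},
             {"name": "Cy", "loc": 1}, {"name": "Dee", "loc": 3}]
locations = [{"loc": 1, "city": "Oslo"}, {"loc": 2, "city": "Lima"}]

print(index_lookup(build_index(employees, "loc"), 1))  # [0, 2]
for join_fn in (nested_loops_join, hash_join, sort_merge_join):
    pairs = join_fn(employees, locations, "loc", "loc")
    print(join_fn.__name__, sorted((emp["name"], loc["city"]) for emp, loc in pairs))
```

All three joins produce the same set of pairs; they differ only in cost. Nested loops touches every pair, a hash join needs memory for the hash table, and sort-merge pays for sorting but then makes a single pass over each input.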

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules, summarized here:
- Atomicity - Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
- Consistency - Every transaction must preserve the integrity constraints (the declared consistency rules) of the database. It cannot place the data in a contradictory state.
- Isolation - Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
- Durability - Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance. Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
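Atomicity and consistency can be demonstrated with SQLite from Python. In the hypothetical `account` table, a `CHECK` constraint plays the role of a declared consistency rule. Python's `sqlite3` connection, used as a context manager, commits when the block succeeds and rolls back when an exception escapes it. The transfer deliberately credits before it debits, so a failed debit shows the rollback undoing the earlier update.

```python
import sqlite3

def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is a declared consistency rule: no negative balances.
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY,"
                 " balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = open_bank()
print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 30}
```

In the failed transfer, bob's credit had already been applied when alice's debit violated the constraint. The rollback removed it, so either both updates happen or neither does.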

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve the performance or availability of the whole database system. Common replication concepts include:
- Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves
- Quorum: The result of read and write requests is calculated by querying a "majority" of replicas.
- Multimaster: Two or more replicas sync each other via a transaction identifier.
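The quorum idea can be sketched as follows. Writes must reach W of N replicas, reads consult R replicas, and W + R > N guarantees that every read set overlaps the most recent write set (a majority for both, as in the description above, is the special case W = R = 2 of N = 3). `QuorumStore` and its methods are hypothetical, and the single version counter assumes one coordinator issues all writes. Real systems also handle failures, concurrent writers, and repair of stale replicas.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Replica:
    version: int = 0
    value: Any = None

class QuorumStore:
    """N replicas; writes need W acknowledgements, reads consult R replicas."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if w + r <= n:
            raise ValueError("need W + R > N so every read quorum overlaps every write quorum")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.clock = 0  # simplifying assumption: one coordinator issues versions

    def write(self, value: Any, targets: list[int]) -> None:
        reached = set(targets)
        if len(reached) < self.w:
            raise RuntimeError("write quorum not reached")
        self.clock += 1
        for i in reached:
            self.replicas[i] = Replica(self.clock, value)

    def read(self, targets: list[int]) -> Any:
        reached = set(targets)
        if len(reached) < self.r:
            raise RuntimeError("read quorum not reached")
        # At least one replica in any read quorum saw the latest write;
        # the highest version number identifies it.
        newest = max((self.replicas[i] for i in reached), key=lambda rep: rep.version)
        return newest.value

store = QuorumStore(n=3, w=2, r=2)   # majorities of 3 replicas
store.write("x", targets=[0, 1])     # replica 2 missed this write
print(store.read(targets=[1, 2]))    # x  (replica 1 has the newer version)
store.write("y", targets=[1, 2])     # replica 0 still holds "x"
print(store.read(targets=[0, 2]))    # y
```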

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common application programming interface (API) to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

Common Database Brands

(In alphabetical order)
- 4D
- Corel Paradox
- DB2
- FileMaker Pro
- FirebirdSQL
- Informix
- MS Access
- MS SQL Server
- MySQL
- Oracle
- PostgreSQL
- Sybase SQL Server

See also


- Client-Server
- Database dump
- Database management system
- Data Manipulation Language
- Database normalization
- Databases in the United Kingdom
- Deadlock
- Deductive database
- Dimensional database
- Distributed database
- Entity-relationship model
- Flat file database
- Hierarchic Database
- Key field
- Main Memory database
- MUMPS
- Multidimensional hierarchical toolkit
- Multidimensional database
- OLAP
- Recordset : dynaset, snapshot
- Relational model
- SQL (Structured Query Language)
- Object database
- Important publications in databases
- Redundancy (databases)
- Software engineering and List of software engineering topics
- Temporal database
- Very large database


RISC

Reduced Instruction Set Computer (RISC) is a microprocessor CPU design philosophy that favors a smaller and simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM PowerPC. The idea was inspired by the discovery that many of the features included in traditional CPU designs to facilitate coding were being ignored by the programs running on them. These more complex features also took several processor cycles to perform. In addition, the speed of the CPU relative to the memory it accessed was increasing. This led to a number of techniques to streamline processing within the CPU, while at the same time attempting to reduce the total number of memory accesses.

RISC design philosophy

In the late 1970s, research at IBM (and similar projects elsewhere) demonstrated that most programs ignored the majority of the "orthogonal" addressing modes offered by CISC CPUs. This was a side effect of the increasing use of compilers to generate programs, as opposed to writing them in assembly language. The compilers of the time had only a limited ability to take advantage of the features provided by CISC CPUs, largely because such compilers were difficult to write. The market was clearly moving to even wider use of compilers, diluting the usefulness of these orthogonal modes even more. Another discovery was that, because these complex operations were rarely used, they tended to be slower than a sequence of smaller operations doing the same thing. This seeming paradox was a side effect of the time spent designing the CPUs: designers simply did not have time to tune every possible instruction, and instead tuned only the most used ones. One famous example was the VAX's INDEX instruction, which ran slower than a loop implementing the same code. At about the same time, CPUs started to run faster than the memory they talked to. Even in the late 1970s it was apparent that this disparity would continue to grow for at least the next decade, by which time the CPU would be tens to hundreds of times faster than the memory. It became apparent that more registers (and later caches) would be needed to support these higher operating frequencies. These additional registers and cache memories would require sizeable chip or board area, which could be made available if the complexity of the CPU was reduced. Yet another part of RISC design came from practical measurements on real-world programs. Andrew Tanenbaum summed up many of these, demonstrating that most processors were vastly overdesigned.
For instance, he showed that 98% of all the constants in a program would fit in 13 bits. Yet almost every CPU design dedicated some multiple of 8 bits to storing them, typically 8, 16 or 32 (one entire word). This suggests that a machine should allow constants to be stored in unused bits of the instruction itself, decreasing the number of memory accesses. Instead of being loaded from memory or registers, constants would be "right there" when the CPU needed them, and therefore much faster to use. However, this required the rest of the instruction encoding to be very compact; otherwise there would not be enough room left in a 32-bit instruction to hold reasonably sized constants. The small number of addressing modes and commands is what gave rise to the term Reduced Instruction Set. The term is not accurate, as RISC designs often have large instruction sets of their own. The real difference is the philosophy of doing everything in registers and loading and saving the data to and from them. This is why the design is more properly referred to as load-store. Over time, the older design technique became known as Complex Instruction Set Computer, or CISC, although this was largely to give it a different name for comparison purposes. Thus the RISC philosophy was to make smaller instructions, implying fewer of them, and hence the name "reduced instruction set". Code was implemented as a series of these simple instructions instead of a single complex instruction that had the same result. This had the side effect of leaving more room in the instruction to carry data with it, so there was less need to use registers or memory. At the same time, the memory interface was considerably simpler, allowing it to be tuned. However, RISC also had its drawbacks. Since a series of instructions is needed to complete even simple tasks, the total number of instructions read from memory is larger, and reading them therefore takes longer.
At the time it was not clear whether or not there would be a net gain in performance due to this limitation, and there was an almost continual battle in the press and design world about the RISC concepts.
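To make Tanenbaum's observation concrete, the Python sketch below packs a constant directly into a 32-bit instruction word, using a 13-bit signed immediate field as the measurement above suggests. The field layout (9-bit opcode, two 5-bit register fields, 13-bit immediate) and the function names are hypothetical, chosen only so the fields add up to 32 bits; they do not describe any particular real instruction set.

```python
# Hypothetical 32-bit layout (not a real ISA):
#   opcode[31:23] (9 bits) | rd[22:18] (5) | rs1[17:13] (5) | imm[12:0] (13, signed)
IMM_BITS = 13


def fits_signed(value: int, bits: int = IMM_BITS) -> bool:
    """True if `value` fits in a two's-complement field of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= value <= hi


def encode(opcode: int, rd: int, rs1: int, imm: int) -> int:
    """Pack an instruction whose constant operand is stored in the word itself."""
    if not (0 <= opcode < (1 << 9) and 0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("opcode or register field out of range")
    if not fits_signed(imm):
        # Larger constants need a multi-instruction sequence or a memory load.
        raise ValueError(f"{imm} does not fit in {IMM_BITS} signed bits")
    imm_field = imm & ((1 << IMM_BITS) - 1)  # keep the low 13 bits (two's complement)
    return (opcode << 23) | (rd << 18) | (rs1 << 13) | imm_field


def decode_imm(word: int) -> int:
    """Recover the signed constant by sign-extending the low 13 bits."""
    imm = word & ((1 << IMM_BITS) - 1)
    if imm & (1 << (IMM_BITS - 1)):  # sign bit set: the value is negative
        imm -= 1 << IMM_BITS
    return imm


word = encode(opcode=0x12, rd=1, rs1=2, imm=-5)
print(hex(word), decode_imm(word))  # prints: 0x9045ffb -5
```

Constants outside the range -4096 to 4095 are rejected here; a real compiler would instead build them from several instructions or load them from memory, which is the slower path the 98% figure says is rarely needed.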

Pre-RISC design philosophy

In the early days of the computer industry, compiler technology did not exist. Programming was done in either machine code or assembly language. To make programming easier, computer architects created more and more complex instructions that were direct representations of high-level functions of high-level programming languages. The attitude at the time was that hardware design was easier than compiler design, so the complexity went into the hardware.

Another force that encouraged complex instructions was the lack of large memories. Since memories were small, it was advantageous for the density of information held in computer programs to be very high. When every byte of memory was precious (an entire system might have only a few kilobytes of storage), the industry moved toward features such as highly encoded instructions, variable-sized instructions, instructions that performed multiple operations, and instructions that did both data movement and data calculation. At that time, such instruction-packing issues had higher priority than the ease of decoding the instructions. Memory was not only small but also rather slow, since it was implemented using magnetic technology at the time. That was another reason to keep the density of information very high: dense packing reduced how often this slow resource had to be accessed.

CPUs had few registers for two reasons:
- Bits in internal CPU registers are always more expensive than bits in external memory. The available level of silicon integration of the day meant large register sets would have been burdensome to the available chip or board area.
- Having a large number of registers would have required a large number of instruction bits (using precious RAM) to be used as register specifiers.

For these reasons, CPU designers tried to make instructions that would do as much work as possible. This led to a single instruction that would do all of the work of an addition: load the two numbers to be added, add them, and then store the result directly back to memory. Another version would read the two numbers from memory but store the result in a register. Another version would read one number from memory and the other from a register and store the result to memory again. And so on. This processor design philosophy eventually became known as Complex Instruction Set Computer (CISC).

The general goal at the time was to provide every possible addressing mode for every instruction, a principle known as "orthogonality". This led to some complexity in the CPU, but in theory each possible command could be tuned individually, making the design faster than if the programmer used simpler commands.

The ultimate expression of this sort of design can be seen at two ends of the power spectrum: the 6502 at one end and the VAX at the other. The $25 single-chip 6502 effectively had only a single register, and by careful tuning of the memory interface it was still able to outperform designs running at much higher speeds (like the 4 MHz Zilog Z80). The VAX was a minicomputer whose initial implementation required three racks of equipment for a single CPU, and it was notable for the amazing variety of memory access styles it supported, every one of which was available for every instruction.
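The cost of orthogonality grows multiplicatively: if every operand of an instruction may use any addressing mode, the number of variants is the number of modes raised to the number of operands. The sketch below, assuming a three-operand ADD (two sources and a destination) and illustrative mode names, enumerates the register/memory combinations described above and compares them with a richer, hypothetical mode set and with the single register-only form a load-store design would allow.

```python
from itertools import product


def orthogonal_variants(modes, operands=("src1", "src2", "dst")):
    """All addressing-mode combinations a fully orthogonal instruction must support."""
    return [dict(zip(operands, combo)) for combo in product(modes, repeat=len(operands))]


# Each operand of ADD may be a register or a memory location, as in the text.
add_variants = orthogonal_variants(["reg", "mem"])
for variant in add_variants:
    print(variant)
print(len(add_variants), "variants")  # prints: 8 variants

# A hypothetical, richer mode set in the spirit of a VAX-style design.
rich_variants = orthogonal_variants(["reg", "mem", "reg_indirect", "indexed"])
print(len(rich_variants), "variants")  # prints: 64 variants

# A load-store design offers ADD in a single form: all operands in registers.
load_store_add = orthogonal_variants(["reg"])
print(len(load_store_add), "variant")  # prints: 1 variant
```

Eight variants for a two-mode machine become 64 with four modes, and in principle each one was something the designers had to implement and tune.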

Meanwhile...

While the RISC philosophy was coming into its own, new ideas about how to dramatically increase CPU performance were starting to develop. In the early 1980s it was thought that existing designs were reaching theoretical limits, and that future improvements in speed would come primarily through improved semiconductor "process", that is, smaller features (transistors and wires) on the chip. The complexity of the chip would remain largely the same, but the smaller size would allow it to run at higher clock rates. A considerable amount of effort was put into designing chips for parallel computing, with built-in communications links: instead of making faster chips, a large number of chips would be used, dividing problems among them. However, history has shown that the original fears were not valid, and a number of ideas dramatically improved performance in the late 1980s.

One idea was to include a pipeline, which breaks instructions down into steps and works on one step of several different instructions at the same time. A normal processor might read an instruction, decode it, fetch the memory the instruction asked for, perform the operation, and then write the results back out. The key to pipelining is the observation that the processor can start reading the next instruction as soon as it finishes reading the last, meaning that there are now two instructions being worked on (the new one is being read while the previous one is being decoded), and after another cycle there will be three. While no single instruction is completed any faster, each following instruction completes right after the previous one. The result is the appearance of a much faster system and more efficient utilization of processor resources.

Yet another solution was to use several processing elements inside the processor and run them in parallel.
Instead of working on one instruction to add two numbers, these superscalar processors would look at the next instruction in the pipeline and attempt to run it at the same time in an identical unit. However, this can be difficult to do, as many instructions depend on the results of earlier instructions. Both of these techniques relied on increasing speed by adding complexity to the basic layout of the CPU, rather than to the instructions running on it. With chip space being finite, something else would have to be removed to make room for these features. RISC was tailor-made to take advantage of these techniques, because the core logic of a RISC CPU was considerably simpler than in CISC designs. Although the first RISC designs had marginal performance, they were able to quickly add these new design features, and by the late 1980s they were significantly outperforming their CISC counterparts. In time, CISC designs caught up as process improvements made it possible to add all of these features to a CISC design and still fit it on a single chip, but this took most of the late 1980s and early 1990s. The upshot is that, for any given level of general performance, a RISC chip will typically have far fewer transistors dedicated to core logic. This gives designers considerable flexibility; they can, for instance:
- increase the size of the register set
- implement measures to increase internal parallelism
- increase the size of caches
- add other functionality, like I/O and timers for microcontrollers
- add vector (SIMD) processors like AltiVec and Streaming SIMD Extensions (SSE)
- build the chips on older fabrication lines, which would otherwise go unused
- do nothing; offer the chip for battery-constrained or size-limited applications

Features generally found in RISC designs are:
- uniform instruction encoding (for example the op-code is always in the same bit position in each instruction, which is always one word long), which allows faster decoding;
- a homogeneous register set, allowing any register to be used in any context and simplifying compiler design (although there are almost always separate integer and floating point register files);
- simple addressing modes (complex addressing modes are replaced by sequences of simple arithmetic instructions);
- few data types supported in hardware (for example, some CISC machines had instructions for dealing with byte strings, and others had support for polynomials and complex numbers; such instructions are unlikely to be found on a RISC machine).

RISC designs are also more likely to feature a Harvard memory model, where the instruction stream and the data stream are conceptually separated. This means that writing to the memory where code is held might not have any effect on the instructions executed by the processor (because the CPU has separate instruction and data caches), at least until a special synchronization instruction is issued. On the upside, this allows both caches to be accessed simultaneously, which can often improve performance.

Many of these early RISC designs also shared a less welcome feature, the branch delay slot. A branch delay slot is an instruction space immediately following a jump or branch. The instruction in this space is executed whether or not the branch is taken (in other words, the effect of the branch is delayed). This instruction keeps the ALU of the CPU busy for the extra time normally needed to perform a branch. Nowadays the branch delay slot is considered an unfortunate side effect of a particular strategy for implementing some RISC designs, and modern RISC designs generally do away with it (such as PowerPC, more recent versions of SPARC, and MIPS).
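The delay-slot behavior can be modelled with a tiny interpreter that keeps two program counters, the current one (`pc`) and the next one (`npc`), an approach SPARC makes explicit with its PC and nPC registers. In this sketch a taken branch only redirects `npc`, so the instruction already sitting after the branch still executes. The mnemonics (`li`, `addi`, `j`, `beqz`, `halt`), the tuple encoding and the `delay_slot` switch are hypothetical and only loosely MIPS-like.

```python
# Toy program with hypothetical, loosely MIPS-like mnemonics.
PROGRAM = [
    ("li", 1, 0),        # 0: r1 = 0
    ("j", 3),            # 1: jump to instruction 3
    ("addi", 1, 1, 1),   # 2: delay slot: r1 += 1
    ("addi", 1, 1, 10),  # 3: r1 += 10
    ("halt",),           # 4
]


def run(program, delay_slot=True, max_steps=1000):
    """Execute `program`; return (registers, indices of executed instructions)."""
    regs = [0] * 8
    pc, npc = 0, 1  # current and next program counter
    trace = []
    for _ in range(max_steps):
        op, *args = program[pc]
        trace.append(pc)
        if op == "halt":
            return regs, trace
        next_pc, next_npc = npc, npc + 1  # default: fall through
        taken, target = False, None
        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "j":
            taken, target = True, args[0]
        elif op == "beqz":
            rs, target = args
            taken = regs[rs] == 0
        else:
            raise ValueError(f"unknown op {op!r}")
        if taken:
            if delay_slot:
                # The instruction at npc (the delay slot) still runs next;
                # only the one after it comes from the branch target.
                next_npc = target
            else:
                next_pc, next_npc = target, target + 1
        pc, npc = next_pc, next_npc
    raise RuntimeError("step limit exceeded")


regs, trace = run(PROGRAM)
print(regs[1], trace)  # prints: 11 [0, 1, 2, 3, 4]
regs, trace = run(PROGRAM, delay_slot=False)
print(regs[1], trace)  # prints: 10 [0, 1, 3, 4]
```

With the delay slot, instruction 2 runs even though the jump at instruction 1 skips over it, so `r1` ends at 11; without it, `r1` ends at 10. A compiler for such a machine tries to fill the slot with a useful instruction, or a no-op if it cannot find one.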

Early RISC

The first system that would today be known as RISC was not recognized as such at the time: it was the CDC 6600 supercomputer, designed in 1964 by Jim Thornton and Seymour Cray. Thornton and Cray designed it as a number-crunching CPU (with 74 op-codes, compared with the 8086's 400) plus 12 simple computers called "peripheral processors" to handle I/O (most of the operating system ran on one of these). The CDC 6600 had a load/store architecture with only two addressing modes. There were eleven pipelined functional units for arithmetic and logic, plus five load units and two store units (the memory had multiple banks so all load/store units could operate at the same time). The basic clock cycle/instruction issue rate was 10 times faster than the memory access time. Another early load/store machine was the Data General Nova minicomputer, designed in 1968.

The most public RISC designs, however, were the results of university research programs run with funding from the DARPA VLSI Program. The VLSI Program, practically unknown today, led to a huge number of advances in chip design, fabrication, and even computer graphics.

UC Berkeley's RISC project started in 1980 under the direction of David Patterson, based on gaining performance through the use of pipelining and an aggressive use of registers known as register windows. In a normal CPU one has a small number of registers, and a program can use any register at any time. In a CPU with register windows, there are a huge number of registers, for example 128, but programs can only use a small number of them, for example 8, at any one time. A program that limits itself to 8 registers per procedure can make very fast procedure calls: the call and the return simply move the window to the set of 8 registers used by that procedure. (On a normal CPU, most calls "flush" the contents of the registers to RAM to clear enough working space for the subroutine, and the return "restores" those values.)

The RISC project delivered the RISC-I processor in 1982. Consisting of only 44,420 transistors (compared with averages of about 100,000 in newer CISC designs of the era), RISC-I had only 32 instructions, and yet completely outperformed any other single-chip design. The project followed this up with the 40,760-transistor, 39-instruction RISC-II in 1983, which ran over three times as fast as RISC-I.

At about the same time, John L. Hennessy started a similar project called MIPS at Stanford University in 1981. MIPS focused almost entirely on the pipeline, making sure it could be run as "full" as possible. Although pipelining was already in use in other designs, several features of the MIPS chip made its pipeline far faster. The most important, and perhaps most annoying, of these features was the demand that all instructions be able to complete in one cycle. This demand allowed the pipeline to be run at much higher speeds (there was no need for induced delays) and is responsible for much of the processor's speed. However, it also had the negative side effect of eliminating many potentially useful instructions, like a multiply or a divide.

The earliest attempt to make a chip-based RISC CPU was a project at IBM which started in 1975, predating both of the projects above. Named after the building where the project ran, the work led to the IBM 801 CPU family, which was used widely inside IBM hardware. The 801 was eventually produced in single-chip form as the ROMP in 1981, which stood for Research (Office Products Division) Mini Processor. As the name implies, this CPU was designed for "mini" tasks, and when IBM released the IBM RT-PC based on the design in 1986, the performance was not acceptable. Nevertheless, the 801 inspired several research projects, including new ones at IBM that would eventually lead to their POWER system.

In the early years, the RISC efforts were well known but largely confined to the university labs that had created them. The Berkeley effort became so well known that it eventually lent its name to the entire concept. Many in the computer industry argued that the performance benefits were unlikely to translate into real-world settings due to the decreased memory efficiency of using multiple instructions, and that this was why no one was using them. But starting in 1986, all of the RISC research projects started delivering products. In fact, many later RISC processors drew directly on the RISC-II design.
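The register-window scheme described above can be sketched as a register file with a current window pointer: a call or return just moves the pointer, and memory is touched only when every window is already in use. The class below follows the simplified, non-overlapping description in the text (128 registers, windows of 8). Real designs such as SPARC overlap neighboring windows so arguments pass without copying, and the spill/fill policy here (save the oldest window on overflow, reload it on return) is an assumption made for illustration; real hardware typically hands overflow to a trap handler.

```python
class WindowedRegisterFile:
    """Toy model of non-overlapping register windows (hypothetical sizes)."""

    def __init__(self, total=128, window=8):
        self.regs = [0] * total
        self.window = window
        self.num_windows = total // window
        self.cwp = 0        # current window pointer
        self.resident = 1   # frames whose registers are currently on chip
        self.depth = 0      # call nesting depth below the outermost frame
        self.saved = []     # spilled windows, most recent last
        self.spills = 0
        self.fills = 0

    def _window_slice(self, w):
        start = w * self.window
        return slice(start, start + self.window)

    def _index(self, r):
        if not 0 <= r < self.window:
            raise IndexError(f"register r{r} is outside the {self.window}-register window")
        return self.cwp * self.window + r

    def read(self, r):
        return self.regs[self._index(r)]

    def write(self, r, value):
        self.regs[self._index(r)] = value

    def call(self):
        """Switch to the next window; spill the oldest frame only if all windows are full."""
        nxt = (self.cwp + 1) % self.num_windows
        if self.resident == self.num_windows:
            self.saved.append(self.regs[self._window_slice(nxt)])  # copy to "memory"
            self.spills += 1
            self.resident -= 1
        self.cwp = nxt
        self.resident += 1
        self.depth += 1

    def ret(self):
        """Switch back to the caller's window, reloading it if it was spilled."""
        if self.depth == 0:
            raise RuntimeError("cannot return from the outermost frame")
        self.cwp = (self.cwp - 1) % self.num_windows
        self.resident -= 1
        self.depth -= 1
        if self.resident == 0:
            self.regs[self._window_slice(self.cwp)] = self.saved.pop()
            self.fills += 1
            self.resident = 1


rf = WindowedRegisterFile()
for _ in range(15):
    rf.call()
print(rf.spills)  # prints: 0 (main plus 15 nested frames fill the 16 windows)
rf.call()
print(rf.spills)  # prints: 1 (the 16th nested call needs a 17th window)
```

With the default sizes, calls and returns up to 15 levels deep never touch memory; only deeper nesting forces a spill, which is the case the text contrasts with a normal CPU that saves registers on most calls.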

Later RISC

Berkeley's research was not directly commercialized, but the RISC-II design was used by Sun Microsystems to develop the SPARC, by Pyramid Technology to develop their line of mid-range multi-processor machines, and by almost every other company a few years later. It was Sun's use of a RISC chip in their new machines that demonstrated that RISC's benefits were real; their machines quickly outpaced the competition and essentially took over the entire workstation market.

John Hennessy left Stanford (temporarily) to commercialize the MIPS design, starting the company known as MIPS Computer Systems. Their first design was a second-generation MIPS chip known as the R2000. MIPS designs went on to become some of the most used RISC chips when they were included in the PlayStation and Nintendo 64 game consoles. Today they are among the most common embedded processors for high-end applications.

IBM learned from the RT-PC failure and went on to design the RS/6000 based on their new POWER architecture. They then moved their existing AS/400 systems to POWER chips, and found much to their surprise that even the very complex instruction set ran considerably faster. The result was the new iSeries. POWER also moved "down" in scale to produce the PowerPC design, which eliminated many of the "IBM only" instructions and created a single-chip implementation. For many years the PowerPC was used in all Apple Macintosh machines, and it is one of the most commonly used CPUs for automotive applications (some cars have over 10 of them inside). On June 6, 2005, Apple announced that it would switch to Intel processors, with the first Intel-based Macintosh, built around the Pentium M, expected to go on sale around the beginning of 2006.

Almost all other vendors quickly joined the move to RISC. From the UK, similar research efforts resulted in the INMOS Transputer, the Acorn Archimedes and the Advanced RISC Machine line, which is a huge success today.
Companies with existing CISC designs also quickly joined the revolution. Intel released the i860 and i960 by the late 1980s, although they were not very successful. Motorola built a new design called the 88000 in homage to their famed CISC 68000, but it saw almost no use; they eventually abandoned it and joined IBM to produce the PowerPC. AMD released the 29000, which went on to become the most popular RISC design of the early 1990s.

Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in use. The RISC design technique offers performance even at small sizes, and thus has come to completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are by far the largest market for processors: while a family may own one or two PCs, their cars, cell phones, and other devices may contain a total of dozens of embedded processors. RISC had also completely taken over the market for larger workstations for much of the 1990s. After the release of the Sun SPARCstation, the other vendors rushed to compete with RISC-based solutions of their own. Even the mainframe world is now completely RISC based.

However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel's x86 platform remains the dominant processor architecture (Intel is facing increased competition from AMD, but even AMD's processors implement the x86 platform, or a 64-bit superset known as x86-64). There are three main reasons for this. First, the very large base of proprietary PC applications is written for x86, whereas no RISC platform has a similar installed base; this locked PC users into x86 despite its lack of performance. Second, although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel took advantage of its large market by spending enormous amounts of money on processor development.
Intel could spend many times as much as any RISC manufacturer on improvements in design and manufacturing, making up for inherent flaws in the basic x86 architecture. Third, Intel designers realized that they could apply RISC design philosophies and practices to their own architecture. For example, the P6 core of the Pentium Pro processor and its successors has special functional units that expand, or "crack", most CISC instructions into multiple simpler RISC-like operations. Internally, processors using the P6 core are RISC machines that emulate a CISC architecture.

Consumers are interested in speed, cost per chip, and compatibility with existing software rather than in the cost of developing new chips. This has led to an interesting chain of events. As CPUs grow more advanced, the cost of both development and fabrication of high-end CPUs has exploded. The cost savings RISC gave the CPU designer are now dwarfed by the high cost of developing any modern CPU, and today only the biggest chip makers are capable of making high-performance CPUs. As a result, during the 2000s development of high-performance CPUs on virtually every RISC platform except IBM's POWER/PowerPC has either shrunk greatly (as with SPARC and MIPS) or been abandoned (as with Alpha and PA-RISC). As of 2004, x86 chips were the fastest CPUs on SPECint, having displaced all RISC CPUs, and the fastest CPU on SPECfp was the IBM POWER5. Still, RISC designs have led to a number of successful platforms and architectures, some of the larger ones being:
- MIPS's MIPS line, found in most SGI computers and the PlayStation and Nintendo 64 game consoles
- IBM's POWER series, used in all of their supercomputers and mainframes
- Freescale (formerly Motorola SPS) and IBM's PowerPC (a subset of the POWER architecture), used in Microsoft's Xbox 360, Nintendo's Revolution and Sony's PlayStation 3 game consoles, and, until recently, all Apple Macintosh computers
- Sun's SPARC and UltraSPARC, found in all of their later machines
- Hewlett-Packard's PA-RISC (also known as HP/PA)
- DEC Alpha
- ARM: Palm, Inc. originally used the (CISC) Motorola 680x0 processors in its early PDAs, but now uses (RISC) ARM processors in its latest PDAs; Nintendo uses an ARM7 CPU in the Game Boy Advance and Nintendo DS handheld game systems. The small Korean company Gamepark also markets the GP32, which uses the ARM9 CPU.

Alternative term

Because RISC instruction sets have tended to grow in size over the years, some people have started to use the term "load-store" to describe RISC chips, since this is the key element of all RISC designs. Instead of the CPU itself handling all sorts of addressing modes, a load-store architecture uses a separate unit dedicated to handling very simple forms of load and store operations.
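The load-store idea, and the P6-style "cracking" mentioned earlier, can both be illustrated by splitting a CISC-style instruction with a memory operand into micro-operations in which only loads and stores touch memory. In the Python sketch below, the operand tuples, micro-op format and temporary names (`t0`, `t1`) are hypothetical; x86 register names are used only for flavor, and the sketch does not reflect Intel's actual micro-op encoding. It also accepts a memory-to-memory form, which x86 itself does not allow, to show the general pattern.

```python
import operator

ALU_OPS = {"add": operator.add, "sub": operator.sub}


def crack(op, dst, src):
    """Split `op dst, src` into micro-ops in which only loads and stores touch memory.

    Operands are ("reg", name) or ("mem", address_register)."""
    uops = []
    temps = iter(["t0", "t1"])  # hypothetical internal temporary registers

    if src[0] == "mem":  # memory source: load it into a temporary first
        src_reg = next(temps)
        uops.append(("load", src_reg, src[1]))
    else:
        src_reg = src[1]

    if dst[0] == "mem":  # memory destination: load, operate in registers, store back
        tmp = next(temps)
        uops += [("load", tmp, dst[1]), (op, tmp, tmp, src_reg), ("store", tmp, dst[1])]
    else:
        uops.append((op, dst[1], dst[1], src_reg))
    return uops


def execute(uops, regs, mem):
    """Run micro-ops against a register dict and a memory dict."""
    for uop in uops:
        if uop[0] == "load":
            _, rd, addr_reg = uop
            regs[rd] = mem[regs[addr_reg]]
        elif uop[0] == "store":
            _, rs, addr_reg = uop
            mem[regs[addr_reg]] = regs[rs]
        else:
            op, rd, ra, rb = uop
            regs[rd] = ALU_OPS[op](regs[ra], regs[rb])


uops = crack("add", ("mem", "ebx"), ("reg", "eax"))  # CISC-style: add [ebx], eax
print(uops)
regs, mem = {"eax": 5, "ebx": 100}, {100: 37}
execute(uops, regs, mem)
print(mem[100])  # prints: 42
```

A register-to-register ADD passes through as a single micro-op, while the memory form becomes a load, an add and a store, which is exactly the sequence a load-store program would spell out explicitly.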

See also


- addressing mode
- CISC
- ZISC
- microprocessor
- instruction set architecture
- computer architecture
- Classic RISC pipeline
- John Mashey's comp.arch RISC vs CISC ... 1997 (http://groups.google.com.au/group/comp.arch/msg/e86bb8d069bf56a6)

PA-RISC

PA-RISC is a microprocessor architecture developed by Hewlett-Packard's Systems & VLSI Technology Operation. As the name implies, it is a RISC (Reduced Instruction Set Computing) design, and PA stands for Precision Architecture. The design is also referred to as HP/PA, for Hewlett Packard Precision Architecture. PA is considered by some to stand for Palo Alto, the location of HP's headquarters.