Portal: Computer architecture

In computer engineering, computer architecture is the conceptual design and fundamental operational structure of a computer system.

It is a blueprint and functional description of requirements (especially speeds and interconnections) and design implementations for the various parts of a computer, focusing largely on how the central processing unit (CPU) operates internally and accesses addresses in memory.

It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Computer architecture

In computer engineering, computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems.

When building the computer Z1 in 1936, Konrad Zuse described in two patent applications for his future projects that machine instructions could be stored in the same storage used for data, i.e. the stored-program concept.

In an early research communication about the IBM Stretch, the author noted that his description of formats, instruction types, hardware parameters, and speed enhancements was at the level of “system architecture” – a term that seemed more useful than “machine organization.”[7]

Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints.

Computer architecture prototypes were later physically built in the form of a transistor–transistor logic (TTL) computer—such as the prototypes of the 6800 and the PA-RISC—then tested and tweaked before committing to the final hardware form.

The purpose is to design a computer that maximizes performance while keeping power consumption in check, keeps costs low relative to the amount of expected performance, and is also very reliable.

A good ISA compromises between programmer convenience (how easy the code is to understand), size of the code (how much code is required to do a specific action), cost of the computer to interpret the instructions (more complexity means more hardware needed to decode and execute the instructions), and speed of the computer (with more complex decoding hardware comes longer decode time).

For example, a computer capable of running a virtual machine needs virtual memory hardware so that the memory of different virtual computers can be kept separated.

Computer architectures usually trade off standards, power versus performance, cost, memory capacity, latency (the amount of time it takes for information to travel from one node to another) and throughput.

The 'instruction' in the standard measurements is not a count of the ISA's actual machine language instructions, but a unit of measurement, usually based on the speed of the VAX computer architecture.

Other factors influence speed, such as the mix of functional units, bus speeds, available memory, and the type and order of instructions in the programs.

Performance is affected by a very wide range of design choices — for example, pipelining a processor usually makes latency worse, but makes throughput better.
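As a rough, illustrative sketch of that trade-off (the cycle times and pipeline depth below are made-up numbers, not measurements of any real processor), a short C calculation:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical timings, chosen only to illustrate the trade-off. */
        double single_cycle_ns = 10.0;  /* unpipelined: one instruction per 10 ns */
        int    stages          = 5;     /* pipeline depth */
        double stage_ns        = 2.5;   /* per-stage time, including latch overhead */

        /* Latency of one instruction gets worse: 5 * 2.5 ns = 12.5 ns versus 10 ns,
         * because each stage adds pipeline-register overhead. */
        double pipelined_latency_ns = stages * stage_ns;

        /* Throughput gets better: once the pipeline is full, one instruction
         * completes every 2.5 ns instead of every 10 ns. */
        printf("latency:    %.1f ns -> %.1f ns\n", single_cycle_ns, pipelined_latency_ns);
        printf("throughput: %.2f -> %.2f instructions per ns\n",
               1.0 / single_cycle_ns, 1.0 / stage_ns);
        return 0;
    }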

For example, computer-controlled anti-lock brakes must begin braking within a predictable, short time after the brake pedal is sensed or else failure of the brake will occur.

Furthermore, designers may target and add special features to their products, through hardware or software, that permit a specific benchmark to execute quickly but don't offer similar advantages to general tasks.

Increases in publicly released clock rates have grown slowly over the past few years, compared with the rapid improvements in power consumption reduction and the demand for miniaturization.

The growth of mobile technology has created new demand for longer battery life and reductions in size.

This change in focus from higher clock rates to power consumption and miniaturization can be seen in the significant reductions in power consumption, as much as 50%, that Intel reported with the release of the Haswell microarchitecture; the focus of research and development is shifting away from raising clock rates and toward consuming less power and taking up less space.

The original notes include a series of schematic figures (not reproduced here); their captions are:

Schematic diagram of a modern von Neumann processor, where the CPU is denoted by a shaded box (adapted from [Maf01]).

Register file: (a) block diagram, (b) implementation of two read ports, and (c) implementation of the write port.

Schematic high-level diagram of the MIPS datapath from an implementational perspective. Note that the execute step also includes writing of data back to the register file, which is not shown in the figure for simplicity.

Schematic diagram of a composite datapath for R-format and load/store instructions [MK98].

Schematic diagram of a composite datapath for R-format, load/store, and branch instructions [MK98].

Schematic diagram of the composite datapath for R-format, load/store, and branch instructions with control signals and an extra multiplexer for WriteReg signal generation [MK98].

Schematic diagram of the composite datapath for R-format, load/store, branch, and jump instructions, with control signals, for the multicycle datapath finite-state control.

Fetch, decode, and jump instruction-specific states of the multicycle datapath; figure numbers refer to figures in the textbook [Pat98, MK98].

Finite-state control for the MIPS multicycle datapath, including exception handling [MK98].

For the MIPS multicycle datapath, the average cycles per instruction (CPI) is

CPI = (#Loads · 5 + #Stores · 4 + #ALU instructions · 4 + #Branches · 3 + #Jumps · 3) / (total number of instructions)
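As an illustration of the arithmetic (a sketch with a made-up instruction mix; only the per-class cycle counts 5, 4, 4, 3, and 3 come from the formula above):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical instruction mix for a program run on the MIPS
         * multicycle datapath (counts are made up for illustration). */
        long loads    = 250000;   /* 5 cycles each */
        long stores   = 100000;   /* 4 cycles each */
        long alu_ops  = 450000;   /* 4 cycles each */
        long branches = 150000;   /* 3 cycles each */
        long jumps    =  50000;   /* 3 cycles each */

        long total_instructions = loads + stores + alu_ops + branches + jumps;
        long total_cycles = loads * 5 + stores * 4 + alu_ops * 4
                          + branches * 3 + jumps * 3;

        double cpi = (double)total_cycles / (double)total_instructions;
        printf("CPI = %.2f\n", cpi);   /* 4.05 for this mix */
        return 0;
    }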

Instruction set architecture

In general, ISAs define the supported data types, what state there is (such as the main memory and registers) and their semantics (such as the memory consistency and addressing modes), the instruction set (the set of machine instructions that comprises a computer's machine language), and the input/output model.

An instruction set architecture is distinguished from a microarchitecture, which is the set of processor design techniques used, in a particular processor, to implement the instruction set.

By contrast, the SPREAD compatibility objective (which led to the IBM System/360) postulated a single architecture for a series of five processors spanning a wide range of cost and performance.

None of the five engineering design teams could count on being able to bring about adjustments in architectural specifications as a way of easing difficulties in achieving cost and performance objectives.[1]:p.137

Some virtual machines that support bytecode as their ISA, such as Smalltalk, the Java virtual machine, and Microsoft's Common Language Runtime, implement this by translating the bytecode for commonly used code paths into native machine code.

A reduced instruction set computer (RISC) simplifies the processor by efficiently implementing only the instructions that are frequently used in programs, while the less common operations are implemented as subroutines, having their resulting additional processor execution time offset by infrequent use.[2]

Other types include very long instruction word (VLIW) architectures, and the closely related long instruction word (LIW) and explicitly parallel instruction computing (EPIC) architectures.

Complex instructions are typified by those that take multiple steps, control multiple functional units, or otherwise appear on a larger scale than the bulk of simple instructions implemented by the given processor.

RISC instruction sets generally do not include ALU operations with memory operands, or instructions to move large blocks of memory, but most RISC instruction sets include SIMD or vector instructions that perform the same arithmetic operation on multiple pieces of data at the same time.

On traditional architectures, an instruction includes an opcode that specifies the operation to perform, such as add contents of memory to register—and zero or more operand specifiers, which may specify registers, memory locations, or literal data.
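For instance, in the 32-bit MIPS R-format, the opcode occupies the top 6 bits and the register specifiers follow in fixed fields; the short C sketch below decodes those fields from one example instruction word (the word shown encodes add $s0, $s1, $s2).

    #include <stdio.h>
    #include <stdint.h>

    /* MIPS R-format field layout:
     *   [31:26] opcode  [25:21] rs  [20:16] rt  [15:11] rd  [10:6] shamt  [5:0] funct */
    int main(void) {
        uint32_t insn = 0x02328020;   /* example word: add $s0, $s1, $s2 */

        uint32_t opcode = (insn >> 26) & 0x3F;
        uint32_t rs     = (insn >> 21) & 0x1F;
        uint32_t rt     = (insn >> 16) & 0x1F;
        uint32_t rd     = (insn >> 11) & 0x1F;
        uint32_t shamt  = (insn >>  6) & 0x1F;
        uint32_t funct  =  insn        & 0x3F;

        printf("opcode=%u rs=%u rt=%u rd=%u shamt=%u funct=%u\n",
               opcode, rs, rt, rd, shamt, funct);
        return 0;
    }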

For example, a conditional branch instruction is executed and the branch taken if the condition is true, so that execution proceeds to a different part of the program; if the condition is false, the branch is not taken and execution continues sequentially.

Some instruction sets also have conditional moves, so that the move is executed and the data stored in the target location if the condition is true; if the condition is false, the move is not executed and the target location is not modified.
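In source-level terms, a conditional move behaves like a branch-free select; the C sketch below shows the behavior, and a compiler may (but is not guaranteed to) lower such a ternary to a conditional-move instruction on ISAs that provide one.

    #include <stdio.h>

    int main(void) {
        int a = 7, b = 42;
        int cond = (a < b);

        /* Branch-free select: result is a when cond is true, b otherwise.
         * On ISAs with conditional moves, a compiler may implement this with
         * a cmov-style instruction instead of a conditional branch. */
        int result = cond ? a : b;

        printf("result = %d\n", result);
        return 0;
    }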

Due to the large number of bits needed to encode the three registers of a 3-operand instruction, RISC architectures that have 16-bit instructions are invariably 2-operand designs, such as the Atmel AVR, TI MSP430, and some versions of ARM Thumb.
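To see the arithmetic, suppose (as an illustrative assumption) a machine with 16 architectural registers: each register specifier then needs 4 bits, so a 3-operand instruction spends 12 of a 16-bit word's bits on registers and leaves only 4 bits for the opcode, whereas a 2-operand instruction spends 8 bits on registers and leaves 8 bits for the opcode and other fields.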

While embedded instruction sets such as Thumb suffer from extremely high register pressure because they have small register sets, general-purpose RISC ISAs like MIPS and Alpha enjoy low register pressure. CISC ISAs like x86-64 offer low register pressure despite having smaller register sets, due to the many addressing modes and optimizations (such as sub-register addressing, memory operands in ALU instructions, absolute addressing, PC-relative addressing, and register-to-register spills) that CISC ISAs offer.[6]

Some architectures, such as ARM with the Thumb extension, have mixed variable encoding, that is, two fixed encodings (usually 32-bit and 16-bit) where instructions cannot be mixed freely but must be switched between on a branch (or on an exception boundary in ARMv8).

A RISC instruction set normally has a fixed instruction length (often 4 bytes = 32 bits), whereas a typical CISC instruction set may have instructions of widely varying length (1 to 15 bytes for x86).

Fixed-length instructions are less complicated to handle than variable-length instructions for several reasons (for example, not having to check whether an instruction straddles a cache line or virtual memory page boundary[4]).

However, more typical, or frequent, 'CISC' instructions merely combine a basic ALU operation, such as 'add', with the access of one or more operands in memory (using addressing modes such as direct, indirect, indexed, etc.).

However, as RISC computers normally require more and often longer instructions to implement a given task, they inherently make less optimal use of bus bandwidth and cache memories.

Minimal instruction set computers (MISC) are a form of stack machine, where there are few separate instructions (16-64), so that multiple instructions can be fit into a single machine word.

Naturally, due to the interpretation overhead, this is slower than directly running programs on the emulated hardware, unless the hardware running the emulator is an order of magnitude faster.

For example, many implementations of the instruction pipeline only allow a single memory load or memory store per instruction, leading to a load-store architecture (RISC).

For example, to perform digital filters fast enough, the MAC instruction in a typical digital signal processor (DSP) must use a kind of Harvard architecture that can fetch an instruction and two data words simultaneously, and it requires a single-cycle multiply–accumulate multiplier.
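The inner loop of a finite impulse response (FIR) filter shows why: each output sample is a sum of coefficient-times-sample products, i.e. one multiply-accumulate plus two data fetches per tap. The plain C below only states the computation; a DSP with a single-cycle MAC and dual data memories can sustain one tap per cycle, which generic hardware running this code cannot guarantee.

    #include <stdio.h>
    #include <stddef.h>

    /* One output sample of an N-tap FIR filter: y = sum_k coeff[k] * x[n-k].
     * 'x' holds the most recent 'taps' input samples, newest at x[taps-1]. */
    static double fir_sample(const double *coeff, const double *x, size_t taps) {
        double acc = 0.0;
        for (size_t k = 0; k < taps; k++) {
            /* One multiply-accumulate per tap: fetch a coefficient and a data
             * sample, multiply, and add into the running accumulator. */
            acc += coeff[k] * x[taps - 1 - k];
        }
        return acc;
    }

    int main(void) {
        double coeff[4] = { 0.25, 0.25, 0.25, 0.25 };   /* 4-tap moving average */
        double x[4]     = { 1.0, 2.0, 3.0, 4.0 };       /* last four input samples */
        printf("y = %f\n", fir_sample(coeff, x, 4));    /* prints 2.5 */
        return 0;
    }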

Microprocessor Design/Computer Architecture

Reprogramming an early computer meant changing hardware switches manually, which took a long time and was prone to errors.

As the data travels to different parts of the datapath, the command signals from the control unit cause the data to be manipulated in specific ways, according to the instruction.

Many DSPs are modified Harvard architectures, designed to simultaneously access three distinct memory areas: the program instructions, the signal data samples, and the filter coefficients (often called the P, X, and Y memories).

In theory, such three-way Harvard architectures can be three times as fast as a Von Neumann architecture that is forced to read the instruction, the data sample, and the filter coefficient, one at a time.

A modern feature called 'paging' allows the physical memory to be divided into blocks of memory called 'pages'.

CISC systems actually have 'complex instructions', in the sense that at least one instruction takes a long time to execute -- for example, the 'double indirect' addressing mode inherently requires two memory cycles to execute, and a few CPUs have a 'string copy' instruction that may require hundreds of memory cycles to execute.
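In C terms, a 'double indirect' operand is a pointer-to-pointer dereference: one memory access to fetch the inner pointer and a second to fetch the value it points to, which is why the addressing mode inherently costs two memory cycles. A small sketch:

    #include <stdio.h>

    int main(void) {
        int  value = 99;
        int *p     = &value;
        int **pp   = &p;

        /* Double indirection: one memory access loads 'p' from *pp, and a
         * second loads 'value' from *p. An addressing mode that does this in
         * one instruction still needs (at least) two memory cycles. */
        int loaded = **pp;

        printf("loaded = %d\n", loaded);
        return 0;
    }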

Other ISA types include DSPs, stack machines, VLIW machines, MISC machines, TTA architectures, massively parallel processor arrays, etc.

The control unit, as described above, reads the instructions, and generates the necessary digital signals to operate the other components.

The most general meaning is a 'hardware register': anything that can be used to store bits of information in a way that all the bits of the register can be written to or read out simultaneously. Since registers outside of a CPU are also outside the scope of this book, only processor registers are discussed here, that is, hardware registers that happen to be inside a CPU.

The programmer-visible registers, also called the user-accessible registers, also called the architectural registers, often simply called 'the registers', are the registers that are directly encoded as part of at least one instruction in the instruction set.

Some computers have highly specialized registers: memory addresses always come from the program counter, 'the' index register, or 'the' stack pointer.

Other computers have more general-purpose registers: any instruction that accesses memory can use any address register as an index register or as a stack pointer.

Many designers choose to design a CPU with lots of physical registers, using them in ways that make the CPU execute the same instruction set much faster than a CPU that lacks those registers.

The cache is used because reading external memory is very slow (compared to the speed of the processor), and reading a local cache is much faster.

Some computers order their data with the most significant byte of a word in the lowest address, while others order their data with the most significant byte of a word in the highest address.

Computers that order data with the least significant byte in the lowest address are known as 'Little Endian', and computers that order the data with the most significant byte in the lowest address are known as 'Big Endian'.
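A simple way to check which convention the host machine uses is to store a multi-byte integer and inspect its lowest-addressed byte; the sketch below assumes a 32-bit unsigned int, which is typical but not guaranteed by the C standard.

    #include <stdio.h>

    int main(void) {
        unsigned int word = 0x01020304u;            /* assumes 32-bit unsigned int */
        unsigned char *bytes = (unsigned char *)&word;

        /* Little endian: the least significant byte (0x04) is at the lowest
         * address. Big endian: the most significant byte (0x01) comes first. */
        if (bytes[0] == 0x04)
            printf("little endian\n");
        else if (bytes[0] == 0x01)
            printf("big endian\n");
        else
            printf("unexpected byte order\n");
        return 0;
    }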

It is easier for a human (typically a programmer) to view multi-word data dumped to a screen one byte at a time if it is ordered as Big Endian.

When communicating over a network composed of both big-endian and little-endian machines, the network hardware should apply the Address Invariance principle to avoid scrambling text (avoiding the NUXI problem).

High-level software should be written to be 'endian clean': always reading and writing 16-bit integers as whole 16-bit integers, 32-bit integers as whole 32-bit integers, and so on.

Software that is not 'endian clean' -- software that writes integers but then reads them out as 8-bit octets or as integers of some other length -- usually fails when re-compiled for another computer.
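A common way to keep code endian clean across machines is to convert whole integers to a fixed on-the-wire byte order (network byte order, which is big endian) with the standard htonl/ntohl functions from <arpa/inet.h> on POSIX systems.

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl, ntohl (POSIX) */

    int main(void) {
        uint32_t host_value = 0xDEADBEEFu;

        /* Sender side: convert the whole 32-bit integer to network byte order
         * (big endian) before writing it to the wire or to a file. */
        uint32_t wire_value = htonl(host_value);

        /* Receiver side: convert back to host byte order after reading the
         * whole 32-bit integer. Reading it byte-by-byte and reassembling it
         * with host-specific assumptions is what breaks endian-clean code. */
        uint32_t round_trip = ntohl(wire_value);

        printf("0x%08X -> wire 0x%08X -> 0x%08X\n",
               host_value, wire_value, round_trip);
        return 0;
    }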

Related videos:

Advanced CPU Designs: Crash Course Computer Science #9. A look at how CPU speeds have rapidly increased.

Computer Setup for Architects (and Architecture Students). Explains computer components and computer-related questions that architects might have.

computer architecture -- CPU. Walks through the parts of a CPU and how it works from a computer science standpoint.

COA Lecture 2 - CPU, Basic Components of Computer, Main Memory, I/O Modules. Covers the basic components of a computer: the CPU, main memory, and I/O modules.

How a CPU Works. Uncovers the inner workings of the CPU.

Inside your computer - Bettina Bair. Introduces the critical components of a computer and how a computer works.

How Computers Calculate - the ALU: Crash Course Computer Science #5. Covers the ALU, a fundamental part of all modern computers.

The CPU and Von Neumann Architecture. Introduces the CPU, its ALU, CU, and register unit; the three main characteristics of the Von Neumann model; the system clock; and the fetch-execute cycle.

How to Choose a Computer for Architecture. A guide to choosing computers for architecture, whether you're a student, a professional, or in a related discipline.

How a CPU is made. How a CPU is manufactured.