Word

5,556 bytes added, 17:13, 25 May 2018
'''[[Main Page|Home]] * [[Programming]] * [[Data]] * Word'''

A '''Word''', or computer word, is the natural unit of data used by a particular computer architecture. On modern computers the word size is usually a power-of-two multiple of the unit of address resolution, typically a [[Byte]]: two, four, or eight bytes, that is 16, 32, or 64 [[Bit|bits]]. Many other sizes have been used in the past, including 8 (a byte), 9, 12, 18, 24, 36, 39, 40, 48, and 60 bits. Some early computers were decimal rather than binary, with a word size of 10 or 12 decimal digits, and some had no fixed word length at all.

=16-bit Word=
Often the size of a word is fixed for compatibility with earlier computers. [[Intel|Intel's]] [[x86]] and [[x86-64]] architectures, for example, use '''Word''' in the sense of the original 16-bit [[8086]] microprocessor. Intel subsequently used the terms [[Double Word]] ('''dword''') for 32-bit words, quadruple word or [[Quad Word]] ('''qword''') for 64-bit words, and even [[Double Quad Word]] for 128-bit words. [[x86]] and [[x86-64]] registers may still be treated as word registers (ax versus eax or even rax), but it is recommended to use the native 32-bit [[Double Word|double word]], because word-wise access requires a prefix byte to override the default operand width. [[SIMD and SWAR Techniques|SIMD]] instruction sets like [[MMX]], [[AltiVec]] and [[SSE2]] provide operations on vectors of four or eight words inside the appropriate SIMD registers. The [[IBM 360]] and its successors, with 32-bit words, refer to the 16-bit size as a '''halfword'''.
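As a rough C illustration of this naming, the three sizes can be modelled with the fixed-width types from <stdint.h>. This is only a sketch: the typedef names WORD, DWORD and QWORD and the helper lowWord are assumptions chosen here, not taken from any Intel header. Truncating to the low 16 bits mirrors reading ax as the low half of eax.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uint16_t, uint32_t, uint64_t */

/* Hypothetical names following the x86 terminology. */
typedef uint16_t WORD;    /* 16-bit word        */
typedef uint32_t DWORD;   /* 32-bit double word */
typedef uint64_t QWORD;   /* 64-bit quad word   */

/* Compile-time checks of the size relations between the three types. */
static_assert(sizeof(DWORD) == 2 * sizeof(WORD),  "a dword is two words");
static_assert(sizeof(QWORD) == 2 * sizeof(DWORD), "a qword is two dwords");

/* Low 16 bits of a 32-bit value, analogous to ax inside eax. */
WORD lowWord(DWORD d) {
    return (WORD)(d & 0xFFFFu);
}
```

For instance, lowWord(0x12345678) yields 0x5678, the part of the value that would be visible in the 16-bit register.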

=Short=
On recent 32-bit and 64-bit processors, most compilers implement the primitive [[C]] datatypes '''short''' and '''unsigned short''' as 16-bit words. In [[Java]], '''short''' is guaranteed to have 16 bits. Signed short in C is usually assumed to use [[Twos' Complement]], although this is not strictly specified. A Word type explicitly type-defined in C is therefore usually treated as unsigned, which also avoids arithmetic right shift issues:
<pre>
typedef unsigned char BYTE;
typedef unsigned short WORD;
</pre>
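The right shift issue can be sketched as follows, assuming a 16-bit short. The helper names topBit, topBitSigned and checkTopBit are hypothetical. Shifting an unsigned value right always fills the vacated bits with zeros, while shifting a negative signed value right is implementation-defined in C; compilers that use an arithmetic shift replicate the sign bit.

```c
#include <assert.h>

typedef unsigned short WORD;

/* Unsigned: vacated bits are filled with zeros, so the result is 0 or 1.
   Assumes WORD is 16 bits wide. */
WORD topBit(WORD w) {
    return (WORD)((w & 0xFFFFu) >> 15);
}

/* Signed: for a negative s the result is implementation-defined;
   with an arithmetic shift the sign bit is replicated, giving -1. */
int topBitSigned(short s) {
    return s >> 15;
}

/* Only the cases that are well defined on every compiler are checked. */
void checkTopBit(void) {
    assert(topBit(0x8000) == 1);
    assert(topBit(0x7FFF) == 0);
    assert(topBitSigned(0x7FFF) == 0);
}
```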
<span id="Ranges"></span>
=Ranges=
{| class="wikitable"
|-
! language
! type
! min
! max
|-
| rowspan="3" | [[C]], [[Cpp|C++]]
| unsigned short
| style="text-align:right;" | 0
| style="text-align:right;" | 65535
|-
| hexadecimal
| style="text-align:right;" | 0x0000
| style="text-align:right;" | 0xFFFF
|-
| #include <limits.h>
|
| style="text-align:right;" | USHRT_MAX
|-
| rowspan="3" | [[C]], [[Cpp|C++]],<br/>[[Java]]
| short
| style="text-align:right;" | -32768
| style="text-align:right;" | 32767
|-
| hexadecimal
| style="text-align:right;" | 0x8000
| style="text-align:right;" | 0x7FFF
|-
| #include <limits.h>
| style="text-align:right;" | SHRT_MIN
| style="text-align:right;" | SHRT_MAX
|}
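The limits in the table can be checked at run time with the macros from <limits.h>. The following is a minimal sketch; the function names shortIs16Bit and checkRanges are assumptions made for this example. The C standard only guarantees minimum magnitudes, so a separate predicate reports whether short has exactly the 16-bit ranges listed above.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if short has exactly the 16-bit two's complement ranges
   of the table: 0..65535 unsigned and -32768..32767 signed. */
int shortIs16Bit(void) {
    return USHRT_MAX == 0xFFFF
        && SHRT_MAX  == 0x7FFF
        && SHRT_MIN  == -SHRT_MAX - 1;
}

/* Minimum magnitudes every conforming C implementation must provide. */
void checkRanges(void) {
    assert(USHRT_MAX >= 65535);
    assert(SHRT_MAX  >= 32767);
    assert(SHRT_MIN  <= -32767);
}
```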

=Alignment=
Words stored in memory should be placed at even byte addresses. On some processors a misaligned access causes an alignment exception at runtime, on others it incurs a considerable performance penalty.
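When a word has to be read from an address that may be odd, for instance from the middle of a byte buffer, one common way to avoid a misaligned access in C is to copy the bytes with memcpy instead of casting the pointer. The sketch below assumes the optional type uintptr_t is available and that converting a pointer to it preserves the low address bits, as is the case on common platforms; loadWord and isWordAligned are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>   /* uint16_t, uintptr_t */
#include <string.h>   /* memcpy */

/* Load a 16-bit word from any address, aligned or not.
   The bytes are interpreted in the host's byte order. */
uint16_t loadWord(const void *p) {
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Nonzero if p is a multiple of the word size (platform assumption,
   see the text above). */
int isWordAligned(const void *p) {
    return ((uintptr_t)p % sizeof(uint16_t)) == 0;
}

/* Reading from buf + 1 is safe here even if that address is odd. */
void checkLoadWord(void) {
    unsigned char buf[4] = {0x11, 0x22, 0x33, 0x44};
    uint16_t w = loadWord(buf + 1);
    assert(w == 0x3322 || w == 0x2233);   /* little- or big-endian host */
}
```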

=Endianness=
''Main article: [[Endianness]].''
An issue with words consisting of two or more bytes is the order in which the bytes appear in memory. According to their usual arithmetical significance, a 16-bit word has a low and a high byte, either of which may be stored at the lower byte address. Intel processors have always been so-called [[little-endian]] machines: the least significant byte (LSB) is at the lowest address. Other processors, including the [[IBM 370]] family, the [[PDP-10]] (36 bit), the Motorola microprocessor families, and most of the various RISC designs, are [[big-endian]] and store the ‘big-end-first’.
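The byte order of the host can be observed at run time by storing a known word and inspecting the byte at its lowest address. This is a minimal sketch assuming a 16-bit unsigned short; isLittleEndian is a hypothetical helper name, and the BYTE and WORD typedefs follow the ones used elsewhere on this page.

```c
#include <assert.h>
#include <string.h>   /* memcpy */

typedef unsigned char  BYTE;
typedef unsigned short WORD;

/* Returns 1 if the low byte of a word is stored at the lowest
   address (little-endian), 0 otherwise. */
int isLittleEndian(void) {
    const WORD w = 0xaa55;
    BYTE b[sizeof w];
    memcpy(b, &w, sizeof w);   /* copy the in-memory representation */
    return b[0] == 0x55;
}
```

On an x86 or x86-64 machine the function is expected to return 1.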

=Extracting Bytes=
The following C union, used to extract bytes from words or to synthesize words from bytes, is not portable and should be avoided:
<pre>
union {
   BYTE b[2];
   WORD s;
} u;

u.s = 0xaa55;
assert (u.b[0] == 0x55); // fails on big-endian machines
</pre>
The portable way in C is to use inline functions or C preprocessor macros that divide by 256 or take the remainder modulo 256, that is, shift right by eight bits or mask with bitwise 'and'. For the synthesis, the high byte is multiplied by 256 and the low byte is added:
<pre>
BYTE lowByte (WORD s) {return (BYTE)(s & 255);} // mod 256
BYTE highByte(WORD s) {return (BYTE)(s >> 8);} // div 256

WORD makeWORD (BYTE high, BYTE low) {
   WORD s = high;
   return (s << 8) + low; // high * 256 + low
}
</pre>
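The same shift and mask technique can be used to store a word in a byte buffer in a fixed byte order, independent of the host's endianness, for instance when writing a file or network format. The following sketch assumes 8-bit bytes and a 16-bit unsigned short; the function names storeWordBE, loadWordBE and checkWordBE are hypothetical.

```c
#include <assert.h>

typedef unsigned char  BYTE;
typedef unsigned short WORD;

/* Write a word to a buffer, high byte first (big-endian order). */
void storeWordBE(BYTE *p, WORD s) {
    p[0] = (BYTE)(s >> 8);    /* high byte, div 256 */
    p[1] = (BYTE)(s & 255);   /* low byte,  mod 256 */
}

/* Read a word that was stored high byte first. */
WORD loadWordBE(const BYTE *p) {
    return (WORD)((p[0] << 8) | p[1]);   /* high * 256 + low */
}

/* Round trip: the buffer layout is the same on every host. */
void checkWordBE(void) {
    BYTE buf[2];
    storeWordBE(buf, 0xaa55);
    assert(buf[0] == 0xaa && buf[1] == 0x55);
    assert(loadWordBE(buf) == 0xaa55);
}
```

Unlike the union, this code makes no reference to how the word itself is laid out in memory, so the check passes on little-endian and big-endian machines alike.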

=See also=
* [[Byte]]
* [[Double Word]]
* [[Quad Word]]

=External Links=
* [https://en.wikipedia.org/wiki/Word_%28computer_science%29 Word from Wikipedia]
* [https://en.wikipedia.org/wiki/Byte Byte from Wikipedia]
* [https://en.wikipedia.org/wiki/Endianness Endianness from Wikipedia]
* [http://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/ Understanding Big and Little Endian Byte Order]
* [http://www.ietf.org/rfc/ien/ien137.txt IEN 137 - DAV's Endian FAQ - On Holy Wars and a Plea for Peace] by [http://www.myri.com/staff/cohen/ Danny Cohen], [http://ai.isi.edu/ U S C/I S I], April 1, 1980
* [[Videos#MahavishnuOrchestra|Mahavishnu Orchestra]] - [https://en.wikipedia.org/wiki/Birds_of_Fire One Word], 1973, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: [[Videos#JohnMcLaughlin|John McLaughlin]], [[Videos#BillyCobham|Billy Cobham]], [https://en.wikipedia.org/wiki/Rick_Laird Rick Laird], [[Videos#JanHammer|Jan Hammer]], [https://en.wikipedia.org/wiki/Jerry_Goodman Jerry Goodman]
: {{#evu:https://www.youtube.com/watch?v=_--OPhoTUZY?rel=0|alignment=left|valignment=top}}

'''[[Data|Up one Level]]'''
