Double

6,313 bytes added, 15:49, 9 August 2018
'''[[Main Page|Home]] * [[Programming]] * [[Data]] * Double'''

'''Double''' is a 64-bit data type representing the [https://en.wikipedia.org/wiki/Double_precision_floating-point_format double precision floating-point format]. [https://en.wikipedia.org/wiki/IEEE_754-1985 IEEE 754-1985] calls it double, while [https://en.wikipedia.org/wiki/IEEE_754-2008 IEEE 754-2008] officially refers to the 64-bit base 2 format as binary64. Due to [https://en.wikipedia.org/wiki/Normal_number_%28computing%29 normalization], the true [https://en.wikipedia.org/wiki/Significand significand] includes an implicit leading one bit unless the stored exponent has all bits zero, which is reserved for zero and [https://en.wikipedia.org/wiki/Subnormal_numbers Denormal numbers]. A stored exponent with all bits one is reserved for infinities and NaNs. Thus only 52 bits of the significand are stored, but the total precision is 53 bits (≈15.955 decimal digits). The [https://en.wikipedia.org/wiki/Exponent_bias Exponent bias] is 0x3FF (1023).

[[FILE:IEEE 754 Double Floating Point Format.svg|none|border|text-bottom]]
[https://en.wikipedia.org/wiki/Double_precision_floating-point_format Double precision floating-point format]
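The field layout described above can be made concrete with a small C sketch. It is only an illustration: the type <code>double_fields</code> and the function <code>split_double</code> are hypothetical names introduced here, and the code assumes that the C type <code>double</code> is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three stored fields of a binary64 value. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 0x3FF (1023) */
    uint64_t fraction;  /* the 52 stored significand bits */
} double_fields;

/* Splits a double into its raw fields (assumes double is binary64). */
static double_fields split_double(double d)
{
    uint64_t bits;
    double_fields f;

    assert(sizeof bits == sizeof d);   /* assumption: 64-bit double */
    memcpy(&bits, &d, sizeof bits);    /* copy the bit pattern portably */

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For example, <code>split_double(1.0)</code> gives a sign of 0, a biased exponent of 1023 and a fraction of 0, since the leading one bit of the significand is implicit and not stored.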

=x86 Instruction Sets=
Recent [[x86]] and [[x86-64]] processors provide both the [https://en.wikipedia.org/wiki/X87 x87] and the [[SSE2]] double precision floating point instruction sets. Since SSE2 is not obligatory for x86-32, 32-bit operating systems rely on x87. 64-bit operating systems on x86-64 may use the faster SSE2 instructions, but so far only 64-bit compilers for 64-bit [[Windows]] emit those instructions implicitly for double precision floating point operations <ref>[http://www.agner.org/optimize/calling_conventions.pdf Calling conventions for different C++ compilers and operating systems] (pdf) by [http://www.agner.org/ Agner Fog]</ref>. SSE2 instructions can be mixed with x87 and are explicitly available through (inline) [[Assembly]] or intrinsics of various [[C]] compilers.

==Integer to Double Conversion==
===X87===
To convert a signed or unsigned integer to a double, two x87 instructions are needed, FILD and FSTP, both working on the x87 floating point stack <ref>[http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf AMD64 Architecture Programmer’s Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions]</ref>.

'''FILD'''
The FILD instruction converts a signed integer in memory to [https://en.wikipedia.org/wiki/Extended_precision double-extended-precision] (10 bytes) format and pushes the value onto the x87 register stack. The value can be a 16-bit, 32-bit, or 64-bit integer. Signed values from memory can always be represented exactly in x87 registers without rounding.

'''FSTP'''
The FSTP instruction pops the x87 stack after copying the value. The instruction FSTP ST(0) is the same as popping the stack with no data transfer. If the specified destination is a single-precision (4 bytes) or double-precision (8 bytes) memory location, the instruction converts the value to the appropriate precision format. It does this by rounding the significand of the source value as specified by the rounding mode determined by the RC field of the x87 control word and then converting to the format of destination. It also converts the exponent to the width and bias of the destination format.
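The rounding step on the store can be observed from plain C, without writing x87 assembly. The following is a minimal sketch under stated assumptions: <code>int64_to_double</code> and <code>converts_exactly</code> are hypothetical helper names, <code>double</code> is assumed to be binary64, and the default round-to-nearest mode is assumed to be active. Whether the compiler emits FILD/FSTP or SSE2 instructions for the cast depends on the target and compiler options.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Converts a 64-bit signed integer to double. The volatile store forces
   the result into a 64-bit memory location, so any rounding to 53 bits
   of precision has happened by the time the value is returned. */
static double int64_to_double(int64_t n)
{
    volatile double d;

    assert(DBL_MANT_DIG == 53);   /* assumption: double is binary64 */
    d = (double)n;
    return d;
}

/* Returns 1 if n survives the round trip through double unchanged. */
static int converts_exactly(int64_t n)
{
    double d = int64_to_double(n);

    if (d >= 9223372036854775808.0)   /* 2^63 does not fit in int64_t */
        return 0;
    return (int64_t)d == n;
}
```

Integers up to 2<sup>53</sup> in magnitude convert exactly. Beyond that, a value such as 2<sup>53</sup> + 1 needs 54 significand bits and is rounded on conversion.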

===SSE2===
<ref>[https://support.amd.com/TechDocs/26568.pdf AMD64 Architecture, Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions] (pdf)</ref><ref>[http://msdn.microsoft.com/en-us/library/9b07190d%28v=VS.100%29.aspx Floating-Point Intrinsics Using Streaming SIMD Extensions 2 Instructions]</ref>
'''CVTDQ2PD'''
Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed double-precision floating-point values and writes the converted values in another XMM register.
* Mnemonic: CVTDQ2PD xmm1, xmm2/mem64
* Intrinsic: [http://msdn.microsoft.com/en-us/library/fhwkxa6t%28v=VS.100%29.aspx _mm_cvtepi32_pd]

'''CVTPI2PD'''
Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to two double-precision floating-point values and writes the converted values in an XMM register.
* Mnemonic: CVTPI2PD xmm, mmx/mem64
* Intrinsic: [http://msdn.microsoft.com/en-us/library/ahh5bb05%28v=VS.100%29.aspx _mm_cvtpi32_pd]

'''CVTSI2SD'''
Converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a double-precision floating-point value and writes the converted value in the low-order 64 bits of an XMM register. The high-order 64 bits in the destination XMM register are not modified.
* Mnemonic: CVTSI2SD xmm, reg/mem32 (reg/mem64)
* Intrinsic: [http://msdn.microsoft.com/en-us/library/b60kza8a%28v=VS.100%29.aspx _mm_cvtsi32_sd]
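As a rough usage sketch of two of the intrinsics listed above, the following C code wraps <code>_mm_cvtepi32_pd</code> and <code>_mm_cvtsi32_sd</code>. The wrapper names <code>two_int32_to_double</code> and <code>replace_low_lane</code> are hypothetical, and the code assumes an x86 or x86-64 target with SSE2 enabled and a compiler that provides <code>emmintrin.h</code>.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* CVTDQ2PD: converts two packed 32-bit signed integers to two doubles. */
static void two_int32_to_double(const int32_t in[2], double out[2])
{
    __m128i v;
    __m128d d;

    assert(in != 0 && out != 0);
    v = _mm_loadl_epi64((const __m128i *)in);  /* load the low 64 bits */
    d = _mm_cvtepi32_pd(v);
    _mm_storeu_pd(out, d);
}

/* CVTSI2SD: converts n into the low lane; the high lane is left as is. */
static void replace_low_lane(double lanes[2], int32_t n)
{
    __m128d d;

    assert(lanes != 0);
    d = _mm_loadu_pd(lanes);
    d = _mm_cvtsi32_sd(d, n);
    _mm_storeu_pd(lanes, d);
}
```

Since every 32-bit integer fits into the 53-bit significand, these conversions are always exact.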

=BitScan Purpose=
Integer to Double conversion can be used to obtain the base 2 logarithm of a power of two: the converted value has a zero fraction, so its exponent field holds the bit index plus the bias of 1023. Applied to a 64-bit signed or unsigned integer with a single bit set, this might be used as 64-bit [[BitScan|bitscan]], as mentioned in [[BitScan#DoubleConversionofLS1B|Double conversion of LS1B]] and [[BitScan#DoubleConversionBSR|Double conversion]].
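A minimal C sketch of the forward scan variant follows. It is an illustration rather than the implementation from the linked BitScan page: the name <code>bitscan_forward_double</code> is hypothetical, and <code>double</code> is assumed to be binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the index (0..63) of the least significant one bit of bb.
   Precondition: bb != 0. */
static int bitscan_forward_double(uint64_t bb)
{
    uint64_t ls1b, bits;
    double d;

    assert(bb != 0);
    ls1b = bb & (~bb + 1);            /* isolate the least significant one bit */
    d = (double)ls1b;                 /* exact, since ls1b is a power of two */
    memcpy(&bits, &d, sizeof bits);   /* read back the bit pattern */
    return (int)(bits >> 52) - 1023;  /* sign bit is 0, so this is the exponent */
}
```

Isolating the single bit first matters: a general 64-bit value may be rounded during conversion, while a power of two is always represented exactly.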

=See also=
* [[Quad Word]]
* [[SSE]]
* [[SSE2]]
* [[Float]]

=Publications=
* [[David Goldberg]] ('''1991'''). ''What every computer scientist should know about floating-point arithmetic''. [[ACM#Surveys|ACM Computing Surveys]], [https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf pdf]

=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=28207 Bitboards using 2 DOUBLE's ?] by [[Carey Bloodworth|Carey]], [[CCC]], June 02, 2009 » [[Bitboards]]

=External Links=
* [https://en.wikipedia.org/wiki/Floating_point Floating point from Wikipedia]
* [https://en.wikipedia.org/wiki/Double_precision_floating-point_format Double precision floating-point format]
* [https://en.wikipedia.org/wiki/Extended_precision Extended precision]
* [https://en.wikipedia.org/wiki/Quadruple_precision Quadruple precision floating-point format]
* [http://www.mrob.com/pub/math/floatformats.html Survey of Floating-Point Formats] by [http://www.mrob.com/pub/index.html Robert Munafo]
* [http://info.uptrend.ch/uptrend/page/display/numerische-probleme-mit-reals?v=54 About Floating Point Arithmetic] from [[Johann Joss#Blog|Johanns Blog]]

=References=
<references />

'''[[Data|Up one Level]]'''
