Float

12,521 bytes added, 16:09, 9 August 2018

'''Float''' is a 32-bit data type representing the [https://en.wikipedia.org/wiki/Single_precision_floating-point_format single precision floating-point format]. [https://en.wikipedia.org/wiki/IEEE_754-1985 IEEE 754-1985] calls it single, while [https://en.wikipedia.org/wiki/IEEE_754-2008 IEEE 754-2008] officially refers to the 32-bit base 2 format as binary32. It consists of one sign bit, an 8-bit exponent and 23 stored significand bits. Due to [https://en.wikipedia.org/wiki/Normal_number_%28computing%29 normalization] the true [https://en.wikipedia.org/wiki/Significand significand] includes an implicit leading one bit unless the exponent is stored with all bits zero (0x00), which is reserved for zero and [https://en.wikipedia.org/wiki/Subnormal_numbers Denormal numbers], or with all bits one (0xff), which is reserved for infinities and NaNs. Thus only 23 bits of the significand are stored but the total precision is 24 bits (≈7.225 decimal digits). [https://en.wikipedia.org/wiki/Exponent_bias Exponent bias] is 0x7f (127).
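The layout described above can be sketched in [[C]] as follows. This is a minimal illustration, assuming the compiler's float is IEEE 754 binary32; the names float_fields, float_bits, float_decompose and float_significand are hypothetical helpers introduced for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 on the target platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits wide");

/* Hypothetical helper type holding the three fields of a binary32 value. */
typedef struct {
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8 bits, biased by 0x7f */
    uint32_t fraction; /* 23 stored significand bits */
} float_fields;

/* Copy the bit pattern of f into an integer without breaking aliasing rules. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split the bit pattern into sign, biased exponent and stored fraction. */
float_fields float_decompose(float f) {
    uint32_t u = float_bits(f);
    float_fields r;
    r.sign     = u >> 31;
    r.exponent = (u >> 23) & 0xff;
    r.fraction = u & 0x7fffff;
    return r;
}

/* 24-bit significand: the implicit leading one is added for normal numbers
   only, that is when the exponent field is neither 0x00 nor 0xff. */
uint32_t float_significand(float_fields x) {
    int normal = x.exponent != 0x00 && x.exponent != 0xff;
    return normal ? (x.fraction | 0x800000) : x.fraction;
}
```

For 1.0f the fields are sign 0, exponent 0x7f and fraction 0, which corresponds to the bit pattern 0x3f800000.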

=Format=
[[FILE:Float example.svg|none|border|text-bottom]]
[https://en.wikipedia.org/wiki/Single_precision_floating-point_format Single precision floating-point format]

=x86 Float Instruction Sets=
Recent [[x86]] and [[x86-64]] processors provide [https://en.wikipedia.org/wiki/X87 x87], [https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions SSE] and [https://en.wikipedia.org/wiki/3DNow%21 3DNow!] ([[AMD]] only, sharing its registers with [[MMX]]/x87) floating point instruction sets. 3DNow! and SSE are [[SIMD and SWAR Techniques|SIMD]] instruction sets working on vectors of two or four floats, respectively. Since SSE is not obligatory for x86-32, 32-bit operating systems rely on x87. x86-64 guarantees SSE and SSE2, so 64-bit compilers may emit the faster SSE instructions implicitly for floating point operations, as the 64-bit compilers for 64-bit [[Windows]] do <ref>[http://www.agner.org/optimize/calling_conventions.pdf Calling conventions for different C++ compilers and operating systems] (pdf) by [http://www.agner.org/ Agner Fog]</ref>. SSE instructions can be mixed with x87 or 3DNow! and are explicitly available through (inline) [[Assembly]] or intrinsics of various [[C]] compilers.

==Integer to Float Conversion==
===X87===
To convert a signed integer to float, two x87 instructions are needed, FILD and FSTP, both working on the x87 floating point stack <ref>[http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26569.pdf AMD64 Architecture Programmer’s Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions]</ref>. Since FILD only takes signed operands, an unsigned 32-bit integer has to be zero-extended to a 64-bit integer first.

'''FILD'''
The FILD instruction converts a signed integer in memory to [https://en.wikipedia.org/wiki/Extended_precision double-extended-precision] (80-bit) format and pushes the value onto the x87 register stack. The value can be a 16-bit, 32-bit, or 64-bit integer value. Signed values from memory can always be represented exactly in x87 registers without rounding.

'''FSTP'''
The FSTP instruction pops the x87 stack after copying the value. The instruction FSTP ST(0) is the same as popping the stack with no data transfer. If the specified destination is a single-precision or double-precision memory location, the instruction converts the value to the appropriate precision format. It does this by rounding the significand of the source value as specified by the rounding mode determined by the RC field of the x87 control word and then converting to the format of the destination. It also converts the exponent to the width and bias of the destination format.
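The effect of this rounding step can be observed from [[C]] without writing x87 code by hand. The following is a minimal sketch, assuming an IEEE 754 binary32 float and the default round-to-nearest mode; whether the compiler actually emits FILD and FSTP or a scalar SSE conversion depends on the target and its options. The names int_to_float and converts_exactly are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is IEEE 754 binary32 on the target platform. */
static_assert(sizeof(float) == 4, "float is assumed to be 32 bits wide");

/* Convert by a plain cast. The volatile store forces the value to be
   rounded to single precision instead of staying in a wider register. */
float int_to_float(int32_t i) {
    volatile float f = (float)i;
    return f;
}

/* Returns 1 if i survives the conversion to float unchanged, 0 if the
   24-bit significand forced a rounding. */
int converts_exactly(int32_t i) {
    float f = int_to_float(i);
    return (int64_t)f == (int64_t)i;
}
```

Integers of up to 24 significant bits, such as 16777216 (2^24), convert exactly, while 16777217 (2^24 + 1) does not fit into the significand and is rounded.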

===SSE===
'''CVTDQ2PS'''
Converts four packed signed doubleword integers in the source operand (second operand) to four packed single-precision floating-point values in the destination operand (first operand). The instruction was introduced with [[SSE2]].
* Mnemonic: CVTDQ2PS xmm1, xmm2
* Intrinsic: [http://msdn.microsoft.com/en-us/library/36bwxcx5.aspx _mm_cvtepi32_ps]
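A minimal sketch of the intrinsic in [[C]], assuming an x86 or x86-64 target with SSE2 enabled and a compiler that provides emmintrin.h. The function name int4_to_float4 is hypothetical; unaligned loads and stores are used so that the arrays need no special alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics, including _mm_cvtepi32_ps */

/* Assumption: float is IEEE 754 binary32 on the target platform. */
static_assert(sizeof(float) == 4, "float is assumed to be 32 bits wide");

/* Convert four signed 32-bit integers to four floats with one CVTDQ2PS. */
void int4_to_float4(const int32_t in[4], float out[4]) {
    __m128i vi = _mm_loadu_si128((const __m128i *)in); /* load 4 x int32 */
    __m128  vf = _mm_cvtepi32_ps(vi);                  /* CVTDQ2PS */
    _mm_storeu_ps(out, vf);                            /* store 4 x float */
}
```

Each lane is rounded independently according to the rounding mode in MXCSR, which is round-to-nearest by default.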

'''CVTPI2PS'''
Converts two packed signed doubleword integers in the source operand (second operand, an MMX register) to two packed single-precision floating-point values in the low quadword of the destination operand (first operand). The upper two values of the destination remain unchanged.
* Mnemonic: CVTPI2PS xmm, mm
* Intrinsic: [http://msdn.microsoft.com/de-de/library/ae2ksssb.aspx _mm_cvtpi32_ps]

===3DNow!===
'''PI2FD'''
Converts two packed 32-bit signed integer values to two packed single-precision floating-point values.
* Mnemonic: PI2FD mm1, mm2
* Intrinsic: [http://msdn.microsoft.com/en-us/library/72a8t1hy.aspx _m_pi2fd]
<span id="BitScan"></span>
=BitScan Purpose=
Integer to float conversion can be used as base 2 logarithm of a 32-bit signed or unsigned integer that is a power of two, which might even be the base of a 64-bit [[BitScan|bitscan]] <ref>[https://www.stmintz.com/ccc/index.php?id=268305 Fast 3DNow! BitScan] by [[Gerd Isenberg]] from [[Computer Chess Forums|CCC]], December 01, 2002</ref>. Since a power of two has only a single one bit, the 23 stored significand bits are always zero, and the exponent contains the biased bit index:
{| class="wikitable"
|-
! i
! 2^i (hex)
! 2^i converted to float (hex bit pattern)
! exponent - 127
|-
| style="text-align:right;" | 0
| style="text-align:right;" | 0x00000001
| style="text-align:right;" | 0x3f800000
| style="text-align:right;" | 0
|-
| style="text-align:right;" | 1
| style="text-align:right;" | 0x00000002
| style="text-align:right;" | 0x40000000
| style="text-align:right;" | 1
|-
| style="text-align:right;" | 2
| style="text-align:right;" | 0x00000004
| style="text-align:right;" | 0x40800000
| style="text-align:right;" | 2
|-
| style="text-align:right;" | 3
| style="text-align:right;" | 0x00000008
| style="text-align:right;" | 0x41000000
| style="text-align:right;" | 3
|-
| style="text-align:right;" | 4
| style="text-align:right;" | 0x00000010
| style="text-align:right;" | 0x41800000
| style="text-align:right;" | 4
|-
| style="text-align:right;" | 5
| style="text-align:right;" | 0x00000020
| style="text-align:right;" | 0x42000000
| style="text-align:right;" | 5
|-
| style="text-align:right;" | 6
| style="text-align:right;" | 0x00000040
| style="text-align:right;" | 0x42800000
| style="text-align:right;" | 6
|-
| style="text-align:right;" | 7
| style="text-align:right;" | 0x00000080
| style="text-align:right;" | 0x43000000
| style="text-align:right;" | 7
|-
| style="text-align:right;" | 8
| style="text-align:right;" | 0x00000100
| style="text-align:right;" | 0x43800000
| style="text-align:right;" | 8
|-
| style="text-align:right;" | 9
| style="text-align:right;" | 0x00000200
| style="text-align:right;" | 0x44000000
| style="text-align:right;" | 9
|-
| style="text-align:right;" | 10
| style="text-align:right;" | 0x00000400
| style="text-align:right;" | 0x44800000
| style="text-align:right;" | 10
|-
| style="text-align:right;" | 11
| style="text-align:right;" | 0x00000800
| style="text-align:right;" | 0x45000000
| style="text-align:right;" | 11
|-
| style="text-align:right;" | 12
| style="text-align:right;" | 0x00001000
| style="text-align:right;" | 0x45800000
| style="text-align:right;" | 12
|-
| style="text-align:right;" | 13
| style="text-align:right;" | 0x00002000
| style="text-align:right;" | 0x46000000
| style="text-align:right;" | 13
|-
| style="text-align:right;" | 14
| style="text-align:right;" | 0x00004000
| style="text-align:right;" | 0x46800000
| style="text-align:right;" | 14
|-
| style="text-align:right;" | 15
| style="text-align:right;" | 0x00008000
| style="text-align:right;" | 0x47000000
| style="text-align:right;" | 15
|-
| style="text-align:right;" | 16
| style="text-align:right;" | 0x00010000
| style="text-align:right;" | 0x47800000
| style="text-align:right;" | 16
|-
| style="text-align:right;" | 17
| style="text-align:right;" | 0x00020000
| style="text-align:right;" | 0x48000000
| style="text-align:right;" | 17
|-
| style="text-align:right;" | 18
| style="text-align:right;" | 0x00040000
| style="text-align:right;" | 0x48800000
| style="text-align:right;" | 18
|-
| style="text-align:right;" | 19
| style="text-align:right;" | 0x00080000
| style="text-align:right;" | 0x49000000
| style="text-align:right;" | 19
|-
| style="text-align:right;" | 20
| style="text-align:right;" | 0x00100000
| style="text-align:right;" | 0x49800000
| style="text-align:right;" | 20
|-
| style="text-align:right;" | 21
| style="text-align:right;" | 0x00200000
| style="text-align:right;" | 0x4a000000
| style="text-align:right;" | 21
|-
| style="text-align:right;" | 22
| style="text-align:right;" | 0x00400000
| style="text-align:right;" | 0x4a800000
| style="text-align:right;" | 22
|-
| style="text-align:right;" | 23
| style="text-align:right;" | 0x00800000
| style="text-align:right;" | 0x4b000000
| style="text-align:right;" | 23
|-
| style="text-align:right;" | 24
| style="text-align:right;" | 0x01000000
| style="text-align:right;" | 0x4b800000
| style="text-align:right;" | 24
|-
| style="text-align:right;" | 25
| style="text-align:right;" | 0x02000000
| style="text-align:right;" | 0x4c000000
| style="text-align:right;" | 25
|-
| style="text-align:right;" | 26
| style="text-align:right;" | 0x04000000
| style="text-align:right;" | 0x4c800000
| style="text-align:right;" | 26
|-
| style="text-align:right;" | 27
| style="text-align:right;" | 0x08000000
| style="text-align:right;" | 0x4d000000
| style="text-align:right;" | 27
|-
| style="text-align:right;" | 28
| style="text-align:right;" | 0x10000000
| style="text-align:right;" | 0x4d800000
| style="text-align:right;" | 28
|-
| style="text-align:right;" | 29
| style="text-align:right;" | 0x20000000
| style="text-align:right;" | 0x4e000000
| style="text-align:right;" | 29
|-
| style="text-align:right;" | 30
| style="text-align:right;" | 0x40000000
| style="text-align:right;" | 0x4e800000
| style="text-align:right;" | 30
|-
| style="text-align:right;" | 31 (unsigned)
| style="text-align:right;" | 0x80000000
| style="text-align:right;" | 0x4f000000
| style="text-align:right;" | 31
|-
| style="text-align:right;" | 31 (signed, -2^31)
| style="text-align:right;" | 0x80000000
| style="text-align:right;" | 0xcf000000
| style="text-align:right;" | 31
|}
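The table can be reproduced with a short [[C]] sketch. It assumes an IEEE 754 binary32 float and is not the 3DNow! routine from the referenced post; the names log2_pow2_via_float and bitscan_forward_via_float are hypothetical. The second function first isolates the least significant one bit with x & -x, so that the conversion is always applied to a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 on the target platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits wide");

/* Base 2 logarithm of x, where x must have exactly one bit set. */
unsigned log2_pow2_via_float(uint32_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    float f = (float)x; /* exact, a power of two needs no significand bits */
    uint32_t u;
    memcpy(&u, &f, sizeof u);        /* reinterpret the float as integer */
    return ((u >> 23) & 0xff) - 0x7f; /* mask off the sign, remove the bias */
}

/* Bit index of the least significant one bit of a nonzero x. */
unsigned bitscan_forward_via_float(uint32_t x) {
    assert(x != 0);
    return log2_pow2_via_float(x & (0u - x)); /* x & -x isolates the LS1B */
}
```

Masking the exponent with 0xff keeps the result correct if the integer is converted as signed, where 0x80000000 becomes the negative float 0xcf000000.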

=See also=
* [[Double Word]]
* [[SSE]]
* [[SSE2]]
* [[Double]]

=Publications=
* [[David Goldberg]] ('''1991'''). ''What every computer scientist should know about floating-point arithmetic''. [[ACM#Surveys|ACM Computing Surveys]], [https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf pdf]
* [[Ward Douglas Maurer]] ('''1996'''). ''[http://www.researchgate.net/publication/221014653_Relative_Precision_in_the_Inductive_Assertion_Method Relative Precision in the Inductive Assertion Method]''. [http://www.informatik.uni-trier.de/~ley/db/conf/naa/wnaa1996.html WNAA 1996] <ref>[https://en.wikipedia.org/wiki/Hoare_logic Floyd–Hoare logic from Wikipedia]</ref>
* [[Jacek Mańdziuk]], [[Daniel Osman]] ('''2004'''). ''Alpha-Beta Search Enhancements with a Real-Value Game-State Evaluation Function''. [[ICGA Journal#27_1|ICGA Journal, Vol. 27, No. 1]], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICGA.pdf pdf]
* [http://www.ece.ncsu.edu/people.php/wwedmons William W. Edmonson], [[Maarten van Emden|Maarten H. van Emden]] ('''2008'''). ''Interval Semantics for Standard Floating-Point Arithmetic''. [https://arxiv.org/abs/0810.4196 arXiv:0810.4196]

=Forum Posts=
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/cbd402de3b07b976/b31333d734e8a6cc Re: Which is better, IYHO] by [[Ian Kennedy]], [[Computer Chess Forums|rgcc]], August 20, 1995
* [https://www.stmintz.com/ccc/index.php?id=18674 Re: Floating point VS Integer Math] by [[Bruce Moreland]], [[CCC]], May 14, 1998
* [http://www.talkchess.com/forum/viewtopic.php?t=22817 Evaluation functions. Why integer?] by oysteijo, [[CCC]], August 06, 2008 » [[Evaluation]], [[Score]]
* [http://www.talkchess.com/forum/viewtopic.php?t=44841 OT: denormals] by [[Martin Sedlak]], [[CCC]], August 19, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=50472 floating point SSE eval] by [[Marco Belli]], [[CCC]], December 13, 2013 » [[Evaluation]], [[Score]]

=External Links=
* [https://en.wikipedia.org/wiki/Floating_point Floating point from Wikipedia]
* [https://en.wikipedia.org/wiki/Single_precision_floating-point_format Single precision floating-point format from Wikipedia]
* [http://www.mrob.com/pub/math/floatformats.html Survey of Floating-Point Formats] by [http://www.mrob.com/pub/index.html Robert Munafo]
* [http://info.uptrend.ch/uptrend/page/display/numerische-probleme-mit-reals?v=54 About Floating Point Arithmetic] from [[Johann Joss#Blog|Johanns Blog]]

=References=
<references />

