BMI1 is an x86-64 expansion of bit-manipulation instructions by Intel, introduced in conjunction with the Advanced Vector Extensions SIMD instruction set. With the Bulldozer microarchitecture, BMI1 as well as AVX are also available on AMD processors under the initial name BMI, along with their Trailing Bit Manipulation Instructions (TBM). Most BMI1 instructions (except LZCNT and TZCNT) employ the VEX prefix encoding to support up to three-operand syntax with non-destructive source operands on 32- or 64-bit general-purpose registers. BMI1 (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT) requires bit 3 set in EBX of CPUID with EAX=07H, ECX=0H. LZCNT, not strictly a member of BMI1, requires bit 5 set in ECX of CPUID with EAX=80000001H. With the advent of AVX2, BMI2 adds further bit-twiddling instructions on general-purpose registers.
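The two feature checks described above come down to testing one bit in a register returned by CPUID. The following is a minimal sketch in C, assuming the register values have already been read; the helper names `has_bmi1` and `has_lzcnt` are hypothetical and not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ebx_leaf7 is the EBX value returned by
   CPUID with EAX=07H, ECX=0H. BMI1 is reported in bit 3. */
static bool has_bmi1(uint32_t ebx_leaf7) {
    return ((ebx_leaf7 >> 3) & 1u) != 0;
}

/* Hypothetical helper: ecx_ext1 is the ECX value returned by
   CPUID with EAX=80000001H. LZCNT is reported in bit 5. */
static bool has_lzcnt(uint32_t ecx_ext1) {
    return ((ecx_ext1 >> 5) & 1u) != 0;
}
```

With GCC or Clang on x86, the register values could be obtained with `__get_cpuid_count` from `<cpuid.h>`; other compilers provide their own CPUID intrinsics.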
BMI1 instructions may speed up various bitboard operations, such as relative complement and the isolation, reset, and separation of the least significant one bit. Each replaces a two-instruction sequence and reduces register pressure. Leading and trailing zero count are useful for scanning bits in possibly empty sets, since, unlike BSR and BSF, they return a defined result (the operand size) for a zero input.
Logical And Not (ANDN), the relative complement. No intrinsic is given here, as compilers can generate the instruction from the plain expression.
dest ::= ~src1 & src2;
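As an illustration of the relative complement, here is a small portable sketch in C. It uses only the plain expression, which a compiler targeting BMI1 may translate to a single ANDN; the function names and the bitboard scenario are hypothetical.

```c
#include <stdint.h>

/* Portable equivalent of ANDN: bits set in src2 but not in src1. */
static uint64_t andn_u64(uint64_t src1, uint64_t src2) {
    return ~src1 & src2;
}

/* Hypothetical bitboard use: attacked squares that are not
   occupied by the side's own pieces. */
static uint64_t legal_targets(uint64_t attacks, uint64_t own_pieces) {
    return andn_u64(own_pieces, attacks);
}
```

Note that the complemented operand comes first, matching the operand order of the formula above.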
dest ::= (src >> start) & ((1 << len)-1);
unsigned __int32 _bextr_u32(unsigned __int32 src, unsigned __int32 start, unsigned __int32 len);
unsigned __int64 _bextr_u64(unsigned __int64 src, unsigned __int32 start, unsigned __int32 len);
The extracted field is zero-extended. To sign-extend it, with signbit ::= 1 << (len-1):
dest_signextended ::= (dest ^ signbit) - signbit;
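The bit field extract and the sign extension step can be sketched in portable C as follows. This is an illustration under assumptions, not the intrinsic itself: the function names are hypothetical, and the range guards are added because shifting a 64-bit value by 64 or more is undefined in C, so the formula cannot be written down literally.

```c
#include <stdint.h>

/* Portable sketch of the BEXTR formula for 64-bit operands.
   The guards keep every shift count below 64. */
static uint64_t bextr_u64_portable(uint64_t src, unsigned start, unsigned len) {
    if (start >= 64 || len == 0) return 0;
    uint64_t shifted = src >> start;
    if (len >= 64) return shifted;
    return shifted & ((UINT64_C(1) << len) - 1);
}

/* Sign-extend a len-bit field (1 <= len <= 64) that is held
   zero-extended in dest, using (dest ^ signbit) - signbit. */
static int64_t sign_extend_field(uint64_t dest, unsigned len) {
    uint64_t signbit = UINT64_C(1) << (len - 1);
    return (int64_t)((dest ^ signbit) - signbit);
}
```

For example, extracting 8 bits starting at bit 4 from 0xABCD yields 0xBC, which sign-extends to -68 when read as an 8-bit two's complement value.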
dest ::= src & -src;
unsigned __int64 _blsi_u64(unsigned __int64 src);
dest ::= (src-1) ^ src;
unsigned __int64 _blsmsk_u64(unsigned __int64 src);
dest ::= (src-1) & src;
unsigned __int64 _blsr_u64(unsigned __int64 src);
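The three formulas for the least significant one bit can be tried out with plain C expressions, which compilers targeting BMI1 may turn into BLSI, BLSMSK and BLSR. The following sketch uses hypothetical function names to avoid confusion with the intrinsics.

```c
#include <stdint.h>

/* Isolate the least significant one bit (BLSI formula).
   0 - src is the two's complement negation in unsigned arithmetic. */
static uint64_t isolate_ls1b(uint64_t src) {
    return src & (0 - src);
}

/* Mask up to and including the least significant one bit (BLSMSK formula). */
static uint64_t mask_up_to_ls1b(uint64_t src) {
    return (src - 1) ^ src;
}

/* Reset the least significant one bit (BLSR formula). */
static uint64_t reset_ls1b(uint64_t src) {
    return (src - 1) & src;
}
```

For src = 40 (binary 101000) these give 8, 15 and 32 respectively. For an empty set, the isolate and reset formulas return 0, while the mask formula returns all ones.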
unsigned __int64 _lzcnt_u64(unsigned __int64 src);
unsigned __int64 _tzcnt_u64(unsigned __int64 src);
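To show how the two counts behave, including on an empty set, here is a portable sketch in C. The loops only mirror the results of the instructions and are far slower than the hardware; all function names are hypothetical, and `collect_squares` is an assumed example of traversing a bitboard.

```c
#include <stdint.h>

/* Number of trailing zero bits; 64 for an empty set, as with TZCNT. */
static unsigned tzcnt_u64_portable(uint64_t src) {
    unsigned n = 0;
    if (src == 0) return 64;
    while ((src & 1) == 0) { src >>= 1; n++; }
    return n;
}

/* Number of leading zero bits; 64 for an empty set, as with LZCNT. */
static unsigned lzcnt_u64_portable(uint64_t src) {
    unsigned n = 0;
    if (src == 0) return 64;
    while ((src >> 63) == 0) { src <<= 1; n++; }
    return n;
}

/* Hypothetical traversal: store the index of every set bit in out,
   lowest first, and return how many were found. */
static unsigned collect_squares(uint64_t bb, unsigned out[64]) {
    unsigned count = 0;
    while (bb != 0) {
        out[count++] = tzcnt_u64_portable(bb); /* index of the LS1B */
        bb &= bb - 1;                          /* reset the LS1B */
    }
    return count;
}
```

In real code the two loop-based helpers would be replaced by `_tzcnt_u64` and `_lzcnt_u64` where the target supports them.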
- Intel AVX and AVX2 Programming Reference (pdf)
- AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions (pdf) 
- Software Optimization Guide for AMD Family 15h Processors (pdf) 9.8 Optimizing with BMI and TBM Instructions, pp. 163
- Sign Extension from The Aggregate Magic Algorithms by Hank Dietz
- Looking for intrinsic "least significant bit" on Visual Studio by Oliver Brausch, CCC, September 03, 2020
- Moved BMI and TBM instructions from Volume 4 to Volume 3 in September 2011