NNUE
NNUE (ƎUИИ, Efficiently Updatable Neural Networks) is a neural network architecture intended to replace the evaluation function of Shogi, chess, and other board game playing alpha-beta searchers running on a CPU. Inspired by Kunihito Hoki's approach of piece-square tables indexed by king location, further two-piece locations, and side to move, as applied in his Shogi engine Bonanza [3], NNUE was introduced in 2018 by Yu Nasu [4] and was used in Shogi adaptations of Stockfish such as YaneuraOu [5] and Kristallweizen [6], apparently reaching AlphaZero strength [7].
Stockfish NNUE
As reported by Henk Drost [8], Nodchip incorporated NNUE into the chess-playing Stockfish 10 as a proof of concept. Stockfish NNUE was born, and in summer 2020 the computer chess community burst out enthusiastically over its rapidly rising playing strength. With networks trained using a mixture of supervised and reinforcement learning methods, it became stronger than the original Stockfish despite an approximately halved search speed [9], and was ultimately responsible for the huge strength improvement of Stockfish 12.
Adoption
see Category:NNUE
Tempted by the success of Stockfish NNUE and attracted by how simple the method is and how little code it requires, many engine developers have started testing and applying NNUE. For quick trials and evaluation before going into serious development, some of them borrowed and/or rewrote NNUE code and used networks from Stockfish NNUE. Most of them reported positive results, such as David Carteau with Orion [10], Ehsan Rashid with DON [11], various Stockfish derivatives by Michael Byrne [12], and Volodymyr Shcherbyna with Igel [13] using the Night Nurse NNUE net by Dietrich Kappe [14]. Daniel Shawul added NNUE support à la CFish into the egbbdll probing library of Scorpio [15] [16], making it even easier to use NNUE. The promising engines Halogen 7 and 8 by Kieren Pearson, and Seer by Connor McMonigle, came with their own distinct NNUE implementations, and on November 10, 2020, the commercial Dragon by Komodo Chess aka Komodo NNUE appeared [17], trying to close the gap to Stockfish NNUE. The commercial Fat Fritz 2.0, based on a slightly modified Stockfish 12 using a customized, double-sized network, was released by ChessBase in February 2021.
Architecture
Basic NNUE
The most basic form of an NNUE network consists of three layers: an input layer of length 768, one hidden layer of arbitrary size, and an output layer of a single neuron representing the evaluation of the position. An NNUE network also commonly uses two perspectives: two hidden layers representing both sides are concatenated into a single hidden layer of twice the length before being forwarded to the output layer.
const int INPUT_SIZE = 768;
const int HL_SIZE = ...; // These parameters are part of the training configuration

// These are the commonly used values as of 2024
const int SCALE = 400;
const int QA = 255;
const int QB = 64;

struct Network {
    int16_t accumulator_weights[INPUT_SIZE][HL_SIZE];
    int16_t accumulator_biases[HL_SIZE];
    int16_t output_weights[2 * HL_SIZE];
    int16_t output_bias;
};
Input Layer
The input layer of a basic NNUE network contains 768 binary neurons, each representing one possible combination of piece type, square, and color. Each neuron is either on or off, depending on whether a piece with the corresponding combination exists on the board. The following is a common encoding scheme seen in most NNUE engines.
// Square: 0 - 63
// PieceType: Pawn = 0, Knight = 1, Bishop, Rook, Queen, King
// Side: White = 0, Black = 1
int calculateIndex(Square square, PieceType pieceType, Side side) {
    return side * 64 * 6 + pieceType * 64 + square;
}
Accumulator
The first hidden layer of an NNUE is also called the accumulator. The accumulator can be of arbitrary size, but integer multiples of powers of 2 are preferred, as they allow the inference to be fully optimized. A large hidden layer trades speed for higher evaluation accuracy, and vice versa. Larger hidden layers are usually preferred, as the Elo gained from extra speed tends to shrink as time and computational power grow. Most top engines have accumulator sizes of more than 1024 neurons, with some even using accumulators as wide as 3072. A large accumulator is affordable because it is the only hidden layer that can be efficiently updated.
struct Accumulator {
    int16_t values[HL_SIZE];
};
Perspective
The vast majority of NNUE architectures require two accumulators, maintained separately. Each represents the board from the "perspective", or point of view, of one side. The board position as stored is the white perspective. To calculate the input values from the perspective of black, flip every piece along the horizontal line of symmetry of the board and invert the colors of the pieces. Both perspectives use identical input weights and biases. During inference, the two accumulators are concatenated, with the accumulator of the side to move first. The resulting vector, twice the accumulator width, is the "true" hidden layer, and is the vector that gets forwarded to the output layer.
int calculateIndex(Side perspective, Square square, PieceType pieceType, Side side) {
    if (perspective == BLACK) {
        side = ~side;                // Flip side
        square = square ^ 0b111000;  // Mirror square
    }
    return side * 64 * 6 + pieceType * 64 + square;
}

struct AccumulatorPair {
    Accumulator white;
    Accumulator black;
};
Efficient Updates
The format of the input layer allows the accumulators to be efficiently updated, meaning that the neural network does not need to be fully re-evaluated after making a move. Quiet moves and promotions change the values of only two input neurons, while captures and castling change three and four input neurons, respectively. Therefore, instead of recomputing the accumulator values from scratch, one can take the accumulator values from the previous ply, subtract the weight contributions of the neurons being turned off, and add the weights of the neurons being turned on. Due to the nonlinearity introduced by the activation function, these efficient updates can only be performed on the first hidden layer of the neural network.
The following is a basic implementation of efficient single add/sub:
void accumulatorAdd(struct Network* const network, struct Accumulator* accumulator, size_t index) {
    for (int i = 0; i < HL_SIZE; i++)
        accumulator->values[i] += network->accumulator_weights[index][i];
}

void accumulatorSub(struct Network* const network, struct Accumulator* accumulator, size_t index) {
    for (int i = 0; i < HL_SIZE; i++)
        accumulator->values[i] -= network->accumulator_weights[index][i];
}
Output Layer
Most NNUE networks have exactly one output neuron, representing the heuristic score of a position. The accumulator values are passed through an activation function, then forwarded to the output layer by taking their dot product with the output weights and adding the output bias.
During training, the final evaluation value is passed through a sigmoid activation and trained against a target value. The target value is a linear interpolation between the game result (0, 0.5, 1) and sigmoid(evaluation/eval_scale).

During inference, however, the sigmoid function is unused, and the final evaluation is simply scaled by eval_scale. This produces a reasonable evaluation value with a large range and sufficient precision.
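As a concrete sketch of the training target described above: assuming a mixing weight lambda between game result and search evaluation (a training-side hyperparameter whose name here is illustrative, not taken from any particular trainer), the target could be computed as follows.

#include <math.h>

// Hypothetical sketch of the training target. `lambda` and
// `eval_scale` are training-side hyperparameters; the names are
// assumptions for illustration.
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}

// game_result: 0 (loss), 0.5 (draw) or 1 (win) for the side to move
double trainingTarget(double game_result, double evaluation,
                      double eval_scale, double lambda) {
    return lambda * sigmoid(evaluation / eval_scale)
         + (1.0 - lambda) * game_result;
}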
Activation Function | Notes
---|---
ReLU (Rectified Linear Unit): f(x) = max(x, 0) | Rarely used due to its potential to overflow.
CReLU (Clipped ReLU): f(x) = clamp(x, 0, 1) | Less common. Easy to implement and can be auto-vectorized.
SCReLU (Squared Clipped ReLU): f(x) = clamp(x, 0, 1)^2 | Most prevalent and produces the strongest networks. Cannot be auto-vectorized, and requires hand-written SIMD code to optimize.
int16_t CReLU(int16_t value, int16_t min, int16_t max) {
    if (value <= min) return min;
    if (value >= max) return max;
    return value;
}

// Example: CReLU activation
int32_t activation(int16_t value) {
    return CReLU(value, 0, QA);
}

// When forwarding the accumulator values, the network does not consider the color of the perspectives.
// Rather, we are more interested in whether the accumulator is from the perspective of the side to move.
int32_t forward(struct Network* const network,
                struct Accumulator* const stm_accumulator,
                struct Accumulator* const nstm_accumulator) {
    int32_t eval = 0;

    // Dot product with the output weights
    for (int i = 0; i < HL_SIZE; i++) {
        // BEWARE of integer overflows here.
        eval += activation( stm_accumulator->values[i]) * network->output_weights[i];
        eval += activation(nstm_accumulator->values[i]) * network->output_weights[i + HL_SIZE];
    }

    // Uncomment the following dequantization step when using SCReLU
    // eval /= QA;

    eval += network->output_bias;
    eval *= SCALE;
    eval /= QA * QB;

    return eval;
}
Quantization
You may be wondering where the constants QA and QB came from, or why the final evaluation is divided by their product. The answer is that most NNUE networks are quantized. During training, the weights of the network are represented as floating-point values, since these offer a continuous range and reasonable accuracy. However, floating-point arithmetic is slow, and since floating-point operations are not exact, small errors could build up over the accumulator updates, resulting in an inaccurate evaluation. Therefore, the trained floating-point weights are quantized before the network is loaded.
Quantization of an NNUE works as follows: multiply the weights and biases by some amount, then round the values to integers. This loses some precision, but not enough to significantly affect evaluation accuracy. Suppose we quantize the input-to-accumulator weights by QA; then a floating-point value of 1 at the accumulator corresponds to QA after quantization. Therefore, the accumulator biases are also scaled by QA, and the activation function no longer clips values to 1, but to QA. Suppose we then quantize the accumulator-to-output weights by QB; then, assuming CReLU activation, a floating-point value of 1 at the output node corresponds to QA * QB after quantization. Thus, the output bias is scaled by QA * QB. At the end of inference, we divide the quantization factors back out, leaving a very close approximation of the network's evaluation.
SCReLU activation makes things messier. A floating-point value of 1 in the accumulator after SCReLU actually corresponds to QA * QA, so the output bias would have to be scaled by QA * QA * QB, which does not fit inside a 16-bit integer. To prevent this, some trainers still scale the output bias by QA * QB, in which case you divide the dot product by QA first. This brings the output scaling back to QA * QB, and the dequantization can continue as usual.
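As a minimal sketch of this scheme, assuming a float-valued twin of struct Network as produced by training (the FloatNetwork layout below is an illustrative assumption), quantization could look like this:

#include <math.h>
#include <stdint.h>

// Float-valued twin of struct Network, as produced by training.
// This layout is an assumption for illustration.
struct FloatNetwork {
    float accumulator_weights[INPUT_SIZE][HL_SIZE];
    float accumulator_biases[HL_SIZE];
    float output_weights[2 * HL_SIZE];
    float output_bias;
};

int16_t quantize(float value, int scale) {
    return (int16_t) roundf(value * (float) scale);
}

void quantizeNetwork(const struct FloatNetwork* f, struct Network* q) {
    for (int i = 0; i < INPUT_SIZE; i++)
        for (int j = 0; j < HL_SIZE; j++)
            q->accumulator_weights[i][j] = quantize(f->accumulator_weights[i][j], QA);
    for (int j = 0; j < HL_SIZE; j++)
        q->accumulator_biases[j] = quantize(f->accumulator_biases[j], QA);
    for (int j = 0; j < 2 * HL_SIZE; j++)
        q->output_weights[j] = quantize(f->output_weights[j], QB);
    // The output bias lives after both scalings, hence QA * QB
    q->output_bias = quantize(f->output_bias, QA * QB);
}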
Advanced NNUE Techniques
Output Buckets
In a network with output buckets, the output weights and biases are split into several versions. The index of the active output weights is typically chosen by considering the number of non-king pieces left on the board. The following code illustrates one simple formula:
const int OUTPUT_BUCKETS = 8; // can be adjusted
const int DIVISOR = (32 + OUTPUT_BUCKETS - 1) / OUTPUT_BUCKETS;

int choose_output_buckets(uint64_t occupancy_bb) {
    return (bit_count(occupancy_bb) - 2) / DIVISOR;
}
Horizontal Mirroring
In a network with horizontal mirroring, the king is always on the left side of the board. If the king crosses the middle line, the entire board is flipped (from that accumulator's perspective) so that the king remains on one side. Note that the accumulator cannot be efficiently updated when the king crosses the vertical line of symmetry, as the index of every input neuron would have to be recalculated. Horizontal mirroring doubles the effectiveness of training data, as well as providing more king-location information to the network.
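A minimal sketch of the resulting indexing, extending the earlier calculateIndex(); the exact flipping convention is an assumption, as it varies between engines. Note that the vertical flip for the black perspective does not change the king's file, so the raw king square of that perspective can be tested directly.

int calculateIndexMirrored(Side perspective, Square kingSquare,
                           Square square, PieceType pieceType, Side side) {
    if (perspective == BLACK) {
        side = ~side;                // Flip side
        square = square ^ 0b111000;  // Mirror square vertically
    }
    if ((kingSquare & 0b000111) >= 4) // King on files e-h:
        square = square ^ 0b000111;   // mirror the square horizontally
    return side * 64 * 6 + pieceType * 64 + square;
}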
King Input Buckets
An application of the mixture-of-experts technique. With king input buckets, several versions of the accumulator weights are trained. The board is divided into regions, and the region containing the king determines which set of accumulator weights is used. As with horizontal mirroring, a bucket-crossing king move is not efficiently updatable, and the accumulator must undergo a full refresh. However, the improvement in evaluation accuracy is oftentimes worth the slowdown.
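A minimal sketch with four buckets; the region layout below is an illustrative assumption, as every engine chooses its own map.

#define NUM_KING_BUCKETS 4

// Maps each king square (from the given perspective) to a weight set
static const int KING_BUCKET_MAP[64] = {
    0, 0, 0, 0, 1, 1, 1, 1,
    0, 0, 0, 0, 1, 1, 1, 1,
    2, 2, 2, 2, 3, 3, 3, 3,
    2, 2, 2, 2, 3, 3, 3, 3,
    2, 2, 2, 2, 3, 3, 3, 3,
    2, 2, 2, 2, 3, 3, 3, 3,
    2, 2, 2, 2, 3, 3, 3, 3,
    2, 2, 2, 2, 3, 3, 3, 3
};

// The network then stores one set of accumulator weights per bucket:
// int16_t accumulator_weights[NUM_KING_BUCKETS][INPUT_SIZE][HL_SIZE];
int kingBucket(Square kingSquare) {
    return KING_BUCKET_MAP[kingSquare];
}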
Feature Factorizer
A technique for training king-bucketed nets. During training, a "shadow" bucket called the factorizer is added to the network. The factorizer is always active and contributes to the evaluation of every position. When the training run finishes, the factorizer weights are added back into each bucket. Factorizers allow buckets to learn faster, especially those with low training-data coverage.
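A minimal sketch of the final folding step, assuming float training-side weights and the NUM_KING_BUCKETS constant from the previous sketch:

// Fold the always-active factorizer weights into every king bucket
// once training is done, so inference needs no extra bucket.
void foldFactorizer(float bucket_weights[NUM_KING_BUCKETS][INPUT_SIZE][HL_SIZE],
                    const float factorizer_weights[INPUT_SIZE][HL_SIZE]) {
    for (int b = 0; b < NUM_KING_BUCKETS; b++)
        for (int i = 0; i < INPUT_SIZE; i++)
            for (int j = 0; j < HL_SIZE; j++)
                bucket_weights[b][i][j] += factorizer_weights[i][j];
}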
Pairwise Multiplication
Before concatenating the accumulators, divide each accumulator into two halves. Multiply each element of the first half with the corresponding element of the second half, yielding a vector half the size of the original accumulator. The hidden layer becomes the concatenation of two such vectors, so it is reduced to the same width as a single accumulator. At the same accumulator size, this speeds up inference but decreases evaluation accuracy. However, it allows even wider hidden layers as well as more complex architectures with multiple hidden layers.
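A minimal sketch of inference with pairwise multiplication, assuming CReLU-style clipping to QA and output weights of length HL_SIZE (half per perspective; this layout is an assumption). Since the product of two clipped values is scaled by QA * QA, one factor of QA is divided out, as with SCReLU.

int32_t forwardPairwise(struct Network* const network,
                        struct Accumulator* const stm_accumulator,
                        struct Accumulator* const nstm_accumulator) {
    int32_t eval = 0;
    for (int i = 0; i < HL_SIZE / 2; i++) {
        // BEWARE of integer overflows here as well.
        int32_t a = CReLU( stm_accumulator->values[i], 0, QA);
        int32_t b = CReLU( stm_accumulator->values[i + HL_SIZE / 2], 0, QA);
        eval += a * b * network->output_weights[i];
        a = CReLU(nstm_accumulator->values[i], 0, QA);
        b = CReLU(nstm_accumulator->values[i + HL_SIZE / 2], 0, QA);
        eval += a * b * network->output_weights[i + HL_SIZE / 2];
    }
    eval /= QA; // remove the extra QA from the pairwise product
    eval += network->output_bias;
    return eval * SCALE / (QA * QB);
}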
Multiple Layers
Some engines, such as Stockfish, Ethereal, and Viridithas, have networks that consist of multiple hidden layers.
Performance Improvements
Fused Updates
Instead of calling accumulatorAdd or accumulatorSub multiple times to update the accumulator, merge these updates into a single function such as accumulatorAddSub or accumulatorAddSubSub. This improves the performance of efficient updates, since it saves loop iterations and allows registers to be reused.
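A sketch of a fused update for a quiet move, matching the accumulatorAdd/accumulatorSub helpers above:

void accumulatorAddSub(struct Network* const network,
                       struct Accumulator* accumulator,
                       size_t addIndex, size_t subIndex) {
    // One pass over the accumulator instead of two
    for (int i = 0; i < HL_SIZE; i++)
        accumulator->values[i] += network->accumulator_weights[addIndex][i]
                                - network->accumulator_weights[subIndex][i];
}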
Copy-Make
Store a ply-indexed array of accumulators and perform updates only once, during make-move. This approach is faster than make/unmake on the accumulator and enables optimizations such as lazy updates.
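A minimal sketch of a ply-indexed accumulator stack; MAX_PLY is an assumed search-depth bound.

#define MAX_PLY 256

struct AccumulatorStack {
    struct AccumulatorPair stack[MAX_PLY];
    int ply;
};

// On make-move: copy the current pair one slot up, then apply the
// incremental update to the copy instead of mutating in place.
void pushAccumulators(struct AccumulatorStack* s) {
    s->stack[s->ply + 1] = s->stack[s->ply];
    s->ply++;
}

// On unmake-move: simply discard the top entry.
void popAccumulators(struct AccumulatorStack* s) {
    s->ply--;
}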
Lazy Updates
Delay accumulator updates until evaluation is called. There are several variants of lazy updates, but the following implementation is the simplest and most common. On make-move, instead of updating the accumulator, only mark it as dirty and save the features changed by the current move. Before the network is evaluated, walk back through the accumulator stack until reaching a clean accumulator, then apply incremental updates from that clean accumulator forward to the current one, refreshing every accumulator in between. This is a speedup because evaluation is not needed in every node, as most engines cache static evaluations in the transposition table. Lazy updates are especially important in multi-threaded search because of the shared transposition table in Lazy SMP.
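A minimal sketch of the per-ply bookkeeping; the DirtyPiece layout is an illustrative assumption.

#include <stdbool.h>

// Features changed by one move: at most two additions and two
// removals (castling moves both king and rook).
struct DirtyPiece {
    int numAdded, numRemoved;
    size_t added[2];   // feature indices switched on by the move
    size_t removed[2]; // feature indices switched off by the move
};

struct LazyAccumulator {
    struct AccumulatorPair pair;
    struct DirtyPiece dirty; // changes made by the move at this ply
    bool clean;              // true once `pair` matches the position
};

// At evaluation time: walk down the stack to the nearest clean entry,
// then re-apply each DirtyPiece upward, marking entries clean.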
Accumulator Refresh Table
Introduced by Koivisto author Finn Eggers, and colloquially referred to as Finny tables. When performing lazy updates on architectures with king-bucketed inputs, one may never reach a clean accumulator via efficient updates alone. Since a full refresh is very expensive, one may save some computation by storing a king-bucket-indexed table of accumulators and their associated bitboards. Whenever no clean accumulator is reachable, probe the cached accumulator and bitboards from the refresh table, and update the cached accumulator by computing the difference between the cached bitboards and the current bitboards. Once an updated accumulator has been obtained, update the cache entry with the current accumulator and bitboards.
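A minimal sketch of the table layout, one entry per perspective and king bucket, reusing the NUM_KING_BUCKETS constant from the king input buckets sketch; the layout itself is an illustrative assumption.

#include <stdint.h>

struct RefreshTableEntry {
    struct Accumulator accumulator; // last accumulator built for this bucket
    uint64_t piece_bb[2][6];        // [color][piece type] bitboards it was built from
};

struct RefreshTable {
    struct RefreshTableEntry entries[2][NUM_KING_BUCKETS];
};

// To refresh: XOR each cached bitboard with the current one, call
// accumulatorAdd/accumulatorSub for every differing bit, then store
// the updated accumulator and current bitboards back into the entry.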
SIMD
SIMD is an acronym for Single Instruction, Multiple Data: the CPU processes multiple data elements simultaneously. Compilers can automatically generate SIMD instructions to improve the speed of NNUE inference. However, auto-vectorization is not perfect, and there are still opportunities to optimize NNUE inference with hand-written SIMD code.
Lizard SCReLU
This optimization was introduced by Liam McGuire in the Lizard chess engine. It significantly speeds up inference for a simple NNUE architecture with SCReLU activation.
Consider the following loop to forward a SCReLU-activated hidden layer:
const int QA = 255;
const int QB = 64;

for (int i = 0; i < HL_SIZE; i++) {
    const int us_clamped   = clamp( stm_accumulator->values[i], 0, QA);
    const int them_clamped = clamp(nstm_accumulator->values[i], 0, QA);

    eval += us_clamped   * us_clamped   * network->output_weights[i];
    eval += them_clamped * them_clamped * network->output_weights[i + HL_SIZE];
}
How can one vectorize this loop? Since QA = 255, SCReLU(x) can be as large as 65025, which is outside the range of 16-bit integers. One might consider using a lower quantization factor, but the resulting loss of precision costs too much Elo. Fortunately, there is another way to approach the problem.
Say the value after clamping is v and the output weight is w. Instead of calculating (v * v) * w, one can calculate in a different order: (v * w) * v. This way, as long as v * w is in range, the computation will not overflow. Therefore, during training, w is clipped to stay within [-1.98, 1.98]. Since v has a maximum of QA, and w is scaled by QB, the maximum of v * w is QA * QB * 1.98 ≈ 32314, which is within the maximum representable value of a 16-bit integer. For the final step, a multiply-add-adjacent intrinsic such as vpmaddwd (_mm256_madd_epi16) multiplies the elements pairwise (implicitly promoting them to 32-bit integers), then sums adjacent pairs into a result vector. In code, this looks like the following:
const int QA = 255;
const int QB = 64;

const __m256i vec_zero = _mm256_setzero_si256();
const __m256i vec_qa   = _mm256_set1_epi16(QA);
__m256i sum = vec_zero;

for (int i = 0; i < ...; i ...) {
    const __m256i us           = ...; // Load from accumulator
    const __m256i them         = ...;
    const __m256i us_weights   = ...; // Load from net
    const __m256i them_weights = ...;

    const __m256i us_clamped   = _mm256_min_epi16(_mm256_max_epi16(us,   vec_zero), vec_qa);
    const __m256i them_clamped = _mm256_min_epi16(_mm256_max_epi16(them, vec_zero), vec_qa);

    const __m256i us_results   = _mm256_madd_epi16(_mm256_mullo_epi16(us_weights,   us_clamped),   us_clamped);
    const __m256i them_results = _mm256_madd_epi16(_mm256_mullo_epi16(them_weights, them_clamped), them_clamped);

    sum = _mm256_add_epi32(sum, us_results);
    sum = _mm256_add_epi32(sum, them_results);
}
See also
Publications
- Yu Nasu (2018). ƎUИИ Efficiently Updatable Neural-Network based Evaluation Functions for Computer Shogi. Ziosoft Computer Shogi Club, pdf, pdf (Japanese with English abstract), English translation: GitHub - asdfjkl/nnue [18]
- Dominik Klein (2021). Neural Networks For Chess. Release Version 1.1 · GitHub [19]
Forum Posts
2020
January ...
- The Stockfish of shogi by Larry Kaufman, CCC, January 07, 2020 » Stockfish, Shogi
- Re: The Stockfish of shogi by Gian-Carlo Pascutto, CCC, January 18, 2020
- Stockfish NNUE by Henk Drost, CCC, May 31, 2020 » Stockfish
- Stockfish NN release (NNUE) by Henk Drost, CCC, May 31, 2020
- NNUE shared library and tools by Adam Treat, CCC, June 10, 2020
July
- Lizard-NNUE Experiment NOT bad with NNUE Net Evaluation... by Nancy M Pichardo, CCC, July 15, 2020
- Can the sardine! NNUE clobbers SF by Henk Drost, CCC, July 16, 2020
- NNUE accessible explanation by Martin Fierz, CCC, July 21, 2020
- Re: NNUE accessible explanation by Jonathan Rosenthal, CCC, July 23, 2020
- Re: NNUE accessible explanation by Jonathan Rosenthal, CCC, July 24, 2020
- Re: NNUE accessible explanation by Jonathan Rosenthal, CCC, August 03, 2020
- BrainLearn NNUE 1.0 by Andrea Manzo, FishCooking, July 25, 2020 » BrainLearn
- ShashChess NNUE 1.0 by Andrea Manzo, FishCooking, July 25, 2020 » ShashChess
- LC0 vs. NNUE - some tech details... by Srdja Matovic, CCC, July 29, 2020 » Lc0
- What does NNUE actually mean by Paloma, CCC, July 29, 2020
August
- What happens with my hyperthreading? by Kai Laskos, CCC, August 06, 2020 » Stockfish NNUE
- Re: Minic version 2 by Vivien Clauzon, CCC, August 08, 2020 » Minic
- Neural Networks weights type by Fabio Gobbato, CCC, August 13, 2020 » Stockfish NNUE
- Re: Introducing Igel chess engine - Igel and NNUE by Volodymyr Shcherbyna, CCC, August 19, 2020 » Igel
- Orion 0.7 : NNUE experiment by David Carteau, CCC, August 19, 2020 » Orion
- Night Nurse 0.2 by Dietrich Kappe, CCC, August 19, 2020 » A0lite, Igel
- NNUE by Bert Tuyt, World Draughts Forum, August 19, 2020 » Draughts
September
- Train a neural network evaluation by Fabio Gobbato, CCC, September 01, 2020 » Automated Tuning
- RubiChess NNUE player implemented by Andreas Matthies, CCC, September 06, 2020 » RubiChess
- Toga III 0.4 NNUE by Dietrich Kappe, CCC, September 07, 2020 » Toga
- Neural network quantization by Fabio Gobbato, CCC, September 08, 2020 » Neural Networks
- AVX-512 and NNUE by Gian-Carlo Pascutto, CCC, September 08, 2020 » AVX-512
- First success with neural nets by Jonathan Kreuzer, CCC, September 23, 2020 » Neural Networks
- Re: First success with neural nets by Jonathan Kreuzer, CCC, November 11, 2020 » Checkers
- Nemorino 6 (NNUE) by Florentino, CCC, September 28, 2020 » Nemorino
- A Crossroad in Computer Chess; Or Desperate Flailing for Relevance by Andrew Grant, CCC, September 29, 2020
- NNUE variation by Ed Schröder, CCC, September 29, 2020
October
- BONA_PIECE_ZERO by elcabesa, CCC, October 04, 2020
- Re: Final Release of Ethereal, V12.75 by Andrew Grant, CCC, October 09, 2020 » Ethereal
- Request for someone to train an NNUE for Ethereal by Andrew Grant, CCC, October 09, 2020
- Ethereal Tuning - Data Dump by Andrew Grant, CCC, October 10, 2020
- Dangerous turn by Dann Corbit, CCC, October 10, 2020
- Black crushing white, weird ? by Vivien Clauzon, CCC, October 14, 2020 » MinicNNUE
- Hacking around CFish NNUE by Maksim Korzh, CCC, October 15, 2020 » CFish
- Re: Hacking around CFish NNUE by Daniel Shawul, CCC, October 15, 2020 » Scorpio
- How to scale stockfish NNUE score? by Maksim Korzh, CCC, October 17, 2020
- Re: How to scale stockfish NNUE score? by Daniel Shawul, CCC, October 17, 2020
- Embedding Stockfish NNUE to ANY CHESS ENGINE: YouTube series by Maksim Korzh, CCC, October 17, 2020 » BBC
- Seer by Gerd Isenberg, CCC, October 18, 2020 » Seer
- BBC 1.3 + Stockfish NNUE released! by Maksim Korzh, CCC, October 21, 2020 » BBC
- Mayhem NNUE - New NN engine by JohnWoe, CCC, October 22, 2020 » Mayhem
- Centipawns vs Millipawns with NNUE by Madeleine Birchfield, CCC, October 23, 2020 » Centipawns, Millipawns
- NNUE Question - King Placements by Andrew Grant, CCC, October 23, 2020 » Stockfish NNUE - NNUE Structure
November
- Komodo 14.1 Release and Dragon Announcement by Larry Kaufman, CCC, November 02, 2020 » Komodo
- NNUE outer product vs tensor product by Madeleine Birchfield, CCC, November 02, 2020 [20] [21]
- Pytorch NNUE training by Gary Linscott, CCC, November 08, 2020 [22]
- TucaNNo: neural network research by Alcides Schulz, CCC, November 08, 2020
- Dragon by Komodo Chess by Larry Kaufman, CCC, November 10, 2020 » Dragon by Komodo Chess
- Tensorflow NNUE training by Daniel Shawul, CCC, November 10, 2020 [23]
- Speculations about NNUE development (was New engine releases 2020) by Madeleine Birchfield, CCC, November 11, 2020
- Re: Speculations about NNUE development (was New engine releases 2020) by Connor McMonigle, CCC, November 12, 2020
- Re: Speculations about NNUE development (was New engine releases 2020) by Connor McMonigle, CCC, November 12, 2020 » Dragon by Komodo Chess, Seer, Halogen
- Re: Final Release of Ethereal, V12.75 by Andrew Grant, CCC, November 12, 2020
- Maybe not the best diversity of strongest chess engines under development by Kai Laskos, CCC, November 14, 2020 » Engine Similarity
- CPU Vector Unit, the new jam for NNs... by Srdja Matovic, CCC, November 18, 2020 » SIMD
- You've trained a brilliant NN(UE) King-Piece Network. Now what? by Andrew Grant, CCC, November 19, 2020
- Pawn King Neural Network by Tamás Kuzmics, CCC, November 26, 2020
December
- Orion 0.8 + The Cerebrum release by David Carteau, CCC, December 01, 2020 » Orion, Cerebrum
- The NNUE split programmers are in by Ed Schröder, ProDeo Forum, December 02, 2020
- Introducing the "Cerebrum" library (NNUE-like trainer and inference code) by David Carteau, CCC, December 07, 2020
- Dispelling the Myth of NNUE with LazySMP: An Analysis by Andrew Grant, CCC, December 30, 2020 » Lazy SMP
2021
January
- Translation of Yu Nasu's NNUE paper by Dominik Klein, CCC, January 07, 2021
- Re: Pytorch NNUE training by Gary Linscott, CCC, January 09, 2021
- More experiments with neural nets by Jonathan Kreuzer, CCC, January 09, 2021 » Slow Chess
- Shouldn't positional attributes drive SF's NNUE input features (rather than king position)? by Nick Pelling, FishCooking, January 10, 2021 » Stockfish NNUE
- HalfKP Structure in NNUE by Roger Stephenson, CCC, January 12, 2021
- Andscacs nnue 0.1 by Daniel José Queraltó, CCC, January 17, 2021 » Andscacs
- It's NNUE era (sharing my thoughts) by Basti Dangca, CCC, January 18, 2021
- NNUE and game phase by Dann Corbit, CCC, January 18, 2021 » Game Phases
- correspondence chess in the age of NNUE by Larry Kaufman, CCC, January 21, 2021
- One for Andrew Grant et al. - NNUE? by Srdja Matovic, CCC, January 21, 2021
- 256 in NNUE? by Ted Wong, CCC, January 28, 2021
- So what do we miss in the traditional evaluation? by Ferdinand Mosca, CCC, January 29, 2021 » Evaluation
- Latest Night Nurse released by Dietrich Kappe, CCC, January 29, 2021
- None-GPL NNUE probing code by Daniel Shawul, CCC, January 31, 2021
February
- Fat Fritz 2 by Jouni Uski, CCC, February 09, 2021 » Fat Fritz 2.0
- How much work is it to train an NNUE? by Gabor Szots, CCC, February 11, 2021
- HCE and NNUE and vectorisation by Vivien Clauzon, CCC, February 11, 2021 » Minic
- nnue reading code by Jon Dart, CCC, February 13, 2021
- New net: The White Rose by Dietrich Kappe, CCC, February 20, 2021
- Are neural nets (the weights file) copyrightable? by Adam Treat, CCC, February 21, 2021
- My first NNUE nn-f0c1c3cbf2f1.nnue by MikeB, CCC, February 27, 2021
- How to make a double-sized net as good as SF NNUE in a few easy steps by Chris Whittington, CCC, February 28, 2021 » Fat Fritz 2.0
March
- A random walk down NNUE street … by MikeB, CCC, March 06, 2021
- NNUE Research Project by Ed Schröder, CCC, March 10, 2021 » Engine Similarity
- Simex including NNUE by jjoshua2, CCC, March 11, 2021 » Engine Similarity
- NNUE ranking by Jim Logan, CCC, March 12, 2021 » Stockfish NNUE
- FEN compression by lucasart, CCC, March 17, 2021 » FEN Compression, NNUE Training
- Mabigat - hyperparameter optimizer for NNUE net by Ferdinand Mosca, CCC, March 22, 2021 » Automated Tuning
- nnue-trainer by Jon Dart, CCC, March 27, 2021
- Zeta with NNUE on GPU? by Srdja Matovic, CCC, March 31, 2021 » Zeta, GPU
April
- Rubichess NN questions by Jon Dart, CCC, April 23, 2021 » RubiChess
- Crafty NNUE Chess Engine? by supersharp77, CCC, April 29, 2021 » Crafty, Vafra [24]
May
- Komodo Dragon 2 released by Larry Kaufman, CCC, May 04, 2021 » Komodo Dragon
- Stockfish with new NNUE architecture and bigger net released by Stefan Pohl, CCC, May 19, 2021 » Stockfish, Stockfish NNUE [25]
- NNUE scoring (egbb lib) by Desperado, CCC, May 19, 2021 » Scorpio NNUE
June
- Re: Booot progress by Alex Morozov, CCC, June 01, 2021 » Booot
- Commercial Release of Ethereal 13.00 (NNUE) for AVX2 Systems by Andrew Grant, CCC, June 04, 2021 » Ethereal 13.00 (NNUE)
- Re: Commercial Release of Ethereal 13.00 (NNUE) for AVX2 Systems by Andrew Grant, CCC, June 04, 2021 » HalfKP
- Dark Horse Update by Dietrich Kappe, CCC, June 11, 2021
- Some more experiments with neural nets by Jonathan Kreuzer, CCC, June 15, 2021 » Slow Chess
- Re: will Tcec allow Stockfish with a Leela net to play? by Connor McMonigle, CCC, June 17, 2021 » Stockfish, Komodo Dragon, Ethereal, Seer
- Re: will Tcec allow Stockfish with a Leela net to play? by Daniel Shawul, CCC, June 18, 2021 » Scorpio
- Re: will Tcec allow Stockfish with a Leela net to play? by Vivien Clauzon, CCC, June 18, 2021 » Minic
- I declare that HCE is dead... by Andrew Grant, CCC, June 29, 2021 » Ethereal, HCE
July
- Re: NNUE Question - King Placements by Tomasz Sobczyk, CCC, July 01, 2021 » NNUE Question
- Re: NNUE Question - King Placements by Daniel Shawul, July 01, 2021 » ScorpioNNUE
- Before things become more messy than they already are by Ed Schröder, CCC, July 02, 2021
- NNUE training set generation by Edsel Apostol, CCC, July 03, 2021
- Time to rethink what Albert Silver has done? by Srdja Matovic, CCC, July 03, 2021 » Fat Fritz 2
- Would the ICGA have accepted today's NNUE engines? by Madeleine Birchfield, CCC, July 05, 2021 » ICGA
- Koivisto 5.0 by Finn Eggers, CCC, July 07, 2021 » Koivisto
- NNUE one year retrospective by Madeleine Birchfield, CCC, July 09, 2021
August ...
- Basic NNUE questions by Amanj Sherwany, CCC, August 04, 2021
- Alternatives to King-Pawn, King-All NNUE encoding by Andrew Grant, CCC, September 05, 2021
- NNUE - Efficiently Updatable Network - understanding by Daniel Infuehr, CCC, October 11, 2021
- NNUE - only from own engine? by Rebel, October 25, 2021
- Regarding AVX2 by Rebel, CCC, November 03, 2021 » AVX2
- Mantissa 3.0.0 by Jeremy Wright, CCC, December 10, 2021 » Mantissa
- Are NNUE Nets Specific to Chess Engines or They Universal to All Engines? by daniel71, CCC, December 26, 2021
2022 ...
- Why NNUE trainer requires an online qsearch on each training position? by Chao Ma, CCC, January 01, 2022
- Rebel 14 by Ed Schröder, CCC, January 12, 2022 » Rebel 14
- Koivisto 8.0 by Finn Eggers, CCC, March 15, 2022 » Koivisto
- NNUE + Pawn-King Network by Alvin Peng, CCC, April 22, 2022
2024 ...
- How to get started with NNUE by Arjun Basandrai, CCC, January 12, 2024
External Links
NNUE
- Efficiently updatable neural network | Wikipedia
- 次世代の将棋思考エンジン、NNUE関数を学ぼう（その1．ネットワーク構造編） (Learn the Next-Generation Shogi Engine NNUE Function, Part 1: Network Structure) - Computer Shogi, Qhapaq, June 02, 2018 (Japanese)
- 次世代の将棋思考エンジン、NNUE関数を学ぼう（その2．改造/学習編） (Learn the Next-Generation Shogi Engine NNUE Function, Part 2: Modification/Learning) - Computer Shogi, Qhapaq, July 08, 2018 (Japanese)
- Stockfish NNUE – The Complete Guide, June 19, 2020 (Japanese and English)
- 3 technologies in shogi AI that could be used for chess AI by Motohiro Isozaki, August 21, 2020 » Stockfish NNUE
- Stockfish NNUE Wiki
- nnue | Home of the Dutch Rebel by Ed Schröder
- NNUE Guide (nnue-pytorch/nnue.md at master · glinscott/nnue-pytorch · GitHub) hosted by Gary Linscott
NNUE libraries
Some developers have extracted and rewritten the Stockfish NNUE code into independent libraries, which are much easier to embed into other chess engines.
- GitHub - david-carteau/cerebrum: The Cerebrum library by David Carteau » Cerebrum
- GitHub - dshawul/nncpu-probe by Daniel Shawul
- GitHub - jdart1/nnue: NNUE reading code for chess by Jon Dart
Source Code
- GitHub - yaneurao/YaneuraOu: YaneuraOu is the World's Strongest Shogi engine(AI player), WCSC29 1st winner, educational and USI compliant engine
- GitHub - Tama4649/Kristallweizen: Kristallweizen, runner-up of the 29th World Computer Shogi Championship (translated from Japanese)
- GitHub - nodchip/Stockfish: UCI chess engine (Stockfish NNUE by Nodchip)
- A Leela NNUE? Night Nurse and Others · dkappe/leela-chess-weights Wiki · GitHub by Dietrich Kappe
- GitHub - DanielUranga/TensorFlowNNUE by Daniel Uranga
- GitHub - glinscott/nnue-pytorch: NNUE (Chess evaluation) trainer in Pytorch by Gary Linscott
- GitHub - connormcmonigle/seer-nnue: UCI chess engine using neural networks for position evaluation by Connor McMonigle » Seer
- GitHub - bmdanielsson/nnue-trainer: PyTorch trainer for NNUE style neural networks by Martin Danielsson » Marvin [26]
- GitHub - fsmosca/Mabigat: NNUE parameter optimizer by Ferdinand Mosca » Automated Tuning
- GitHub - jw1912/bullet: Neural Network Trainer by Jamie Whiting » Automated Tuning
- GitHub - xu-shawn/nnue-visualizer by Shawn Xu
Misc
- Lower Numerical Precision Deep Learning Inference and Training by Andres Rodriguez et al., Intel, January 19, 2018 » AVX-512
- Nue from Wikipedia
- Hiromi - Spectrum, 2019, YouTube Video
References
- ↑ Nue (鵺) from the Konjaku Gazu Zoku Hyakki (今昔画図続百鬼) by Toriyama Sekien, circa 1779, Wikimedia Commons
- ↑ Re: What does NNUE actually mean by Henk Drost, CCC, July 29, 2020
- ↑ 機械学習エンジニアのための将棋AI開発入門その1 (Introduction to Shogi AI Development for Machine Learning Engineers, Part 1), May 03, 2020 (Japanese)
- ↑ Yu Nasu (2018). ƎUИИ Efficiently Updatable Neural-Network based Evaluation Functions for Computer Shogi. Ziosoft Computer Shogi Club, pdf (Japanese with English abstract)
- ↑ GitHub - yaneurao/YaneuraOu: YaneuraOu is the World's Strongest Shogi engine(AI player), WCSC29 1st winner, educational and USI compliant engine
- ↑ GitHub - Tama4649/Kristallweizen: Kristallweizen, runner-up of the 29th World Computer Shogi Championship (translated from Japanese)
- ↑ The Stockfish of shogi by Larry Kaufman, CCC, January 07, 2020
- ↑ Stockfish NN release (NNUE) by Henk Drost, CCC, May 31, 2020
- ↑ Can the sardine! NNUE clobbers SF by Henk Drost, CCC, July 16, 2020
- ↑ Orion 0.7 : NNUE experiment by David Carteau, CCC, August 19, 2020
- ↑ Re: New engine releases 2020...Don NNUE 2020? by supersharp77, CCC, August 19, 2020
- ↑ ... the last shall be first ... by MikeB, CCC, 19 Aug 2020
- ↑ Introducing Igel chess engine by Volodymyr Shcherbyna, CCC, 20 Aug 2020
- ↑ Night Nurse 0.2 by Dietrich Kappe, CCC, August 19, 2020
- ↑ Re: Hacking around CFish NNUE by Daniel Shawul, CCC, October 15, 2020
- ↑ Re: How to scale stockfish NNUE score? by Daniel Shawul, CCC, October 17, 2020
- ↑ Dragon by Komodo Chess by Larry Kaufman, CCC, November 10, 2020
- ↑ Translation of Yu Nasu's NNUE paper by Dominik Klein, CCC, January 07, 2021
- ↑ Book about Neural Networks for Chess by dkl, CCC, September 29, 2021
- ↑ Outer product from Wikipedia
- ↑ Tensor product from Wikipedia
- ↑ PyTorch from Wikipedia
- ↑ TensorFlow from Wikipedia
- ↑ Vafra by Robert Jurjević
- ↑ Update default net to nn-8a08400ed089.nnue by Sopel97 · Pull Request #3474 · official-stockfish/Stockfish · GitHub by Tomasz Sobczyk
- ↑ nnue-trainer by Jon Dart, CCC, March 27, 2021