for example, https://www.foxnews.com/media/some-fact-checkers-biden-false-remarks-gun-show-sale

To make a prediction, we first transform each word in the text into an n-dimensional vector. The resulting matrix then passes through two dense layers to form a low-dimensional vector that represents the entire text.
The final layer reduces this vector to a single number. The most important words are selected after the first stage using singular value decomposition.
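
The pipeline described above could look roughly like the following NumPy sketch. It is only an illustration under several assumptions that the text does not state: the toy vocabulary, the layer sizes, the random (untrained) weights, the tanh activations, the mean pooling that turns the per-word matrix into one text vector, and the rule that ranks words by their projection onto the first singular direction are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = {"biden": 0, "gun": 1, "show": 2, "sale": 3, "false": 4, "claim": 5}
N_DIM, HIDDEN, TEXT_DIM = 8, 6, 4

# Random, untrained parameters; a real model would learn these.
E = rng.normal(size=(len(VOCAB), N_DIM))            # word embedding table
W1, b1 = rng.normal(size=(N_DIM, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(size=(HIDDEN, TEXT_DIM)), np.zeros(TEXT_DIM)
w_out, b_out = rng.normal(size=TEXT_DIM), 0.0


def embed(tokens):
    """Stage 1: map each word to an n-dimensional vector (one row per word)."""
    return E[[VOCAB[t] for t in tokens]]


def top_words_by_svd(X, k):
    """Pick k word positions via SVD of the embedding matrix.

    Assumption: a word's importance is the magnitude of its projection
    onto the first singular direction, i.e. |U[:, 0] * S[0]|.
    """
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = np.abs(U[:, 0] * S[0])
    return np.sort(np.argsort(scores)[-k:])          # keep original word order


def predict(tokens, k=3):
    X = embed(tokens)                                # (num_words, N_DIM)
    keep = top_words_by_svd(X, k)
    H = np.tanh(X[keep] @ W1 + b1)                   # dense layer 1
    H = np.tanh(H @ W2 + b2)                         # dense layer 2
    v = H.mean(axis=0)                               # assumed pooling into one text vector
    score = float(v @ w_out + b_out)                 # final layer: a single number
    return score, [tokens[i] for i in keep]


score, kept = predict(["biden", "false", "claim", "gun", "sale"], k=3)
print(score, kept)
```

With trained weights, the returned number would be the prediction for the text, and `kept` would list the words that survived the SVD-based selection.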

Loss functions are regularized by quadratic polynomials whose coefficients are chosen to give the best predictions on the test set.
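
One possible reading of this regularization is sketched below. The exact form of the polynomial is not given in the text, so the penalty used here, the sum over weights of c2*w^2 + c1*w, is an assumption, as are the mean squared error loss, the linear model, the synthetic data, and the coefficient grids. Following the text, the coefficients are scored on the test set.

```python
import itertools
import numpy as np


def penalty(w, c1, c2):
    """Assumed regularizer: a quadratic polynomial in each weight, summed."""
    return float(np.sum(c2 * w**2 + c1 * w))


def fit(X, y, c1, c2):
    """Minimize mean squared error + penalty(w, c1, c2) for a linear model.

    Setting the gradient 2*X^T(Xw - y)/n + 2*c2*w + c1 to zero gives a
    linear system, solved here in closed form.
    """
    n, d = X.shape
    A = 2 * X.T @ X / n + 2 * c2 * np.eye(d)
    b = 2 * X.T @ y / n - c1 * np.ones(d)
    return np.linalg.solve(A, b)


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


def select_coefficients(X_tr, y_tr, X_te, y_te, c1_grid, c2_grid):
    """Return the (c1, c2) pair whose fitted model has the lowest test error."""
    return min(
        itertools.product(c1_grid, c2_grid),
        key=lambda c: mse(X_te, y_te, fit(X_tr, y_tr, *c)),
    )


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]

C1_GRID = (-0.1, 0.0, 0.1)     # hypothetical search grids
C2_GRID = (0.01, 0.1, 1.0)
best_c1, best_c2 = select_coefficients(X_tr, y_tr, X_te, y_te, C1_GRID, C2_GRID)
print(best_c1, best_c2)
```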