There are three ways to prevent overfitting a dataset: human supervision, historical validation, and minimizing entropy. Standard industry practice in Silicon Valley is historical validation. Standard industry practice among low-frequency hedge funds is human supervision. Nobody is minimizing entropy yet.
The limitation of historical validation is that it only works in big data situations. It is of limited applicability when dealing with small data and long tails. Human supervision lets you overcome particular small-data situations, but it otherwise puts a hard limit on the complexity of your model because there is a limit to how much data you can feed through the conscious parts of a human brain.
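To make "historical validation" concrete, here is a minimal sketch of the standard procedure: hold out a slice of past data, then select whichever model best predicts it. The dataset, polynomial model family, and function names are all illustrative assumptions, not anything from the original text.

```python
# Illustrative sketch of historical validation: hold out past data
# and select the model whose predictions best match it.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: y = 2x + noise.
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + rng.normal(0, 0.1, 200)

# Split into a training set and a held-out "historical" validation set.
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def fit_polynomial(degree):
    """Fit a polynomial of the given degree on the training split."""
    return np.polynomial.polynomial.polyfit(x_train, y_train, degree)

def val_error(coeffs):
    """Mean squared error on the held-out validation split."""
    pred = np.polynomial.polynomial.polyval(x_val, coeffs)
    return np.mean((pred - y_val) ** 2)

# Pick the model complexity that generalizes best to the held-out history.
errors = {d: val_error(fit_polynomial(d)) for d in range(1, 8)}
best_degree = min(errors, key=errors.get)
print(best_degree)
```

Note how the procedure depends on the validation split being large enough to distinguish the candidate models; with only a handful of held-out points the validation error becomes noise, which is exactly the small-data limitation described above.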
I believe biological brains are more data efficient than machine learning models because evolution's wetware minimizes entropy instead of validating against historical data. If entropy-based models don't take much more data than historical-validation models, then we are in an AI overhang.
So, how do we make an entropy-based model?
My guess is that resonance within the human connectome performs a sort of implicit eigendecomposition, which creates an orthogonal basis for the entropy equation. An entropy-based artificial neural network (ANN) need not be so complicated. It might be possible to just add a reasonable entropy function like $S=-\sum_i\rho_i\ln\rho_i$ to existing ANNs. Alternatively, you might go all the way and do the whole thing with differential equations. I don't think it's a coincidence that neural ordinary differential equations are so much better than multilayer perceptrons at extrapolating beyond their training dataset. (See figures 8.a and 8.b of the paper.)
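One way the "just add an entropy function to existing ANNs" idea might look is an entropy penalty on the network's output distribution, using the standard Gibbs form $S=-\sum_i\rho_i\ln\rho_i$ (note the minus sign). Everything below, including the `beta` weight and the toy batch, is an illustrative assumption rather than a known recipe:

```python
# A minimal sketch: penalize the entropy of a network's softmax
# outputs so that minimizing the loss also minimizes entropy.
# All names and values here are illustrative; this is not a full
# training loop, just the loss computation.
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(rho, eps=1e-12):
    """Gibbs/Shannon entropy S = -sum rho ln rho, per example."""
    return -np.sum(rho * np.log(rho + eps), axis=-1)

def loss_with_entropy(logits, targets, beta=0.1):
    """Cross-entropy loss plus an entropy penalty weighted by beta."""
    rho = softmax(logits)
    ce = -np.log(rho[np.arange(len(targets)), targets] + 1e-12)
    return np.mean(ce + beta * entropy(rho))

# Hypothetical batch: 3 examples, 4 classes.
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.5, 0.5, 0.5, 0.5],
                   [0.1, 3.0, 0.1, 0.1]])
targets = np.array([0, 1, 1])
print(loss_with_entropy(logits, targets))
```

With `beta > 0` the penalty pushes the network toward confident (low-entropy) output distributions; whether that is the right operationalization of "minimizing entropy" in the sense above is exactly the open question.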