Neural Networks

Table of Contents

1. Neural Networks

1.1. CNN

1.1.2. Pooling + Convolutional

http://databookuw.com/databook.pdf#section.6.5
It is common to periodically insert a Pooling layer between successive convolutional layers in a DCNN architecture.
Its function is to progressively reduce the spatial size of the representation in order to reduce the number of parameters and computation in the network.

  • This is an effective strategy to:
    1. help control overfitting and
    2. fit the computation in memory.
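A minimal PyTorch sketch of this pattern (layer sizes are illustrative assumptions, not from the source): each pooling layer halves the spatial size of the representation, so later convolutional layers process far fewer activations.

  import torch
  import torch.nn as nn

  # Conv -> ReLU -> Pool blocks; each MaxPool2d halves height and width.
  cnn = nn.Sequential(
      nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32  -> 16x32x32
      nn.ReLU(),
      nn.MaxPool2d(2),                              # 16x32x32 -> 16x16x16
      nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16x16x16 -> 32x16x16
      nn.ReLU(),
      nn.MaxPool2d(2),                              # 32x16x16 -> 32x8x8
  )

  x = torch.randn(1, 3, 32, 32)
  print(cnn(x).shape)  # torch.Size([1, 32, 8, 8])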

1.3. Neural Networks Scaling Laws

  • Has Generative AI Already Peaked? - Computerphile - YouTube

    We consistently find across our experiments that, across concepts, the frequency of a concept in the pretraining dataset is a strong predictor of the model’s performance on test examples containing that concept (see Fig. 2).
    Notably, model performance scales linearly as the concept frequency in pretraining data grows exponentially i.e., we observe a consistent log-linear scaling trend.
    We find that this log-linear trend is robust to controlling for correlated factors (similar samples in pretraining and test data [81]) and testing across different concept distributions along with samples generated entirely synthetically [52].

  • AI can’t cross this line and we don’t know why. - YouTube
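A toy numpy illustration of the log-linear trend quoted above (all numbers are synthetic, not from the paper): performance rises roughly linearly while concept frequency grows exponentially, so a linear fit against log-frequency captures the trend.

  import numpy as np

  freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])       # concept frequency in pretraining data (made up)
  perf = np.array([0.22, 0.31, 0.40, 0.51, 0.60])  # test performance per concept (made up)

  a, b = np.polyfit(np.log10(freq), perf, deg=1)   # linear fit in log-frequency
  print(f"performance ~= {a:.2f} * log10(freq) + {b:.2f}")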

1.5. OpenWorm

OpenWorm is an international open science project to simulate the roundworm Caenorhabditis elegans at the cellular level.
Although the long-term goal is to model all 959 cells of C. elegans, the first stage is to model the worm’s locomotion by simulating its 302 neurons and 95 muscle cells.
This bottom-up simulation is being pursued by the OpenWorm community.
As of this writing, a physics engine called Sibernetic has been built for the project and models of the neural connectome and a muscle cell have been created in NeuroML format.
A 3D model of the worm anatomy can be accessed through the web via the OpenWorm browser.
The OpenWorm project is also contributing to the development of Geppetto, a web-based, multi-algorithm, multi-scale simulation platform engineered to support the simulation of the whole organism.

1.9. Transformers

1.9.6. Attention in transformers, visually explained | DL6 - YouTube

The attention layers “nudge” each token’s embedding with context from the surrounding tokens, moving the current embedding towards a representation that predicts the most probable next token
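A minimal single-head self-attention sketch in numpy (dimensions and the residual-update framing are illustrative assumptions): the attention output is added back to each embedding, which is the “nudge” described above.

  import numpy as np

  rng = np.random.default_rng(0)
  seq_len, d_model = 4, 8
  X = rng.normal(size=(seq_len, d_model))          # token embeddings

  W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
  Q, K, V = X @ W_q, X @ W_k, X @ W_v

  scores = Q @ K.T / np.sqrt(d_model)              # scaled dot-product
  weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax

  X_updated = X + weights @ V                      # residual "nudge" of each embedding
  print(X_updated.shape)                           # (4, 8)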

1.12. Regularization

1.12.2. Lp space - Wikipedia L0, L1, L2

\(L_\infty\) is the maximum of the absolute values of the components
\(L_0\) is the number of non-zero components; applied to the difference of two vectors it gives the Hamming distance
The level sets \(L_0 = k\), with \(k \le \textbf{param dimension}\), are unions of lines, planes, cubes, hypercubes, …, hyperplanes in which all but \(k\) parameters are set to zero
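A minimal numpy sketch of the norms mentioned above (the vectors are arbitrary examples):

  import numpy as np

  x = np.array([0.0, -3.0, 0.0, 4.0])
  y = np.array([0.0, -3.0, 2.0, 4.0])

  l0 = np.count_nonzero(x)           # "L0": number of non-zero components -> 2
  l1 = np.abs(x).sum()               # L1: sum of absolute values -> 7.0
  l2 = np.sqrt((x ** 2).sum())       # L2: Euclidean length -> 5.0
  linf = np.abs(x).max()             # L_inf: largest absolute component -> 4.0

  hamming = np.count_nonzero(x - y)  # L0 of the difference = Hamming distance -> 1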

1.13. Superposition Hypothesis

How might LLMs store facts | Chapter 7, Deep Learning - YouTube Superposition Hypothesis

1.16. Multiverse Model Compression

https://multiversecomputing.com/
Model Compression: https://paperswithcode.com/task/model-compression
https://github.com/HuangOwen/Awesome-LLM-Compression
They use better model compression: https://multiversecomputing.com/papers/compactifai-extreme-compression-of-large-language-models-using-quantum-inspired-tensor-networks
https://arxiv.org/abs/2401.14109
The mechanism is Tensor Networks: https://en.wikipedia.org/wiki/Tensor_network

https://medium.com/@researchgraph/compactifai-large-language-models-dont-actually-have-to-be-large-857548d053d2

They show that LLMs do not have to get larger to get better; they can perform just as well with a fraction of the parameters. The article also presents a novel compression technique that, unlike previous methods, is controllable and explainable.

They can also edit the model (Knowledge Editing) and explain the network’s safety (e.g. remove a concept from it)

1.16.3. Tensor Train (TT) Decomposition
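The source gives no details for this subsection; as a rough illustration of the general idea only (a standard TT-SVD sketch in numpy, not the specific CompactifAI method), a d-way tensor is factored into a chain (“train”) of small 3-way cores via successive truncated SVDs.

  import numpy as np

  def tt_svd(tensor, max_rank):
      """Return a list of TT cores with shapes (r_prev, n_k, r_k)."""
      shape = tensor.shape
      cores, r_prev = [], 1
      C = tensor.reshape(r_prev * shape[0], -1)
      for k in range(len(shape) - 1):
          U, S, Vt = np.linalg.svd(C, full_matrices=False)
          r = min(max_rank, len(S))                          # truncate the rank
          cores.append(U[:, :r].reshape(r_prev, shape[k], r))
          C = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
          r_prev = r
      cores.append(C.reshape(r_prev, shape[-1], 1))
      return cores

  T = np.random.default_rng(0).normal(size=(4, 5, 6, 7))
  cores = tt_svd(T, max_rank=3)
  print([c.shape for c in cores])  # [(1, 4, 3), (3, 5, 3), (3, 6, 3), (3, 7, 1)]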

1.17. Neural circuit policies enabling auditable autonomy

1.18. Why neural networks aren’t neural networks - YouTube

https://youtu.be/CfAL_cL3SGQ
They are consecutive linear and non-linear transformations
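A minimal numpy sketch of that point (layer sizes are illustrative): a feed-forward network is just alternating affine (linear) maps and elementwise non-linearities.

  import numpy as np

  def relu(z):
      return np.maximum(z, 0.0)

  rng = np.random.default_rng(0)
  x = rng.normal(size=3)                          # input vector

  W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
  W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)

  h = relu(W1 @ x + b1)                           # linear transformation, then non-linearity
  y = W2 @ h + b2                                 # final linear transformation
  print(y)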

1.19. We are doing Neural Networks wrong

https://link.medium.com/97kwPdKMSkb

  1. Artificial NNs are too simple (it takes 1000 artificial neurons to simulate 1 biological neuron at the single-spike timescale)
  2. We do not know how they work
  3. We could compensate for the lack of complexity in artificial neurons with larger models, tons of computing power, and gigantic datasets, but that’s too inefficient to be the eventual last step of this quest.
  4. ANNs should be more neuroscience-based for two reasons:
    1. (future) the difference in complexity between biological and artificial neurons will result in differences in outcome — AGI won’t come without a reform
    2. (present) the inefficiency with which we’re pursuing this goal is damaging our society and the planet

1.22. Watch “Manifold Mixup: Better Representations by Interpolating Hidden States” on YouTube

https://youtu.be/1L83tM8nwHU
Softer decision boundaries by mixing up (interpolating) the hidden states and labels of pairs of training examples
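A minimal Manifold Mixup sketch in numpy (sizes and the Beta parameters are illustrative assumptions): the hidden states of two examples and their labels are interpolated with the same mixing coefficient.

  import numpy as np

  rng = np.random.default_rng(0)
  h_i, h_j = rng.normal(size=16), rng.normal(size=16)  # hidden states of two examples
  y_i = np.array([1.0, 0.0])                           # one-hot labels
  y_j = np.array([0.0, 1.0])

  lam = rng.beta(2.0, 2.0)                             # mixing coefficient ~ Beta(alpha, alpha)
  h_mix = lam * h_i + (1 - lam) * h_j                  # train on the mixed hidden state
  y_mix = lam * y_i + (1 - lam) * y_j                  # with the correspondingly mixed label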

1.24. Watch “Introduction to GANs, NIPS 2016 | Ian Goodfellow, OpenAI” on YouTube

Author: Julian Lopez Carballal

Created: 2025-03-18 Tue 08:58