data science
Table of Contents
- 1. data science
- 1.1. Awesome data science
- 1.2. https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
- 1.3. Seven states of randomness - Wikipedia
- 1.4. Resources
- 1.5. Want to know how to turn change into a movement? - Gapingvoid
- 1.6. Forward problems vs inverse problems: It’s easier to validate than to generate
- 1.7. Thinking about High-Quality Human Data | Lil’Log
- 1.8. Evolutionary Tree of LLMs
- 1.9. data science links
- 1.9.1. links
- 1.9.2. Data visualization
- 1.9.3. k-means and Voronoi
- 1.9.4. DBSCAN
- 1.9.5. Local outlier factor - Wikipedia (density-based, like DBSCAN)
- 1.9.6. geo stuff
- 1.9.7. https://github.com/easystats/ R repos
- 1.9.8. An intuitive, visual guide to copulas — While My MCMC Gently Samples
- 1.9.9. Data generation
- 1.9.10. Structural Time Series
- 1.9.11. The Artificial Intelligence Wiki | Pathmind
- 1.10. Bootstrapping
- 1.11. Taguchi Matrix / Taguchi Arrays
- 1.12. Common Statistical Tests Are Linear Models
- 1.13. Links
- 1.13.1. Moving from Statistics to Machine Learning, the Final Stage of Grief
- 1.13.2. Computer vision
- 1.13.3. Bayesian statistics and complex systems complex_systems
- 1.13.4. AI Agents are not Artists
- 1.13.5. The Physics Principle That Inspired Modern AI Art | Quanta Magazine → Stable Diffusion explained
- 1.13.6. On the Bias-Variance Tradeoff: Textbooks Need an Update
- 1.14. Scaling laws in complex systems
- 1.15. PCA & SVD & EFA
- 1.16. Wavelets
- 1.17. Bayesian
- 1.18. LDA
- 1.19. NLP
- 1.19.1. Gensim: Topic modelling for humans
- 1.19.2. A Beginner’s Guide to Word2Vec and Neural Word Embeddings | Pathmind
- 1.19.3. Word embedding - Wikipedia
- 1.19.4. The Illustrated Word2vec – Jay Alammar – Visualizing machine learning one concept at a time.
- 1.19.5. Latent Dirichlet allocation - Wikipedia
- 1.19.6. Latent semantic analysis - Wikipedia
- 1.19.7. Cognition, Convexity, and Category Theory | The n-Category Café
- 1.19.8. Vectorization Techniques in NLP [Guide]
- 1.19.9. An Introduction to Unsupervised Topic Segmentation with Implementation in Python | Naveed Afzal
- 1.20. Vector Search
- 1.21. Transformers
- 1.21.1. Positional encoding and Fourier transform
- 1.21.2. Rotary Embeddings: A Relative Revolution | EleutherAI Blog
- 1.21.3. “Attention”, “Transformers”, in Neural Network “Large Language Models”
- 1.21.4. https://github.com/rasbt/LLMs-from-scratch
- 1.21.5. Mapping the Mind of a Large Language Model \ Anthropic
- 1.22. Regularization
- 1.23. From Scratch
- 1.24. MoE
- 1.25. Noise 1/f fractional gaussian / fractional brownian motion
- 1.26. Notebooks
- 1.27. Artificial Intelligence
- 1.28. Softmax vs Logistic
- 1.29. Dive into Deep Learning — Dive into Deep Learning 1.0.0-beta0 documentation
- 1.30. Metrics & Scoring
- 1.31. Divergences
- 1.32. Metric Space vs Topological Space
- 1.33. Entropies
- 1.34. Alternative languages for data
- 1.35. Non-Euclidean Statistics
- 1.35.1. https://en.wikipedia.org/wiki/Directional_statistics
- 1.35.2. The Fréchet Mean and Statistics in Non-Euclidean Spaces - heiDOK
- 1.35.3. Into the Wild: Machine Learning In Non-Euclidean Spaces · Stanford DAWN
- 1.35.4. https://en.wikipedia.org/wiki/Directional_statistics
- 1.35.5. hyperbolic neural networks
- 1.35.6. Maria Alonso-Pena - Google Scholar - Circular mean
- 1.36. Relational Machine Learning
- 1.37. Machine Learning
- 1.38. DataForScience/Causality
- 1.39. Are Observational Studies of Social Contagion Doomed? - YouTube
- 1.39.1. What We (Should) Agree On
- 1.39.2. Social Influence Exists and Matters
- 1.39.3. Homophily Exists and Matters
- 1.39.4. Selection vs. Influence, Homophily vs. Contagion
- 1.39.5. How Do We Identify Causal Effects from Observations?
- 1.39.6. Good Control, Bad Control
- 1.39.7. Why Influence is Unidentified from Observations
- 1.39.8. Endogenous Selection Bias
- 1.39.9. A Bit More on Lags
- 1.39.10. A Bit More on Propensity Scores
- 1.39.11. A Bit More on Asymmetry
- 1.39.12. A Bit More on Parametric Modeling Assumptions
- 1.39.13. OK, What About Instruments?
- 1.39.14. Friends of Friends Are Not Instruments
- 1.39.15. Summing Up the Negative Part
- 1.39.16. Richer Measurements
- 1.39.17. Graph Clustering
- 1.40. Reading Group: Advanced Data Analysis by Prof. Cosma Shalizi - YouTube
- 1.41. AI Safety
- 1.42. Network analysis
- 1.43. How to Choose a Feature Selection Method For Machine Learning
- 1.44. https://en.wikipedia.org/wiki/Mark_d'Inverno
1. data science
1.1. Awesome data science
1.3. Seven states of randomness - Wikipedia
https://en.wikipedia.org/wiki/Seven_states_of_randomness
- Proper mild randomness: short-run portioning is even for N = 2, e.g. the normal distribution
- Borderline mild randomness: short-run portioning is concentrated for N = 2, but eventually becomes even as N grows, e.g. the exponential distribution with rate λ = 1 (and so with expected value 1/λ = 1)
- Slow randomness with finite delocalized moments: scale factor increases faster than q but no faster than \[ \sqrt[w]{q} \], w < 1
- Slow randomness with finite and localized moments: scale factor increases faster than any power of q, but remains finite, e.g. the lognormal distribution and, importantly, the bounded uniform distribution (which, having a finite scale factor for all q by construction, cannot be pre-wild)
- Pre-wild randomness: scale factor becomes infinite for q > 2, e.g. the Pareto distribution with α = 2.5
- Wild randomness: infinite second moment, but finite moment of some positive order, e.g. the Pareto distribution with \[ \alpha \leq 2 \]
- Extreme randomness: all moments are infinite, e.g. the log-Cauchy distribution
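A quick way to build intuition for the wilder states is to watch running second moments: under heavy tails they never settle. A minimal numpy sketch (the sample size and the α = 1.5 Pareto are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Proper mild randomness: the running second moment of a normal sample settles quickly.
    normal = rng.normal(size=n)
    # Wild randomness: Pareto with alpha <= 2 has an infinite second moment,
    # so the running estimate keeps jumping as rare huge values arrive.
    pareto = 1 + rng.pareto(1.5, size=n)  # classical Pareto, x_m = 1, alpha = 1.5

    for name, x in [("normal", normal), ("pareto alpha=1.5", pareto)]:
        running = np.cumsum(x**2) / np.arange(1, n + 1)
        print(name, "second-moment estimate at n=1e3, 1e4, 1e5:",
              running[999], running[9999], running[99999])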
1.4. Resources
1.6. Forward problems vs inverse problems: It’s easier to validate than to generate
It is easier to label than to infer; put differently, it is easier to validate that something is well done than to do/generate it. Therefore, it is also easier to pick the best alternative from a closed list than to generate one “from scratch”.
Forward problem: applying the formula for gravity (~deduction?)
Inverse problem: proposing a formula for gravity (~induction?)
https://en.wikipedia.org/wiki/Inverse_problem
Corollary for LLMs: it is easier for an LLM to choose the better of two options (passed to it as text) than to generate the best option (and easier than choosing among many options up front).
Supervised learning is a “mathematical augmentation tool”: it lets you deploy more mathematical skill than you actually have, because the model is generating it and you are simply feeding it examples.
[2111.04731] Survey of Deep Learning Methods for Inverse Problems
In principle, every deep learning framework could be interpreted as solving some sort of inverse problem, in the sense that the network is trained to take measurements and to infer, from given ground truth, the desired unknown state
Machine Learning turns an inverse problem into a forward problem
https://en.wikipedia.org/wiki/Manifold_hypothesis
https://en.wikipedia.org/wiki/Whitney_embedding_theorem
1.8. Evolutionary Tree of LLMs
- [2307.09793] On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
- Constellation
- llm evolutionary tree - Google Search
- Mooler0410/LLMsPracticalGuide: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
- https://www.reddit.com/r/MachineLearning/comments/13wkcn3/d_llm_evolutionare_tree_from_the_practical_guides/
- LLM Evolutionary Tree. LLM Proliferation. – blog.biocomm.ai
1.9. data science links
1.9.1. links
- https://github.com/conordewey3/DS-Career-Resources
- Data Science Course Repositories
- http://vrl.cs.brown.edu/color Color palettes with perceptual distance
- Less well-known visualization types
- Examples of every chart type
- Wind visualization and meteorological data sources
- Neural network cheat sheet
- Neural network cheat sheet (original source)
- DynamicMath Interactive Mathematical Applets & Animations
- The danger of AI is weirder than you think
AI explained: why it doesn’t work, because it tries to “cheat” and you haven’t defined well enough what “cheating” is
- Neural networks and deep learning: A visual proof that neural nets can compute any function
- [1908.03682] Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks
- Machine learning’s crumbling foundations (Doing Data Science with bad data) https://link.medium.com/ddGMY1uRVib
1.9.2. Visualización de datos
1.9.3. k-means y Voronoi
1.9.4. DBSCAN
- https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
- https://en.wikipedia.org/wiki/DBSCAN#Parameter_estimation
- OPTICS algorithm - Wikipedia
Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN’s major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram.
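A minimal scikit-learn sketch of the varying-density point (blob positions, scales and eps are arbitrary choices for illustration):

    import numpy as np
    from sklearn.cluster import DBSCAN, OPTICS

    rng = np.random.default_rng(0)
    # Two blobs with very different densities.
    dense = rng.normal(loc=0.0, scale=0.1, size=(200, 2))
    sparse = rng.normal(loc=5.0, scale=1.0, size=(200, 2))
    X = np.vstack([dense, sparse])

    # A single eps cannot fit both densities: this eps works for the dense blob
    # but labels much of the sparse blob as noise (-1).
    db = DBSCAN(eps=0.3, min_samples=5).fit(X)
    # OPTICS orders points by reachability and extracts clusters per density.
    op = OPTICS(min_samples=5).fit(X)

    print("DBSCAN labels:", np.unique(db.labels_, return_counts=True))
    print("OPTICS labels:", np.unique(op.labels_, return_counts=True))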
1.9.4.1. Centrality Algorithms - A bird's eye view - Part 1
https://www.reddit.com/r/programming/comments/rztvve/centrality_algorithms_a_birds_eye_view_part_1/
There are many different centrality algorithms, but most of them fall into one of three categories: degree, betweenness, and closeness.
Degree centrality is simply the number of connections a node has. Betweenness centrality measures how often a node is the shortest path between two other nodes. Closeness centrality measures how close a node is to all other nodes.
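As a quick illustration of the three measures, with networkx on its bundled karate-club toy graph (the choice of graph is arbitrary):

    import networkx as nx

    G = nx.karate_club_graph()  # classic small social network bundled with networkx

    degree = nx.degree_centrality(G)             # fraction of nodes each node touches
    betweenness = nx.betweenness_centrality(G)   # fraction of shortest paths passing through the node
    closeness = nx.closeness_centrality(G)       # inverse of average distance to all other nodes

    top = lambda d: sorted(d, key=d.get, reverse=True)[:3]
    print("top degree:", top(degree))
    print("top betweenness:", top(betweenness))
    print("top closeness:", top(closeness))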
1.9.5. Local outlier factor - Wikipedia (density-based, like DBSCAN)
1.9.6. geo stuff
1.9.7. https://github.com/easystats/ R repos
- report Automated reporting of objects in R
- parameters Computation and processing of models’ parameters
- performance Models’ quality and performance metrics (R2, ICC, LOO, AIC, BF, …)
- modelbased Estimate effects, contrasts and means based on statistical models
- insight Easy access to model information for various model objects
- effectsize Compute and work with indices of effect size and standardized parameters
- easystats The R easystats-project
- datawizard Magic potions to clean and transform your data
- bayestestR Utilities for analyzing Bayesian models and posterior distributions
- see Visualisation toolbox for beautiful and publication-ready figures
- correlation Methods for Correlation Analysis
- circus Contains a variety of fitted models to help the systematic testing of other packages
- blog The collaborative blog
1.9.8. An intuitive, visual guide to copulas — While My MCMC Gently Samples
1.9.9. Data generation
1.9.10. Structural Time Series
Time series in continuous time
https://structural-time-series.fastforwardlabs.com/
https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process → AR(1) process
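The connection is that an Euler discretisation of the OU process is exactly an AR(1) recursion with coefficient 1 - θ·Δt. A minimal numpy sketch with made-up parameters:

    import numpy as np

    theta, mu, sigma, dt, n = 0.7, 0.0, 0.5, 0.1, 10_000
    rng = np.random.default_rng(0)

    x = np.zeros(n)
    for t in range(n - 1):
        # Euler–Maruyama step of dX = theta*(mu - X) dt + sigma dW
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * np.sqrt(dt) * rng.normal()

    # The same recursion reads as an AR(1): x_{t+1} = phi * x_t + noise, with phi = 1 - theta*dt
    phi_hat = np.polyfit(x[:-1], x[1:], 1)[0]
    print("implied AR(1) coefficient:", 1 - theta * dt, "estimated:", phi_hat)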
1.10. Bootstrapping
1.10.1. Bootstrap is better than p-values
https://link.medium.com/ROYrhtRb7lb
Once you start using the Bootstrap, you’ll be amazed at its flexibility. Small sample size, irregular distributions, business rules, expected values, A/B tests with clustered groups: the Bootstrap can do it all!
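For reference, a minimal percentile-bootstrap confidence interval for a mean, with made-up data (numpy only):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=50)  # small, skewed sample

    # Resample with replacement many times and look at the distribution of the statistic.
    boot_means = np.array([
        rng.choice(data, size=len(data), replace=True).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"mean = {data.mean():.2f}, 95% percentile bootstrap CI = ({lo:.2f}, {hi:.2f})")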
1.10.2. https://www.kdnuggets.com/2017/06/7-techniques-handle-imbalanced-data.html
Undersampling in Python
    # Undersample every class down to the size of the smallest one
    g = df.groupby('categorical_col')
    balanced = g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))
1.11. Taguchi Matrix / Taguchi Arrays
1.12. Common Statistical Tests Are Linear Models
Concerning the teaching of “non-parametric” tests in intro-courses, I think that we can justify lying-to-children and teach “non-parametric” tests as if they are merely ranked versions of the corresponding parametric tests. It is much better for students to think “ranks!” than to believe that you can magically throw away assumptions. Indeed, the Bayesian equivalents of “non-parametric” tests implemented in JASP literally just do (latent) ranking and that’s it. For the frequentist “non-parametric” tests considered here, this approach is highly accurate for N > 15.
«With non-parametric tests you can have a monotonic relation between variables instead of a linear one»
- Assumption of Normality
- Generalized linear model - Wikipedia
- Quantile regression - Wikipedia: predict the median instead of the mean
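A quick numerical check of the “ranks!” intuition above: Spearman’s correlation is exactly Pearson’s correlation computed on ranks (scipy assumed; the data are made up):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr, rankdata

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = np.exp(x) + rng.normal(scale=0.1, size=100)  # monotonic but nonlinear relation

    rho_spearman, _ = spearmanr(x, y)
    rho_pearson_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
    print(rho_spearman, rho_pearson_on_ranks)  # identical up to floating point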
1.13. Links
1.13.1. Moving from Statistics to Machine Learning, the Final Stage of Grief
1.13.2. Computer vision
1.13.3. Bayesian statistics and complex systems complex_systems
https://link.medium.com/rQ2Le9gxvcb
This (frequentist) might work for any very simple experiment, but it is fundamentally against Cohen & Stewart’s (1995) ideas, which think of natural systems, hence a higher level of complexity than simple experiments comparable to rolling dice. They believe that systems can change over time, regardless of anything that happened in the past, and can develop new phenomena which have not been present to date. This point of argumentation is again very much aligned with the definition of complexity from a social sciences angle (see emergence).
1.13.4. AI Agents are not Artists
1.13.5. The Physics Principle That Inspired Modern AI Art | Quanta Magazine → Stable Diffusion explained
1.16. Wavelets
- Ingrid Daubechies: Wavelet bases: roots, surprises and applications - YouTube
Yves Meyer, Stéphane Mallat: wavelet synthesis (1980s)
John Klauder, Alexandre Grossmann & Thierry Paul (squeezed states)
→ Yves Meyer: restoring the role of mathematics in signal and image processing
- Fundamental Papers in Wavelet Theory
- The Haar wavelet is like a decision tree
- Least-squares spectral analysis - Wikipedia
- Linear canonical transformation - Wikipedia
- Wavelets: a mathematical microscope - YouTube
- Fractal Fract | Free Full-Text | A Quantum Wavelet Uncertainty Principle
- Books on wavelets
- time frequency - Wavelet Scattering explanation? - Signal Processing Stack Exchange
1.18. LDA
Linear discriminant analysis - Wikipedia
is a generalization of Fisher’s linear discriminant, a method used to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements.
However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable.
Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables.
These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.
LDA explicitly attempts to model the difference between the classes of data.
PCA, in contrast, does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities.
Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made.
LDA works when the measurements made on independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis.
1.18.0.1. Latent Dirichlet allocation - Wikipedia
1.19. NLP
1.19.3. Word embedding - Wikipedia
1.19.4. The Illustrated Word2vec – Jay Alammar – Visualizing machine learning one concept at a time.
1.19.8. Vectorization Techniques in NLP [Guide]
Bag of Words
Count occurrences of a word
Example:
    from sklearn.feature_extraction.text import CountVectorizer

    cv = CountVectorizer(ngram_range=(2, 2))  # bigram counts
    sents = ['coronavirus is a highly infectious disease',
             'coronavirus affects older people the most',
             'older people are at high risk due to this disease']
    X = cv.fit_transform(sents)
    X = X.toarray()
TF-IDF (Term Frequency–Inverse Document Frequency)
\begin{equation*} TF = \frac{\text{Frequency of word in a document}}{\text{Total number of words in that document}} \end{equation*}
\begin{equation*} DF = \frac{\text{Documents containing word W}}{\text{Total number of documents}} \end{equation*}
\begin{equation*} IDF = \log \left( \frac{\text{Total number of documents}}{\text{Documents containing word W}} \right) \end{equation*}
Corrects over-counting of articles, prepositions and conjunctions
Example:
    from sklearn.feature_extraction.text import TfidfVectorizer
    import pandas as pd

    tfidf = TfidfVectorizer()
    transformed = tfidf.fit_transform(sents)  # reuses `sents` from the example above
    # get_feature_names() was removed in recent scikit-learn; use get_feature_names_out()
    df = pd.DataFrame(transformed[0].T.todense(),
                      index=tfidf.get_feature_names_out(),
                      columns=["TF-IDF"])
    df = df.sort_values('TF-IDF', ascending=False)
- Word2vec - Wikipedia
1.20. Vector Search
- currentslab/awesome-vector-search: Collections of vector search related libraries, service and research papers
- [2403.05440] Is Cosine-Similarity of Embeddings Really About Similarity?
Embeddings are optimized for next-token prediction, not for recommendation per se
- Lecture 12: Embedding models
- Embeddings: What they are and why they matter
1.21. Transformers
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) – Jay Alammar – Visualizing machine learning one concept at a time.
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
- The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.
- The Annotated Transformer
- Categories | Ketan Doshi Blog
- Transformers from scratch | peterbloem.nl
multi-head attention: each head captures a “type” of relation
mary,gave,roses,to,susan → who gave the roses? and who received them? needs 2 heads of attention (see the sketch at the end of this list)
- poloclub.github.io/transformer-explainer/
- Transformers from Scratch
- Transformers Explained Visually - Overview of Functionality | Ketan Doshi Blog
- Transformers Explained Visually - How it works, step-by-step | Ketan Doshi Blog
- Transformers Explained Visually - Multi-head Attention, deep dive | Ketan Doshi Blog
- Transformers Explained Visually - Not just how, but Why they work so well | Ketan Doshi Blog
- Transformer Models 101: Getting Started — Part 1 | by Nandini Bansal | Feb, 2023 | Towards Data Science
- https://projector.tensorflow.org/
- openai/transformer-debugger
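To make the multi-head note above concrete, here is a minimal numpy sketch of scaled dot-product attention with two heads; the weights are random, nothing is trained, and all shapes and names are illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, n_heads = 5, 16, 2   # e.g. "mary gave roses to susan"
    d_head = d_model // n_heads

    x = rng.normal(size=(seq_len, d_model))                    # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    heads = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q, k, v = x @ Wq[:, sl], x @ Wk[:, sl], x @ Wv[:, sl]
        # Each head gets its own attention pattern: one could attend "who gave",
        # another "who received".
        attn = softmax(q @ k.T / np.sqrt(d_head))
        heads.append(attn @ v)

    out = np.concatenate(heads, axis=-1)   # (seq_len, d_model)
    print(out.shape)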
1.21.1. Positional encoding and fourier transform
Transformer Architecture: The Positional Encoding - Amirhossein Kazemnejad’s Blog
Fourier transform is ubiquitous, but I have a “theory” that angle encoding in quantum machine learning could’ve been the source of inspiration for positional encoding
- Master Positional Encoding: Part I | by Jonathan Kernes | Towards Data Science
- Master Positional Encoding: Part II | by Jonathan Kernes | Towards Data Science
- Fourier Feature Encoding
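For reference, a minimal numpy sketch of the sinusoidal positional encoding described in the posts above (the standard sin/cos formulation; the dimensions are arbitrary and d_model is assumed even):

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, d_model):
        pos = np.arange(seq_len)[:, None]              # positions 0..seq_len-1
        i = np.arange(d_model // 2)[None, :]           # dimension pairs
        angles = pos / (10000 ** (2 * i / d_model))    # one frequency per pair
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)                   # even dims: sine
        pe[:, 1::2] = np.cos(angles)                   # odd dims: cosine
        return pe

    print(sinusoidal_positional_encoding(50, 16).shape)   # (50, 16)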
1.21.2. Rotary Embeddings: A Relative Revolution | EleutherAI Blog
Used by GPT-J
1.22. Regularization
1.24. MoE
1.25. Noise 1/f fractional gaussian / fractional brownian motion
1.26. Notebooks
1.27. Artificial Intelligence
1.28. Softmax vs Logistic
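The usual relationship, as a small numpy sketch: a two-class softmax over logits (0, z) reduces to the logistic sigmoid of z.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = 1.3
    # Two-class softmax with logits (0, z): the probability of class 1
    # equals the logistic sigmoid of z.
    print(softmax(np.array([0.0, z]))[1], sigmoid(z))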
1.30. Metrics & Scoring
1.30.1. Types of Metrics
1.30.1.1. Distance between spatio-temporal series
- User Guide — tslearn stable documentation
Dynamic Time Warping: Detect underdog stocks to buy during pandemic using Dynamic Time Warping
Longest Common Subsequence
Kernel Methods - GPS Trajectories Clustering in Python
- The windowed scalogram difference: a novel wavelet tool for comparing time series
1.30.1.2. Correlations can also be metrics
The trick is to replace what is usually the mean (sometimes the median) with an estimator from a model
- https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
\[ \rho = \frac{(\vec{x} - \hat{\vec{x}}) \cdot (\vec{y} - \hat{\vec{y}})}{\sqrt{[(\vec{y} - \hat{\vec{y}}) \cdot (\vec{y} - \hat{\vec{y}})] [(\vec{x} - \hat{\vec{x}}) \cdot (\vec{x} - \hat{\vec{x}})]}} \]
The Pearson distance is \[ d = 1 - \rho \] or better \[ d = \frac{1-\rho}{2} \] with \[ 0 \le d \le 1 \]
- https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.
- https://en.wikipedia.org/wiki/Coefficient_of_determination
- https://en.wikipedia.org/wiki/Pseudo-R-squared
The coefficient of determination can be thought of as comparing the variances of any two models: it is 1 minus the ratio of their squared residual vectors (residual sums of squares).
Usually the base model is the mean or the median: \[ \hat{y}_0 = \langle y \rangle \]
Similarity between models:
There has to be some kind of bound \[ {\sum (y - \hat{y}_1)^2} \le {\sum (y - \hat{y}_0)^2} \]. Then we can interpret:
- \[ R^2 = 1 - \frac{\sum (y - \hat{y}_1)^2}{\sum (y - \hat{y}_0)^2} = 1 - \frac{(\vec{y} - \hat{\vec{y}}_1) \cdot (\vec{y} - \hat{\vec{y}}_1)}{(\vec{y} - \hat{\vec{y}}_0) \cdot (\vec{y} - \hat{\vec{y}}_0)} \]
as similarity between model \[\hat{y}_1\] and \[\hat{y}_0\]. If \[R^2\] is close to 0, then the variance of model \[\hat{y}_1\] is close to the variance of model \[\hat{y}_0\]. If \[R^2\] is close to 1, then the model \[\hat{y}_1\] has lower variance than \[\hat{y}_0\]
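A small numeric sketch of this reading of R², where the base model is the mean and model 1 is a least-squares line on made-up data (numpy only):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=200)
    y = 2.0 * x + rng.normal(scale=2.0, size=200)

    y_hat0 = np.full_like(y, y.mean())        # base model: the mean
    slope, intercept = np.polyfit(x, y, 1)
    y_hat1 = slope * x + intercept            # model 1: linear fit

    r2 = 1 - np.sum((y - y_hat1) ** 2) / np.sum((y - y_hat0) ** 2)

    # Pearson distance between y and the model's predictions
    rho = np.corrcoef(y, y_hat1)[0, 1]
    d = (1 - rho) / 2
    print("R^2:", r2, "Pearson distance:", d)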
1.30.1.3. Integral of a product as dot product (inner product)
\[\vec{f} \cdot \vec{g} = \int_a^b{f(t) \cdot g(t)\:dt} \approx \sum_i^N f_i \cdot g_i \Delta t_i\]
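Numerically this is just a Riemann-sum sketch, e.g. (numpy; the functions and interval are arbitrary):

    import numpy as np

    t = np.linspace(0, 2 * np.pi, 1000)
    dt = t[1] - t[0]
    f = np.sin(t)
    g = np.cos(3 * t)

    inner_product = np.sum(f * g) * dt    # discretised <f, g>
    print(inner_product)                  # close to the analytic value, 0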
1.30.2. Directional Statistics
1.30.3. Diversity index
1.30.4. Similarity measure
1.30.5. Fréchet Distances
1.30.6. Fréchet mean - Wikipedia
1.30.8. Correlations
- How to combine R-squared: https://en.wikipedia.org/wiki/Fisher_transformation
- The Search for Categorical Correlation | by Shaked Zychlinski | Towards Data Science
    import numpy as np
    import pandas as pd
    import scipy.stats as ss

    def cramers_v(x, y):
        # Bias-corrected Cramér's V between two categorical series (code from the article above)
        confusion_matrix = pd.crosstab(x, y)
        chi2 = ss.chi2_contingency(confusion_matrix)[0]
        n = confusion_matrix.sum().sum()
        phi2 = chi2 / n
        r, k = confusion_matrix.shape
        phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
        rcorr = r - ((r - 1) ** 2) / (n - 1)
        kcorr = k - ((k - 1) ** 2) / (n - 1)
        return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))
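For the first bullet above: correlation coefficients are usually combined by averaging in Fisher z-space rather than directly (the transformation is defined for r, so R² values would need converting to r first). A minimal numpy sketch with invented values:

    import numpy as np

    r = np.array([0.30, 0.45, 0.60])   # correlations from different samples / folds
    z = np.arctanh(r)                  # Fisher transformation
    r_combined = np.tanh(z.mean())     # back-transform the average
    print(r_combined)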
1.31. Divergences
1.32. Metric Space vs Topological Space
- To convert a metric space into a topological space typically you need a decision boundary, which is sometimes arbitrary (p > 0.5 or so)
Is there a way to discover the topological numbers that describe your problem and change discontinuously, like a phase shift, in a more natural way other than ROC curves?
- Position: Topological Deep Learning is the New Frontier for Relational Learning
- Topological deep learning - Wikipedia
- lrnzgiusti/awesome-topological-deep-learning: A curated list of topological deep learning (TDL) resources and links.
1.33. Entropies
1.33.1. Iciar Martínez - Shannon entropy for fish schools
Ikerbasque, UPV/EHU (marine biologist)
Uses Shannon entropy to measure the activity of a group of fish
https://www.ikerbasque.net/es/iciar-marti-nez
https://www.mdpi.com/1099-4300/20/2/90 → the article in question
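The underlying computation, as a small sketch (scipy; the counts are invented, standing in for fish detected per spatial cell):

    import numpy as np
    from scipy.stats import entropy

    # e.g. counts of fish observed in each cell of a spatial grid (made-up numbers)
    counts = np.array([12, 3, 0, 7, 25, 1, 2, 0, 9])
    p = counts / counts.sum()

    H = entropy(p, base=2)   # Shannon entropy in bits; higher = activity spread more evenly
    print(H)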
1.33.2. Entropy coding - Wikipedia
1.33.2.1. Huffman coding - Wikipedia
1.33.2.2. Arithmetic coding - Wikipedia
1.33.2.3. Range coding - Wikipedia
1.34. Alternative languages for data
1.34.2. Rust
1.34.2.1. Are We Learning Yet? - Rust
1.34.2.2. evcxr/evcxr_jupyter at main · google/evcxr → Rust for Data Science?
1.34.3. Relay (TVM) as a backend
Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
Language Reference — tvm documentation
1.35. Non-Euclidean Statistics
1.35.5. hyperbolic neural networks
1.35.6. Maria Alonso-Pena - Google Scholar - Circular mean
1.36. Relational Machine Learning
1.37. Machine Learning
1.38. DataForScience/Causality
1.39. Are Observational Studies of Social Contagion Doomed? - YouTube
1.39.1. What We (Should) Agree On
- Social influence exists, is causal, and matters
- Observations of social networks don’t, generally, identify influence
- Getting identification will need either special assumptions or richer data
—
- There may be some ways forward
- Richer measurements
- Network clustering (maybe)
- Elaborated mechanisms
- Partial identification
1.39.2. Social Influence Exists and Matters
Example: Language: We are all speaking the same language because of social influence. This also happens at a small scale through local dialects
Also skills, ideologies, religions, stories, laws, …
Not just copying
Consequences of influence depend very strongly on the network structure
1.39.2.1. Experiment
Binary choice network (black/red color), random initialization. In each step, a node picks another node at random and takes on its color
Spontaneous regions form without any deep reason
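A minimal simulation in the spirit of that experiment, as a voter-model-style update where a node copies a random neighbour (networkx; the graph choice and step count are arbitrary):

    import random
    import networkx as nx

    random.seed(0)
    G = nx.watts_strogatz_graph(n=200, k=4, p=0.05)   # a small-world-ish network
    color = {node: random.choice(["black", "red"]) for node in G}

    for _ in range(20_000):
        node = random.choice(list(G.nodes))
        neighbour = random.choice(list(G[node]))
        color[node] = color[neighbour]                # copy a random neighbour's color

    fraction_black = sum(c == "black" for c in color.values()) / len(color)
    print("fraction black after updates:", fraction_black)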
1.39.3. Homophily Exists and Matters
1.39.3.1. https://en.wikipedia.org/wiki/Network_homophily
Network homophily refers to the theory in network science which states that, based on node attributes, similar nodes may be more likely to attach to each other than dissimilar ones. The hypothesis is linked to the model of preferential attachment and draws from the phenomenon of homophily in the social sciences; much of the scientific analysis of the creation of social ties based on similarity comes from network science.
1.39.3.2. https://en.wikipedia.org/wiki/Homophily
Homophily is a concept in sociology describing the tendency of individuals to associate and bond with similar others
1.39.4. Selection vs. Influence, Homophily vs. Contagion
- Selection
- correlation between disconnected nodes: because you are selecting nodes with common properties, even nodes that never interact end up correlated
- Influence
- correlation between connected nodes
1.39.5. How Do We Identify Causal Effects from Observations?
1.39.5.1. Confounding - Wikipedia
1.39.5.2. Controls
- Controlling for a variable - Wikipedia
- Control for variables that block all indirect pathways linking cause to effect
- Don’t open indirect paths
- Don’t block direct paths (“back-door criterion”)
1.39.5.3. Instruments
Find independent variation in the cause and trace it through to the effect
1.39.5.4. Mechanism
Find all the mediating variables linking cause to effect through direct channels (“front-door criterion”)
1.39.6. Good Control, Bad Control
1.39.8. Endogenous Selection Bias
1.39.9. A Bit More on Lags
1.39.10. A Bit More on Propensity Scores
1.39.11. A Bit More on Asymmetry
1.39.13. OK, What About Instruments?
1.39.15. Summing Up the Negative Part
1.39.16. Richer Measurements
1.39.17. Graph Clustering
1.41. AI Safety
1.43. How to Choose a Feature Selection Method For Machine Learning
1.44. https://en.wikipedia.org/wiki/Mark_d'Inverno
Interesting computer scientist, agent-based modelling