Garden of 📝
Euclidean Norm
#math
#interview
The Euclidean norm, or $\ell_2$ norm, is the square root of the sum of the squares of all the vector's elements:

$$\|A\|_2 = \sqrt{\sum_i A_i^2}$$
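The formula translates almost directly into code. Below is a minimal sketch in plain Python. It assumes the vector is given as a list of numbers, and `euclidean_norm` is a hypothetical helper name used only for illustration. It squares each element, sums the squares, and takes the square root.

```python
import math

def euclidean_norm(vector):
    """Return the L2 norm: the square root of the sum of squared elements."""
    # Square every element, add them up, then take the square root.
    return math.sqrt(sum(x * x for x in vector))

print(euclidean_norm([3, 4]))  # 5.0, since sqrt(3^2 + 4^2) = sqrt(25)
```

In practice, `numpy.linalg.norm` computes the same value for a 1-D array by default. For very large or very small values, `math.hypot` (which accepts any number of coordinates in Python 3.8+) is safer because it avoids overflow and underflow in the intermediate squares.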
Related Notes
Quotient Rule or Differentiation of Division
Differentiation of Product
Gradient
Second Order Derivative or Hessian Matrix