ML Interview
#interview
#machine-learning
#deep-learning
#nlp
#vision
#math
Integration by Parts or Integration of Product
Differentiation of Product
Quotient Rule or Differentiation of Division
Chain Rule
Permutation
Combination
Line Equation
Convex vs Nonconvex Function
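The differentiation rules and counting formulas above are easy to sanity-check in code. A minimal sketch, assuming SymPy and Python's built-in math.perm/math.comb as my choice of tools (not something these notes prescribe):

```python
# Minimal sketch: verifying the product, quotient, and chain rules with SymPy.
import sympy as sp
from math import comb, perm  # counting: combinations and permutations

x = sp.symbols('x')
f, g = sp.sin(x), sp.exp(x)

# Product rule: (f*g)' = f'*g + f*g'
assert sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))) == 0

# Quotient rule: (f/g)' = (f'*g - f*g') / g**2
assert sp.simplify(sp.diff(f / g, x) - (sp.diff(f, x) * g - f * sp.diff(g, x)) / g**2) == 0

# Chain rule: d/dx sin(exp(x)) = cos(exp(x)) * exp(x)
assert sp.simplify(sp.diff(sp.sin(sp.exp(x)), x) - sp.cos(sp.exp(x)) * sp.exp(x)) == 0

# Permutation vs. combination: order matters for P(n, k), not for C(n, k)
print(perm(5, 2), comb(5, 2))  # 20 10
```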
#statistics
Histogram
Distribution ⭐️
Uniform Distribution
Normal Distribution
Multivariate Normal Distribution
Multinomial Distribution
Gaussian Distribution
Exponential Distribution
Binomial Distribution
Poisson Distribution
Population
Mean
Mode
Median
Variance
Standard deviation
Co-Variance
Finding Correlation between Two Datasets or Distributions
Pearson Correlation
R-squared Value
Mutual Information
Cosine Similarity ⭐️
Co-Variance
Jaccard index
Chi Squared Test
Distance Metric
Manhattan Distance
Euclidian Distance
Cosine Similarity
Mahalanobis Distance
Hamming Distance
Chebyshev Distance
Hypothesis Testing
Null Hypothesis
Statistical Test
p-value
Odds
Log (Odds)
Central Limit Theorem
Quintile or Percentile
Log Scale
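A minimal sketch of a few of the statistics items above (Pearson correlation, a hypothesis test with its p-value, and percentiles), using NumPy and SciPy as my illustrative library choice:

```python
# Minimal sketch: Pearson correlation, a two-sample t-test, and percentiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)
y = 2.0 * x + rng.normal(scale=0.5, size=1000)   # correlated with x

r, p_corr = stats.pearsonr(x, y)                 # Pearson correlation + p-value
print(f"Pearson r = {r:.3f}, p = {p_corr:.1e}")

# Hypothesis test: do two samples share a mean? The null hypothesis says yes.
a = rng.normal(loc=0.0, size=200)
b = rng.normal(loc=0.3, size=200)
t_stat, p_value = stats.ttest_ind(a, b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")    # small p -> reject the null

# Quantiles / percentiles: IQR boundaries and the median
print(np.percentile(x, [25, 50, 75]))
```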
#probability
Random Variable
Discrete Random Variable
Continuous Random Variable
Probability Distribution
Probability Mass Function
Probability Density Function
Conditional Probability
Marginal Probability
Bayes Theorem
Prior Probability
Posterior Probability
Likelihood
Negative Log Likelihood
Expected Value
Probability vs. Likelihood
Maximum Likelihood
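To make Bayes' theorem and maximum likelihood concrete, here is a minimal sketch with made-up numbers (the diagnostic-test figures are purely illustrative):

```python
# Minimal sketch: Bayes' theorem on a toy diagnostic test, and the maximum-
# likelihood estimate for a Bernoulli coin. All numbers are illustrative.
prior = 0.01            # P(disease)
sensitivity = 0.95      # P(positive | disease)
false_pos = 0.05        # P(positive | no disease)

evidence = sensitivity * prior + false_pos * (1 - prior)   # marginal P(positive)
posterior = sensitivity * prior / evidence                 # P(disease | positive)
print(f"posterior = {posterior:.3f}")                      # ~0.161

# Maximum likelihood for a coin: the (log-)likelihood is maximized at heads/flips.
import math
flips, heads = 10, 7
p_mle = heads / flips
nll = -(heads * math.log(p_mle) + (flips - heads) * math.log(1 - p_mle))
print(f"p_MLE = {p_mle}, negative log-likelihood = {nll:.3f}")
```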
#visualization
Plots Compared
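A minimal sketch contrasting two of the plots compared in these notes, a histogram and a box plot, on the same synthetic data; matplotlib is my choice of library here:

```python
# Minimal sketch: histogram vs. box plot on the same synthetic data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=30)        # shows the shape of the distribution
ax1.set_title("Histogram")
ax2.boxplot(data)              # shows median, IQR, and outliers at a glance
ax2.set_title("Box Plot")
plt.tight_layout()
plt.show()
```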
#machine-learning
Supervised Learning
Linear Regression ⭐️
Polynomial Regression
Bayesian Regression
Logistic Regression ⭐️
Multinomial Logistic Regression
Perceptron ⭐️
Multi Layer Perceptron ⭐️
GLM
LDA
UMAP
t-SNE
Support Vector Machine (SVM) ⭐️
SVR ⭐️
SVC
Kernel in SVM
Polynomial Kernel
Radial Basis Kernel
Sigmoid Kernel
K-nearest Neighbor (KNN)
Decision Tree ⭐️
GBM
Adaboost
XGBoost
LightGBM
CatBoost
Pruning in Decision Tree
Ensemble Learning
Bagging
Random Forest ⭐️
Boosting
Gradient Boosting ⭐️
How to combine in Ensemble Learning
Naive Bayes
Gaussian ⭐️
Multinomial ⭐️
Bernoulli
Complement
Categorical
Markov Chain
Unsupervised Learning
Clustering
K-means Clustering ⭐️
Hierarchical Clustering
DBScan Clustering
HDBScan Clustering
K-means vs. Hierarchical
Spectral Clustering
Gaussian Mixture Model
Dimensionality Reduction
Principal Component Analysis (PCA) ⭐️
UMAP
HeatMap
t-SNE plots
Autoencoder
Association
Apriori
Expectation Maximization
Semi-supervised Learning
Recommendation
Content Filtering ⭐️
Collaborative Filtering ⭐️
Metric Learning
Learning to Rank
Pointwise Learning to Rank
Pairwise Learning to Rank
Listwise Learning to Rank
Probabilistic Graphical Model
Conditional Random Field
Bayesian Network
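A minimal sketch of the supervised vs. unsupervised split above, using scikit-learn as my illustrative library: logistic regression and a random forest trained with a train/test split, then k-means clustering on the same features without labels:

```python
# Minimal sketch: supervised classifiers vs. unsupervised clustering.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Supervised: fit on labeled training data, evaluate on held-out data.
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))

# Unsupervised: k-means ignores the labels entirely and just groups the points.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", (clusters == 0).sum(), (clusters == 1).sum())
```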
#deep-learning
CNN
RNN ⭐️
LSTM ⭐️
Bidirectional RNN or LSTM ⭐️
GRU ⭐️
Autoencoder
Standard ⭐️
Variational Autoencoder ⭐️
PCA vs. Autoencoder
Overcomplete Autoencoder
Undercomplete Autoencoder
Uses: ⭐️
Autoencoder for Anomaly Detection
Autoencoder for Denoising Images
Representation Learning
Attention
Reference
Self Attention ⭐️
Masked Self Attention ⭐️
Multihead Self Attention ⭐️
Encoder-Decoder Attention
Factorized Self Attention
Flash Attention
Cross Attention
Transformer
Encoder-decoder ⭐️
Encoder Only ⭐️
Decoder Only ⭐️
Contrastive Learning ⭐️
Graph Convolutional Network (GCN) ⭐️
Relational GCN
Graph Attention Network
Word Embeddings
TF-IDF ⭐️
Word2Vec ⭐️
Ref
Continuous Bag of Words (CBOW)
Skip Gram Model
FastText ⭐️
GloVe ⭐️
ELMo
BERT
Embeddings
Activation Function
Sigmoid Function ⭐️
Tanh ⭐️
Softplus
Softsign
Softmax ⭐️
ReLU ⭐️
Leaky ReLU
PReLU
ELU
SELU
Swish
GeLU
Optimizers
Gradient Descent ⭐️
Stochastic Gradient Descent (SGD) ⭐️
Mini Batch SGD ⭐️
Stochastic Gradient Descent with Momentum
Nesterov Accelerated Gradient (NAG)
Adaptive Methods
AdaGrad
AdaDelta
RMSProp
Adam
Adamax
AMSGrad
NADAM
Generative Adversarial Network
Genetic Algorithms
Reinforcement Learning
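A minimal sketch tying several of the items above together: a multi-layer perceptron with ReLU activations trained with cross entropy, the Adam optimizer, backpropagation, and gradient clipping. PyTorch is my library choice here; shapes and hyperparameters are illustrative:

```python
# Minimal sketch: an MLP trained for a few steps on a fake mini-batch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),            # 3-class logits; softmax lives inside the loss
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 20)          # a fake mini-batch of features
y = torch.randint(0, 3, (32,))   # fake class labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()              # backpropagation
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```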
#loss-in-ml
Entropy ⭐️
Cross Entropy
Multi Class Cross Entropy ⭐️
Multi Label Cross Entropy ⭐️
KL Divergence ⭐️
Contrastive Loss ⭐️
Triplet Loss ⭐️
InfoNCE Loss ⭐️
Mean Squared Error (MSE) ⭐️
Mean Absolute Error (MAE)
Mean Squared Logarithmic Error (MSLE)
Mean Absolute Percentage Error (MAPE)
Huber Loss
Log-cosh Loss
Poisson Loss
Hinge Loss
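A minimal sketch computing a few of these losses by hand so the formulas are concrete; NumPy is my choice of library, and the distributions and predictions are toy values:

```python
# Minimal sketch: entropy, cross entropy, KL divergence, MSE, and MAE by hand.
import numpy as np

p = np.array([0.7, 0.2, 0.1])          # "true" distribution
q = np.array([0.5, 0.3, 0.2])          # model distribution

entropy = -np.sum(p * np.log(p))                    # H(p)
cross_entropy = -np.sum(p * np.log(q))              # H(p, q)
kl = np.sum(p * np.log(p / q))                      # KL(p || q) = H(p, q) - H(p)
print(f"H(p)={entropy:.3f}  H(p,q)={cross_entropy:.3f}  KL={kl:.3f}")
assert np.isclose(cross_entropy, entropy + kl)

# Regression losses on a toy prediction
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
print(f"MSE={mse:.3f}  MAE={mae:.3f}")
```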
#evaluation
Extrinsic Evaluation
Intrinsic Evaluation
Perplexity ⭐️
Precision
Recall
Accuracy
F1 Score ⭐️
Sensitivity ⭐️
Specificity ⭐️
True Positive Rate
False Positive Rate
Confusion Matrix ⭐️
Bias & Variance ⭐️
AUC Score
ROC Curve
BLEU Score ⭐️
ROUGE-N Score ⭐️
ROUGE-L Score ⭐️
Meteor Score
BERTScore
Mean Squared Error (MSE)
Mean Absolute Error (MAE)
Root Mean Squared Error (RMSE)
Mean Absolute Percentage Error (MAPE)
R-squared Value
Root Mean Squared Logarithmic Error (RMSLE)
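A minimal sketch of the classification metrics above on toy labels, using scikit-learn as my illustrative library; the confusion-matrix comment assumes binary labels {0, 1}:

```python
# Minimal sketch: precision, recall, F1, accuracy, AUC, and the confusion matrix.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]                         # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN), i.e. sensitivity
print("f1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))    # area under the ROC curve
print(confusion_matrix(y_true, y_pred))                # [[TN, FP], [FN, TP]]
```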
Regularization
L1 or Lasso Regression ⭐️
L2 or Ridge Regression ⭐️
Elastic Net Regression
Dropout ⭐️
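A minimal sketch contrasting L1 (Lasso) and L2 (Ridge) regularization: on the same synthetic data, L1 tends to zero out uninformative coefficients while L2 only shrinks them. scikit-learn is my library choice and the alpha values are illustrative:

```python
# Minimal sketch: L1 vs. L2 regularization on synthetic regression data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso coefs:", np.round(lasso.coef_, 2))   # many exact zeros (sparsity)
print("Ridge coefs:", np.round(ridge.coef_, 2))   # small but mostly nonzero
```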
Misc.
Machine Learning vs. Deep Learning
Cross Validation
Multi Class Classification
One vs Rest or One vs All Multi Class Classification
One vs One Multi Class Classification
Internal Covariate Shift
Discriminative vs. Generative Models
Kernel Regression
One Class Classification
One Class Gaussian
One Class K-means
One Class KNN
One Class SVM
Gumbel Softmax ⭐️
Normalization
Normalization
Batch Normalization
Layer Normalization
Generation
Greedy Decoding ⭐️
Beam Search ⭐️
Random Sampling ⭐️
Minimum Bayes Risk
Handling Missing Data ⭐️
Overfitting ⭐️
Handling Imbalanced Dataset ⭐️
SMOTE
ADASYN
Handling Outliers ⭐️
Tokenizer
Byte Pair Encoding (BPE) ⭐️
WordPiece Tokenization
SentencePiece Tokenization
Parametric vs. Non-Parametric ⭐️
Model Based vs. Instance Based Learning ⭐️
Shallow vs. Deep Learning ⭐️
Parameter vs. Hyperparameter ⭐️
Exploding Gradient ⭐️
Vanishing Gradient ⭐️
Hyperparameters
Loss vs. Cost
Gradient Clipping
Gradient Accumulation
Stemming
Lemmatization
Causality vs. Correlation
Negative Sampling
Data Augmentation
Data Imputation
Hinge Loss
Feature Selection
FrameNet
WordNet
VerbNet
AMR Graph
Transfer Learning
Teacher Forcing ⭐️
Student Forcing ⭐️
Curriculum Learning ⭐️
Weight Initialization
Xavier
Normal
Learning Rate Scheduler ⭐️
Fine Tuning Speedup
LoRA ⭐️
Adapter ⭐️
Hyperparameter Finding
Grid Search Hyperparameter Finding
Random Search
Bayesian Optimization Hyperparameter Finding
Genetic Algorithm Hyperparameter Finding
Gradient based techniques
Different types of Learning
Zero Shot Learning
One Shot Learning
Few Shot Learning
Transfer Learning
Active Learning
Idea about SOTA Research
LLaMA
ChatGPT
BERT
Ref
BART
GPT
GPT-2
RoBERTa
ALBERT
XLNet
ELECTRA
DistilBERT
[ ]
ELBO
End to End Machine Learning Pipeline
Convex vs. Non-Convex
Convex vs. Non-Convex Optimization
One Hot Vector
Label Encoding
One Hot Encoding vs. Label Encoding
Inductive Bias
Selection Bias
Type 1 Error vs. Type 2 Error
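A minimal sketch of two of the hyperparameter-finding strategies listed under Misc. (grid search and random search), using scikit-learn's GridSearchCV and RandomizedSearchCV as my illustrative choice:

```python
# Minimal sketch: grid search vs. random search over SVM hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}

grid = GridSearchCV(SVC(), param_grid, cv=5)          # exhaustive over the grid
grid.fit(X, y)
print("grid search best:", grid.best_params_, round(grid.best_score_, 3))

rand = RandomizedSearchCV(SVC(), param_grid, n_iter=5, cv=5, random_state=0)
rand.fit(X, y)                                        # samples 5 random combinations
print("random search best:", rand.best_params_, round(rand.best_score_, 3))
```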
Related Notes
Word2Vec Embedding
Decoding Strategies
Interview
WordPiece Tokenization
How To 100M Learning Text Video