| 3 Key Questions in Data Visualization |
| Accuracy |
| Activation Function |
| Active Learning |
| AdaBoost |
| AdaBoost vs. Gradient Boosting vs. XGBoost |
| AdaDelta |
| AdaGrad |
| Adam |
| Additive Attention |
| Adjusted R-squared Value |
| Alternative Hypothesis |
| Amazon Leadership Principles |
| Area Under Precision Recall Curve (AUPRC) |
| AUC Score |
| Autoregressive Model |
| Autoencoder for Denoising Images |
| Averaging in Ensemble Learning |
| Backward Feature Elimination |
| Bag of Words |
| Bagging |
| Balanced Accuracy |
| Batch Normalization |
| Bayes Theorem |
| Bayesian Optimization Hyperparameter Finding |
| Beam Search |
| Behavioral Interview |
| BERT |
| BERT Embeddings |
| Bias & Variance |
| Bidirectional RNN or LSTM |
| Binary Cross Entropy |
| Binning or Bucketing |
| Binomial Distribution |
| BLEU Score |
| Boosting |
| Byte Level BPE |
| Byte Pair Encoding (BPE) |
| Causal Language Modeling |
| Causality vs. Correlation |
| Central Limit Theorem |
| Chain Rule |
| Challenges of NLP |
| Character Tokenizer |
| CNN |
| Co-occurrence-based Word Embeddings |
| Covariance |
| Collinearity |
| Conditional Probability |
| Confusion Matrix |
| Contextualized Word Embeddings |
| Continuous Bag of Words |
| Continuous Random Variable |
| Contrastive Learning |
| Contrastive Loss |
| Convex vs. Nonconvex Function |
| Cosine Similarity |
| Count-based Word Embeddings |
| Cross Entropy |
| Cross Validation |
| Cross-Attention |
| Crossed Feature |
| Curse of Dimensionality |
| Data Augmentation |
| Data Imputation |
| Data Visualization |
| DBSCAN Clustering |
| Debugging Deep Learning |
| Decision Tree |
| Decision Tree (Classification) |
| Decision Tree (Regression) |
| Decoder Only Transformer |
| Decoding Strategies |
| Dense vs. Sparse Data |
| Dependent Variable |
| Derivative |
| Differentiation |
| Differentiation of Product |
| Digit DP |
| Dimensionality Reduction |
| Discrete Random Variable |
| Discriminative vs. Generative Models |
| DistilBERT |
| Dropout |
| DS & Algo Interview |
| Dying ReLU |
| Dynamic Programming (DP) in Python |
| Elastic Net Regression |
| ELMo Embeddings |
| Encoder Only Transformer |
| Encoder-Decoder Transformer |
| Ensemble Learning |
| Entropy |
| Entropy and Information Gain |
| Essential Visualizations |
| Estimated Mean |
| Estimated Standard Deviation |
| Estimated Variance |
| Euclidean Norm |
| Expected Value |
| Expected Value for Continuous Events |
| Expected Value for Discrete Events |
| Exploding Gradient |
| Exponential Distribution |
| Extrinsic Evaluation |
| F-Beta Score |
| F-Beta@K |
| F1 Score |
| False Negative Error |
| False Positive Rate |
| FastText Embedding |
| Feature Engineering |
| Feature Extraction |
| Feature Hashing |
| Feature Preprocessing |
| Feature Selection |
| Finding Correlation Between Two Datasets or Distributions |
| Forward Feature Selection |
| Foundation Model |
| Gaussian Distribution |
| GBM |
| Genetic Algorithm Hyperparameter Finding |
| Gini Impurity |
| Global Attention |
| GloVe Embedding |
| GPU Computation for LLM |
| Gradient |
| Gradient Boost (Classification) |
| Gradient Boost (Regression) |
| Gradient Boosting |
| Gradient Clipping |
| Gradient Descent |
| Graph Convolutional Network (GCN) |
| Greedy Decoding |
| Grid Search Hyperparameter Finding |
| Group Normalization |
| GRU |
| Gumbel Softmax |
| Handling Imbalanced Dataset |
| Handling Missing Data |
| Handling Outliers |
| Heapq (nlargest or nsmallest) |
| Hierarchical Clustering |
| Hinge Loss |
| Histogram |
| Homonymy or Polysemy |
| How to Choose Kernel in SVM |
| How to Combine Models in Ensemble Learning |
| How to Prepare for a Behavioral Interview |
| Huber Loss |
| Hyperparameters |
| Hypothesis Testing |
| Independent Variable |
| InfoNCE Loss |
| Instruction Fine Tuning |
| Internal Covariate Shift |
| Interview |
| Interview Scheduling |
| Intrinsic Evaluation |
| Jaccard Distance |
| Jaccard Similarity |
| K Fold Cross Validation |
| K-means Clustering |
| K-means vs. Hierarchical |
| K-nearest Neighbor (KNN) |
| Kernel in SVM |
| Kernel Regression |
| Kernel Trick |
| KL Divergence |
| L1 or Lasso Regression |
| L1 vs. L2 Regression |
| L2 or Ridge Regression |
| Label Encoding |
| Layer Normalization |
| Leaky ReLU |
| Learning Rate Scheduler |
| LightGBM |
| Likelihood |
| Line Equation |
| Linear Regression |
| Local Attention |
| Log (Odds Ratio) |
| Log (Odds) |
| Log Scale |
| Log-cosh Loss |
| Logistic Regression |
| Logistic Regression vs. Neural Network |
| Loss vs. Cost |
| LSTM |
| Machine Learning Algorithm Selection |
| Machine Learning vs. Deep Learning |
| Majority vote in Ensemble Learning |
| Margin in SVM |
| Marginal Probability |
| Masked Self-Attention |
| Matplotlib Legend |
| Maximal Margin Classifier |
| Maximum Likelihood |
| Mean |
| Mean Absolute Error (MAE) |
| Mean Absolute Percentage Error (MAPE) |
| Mean Reciprocal Rank (MRR) |
| Mean Squared Error (MSE) |
| Mean Squared Logarithmic Error (MSLE) |
| Median |
| Merge K Sorted Lists |
| Merge Overlapping Intervals |
| Meteor Score |
| Min Max Normalization |
| Mini Batch SGD |
| ML Case Study or ML Design |
| ML Interview |
| ML System Design |
| Mode |
| Model-Based vs. Instance-Based Learning |
| Multi Class Cross Entropy |
| Multi Label Cross Entropy |
| Multi Layer Perceptron |
| Multi-Head Attention |
| Multicollinearity |
| Multivariable Linear Regression |
| Multivariate Linear Regression |
| Multivariate Normal Distribution |
| Mutual Information |
| N-gram Method |
| Naive Bayes |
| Named Entity Recognition (NER) |
| Negative Log Likelihood |
| Negative Sampling |
| Nesterov Accelerated Gradient (NAG) |
| Neural Network |
| Neural Network Normalization |
| Normal Distribution |
| Null Hypothesis |
| Odds |
| Odds Ratio |
| One Class Classification |
| One Class Gaussian |
| One Hot Vector |
| One vs One Multi Class Classification |
| One vs Rest or One vs All Multi Class Classification |
| Optimizers |
| Optimizing Transformer |
| Overcomplete Autoencoder |
| Overfitting |
| Oversampling |
| p-value |
| Padding in CNN |
| Parameter vs. Hyperparameter |
| PCA vs. Autoencoder |
| Pearson Correlation |
| Perceptron |
| Perplexity |
| Plots Compared |
| Polynomial Regression |
| Pooling |
| Population |
| Positional Encoding in Transformer |
| Posterior Probability |
| Pre-Training LLM |
| Precision |
| Precision Recall Curve (PRC) |
| Precision@K |
| Principal Component Analysis (PCA) |
| Prior Probability |
| Probability Density Function |
| Probability Distribution |
| Probability Mass Function |
| Probability vs. Likelihood |
| Problem Solving Algorithm Selection |
| Pruning in Decision Tree |
| PyTorch Refresher |
| Questions to Ask in an Interview |
| Quantile or Percentile |
| Quotient Rule or Differentiation of Division |
| R-squared Value |
| Random Forest |
| Random Variable |
| Recall |
| Recall@K |
| Recommender System (RecSys) |
| Regularization |
| Reinforcement Learning |
| Reinforcement Learning from Human Feedback (RLHF) |
| Relational GCN |
| ReLU |
| Retrieval Metrics |
| RMSProp |
| RNN |
| ROC Curve |
| Root Mean Squared Error (RMSE) |
| Root Mean Squared Logarithmic Error (RMSLE) |
| ROUGE-L Score |
| ROUGE-LSUM Score |
| ROUGE-N Score |
| Saddle Points |
| Self-Attention |
| Self-Supervised Learning |
| Semi-supervised Learning |
| Sensitivity |
| SentencePiece Tokenization |
| Sigmoid Function |
| Simple Linear Regression |
| Singular Value Decomposition (SVD) |
| Skip-Gram Model |
| Soft Margin in SVM |
| Softmax |
| Softplus |
| Softsign |
| Some Common Behavioral Questions |
| Specificity |
| Splitting a Node in a Decision Tree |
| Stacking or Meta Model in Ensemble Learning |
| Standard deviation |
| Standardization |
| Standardization or Normalization |
| Statistical Significance |
| Stochastic Gradient Descent (SGD) |
| Stochastic Gradient Descent with Momentum |
| Stop Words |
| Stride in CNN |
| Stump |
| Sub-sampling in Word2Vec |
| Sub-word Tokenizer |
| Supervised Learning |
| Support Vector |
| Support Vector Machine (SVM) |
| Surprise |
| SVC |
| Shallow vs. Deep Learning |
| Tanh |
| Text Preprocessing |
| TF-IDF |
| Three Way Partitioning |
| Time Complexity of ML Algos |
| Time Complexity of ML Models |
| Tokenizer |
| Training a Deep Neural Network |
| Transformer vs. LSTM |
| Triplet Loss |
| True Negative Rate |
| True Positive Rate |
| Type 1 Error vs. Type 2 Error |
| Undercomplete Autoencoder |
| Undersampling |
| Unigram Tokenization |
| Unsupervised Learning |
| Vanishing Gradient |
| Variance |
| Weight Initialization |
| Why do we use Projection in QKV? |
| Why does the Transformer use Positional Embeddings? |
| Why Trigonometric Functions for Positional Encoding? |
| Word Embeddings |
| Word Tokenizer |
| Word2Vec Embedding |
| WordPiece Tokenization |
| XGBoost |