# Interview Questions on Machine Learning

*What is Machine Learning?*

Machine learning is a field of computer science in which models learn patterns from input data, using various statistical techniques, and use those patterns to predict the output.

*What are the different types of Machine Learning Algorithms?*

1. Supervised Learning

2. Unsupervised Learning

3. Semi-Supervised Learning

4. Reinforcement Learning

*What are bias and variance?*

Bias: the difference between the model's average predicted value and the actual value. In general terms, it is the error that comes from the simplifying assumptions built into your model.

Variance: the amount by which the predicted value for one training set differs from the value expected across all the other training sets. In other words, it measures how sensitive the model is to the data it was trained on.

*How can you achieve optimum bias and variance?*

1. By minimizing the total error

2. Using bagging and resampling techniques

3. Tuning the hyperparameters of the algorithm

*What are the assumptions of Linear Regression?*

1. The relationship between x and y is linear

2. The observations are independent of each other

3. The residuals are normally distributed

4. The residuals have equal variance

*What is regularization?*

It is a machine learning technique that reduces overfitting by adding a penalty on model complexity to the loss function. There are three common types of regularization: Lasso Regression, Ridge Regression, and Elastic Net Regression.

*What is gradient descent?*

This is a first-order iterative optimization algorithm that is used to find a local minimum of a differentiable function. At each iteration, the next step is taken in the direction opposite to the gradient, i.e. the direction of steepest descent.

*What is a learning rate?*

The learning rate is a hyperparameter, used in neural networks and other gradient-based models, that determines how much the weights change after each step. It is also called the step size.

*What is the use of MinMaxScaler() from sklearn.preprocessing?*

MinMaxScaler() is a normalization technique that rescales the values of each feature to a fixed range, 0 to 1 by default. The formula for this can be written as:

*MinMaxScaler_value = (value - min_value) / (max_value - min_value)*

*How does the z-score method work?*

The z-score method is available as zscore in the scipy.stats library. This is a normalization technique where each value is replaced by its standard score. The standard score of each data point can be calculated as:

*standard_score = (variable - mean) / standard_deviation*

*What is a Convolutional Neural Network (CNN)?*

A CNN is a type of Artificial Neural Network that is mainly used for image processing. CNNs are used to perform both descriptive and predictive tasks, and they are specifically designed to process pixel data.

*What is a Recurrent Neural Network (RNN)?*

An RNN is a type of Artificial Neural Network where the connections between the nodes form a directed graph along a temporal sequence. This type of neural network is mainly used for sequential data, for example in text mining.

*What are the assumptions of Linear regression?*

There are 4 assumptions that are associated with the linear regression model:

1. *Linearity:* The relationship between X and Y is linear

2. *Independence:* All the observations in the dataset are independent of each other

3. *Normality:* The residuals are normally distributed

4. *Homoscedasticity:* The variance of the residuals is the same for any value of X

*What is multicollinearity? Why is it considered a problem?*

Multicollinearity exists when an independent variable is highly correlated with one or more other independent variables in a multiple linear regression. It undermines the statistical significance of an independent variable, and hence it is considered a problem.

*How can you remove multicollinearity from the model?*

There are 2 ways to remove multicollinearity:

1. Removing highly correlated predictors from the model

2. Using Principal Component Analysis

*What is logistic regression?*

This is a supervised machine learning algorithm. Its main goal is to predict a binary outcome (True/False, Yes/No, etc.) from the probabilities it outputs.

*Although logistic regression is a classification algorithm, why is there “regression” in it?*

The raw output of logistic regression is a linear regression on the log-odds (logits), which are real numbers ranging from minus infinity to plus infinity. These values are then converted into probabilities for classification.

*What is Maximum Likelihood estimation?*

It is a statistical method that estimates the parameters of a model by choosing the values under which the observed data is most probable.

*What are the evaluation metrics for Classification Algorithms?*

Confusion matrix, F1 score, Accuracy, Precision, Recall, and ROC curve.

*What is a Decision Tree?*

It is a supervised machine learning algorithm that creates a tree-like model of decisions and their possible consequences.

*What is pruning?*

Pruning is a process of limiting the size of a decision tree to avoid overfitting the data and to reduce the complexity of the tree.

*What is Bagging?*

It is also called *Bootstrap Aggregating.* It is a machine learning ensemble algorithm designed to increase accuracy and avoid overfitting of classification and regression models.

*What is a Random Forest?*

It is an ensemble learning method for classification, regression, and other tasks that operates by constructing multiple decision trees. It collects the outputs of all the decision trees and then combines them, by majority vote for classification or by averaging for regression.

*What is cross-validation?*

It is a method to evaluate a machine learning model with limited data by resampling procedures. Its most common form, k-fold cross-validation, has only one parameter, k, which determines the number of groups the data is split into. Cross-validation is also called out-of-sample testing.

*What is KNN?*

The k-Nearest Neighbors algorithm is a supervised learning method where an observation is assigned the most frequent class among its k nearest data points. It can be used for both classification and regression.

*What are the advantages of the KNN algorithm?*

1. Used for both regression and classification

2. Simple and easy to implement

3. Quick to fit, since there is no explicit training phase

4. High accuracy

*What are the disadvantages of the KNN algorithm?*

1. Computationally expensive

2. With large data, predictions become slow

3. Irrelevant features affect the predictions

*What is the k-means algorithm?*

The k-means algorithm is an unsupervised learning method where each observation is assigned to the cluster whose center (centroid) is at the least distance from it.

*What are the advantages of the k-means algorithm?*

1. Simple and easy to implement

2. Easily scalable to large datasets

3. Produces tight, well-separated clusters when the groups are roughly spherical

*What are the disadvantages of the k-means algorithm?*

1. The number of clusters has to be chosen manually, and the right value is difficult to determine

2. Outliers are not detected, and they can distort the cluster centers
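
To make the k-means answers above concrete, here is a minimal sketch in plain Python, not a production implementation. It assumes 2-D points, random initial centroids, and the standard assign-then-update loop (often called Lloyd's algorithm). The `kmeans` and `inertia` helpers and the toy `data` are hypothetical names introduced only for illustration. Comparing the inertia for several values of k is one common heuristic for the first disadvantage, choosing the number of clusters.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points with the assign/update loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points sampled at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    )


# Hypothetical toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

centroids, labels = kmeans(data, k=2)
print("labels:", labels)

# Inertia always shrinks as k grows; the "elbow" where it stops dropping
# sharply is a common, informal hint for a reasonable k.
for k in (1, 2, 3):
    c, l = kmeans(data, k=k)
    print(k, round(inertia(data, c, l), 3))
```

On this toy data the drop in inertia from k=1 to k=2 is large and the drop from k=2 to k=3 is small, which suggests two clusters. Note that an outlier added to `data` would still be assigned to some cluster and would pull that centroid toward it, which illustrates the second disadvantage.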