UNIT – 4
Object Segmentation
CMR COLLEGE OF ENGINEERING & TECHNOLOGY
An Autonomous Institution with NAAC Accreditation (A+ Grade)
*Approved by AICTE *Permanently Affiliated to JNTUH *NBA Accreditation
Kandlakoya (V), Medchal Road, Hyderabad - 501401
2.
Contents
• Object Segmentation
• Regression vs segmentation
• Supervised & unsupervised Learning
• TREE BUILDING
• REGRESSION
• CLASSIFICATION
• OVERFITTING
• PRUNING & COMPLEXITY
• Multiple decision trees
3.
Object segmentation – REGRESSION vs SEGMENTATION
• Regression: A statistical method for modelling the relationship between a dependent variable and one or more independent variables.
• It helps us understand how the value of the dependent variable changes with one independent variable when the other independent variables are held fixed.
• It predicts continuous (real) values such as temperature, age, salary, price, etc.
• Examples:
• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving
4.
Types of regressions
• Linear Regression
Expressed using the equation Y = a + bX + ϵ
Where:
• Y – Dependent variable
• X – Independent (explanatory) variable
• a – Intercept
• b – Slope
• ϵ – Residual (error)
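A minimal sketch of how the intercept a and slope b of a simple linear regression can be estimated by ordinary least squares. The function name `fit_line` and the experience/salary figures are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: years of experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

a, b = fit_line(experience, salary)
# Residuals: the part of each observation the line does not explain.
residuals = [y - (a + b * x) for x, y in zip(experience, salary)]
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

Because the sample data lies exactly on a line, every residual here is zero; with real data the residuals capture the error term.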
5.
• Multiple Linear Regression
Multiple independent variables are used in the model.
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
• Y – Dependent variable
• X1, X2, X3 – Independent (explanatory) variables
• a – Intercept
• b, c, d – Slopes
• ϵ – Residual (error)
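The same least-squares idea extends to several independent variables. The sketch below is one possible approach, not the only one: it solves the normal equations in plain Python, and for brevity it uses two explanatory variables instead of three. The helper names `solve` and `fit_multiple` and the data are assumptions made for this example.

```python
def solve(A, y):
    """Solve the linear system A x = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry of the column into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """Least-squares coefficients [a, b, c, ...] for Y = a + b*X1 + c*X2 + ..."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1 carries the intercept a
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2.
rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b, c = fit_multiple(rows, ys)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

In practice a numerical library would normally be used for this step; the hand-written solver only keeps the example self-contained.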
6.
SEGMENTATION
• The process of segmenting the data according to the company's requirements.
• The aim is to refine the analysis within a defined context, using suitable tools.
• Designing and implementing strategies specific to these segments then makes decision making easier.
• Example:
• An e-commerce company might segment its customer data based on past purchase behavior.
• By analysing the data, it can identify different groups of customers:
• frequent buyers,
• occasional shoppers, or
• one-time purchasers.
• This allows the company to tailor its marketing messages and promotions to each group.
• This increases the chances of attracting and retaining customers.
• Customers are targeted in the right way, with segmented communications and more meaningful marketing.
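The e-commerce example can be sketched as a simple rule-based segmentation on order counts. The thresholds (5 or more orders for a frequent buyer, 2 to 4 for an occasional shopper) and the customer names are assumptions made for illustration, not values from the slides.

```python
from collections import Counter, defaultdict


def segment(order_count, frequent_min=5):
    """Map a customer's number of past orders to a segment name."""
    if order_count >= frequent_min:
        return "frequent buyer"
    if order_count >= 2:
        return "occasional shopper"
    return "one-time purchaser"


# Hypothetical order log: one entry per order, holding the customer's name.
orders = ["asha", "ravi", "asha", "meena", "asha",
          "ravi", "asha", "asha", "kiran", "ravi"]

counts = Counter(orders)  # orders per customer
segments = defaultdict(list)
for customer, n in counts.items():
    segments[segment(n)].append(customer)

for name, members in segments.items():
    print(f"{name}: {members}")
```

Each resulting group can then receive its own marketing messages and promotions.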
SUPERVISED and Unsupervised learning
• Supervised Learning: Uses labeled datasets to train algorithms to predict outcomes and recognize patterns.
• What is labeled data? Data in which each input example is paired with its known correct output (the label).
• Analogy: training with a supervisor who provides the answers while you are learning.
11.
STEPS INVOLVED IN SUPERVISED learning
• First, determine the type of training dataset.
• Collect the labeled training data.
• Split the dataset into a training set, a validation set, and a test set.
• Determine the input features of the training dataset; they should carry enough information for the model to predict the output accurately.
• Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
• Execute the algorithm on the training set. Sometimes a validation set, held out from the training data, is also needed to tune the control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs for the test set, the model is accurate.
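The steps above can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the synthetic two-feature data, the 60/20/20 split, and the use of a k-nearest-neighbours classifier (swapped in for brevity in place of a support vector machine or decision tree).

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable


# Steps 1-2: gather labeled data (hypothetical: two numeric features per example).
def make_example(label):
    center = 0.0 if label == "unfit" else 10.0
    features = [center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)]
    return features, label


data = ([make_example("fit") for _ in range(50)]
        + [make_example("unfit") for _ in range(50)])

# Step 3: split into training, validation, and test sets (60/20/20).
rng.shuffle(data)
train, validation, test = data[:60], data[60:80], data[80:]


# Steps 4-5: the two numbers are the input features; the algorithm is k-NN.
def predict(point, k):
    """Majority label among the k training examples closest to point."""
    nearest = sorted(
        train,
        key=lambda ex: sum((p - q) ** 2 for p, q in zip(point, ex[0])),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(examples, k):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x, k) == label for x, label in examples) / len(examples)


# Step 6: use the validation set to choose the control parameter k.
best_k = max([1, 3, 5], key=lambda k: accuracy(validation, k))

# Step 7: report accuracy on the held-out test set.
test_accuracy = accuracy(test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The test set is used only once, at the end, so the reported accuracy is not influenced by the choice of k.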
SUPERVISED learning algorithms
• Supervised Learning
• Regression: Regression algorithms are used when there is a relationship between the input variables and a continuous output variable.
• Popular regression algorithms:
• Linear Regression
• Multiple Regression
• Polynomial Regression
14.
• CLASSIFICATION: Used when the output variable is categorical, meaning it takes one of a fixed set of classes such as Yes-No, Male-Female, True-False, etc.
• Used to assign similar objects to the same class.
• Popular classification algorithms:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
• Binary classification: If the algorithm assigns objects to one of 2 distinct classes, it is called binary classification.
• Multiclass classification: If the algorithm assigns objects to one of more than 2 classes, it is called multiclass classification.
• Strength: Classification algorithms usually perform well when enough labeled training data is available.
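To make the binary/multiclass distinction concrete, here is a small sketch in which the same classifier handles both cases; only the number of distinct labels changes. It uses a nearest-centroid rule, which is not one of the algorithms listed above and is chosen only because it fits in a few lines. All data and names are hypothetical.

```python
def fit_centroids(examples):
    """Average the feature vectors of each class to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Binary classification: two classes (Yes / No).
binary_data = [([1, 1], "No"), ([2, 1], "No"), ([8, 9], "Yes"), ([9, 8], "Yes")]
binary_model = fit_centroids(binary_data)
binary_result = classify(binary_model, [7, 7])

# Multiclass classification: more than two classes.
multi_data = [([1, 1], "low"), ([2, 2], "low"),
              ([5, 5], "medium"), ([6, 6], "medium"),
              ([9, 9], "high"), ([10, 10], "high")]
multi_model = fit_centroids(multi_data)
multi_result = classify(multi_model, [5, 6])

print(binary_result, multi_result)
```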
15.
UNSUPERVISED learning
• Models are trained on an unlabeled dataset and are allowed to act on that data without any supervision.
• Cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data.
• Goal: Find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
• Example: given a dataset of images, an unsupervised algorithm performs its task by clustering the images into groups according to the similarities between them.
• Most importantly, it works on unlabeled and uncategorized data.
• In the real world, we do not always have input data with the corresponding output, so to solve such cases we need unsupervised learning.
16.
WORKING OF UNSUPERVISED learning
• We start with unlabeled input data.
• This unlabeled input data is fed to the machine learning model in order to train it.
• First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm is applied, such as k-means clustering or hierarchical clustering.
• The algorithm then divides the data objects into groups according to the similarities and differences between the objects.
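As an illustration of this workflow, below is a minimal k-means sketch in plain Python. The points, the choice of k = 2, the fixed iteration count, and the random initialisation are assumptions for the example; production implementations add convergence checks and better initialisation.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)

for center, members in zip(centroids, clusters):
    print(center, members)
```

No labels are supplied at any point; the grouping comes only from the distances between the points.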
17.
TYPES OF UNSUPERVISED learning
• CLUSTERING:
• The task of grouping data points based on their similarity to each other is called Clustering or Cluster Analysis.
• Aims at gaining insights from unlabeled data points.
• Example 1: [figure]
18.
Example 2:
• The clusters formed need not be circular in shape.
• The shape of clusters can be arbitrary.
• Many algorithms work well at detecting arbitrarily shaped clusters.
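As one illustration of clustering that does not assume circular clusters, the sketch below groups points that are connected through chains of close neighbours. This is a simplified, density-style approach in the spirit of DBSCAN, without its minimum-points rule; it is not an algorithm named in the slides. The ring-shaped data, the function names, and the `eps` radius are assumptions for the example.

```python
import math


def density_clusters(points, eps):
    """Label points so that any two points linked by hops of length <= eps share a label."""
    labels = [None] * len(points)
    cluster_id = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already absorbed into an earlier cluster
        labels[start] = cluster_id
        frontier = [start]
        while frontier:
            i = frontier.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= eps:
                    labels[j] = cluster_id
                    frontier.append(j)
        cluster_id += 1
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius around the origin."""
    return [
        (radius * math.cos(2 * math.pi * t / n), radius * math.sin(2 * math.pi * t / n))
        for t in range(n)
    ]


# Two concentric rings: neither cluster is a compact, circular blob.
points = ring(1.0, 12) + ring(5.0, 40)
labels = density_clusters(points, eps=1.0)
print(sorted(set(labels)))
```

The inner and outer rings end up as separate clusters even though they share the same centre, which a centroid-based method such as k-means would not separate.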
19.
DECISION TREEs:
• Decision Trees are a type of Supervised Machine Learning where the data is repeatedly split according to a certain parameter.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
• The tree can be explained by two entities, namely decision nodes and leaves.
• Leaves: the final outcomes
• Decision nodes: where the data is split
• Example:
• Let's say you want to predict whether a person is fit given information such as their age, eating habits, and physical activity.
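The fitness example can be sketched as a tiny hand-written tree. The specific questions and outcomes below are hypothetical; a real tree would be learned from labeled data rather than written by hand.

```python
# Decision nodes are dicts holding a question and two branches;
# leaves are plain strings holding the final outcome.
tree = {
    "question": "age < 30",
    "yes": {"question": "eats a lot of pizza", "yes": "Unfit", "no": "Fit"},
    "no": {"question": "exercises in the morning", "yes": "Fit", "no": "Unfit"},
}


def predict(node, answers):
    """Walk from the root to a leaf, following the Yes/No answer at each decision node."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


# A person described by their answers to the questions on their path.
young_healthy = {"age < 30": True, "eats a lot of pizza": False}
older_inactive = {"age < 30": False, "exercises in the morning": False}

print(predict(tree, young_healthy))
print(predict(tree, older_inactive))
```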
20.
• There are many algorithms that construct Decision Trees; one of the best known is the ID3 (ITERATIVE DICHOTOMISER 3) algorithm.
• It is used for classification tasks and is based on the concept of information gain.
• Entropy: Entropy tells us how impure a collection of data is.
• Entropy measures the homogeneity of an arbitrary dataset: it is 0 when every example belongs to the same class and grows as the classes become more evenly mixed.
• Entropy can be calculated using the formula below:
Entropy(S) = − Σ p_i · log2(p_i), summed over the classes i, where p_i is the proportion of examples in S that belong to class i.
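A short sketch of the entropy calculation, assuming the usual Shannon form with a base-2 logarithm over class proportions. The function name and the sample label lists are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # Sum of p * log2(1/p) over the classes, where p is each class's proportion.
    return sum(
        (n / total) * math.log2(total / n)
        for n in Counter(labels).values()
    )


pure = ["yes"] * 6                  # one class only: perfectly homogeneous
balanced = ["yes"] * 3 + ["no"] * 3 # evenly mixed: maximally impure for 2 classes
mixed = ["yes"] * 9 + ["no"] * 5

print(entropy(pure), entropy(balanced), round(entropy(mixed), 3))
```

A pure collection has entropy 0, and an evenly split two-class collection has entropy 1 bit.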
21.
Information Gain:
• We can measure the effectiveness of an attribute in classifying the training set. The measure we will use, called information gain, is simply the expected reduction in entropy caused by partitioning the data set according to this attribute.
• The information gain, Gain(S, A), of an attribute A relative to a data set S is defined as:
Gain(S, A) = Entropy(S) − Σ (|S_v| / |S|) · Entropy(S_v), summed over the values v of A, where S_v is the subset of S for which attribute A has value v.
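A sketch of computing information gain for one attribute: the entropy of the whole set minus the size-weighted entropy of the subsets produced by splitting on that attribute. The tiny weather-style dataset and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Expected reduction in entropy of `target` from partitioning rows by `attribute`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    # Weighted average entropy of the subsets S_v, weights |S_v| / |S|.
    remainder = sum(
        (len(labels) / len(rows)) * entropy(labels) for labels in subsets.values()
    )
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical training set S with two candidate attributes.
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "yes"},
]

gain_outlook = information_gain(rows, "outlook", "play")
gain_wind = information_gain(rows, "wind", "play")
print(gain_outlook, gain_wind)
```

Here splitting on outlook separates the classes completely, while splitting on wind leaves each subset as mixed as the original set, so an ID3-style learner would choose outlook for the split.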