ENGR 1330 Computational Thinking with Data Science

Purpose of data science

Not only business:
- Politics
- Economics
- Agriculture
- Medical science
- Sociology
- Weather
- Environment
- And many more

The principal purpose of data science is to find patterns within data. It uses various statistical techniques to analyze the data and draw insights from it. From data extraction through wrangling and pre-processing, a data scientist must scrutinize the data thoroughly and is then responsible for making predictions from it. The goal of a data scientist is to derive conclusions from the data. Through these conclusions, they can help companies make smarter business decisions.

Probability

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur.

A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes (heads and tails) are equally probable: the probability of heads equals the probability of tails. Since no other outcomes are possible, the probability of either heads or tails is 1/2, which could also be written as 0.5 or 50%.

Probability examples

Assume a box containing 3 red balls, 4 green balls, and 2 blue balls.
- If you pick one ball at random, what is the probability that the ball will be red? Ans: 3/9 ≈ 0.33
- If you pick 2 balls one after the other without replacement, what is the probability that both will be green? Ans: (4/9) × (3/8) ≈ 0.167
- If you pick 2 balls one after the other without replacement, what is the probability that the first one will be blue and the second one will be red? Ans: (2/9) × (3/8) ≈ 0.0833
- If you pick 3 balls one after the other with replacement, what is the probability that the first one will be blue, the second one will be red,
and the third one will be green? Ans: (2/9) × (3/9) × (4/9) ≈ 0.033

Simulation

Simulation is the process of using a computer to mimic a physical experiment.

Step 1: What to Simulate
Specify the quantity you want to simulate. It can be the probability of occurrence of a certain event or a statistical parameter. For example, you might decide that you want to simulate the outcomes of rolling a die.

Step 2: Simulating One Value
Figure out how to simulate one value of the quantity you specified in Step 1. You may create a function or call an existing function to mimic the action required to determine the value. For example, you can use the np.random.choice function to pick a random face among the six faces of a die.

Step 3: Number of Repetitions
Decide how many times you want to simulate the quantity. You will have to repeat Step 2 that many times. For example, if you want to simulate the outcome of rolling the die 1000 times, you repeat Step 2 1000 times using a for loop.

Step 4: Coding the Simulation
Put it all together in code.

Coding the Simulation

- Create an empty array (list) in which to collect all the simulated values. We will call this the collection array.
- Create (or choose) a function that imitates the procedure of getting the outcome of the experiment.
- Create a "repetitions sequence" for a for loop, that is, a sequence whose length is the number of repetitions you specified in Step 3. For n repetitions we can use np.arange(n) or range(n).
- For each element of the repetitions sequence:
  - Simulate one value based on the code you developed in Step 2.
  - Augment the collection array with this simulated value.

That's it! Once you have carried out the steps above, your simulation is done. The collection array contains all the simulated values.

Simulation Example

The Monty Hall simulation is based on a problem derived from the TV show Let's Make a Deal that became famous in game theory. The setting is a game show in which the contestant is faced with three closed doors. Behind one of the doors is a fancy car, and behind each of the other two
there is a goat.

Rules (https://en.wikipedia.org/wiki/Monty_Hall_problem):
- The contestant makes an initial choice, but that door isn't opened.
- At least one of the remaining two doors has a goat behind it, and the host (Monty) opens one such door.
- The contestant is asked whether they want to switch their choice of door.

Mathematically, a contestant who switches doubles their chance of winning, from 1/3 to 2/3.

Simulation Example

Should the contestant stick with their initial choice or switch to the other door? That is the Monty Hall problem.
- The chance that the car is behind the originally chosen door is 1/3.
- After Monty opens the door with the goat, the chance distribution changes: the originally chosen door still has a 1/3 chance, while the remaining closed door now has a 2/3 chance.
- If the contestant switches, they double their chance of winning.

Simulation Example

https youtu be Xp6V lO1ZKA
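The four simulation steps can be applied to the Monty Hall game to check the 1/3 versus 2/3 claim empirically. The following is a minimal sketch, not the course's official solution: the function name play_monty_hall, the door labels 0 to 2, the seed, and the repetition count are all assumptions made for illustration.

```python
import numpy as np

np.random.seed(1330)  # arbitrary seed, only so that repeated runs agree

def play_monty_hall(switch):
    """Simulate one game (Step 2); return True if the contestant wins the car."""
    doors = np.arange(3)                   # doors labeled 0, 1, 2 (an assumption)
    car = np.random.choice(doors)          # door hiding the car
    first_pick = np.random.choice(doors)   # contestant's initial choice
    # Monty opens a door that is neither the contestant's pick nor the car.
    openable = [d for d in doors if d != first_pick and d != car]
    opened = np.random.choice(openable)
    if switch:
        # Switch to the only door that is neither picked nor opened.
        final_pick = [d for d in doors if d != first_pick and d != opened][0]
    else:
        final_pick = first_pick
    return final_pick == car

repetitions = 10_000      # Step 3: number of repetitions (illustrative value)
stay_results = []         # collection list for the "stay" strategy
switch_results = []       # collection list for the "switch" strategy
for _ in range(repetitions):
    stay_results.append(play_monty_hall(switch=False))
    switch_results.append(play_monty_hall(switch=True))

# The mean of a list of True/False values is the fraction of wins.
stay_rate = np.mean(stay_results)
switch_rate = np.mean(switch_results)
print(f"win rate when staying:   {stay_rate:.3f}")
print(f"win rate when switching: {switch_rate:.3f}")
```

With enough repetitions, the two printed rates should land close to 1/3 and 2/3 respectively. The exact digits depend on the seed and the number of repetitions.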
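The same recipe (one simulated value, a repetitions loop, a collection list) can also be used to check the hand-computed answers for the box of 3 red, 4 green, and 2 blue balls. This is a sketch under stated assumptions: the helper name draw_two_without_replacement, the seed, and the repetition count are illustrative and do not come from the slides.

```python
import numpy as np

np.random.seed(1330)  # arbitrary seed, only so that repeated runs agree

# The box from the probability examples: 3 red, 4 green, 2 blue balls.
box = np.array(["red"] * 3 + ["green"] * 4 + ["blue"] * 2)

def draw_two_without_replacement():
    """Simulate one value: draw two balls in order, without replacement."""
    return np.random.choice(box, size=2, replace=False)

repetitions = 20_000      # illustrative number of repetitions
both_green = []           # collection list: were both balls green?
blue_then_red = []        # collection list: was it blue first, then red?
for _ in range(repetitions):
    first, second = draw_two_without_replacement()
    both_green.append(first == "green" and second == "green")
    blue_then_red.append(first == "blue" and second == "red")

# Fraction of repetitions in which each event occurred.
p_both_green = np.mean(both_green)
p_blue_then_red = np.mean(blue_then_red)
print(f"both green:    {p_both_green:.3f}  (exact: {4/9 * 3/8:.3f})")
print(f"blue then red: {p_blue_then_red:.3f}  (exact: {2/9 * 3/8:.3f})")
```

The simulated fractions should be close to the exact products (4/9) × (3/8) and (2/9) × (3/8), with small differences caused by random sampling.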