Over the past few decades, a great deal of work has been done at the intersection of research and medical science. In the modern era, medicine cannot flourish on textbook knowledge alone; modern clinicians need to enhance their expertise by combining medical science with computational technology. Research has long been a major subject of interest in the West, but its standing has changed remarkably. One of the best-loved prediction algorithms in data science is the “random forest.” Introduced by Leo Breiman in 2001, the random forest is known for its simplicity. Its predictions are not always 100% right, but it helps people who are new to data science understand how such algorithms work.
One such algorithm was used to study suicide prediction in 2017. The goal of the study was to identify patients with negative thoughts about their lives, or with a history of self-injury, and to use the collected data to predict whether these patients would attempt suicide. The study followed 5,000 patients, 2,000 of whom attempted suicide while the research was under way.
To make their predictions, the researchers drew on over 1,300 different traits and characteristics from individuals’ medical histories, including age, gender, and many other factors. If such predictions could be made fully accurate, the risk of suicide could be reduced and many lives could be saved, which would be an extraordinary achievement.
Predictive algorithms are everywhere in today’s world. Artificial intelligence tools rely on predictive algorithms for decision making, and these algorithms have the potential to help diagnose and treat health problems such as depression, cancer, and lung failure.
To understand the random forest, one first needs to understand the decision tree. A decision tree makes predictions by asking a series of yes-or-no questions. For example, to make a prediction about suicide risk, a tree might use three pieces of information: Has the person been diagnosed with depression? Has the person been diagnosed with bipolar disorder? Has the patient visited the ER three or more times in the past year?
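As a sketch, those three questions can be written as a tiny hand-built decision tree. The order of the splits and the risk labels here are hypothetical, chosen only to illustrate the yes/no structure:

```python
def predict_risk(has_depression: bool, has_bipolar: bool, er_visits: int) -> str:
    """Toy decision tree built from three yes/no questions about a patient.

    The questions mirror the example in the text; the split order and
    the risk labels are invented for illustration.
    """
    if has_depression:              # question 1: diagnosed with depression?
        if er_visits >= 3:          # question 3: 3+ ER visits in the past year?
            return "higher risk"
        return "elevated risk"
    if has_bipolar:                 # question 2: diagnosed with bipolar disorder?
        return "elevated risk"
    return "lower risk"

# A patient who answers yes, no, and has 4 ER visits:
print(predict_risk(True, False, 4))   # -> "higher risk"
```

Each patient simply falls down one path of questions until a leaf is reached, which is why a single tree is so easy to explain.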
Another appealing thing about a decision tree is that its predictions are easy to explain. But there is a problem: a good decision can rarely be made from a single tree’s prediction. We need to grow many different trees and average their results, and this introduces a complication that leads to a key idea in modern machine learning. Through resampling, one can create many new datasets from the original data.
For example, given a database of 5,000 patients, a researcher can create a new, resampled dataset by selecting one person at random from the 5,000, and repeating that selection 5,000 times. The resulting dataset differs from the source database because, when sampling with replacement, the same person may be selected more than once. Perhaps 3,200 distinct people end up in the sample, while the remaining 1,800 are never selected. A tree grown on this resampled data reaches its own decision and its own result, which may be slightly different from the previous one.
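This resampling step is the statistical bootstrap, and it takes only a few lines of Python to see why some patients are drawn repeatedly while others are left out. The 5,000-patient figure comes from the text; the seed and everything else here are illustrative:

```python
import random

def bootstrap_sample(n: int, seed: int = 0) -> list:
    """Draw n patient indices from a pool of n, *with* replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

sample = bootstrap_sample(5000)
distinct = len(set(sample))
# Because the same patient can be drawn repeatedly, only about 63%
# of the 5,000 patients (roughly 3,100-3,200) appear at least once;
# the rest never make it into this particular resampled dataset.
print(len(sample), distinct)
```

Running this repeatedly with different seeds gives a different resampled dataset each time, which is exactly what lets each tree in the forest see slightly different data.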
If the random selection happens to leave out noisy, unrepresentative records, the resulting tree will be a little better; if it happens to emphasize them, it will be a little worse. The good news is that we can grow enormous numbers of trees. The researchers who studied suicide grew 500 of them, and since all the work is done by the computer, one could just as easily grow thousands or millions.
Once the forest is created, the researchers average the trees to deduce the result. But more is needed to make the forest truly random: if every one of the 500 trees can draw on the same traits, resampling alone leaves their results very similar. Limiting which traits, or variables, each tree may use makes the results differ.
In the suicide prediction study, 1,300 different variables were available. Each tree is grown from a random subset of these variables, and this extra layer of randomization is what makes a random forest: one tree may consider depression while another does not. The process is known as decorrelating the trees.
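Combining the two sources of randomness, a bootstrap resample of patients plus a random subset of variables per tree, can be sketched as follows. The variable names are placeholders, and the choice of 36 variables per tree is only a common rule of thumb (roughly the square root of 1,300), not a detail from the study:

```python
import random

# Placeholder names standing in for the study's 1,300 traits.
ALL_VARIABLES = ["var_%d" % i for i in range(1300)]

def decorrelated_forest(n_trees: int = 500, vars_per_tree: int = 36, seed: int = 0) -> list:
    """For each tree, draw a bootstrap resample of patients AND a random
    subset of the variables; differing subsets decorrelate the trees."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        patients = [rng.randrange(5000) for _ in range(5000)]   # bootstrap resample
        variables = rng.sample(ALL_VARIABLES, vars_per_tree)    # random variable subset
        forest.append({"patients": patients, "variables": variables})
    return forest

forest = decorrelated_forest(n_trees=10)
# Neighbouring trees see different variables, so their errors are
# less correlated and averaging them helps more.
print(forest[0]["variables"][:3], forest[1]["variables"][:3])
```

A real implementation would then grow an actual decision tree inside each entry; the point here is only the structure of the randomization.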
A good suicide prediction algorithm should have two characteristics: 1) it should rarely predict that someone will attempt suicide when they will not, and 2) it should rarely miss someone who is genuinely at risk.
In this study, when the algorithm indicated more than a 50% chance of suicide, 79% of those people really did attempt it; when it indicated less than a 50% chance, only 5% did.
Another great thing about this algorithm is that it gives more than a yes-or-no prediction: it produces a probability. For example, one person may have a 45% chance of attempting suicide and another a 10% chance; a simple yes/no rule would classify both as unlikely to attempt suicide, yet the probabilities show the two cases are very different.
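The forest’s score is just the average of the trees’ votes, and collapsing it to yes/no throws information away. A minimal sketch, assuming each of 500 trees casts a 0/1 vote (the vote counts are invented to match the 45% and 10% examples):

```python
def risk_score(tree_votes: list) -> float:
    """Average the trees' 0/1 votes into a probability-like score."""
    return sum(tree_votes) / len(tree_votes)

def yes_no(score: float, threshold: float = 0.5) -> str:
    """Collapse a score into a yes/no answer at a cutoff threshold."""
    return "at risk" if score >= threshold else "not at risk"

patient_a = risk_score([1] * 225 + [0] * 275)   # 225 of 500 trees vote yes -> 0.45
patient_b = risk_score([1] * 50 + [0] * 450)    # 50 of 500 trees vote yes  -> 0.10
# Both fall below the 0.5 cutoff and get the same yes/no label,
# yet the scores show patient A warrants far more attention.
print(patient_a, yes_no(patient_a), patient_b, yes_no(patient_b))
```

Keeping the raw score lets clinicians rank patients by urgency instead of sorting them into just two bins.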
Unfortunately, work on suicide prediction remains irregular, while the same kinds of models are routinely used for fraud detection and targeted advertising rather than being developed to improve public policy. These models are often dismissed as opaque and impossible to understand, but that is not the reality: with some mathematical awareness, you can learn to understand and implement these algorithms. As more people learn to use and handle them, it becomes possible to apply them to a wide variety of problems, not merely commercial ones.