Should the famhistory model in the tutorial not use probabilities instead of binary coding? #12
I was following the tutorial on DCA (Python) and tried to reproduce the results independently in order to understand what was happening under the hood. I used the definition of net benefit from the simple step-by-step paper of Vickers (2019).

So I set out to calculate the required stats, but could not get the right outcomes for the famhistory model. The fix was not to take the probabilities from the logistic regression model (mod1_results.predict()), but to compare the raw binary famhistory data to the threshold and feed these predictions into the sensitivity and specificity calculations.

The Python package even has a specific input argument to force the probabilities to be calculated (models_to_prob), but this is explicitly not used in the tutorial. I have the feeling I am missing something obvious here. Could you help me out?
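For readers following along, here is a minimal sketch of the calculation described above. It assumes the usual form of the definition, net benefit = sensitivity × prevalence − (1 − specificity) × (1 − prevalence) × w, with w = pt / (1 − pt) the odds at the threshold probability pt. The function name `net_benefit` and the ten-patient toy data are hypothetical and are not taken from the tutorial or the package.

```python
import numpy as np

def net_benefit(y_true, treat, threshold):
    """Net benefit from sensitivity, specificity, and prevalence.

    y_true: observed outcome (1 = event), treat: who would be treated.
    """
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(treat, dtype=bool)
    prevalence = y_true.mean()
    sensitivity = (treat & y_true).sum() / y_true.sum()
    specificity = (~treat & ~y_true).sum() / (~y_true).sum()
    w = threshold / (1 - threshold)  # odds at the threshold probability
    return sensitivity * prevalence - (1 - specificity) * (1 - prevalence) * w

# Hypothetical toy data: outcome and a binary test (e.g. a family-history flag).
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
x = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])

pt = 0.2
treat = x > pt  # raw binary value compared directly to the threshold
nb = net_benefit(y, treat, pt)
print(f"net benefit at pt={pt}: {nb:.3f}")
```

With two true positives and one false positive among ten patients, this matches the equivalent count form TP/n − FP/n × w.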
Replies: 3 comments
This is really a philosophical issue, and one that might be obscured a bit by the rather artificial nature of the didactic example. A binary test can be viewed in two ways: 1) treat if the test is positive, don't treat if it is negative; 2) if the test is positive, cite the positive predictive value, if negative, cite 1 - negative predictive value, and then treat or not depending on the threshold probability. There is no right or wrong answer here, but the first approach is taken in most of the decision curve analysis methodologic literature and is hard-wired into the code: the probability is assumed to be 1 if the test is positive and 0 if it is negative. But as you nicely demonstrated, it is easy to use approach (2) if that is what you prefer.
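The two views can be compared with a small sketch. It uses hypothetical toy data and a hypothetical helper `net_benefit` (the count form TP/n − FP/n × pt/(1 − pt)); it is not the package's implementation. For approach (2), the probabilities are the positive predictive value and 1 − negative predictive value, which is also what a logistic regression with a single binary predictor would return as fitted probabilities.

```python
import numpy as np

def net_benefit(y_true, treat, threshold):
    """Net benefit in count form: TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(treat, dtype=bool)
    n = y_true.size
    tp = (treat & y_true).sum()
    fp = (treat & ~y_true).sum()
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical toy data: outcome and binary test result.
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
x = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])

# Approach 1: probability is 1 if the test is positive, 0 if negative.
prob_raw = x.astype(float)

# Approach 2: PPV for positives, 1 - NPV for negatives.
ppv = y[x == 1].mean()
one_minus_npv = y[x == 0].mean()
prob_model = np.where(x == 1, ppv, one_minus_npv)

results = {}
for pt in (0.10, 0.50, 0.80):
    nb_raw = net_benefit(y, prob_raw > pt, pt)
    nb_model = net_benefit(y, prob_model > pt, pt)
    results[pt] = (nb_raw, nb_model)
    print(f"pt={pt:.2f}  raw={nb_raw:+.3f}  probability={nb_model:+.3f}")
```

In this toy example the two approaches agree only for thresholds between 1 − NPV and PPV. Below that range approach (2) treats everyone, and above it approach (2) treats no one, whereas approach (1) keeps treating test-positive patients at every threshold.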
Thanks for responding! So I think what you are saying is that you try to emulate how the end user would interpret the model. That makes things clear for me. For reference for other people, this code reproduces the net_benefit curves for both the raw and probability options. The code is not meant to be as flexible as the code in the package, so I think it is easier to see exactly what is going on (and I hope I did everything correctly :)).
edit: I expanded the code example to include all the other benefit curves mentioned in the tutorial.
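As an illustration of what such a reproduction can look like (a sketch under stated assumptions, not the code originally posted), the following computes net benefit over a range of thresholds for the raw binary test alongside the "treat all" and "treat none" reference strategies. The data, the helper names `net_benefit` and `decision_curve`, and the threshold grid are all hypothetical.

```python
import numpy as np

def net_benefit(y_true, treat, threshold):
    """Net benefit in count form: TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(treat, dtype=bool)
    n = y_true.size
    tp = (treat & y_true).sum()
    fp = (treat & ~y_true).sum()
    return tp / n - fp / n * threshold / (1 - threshold)

def decision_curve(y_true, prob, thresholds):
    """Net benefit at each threshold, treating whenever prob > threshold."""
    prob = np.asarray(prob, dtype=float)
    return np.array([net_benefit(y_true, prob > pt, pt) for pt in thresholds])

# Hypothetical toy data: outcome and binary test result.
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
x = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])

thresholds = np.arange(0.05, 0.36, 0.05)  # 0.05, 0.10, ..., 0.35

curves = {
    "treat all": decision_curve(y, np.ones(len(y)), thresholds),
    "treat none": decision_curve(y, np.zeros(len(y)), thresholds),
    "binary test (raw 0/1)": decision_curve(y, x, thresholds),
}

for name, values in curves.items():
    print(name, np.round(values, 3))
```

"Treat none" is zero everywhere, and "treat all" crosses zero where the threshold equals the prevalence (0.30 in this toy data), which is a quick sanity check on any hand-rolled implementation.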
That is great. An alternative approach would be the following (written in Stata)