I am trying to plot a panel of analytes whose levels span different orders of magnitude, measured in 5 different solid malignancies. To put these analytes on a common scale so they can be combined into panels, I calculated a z-score for each sample and analyte, using that analyte's mean expression level across all samples.
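For reference, a minimal sketch of the z-scoring step as I understand it. The data frame, the column names (`cancer`, `analyte_X`, `analyte_Y`) and the values are made up for illustration, and I am assuming the standard deviation is also taken across all samples (sample SD, `ddof=1`).

```python
import pandas as pd

# Hypothetical toy data: one row per sample, analytes on very different scales.
df = pd.DataFrame(
    {
        "cancer": ["Ovarian", "Ovarian", "Lung", "Lung"],
        "analyte_X": [1.0, 3.0, 5.0, 7.0],
        "analyte_Y": [1000.0, 3000.0, 5000.0, 7000.0],
    },
    index=["A", "B", "C", "D"],
)
analytes = ["analyte_X", "analyte_Y"]

# z-score each analyte column using its mean and sample SD across ALL samples
# (assumption: SD is computed over all samples, not within each cancer type).
z = (df[analytes] - df[analytes].mean()) / df[analytes].std(ddof=1)

print(z.round(3))
```

After this step both analytes are on the same unitless scale, even though the raw values differ by three orders of magnitude.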
I now want to plot these values with the different cancer types on the x axis and the average z-score on the y axis.
I think I have 2 options, and I am not sure which is correct or answers my question better.
Option 1: take the average z-score of analyte X across all Ovarian samples and use this value as one data point in my figure; repeat for all malignancies and analytes. Each group then has a number of points equal to the number of analytes (which is identical for all groups).
[Figure: graph created using Option 1]
Here I see the risk that a single outlier sample with high values across the board would pull the averages of all analytes in its group upward.
Option 2: calculate the average z-score of all analytes for sample A and use this value as one data point in my figure; repeat for all samples in all malignancies. Each malignancy then has a different number of points, since each malignancy has a different n.
[Figure: graph created using Option 2]
Here an individual analyte that is greatly increased in all samples of one group could skew the data and defeat the point of looking at a panel of selected analytes.
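To make the two options concrete, here is a small sketch of both aggregations. The z-score table, sample IDs and column names are hypothetical, and the sketch assumes every sample has a value for every analyte (no missing data).

```python
import pandas as pd

# Hypothetical z-scores: one row per sample, one column per analyte.
z = pd.DataFrame(
    {
        "cancer": ["Ovarian", "Ovarian", "Ovarian", "Lung", "Lung"],
        "analyte_X": [0.5, 1.5, 1.0, -1.0, -2.0],
        "analyte_Y": [0.0, 1.0, 2.0, -1.0, -2.0],
    },
    index=["A", "B", "C", "D", "E"],
)
analytes = ["analyte_X", "analyte_Y"]

# Option 1: average over samples within each cancer type.
# Result: one point per analyte per cancer type.
option1 = z.groupby("cancer")[analytes].mean()

# Option 2: average over analytes within each sample.
# Result: one point per sample, so group sizes follow each cancer type's n.
option2 = z.assign(mean_z=z[analytes].mean(axis=1))[["cancer", "mean_z"]]

# Centre of each group under the two options.
centre1 = option1.mean(axis=1)
centre2 = option2.groupby("cancer")["mean_z"].mean()

print(option1)
print(option2)
print(centre1)
print(centre2)
```

With complete data and the mean used at both levels, the centre of each group comes out the same under both options; what differs is what each point represents (an analyte versus a sample) and therefore the spread and the number of points. This no longer holds once values are missing or the median is used instead of the mean.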
Option 1 obviously gives me the "nicer" graph, but I want to confirm that I am not just showing an artifact of how the data were aggregated. I am really not sure which approach better visualizes my question: "Is there a difference between the cancer types in this panel of analytes?"
Bonus question: is it more common to use the mean or the median z-score in a situation like this? My data are somewhat skewed, so I am inclined to use the median, as this would help mitigate the risks I have described under each option.
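A tiny illustration of why I lean towards the median, using made-up z-scores for one group where a single value is an outlier:

```python
import pandas as pd

# Hypothetical z-scores for one group; the last value is an outlier.
z_scores = pd.Series([0.1, 0.2, 0.3, 0.2, 4.0])

group_mean = z_scores.mean()      # pulled upward by the outlier
group_median = z_scores.median()  # stays near the bulk of the data

print(group_mean, group_median)
```

In this toy case the mean is roughly 0.96 while the median is 0.2, so a single extreme sample or analyte moves the mean much more than the median.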
Thank you for your feedback. If I need to change something about this question, let me know. I can also prepare a reprex, but I don't need any actual coding help; it is more about confirming that what I am doing isn't completely wrong.
I tried both Options and I am not sure which one is the right one to use in this case.