sas - Why is the sample size for the pms.final_data3 different than pms.bhs after creating dietary tertiles? - Stack Overflow

I am trying to create dietary tertiles based on the numeric dietary variables in the dataset. The original dataset I created, pms.bhs, has 101 observations. However, when I run the code below to create the dietary tertile variables, the sample size drops to 67. The dataset should retain all 101 participants. Please let me know if you need any additional information to help me debug why the sample size is reduced.

data pms.final_data3;
  set pms.final_data3;
  if Niacin_mg = . then Niacin_Tertile = .;
  else if Niacin_mg <= 16.67 then Niacin_Tertile = 1; /* Lowest tertile */
  else if Niacin_mg > 16.67 and Niacin_mg_1 <= 27.1 then Niacin_Tertile = 2; /* Middle tertile */
  else Niacin_Tertile = 3; /* Highest tertile */
  if Riboflavin_mg = . then Riboflavin_Tertile = .;
  else if Riboflavin_mg <= 1.38 then Riboflavin_Tertile = 1;
  else if Riboflavin_mg > 1.38 and Ribofalvin_mg <= 2.34 then Riboflavin_Tertile = 2;
  else Riboflavin_Tertile = 3;
  if VitaminB6_mg = . then VitaminB6_Tertile = .;
  else if VitaminB6_mg <= 1.48 then VitaminB6_Tertile = 1;
  else if VitaminB6_mg_1 > 1.48 and VitaminB6_mg <= 2.41 then VitaminB6_Tertile = 2;
  else VitaminB6_Tertile =3;
  if Totalfolate_mg = . then Totalfolate_Tertile = .;
  else if Totalfolate_mg <= 0.3009 then Totalfolate_Tertile = 1;
  else if Totalfolate_mg > 0.3009 and Totalfolate_mg <= 0.44942 then Totalfolate_Tertile = 2;
  else Totalfolate_Tertile =3;
  if VitaminB12_mg = . then VitaminB12_Tertile = .;
  else if VitaminB12_mg <= .00281 then VitaminB12_Tertile = 1;
  else if VitaminB12_mg > .00281 and VitaminB12_mg <= .00492 then VitaminB12_Tertile = 2;
  else VitaminB12_Tertile =3;
  if Potassium_mg = . then Potassium_Tertile = .;
  else if Potassium_mg <= 1696.36 then Potassium_Tertile = 1;
  else if Potassium_mg > 1696.36 and Potassium_mg <= 2783.33 then Potassium_Tertile = 2;
  else Potassium_Tertile = 3;
  if Magnesium_mg = . then Magnesium_Tertile = .;
  else if Magnesium_mg <= 244.69 then Magnesium_Tertile = 1;
  else if Magnesium_mg > 244.69 and Magnesium_mg <= 366.14 then Magnesium_Tertile = 2;
  else Magnesium_Tertile = 3;
  if Manganese_mg = . then Manganese_Tertile = .;
  else if Manganese_mg <= 2.33 then Manganese_Tertile = 1;
  else if Manganese_mg > 2.33 and Manganese_mg <= 3.59 then Manganese_Tertile = 2;
  else Manganese_Tertile = 3;
  if VitaminD_mg = . then VitaminD_Tertile = .;
  else if VitaminD_mg <= .00252 then VitaminD_Tertile = 1;
  else if VitaminD_mg > .00252 and VitaminD_mg <= .00497 then VitaminD_Tertile = 2;
  else VitaminD_Tertile =3;
  if Calcium_mg = . then Calcium_Tertile = .;
  else if Calcium_mg <= 852.91 then Calcium_Tertile = 1;
  else if Calcium_mg > 852.91 and Calcium_mg <= 1218.04 then Calcium_Tertile = 2;
  else Calcium_Tertile =3;
  if Zinc_mg = . then Zinc_Tertile =.;
  else if Zinc_mg <= 6.91 then Zinc_Tertile = 1;
  else if Zinc_mg > 6.91 and Zinc_mg <= 10.93 then Zinc_Tertile = 2;
  else Zinc_Tertile =3;
  if Caffeine_mg = . then Caffeine_Tertile = .;
  else if Caffeine_mg <= 27.27 then Caffeine_Tertile = 1;
  else if Caffeine_mg > 27.27 and Caffeine_mg <= 131.59 then Caffeine_Tertile = 2;
  else Caffeine_Tertile =3;
run;

asked Mar 20 at 17:33 by John Mathews
  • We would need to see the LOG. In particular the notes about how many observations were read in and written out by the data step. From your picture of the PROC CONTENTS output there are 69 observations, not 101 or 67. Note it is very dangerous to overwrite your inputs since a coding mistake could lose the original dataset. – Tom Commented Mar 20 at 18:16
  • The line "else if VitaminB6_mg_1 > 1.48 and VitaminB6_mg <= 2.41 then VitaminB6_Tertile = 2;" probably doesn't affect the number of rows in the result, but it is inconsistent with the rest of your logic. I would highly recommend PROC RANK instead. – Reeza Commented Mar 24 at 17:04

2 Answers

Not sure what your issue is, but you might want to make your logic a little more bulletproof.

Use the MISSING() function in case you have any special missing values, such as .A, which would fail the equality test against . and so fall into the first group. Also reference the same variable in every test: sometimes you reference Niacin_mg and other times Niacin_mg_1.
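To see the difference, here is a minimal sketch on made-up data. The dataset names work.raw and work.missing_demo and the variables by_equality and by_function are hypothetical, chosen only for illustration. SAS sorts every missing value, special or not, below any number, so .A is not equal to . but does satisfy <= 16.67.

```sas
/* Hypothetical toy data: an ordinary missing, a special missing, a real value */
data work.raw;
  Niacin_mg = .;    output;
  Niacin_mg = .A;   output;
  Niacin_mg = 20.5; output;
run;

data work.missing_demo;
  set work.raw;

  /* Equality test: only the ordinary missing value (.) matches */
  if Niacin_mg = . then by_equality = .;
  else if Niacin_mg <= 16.67 then by_equality = 1;
  else if Niacin_mg <= 27.1 then by_equality = 2;
  else by_equality = 3;

  /* MISSING(): matches ., ._ and .A through .Z */
  if missing(Niacin_mg) then by_function = .;
  else if Niacin_mg <= 16.67 then by_function = 1;
  else if Niacin_mg <= 27.1 then by_function = 2;
  else by_function = 3;
run;

proc print data=work.missing_demo;
run;
```

In the printed output, the .A row should land in group 1 under the equality version and stay missing under the MISSING() version.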

And take advantage of the IF/THEN/ELSE IF structure of your code to simplify the middle rules.

  if missing(Niacin_mg) then Niacin_Tertile = .;
  else if Niacin_mg <= 16.67 then Niacin_Tertile = 1; 
  else if Niacin_mg <= 27.1 then Niacin_Tertile = 2;
  else Niacin_Tertile = 3; 

As already mentioned, you will need to provide more information from the SAS log to figure out what is causing the change in the number of observations.
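One way to gather that information is sketched below, assuming pms.bhs and pms.final_data3 both still exist. The step in the question reads pms.final_data3, not pms.bhs, and a DATA step with a single SET statement and no subsetting IF, WHERE, DELETE, or explicit OUTPUT writes one observation for each one it reads. So it is worth comparing the row counts of the two datasets first. Writing the result to a new dataset (work.final_tertiles is a hypothetical name) also avoids overwriting the input, so the log notes on observations read and written can be checked safely.

```sas
/* Compare the row counts of the original and the intermediate dataset */
proc sql;
  select count(*) as n_bhs         from pms.bhs;
  select count(*) as n_final_data3 from pms.final_data3;
quit;

/* Write to a new dataset (hypothetical name) instead of overwriting the input */
data work.final_tertiles;
  set pms.final_data3;
  if missing(Niacin_mg) then Niacin_Tertile = .;
  else if Niacin_mg <= 16.67 then Niacin_Tertile = 1;
  else if Niacin_mg <= 27.1 then Niacin_Tertile = 2;
  else Niacin_Tertile = 3;
  /* ... the remaining nutrients follow the same pattern ... */
run;
```

If the two counts already differ before this step runs, the observations were dropped earlier, wherever pms.final_data3 was built from pms.bhs.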

You might consider PROC RANK with the GROUPS= option:

/* GROUPS=3 splits each variable into three rank groups, numbered 0, 1, 2 */
proc rank data=sashelp.class out=Tertiles groups=3;
   var age height weight;
   ranks T3_age T3_height T3_weight;
run;
proc print data=Tertiles;
run;
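Applied to the dietary data, the same idea might look like the sketch below. The output names work.diet_ranked and work.diet_tertiles and the T3_ variable names are hypothetical, and only three of the nutrients are shown. Because GROUPS=3 numbers the groups 0, 1, 2, a second step shifts them to 1, 2, 3 to match the coding in the question. PROC RANK leaves missing inputs missing, and the cutoffs come from the data, so they may differ from the hard-coded thresholds.

```sas
/* Data-driven tertiles for a few of the nutrients (hypothetical output names) */
proc rank data=pms.final_data3 out=work.diet_ranked groups=3;
  var Niacin_mg Riboflavin_mg VitaminB6_mg;
  ranks T3_Niacin T3_Riboflavin T3_VitaminB6;
run;

/* Shift the 0/1/2 groups to 1/2/3; missing ranks stay missing */
data work.diet_tertiles;
  set work.diet_ranked;
  array t{*} T3_Niacin T3_Riboflavin T3_VitaminB6;
  do i = 1 to dim(t);
    if not missing(t{i}) then t{i} = t{i} + 1;
  end;
  drop i;
run;

/* Check group sizes, including any missing values */
proc freq data=work.diet_tertiles;
  tables T3_Niacin T3_Riboflavin T3_VitaminB6 / missing;
run;
```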
