These plots suggest that when a dataset’s Rg distribution covers several orders of magnitude, or has non-negligible representation in both the Rg>1 and Rg<1 regions (such as with OpenOrca and other datasets with R̅g>1), the distribution can become highly skewed. As a result, the arithmetic mean may be disproportionately influenced by larger values, potentially misrepresenting the distribution’s central tendency. In such cases, computing the mean in log-space (then optionally transforming it back to the original scale) can provide a more meaningful summary statistic. In other words, it can make sense to use the geometric mean:
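R̅g (geometric) = exp( (1/N) ∑ᵢ ln Rgᵢ ) = ( ∏ᵢ Rgᵢ )^(1/N)

where i indexes the N examples in the dataset.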
The RACE Reading Comprehension Dataset
Based on the above R̅g table, I decided the RACE ReAding Comprehension Dataset from Examinations (R̅g=0.01) would be a good candidate for investigation. Multiple choice QA seemed like a good test-bed for exploring the effects of prompt-masking, since the prompt is naturally very long relative to the completion. Regardless of prompt length, the completion is always 1 character long, namely A, B, C or D (if you ignore special tokens, delimiters, etc.). My hunch was that if there are any effects from modulating prompt token weights, they would surely be noticeable here.
As stated in the dataset card:

RACE is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China, which are designed for middle school and high school students. The dataset can be served as the training and test sets for machine comprehension.

The QA schema is simple: the prompt presents a question, possibly some context (the article field), and then lists 4 options. The completion (answer) is always one of: A, B, C, D. The dataset viewer hosted on HuggingFace allows browsing the full set, but here’s a small illustrative example (the content is invented, but it follows the dataset’s schema):
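{
  "article": "Last week the students of Green Hill Middle School planted trees behind the library. ...",
  "question": "Why did the students plant trees?",
  "options": ["To celebrate Earth Day", "To earn money", "To prepare for an exam", "To impress their teacher"],
  "answer": "A"
}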
Before we jump into the full implementation of prompt-loss-weight, and try it out on the RACE data, we need a basic understanding of loss and where it comes from. Simply put, loss is a measure of how well our model (LLM) “fits” (explains, predicts) our data. During fine-tuning (and also pre-training), we “move” the model closer to the data by tweaking the network weights in such a way that decreases the loss. The chain rule (of calculus) gives us a precise algorithm for computing these tweaks, given the loss function and the network architecture.

The most common loss function in LLM fine-tuning is called Cross Entropy Loss (CEL). As a result, most discussions of CEL are framed around the definition of cross-entropy, which comes from information theory. While it’s true that “cross-entropy” is right there in the name, a more intuitive understanding can be achieved when approaching CEL through the lens of maximum likelihood estimation (MLE). I’ll try to explain it from both angles.

We have already established that LLMs are wired for next token prediction. What this means is that the LLM is basically just a mathematical function that takes as input a sequence of tokens, and outputs a conditional probability distribution for the next token over the entire token vocabulary V. In other words, it outputs a vector of probability values of size |V| that sums to 1. (In set notation, |S| denotes the number of elements, or cardinality, of a set S.)
Let’s take a small toy example to illustrate how this works. Imagine that our training data contains the 4-token sequence “The bird flew away”. Given the first 3 tokens (“The bird flew”), an LLM might output the following vector of probabilities for every possible 4ᵗʰ token (the numbers here are made up for illustration). For the sake of simplicity, we’ll imagine that the 5 candidate tokens listed below are the only possibilities (i.e. |V|=5). The function p(⋅) represents the conditional probabilities output by the LLM (notice they sum to 1):
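token     p(token)
“off”     0.35
“away”    0.30
“south”   0.20
“home”    0.10
“up”      0.05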
When training (or fine-tuning) an LLM on a token sequence, we step through the sequence token-by-token and compare the next-token-distribution generated by the LLM to the actual next token in the sequence, and from there we calculate the CEL for that token.

Notice here that the actual 4ᵗʰ token in the sequence (“away”) does not have the highest probability in the table. During training, we would like to tweak the weights slightly so as to increase the probability of “away”, while decreasing the others. The key is having the right loss function… it allows us to compute exactly how much to tweak each weight, for each token.

Once the loss is computed for each token, the final loss is computed as the average per-token-loss over all tokens. But first we must establish the formula for this per-token-loss.
Information Theory Interpretation
Continuing the toy problem, to compute CEL for the 4ᵗʰ token position, we compare the actual 4ᵗʰ token to the generated distribution p(⋅) over all 5 possible 4ᵗʰ tokens. In fact, we treat the actual 4ᵗʰ token as a distribution q(⋅) in its own right (albeit a degenerate one) that has a value of 1 for the token appearing in the data (“away”) and a value of 0 for all other possible 4ᵗʰ tokens (this is commonly known as one-hot encoding).

The reason we contort the training data into this strange one-hot encoded probability representation q(⋅) is so that we can apply the formula for cross-entropy, which is a measure of the divergence between two discrete probability distributions (BTW, not symmetric w.r.t. q, p):
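H(q, p) = −∑ₓ q(x) · log p(x)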
where x indexes over all possible states (i.e. 5 tokens). This works out to:
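H(q, p) = −[ 0·log p(“off”) + 1·log p(“away”) + 0·log p(“south”) + 0·log p(“home”) + 0·log p(“up”) ] = −log p(“away”)

(with the illustrative numbers above, −log 0.30 ≈ 1.20)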
So basically CEL is just using the q vector to select from the p vector the single value corresponding to the token that actually appears in the data (“away”) (i.e. multiplying it by 1), and throwing away all other values (i.e. multiplying by 0). So we’re indexing over all possible states (tokens) only to select one and ignore the rest.
MLE Interpretation
When fine-tuning an LLM, we seek the LLM weights θ that maximize the probability of the training data given those weights, often called the likelihood of the weights ℒ(θ) = ℙ(D|θ). And so we require an expression for this quantity. Luckily, there is an easy way to compute this from next token probabilities, which the LLM already gives us.
Starting with the other chain rule (of probability), we decompose the joint probability of a token sequence S into a product of conditional probabilities:
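ℙ(S) = ℙ(t₁, t₂, …, tₙ) = ℙ(t₁) · ℙ(t₂ | t₁) · ℙ(t₃ | t₁, t₂) ⋯ ℙ(tₙ | t₁, …, tₙ₋₁)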
This decomposition establishes the relationship between next-token-prediction and the joint probability of the full token sequence: the joint probability is just the product of all the conditionals.
Using i to index over the tokens of a token sequence S = (t₁, t₂, t₃, …, tᵢ, …), we’ll use the following shorthand to denote the conditional probability output by an LLM for the iᵗʰ token in a sequence, given the LLM weights θ and the previous i-1 tokens:
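pᵢ ≡ ℙ(tᵢ | t₁, t₂, …, tᵢ₋₁; θ)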
It should be emphasized that pᵢ is not a vector here (i.e. a distribution over all possible next tokens) but represents only the probability computed for the actual iᵗʰ token, i.e. the single row for “away” in the toy example above.
If we take the logarithm of the joint probability of a sequence, the product becomes a sum (and since log is monotonic, this doesn’t affect optimization):
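log ℙ(S | θ) = log ( p₁ · p₂ ⋯ pₙ ) = ∑ᵢ log pᵢ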
Now we can connect the final sum-of-logs expression (right here ☝️) to the formula for Average Cross Entropy Loss L over a token sequence:
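L = −(1/N) ∑ᵢ log pᵢ    (N = number of tokens in the sequence)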
which is the causal language model objective function. Often the “Average” is dropped from the name, and it’s just called “Cross Entropy Loss,” but it’s good to remember that CEL is technically computed at the token level, and then averaged across tokens. From this final expression it should hopefully be clear that minimizing the CEL is equivalent to maximizing the probability of the token sequence, which is exactly what MLE seeks.
One convenience resulting from the form of this expression is that it is very easy to modify if we want to compute the loss over any subset of the tokens. Recall that we may sometimes be interested in finding the LLM weights θ that maximize the probability of the completion given the prompt:
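ℒ(θ) = ℙ(completion | prompt, θ)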
We could easily adjust the loss for this scenario by simply averaging only over the completion tokens. If we use 𝕀c to denote the set of all completion token indices, then we can express completion loss as:
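Lc = −(1/|𝕀c|) ∑ᵢ∈𝕀c log pᵢ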
Since the loss for each token is already conditioned on all preceding tokens in the sequence, this means that the prompt is automatically accounted for in the conditional, even if we average over completion tokens only.
Now that we have established CEL as an average of per-token losses over a token sequence, we can define the weighted average version of CEL:
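Lw = −( ∑ᵢ wᵢ · log pᵢ ) / ( ∑ᵢ wᵢ )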
Depending on how we set the weights wᵢ, we can use this formula to define a number of losses. For example, if we set all weights wᵢ=1 then we recover the standard, full sequence CEL from before. However, if we set wᵢ=1 only for completion tokens, and wᵢ=0 for prompt tokens, then we get completion loss. And likewise, prompt loss is defined by setting wᵢ=1 only over prompt tokens, and wᵢ=0 otherwise.
Since we rarely (if ever) want to down-weight the completion tokens, we fix the completion token weights at wᵢ=1, but for the prompt tokens we can define a continuous value on the [0,1] interval called prompt_loss_weight. This way we can tune how much to weight the prompt tokens during training, from wᵢ=0 (completion loss) all the way to wᵢ=1 (standard full sequence loss). Or, we could even use wᵢ=0.1 to give the prompt tokens a small but non-zero weight.
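As a concrete sketch (the helper name and example numbers here are mine, not from any library), the per-token weight vector for a single tokenized prompt+completion example might be built like this:

# hypothetical helper: build per-token loss weights for one example
def build_loss_weights(n_prompt: int, n_completion: int, prompt_loss_weight: float):
    # prompt tokens get the tunable weight; completion tokens stay fixed at 1.0
    return [prompt_loss_weight] * n_prompt + [1.0] * n_completion

# e.g. a RACE-style example: a long prompt followed by a 1-token answer
weights = build_loss_weights(n_prompt=300, n_completion=1, prompt_loss_weight=0.1)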
Loss Implementation
Let’s take a look under the hood at how loss is typically computed in the HuggingFace transformers package. Since we’ll be fine-tuning the Llama-2-7b-chat-hf model in our experiments, we’ll look at LlamaForCausalLM, specifically at the forward pass, where loss is computed during training.
Recall that loss is a way of comparing each actual token to the LLM’s prediction for that token (given the preceding actual tokens), and so the loss function needs access to these two data structures. In this case, loss is fed two tensors: logits and labels. The labels tensor holds the actual tokens (token ids, to be exact). The logits tensor holds the predicted next-token-probabilities, prior to softmax normalization (which forces them to sum to 1; it turns out that it’s more efficient to leave these values in their raw, pre-normalized form).
The logits tensor is 3D, with shape [B, N, |V|], where B is batch size, N is sequence length (in tokens), and |V| is token vocabulary size. The 2D labels tensor just contains the token sequence itself, so it has shape [B, N]. Here is the key section of code where CEL is typically computed:
# Shift-by-1 so that tokens < n predict n
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
# Flatten the tensors
shift_logits = shift_logits.view(-1, self.config.vocab_size)
shift_labels = shift_labels.view(-1)
# Enable model parallelism
shift_labels = shift_labels.to(shift_logits.device)
# Compute loss
loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits, shift_labels)
For each position i along the 2nd dimension of logits, this tensor contains probabilities for predicting the next token (token i+1) given all the preceding tokens up through the iᵗʰ token. These probabilities need to be compared to the actual i+1ˢᵗ token in labels. This is why the shift-by-1 happens in the first several lines: to bring these two values into alignment for each token.
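Putting the pieces together, here is a minimal sketch (my own code, not the transformers source) of how the same computation could be extended to weighted CEL: compute per-token losses with reduction="none", then take the weighted average using the per-token weights from before:

import torch
from torch.nn import CrossEntropyLoss

def weighted_cel(logits, labels, weights):
    # logits: [B, N, |V|]; labels: [B, N]; weights: [B, N] per-token loss weights
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # weights are aligned with labels, so they shift the same way
    shift_weights = weights[..., 1:].contiguous().to(shift_logits.device)
    # per-token losses instead of the default mean reduction
    loss_fct = CrossEntropyLoss(reduction="none")
    token_losses = loss_fct(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
    # weighted average over all tokens in the batch
    w = shift_weights.view(-1)
    return (w * token_losses).sum() / w.sum()

# usage:
# B, N, V = 2, 16, 32000
# loss = weighted_cel(torch.randn(B, N, V), torch.randint(V, (B, N)), torch.ones(B, N))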