Have you ever pooled many health surveys that have a complex sampling design and include variables such as primary sampling unit (PSU), stratum, and sampling weights, only to find that some of these surveys lacked one or more of those variables (e.g., PSU and sampling weights only, no stratum)? Here’s an efficient solution: pack the datasets in a list and apply functions that deliver the results you’re looking for.
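A minimal sketch of the list-and-apply idea follows. It is written in plain Python (standard library only) rather than being the author’s own code, and every name in it (`design_mean`, the `surveys` list, the `sbp` variable and the column names) is hypothetical. Each survey carries its own description of which design variables it has, and one function estimates a weighted mean and its Taylor-linearized standard error using whatever combination of weights, PSU, and stratum that survey provides. The with-replacement variance formula is the usual approximation design-based software applies when no finite population correction is given, but this is a simplified illustration, not a substitute for a dedicated survey package.

```python
from math import sqrt


def design_mean(rows, y, weight=None, psu=None, strata=None):
    """Weighted mean of `y` and its linearized standard error for ONE survey.

    Design variables passed as None are treated as absent:
      - no weight -> every row has weight 1
      - no psu    -> every row is its own primary sampling unit
      - no strata -> the whole survey is a single stratum
    """
    w = [float(r[weight]) if weight else 1.0 for r in rows]
    ys = [float(r[y]) for r in rows]
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, ys)) / total_w

    # Linearized (influence) values of the ratio estimator sum(w*y) / sum(w).
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, ys)]

    # Add up the linearized values within each PSU, nested inside its stratum.
    totals = {}
    for i, r in enumerate(rows):
        h = r[strata] if strata else "all"
        c = r[psu] if psu else i
        stratum = totals.setdefault(h, {})
        stratum[c] = stratum.get(c, 0.0) + z[i]

    # With-replacement variance: spread of the PSU totals within each stratum.
    var = 0.0
    for psu_totals in totals.values():
        n_h = len(psu_totals)
        if n_h < 2:
            continue  # simplification: a single-PSU stratum adds nothing here
        zbar = sum(psu_totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - zbar) ** 2 for t in psu_totals.values())
    return mean, sqrt(var)


# Hypothetical pooled surveys packed in one list; each entry records which
# design variables that particular survey actually has.
surveys = [
    {"name": "A", "design": {"weight": "wt", "psu": "psu", "strata": "stratum"},
     "rows": [
         {"sbp": 120, "wt": 1.5, "psu": 1, "stratum": 1},
         {"sbp": 130, "wt": 1.5, "psu": 2, "stratum": 1},
         {"sbp": 125, "wt": 2.0, "psu": 3, "stratum": 2},
         {"sbp": 140, "wt": 2.0, "psu": 4, "stratum": 2},
     ]},
    # PSU and weights only, no stratum.
    {"name": "B", "design": {"weight": "wt", "psu": "psu"},
     "rows": [
         {"sbp": 118, "wt": 1.0, "psu": 1},
         {"sbp": 122, "wt": 1.0, "psu": 1},
         {"sbp": 135, "wt": 3.0, "psu": 2},
         {"sbp": 128, "wt": 3.0, "psu": 3},
     ]},
]

# Apply the same function to every survey, using whatever design it has.
results = {s["name"]: design_mean(s["rows"], "sbp", **s["design"]) for s in surveys}
for name, (mean, se) in results.items():
    print(f"survey {name}: mean = {mean:.2f}, SE = {se:.2f}")
```

Because each survey declares only the design variables it actually has, survey B (PSU and weights, no stratum) stays in the analysis instead of being dropped.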
In a pooled dataset of many health surveys, some of which may not have all three variables (PSU, stratum, and sampling weights), a complete-case analysis would keep only the datasets with data for all three variables. This could lead to a substantial reduction in sample size, because we would need to exclude the datasets that have, for example, PSU and sampling weights but no data for the stratum variable.
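To make the sample-size cost concrete, here is a small Python sketch with an entirely hypothetical inventory of four surveys (the names and sample sizes are made up for illustration). The complete-case rule, implemented by the hypothetical `complete_case` helper, keeps only the surveys that have every design variable.

```python
# Hypothetical inventory of pooled surveys: which design variables each one
# has, and how many participants it contributes.
surveys = [
    {"name": "A", "n": 5000, "has": {"psu", "stratum", "weight"}},
    {"name": "B", "n": 3200, "has": {"psu", "weight"}},  # no stratum
    {"name": "C", "n": 4100, "has": {"psu", "stratum", "weight"}},
    {"name": "D", "n": 2700, "has": {"weight"}},         # weights only
]

REQUIRED = {"psu", "stratum", "weight"}


def complete_case(surveys, required=REQUIRED):
    """Keep only the surveys that have every required design variable."""
    return [s for s in surveys if required <= s["has"]]


kept = complete_case(surveys)
total_n = sum(s["n"] for s in surveys)
kept_n = sum(s["n"] for s in kept)
print(f"kept {len(kept)} of {len(surveys)} surveys, "
      f"{kept_n} of {total_n} participants ({kept_n / total_n:.0%})")
```

With these made-up numbers, two of the four surveys and about 39% of participants are lost only because a single design variable is missing.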
My solution? Analyze each dataset independently, leveraging whatever variables it has (e.g., PSU and sampling weights only, no stratum). Unfortunately, analyzing one dataset at a time is inefficient. However, having faced this challenge several times in over seven years of professional experience in research and dozens…