Our philosophy at VSNi is to treat extreme residuals as potential outliers. Consequently, these are flagged, but the user needs to investigate further to decide whether they really are outliers. Extreme residuals arise frequently and for many reasons, such as an incomplete or incorrect model, convergence issues, or even typos in the data. It is important to determine whether an offending observation is a data error; however, even when it is not, an extreme value might still make sense given the type of response evaluated. Biological knowledge of the process should also be taken into account. We recommend consulting Welham et al. (2014) for further discussion of how to identify and deal with outliers in the context of linear models.
ASReml-SA will identify potential outliers and warn about them in the .asr file. For example, you will see:
1 possible outliers: see .res file
Looking further into the .res file, you will find a plot of residuals to help you assess the data, and at the end of the file you will find the reported potential outliers:
STND RES 35 184.00 7.35
In the above example, ASReml-SA reports that observation number 35, with a value of 184 and a standardized residual of 7.35, is 'suspicious'.
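To make the idea of flagging extreme residuals concrete, the Python sketch below fits a simple treatment-means model and flags records whose scaled residual exceeds a cutoff. It is an illustration under stated assumptions, not ASReml-SA's algorithm: the data are hypothetical, residuals are simply divided by the estimated residual standard deviation (no leverage or mixed-model adjustment), and the 3.5 cutoff is an assumed value.

```python
import numpy as np

def standardized_residuals(y, X):
    """Fit y ~ X by least squares and divide residuals by the residual SD.

    Simplified scaling for illustration; ASReml-SA's STND RES may be
    computed differently (e.g., using the fitted mixed-model variances).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma = np.sqrt(resid @ resid / (n - p))
    return resid / sigma

def flag_outliers(y, X, threshold=3.5):
    """Return (record number, value, scaled residual) where |residual| > threshold.

    The 3.5 default is an assumed cutoff, not a documented ASReml-SA rule.
    """
    stnd = standardized_residuals(y, X)
    return [(int(i) + 1, float(y[i]), float(stnd[i]))
            for i in np.flatnonzero(np.abs(stnd) > threshold)]

# Hypothetical data: 3 treatments x 12 records, record 35 mistyped as 184.
treatment = np.repeat([0, 1, 2], 12)
deviation = np.tile([-2, -1, 0, 1, 2, 0], 6)
y = np.array([100.0, 110.0, 120.0])[treatment] + deviation
y[34] = 184.0                 # record 35 (1-based); the "true" value was 122
X = np.eye(3)[treatment]      # one column per treatment mean

for record, value, stnd in flag_outliers(y, X):
    print(f"STND RES {record} {value:.2f} {stnd:.2f}")
```

With these made-up numbers only record 35 is flagged. Note that the extreme value also inflates the residual standard deviation, which is one reason a single cutoff should be treated as a prompt for investigation rather than a verdict.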
Once an observation has been identified as an outlier, you can deal with it in several ways, either outside or within ASReml-SA:
- Eliminating the observation from your data file. Here you work outside ASReml-SA: open your file in another program on your own system (e.g., Excel) and remove the corresponding row of data. Rather than deleting the whole row, however, we recommend replacing the suspicious value (e.g., 184.00) with a missing value, which is typically '*'; if you have defined missing values differently (e.g., 'NA' or '.'), use that code instead. This keeps the other values in that row, and the record numbering of your file, intact. After this is done, save your file and re-run your analysis in ASReml-SA.
- Using ASReml-SA to manipulate data. As you read your dataset into ASReml-SA, you have the option to perform some manipulations on your data (refer to the user guide for more information). When dealing with outliers, a good option is to create a new response variable in which the suspicious value is set to missing. For example:
newyield !=yield !M184
The above code creates a new variable called newyield, which is a copy of yield, and then, through '!M', makes ALL values equal to 184 missing. This works well if only one observation of your response has that value; otherwise, you will eliminate more observations than intended.
- Removing an observation within your model formula. You can 'drop' an observation by using the function out(). This fits a binary dummy effect (1 for that record, 0 for all others) that separates the effect of that observation from the rest of the model. For example:
yield ~ mu out(35) treatment !r block
Here, we 'drop' observation number 35 (as reported in the .res file) by using out(35). Note that this new term will be added to your ANOVA table, allowing you to test its significance as a fixed effect with 1 degree of freedom. We recommend placing this term before the model factors of interest so that it does not distort their sequential F-tests. You can add further out() terms to your model, one per potential outlier, to remove all of them in the same way.
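The difference between setting values to missing by value (what '!M' does) and setting a single record to missing (what editing one row of the data file does) can be illustrated with a short Python sketch. The function names and data below are hypothetical, and the code only mimics the effect on a column of values; it is not ASReml-SA syntax.

```python
MISSING = "*"  # the usual missing-value code; use 'NA' or '.' if you defined it that way

def missing_by_value(values, target):
    """Mimic the effect of !M: EVERY value equal to target becomes missing."""
    return [MISSING if v == target else v for v in values]

def missing_by_record(values, record):
    """Make only the given 1-based record missing, leaving the input list unchanged."""
    out = list(values)
    out[record - 1] = MISSING
    return out

yields = [152.0, 184.0, 167.5, 184.0, 171.0]  # hypothetical: 184 occurs twice

print(missing_by_value(yields, 184.0))   # [152.0, '*', 167.5, '*', 171.0]
print(missing_by_record(yields, 4))      # [152.0, 184.0, 167.5, '*', 171.0]
```

If only record 4 is the suspicious one, matching by value also discards the legitimate 184 in record 2, which is exactly the risk described for '!M' above.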
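Why a dummy term such as out(35) "drops" an observation can be checked numerically. In a least-squares fit, adding a covariate that is 1 for one record and 0 elsewhere gives the other effects exactly the estimates obtained by deleting that record, and the dummy's coefficient equals the record's value minus its prediction from the remaining data. The Python sketch below shows this under simplifying assumptions: ordinary least squares with a treatment-means parameterization, no random block term as in the ASReml-SA model above, and hypothetical data.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

treatment = np.repeat([0, 1, 2], 4)          # hypothetical: 3 treatments x 4 records
y = np.array([10.1, 9.8, 10.4, 10.0,
              12.2, 11.9, 12.5, 12.1,
              14.0, 13.7, 19.9, 14.2])       # record 11 looks suspicious
X = np.eye(3)[treatment]                     # one column per treatment mean

record = 11
dummy = np.zeros(len(y))
dummy[record - 1] = 1.0                      # analogue of out(11): 1 for record 11 only
beta_out = ols(np.column_stack([X, dummy]), y)

keep = np.arange(len(y)) != record - 1       # alternative: delete the record
beta_drop = ols(X[keep], y[keep])

print(beta_out[:3], beta_drop)               # identical treatment estimates
print(beta_out[3])                           # value of record 11 minus its prediction
```

The dummy coefficient is what the 1-degree-of-freedom test of the out() term assesses: a large value relative to its standard error indicates that the record sits far from what the rest of the data predict.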