Create an entry on using the IQR to find possible outliers in R

The tutor said in his email: "Here you may use your simulated data, and the dataset mtcars in R. Choose 1 variable for weeks 6 and 7: start with mpg for example. Multiple variable case:" The mtcars dataset is already in R and can be analysed to get standardized residuals, which are compared with -2 and 2:

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> attach(mtcars)
The following objects are masked from mtcars (pos = 3):

    am, carb, cyl, disp, drat, gear, hp, mpg, qsec, vs, wt

> model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> rstandard(model)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive   Hornet Sportabout
        -0.65456638         -0.29166549         -0.99109408         -0.13160093          0.11349120
            Valiant          Duster 360           Merc 240D            Merc 230            Merc 280
        -1.17577132         -0.65302963          0.67692139         -0.55672545         -0.15727964
          Merc 280C          Merc 450SE          Merc 450SL         Merc 450SLC  Cadillac Fleetwood
        -0.85551938          0.39697897          0.08190899         -0.73669070          0.02236813
Lincoln Continental   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
         0.49793369          2.40096631          2.24888836          0.49755516          2.15282871
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28    Pontiac Firebird
        -1.57150088         -1.14579414         -1.47989966         -0.43415180          1.06927128
          Fiat X1-9       Porsche 914-2        Lotus Europa      Ford Pantera L        Ferrari Dino
        -0.14990597          0.34115026          1.12804819         -0.35739634         -0.17986946
      Maserati Bora          Volvo 142E
         0.90881167         -0.60647195
# 3 outliers (beyond -2 or 2): Chrysler Imperial, Fiat 128, Toyota Corolla
> plot(rstandard(model))

I attached my previous proposal.
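The tutor's suggestion can be written as a short R script that fits the same model, flags every car whose standardized residual lies outside [-2, 2], and draws those points in red. This is a sketch: the cutoff of 2 comes from the email, while the object names (`std_res`, `is_outlier`, `flagged`) are illustrative. It passes `data = mtcars` to `lm()` instead of calling `attach()`, which avoids the masking message shown in the console output.

```r
# Fit the multiple regression from the tutor's email
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Standardized residuals: each residual divided by its estimated standard error
std_res <- rstandard(model)

# Flag observations whose standardized residual falls outside [-2, 2]
is_outlier <- abs(std_res) > 2
flagged <- names(std_res)[is_outlier]
print(round(std_res[is_outlier], 2))

# Plot all residuals, drawing flagged points in red,
# with dashed reference lines at -2 and 2 and labels on the flagged cars
plot(std_res,
     col = ifelse(is_outlier, "red", "black"),
     pch = 19,
     xlab = "Observation index",
     ylab = "Standardized residual")
abline(h = c(-2, 2), lty = 2)
text(which(is_outlier), std_res[is_outlier], labels = flagged, pos = 4, cex = 0.7)
```

With the residuals listed above, `flagged` contains the three cars already identified: Chrysler Imperial, Fiat 128 and Toyota Corolla.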


For this assignment, you’re required to maintain an e-Journal for four (4) weeks (between Weeks 5 and 9), recording an aspect of
your project that you worked on in each week. For this purpose, use the Journal component available within
the Mahara ePortfolio suite of software tools.
In each week:
Create a new entry in your Mahara Journal.
Describe an aspect of your project that you worked on during that week (e.g. testing a piece of software, writing your report).
Limit this to a maximum of 200 words. You may include images, videos and any other relevant multimedia if necessary. When
writing this episode:
o Include the geographical location where this episode took place
o Main purpose/objective of this task
o Reflect on what you have learned and comment on any peer feedback you have received.
Use the relevant professional skills document (for engineering projects, the Engineers Australia Stage 1 Competency Standard for Professional Engineer, pp. 6-8; for IT projects, the Skills Framework for the Information Age version 6 (SFIA)) and identify a specific
skill that is relevant to the episode you have described above. For example:
For an engineering project, if you have described some work you did in relation to writing your final report, you may
attribute this to the competency "3.2 Effective oral and written communication in professional and lay domains".
For an IT-related project, if you tested a software component, you may attribute this to the SFIA’s "Development and
implementation, Systems development, Testing".
Use the following rubric* when developing your journal. Additional resources are available in the e-portfolio section of the unit
Canvas site.
At the end of Week 9, submit your completed e-portfolio item (by sharing the ePortfolio link) by the deadline.
[*rubric adapted from © David Hubert at Salt Lake Community College.]
NOTES: Your report should include the following items. Please discuss with your supervisor prior
to completing this report. This report will serve as a guide during the project implementation stage.
If your project involves group activities, this report should clearly outline your own expected
contributions to the project. All reports are marked on an individual merit basis. The supervisor and
the unit convenor have the right to verbally query any aspect of the project that you may claim to
have contributed to and on the content of this report. Plagiarism is a serious offence and will be
dealt with severely.
1. Aims and Objectives of the project (15 marks)
An outlier is an observation that lies an abnormal distance from other values in a random sample. Outliers may be caused by experimental errors: for example, if an experiment is set up incorrectly, the faulty set-up carries through the tests and eventually produces wrong results.
There are two types of outliers. A minor (mild) outlier lies beyond the inner fences of the data set but within the outer fences, while a major (extreme) outlier lies beyond the outer fences. Real experiments rarely go perfectly, and errors can produce outliers; this project will discuss outliers in detail and how they are handled during experiments (Kannan, 2015).
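For reference, the fences are usually defined from the first and third quartiles Q1 and Q3 following Tukey's convention; the multipliers 1.5 and 3 below are the conventional choice and are assumed here rather than taken from this proposal:

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{inner fences:}\quad & Q_1 - 1.5\,\mathrm{IQR}, \qquad Q_3 + 1.5\,\mathrm{IQR} \\
\text{outer fences:}\quad & Q_1 - 3\,\mathrm{IQR}, \qquad Q_3 + 3\,\mathrm{IQR}
\end{aligned}
```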
The aim of this research project is to study outliers and outlier detection in R when analysing experimental data. In R, statistical professionals analyse data and present the outcome of the analysis as graphs, and several researchers have noticed that these graphical outputs may contain outliers.
To establish the source of outliers, data specialists have to plan and carry out measures that make it possible to trace where the outliers come from. The objectives for finding outliers and establishing their source are:
• Previous data records are extracted from computer storage
• Data sets are analysed and presented graphically
• Experiments are carried out with an altered experimental set-up
• Results are analysed and presented as graphs
The results of the experiments are then used to compare outcomes and find disparities.
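As a small illustration of analysing a data set and presenting it graphically, the sketch below draws a boxplot of `mpg` from mtcars (the variable the tutor suggested starting with) and prints the numbers R uses to draw it. It assumes base R graphics only; `boxplot.stats()` returns the whisker ends, hinges and median in `$stats` and any points beyond the whiskers in `$out`.

```r
x <- mtcars$mpg

# Boxplot of fuel economy; any points beyond the whiskers are drawn individually
boxplot(x, ylab = "Miles per gallon (mpg)", main = "mtcars: mpg")

# The numbers behind the plot: lower whisker, lower hinge, median,
# upper hinge, upper whisker, then the points drawn beyond the whiskers
bp <- boxplot.stats(x)
print(bp$stats)
print(bp$out)
```

For `mpg`, `$out` is empty, so the boxplot shows no separate outlier points.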
2. Background and Description (20 marks)
Further expand on the project details, giving a short overview of the project’s background/history,
motivation, partners, including a brief survey of related literature. (max 500 words)
Outlier detection in statistics goes back to the 18th century. Data specialists at that time often deleted outliers so that the remaining data gave "normal" results for presentation in graphs. Deleting outliers was not a complete solution, however, and practice gradually changed: data specialists and statisticians began keeping outliers in the results because they carry useful information about the data and can be used in further statistical analysis of the experiments. Beyond their value to other experiments, outliers also provide experimental intelligence that helps improve experimental set-ups (Green, 2015).
Outliers also provide useful information when checking for errors that occurred during experimentation or recording. An outlier in the results may indicate an error in the experimental set-up or in the analysis of the results. Mistakes can also be introduced while the data are being recorded, producing results that differ from what is expected; if some values are found to contain errors, the data analyst has to go through the results again and make sure every recording is correct. Not every unusual value is an error, however: in shopping data analysis, for example, unusual values can reflect the different tastes of customers.
When customers are surveyed about products on the market, their responses depend on how they find the products. Differing responses represent the tastes of the population, and this information is useful to the seller because it can guide the development of products for different markets. Outliers can also reveal problems: if customers are surveyed on their satisfaction with a product, unusual responses may point to problems they encountered. For example, bad weather can disrupt the transport of goods, and customers who receive their goods late because of such problems will feel less satisfied and give poor reviews. The seller can identify such problems and make changes to run the business more smoothly in future.
3. Methodology/Approach (30 marks)
Discuss your approach to the project, including necessary techniques/technologies you may use. This
is where you would describe what you would expect to do during the project implementation phase.
(max 750 words)
There are several methods that can be used to identify outliers in statistics. None of them is 100% effective at finding outliers, but each can at least provide substantial information about outliers and their causes.
Z score
This method relates each value to the mean and standard deviation of the whole data set. Each value is centred by subtracting the mean and rescaled by dividing by the standard deviation, which removes the effects of location and scale so that values can be compared directly. Once the data have been centred and rescaled, any value whose z-score is above 2 in absolute value is considered an outlier (Komsta, 2015).
What is the z-score of a weight of 13 pounds drawn from a distribution with a mean of 10 pounds and a standard deviation of 2 pounds?
$$z = \frac{x - \mu}{\sigma} = \frac{13 - 10}{2} = 1.5$$
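The z-score calculation can be sketched in R as follows. `z_score` is an illustrative helper rather than a library function (base R's `scale()` produces the same centred and scaled values); the first call reproduces the worked example with mean 10 and standard deviation 2, and the second applies the |z| > 2 rule from the text to `mtcars$mpg`.

```r
# z-score: centre each value on the mean, then divide by the standard deviation
z_score <- function(x, mu = mean(x), sigma = sd(x)) {
  (x - mu) / sigma
}

# Worked example: x = 13 pounds, mean = 10, standard deviation = 2
z_score(13, mu = 10, sigma = 2)   # 1.5

# Applied to mtcars$mpg, flagging |z| > 2 as described in the text
z <- z_score(mtcars$mpg)
names(z) <- rownames(mtcars)
z_flagged <- names(z)[abs(z) > 2]
print(z_flagged)
```

For `mpg` this flags Fiat 128 (32.4 mpg) and Toyota Corolla (33.9 mpg).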
Modified Z score
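This method uses the median and the median absolute deviation (MAD) in place of the mean and standard deviation. The median and MAD of the data set are calculated and each value is compared with them; a value that deviates strongly from the rest (a common cutoff is a modified z-score above 3.5 in absolute value) is flagged as an outlier. Because the median and MAD are barely affected by extreme values, the method is less easily distorted by the outliers it is trying to find. However, the method does not explain why a value is an outlier, that is, whether the cause is an error in the experiment or in recording (Aggarwal, 2016).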
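What is the modified z-score of 16 in a data set with a median of 10 and a MAD of 3?
$$M = \frac{0.6745\,(x - \tilde{x})}{\mathrm{MAD}} = \frac{0.6745\,(16 - 10)}{3} \approx 1.35$$
With a mean of 10 and a standard deviation of 3 instead, (16 - 10)/3 = 2 would be the ordinary z-score.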
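A minimal R sketch of the modified z-score, assuming the common Iglewicz and Hoaglin form M = 0.6745 (x - median) / MAD with the unscaled MAD. R's `mad()` multiplies by 1.4826 by default, so `constant = 1` is passed to get the raw value. The 3.5 cutoff is the conventional one from that literature rather than something set in this proposal, and `modified_z` is an illustrative name.

```r
# Modified z-score: robust alternative using the median and the raw MAD
modified_z <- function(x) {
  med <- median(x)
  mad_raw <- mad(x, constant = 1)   # median(|x - median(x)|), without the 1.4826 scaling
  # If most values are tied, mad_raw can be 0 and the scores become Inf/NaN
  0.6745 * (x - med) / mad_raw
}

# Small toy vector with one obvious outlier
toy <- c(1, 2, 3, 4, 100)
print(modified_z(toy))
which(abs(modified_z(toy)) > 3.5)   # 5

# Applied to mtcars$mpg with the 3.5 cutoff
m <- setNames(modified_z(mtcars$mpg), rownames(mtcars))
m_flagged <- names(m)[abs(m) > 3.5]
print(m_flagged)
print(round(max(abs(m)), 2))   # 2.72 (Toyota Corolla)
```

On `mpg` no car passes the 3.5 cutoff, which shows how much more conservative this rule is than the |z| > 2 rule applied to the same variable.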
IQR method
This method was developed by John Tukey, a pioneer of exploratory data analysis, at a time when calculations and graphs were done by hand. The sorted data are divided into four equal groups, and the first and third quartiles (the 25th and 75th percentiles) are marked on a graph, the box plot. The distance between them is the interquartile range (IQR = Q3 - Q1). Values beyond the inner fences, 1.5 × IQR below Q1 or above Q3, are classified as minor outliers, and values beyond the outer fences, 3 × IQR below Q1 or above Q3, are classified as major outliers.
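Tukey's rule, with inner fences 1.5 × IQR and outer fences 3 × IQR beyond the quartiles, can be applied in R with `quantile()`. The sketch below uses `mtcars$mpg`; the helper name `iqr_fences` is hypothetical. Note that `quantile()` and `IQR()` use type 7 quantiles by default, while `boxplot()` uses Tukey's hinges from `fivenum()`, so a borderline point can be classified differently by the two.

```r
# Tukey fences from the quartiles; k = 1.5 gives the inner fences, k = 3 the outer fences
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # type 7 quantiles (R's default)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- setNames(mtcars$mpg, rownames(mtcars))
inner_fence <- iqr_fences(x, 1.5)
outer_fence <- iqr_fences(x, 3)

# Values beyond the inner fences are candidate outliers; beyond the outer fences, major ones
beyond_inner <- x[x < inner_fence["lower"] | x > inner_fence["upper"]]
beyond_outer <- x[x < outer_fence["lower"] | x > outer_fence["upper"]]
print(inner_fence)
print(beyond_inner)
print(beyond_outer)
```

Here Q1 = 15.425 and Q3 = 22.8, so the upper inner fence is 33.8625 mpg and only Toyota Corolla (33.9 mpg) lies beyond it, with nothing beyond the outer fences. `boxplot()` does not mark it, because its hinges (15.35 and 22.8) put the upper whisker limit at 33.975 mpg.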
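Find the outliers in the following data:
10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4
There are n = 15 values, so the median is the (15 + 1)/2 = 8th value:
Q2 = 14.6
Splitting the data at the median gives two halves:
10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5
14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4
Q1 = 14.4 (median of the lower half)
Q3 = 14.9 (median of the upper half)
IQR = 14.9 - 14.4 = 0.5
Inner fences: 14.4 - 1.5 × 0.5 = 13.65 and 14.9 + 1.5 × 0.5 = 15.65
Outer fences: 14.4 - 3 × 0.5 = 12.9 and 14.9 + 3 × 0.5 = 16.4
10.2 lies beyond the lower outer fence, so it is a major outlier. 15.9 and 16.4 lie beyond the upper inner fence but not beyond the upper outer fence, so they are minor outliers.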
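The hand calculation can be checked in R. The helper `hand_quartiles()` is hypothetical: it takes Q1 and Q3 as the medians of the lower and upper halves, excluding the median when n is odd, which is the method used in the worked example. It is written out explicitly because R's default `quantile()` (type 7) gives Q3 = 14.8 for this data, not 14.9.

```r
# Quartiles as medians of the lower and upper halves (median excluded when n is odd)
hand_quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- n %/% 2
  c(Q1 = median(x[1:half]),
    Q2 = median(x),
    Q3 = median(x[(n - half + 1):n]))
}

vals <- c(10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
          14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4)

q <- hand_quartiles(vals)
iqr <- unname(q["Q3"] - q["Q1"])
inner_fence <- c(q[["Q1"]] - 1.5 * iqr, q[["Q3"]] + 1.5 * iqr)
outer_fence <- c(q[["Q1"]] - 3 * iqr, q[["Q3"]] + 3 * iqr)

# Beyond the outer fences: major; beyond the inner but not the outer fences: minor
beyond_inner <- vals < inner_fence[1] | vals > inner_fence[2]
beyond_outer <- vals < outer_fence[1] | vals > outer_fence[2]
major <- vals[beyond_outer]
minor <- vals[beyond_inner & !beyond_outer]
print(q)
print(major)   # 10.2
print(minor)   # 15.9 16.4
```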
4. Project Plan (30 marks)
Provide a time line of your project, including major milestones, deliverables and the expected
outcomes. Please refer to the Unit outline for important mandatory milestones. You may use the
following chart as a guide. However feel free to use an appropriate planning tool if you so desire in
consultation with your supervisor (e.g. Gantt Chart). The plan should include only your work. Where
your output is dependent on contributions by other team members, make a note of these and also
indicate any contingency plans you may put in place if the team member in question fails to deliver
their part of the project work.
Seek a project supervisor and a project
Submit project proposal
Discuss/draft project
Make the necessary revisions
Practise R / draft report
Submit mid-semester report
- Differences/similarities among the existing methods
- Go through report
Prepare for Final Project Presentation
Revise the whole project and add anything new that is found
Ensure all project parts are relevant as expected
Finalise report / practise project presentation
Submit Final report
Deliver Oral Presentation
Submit e-Portfolio
5. References (5 marks)
List of reference materials you have used in writing this report including, academic articles, website
links, other publications and any other communications. Consult your supervisor regarding the
appropriate format to use in citing your reference materials (E.g. IEEE format [1])
[1] J. IEEE. IEEE Citation Reference [Online]. Available:
Kannan (2015). Outlier detection in multivariate data. Retrieved from
Green, C. (2015). Detecting multivariate financial data. Retrieved from
Guo, J. (2015). A note on conventional outlier detection. Retrieved from
Komsta, L. (2015). Package outliers. Retrieved from
Aggarwal, C. (2016). Outlier analysis. Retrieved from
