Chosen data set:
70% Assignment 2 2023/24
You will use R to mine actual data for a problem of interest. You will be provided with a list of data sets from which you will choose. You will design the data mining task, mine the data, and describe your results. You will also research existing solutions to the problem, if any have been proposed or documented. Your own data and results need not be on a par with actual industry results; the goal is for you to get as realistic a hands-on experience as possible, given the constraints of what you have learned.
In writing up and presenting your research, think of yourself as an analyst employed or retained by a company (large or small) or by a funding source (e.g., a venture capital (VC) firm or incubator) that wants to understand the state of the art in using data mining for the task in question. Review what has been done to date on your problem. Consider as an example predictive analytics for on-line advertising: a VC firm considering funding on-line ad networks or ad-tech start-ups would need to understand the state of the art in using data mining for targeting on-line advertising when considering an idea for applying data mining. Don’t worry too much about coming up with a novel idea. It is more important to develop the idea well (within the scope of what we’ve discussed in class).
You must use the CRISP-DM data mining process to structure your research and report. Keep in mind that it may be ineffective simply to proceed linearly through the steps, and this may need to be reflected in your analysis. You should interact with me from the preparation of your initial ideas through to writing your report, as a consultant would interact with a firm or funding source in preparing a research report. Use your imagination or prior experience, or ask for help, to fill in any gaps between the material available and what you would be able to find out if you actually could interact with the client firm.
Submission: On Wednesday 1 May 2024 you will submit your final report, which should be about 1500 words, plus any appendices you would like to include. Use external sources where appropriate and provide clear citations and a bibliography. You must also submit your data file and a working R script which I can run against it; failure to submit these will immediately reduce your potential mark by at least 25%.
You will get the most out of the project if you interact with me during the development of your ideas. Please feel free to talk to me about your ideas as often as you’d like, either in workshops in the second half of term or in my online feedback and guidance hours. Or email me with specific questions or problems you are having; please include your complete R script file and data file (or a link to it) so that I can answer your question.
While we often learn coding by copying and editing code written by others, there is a limit to how much copying you may do for this assignment. You may copy and edit code from any of the workshop notes. You may also copy code snippets (a line or two at a time) from elsewhere so long as you have to edit them in some way to refer to your dataset. You should not copy code from a source which is working with the same dataset that you are using; I will regard this as plagiarism, and you will then have to take the consequences. You should also note that the use of AI tools in the production of either the report or the code is an assessment offence.
Your report should include the information detailed below, in approximately the order given. Be as precise and specific as you can.
Business Understanding (take this
The Business Understanding phase focuses on understanding the objectives and requirements of the project. Aside from the third task, the tasks in this phase are foundational project management activities that are universal to most projects:
Determine business objectives: You should first “thoroughly understand, from a business perspective, what the customer really wants to accomplish” (CRISP-DM Guide) and then define business success criteria.
Assess situation: Determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis.
Determine data mining goals: In addition to defining the business objectives, you should also define what success looks like from a technical data mining perspective.
Data Understanding
Adding to the foundation of Business Understanding, this phase drives the focus to identify, collect, and analyse the data sets that can help you accomplish the project goals. This phase has four tasks:
Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.
Describe data: Examine the data and document its surface properties like data format, number of records, or field identities.
Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among the data.
Verify data quality: How clean/dirty is the data? Document any quality issues.
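As a rough illustration of the describe, explore, and verify tasks above, the following minimal sketch uses R's built-in airquality data frame. That data frame is only a stand-in chosen for this example (an assumption, not one of the assignment data sets), and the variable names are illustrative.

```r
# Illustrative only: the built-in airquality data stands in for your chosen data set.
df <- airquality

# Describe data: data format, number of records, and field identities.
str(df)
n_records <- nrow(df)
field_names <- names(df)

# Explore data: summary statistics and one simple relationship between two fields.
summary(df)
ozone_temp_cor <- cor(df$Ozone, df$Temp, use = "complete.obs")

# Verify data quality: count the missing values in each column.
missing_per_column <- colSums(is.na(df))
print(missing_per_column)
```

Even where the counts of missing values turn out to be zero, recording the check documents that data quality was considered.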
Data Preparation
This phase prepares the final data set(s) for modelling. It has five tasks:
Select data: Determine which data sets will be used and document reasons for inclusion/exclusion.
Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in, garbage-out. A common practice during this task is to correct, impute, or remove erroneous values.
Construct data: Derive new attributes that will be helpful. For example, derive someone’s body mass index from height and weight fields.
Integrate data: Create new data sets by combining data from multiple sources.
Format data: Re-format data as necessary. For example, you might convert string values that store numbers to numeric values so that you can perform mathematical operations.
It is possible that you will need to do little here, as I have selected data sets which are quite clean. However, you do need to show that you are aware of what you should look for, even if no action is then required.
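The format, clean, and construct tasks described above might look something like the sketch below. The toy data frame, its column names, and its values are all made up for illustration, and median imputation is just one of several possible cleaning choices.

```r
# Hypothetical toy data: column names and values are made up for illustration.
people <- data.frame(
  height_cm = c("170", "182", "165", "158"),  # numbers stored as strings
  weight_kg = c(65, NA, 72, 54),              # one missing value
  stringsAsFactors = FALSE
)

# Format data: convert string values that store numbers to numeric values.
people$height_cm <- as.numeric(people$height_cm)

# Clean data: impute the missing weight with the median of the observed weights.
observed_median <- median(people$weight_kg, na.rm = TRUE)
people$weight_kg[is.na(people$weight_kg)] <- observed_median

# Construct data: derive body mass index (kg per square metre) from height and weight.
people$bmi <- people$weight_kg / (people$height_cm / 100)^2

print(people)
```

Whichever cleaning choice you make (correct, impute, or remove), state it and the reason for it in the report.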
Modelling
Here you’ll likely build and assess various models based on several different modelling techniques. This phase has four tasks:
Select modelling techniques: Determine which algorithms to try (e.g. linear or logistic regression).
Generate test design: Depending on your modelling approach, you might need to split the data into training, test, and validation sets.
Build model: As glamorous as this might sound, this might just be executing a few lines of code.
Assess model: Generally, multiple models are competing against each other, and the data scientist needs to interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design. Use appropriate assessment criteria which apply across your models so that you can compare them.
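To make the test design, build, and assess tasks above concrete, here is a minimal sketch that compares two linear regression models using one shared criterion. It assumes the built-in mtcars data as a stand-in data set, a 70/30 train/test split, and root mean squared error (RMSE) as the assessment criterion; all three are illustrative choices, and the rmse helper is defined here rather than taken from a package.

```r
# Illustrative only: mtcars stands in for your chosen data set.
set.seed(42)  # fix the random split so the script gives the same result on every run

# Generate test design: hold out roughly 30% of the rows for testing.
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Build model: two competing linear regressions predicting fuel economy (mpg).
model_a <- lm(mpg ~ wt, data = train)
model_b <- lm(mpg ~ wt + hp, data = train)

# Assess model: one criterion (RMSE on the held-out test set) applied to both models.
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

results <- c(model_a = rmse(model_a, test), model_b = rmse(model_b, test))
print(results)
```

Using the same held-out rows and the same criterion for every model is what makes the comparison meaningful; a classification task would need a different criterion, such as accuracy.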
Evaluation
Whereas the Assess Model task of the Modelling phase focuses on technical model assessment, the Evaluation phase looks more broadly at which model best meets the business needs and what to do next. This phase has three tasks:
Evaluate results: Do the models meet the business success criteria? Which one(s) should we approve for the business?
Review process: Review the work accomplished. Was anything overlooked? Were all steps properly executed? Summarize findings and correct anything if needed.
Determine next steps: Based on the previous tasks, determine whether to proceed to deployment, iterate further, or initiate new projects.
Deployment
A model is not particularly useful unless the customer can access its results. The complexity of this phase varies widely. This final phase has three tasks:
Plan deployment: Develop and document a plan for deploying the model.
Plan monitoring and maintenance: Determine what will need to be done to avoid issues during the operational phase (or post-project phase) of a model.
Review project: Conduct a project retrospective about what went well, what could have been better, and how to improve in the future.
The submitted and assessed part of this coursework is a report together with R code and data files, rather than an academic essay. Thus, the marking criteria are different from those usually required for an academic essay. Your assignment will be assessed on the criteria shown in the rubric below.
Business Understanding (10%)
Outstanding definition of the business problem with precise and detailed statement of how a data mining solution will address it
Excellent definition of the business problem with precise and detailed statement of how a data mining solution will address it
Very good definition of the business problem with precise or detailed statement of how a data mining solution will address it
Good definition of the business problem with statement of how a data mining solution will address it
Some attempt at definition of the business problem with imprecise statement of how a data mining solution will address it
Weak definition of the business problem with little consideration of how a data mining solution will address it
Poor definition of the business problem with little consideration of how a data mining solution will address it
Little definition of the business problem with little consideration of how a data mining solution will address it
No definition of the business problem
Data Understanding (10%)
Outstanding identification and description of data and data sources
Excellent identification and description of data and data sources
Very good identification and description of data and data sources
Good identification and description of data and data sources
Some attempt at identification and description of data and data sources
Weak identification and description of data and data sources
Poor identification and description of data and data sources
Little identification and description of data and data sources
No identification or description of data and data sources
Data Preparation (10%)
Outstanding specification of how data are prepared for data mining
Excellent specification of how data are prepared for data mining
Very good specification of how data are prepared for data mining
Good specification of how data are prepared for data mining
Some attempt at specification of how data are prepared for data mining
Weak specification of how data are prepared for data mining
Poor specification of how data are prepared for data mining
Little specification of how data are prepared for data mining
No specification of how data are prepared for data mining
Outstanding choice of one or more modelling/prediction techniques
Excellent choice of one or more modelling/prediction techniques
Very good choice of one or more modelling/prediction techniques
Good choice of one or more modelling/prediction techniques
Good choice of single modelling technique
Weak choice of single modelling technique
Poor choice of single modelling technique
Little choice of single modelling technique
No modelling technique described
Outstanding discussion of alternatives and applicability of model(s)
Excellent discussion of alternatives and applicability of model(s)
Very good discussion of alternatives and applicability of model(s)
Good discussion of alternatives and applicability of model(s)
Some attempt at discussion of alternatives and applicability of model(s)
Weak discussion of alternatives and applicability of model(s)
Poor discussion of alternatives and applicability of model(s)
Little discussion of alternatives and applicability of model(s)
No consideration of alternatives and applicability of model(s)
Outstanding discussion of aspects of evaluation
Excellent discussion of aspects of evaluation
Very good discussion of aspects of evaluation
Good discussion of aspects of evaluation
Some attempt at discussion of aspects of evaluation
Weak discussion of aspects of evaluation
Poor discussion of aspects of evaluation
Little discussion of aspects of evaluation
No discussion of aspects of evaluation
Outstanding discussion of deployment issues
Excellent discussion of deployment issues
Very good discussion of deployment issues
Good discussion of deployment issues
Some attempt at discussion of deployment issues
Weak discussion of deployment issues
Poor discussion of deployment issues
Little discussion of deployment issues
No discussion of deployment issues
R Code
Submitted code and data contain functionality which goes beyond what has been taught
Submitted code and data function efficiently and without changes
Submitted code and data function without changes
Submitted code and data function without significant changes
Submitted code and/or data require some changes before they will function
Submitted code and/or data require substantial changes before they will function
Little code submitted and/or code and data require substantial changes before they will function
Little code submitted and code and data require substantial changes before they will function
No code submitted
Code contains clearly worded comments throughout
Code contains clearly worded comments
Code contains comments throughout
Code contains comments
Code contains some comments
Code contains few comments
Code contains very few comments
Comments are unhelpful or misleading
Code contains no comments
References (5%)
Outstanding use of citations within the text with all references accurately cited
Excellent use of citations within text with nearly all references accurately cited
Very good use of citations within text with most references properly cited
Good use of citations within text with most references properly cited
use of citations within text but not necessarily cited properly
Weak use of citations within text and not necessarily cited properly
citations within text and not accurately cited
Very few if any citations within text and not accurately cited
No evidence of understanding of referencing systems
Writing Style (5%)
Outstanding fluency in writing style. Attention to spelling, punctuation and/or grammar is outstanding
Excellent fluency in writing style. Attention to spelling, punctuation and/or grammar is excellent
Very good fluency in writing style. Attention to spelling, punctuation and/or grammar is very good
Good fluency in writing style. Attention to spelling, punctuation and/or grammar is good
Satisfactory fluency in writing style. Attention to spelling, punctuation and/or grammar is satisfactory
Weak fluency in writing style. Attention to spelling, punctuation and/or grammar is weak
Poor fluency in writing style. Attention to spelling, punctuation and/or grammar is poor
Little to no fluency in writing style. Attention to spelling, punctuation and/or grammar is absent
No fluency in writing style. No attention to spelling, punctuation and/or grammar
Overall Presentation
Outstanding competence in organisation and presentation
Excellent competence in organisation and presentation
Very good competence in organisation and presentation
Good competence in organisation and presentation
Satisfactory competence in organisation and presentation
Weak competence in organisation and presentation
Poor competence in organisation and presentation
Little to no competence in organisation and presentation
No competence in organisation and presentation
Example of work attached below
