The Montana ActiveStatistics Conference

June 28 and 29, 2018
Carroll College, Helena, Montana
O'Connell Hall 106 (Building 2 on the campus map)

Please join us for a conference on statistics, data science, and innovations in the teaching of these subjects. No registration fee will be charged; breakfast and lunch will be provided.

Keynote Address by Prof. Robert Gould, Director of the UCLA Center for Teaching Statistics (Slides)

Conference participants may park in any of the "F" (faculty) or "B" parking lots (see the campus map). Helena has many hotels, but the nearest to campus is the Best Western Premier Helena Great Northern Hotel.

Brief Conference Agenda

Thursday, June 28

8:15 - 9:00 Registration and breakfast

9:00 - 9:30 Jodi Fasteen (Carroll College) Leveraging real-world data in an introductory statistics course (Data Set) (Questions) (Slides)

9:35 - 10:05 Kelly Cline (Carroll College) Clicker questions that stimulate discussions and provoke misconceptions in introductory statistics (Slides)

10:10 - 10:40 Stacey Hancock (Montana State University) Six ways to integrate the Guidelines for Assessment and Instruction in Statistics Education into your introductory statistics course (Slides)

10:40 - 11:00 Break

11:00 - 11:30 Eric Sullivan (Carroll College) Data Science with Active Learning, Real Data, and Projects (Slides)

11:35 - 12:05 Phillip Curtiss (Montana Tech) Data Science Program at Montana Tech (Slides)

12:10 - 1:00 Lunch

1:00 - 1:30 Eric Sullivan (Carroll College) A Hands-On Exploration of Neural Networks (Link to Neural Network Playground) (Slides)

1:30 - 2:00 Clay Looney (University of Montana) Analytics & Innovation: Unleashing New Sources of Value

2:05 - 2:20 J. Hathaway (BYU-Idaho) BYU-I and Undergraduate Data Science (Slides)

2:25 - 2:40 Andy Hoegh (Montana State University) Statistical Programming to Principles of Data Science: Rethinking the traditional statistical programming curricula

2:45 - 3:15 Break

3:15 - 4:15 Rob Gould (UCLA): We Are All Data Scientists (Or We Should Be) (Slides)

4:15 - 5:00 Discussion

5:00 Dinner on your own

Friday, June 29

8:15 - 9:00 Breakfast

9:00 - 9:30 Robert delMas (University of Minnesota): Random is Random: Helping Students Distinguish between Random Sampling and Random Assignment (Slides)

9:35 - 10:05 Kyle Caudle (South Dakota School of Mines and Technology): Activity-Based Learning in Statistics. Why shouldn't statistics be fun? (Slides)

10:10 - 10:40 Jennifer Green (Montana State) GAISE-ing Into Theory: Statistical Thinking in Mathematical Statistics (Slides)

10:40 - 11:00 Break

11:00 - 11:30 Matt Roscoe (University of Montana) Supporting Statistical Literacy with GeoGebra (Slides)

11:35 - 12:05 Roger Johnson (South Dakota School of Mines & Technology) Activities for the statistics classroom (Slides)

12:10 - 1:30 Lunch

1:30 - 2:00 Ted Wendt (Carroll College) The Carroll College ActiveStatistics Curriculum (Carroll Active Statistics Web Page)

2:05 - 2:35 Elizabeth Younce (Carroll College) The Student Undergraduate Research Experience: Using Institutional Data to Answer Real Questions (Slides)

2:35 - 2:50 Break

2:50 - 3:05 Mark Greenwood (Montana State University) Comparing grade outcomes in a second statistics course based on introductory statistics curricula. (Slides)

3:10 - 3:25 David Lartey (Montana State University) Application of the Cox Regression Model to Estimate Dropout Rate in Introductory Statistics

3:30 - 3:45 Cody Custis (State of Montana) Finding Cases of Non-Overdose, Not Drug Poisoning in Montana (Slides)

3:50 - 4:05 Paul Harmon (Montana State University) An Alternative to the Carnegie Classifications: Using Structural Equation Models to Identify Similar Doctoral Institutions (Slides)

4:10 - 5:00 Discussion: How do data science and statistics curricula interact? Challenges and opportunities

Conference Agenda with Abstracts

Thursday, June 28

8:15 - 9:00 Registration and breakfast

9:00 - 9:30 Jodi Fasteen (Carroll College) Leveraging real-world data in an introductory statistics course
We will actively explore the Pulse of a Nation data set using TinkerPlots to generate visuals. We will also discuss the who, what, when, where, and why of using real world data in an intro statistics course.

9:35 - 10:05 Kelly Cline (Carroll College) Clicker questions that stimulate discussions and provoke misconceptions in introductory statistics
In teaching statistics, clickers can be used to have students discuss and vote on multiple-choice questions that promote discussion, provide instructor feedback, and provoke common errors and misconceptions, creating engaging and memorable lessons. To identify the most effective clicker questions, we recorded the class voting percentages for every question used in nineteen sections of introductory statistics, taught by three instructors at two institutions over nine years: a total of 971 class votes on a set of 202 questions. Of these, 78 questions had data from at least five votes. We analyze the student votes to identify the clicker questions that were most effective at promoting discussion, providing instructor feedback, and provoking common errors and misconceptions. We present examples of these questions and discuss how we used them in class.

10:10 - 10:40 Stacey Hancock (Montana State University) Six ways to integrate the Guidelines for Assessment and Instruction in Statistics Education into your introductory statistics course
In 2016, the GAISE College Report ASA Revision Committee revised the 2005 Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report, which offers recommendations on what and how we teach introductory statistics. Even though the emergence of data science as a discipline and the availability of vast quantities of data have transformed the landscape of statistics education, the six recommendations remained largely intact: 1. Teach statistical thinking. 2. Focus on conceptual understanding. 3. Integrate real data with a context and purpose. 4. Foster active learning. 5. Use technology to explore concepts and analyze data. 6. Use assessments to improve and evaluate student learning. Two new emphases were added to the first recommendation to reflect advances in technology and the expansion of a data-driven society: a. Teach statistics as an investigative process of problem-solving and decision-making, and b. Give students experience with multivariable thinking. In this talk, we will explore six ways to integrate these guidelines into your introductory statistics course, from what we teach to how we teach. Along the way, we will highlight innovations in statistics education and how statistics educators are incorporating data science into the classroom.

10:40 - 11:00 Break

11:00 - 11:30 Eric Sullivan (Carroll College) Piloting a data science course using real data, projects, and active learning.
Data science problems are naturally open-ended and poorly posed. As such, students learning data science need to build flexible skill sets for dealing with modern data science problems. In this presentation we discuss three common themes of the data science courses at Carroll College: active learning, using real data, and projects. We will discuss how these pedagogical choices help build students' skills in handling the unique features of each new data set and data science problem.

11:35 - 12:05 Phillip Curtiss (Montana Tech) Data Science Program at Montana Tech
The Bachelor of Science degree in Data Science at Montana Tech was approved by the Montana Board of Regents in September 2016. Before its approval as a standalone program, the subject matter was offered as a concentration option in the Statistics Program. The Data Science degree program is designed to produce graduates with strong statistics and computer science training to meet the industry proficiency needs of today's data scientists. Dr. Curtiss will talk about the industry need that drove the development of the program, the program's organization as a joint effort between the Computer Science Department and the Statistics Program, the curriculum of the Data Science program, and its status after its first year.

12:10 - 1:00 Lunch

1:00 - 1:30 Eric Sullivan (Carroll College) A Hands-On Exploration of Neural Networks
In this hands-on activity we will briefly discuss what neural networks are and then use an online tool to explore training neural networks and building features for them. Neural networks have seen a recent resurgence in popularity in data science, and part of the struggle students have when learning about them is cutting past the technical details to the simple underlying ideas. We have used this exploration in an introductory data science class, and students demonstrate a better understanding of feature selection, network training, and network architecture after the activity.

1:30 - 2:00 Clay Looney (University of Montana) Analytics & Innovation: Unleashing New Sources of Value
The presentation focuses on transforming the benefits of data analytics into innovations that provide new sources of value to organizations. Drawing on experience teaching in and directing the University of Montana's Master of Science in Business Analytics (MSBA) program, the presenter will discuss how the intersection of business, statistics, and computing links data analytics to innovations that can unlock new insights and unleash tremendous value.

2:05 - 2:20 J. Hathaway (BYU-Idaho) BYU-I and Undergraduate Data Science
We have completed the first year of our bachelor's and associate degrees in data science at BYU-I. In this presentation, I will share the growth experiences we had working across colleges and departments to build, propose, and support the degrees and new courses. In addition, I will share the development of our data science society on campus and other near-campus data science activities that are progressing. Finally, I will discuss the internship and employment opportunities, and the challenges, that our students face with an undergraduate degree in data science.

2:25 - 2:40 Andy Hoegh (Montana State University) Statistical Programming to Principles of Data Science: Rethinking the traditional statistical programming curricula
Statistics departments have traditionally taught statistical programming courses that focus on analyzing data using programs such as R, Minitab, SPSS, or SAS. While these software packages are essential for a statistical practitioner, courses are often taught assuming a tidy data structure, and the emphasis is placed largely on creating graphics and running statistical tests. In an era of messy, unstructured data, more emphasis needs to be placed on general programming skills. While the field of data science lacks a concise, agreed-upon definition, one essential component is strong programming skills, which are often reinforced in an introductory or principles of data science course. These courses often place more emphasis on data scraping, wrangling, processing, and storage than the typical statistical programming course. This talk reviews general statistical programming courses and principles of data science courses and recommends a hybrid approach for undergraduate and graduate statistics students that melds computational thinking and statistical principles.

2:45 - 3:15 Break

3:15 - 4:15 Rob Gould (UCLA): We Are All Data Scientists (Or We Should Be)
In this talk, I make the argument that data literacy (a combination of statistical thinking, computational thinking, and mathematical thinking) is too important to ignore, and yet it has been, and continues to be, ignored. I'll also discuss several projects at the K-16 level that have attempted to improve data literacy.

4:15 - 5:00 Discussion

5:00 Dinner on your own

Friday, June 29

8:15 - 9:00 Breakfast

9:00 - 9:30 Robert delMas (University of Minnesota): Random is Random: Helping Students Distinguish between Random Sampling and Random Assignment
Recommended learning goals for students in introductory statistics courses include the ability to recognize and explain the key role of randomness in designing studies and in drawing conclusions from those studies involving generalizations to a population or causal claims (GAISE College Report ASA Revision Committee, 2016). A study was designed to explore introductory statistics students' understanding of the distinct roles that random sampling and random assignment play in study design and the conclusions that can be drawn from each. A two-and-a-half-week study design unit was developed and implemented in four sections of a university undergraduate introductory statistics course based on modeling and simulation. Descriptions of some of the classroom activities and assessment results will be presented, along with comparisons to the performance of university students on similar items from the Comprehensive Assessment of Outcomes in Statistics (CAOS) test. Results indicated that students' understanding of study design concepts increased after the unit, and the great majority of students successfully made appropriate connections between random sampling and generalization, and between random assignment and causal claims.

9:35 - 10:05 Kyle Caudle (South Dakota School of Mines and Technology): Activity-Based Learning in Statistics. Why shouldn't statistics be fun?
This talk will focus on two classroom exercises that help students become participants in their learning. The first exercise uses a permutation goodness-of-fit test to investigate whether the Gamemakers in the Hunger Games rigged the lottery that chooses children (tributes) to participate in the games. The second exercise investigates statistical significance by discussing the question "How many times should a deck be shuffled while playing card games?" Both activities have recently been published in the journal Teaching Statistics.
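The first exercise rests on a simulation-based goodness-of-fit test. The sketch below is a minimal Python illustration, not the speaker's published procedure: it assumes the fair-lottery null can be written as a known selection probability for each category, and it compares a Pearson chi-square statistic against lotteries simulated under that null. This is a Monte Carlo test; the permutation version used in the talk may be set up differently. The counts and probabilities in the usage line are hypothetical.

```python
import random


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_counts(probs, n_draws, rng):
    """Run one simulated lottery of n_draws picks and tally each category."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n_draws):
        counts[k] += 1
    return counts


def monte_carlo_gof(observed, probs, n_sims=10_000, seed=0):
    """Simulation-based goodness-of-fit test against a fair-lottery null.

    observed -- number of tributes drawn from each category
    probs    -- selection probability of each category if the lottery is fair
                (all must be positive)
    Returns (observed statistic, Monte Carlo p-value).
    """
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in probs]
    obs_stat = chi_square_stat(observed, expected)
    extreme = 0
    for _ in range(n_sims):
        sim = simulate_counts(probs, n, rng)
        if chi_square_stat(sim, expected) >= obs_stat:
            extreme += 1
    # Adding one to numerator and denominator keeps the p-value above zero.
    return obs_stat, (extreme + 1) / (n_sims + 1)


# Hypothetical example: 100 tributes across three age groups.
stat, p = monte_carlo_gof([90, 5, 5], [0.2, 0.3, 0.5])
print(f"chi-square = {stat:.1f}, p = {p:.4f}")
```

Counts this far from the fair-lottery expectation almost never arise in the simulated lotteries, so the p-value comes out very small; counts close to the expected values give a p-value near 1.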

10:10 - 10:40 Jennifer Green (Montana State) GAISE-ing Into Theory: Statistical Thinking in Mathematical Statistics
The Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report has helped to shape the teaching of introductory statistics courses over the past 13 years, recommending that such courses 1) teach statistical thinking, 2) focus on conceptual understanding, 3) integrate real data with a context and purpose, 4) foster active learning, 5) use technology to explore concepts and analyze data, and 6) use assessments to improve and evaluate student learning. Although the report is geared toward the introductory course, the GAISE recommendations are just as relevant for higher level statistics courses. In this presentation, we will discuss implementing the reforms in the two-semester calculus-based undergraduate mathematical statistics sequence. We will describe teaching methods, strategies and tools we have used to promote student understanding and problem solving, as well as share student reflections on the course revisions.

10:40 - 11:00 Break

11:00 - 11:30 Matt Roscoe (University of Montana) Supporting Statistical Literacy with GeoGebra
GeoGebra is dynamic mathematics software that is freely available to non-commercial users. Recent upgrades to GeoGebra have expanded on the previously available geometry and algebra views to include a new spreadsheet view that offers dynamic data exploration and imaging. Using examples drawn from the study of probability and statistics, we will explore how GeoGebra can support the development of statistical literacy at the undergraduate level. We will conclude by examining several examples of student work in which GeoGebra is employed in statistical inquiry.

11:35 - 12:05 Roger Johnson (South Dakota School of Mines & Technology) Activities for the statistics classroom
I'll outline a number of activities I've used in my statistics classrooms to get students more actively involved in their learning of topics in estimation, hypothesis testing and least squares. Any needed props may be readily obtained and are inexpensive.

12:10 - 1:30 Lunch

1:30 - 2:00 Ted Wendt (Carroll College) The Carroll College ActiveStatistics Curriculum

2:05 - 2:35 Elizabeth Younce (Carroll College) The Student Undergraduate Research Experience: Using Institutional Data to Answer Real Questions
Last fall semester, I had the opportunity to use faculty workload data from the college in an internship with the Office of Institutional Effectiveness. Lessons from this included how to work correctly with sensitive data, why data dictionaries are mandatory, how discrete math is used in data science, and many new ways to work in RStudio when data isn't pre-cleaned for analysis.

2:35 - 2:50 Break

2:50 - 3:05 Mark Greenwood (Montana State University) Comparing grade outcomes in a second statistics course based on introductory statistics curricula.
Simulation-based, active-learning curricula are becoming popular for introductory statistics courses and recent work has suggested that these new teaching methods are improving student success rates in the introductory course, but little is known about the impacts on students in a second course. We use data from a university that has been introducing various versions of new curricula alongside traditional approaches to assess the impacts of three different curricula on student performance in a second course. We find that students who take randomization-based courses have lower average performance in a second statistics course, but that these differences are small relative to those based on students' prior grades.

3:10 - 3:25 David Lartey (Montana State University) Application of the Cox Regression Model to Estimate Dropout Rate in Introductory Statistics
Do students with a low number of prior math courses have a higher dropout rate in introductory statistics than students with a high number of prior math courses? How does a student's ACT, SAT, or MPLEX score affect their dropout rate? To answer these questions, one needs to perform survival analysis. Survival, or time-to-event, analysis is one of the most significant advancements of mathematical statistics in recent decades, with broad applications in mechanical research, engineering, and especially biomedical research. In this paper, we review the properties and modeling methods for survival data, then fit a Cox proportional hazards model to data on the time until dropout for students in the introductory statistics course (STAT 216). The results suggest that the number of prior math courses taken and/or the MPLEX score may affect the time to dropout for students enrolled in STAT 216.
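To show what the Cox model actually estimates, here is a from-scratch Python sketch for a single covariate (for example, a hypothetical count of prior math courses). It maximizes the Cox partial likelihood by Newton-Raphson, using the Breslow convention for tied dropout times; exp(beta) is then the estimated hazard ratio per one-unit increase in the covariate. This is an illustration only, not the analysis in the talk. In practice one would use an established implementation such as `coxph` in R's `survival` package, which also reports standard errors and handles multiple covariates.

```python
import math


def cox_fit_one_covariate(times, events, x, iters=25, tol=1e-10):
    """Estimate beta in a one-covariate Cox model by Newton-Raphson.

    times  -- follow-up time for each student (e.g., weeks until dropout)
    events -- 1 if the student dropped out at that time, 0 if censored
    x      -- covariate value for each student
    Tied event times are handled with the Breslow convention.
    """
    n = len(times)
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored students only appear in risk sets
            # Risk set: everyone still enrolled just before times[i].
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            mean = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            second = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean          # derivative of the log partial likelihood
            info += second - mean ** 2    # weighted variance = observed information
        if info == 0:
            break  # covariate is constant within every risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data: four students, all dropping out, at times 1..4.
beta = cox_fit_one_covariate([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 0, 1])
print(f"beta = {beta:.4f}, hazard ratio = {math.exp(beta):.4f}")
```

On this toy data set the score equation reduces to u^2 + u - 1 = 0 with u = exp(beta), so the estimate is log((sqrt(5) - 1) / 2), roughly -0.481, which is a convenient check on the iteration.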

3:30 - 3:45 Cody Custis (State of Montana) Finding Cases of Non-Overdose, Not Drug Poisoning in Montana
Background: ESSENCE is a monitoring system originally developed and implemented for early detection of a large-scale release of a biologic agent, such as the 2001 anthrax attack. In Montana, ESSENCE uses information gathered from electronic health records (EHR) uploaded from emergency departments, which are updated every 24 hours, giving near real-time surveillance.
Public health and medical professionals now use ESSENCE to identify disease clusters, determine the size and spread of an outbreak after it is detected, and monitor disease trends. ESSENCE is also used to detect opioid drug overdoses; 'spikes' from sources such as fentanyl can be detected through ESSENCE.
Challenge: Currently, syndromic surveillance data for drug overdose are selected using algorithms from the Centers for Disease Control and Prevention (CDC). These algorithms have high sensitivity for identifying true drug overdoses, but they have specificity problems that produce false positives. For example, a search for the term "opioids" may match "opioids were ruled out as a contributing cause." The challenge is to develop better algorithms to detect drug overdose. Students will have access to a training data set of a few hundred observations from 2016, to be validated on 2017 data.
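The specificity problem described above can be illustrated with a crude rule in the spirit of the NegEx negation-detection approach: flag a note only if an overdose term appears with no negation cue nearby. The Python sketch below is hypothetical throughout. The term and cue lists are invented for illustration and are not the CDC definitions, and any real rule would need to be tuned and validated against the labeled 2016 and 2017 data.

```python
import re

# Hypothetical vocabularies, for illustration only (not the CDC definitions).
OVERDOSE_TERMS = {"opioid", "opioids", "overdose", "fentanyl", "heroin"}
NEGATION_CUES = ["ruled out", "denies", "negative for", "no evidence of", "not"]


def flags_overdose(note, window=6):
    """Return True if an overdose term appears with no negation cue nearby.

    A match is ignored when any negation cue occurs within `window` words
    before or after it; if every match is negated, the note is not flagged.
    """
    words = re.findall(r"[a-z]+", note.lower())
    for i, word in enumerate(words):
        if word not in OVERDOSE_TERMS:
            continue
        # Pad with spaces so cues only match whole words or phrases.
        context = " " + " ".join(words[max(0, i - window): i + window + 1]) + " "
        if not any(f" {cue} " in context for cue in NEGATION_CUES):
            return True
    return False


for note in ["Opioids were ruled out as a contributing cause.",
             "Found unresponsive, suspected fentanyl overdose."]:
    print(flags_overdose(note), "|", note)
```

A plain keyword search would flag both notes; the negation window removes the first one, trading a small risk of missed cases for fewer false positives.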

3:50 - 4:05 Paul Harmon (Montana State University) An Alternative to the Carnegie Classifications: Using Structural Equation Models to Identify Similar Doctoral Institutions
Institutional classification systems, such as the Carnegie Classifications, help to delineate groups of institutions with similar characteristics; they are thus used by administrators to guide policy decisions. However, the Carnegie Classifications are marked by several statistical and practical shortcomings. First, they are neither well-documented nor easily reproduced. Second, they rely on subjective determination of groups based on visual inspection of indices based on two separate Principal Component Analyses of institutional characteristics.
Using the 2015 data set from the Carnegie Classifications for doctoral-granting institutions, we propose an alternative method of classification that relies on structural equation modeling of latent factors rather than PCA-based indices of institutional productivity. In this method, we create a single index from two latent factors: one pertaining to STEM research outcomes and the other to non-STEM outcomes. Classifications can then be made using univariate clustering methods, as opposed to the two-variable cluster solution and subjective group determination used in the Carnegie method. We further demonstrate R Shiny applications that allow a user to change the underlying variables on which universities are measured, assess the resulting changes in group membership, and identify groups of comparison schools.
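The abstract does not say which univariate clustering method is used. As one possibility, the Python sketch below performs exact k-means on a single index by dynamic programming over the sorted values, which works because optimal one-dimensional k-means clusters are contiguous in sorted order. The index values in the usage line are hypothetical, not results from the 2015 Carnegie data.

```python
def kmeans_1d(values, k):
    """Exact k-means for one-dimensional data via dynamic programming.

    Returns cluster labels aligned with `values`; label 0 is the group with
    the lowest values. Requires 1 <= k <= len(values).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    n = len(xs)
    # Prefix sums give the within-cluster sum of squares in O(1).
    p1 = [0.0] * (n + 1)
    p2 = [0.0] * (n + 1)
    for i, v in enumerate(xs):
        p1[i + 1] = p1[i] + v
        p2[i + 1] = p2[i] + v * v

    def sse(a, b):
        """Sum of squared deviations from the mean for xs[a:b]."""
        s = p1[b] - p1[a]
        return (p2[b] - p2[a]) - s * s / (b - a)

    inf = float("inf")
    # cost[c][j]: smallest SSE when the first j sorted values form c clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last cluster is xs[i:j]
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    split[c][j] = i
    # Walk back through the stored split points to label each sorted value.
    labels_sorted = [0] * n
    j = n
    for c in range(k, 0, -1):
        i = split[c][j]
        for t in range(i, j):
            labels_sorted[t] = c - 1
        j = i
    labels = [0] * n
    for pos, idx in enumerate(order):
        labels[idx] = labels_sorted[pos]
    return labels


# Hypothetical index scores for six institutions, grouped into three tiers.
print(kmeans_1d([10.0, 1.0, 5.1, 0.9, 9.9, 5.0], 3))
```

The triple loop costs O(k n^2), which is ample for a few hundred doctoral institutions; faster one-dimensional algorithms exist if needed.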

4:10 - 5:00 Discussion: How do data science and statistics curricula interact? Challenges and opportunities

The ActiveStatistics project has been made possible through the generosity of the W.M. Keck Foundation.

kcline@carroll.edu
