Friday, September 21, 2018

Developing Strategies and Technology for Generation and Analysis of Longitudinal High Frequency Data Streams from Faculty and Students

(This project has been presented to the NSF in the form of a proposal for The NSF 2026 Idea Machine)
Synopsis
Every developed science is the result of scrupulous analysis of a vast amount of data.
To advance the science of education, researchers need (a) to establish reliable sources of data, which can later be (b) analyzed.
This project aims to establish a reliable procedure (or procedures) for generating a large amount of reliable data, using techniques developed in another field (physics) for high frequency “data mining”.
The proposal was developed by Prof. Plamen Ivanov and me more than a year ago. Prof. Ivanov is the director of the Keck Laboratory for Network Physiology, where he applies various data mining methods (and more) to study correlations within the physiological networks of the human body. We believe that similar methods can and need to be applied to study various aspects of the learning and teaching practices of individuals and groups of individuals. However, despite our best efforts, we have so far found no interest in funding our research, either inside our institutions or at the NSF. We see this as another confirmation that the NSF currently has more interest in funding socially oriented projects in education (making improvements “here and now”) than in supporting fundamental research (for which no one can predict if, where, and how it will bring fruitful social or economic results).
However, we hope the NSF may soon shift its approach to funding fundamental research in the field of education; one indication of this possibility is the NSF’s call for “big ideas”.
We have submitted our proposal for The NSF 2026 Idea Machine. However, we welcome the attention and support from any party interested in the development of a fundamental science of education.
Below is a short description of the goals and proposed methods (copied from the full proposal).
I. Introduction:
This project, when realized, has the potential to transform the science of education and, hence, education itself. The essence of the project is the development of a revolutionary, science-based approach to describing, structuring, analyzing, and assessing the teaching and learning process.
This two-page presentation gives the shortest version of the proposal; the innovative nature of the proposed project demands a more detailed description, which is offered later.
Big data analysis has entered many important human practices. Examples include the Human Genome Project (DNA v. health), healthcare and epidemiology (spread of diseases), particle physics, social and business networking (Facebook, Twitter, Snapchat, Instagram, cellphone communication, telemedicine, remote business communication), national security (trends in various networks), business network analysis (Airbnb, Uber, Lyft, Netflix), stock trading, and currency exchange (live records of massive volumes of transactions). In all of these fields, data scientists have been able to (1) establish protocols and procedures for quantifying, collecting, structuring, comparing, and sharing vast amounts of data; and (2) mine large databases to extract valuable and reliable information on the correlations between multiple parameters, based on various types and levels of data coming from multiple sources.
However, although education is one of the most widespread and most important human practices, the methods developed in other fields for (1) collecting and (2) mining BIG data have not found applications in education. Current approaches do not provide an understanding of the deep structure of teaching and learning processes, and they do not lead to quantitative measures of the quality of teaching, of trends in teaching (e.g. a measure of improvement in teaching), or of student progress correlated with student learning outcomes.
II. Description of the current state of Educational Data Mining (EDM):
1.  EDM is at an early stage of development and amounts to advanced educational statistics (e.g. the Educational Data Mining Society was formed only five years ago: educationaldatamining.org).
2.  Currently the following approaches are used to obtain various educational data:
·     Observing school teachers or college faculty while they teach and assessing their actions using various observation protocols (e.g. BOPR, COPUS, MarzanoOP, RTOP, GORP).
·     Observing school and college students while being taught using various observation protocols (e.g. a “STEM class observation protocol”).
·     Collecting responses to various surveys (e.g. “National Survey of Student Engagement”, “National Survey of College Faculty”).
·     Collecting data during various student-computer interactions when using various computer-based media (MOOCs, computer games, intelligent tutoring systems, online content delivery systems, online homework delivery systems).
It is important to stress that:
(A) When data collection methods are based on surveys or observation protocols, they are typically used only once or twice during a teaching period (a semester or a year), and they typically cover only a small percentage of teachers and students.
(B)  Data collected using computer-based media does not capture students’ everyday reflection on the learning process (the actions taken to absorb information and develop skills, and the resulting outcomes and satisfaction), nor the everyday reflection of teaching faculty on the teaching process and on student progress. Such data typically presents aggregated student responses to the course as a whole (ranking the difficulty of the course, ranking homework assignments, indicating the relevance of a textbook and other resources, overall satisfaction) and mostly yields two-parameter correlations such as “time used for homework” versus “final grade”.
Currently, educational data: is collected during isolated educational projects; does not represent longitudinal streams of high frequency data collected during the full term of learning; does not satisfy the criteria for “big data” (except for a few data sets collected via student-computer interactions); does not involve data streams with a large number of parameters; and does not allow cross-analysis in search of stable correlations between multiple parameters. In its current state, EDM is closer to advanced educational statistics.
Currently, there is NO research which:
(1) regularly and frequently (e.g. several times a week) collects data simultaneously from teaching faculty and from students during the whole period of teaching a course (not just via observing one lecture);
(2) uses media technologies, including phone apps, to collect the desired sets of educational data incoming from multiple sources (faculty, disciplines, departments, institutions);
(3) uses technologies to mine data in search of stable correlations between the different factors affecting teaching-learning practices and students’ performance in a multivariable (multi-parametric) space.
Currently, there is no “brick-and-mortar” educational institution which collects high frequency responses from faculty and from students about multiple features of the teaching and learning processes. There is no institution which collects and cross-correlates multiple responses across various disciplines over a long period of time.
III. The scope and immediate goals of the proposed project:
The project will pioneer (A) the development of a new type of big database by collecting longitudinal streams of high frequency data in the field of education; and (B) the development of a new methodology for mining this new type of educational data and extracting valuable and reliable information on the correlations between various parameters of multiple data sources of different types and levels (faculty, departments, institutions).
Every day, countless apps are used by millions of people. People already have the habit of tracking information every day (calorie intake, calories burned, steps taken, miles traveled, etc.). Why not harness the new technologies and the new habit to generate a stream of high frequency educational data?
The goals:
1.  Establishing a set of measurable and universal (but modifiable) parameters which will be used for describing the state and structure of any teaching and learning process (i.e. for any course).
2.  Developing one questionnaire for teaching faculty and one questionnaire for students, which they will use regularly and frequently during a course for self-observation, for assessing students’ actions and progress, and for assessing faculty teaching actions and traits.
3.  Developing an app for collecting the data provided by students and faculty.
4.  Developing the strategy for analyzing the data coming from faculty and students in search for correlations.
5.  Developing a website for collecting the data coming from faculty and students.
6.  Piloting the program.
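As one way to picture goals 1 through 3, a single questionnaire submission could be stored as a small structured record. The sketch below is illustrative only: the field names, the 1-to-5 rating scale, the event labels, and the parameter names (`lecture_clarity`, `homework_difficulty`) are assumptions made for this example and are not part of the proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict

# Hypothetical rating scale; the proposal does not fix one.
MIN_RATING, MAX_RATING = 1, 5


@dataclass
class Response:
    """One questionnaire submission from a student or a faculty member."""
    respondent_id: str      # anonymized identifier (assumed)
    role: str               # "student" or "faculty"
    course_id: str
    event: str              # assumed labels, e.g. "pre_course", "lecture", "exam"
    submitted_at: datetime
    ratings: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the record breaks the assumed conventions."""
        if self.role not in ("student", "faculty"):
            raise ValueError(f"unknown role: {self.role!r}")
        for name, value in self.ratings.items():
            if not MIN_RATING <= value <= MAX_RATING:
                raise ValueError(f"{name}={value} is outside the rating scale")


# A made-up example record, not real data.
example = Response(
    respondent_id="s-001",
    role="student",
    course_id="PHYS-101",
    event="lecture",
    submitted_at=datetime(2018, 9, 21, 10, 30),
    ratings={"lecture_clarity": 4, "homework_difficulty": 3},
)
example.validate()  # passes silently for a well-formed record
```

Keeping the set of rated parameters in a dictionary rather than in fixed fields reflects the "universal but modifiable" requirement of goal 1: a course could add or drop parameters without changing the record structure.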
We propose collecting high frequency longitudinal responses from faculty and students: before the beginning of the course; after each lecture; after each exam; summative responses after two weeks of the course about lectures, labs, and all other features of the course; generalized responses after each third of the semester; and cumulative responses just before and after the final examination. The goal is to develop procedures that make it possible to visualize the structure of the responses, changes in that structure, and trends in those changes. This should give regular access to students’ reflection on the course and on their own performance during it (how students assess the difficulty of various assignments; the clarity or helpfulness of lectures, workbooks, the textbook, office hours, etc.; and the helpful traits of a lecturer). It should also give regular access to the structured reflection of faculty on the teaching approach selected for the course and on students’ readiness, behavior, performance, and success. The next goal is to demonstrate the existence of stable trends in the correlations between the various parameters affecting students’ learning.
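The search for correlations between parameters can be sketched in a few lines. The example below computes pairwise Pearson correlations between parameter streams recorded at successive collection points. Pearson correlation is used only as a simple stand-in, since the proposal does not name a specific method; the parameter names and the numbers are made-up illustrative values, not collected data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_table(streams):
    """Map each pair of parameter names to its Pearson correlation."""
    return {
        (a, b): pearson(streams[a], streams[b])
        for a, b in combinations(sorted(streams), 2)
    }


# Made-up class-average ratings after six consecutive lectures.
streams = {
    "lecture_clarity": [3, 4, 4, 2, 5, 3],
    "homework_difficulty": [4, 3, 3, 5, 2, 4],
    "hours_studied": [5, 6, 4, 7, 6, 5],
}

for pair, r in correlation_table(streams).items():
    print(pair, round(r, 2))
```

Recomputing such a table over successive windows of the course (for example, each third of the semester) would be one simple way to check whether a correlation is stable over time or only appears in isolated periods.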
IV. Resources.
The project will leverage the expertise and resources available at Boston University: scientists with deep expertise in developing and applying methods for collecting and organizing big data from multiple sources, quantifying data, extracting information on important correlations between multiple parameters that describe the functioning of various systems or subsystems, finding cross-relations, describing information transfer between multiple sources, applying noise reduction methods, finding critical points, and visualizing state transitions (PI Prof. Plamen Ivanov); experienced teaching faculty (co-PI Dr. Valentin Voroshilov); and a high-performance computing facility (GHPCC).
V. Future development.
The proposed approach to educational data mining pioneers the development of a new type of educational data and a new methodology for collecting and mining that data.