Education data management system
Technical Field
The invention relates to the technical field of teaching informatization, in particular to an education data management system.
Background
At present, as digital information technology is applied in the education field, more and more online education platforms and terminal education software are being produced. Users generate large amounts of data when using these platforms and software, storing the data occupies a large amount of space, and how to make use of the data is a technical problem the industry must consider. In addition, current analysis of education data supports real-time analysis of only one-dimensional or two-dimensional data; how to support real-time analysis of multi-dimensional data is also a technical problem to be solved in the education field.
Therefore, providing an educational data management system that puts education data to use and supports instant query and analysis of multi-dimensional data is a technical problem that urgently needs to be solved in the art.
Disclosure of Invention
In view of the above, the present invention provides an educational data management system, which solves the above technical problems.
The invention provides an educational data management system, comprising: a data acquisition module, a data warehouse module, a multi-dimensional analysis module and an output module;
the data acquisition module is connected with the data warehouse module and used for acquiring various education data and sending the education data to the data warehouse module;
the data warehouse module is used for dividing the education data according to logical topics and performing hierarchical processing, and comprises a topic model splitting module and a data warehouse building module; wherein,
the topic model splitting module comprises at least six topic models preset according to topic names, wherein the topic names at least comprise a student topic, a teacher topic, an examination topic, a test question topic, a behavior topic and a flow topic;
the data warehouse building module is used for building a data warehouse through a plurality of topic models, and specifically comprises the following steps: sequentially loading a plurality of topic models through a data retention layer, a fine-grained model layer, a mild summary layer and a moderate summary layer to construct a data warehouse; wherein,
the data retention layer is used for storing the received education data;
the fine-grained model layer is used for performing data integration processing in a subject domain on data of the data retention layer;
the mild summary layer is used for splitting and summarizing the data of the fine-grained model layer by related service;
the moderate summary layer is used for generating statistical data from the data of the mild summary layer according to the application requirements of the system;
the multidimensional analysis module is connected with the data warehouse module and used for receiving a multidimensional analysis instruction, invoking the Hive tool to run queries according to the multidimensional analysis instruction, and generating a multidimensional analysis report;
and the output module is connected with the multi-dimensional analysis module and used for receiving and outputting the multi-dimensional analysis report sent by the multi-dimensional analysis module.
Optionally, the educational data includes structured data, semi-structured data, and unstructured data, and the data acquisition module is further configured to perform disambiguation on the structured data after converting the semi-structured data and the unstructured data into the structured data.
Optionally, the data acquisition module extracts the education data from the business database of the data source through an ETL tool, cleans and converts the education data, and then sends it to the data warehouse module; wherein the extraction modes include: full synchronous extraction for data sources with a small data volume and a large amount of change; incremental synchronous extraction for data sources with a large data volume and a small amount of change; incremental extraction is performed by time partition, using the date timestamp or update time of the data source table as the time partition field, and full extraction is used if no time partition field exists.
Optionally, the data source includes education software, education website and teaching system; the data acquisition module extracts education data from a business database of a data source through an ETL tool, and comprises the following steps: the ETL tool extracts education data from the business database of the data source at predetermined time intervals.
Optionally, the information under the student topic includes at least one of: student number, student age, student gender, student birthday, student change records, student school, student grade, student class and student contact information;
the information under the teacher topic includes at least one of: teacher contact information, teaching tenure, subjects taught, classes taught and class student details;
the information under the examination topic includes at least one of: homework exercises, mock examinations, midterm examinations, final examinations, examination paper information records and reference data records;
the information under the test question topic includes: the correspondence between test questions and the knowledge point information of the test questions;
the information under the behavior topic includes: teacher paper-setting records, teacher marking records and student answer records;
the information under the flow topic includes: all behavior logs generated by students on the education software or the education website, and all behavior logs generated by teachers on the education software or the education website.
Optionally, the system further comprises a data application module, wherein the data application module is connected with the data warehouse module and is used for receiving a data application instruction and calling corresponding data from the data warehouse module; the data application comprises at least one of data analysis, data query, data interface service and BI report.
Optionally, the system further comprises a permission management module, configured to provide different usage permissions according to permission levels of the user, where the permission levels include an analyst permission and an engineer permission.
Compared with the prior art, the educational data management system provided by the invention at least realizes the following beneficial effects:
The system provided by the invention abstracts a plurality of topic models according to the characteristics of internet education data to construct an education data warehouse. As a topic-oriented, integrated, time-variant and relatively stable data collection, the education data warehouse can support data analysis and decision-making. The invention enables education data to be put to use, supports multi-dimensional instant query and analysis of education data, suits a variety of frequently changing analysis scenarios, meets the needs of accurate data tracking, statistical analysis and reporting, and provides a basis for data mining and decision support.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a first block diagram of an educational data management system provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data logical hierarchy within a data warehouse of an educational data management system provided in an embodiment of the present invention;
FIG. 3 is a second block diagram of an educational data management system provided in an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a block diagram of an educational data management system according to an embodiment of the present invention, and fig. 2 is a schematic diagram of a data logical hierarchy in a data warehouse of the educational data management system according to an embodiment of the present invention.
As shown in FIG. 1, the educational data management system comprises: a data acquisition module 11, a data warehouse module 12, a multi-dimensional analysis module 13 and an output module 14.
The data acquisition module 11 is connected with the data warehouse module 12 and is used for acquiring various education data and sending the education data to the data warehouse module 12. Optionally, the educational data includes structured data, semi-structured data, and unstructured data, and the data acquisition module 11 is further configured to convert the semi-structured data and the unstructured data into structured data and then perform disambiguation on the structured data. Because the educational data may come from different data sources, duplicate data attributes may exist, and the disambiguation removes such duplicates. In addition, some attributes of the collected educational data may be irrelevant to the analysis objectives of the system, and the data acquisition module 11 can cull such irrelevant attributes. Disambiguation thus reduces data dimensionality and, at the same time, the volume of data for subsequent processing.
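The disambiguation described above can be illustrated with a minimal Python sketch. It assumes that records arrive as dictionaries; the alias map and the set of relevant attributes are hypothetical, since the specification does not state how duplicate or irrelevant attributes are identified.

```python
# Hypothetical alias map: attribute names from different sources that mean the same thing.
ATTRIBUTE_ALIASES = {
    "stu_id": "student_id",
    "student_no": "student_id",
    "sex": "student_gender",
}

# Hypothetical set of attributes relevant to the analysis objectives.
RELEVANT_ATTRIBUTES = {"student_id", "student_gender", "student_grade"}


def disambiguate(record):
    """Merge duplicate attributes under one canonical name and drop irrelevant ones."""
    cleaned = {}
    for name, value in record.items():
        canonical = ATTRIBUTE_ALIASES.get(name, name)
        if canonical not in RELEVANT_ATTRIBUTES:
            continue  # cull attributes unrelated to the analysis objectives
        # Keep the first non-empty value seen for each canonical attribute.
        if cleaned.get(canonical) in (None, ""):
            cleaned[canonical] = value
    return cleaned


raw = {"stu_id": "S001", "student_no": "S001", "sex": "F", "favorite_color": "blue"}
result = disambiguate(raw)
```

In this sketch the four input attributes collapse to two, which is the dimensionality reduction the paragraph refers to.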
Optionally, the data acquisition module 11 extracts the education data from the business database of the data source through an ETL (Extract-Transform-Load) tool, cleans and converts the education data, and then sends it to the data warehouse module 12. The extraction modes include: full synchronous extraction for data sources with a small data volume and a large amount of change; incremental synchronous extraction for data sources with a large data volume and a small amount of change; incremental extraction is performed by time partition, using the date timestamp or update time of the data source table as the time partition field, and full extraction is used if no time partition field exists. Setting incremental and full synchronous extraction modes in this way makes full use of the partitioned tables of the Hive data warehouse. Hive is a data warehouse tool based on Hadoop that can map structured data files to database tables, provides a simple SQL (Structured Query Language) query function, and can convert SQL statements into MapReduce (a programming model for parallel operations on large-scale data sets) tasks for execution. Cleaning the education data means cleaning data from the different data sources and removing data that is partially redundant or partially missing information. Converting the education data optionally means compressing, generalizing and normalizing the data using statistical, clustering and classification methods, applying the conversion appropriate to each kind of data, so that the data processed subsequently is more meaningful.
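The choice between full and incremental extraction can be sketched in Python as follows. The thresholds, the `SourceTable` fields and the change-ratio metric are hypothetical; the specification only distinguishes "small" from "large" volumes and changes. Cases the specification does not cover (for example, large volume with large change) fall back to full extraction here as an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the specification gives no concrete numbers.
LARGE_TABLE_ROWS = 1_000_000
SMALL_CHANGE_RATIO = 0.05


@dataclass
class SourceTable:
    name: str
    row_count: int
    change_ratio: float                   # assumed metric: fraction of rows changed per cycle
    time_partition_field: Optional[str]   # e.g. "update_time"; None if the table has none


def choose_extraction_mode(table: SourceTable) -> str:
    """Return "incremental" or "full" following the rules described in the text."""
    if table.time_partition_field is None:
        return "full"  # no time partition field: full extraction
    if table.row_count >= LARGE_TABLE_ROWS and table.change_ratio <= SMALL_CHANGE_RATIO:
        return "incremental"  # large volume, small change
    return "full"  # small volume with large change, and (by assumption) all other cases


def build_extract_query(table: SourceTable, partition_date: str) -> str:
    """Build an illustrative extraction statement; date functions vary by database."""
    if choose_extraction_mode(table) == "incremental":
        return (
            f"SELECT * FROM {table.name} "
            f"WHERE DATE({table.time_partition_field}) = '{partition_date}'"
        )
    return f"SELECT * FROM {table.name}"


answer_log = SourceTable("answer_log", 50_000_000, 0.01, "update_time")
class_list = SourceTable("class_list", 2_000, 0.5, "update_time")
no_timestamp = SourceTable("legacy_scores", 50_000_000, 0.01, None)
```

Each incremental batch corresponds to one time partition, which is how the extraction lines up with partitioned tables in the warehouse.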
Optionally, the data source includes data sources capable of providing educational data, such as educational software, educational websites, and teaching systems; the data collection module 11 extracts education data from a business database of a data source through an ETL tool, and includes: the ETL tool extracts education data from the business database of the data source at predetermined time intervals. Wherein the predetermined time may be one day, so that the educational data in the data warehouse can be updated every day; the predetermined time may also be one week, so that the educational data in the data warehouse can be updated weekly; the predetermined time may be adjusted and set according to the service requirements of the system.
The data warehouse module 12 is used for dividing educational data according to logic topics and performing hierarchical processing, and the data warehouse module 12 includes a topic model splitting module 121 and a data warehouse building module 122; wherein,
The topic model splitting module 121 includes at least six topic models preset according to topic names, where the topic names at least include a student topic, a teacher topic, an examination topic, a test question topic, a behavior topic, and a flow topic; in practice, more topic models can be set according to specific data analysis requirements. The data warehouse building module 122 is configured to build a data warehouse from the plurality of topic models, as shown in FIG. 2, specifically by sequentially loading the topic models through a data retention layer, a fine-grained model layer, a mild summary layer and a moderate summary layer. The data retention layer stores the received education data; it keeps the history of all data and is used for user review and as basic support. The fine-grained model layer performs data integration within each subject domain on the data of the data retention layer; it can support a variety of data query scenarios and also supports access to, and further development on, detailed data. The mild summary layer splits and summarizes the data of the fine-grained model layer by related service. The moderate summary layer generates statistical data from the data of the mild summary layer according to the application requirements of the system.
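The sequential loading through the four layers can be pictured with the following Python sketch. The layer functions are placeholder transformations chosen only for illustration (copy, de-duplicate by student ID, count per class, count in total); the specification does not prescribe what each layer computes for a given topic, and all names are hypothetical.

```python
def retain(rows):
    """Data retention layer: store the received rows unchanged."""
    return [dict(r) for r in rows]


def integrate(rows):
    """Fine-grained model layer: integrate rows per student (assumed rule)."""
    merged = {}
    for r in rows:
        merged.setdefault(r["student_id"], {}).update(r)
    return list(merged.values())


def summarize_by_class(rows):
    """Mild summary layer: one row per class with a student count."""
    counts = {}
    for r in rows:
        counts[r["class"]] = counts.get(r["class"], 0) + 1
    return [{"class": c, "students": n} for c, n in sorted(counts.items())]


def total(rows):
    """Moderate summary layer: a single overall statistic."""
    return [{"students": sum(r["students"] for r in rows)}]


LAYERS = [
    ("data_retention", retain),
    ("fine_grained_model", integrate),
    ("mild_summary", summarize_by_class),
    ("moderate_summary", total),
]


def build_warehouse(topic_models, layers=LAYERS):
    """Load each topic model through the layers in order, keeping every layer's output."""
    warehouse = {}
    for topic, rows in topic_models.items():
        data = rows
        warehouse[topic] = {}
        for layer_name, transform in layers:
            data = transform(data)  # each layer consumes the previous layer's output
            warehouse[topic][layer_name] = data
    return warehouse


student_rows = [
    {"student_id": "S1", "class": "A"},
    {"student_id": "S1", "class": "A"},
    {"student_id": "S2", "class": "B"},
]
warehouse = build_warehouse({"student": student_rows})
```

Keeping the output of every layer, rather than only the last one, reflects the description: detailed data stays available for queries while the summary layers serve reporting.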
Optionally, the student topic contains basic student information, and the information under the student topic includes at least one of: student number, student age, student gender, student birthday, student change records, student school, student grade, student class and student contact information; the student change records may be, for example, records of changes in a student's year or section and other changes to the student's status.
The teacher topic contains basic teacher information, organizational relationships and the like, and the information under the teacher topic includes at least one of: teacher contact information, teaching tenure, subjects taught, classes taught and class student details.
The examination topic contains examination information, where an exercise, a mock examination, a formal examination and the like each count as one examination, or examinations may be divided according to user-defined rules. The information under the examination topic includes at least one of: homework exercises, mock examinations, midterm examinations, final examinations, examination paper information records and reference data records; the reference data records are records of students sitting examinations, such as the number of students who sat an examination and the number who were absent.
The information under the test question topic includes: the correspondence between test questions and the knowledge point information of the test questions.
The information under the behavior topic includes: teacher paper-setting records, teacher marking records and student answer records.
The information under the flow topic includes: all behavior logs generated by students on the education software or the education website, and all behavior logs generated by teachers on the education software or the education website.
The topics in the invention abstract the core service scenarios of internet education; when a service is added or changed, a new topic can be added or a service table can be extended within an existing topic. The invention therefore provides good extensibility, readability and usability.
For example, a piece of teaching software records data one: student number and student mobile phone number. A teaching system records data two: student number, student answer number, and question score. According to the logical topic division of the invention, data one is assigned to the student topic, and data two is assigned to the behavior topic.
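The division in this example can be sketched in Python as a rule that looks for fields characteristic of each topic. The field names and the rule itself are hypothetical; the specification states the outcome of the division but not the mechanism.

```python
# Hypothetical signature fields per topic, checked in order.
TOPIC_SIGNATURE_FIELDS = [
    ("behavior", {"answer_id", "question_score"}),
    ("student", {"student_phone", "student_age", "student_gender"}),
]


def assign_topic(record):
    """Return the first topic whose signature fields appear in the record."""
    fields = set(record)
    for topic, signature in TOPIC_SIGNATURE_FIELDS:
        if fields & signature:
            return topic
    return "unclassified"


data_one = {"student_id": "S1", "student_phone": "000-0000"}
data_two = {"student_id": "S1", "answer_id": "A17", "question_score": 4}

topic_one = assign_topic(data_one)
topic_two = assign_topic(data_two)
```

Both records carry a student number, so the shared key alone cannot decide the topic; the sketch relies on the other fields for that reason.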
In the data warehouse provided by the invention, the data retention layer optionally contains the following data:
Student basic information (student ID, student age, student gender, student birthday, ……)
Student education information (student ID, student school, student grade, student class)
Student answers (student number, student answer number, question score)
……
At the fine-grained model layer, the data are merged and processed into the following data:
Student details (student ID, student age, student gender, student birthday, student school, student grade, student class)
Student answers (student number, student answer number, question score, knowledge point of the question, college entrance examination ……)
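The merge of basic and education information into student details can be sketched as a join on the student ID. This is a minimal Python illustration with hypothetical field names; it behaves like a left join, which is an assumption, since the specification does not say how students missing from one source are handled.

```python
def merge_by_student_id(basic_info, education_info):
    """Combine basic and education rows that share the same student_id."""
    education_by_id = {row["student_id"]: row for row in education_info}
    details = []
    for row in basic_info:
        merged = dict(row)
        # Students without education information keep only their basic fields.
        merged.update(education_by_id.get(row["student_id"], {}))
        details.append(merged)
    return details


basic_info = [
    {"student_id": "S1", "student_age": 12, "student_gender": "F"},
    {"student_id": "S2", "student_age": 13, "student_gender": "M"},
]
education_info = [
    {"student_id": "S1", "student_school": "School A", "student_grade": 6, "student_class": "6-2"},
]
student_details = merge_by_student_id(basic_info, education_info)
```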
The mild summary layer then further processes the data from the fine-grained model layer to relieve the subsequent computational load, producing the following data:
Student basic statistics (school, grade, class, number of boys, number of girls, number of students with birthdays before July)
Student answering statistics (student ID, knowledge point, number of full-mark answers, number of answers losing marks, number of zero-mark answers)
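The student basic statistics can be produced from the student details by grouping on school, grade and class. The Python sketch below counts boys and girls only; the field names and the "M"/"F" gender coding are assumptions made for illustration.

```python
from collections import defaultdict


def student_basic_statistics(student_details):
    """Count boys and girls per (school, grade, class)."""
    stats = defaultdict(lambda: {"boys": 0, "girls": 0})
    for s in student_details:
        key = (s["student_school"], s["student_grade"], s["student_class"])
        if s["student_gender"] == "M":
            stats[key]["boys"] += 1
        elif s["student_gender"] == "F":
            stats[key]["girls"] += 1
    return dict(stats)


details = [
    {"student_school": "School A", "student_grade": 6, "student_class": "6-2", "student_gender": "F"},
    {"student_school": "School A", "student_grade": 6, "student_class": "6-2", "student_gender": "M"},
    {"student_school": "School A", "student_grade": 6, "student_class": "6-2", "student_gender": "F"},
    {"student_school": "School A", "student_grade": 6, "student_class": "6-3", "student_gender": "M"},
]
basic_stats = student_basic_statistics(details)
```

Later queries can read these per-class counts directly instead of rescanning every student row, which is the computational relief the mild summary layer is meant to provide.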
Finally, the moderate summary layer further processes the data from the mild summary layer to form statistical data.
Here, ID denotes a number, an identification number or an account number.
The multidimensional analysis module 13 is connected with the data warehouse module 12 and is used for receiving a multidimensional analysis instruction, invoking the Hive tool to run queries according to the multidimensional analysis instruction, and generating a multidimensional analysis report. Optionally, associated queries may be run on the fine-grained model layer to obtain data analysis results. In the educational data warehouse, the topics and the data within them are extracted from real services, so the warehouse is easy to understand and associations are highly efficient.
For example, when it is necessary to count all questions answered incorrectly by the students of a certain class in a certain month, student information, student answer records, test paper information and test question information can be associated and queried through the Hive tool. When a certain student's habits in using the teaching software to study need to be queried, the student information, the student scores and the flow data can be associated.
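The first associated query can be sketched as a join between student information and answer records. The Python example below uses the standard-library sqlite3 module as a local stand-in for Hive so that it runs without a cluster; the table names, column names and sample rows are hypothetical, and "answered incorrectly" is taken to mean a score below the full score, which is an assumption. Test paper and test question tables are omitted for brevity.

```python
import sqlite3

# In-memory database standing in for the warehouse tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT, class_name TEXT);
CREATE TABLE answer_record (
    student_id TEXT, question_id TEXT, score REAL, full_score REAL, answered_on TEXT
);
INSERT INTO student VALUES ('S1', '6-2'), ('S2', '6-2'), ('S3', '6-3');
INSERT INTO answer_record VALUES
    ('S1', 'Q1', 0, 5, '2020-03-02'),
    ('S1', 'Q2', 5, 5, '2020-03-02'),
    ('S2', 'Q1', 3, 5, '2020-03-15'),
    ('S2', 'Q3', 0, 5, '2020-04-01'),
    ('S3', 'Q1', 0, 5, '2020-03-05');
""")

# Count wrong answers per question for one class within one month.
WRONG_QUESTIONS_SQL = """
SELECT a.question_id, COUNT(*) AS wrong_count
FROM answer_record AS a
JOIN student AS s ON s.student_id = a.student_id
WHERE s.class_name = ?
  AND a.answered_on >= ? AND a.answered_on < ?
  AND a.score < a.full_score
GROUP BY a.question_id
ORDER BY a.question_id
"""

wrong_questions = conn.execute(
    WRONG_QUESTIONS_SQL, ("6-2", "2020-03-01", "2020-04-01")
).fetchall()
```

Class and month are two of the dimensions being filtered; adding knowledge point or subject as further dimensions would only extend the join and the grouping.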
The output module 14 is connected with the multidimensional analysis module 13 and is used for receiving and outputting the multidimensional analysis report sent by the multidimensional analysis module 13. Optionally, the output module 14 further includes a visualization unit, which may use a visualization tool to display the multidimensional analysis report visually based on the report data. Combining data with charts through the visualization tool makes the analysis results more intuitive and understandable.
In an embodiment, FIG. 3 is a second block diagram of an educational data management system according to an embodiment of the present invention. As shown in FIG. 3, the educational data management system further includes a data application module 15, where the data application module 15 is connected to the data warehouse module 12 and is configured to receive a data application instruction and retrieve the corresponding data from the data warehouse module 12; the data application comprises at least one of data analysis, data query, data interface service and BI report. The system provided by the embodiment of the invention is suitable for different application scenarios and various statistical reports, supports data mining and model training by algorithm engineers, and supports flexible external data services. Optionally, when the application instruction involves data mining and algorithm requirements, data of the moderate summary layer may be used to understand the overall data profile, and data of the mild summary layer may be used for model training and algorithm implementation.
The educational data management system further comprises a permission management module 16 for providing different usage permissions according to the permission level of the user, where the permission levels include an analyst permission and an engineer permission. Under the analyst permission, the system can be used normally for application operations such as data analysis. Under the engineer permission, some data parameters of the modules in the system can be modified according to business requirements.
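The two permission levels can be sketched in Python as a lookup from level to allowed operations. The operation names are hypothetical, and granting engineers the analyst operations as well is an assumption; the specification only states what each level permits, not whether the levels are nested.

```python
from enum import Enum


class PermissionLevel(Enum):
    ANALYST = "analyst"
    ENGINEER = "engineer"


# Hypothetical operation names per permission level.
ALLOWED_OPERATIONS = {
    PermissionLevel.ANALYST: {"data_analysis", "data_query"},
    PermissionLevel.ENGINEER: {"data_analysis", "data_query", "modify_parameters"},
}


def is_allowed(level, operation):
    """Return True if the given permission level may perform the operation."""
    return operation in ALLOWED_OPERATIONS[level]
```

A call such as `is_allowed(PermissionLevel.ANALYST, "modify_parameters")` would be rejected under this table, matching the restriction of parameter changes to engineers.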
According to the embodiment, the educational data management system provided by the invention at least has the following beneficial effects:
The system provided by the invention abstracts a plurality of topic models according to the characteristics of internet education data to construct an education data warehouse. As a topic-oriented, integrated, time-variant and relatively stable data collection, the education data warehouse can support data analysis and decision-making. The invention enables education data to be put to use, supports multi-dimensional instant query and analysis of education data, suits a variety of frequently changing analysis scenarios, meets the needs of accurate data tracking, statistical analysis and reporting, and provides a basis for data mining and decision support.
Although some specific embodiments of the present invention have been described in detail by way of examples, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present invention. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.