
Technical Field
The present invention relates to the field of tracing the change history of software code, and in particular to a Git-based method for tracing the life cycle of code lines and an electronic device.
Background
Version control is an important practice in software development. Software projects use a version control system to record the history of code changes, including the developer, time, and content of each change.
Git is currently the most popular version control system. Git uses commits to record each code change. Each commit has a unique ID, and commits are linked by parent-child relationships that express their logical order. These relationships connect the commits in a repository into a directed acyclic graph.
The content of a code change in Git is formatted in the same way as the unified format of the diff tool on Linux systems.
When recording code changes, version control systems such as Git trace change history with the commit as the basic unit. A commit in turn contains the changed code files and the changed lines within those files. When a developer or researcher wants to study the life cycle of a code line, that is, the course of a line of code from its appearance in the code base to its disappearance, the relevant data cannot be obtained directly: a Git repository does not directly provide code line life cycle data.
Summary of the Invention
To solve the above problem, the present invention provides a Git-based code line life cycle tracing method and an electronic device. By traversing the commits in Git and tracking and extracting the change history of code file lines, the method gives the creation, displacement, and demise points of every line in a code file.
The technical content of the present invention includes:
A Git-based code line life cycle tracing method, comprising the following steps:
1) Extract the information of every commit in the Git repository, where the commit information includes: the commit ID, the ID of the corresponding parent commit, the commit author, the commit time, the commit description, and the code change content of the commit;
2) Build a directed acyclic graph of commits from the commit IDs and the corresponding parent commit IDs, traverse the graph in breadth-first order, and, based on the code change content of each commit, track, extract, and record the change history information of code file lines, where this information includes:
a) a file, which contains one or more commits in which the file was created;
b) a commit in which the file was created, which contains zero or more lines;
c) a line, which contains one or more commits in which the line was displaced;
d) a commit in which the line was displaced, which contains the ID of the commit of the previous displacement and the line number after the displacement;
3) For a line of code to be queried in a file in a commit, obtain from the recorded change history information the information of the commits in which that line was created, displaced, or removed, where this information includes: the commit ID, the commit author, the commit time, and the commit description.
Further, the Git command git log --pretty=format:"%H;%P;%an;%ae;%at;%s;%b" is used to obtain the commit ID, the ID of the corresponding parent commit, the commit author, the commit time, and the commit description.
Further, the Git command git diff <parent commit ID> <commit ID> is used to obtain the code change content of the commit.
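As a minimal sketch of how the output of the git log command above might be consumed, the following Python splits one record into its fields. The names CommitInfo, parse_commit_record, and read_commits are hypothetical. Two details are assumptions added here and are not part of the command given in the text: the --all option, used to reach commits on every branch, and the %x00 record terminator, used so that multi-line commit bodies stay within one record.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Field order follows the format string in the text:
# %H commit ID, %P parent IDs, %an author name, %ae author email,
# %at author timestamp (Unix seconds), %s subject, %b body.
LOG_FORMAT = "%H;%P;%an;%ae;%at;%s;%b"


@dataclass
class CommitInfo:  # hypothetical container for one commit
    commit_id: str
    parent_ids: List[str]
    author_name: str
    author_email: str
    timestamp: int
    subject: str
    body: str


def parse_commit_record(record: str) -> CommitInfo:
    """Parse one ';'-separated record produced with LOG_FORMAT."""
    # maxsplit=6 keeps semicolons inside the body intact; a semicolon in an
    # earlier field (for example the subject) would still misalign the fields.
    commit_id, parents, name, email, ts, subject, body = record.split(";", 6)
    # %P lists parent IDs separated by spaces; a root commit has none.
    return CommitInfo(commit_id, parents.split(), name, email, int(ts), subject, body)


def read_commits(repo_path: str) -> List[CommitInfo]:
    """Run git log in repo_path and parse every commit (assumes git is installed)."""
    out = subprocess.run(
        # --all and the %x00 terminator are assumptions of this sketch.
        ["git", "log", "--all", "--pretty=format:" + LOG_FORMAT + "%x00"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    return [parse_commit_record(r.strip("\n")) for r in out.split("\x00") if r.strip()]
```

Keeping the parsing separate from the subprocess call makes the field handling easy to check without a repository at hand.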
Further, the data structure used to record the directed acyclic graph of commits includes a doubly linked list.
Further, the data structure used to record the change history information of code file lines includes a dictionary structure.
Further, the code change content of the commit includes the file path of each changed file and the changed content of the file.
Further, a file is uniquely identified by its file path.
Further, a commit in which a file was created is uniquely identified by its commit ID.
Further, a line is uniquely identified by the sequence number assigned in the order in which it was recorded.
Further, a commit in which a line was displaced is uniquely identified by its commit ID.
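One possible way to lay out the four levels of identifiers above in a dictionary structure is sketched below. The nesting order, the Displacement class, and the helper names are assumptions made for illustration; the text only states that a dictionary structure is used.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Displacement:  # hypothetical record stored per displacement commit
    prev_commit_id: str  # ID of the commit of the previous displacement ("" if none)
    line_no: int         # line number after this displacement; 0 means deleted


# history[file_path][creation_commit_id][line_seq][commit_id] -> Displacement
History = Dict[str, Dict[str, Dict[int, Dict[str, Displacement]]]]


def register_creation(history: History, file_path: str, creation_commit_id: str) -> None:
    """Record that file_path was created in creation_commit_id (possibly with zero lines)."""
    history.setdefault(file_path, {}).setdefault(creation_commit_id, {})


def record_displacement(history: History, file_path: str, creation_commit_id: str,
                        line_seq: int, commit_id: str,
                        prev_commit_id: str, line_no: int) -> None:
    """Attach one displacement commit to the line identified by line_seq."""
    lines = history.setdefault(file_path, {}).setdefault(creation_commit_id, {})
    lines.setdefault(line_seq, {})[commit_id] = Displacement(prev_commit_id, line_no)
```

With this layout, a file that is deleted and created again simply gains a second creation commit key under the same file path.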
Further, the strategy for recording the line number after a displacement includes:
1) if the line is added, directly record the post-change line number given in the code change content;
2) if the line is deleted, directly record 0;
3) if the line is retained, read from the code change content the number of lines added and the number of lines deleted before the retained line, take the line number of the retained line after its previous displacement, add the number of lines added before it, and subtract the number of lines deleted before it.
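The three rules above can be written as a single function. This is a sketch only; the function name, the parameter names, and the string labels for the three cases are hypothetical.

```python
def line_number_after(kind: str, prev_line_no: int = 0, added_before: int = 0,
                      deleted_before: int = 0, new_line_no: int = 0) -> int:
    """Return the line number to record for one line in one commit.

    kind is "added", "deleted", or "retained" (labels chosen for this sketch).
    """
    if kind == "added":
        # Rule 1: take the post-change line number straight from the diff.
        return new_line_no
    if kind == "deleted":
        # Rule 2: a deleted line is recorded as line 0.
        return 0
    if kind == "retained":
        # Rule 3: shift the previous line number by the net number of lines
        # added and deleted before this line.
        return prev_line_no + added_before - deleted_before
    raise ValueError(f"unknown kind: {kind}")
```

For instance, a retained line previously at line 9 with 2 lines added and 1 line deleted before it is recorded at line 9 + 2 - 1 = 10.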
A storage medium storing a computer program, where the computer program is configured to perform the above method when run.
An electronic device comprising a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the above method.
Compared with the prior art, the present invention has the following advantages:
1) it traces code life cycle data at line granularity, including the creation point, displacement points, and demise point of a code line;
2) it can trace the complete change history of a code line across multiple branches;
3) given any line of any file in the snapshot of any commit, it can directly return the full life cycle data of that line.
Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an embodiment of the Git-based code line life cycle tracing method of the present invention.
Detailed Description
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
As shown in FIG. 1, an embodiment of the Git-based code line life cycle tracing method of the present invention may include the following steps:
Step 1: extract the information of every commit in the Git repository, including the commit ID, the parent commit ID, the commit author, time, and description, and the code change content of the commit;
Example of code change content:
In the example, the file abc.c is changed. The hunk header "@@ -6,5 +6,6 @@" indicates that the content shown after it covers 5 consecutive lines starting at line 6 before the change and 6 consecutive lines starting at line 6 after the change. Lines beginning with + are added lines, and lines beginning with - are deleted lines.
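The example diff itself is not reproduced in this text. The hunk below is a reconstruction that is consistent with the hunk header above and with the line numbers quoted for this example in step 2; the file header lines and the last context line are assumptions. The hypothetical helper walk_hunk sketches how pre-change and post-change line numbers can be read off a unified-format hunk.

```python
import re

# Reconstructed example; the final context line is an assumed placeholder.
EXAMPLE_DIFF = """\
--- a/abc.c
+++ b/abc.c
@@ -6,5 +6,6 @@
 int main(){
 int a=0;
-return 0;
+int b=1;
+return a;
 }
 /* assumed trailing context line */
"""

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def walk_hunk(diff_text):
    """Yield (kind, old_line_no, new_line_no, text) for each line of a hunk.

    Simplification: handles the hunks of a single file only and ignores
    markers such as "\\ No newline at end of file".
    """
    old_no = new_no = None
    for line in diff_text.splitlines():
        m = HUNK_HEADER.match(line)
        if m:
            # Start positions before and after the change.
            old_no, new_no = int(m.group(1)), int(m.group(3))
            continue
        if old_no is None:
            continue  # file header lines before the first hunk
        if line.startswith("+"):
            yield ("added", 0, new_no, line[1:])
            new_no += 1
        elif line.startswith("-"):
            yield ("deleted", old_no, 0, line[1:])
            old_no += 1
        else:
            yield ("retained", old_no, new_no, line[1:])
            old_no += 1
            new_no += 1


for entry in walk_hunk(EXAMPLE_DIFF):
    print(entry)
```

In this sketch a deleted line carries post-change line number 0 and an added line carries pre-change line number 0, matching the convention that line number 0 means the line does not exist.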
Preferably, the Git command git log --pretty=format:"%H;%P;%an;%ae;%at;%s;%b" is used to obtain the commit ID, the parent commit ID, and the commit author, time, and description;
Preferably, the Git command git diff <parent commit ID> <commit ID> is used to obtain the code change content.
Step 2: build a directed acyclic graph of commits from the commit IDs and corresponding parent commit IDs extracted in step 1, traverse the graph in breadth-first order, and track, extract, and record the following change history information of code file lines:
File: uniquely identified by its file path. A file contains one or more commits in which the file was created; when a file is deleted and then created again, there are multiple such commits;
Commit in which the file was created: uniquely identified by its commit ID. It contains zero or more lines; when an empty file is created, it contains zero lines;
Line: numbered in the order in which it is recorded, with the number serving as its unique identifier. A line contains one or more commits in which the line was displaced;
Commit in which the line was displaced: uniquely identified by its commit ID. It contains the ID of the commit of the previous displacement and the line number after the displacement. Both the addition and the deletion of a line are treated as displacements:
(1) if the line is added, directly record the post-change line number given in the code change content. For the example in step 1:
int b=1; is line 8,
return a; is line 9;
(2) if the line is deleted, directly record 0. For the example in step 1:
return 0; is line 0;
(3) if the line is retained, read from the code change content the number of lines added and the number of lines deleted before that line, take the line number of the line after its previous displacement, add the number of lines added before it, and subtract the number of lines deleted before it. For the example in step 1:
int main(){ is line 6,
int a=0; is line 7,
} is line 9+2-1=10 (previous line number 9, plus 2 lines added before it, minus 1 line deleted before it);
Preferably, to support fast queries, a doubly linked list is used to store the directed acyclic graph of commits, and a dictionary is used to store the change history information of code file lines.
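A minimal sketch of the breadth-first traversal follows, assuming the graph is given as a mapping from each commit ID to the list of its parent IDs. The sketch derives the reverse child links so that the graph can be walked in both directions, which is one reading of "doubly linked" here; the function names and the choice to start from root commits are assumptions.

```python
from collections import deque
from typing import Dict, List


def build_children(parents_of: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Derive child links from parent links so the graph can be walked both ways."""
    children_of: Dict[str, List[str]] = {cid: [] for cid in parents_of}
    for cid, parents in parents_of.items():
        for parent in parents:
            children_of.setdefault(parent, []).append(cid)
    return children_of


def bfs_order(parents_of: Dict[str, List[str]]) -> List[str]:
    """Return commit IDs in breadth-first order, starting from root commits."""
    children_of = build_children(parents_of)
    roots = [cid for cid, parents in parents_of.items() if not parents]
    queue = deque(roots)
    seen = set(roots)
    order = []
    while queue:
        cid = queue.popleft()
        order.append(cid)  # the diff of this commit would be processed here
        for child in children_of.get(cid, []):
            if child not in seen:  # a merge commit is queued only once
                seen.add(child)
                queue.append(child)
    return order


# Hypothetical history: A is the root, B and C branch from A, D merges B and C.
print(bfs_order({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}))
```

In this sketch a merge commit is visited once, when the first of its parents is dequeued.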
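Step 3: given a line of code in a file in a commit, query the records formed in step 2 and return the information of the commits in which the line was created, displaced, and removed, including the commit ID, the parent commit ID, the commit author, time, and description, and the code change content of the commit. The commit in which a line was created is the "commit in which the line was displaced" whose line number changes from 0 to non-zero, and the commit in which a line was removed is the "commit in which the line was displaced" whose line number changes from non-zero to 0.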
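The creation and removal rule in step 3 can be sketched as a pass over the displacement records of one line. The input layout, a chronological list of (commit ID, line number) pairs, and the function name are assumptions made for illustration.

```python
from typing import List, Tuple


def classify_events(displacements: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Label each displacement commit of one line.

    displacements holds (commit_id, line_no_after) pairs in chronological order.
    A line that does not exist yet is treated as being at line 0.
    """
    events = []
    prev_no = 0
    for commit_id, line_no in displacements:
        if prev_no == 0 and line_no != 0:
            kind = "created"    # line number changes from 0 to non-zero
        elif prev_no != 0 and line_no == 0:
            kind = "removed"    # line number changes from non-zero to 0
        else:
            kind = "displaced"  # the line moved but still exists
        events.append((commit_id, kind))
        prev_no = line_no
    return events


# Hypothetical record: added at line 8, moved to line 10, then deleted.
print(classify_events([("c1", 8), ("c2", 10), ("c3", 0)]))
```

The author, time, and description of each labelled commit can then be looked up by commit ID in the information extracted in step 1.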
The experimental steps were as follows:
1) randomly select 10 software project Git repositories from the 1000 most-starred Git repositories on GitHub;
2) use the present invention to extract the line-granularity code change history from each Git repository;
3) randomly select 10 commits from each of the 10 Git repositories, randomly select one line of one file from each commit, and use the present invention to query the life cycle data of these lines;
4) randomly select 10 of the results of step 3) for a manual walkthrough to check whether they are accurate.
Experimental results:
The experimental results show that the present invention retrieved line life cycle data for 100% of the queries, and the data obtained was 100% accurate.
The above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. A person of ordinary skill in the art may modify the technical solutions of the present invention or replace them with equivalents without departing from the spirit and scope of the present invention. The scope of protection of the present invention shall be as set out in the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110013631.7A, CN112698866B (en) | 2021-01-06 | 2021-01-06 | A Git-based code line life cycle tracing method and electronic device |
| Publication Number | Publication Date |
|---|---|
| CN112698866A (en) | 2021-04-23 |
| CN112698866B (en) | 2022-06-17 |
| Publication number | Publication date |
|---|---|
| CN112698866A (en) | 2021-04-23 |
| Publication | Publication Date | Title |
|---|---|---|
| CN108089893B (en) | Method and device for determining redundant resources, terminal equipment and storage medium | |
| US9400733B2 (en) | Pattern matching framework for log analysis | |
| CN103559323B (en) | Database implementation method | |
| CN110109910A (en) | Data processing method and system, electronic equipment and computer readable storage medium | |
| CN111125298A (en) | Method, equipment and storage medium for reconstructing NTFS file directory tree | |
| CN103617277A (en) | Method for restoring data table content deleted mistakenly | |
| CN112698866B (en) | A Git-based code line life cycle tracing method and electronic device | |
| CN110716739A (en) | Code change information statistical method, system and readable storage medium | |
| CN111061742B (en) | Method and device for marking data and service system thereof | |
| CN111414362A (en) | Data reading method, device, equipment and storage medium | |
| CN106155832A (en) | Method, device and the Android device that a kind of data are recovered | |
| CN108009049B (en) | MYISAM storage engine deleted record offline recovery method and storage medium | |
| CN105095247A (en) | Symbolic data analysis method and system | |
| CN114153690B (en) | Program memory monitoring method, device, computer equipment and storage medium | |
| CN104090922A (en) | Method and device for clearing privacy data | |
| CN112363814B (en) | Task scheduling method, device, computer equipment and storage medium | |
| CN106503186A (en) | A kind of data managing method, client and system | |
| CN114968725B (en) | Task dependency correction method, device, computer equipment and storage medium | |
| US20180349443A1 (en) | Edge store compression in graph databases | |
| WO2019227705A1 (en) | Image entry method, server and computer storage medium | |
| CN111190896B (en) | Data processing method, device, storage medium and computer equipment | |
| CN116662327B (en) | Data fusion cleaning method for database | |
| CN111241096A (en) | Text extraction method, system, terminal and storage medium for EXCEL document | |
| CN116450664A (en) | Data processing method, device, equipment and storage medium | |
| CN108845857A (en) | A kind of icon management method and device based on cloud platform |