Suggesting Natural Method Names to Check Name Consistencies
Misleading method names in a project, or misleading API names in a software library, confuse developers about program functionality and API usage, leading to API misuse and defects. In this paper, we introduce MNire, a machine learning approach to check the consistency between the name of a given method and its implementation. MNire first generates a candidate name and compares the current name against it. If the two names are sufficiently similar, we consider the method name consistent. To generate the method name, we draw our ideas and intuition from an empirical study on the nature of method names in a large dataset. Our key finding is that a high proportion of the tokens in a method's name can be found in three contexts of the method: its body, its interface (the method's parameter types and return type), and the name of its enclosing class. Even when such tokens are not present, MNire can use the contexts to predict them due to high co-occurrence likelihoods. Our unique idea is to treat name generation as abstractive summarization over the tokens collected from the names of the program entities in these three contexts.
We conducted several experiments to evaluate MNire on method name consistency checking and method name recommendation, using large datasets totaling over 14M methods. In detecting inconsistent method names, MNire relatively improves over the state-of-the-art approach by 10.4% in recall and 11% in precision. In method name recommendation, MNire relatively improves over the state-of-the-art technique, code2vec, in both recall (18.2% higher) and precision (11.1% higher). To assess MNire's usefulness, we used it to detect inconsistent method names and suggest new names in several active GitHub projects. We made 50 pull requests and received 42 responses. Among them, 5 PRs were merged into the main branch, and 13 were approved for later merging. In total, in 31/42 cases, the developer teams agreed that our suggested names are more meaningful than the current ones, showing MNire's usefulness.
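To make the checking step concrete, here is a minimal Python sketch of the comparison half of the pipeline, assuming the candidate name has already been produced by the trained summarization model. The camelCase splitting rule, the Jaccard similarity measure, and the 0.5 threshold are illustrative assumptions, not the paper's exact settings.

```python
import re

def subtokens(name: str) -> set:
    """Split a camelCase/snake_case identifier into lower-cased subtokens."""
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", name)
    return {p.lower() for p in parts if p}

def name_similarity(current: str, candidate: str) -> float:
    """Jaccard overlap between the subtoken sets of two method names."""
    a, b = subtokens(current), subtokens(candidate)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def is_consistent(current_name: str, candidate_name: str, threshold: float = 0.5) -> bool:
    """Treat the current name as consistent with its implementation if it is
    sufficiently similar to the name generated from the method's contexts."""
    return name_similarity(current_name, candidate_name) >= threshold

# Example: the (not shown) model summarizes a method's contexts as "getUserName",
# but the method is currently named "fetchName" -> flagged as inconsistent.
print(is_consistent("fetchName", "getUserName"))  # similarity 0.25 -> False
```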
Exploratory Study
Uniqueness of Method Names
62.9% of full method names are unique. Therefore, for a given method, one cannot rely on simply searching for a good name among previously seen method names.
78.1% of the tokens in method names can be found in other, previously seen method names.
|  | Method name | Token |
|---|---|---|
| Mean #occurrences | 4.8 | 400.3 |
| Median #occurrences | 1 | 3 |
| #occurrences = 1 | 62.9% | 21.9% |
| #occurrences > 1 | 37.1% | 78.1% |
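As a reference point for how such corpus statistics can be gathered, the sketch below counts full-name and subtoken occurrences over a plain list of method names. The camelCase splitting rule is the usual subtoken convention and the toy corpus is made up; this mirrors the measurement in spirit rather than reproducing the study's exact tooling.

```python
import re
from collections import Counter
from statistics import mean, median

def subtokens(name: str) -> list:
    """Split a camelCase/snake_case method name into lower-cased subtokens."""
    return [p.lower() for p in re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", name) if p]

def occurrence_stats(method_names: list) -> dict:
    """Occurrence statistics for full method names and for their subtokens."""
    name_counts = Counter(method_names)
    token_counts = Counter(t for name in method_names for t in subtokens(name))

    def summarize(counts: Counter) -> dict:
        values = list(counts.values())
        return {
            "mean #occurrences": mean(values),
            "median #occurrences": median(values),
            "% occurring once": 100 * sum(v == 1 for v in values) / len(values),
            "% occurring more than once": 100 * sum(v > 1 for v in values) / len(values),
        }

    return {"method names": summarize(name_counts), "tokens": summarize(token_counts)}

# Toy usage on a four-method corpus
print(occurrence_stats(["getName", "getValue", "setName", "parseHeader"]))
```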
Common tokens shared between a method name and the contexts
A high proportion of the tokens in method names are shared with the three contexts. Likewise, a high percentage of methods have names that share tokens with the names of the program entities in those contexts.
Conditional occurrences of tokens in method names given the contexts
When a token appears among the names of the program entities used in a method's body, in 35.9% of the cases that token also appears in the method's name.
Even when the tokens are not found in the contexts, one can still use the contexts to predict the tokens of the method name, due to these high conditional co-occurrence probabilities.
Each of the contexts is a stronger indicator of the tokens occurring in good (consistent) names than of those in inconsistent names.
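Such conditional figures can be estimated directly from a corpus. Below is a minimal sketch, assuming each method is represented by two subtoken sets: its name tokens and the tokens collected from one context (e.g., the names of entities used in its body); the context extraction itself is assumed to have happened upstream.

```python
def conditional_name_occurrence(methods):
    """Estimate P(token appears in the method name | token appears in a context).

    `methods` is an iterable of (name_tokens, context_tokens) pairs, where both
    elements are sets of lower-cased subtokens."""
    in_context = 0      # token occurrences observed in the context
    also_in_name = 0    # of those, how many also occur in the method's name
    for name_tokens, context_tokens in methods:
        in_context += len(context_tokens)
        also_in_name += len(context_tokens & name_tokens)
    return also_in_name / in_context if in_context else 0.0

# Toy usage: two methods, with tokens drawn from entity names in their bodies
methods = [
    ({"get", "user", "name"}, {"user", "name", "builder"}),
    ({"save", "file"},        {"file", "stream", "path"}),
]
print(conditional_name_occurrence(methods))  # (2 + 1) / (3 + 3) = 0.5
```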
Accuracy Comparison
MNire outperformed the state-of-the-art approaches in both consistency checking and name recommendation.
43.1% of the suggested method names exactly match the oracle (while only 37.2% of the method names in the dataset occur more than once).
13.1% of the generated names were not previously seen in the training data.
|  |  | Liu et al. | MNire |
|---|---|---|---|
| IC | Precision | 56.8 | 62.7 |
|  | Recall | 84.5 | 93.6 |
|  | F-score | 67.9 | 75.1 |
| C | Precision | 51.4 | 56.0 |
|  | Recall | 72.2 | 84.2 |
|  | F-score | 60.0 | 67.3 |
| Accuracy |  | 60.9 | 68.9 |

IC and C denote the inconsistent-name and consistent-name classes, respectively.
|  | code2vec | MNire |
|---|---|---|
| Precision | 63.1 | 70.1 |
| Recall | 54.4 | 64.3 |
| F-score | 58.4 | 67.1 |
| Exact Match | - | 43.1 |
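For reference, precision, recall, F-score, and exact match in name-recommendation evaluations of this kind are typically computed at the subtoken level, as in code2vec-style studies: each predicted name is compared with the reference name token by token, and corpus-level scores are aggregated over all test methods. The sketch below shows the per-name computation; it reflects that standard definition, not the authors' exact evaluation script.

```python
def name_prediction_metrics(predicted_tokens: set, reference_tokens: set):
    """Subtoken-level precision/recall/F1 and exact match for one prediction.

    Both arguments are sets of lower-cased subtokens of a method name.
    Comparing sets ignores token order, a simplifying assumption here."""
    overlap = len(predicted_tokens & reference_tokens)
    precision = overlap / len(predicted_tokens) if predicted_tokens else 0.0
    recall = overlap / len(reference_tokens) if reference_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    exact = predicted_tokens == reference_tokens
    return precision, recall, f1, exact

# Example: predicting "getItemCount" when the reference name is "countItems"
print(name_prediction_metrics({"get", "item", "count"}, {"count", "items"}))
```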
Study on Accuracy by the Sizes of Methods
Context Analysis Evaluation Results
|  |  | IMP | IMP+INF | IMP+ENC | IMP+INF+ENC (MNire) |
|---|---|---|---|---|---|
| IC | Precision | 60.2 | 61.7 | 61.0 | 62.7 |
|  | Recall | 90.0 | 92.1 | 91.3 | 93.6 |
|  | F-score | 72.1 | 73.9 | 73.1 | 75.1 |
| C | Precision | 53.2 | 55.1 | 54.1 | 56.0 |
|  | Recall | 79.3 | 82.3 | 80.6 | 84.2 |
|  | F-score | 63.7 | 66.0 | 64.7 | 67.3 |
| Accuracy |  | 62.1 | 65.2 | 64.2 | 68.9 |

IMP, INF, and ENC denote the implementation (body), interface, and enclosing-class-name contexts, respectively.
|  | IMP | IMP+INF | IMP+ENC | IMP+INF+ENC (MNire) |
|---|---|---|---|---|
| Precision | 49.7 | 63.2 | 54.4 | 66.4 |
| Recall | 43.3 | 57.8 | 48.9 | 61.1 |
| F-score | 46.3 | 60.4 | 51.5 | 63.6 |
| Exact match | 20.2 | 34.7 | 25.7 | 43.1 |
Sensitivity Results
Accuracy with Different Representations
|  |  | Lexeme | AST | Graph | MNire |
|---|---|---|---|---|---|
| IC | Precision | 59.0 | 57.2 | 55.3 | 62.7 |
|  | Recall | 88.3 | 85.6 | 80.3 | 93.6 |
|  | F-score | 70.7 | 68.6 | 65.5 | 75.1 |
| C | Precision | 47.1 | 46.2 | 45.8 | 56.0 |
|  | Recall | 78.2 | 73.5 | 72.1 | 84.2 |
|  | F-score | 58.8 | 56.8 | 56.0 | 67.3 |
| Accuracy |  | 52.0 | 51.1 | 50.5 | 68.9 |
|  | Lexeme | AST | Graph | MNire |
|---|---|---|---|---|
| Precision | 29.5 | 23.1 | 16.2 | 50.6 |
| Recall | 25.1 | 29.2 | 30.3 | 45.1 |
| F-score | 27.1 | 25.9 | 21.1 | 47.7 |
| Exact Match | 9.1 | 8.1 | 4.7 | 22.1 |
This result suggests that the naturalness of names matters more for the problem of method name suggestion. While structures and dependencies are important for code execution, to suggest a method name, which is an abstraction of the entire method, using the tokens of the names in the contexts as in MNire yields better performance.
Impact of Contexts' Size and Lengths of Tokens in Contexts on Accuracy
|  | 1-10 tokens | 10-20 tokens | 20-30 tokens | 30+ tokens |
|---|---|---|---|---|
| F-score | 35.9 | 41.1 | 43.2 | 51.0 |
|  | 0-80% | 80-90% | 90-95% | 95%+ |
|---|---|---|---|---|
| F-score | 37.0 | 39.4 | 42.5 | 48.5 |
Impact of Training Data’s Size on Accuracy
|  |  | 1.0M | 1.25M | 1.5M | 1.75M | 2.0M |
|---|---|---|---|---|---|---|
| IC | Precision | 59.6 | 60.1 | 60.3 | 61.5 | 62.7 |
|  | Recall | 92.2 | 92.5 | 93.2 | 93.4 | 93.6 |
|  | F-score | 72.4 | 72.9 | 73.2 | 74.2 | 75.1 |
| C | Precision | 52.7 | 53.5 | 54.4 | 55.2 | 56.0 |
|  | Recall | 81.4 | 81.9 | 82.6 | 83.5 | 84.2 |
|  | F-score | 63.9 | 64.7 | 65.6 | 66.5 | 67.3 |
| Accuracy |  | 62.6 | 63.8 | 65.8 | 67.3 | 68.9 |
|  | 1.0K | 2.5K | 5.0K | 7.5K | 10.0K |
|---|---|---|---|---|---|
| Precision | 41.1 | 56.2 | 63.1 | 64.9 | 66.4 |
| Recall | 47.8 | 53.7 | 57.6 | 59.5 | 61.1 |
| F-score | 44.2 | 54.9 | 60.2 | 62.0 | 63.6 |
| Exact Match | 19.9 | 29.4 | 34.8 | 26.9 | 38.2 |
Impact of Threshold for Consistency Checking
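The numbers for this experiment are not reproduced here, but the trade-off it explores is mechanical: with the decision rule "flag as inconsistent when similarity falls below the threshold," raising the threshold flags more methods, typically trading precision for recall on the inconsistent class. A minimal sweep over a labeled sample, reusing the `name_similarity` helper sketched earlier, could look like the following; the threshold grid and data format are illustrative assumptions.

```python
def sweep_thresholds(labeled_pairs, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Precision/recall of inconsistency detection at several thresholds.

    `labeled_pairs` is an iterable of (current_name, generated_name, is_inconsistent)
    triples; `name_similarity` is the subtoken-overlap helper sketched earlier."""
    results = {}
    for t in thresholds:
        tp = fp = fn = 0
        for current, generated, truly_inconsistent in labeled_pairs:
            flagged = name_similarity(current, generated) < t  # below threshold -> inconsistent
            if flagged and truly_inconsistent:
                tp += 1
            elif flagged and not truly_inconsistent:
                fp += 1
            elif not flagged and truly_inconsistent:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results[t] = (precision, recall)
    return results
```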
Datasets
|  | Test data | Train data |
|---|---|---|
| #Methods | 2,700 | 1,962,872 |
| #Files | - | 250,972 |
| #Projects | - | 430 |
| #Unique method names | - | 540,237 |
| #occurrence > 1 | - | 33.5% |
|  | Test data | Train data | Total |
|---|---|---|---|
| Comparison Experiment with code2vec (Download) |  |  |  |
| #Files | 61,641 | 1,746,272 | 1,807,913 |
| #Methods | 458,800 | 14,000,028 | 14,458,828 |
| Experiments for RQ4, RQ5, and RQ6 (Download) |  |  |  |
| #Projects | 450 | 9,772 | 10,222 |
| #Files | 51,631 | 1,756,282 | 1,807,913 |
| #Methods | 466,800 | 13,992,028 | 14,458,828 |
| Live Study on Real Developers (Download) |  |  |  |
| #Projects | 100 | - | 100 |
| #Files | 18,980 | - | 18,980 |
| #Methods | 139,827 | - | 139,827 |