Paper discussion: Gorla et al. [1]

We discuss the paper “Checking App Behavior Against App Descriptions” [1] by A. Gorla et al.

Paper

  • Appeared in 2014
  • Published at ICSE, one of the top conferences in software engineering (SE)
  • Cited 247 times to date
(Figure: citations over time.)

People

  • Alessandra Gorla: Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain.

  • Ilaria Tavecchia: Data Scientist at Hyper Anna, Sydney, New South Wales, Australia.

  • Florian Gross: Distributed Systems Engineer (Fraud) at Twitch, Berlin, Germany.

  • Andreas Zeller: Full Professor for Software Engineering at Saarland University in Saarbrücken, Germany.

  • The work for this paper took place at Saarland University.

  • At the time of publication, the first three authors were research assistants/associates at Saarland University.

Motivation

Why do this research?

  • Validating whether a program does what it claims to do is a long-standing problem for developers.
  • Users of mobile devices also have to care about such issues.
  • Researchers and industrial partners have used traditional methods (static and dynamic analysis) to identify malicious behaviors in software systems based on predefined patterns (e.g. specifications).
  • Guarding apps against new attacks requires new approaches that take into account the context on which each app depends.
  • A behavior considered malicious for one app may be a feature of another app.

Research method

What does the paper do?

To check implemented app behavior against advertised app behavior, the paper presents CHABADA, which (a sketch of steps 2–3 follows the list):

  1. collects 22,500+ good Android apps from Google Play,
  2. applies Natural Language Processing (NLP) to the app descriptions to identify their main topics,
  3. clusters apps by related topics (e.g. navigation),
  4. identifies, for each cluster of apps, usages of permission-related APIs, and
  5. uses machine learning (ML) to spot abnormal API usage in apps (e.g. a weather app that accesses the messaging API, which is unusual for an app of this category).
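
As a rough illustration of steps 2 and 3: the paper derives topics with Latent Dirichlet Allocation (LDA) and groups apps with K-means. Below is a minimal Python sketch assuming scikit-learn; the sample descriptions, topic count, and parameters are illustrative stand-ins, not the paper's actual configuration or corpus.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.cluster import KMeans

    # Hypothetical app descriptions standing in for the Google Play corpus.
    descriptions = [
        "Offline maps and turn-by-turn navigation for drivers",
        "Local weather forecast with radar and storm alerts",
        "Send free text messages and share photos with friends",
    ]

    # Step 2: extract topics from the natural-language descriptions (LDA).
    counts = CountVectorizer(stop_words="english").fit_transform(descriptions)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_weights = lda.fit_transform(counts)  # one topic distribution per app

    # Step 3: cluster apps by their topic distributions (e.g. "navigation").
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topic_weights)
    print(clusters)  # cluster label per app

In the actual study, the topic model is fitted on thousands of preprocessed descriptions, and each app is assigned to a cluster based on its topic distribution.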

Evaluation

How does the paper evaluate the presented methods?

RQ1 Can CHABADA effectively identify anomalies (i.e. mismatches between description and behavior) in Android apps?

  • Manual investigation of the top reported outliers and classification of each app's behavior.

RQ2 Can CHABADA be used to reveal malicious Android apps?

  • Evaluation against a set of known malicious apps, using the ML classifier as a malware detector (see the sketch below).
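
Concretely, the paper's classifier is a One-Class SVM (OC-SVM) trained, per cluster, on each app's use of sensitive APIs. A minimal sketch assuming scikit-learn's OneClassSVM; the binary API-usage features and the nu/gamma values are illustrative assumptions.

    import numpy as np
    from sklearn.svm import OneClassSVM

    # One row per app in a cluster, one column per sensitive
    # (permission-related) API; 1 means the app calls that API.
    # Hypothetical toy data for a "weather" cluster.
    api_usage = np.array([
        [1, 0, 0],  # location API only -- typical for the cluster
        [1, 0, 0],
        [1, 1, 0],  # location + internet
        [1, 0, 1],  # also uses the messaging API -- the intended anomaly
    ])

    # Fit on the cluster's apps and flag the ones that deviate.
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(api_usage)
    print(clf.predict(api_usage))  # -1 marks outliers, +1 marks inliers

Training only on apps assumed to be benign is what lets the approach flag malware without knowing existing malware patterns.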

Results

What does the paper find?

  • CHABADA is able to find several examples of false advertising, plain fraud, and other questionable behavior. In the investigation for RQ1, the authors classified the 160 top outliers as 26% malicious, 13% dubious, and 61% benign.

  • CHABADA is effective as a malware detector. In the investigation for RQ2, the authors found that it detects the majority of malware (correctly identifying 56% of the malicious apps), even without knowing existing malware patterns.

Discussion / Implications

Why are the results important?

  • App vendors need to be explicit about what their apps do.
  • App store operators should introduce better standards to prevent deceptive or incomplete advertising.
  • Software platforms, such as Android, should ask their users for permissions in a fully comprehensible way.
  • CHABADA can help users by highlighting, in an accessible way, differences between the advertised and actual behavior of apps.

Questions

Technical questions

  • How do you find the presentation of the results?

  • What can we really do with these results?

  • What do you think about the related work?

Meta questions

  • What do we think about this paper in general?

  • How could we improve this paper ourselves?

  • Can you easily replicate this study?

  • Did the future work ever happen?

Discussion summary

To be filled in after discussion!

Paper discussion: 2nd paper to be added here!

Discussion summary

To be filled in after discussion!

References

[1] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller, “Checking app behavior against app descriptions,” in Proceedings of the 36th International Conference on Software Engineering (ICSE), 2014, pp. 1025–1035.

[2] M. Linares-Vásquez, G. Bavota, C. Bernal-Cárdenas, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “API change and fault proneness: A threat to the success of Android apps,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013, pp. 477–487.

[3] G. Bavota, M. Linares-Vásquez, C. E. Bernal-Cárdenas, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “The impact of API change- and fault-proneness on the user ratings of Android apps,” IEEE Transactions on Software Engineering, vol. 41, no. 4, pp. 384–407, 2015.

[4] N. Chen, J. Lin, S. C. H. Hoi, X. Xiao, and B. Zhang, “AR-Miner: Mining informative reviews for developers from mobile app marketplace,” in Proceedings of the 36th International Conference on Software Engineering (ICSE), 2014, pp. 767–778.

[5] S. Panichella, A. Di Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall, “How can I improve my app? Classifying user reviews for software maintenance and evolution,” in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015, pp. 281–290.

[6] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, “A survey of app store analysis for software engineering,” IEEE Transactions on Software Engineering, vol. 43, no. 9, pp. 817–847, 2017.