This is a list of useful libraries for developing new “Big Code” tools.

Add your library by creating a pull request here.


codemining-* is a suite of Java-based tools for tokenizing and parsing Java code. The repository also contains code to analyze Git repositories.
  • codemining-core contains code for tokenizing Java, JavaScript, Python, C and C++ on the JVM.
  • codemining-treelm contains Java AST parsing and tree-level language models.
  • commitmining-tools contains tools for traversing a Git repository, its history and possibly its files.

Tags: #codeanalysis