Just as vast amounts of data on the web enabled Big Data applications, large repositories of programs (e.g. open source code on GitHub) now enable a new class of applications that leverage these repositories of "Big Code". Using "Big Code" means automatically learning from existing code in order to solve tasks such as predicting program bugs, predicting program behavior, predicting identifier names, or automatically creating new code. The topic spans interdisciplinary research in Machine Learning (ML), Programming Languages (PL) and Software Engineering (SE). This website lists some of the state-of-the-art techniques in the area.

Have a look at the current challenges to be solved by "Big Code".

Download or try amazing tools that leverage "Big Code".

Do research and download some of the existing datasets to compare your solution to the state of the art.

If you would like to contribute, fork this repository, make edits and create a pull request using GitHub.

This site is a result of the Dagstuhl seminar "Programming from Big Code".

Its goal is to serve as an access point where researchers and practitioners can collaborate and define the important challenges of the community. The site should be able to act as:

  • an access point where developers can find links to working tools that leverage “Big Code”.
  • a collection of challenges and datasets that allow the research community to jointly solve important problems and/or compare solutions.

This website is not meant to replace scientific conferences, so “publishing” here should in principle happen only after publishing elsewhere. Nor is the goal to become an unordered list of conference proceedings; instead, works should be grouped per dataset or per challenge.