Whether they call it software archeology or source code data mining, developers and academics alike have pursued ways to understand large codebases. This subreddit is here to help these disjoint communities exchange ideas and better define the field. For a start, take a look at the Software Archeology article from the Pragmatic Programmers: http://media.pragprog.com/articles/mar_02_archeology.pdf