Repository File Statistics

Download full source code.

First off, the code is crude; I’m not trying to make it efficient. It works for my purposes and might work for yours.

I have a lot of repositories, both my own and third-party ones. I wanted to see what types of files are in them and how many lines are code versus comments, blanks, etc. I was going to do this with bash, awk, and grep because that is what I have used in the past, but I now find it easier to do in C#: less fighting with precise regexes, spaces around equals signs, memory structures, etc.

I wrote a small C# program to do this. It starts at a directory of my choosing and recursively goes through all the subdirectories, identifying the git repositories. Then it goes through each repository and analyzes each file. It’s nothing groundbreaking, but I couldn’t find anything that did exactly what I wanted, so I wrote it myself.

Here is some example output -

Repo: https://github.com/App-vNext/Polly/
Directory: c:\dev\github\polly\Polly

File type: .cs
File count: 778
Total lines: 94269
Code lines: 49861
Commented lines: 14732
Brace only lines: 13309
Blank lines: 16367

File type: .xml
File count: 31
Total lines: 75356
Blank lines: 118

File type: .json
File count: 19
Total lines: 568
Blank lines: 0

File type: .csproj
File count: 20
Total lines: 510
Blank lines: 58

File type: .js
File count: 1
Total lines: 10
Blank lines: 0

-------------------------------------------------
-------------------------------------------------

Repo: https://github.com/App-vNext/Polly.Caching.MemoryCache
Directory: c:\dev\github\polly\Polly.Caching.MemoryCache

File type: .cs
File count: 9
Total lines: 624
Code lines: 336
Commented lines: 58
Brace only lines: 122
Blank lines: 108

File type: .csproj
File count: 2
Total lines: 89
Blank lines: 6

-------------------------------------------------
-------------------------------------------------
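The "Code lines" figure is not counted directly; it is whatever is left once commented, brace-only, and blank lines are subtracted from the total. A quick check against the Polly `.cs` numbers above (the variable names are only for illustration):

```csharp
using System;

// Figures copied from the .cs entry of the first repo above.
int totalLines = 94269;
int commentedLines = 14732;
int braceOnlyLines = 13309;
int blankLines = 16367;

// Anything that is not a comment, a lone brace, or blank is treated as code.
int codeLines = totalLines - commentedLines - braceOnlyLines - blankLines;
Console.WriteLine($"Code lines: {codeLines}"); // prints "Code lines: 49861"
```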

The Code

The code is fairly boring and simple, so I’m not going to go through it in detail. It’s a console application that starts from a root directory (hardcoded in `Program.cs` below), goes through all the subdirectories looking for git repositories, and then analyzes the files in each one.

Here is `Program.cs` -
{{ highlight csharp "linenos=false" }}
string[] fileTypesToSearchFor = [".cs", ".csproj", ".ts", ".js", ".css", ".html", ".cshtml", ".sql", ".json", ".xml"];
RepoStatistics repoStatistics = new RepoStatistics();
var repoAndDirectories = repoStatistics.GetRepoAndDirectories(@"c:\dev\github\polly");

foreach(var repoAndDirectory in repoAndDirectories)
{
    var dirResult = repoStatistics.ProcessDirectory(repoAndDirectory.DirectoryPath, fileTypesToSearchFor);
    repoAndDirectory.RepoResult = dirResult;
}

foreach (var repoAndDirectory in repoAndDirectories)
{
    Console.WriteLine($"Repo: {repoAndDirectory.RepoName}");
    Console.WriteLine($"Directory: {repoAndDirectory.DirectoryPath}");
    Console.WriteLine();
    foreach (var repoResult in repoAndDirectory.RepoResult.OrderByDescending(kv => kv.Value.TotalLines))
    {
        Console.WriteLine($"File type: {repoResult.Key}");
        Console.WriteLine($"File count: {repoResult.Value.FileCount}");
        Console.WriteLine($"Total lines: {repoResult.Value.TotalLines}");
        if (repoResult.Key == ".cs")
        {
            // Code lines are whatever remains after removing blank, brace-only, and commented lines.
            Console.WriteLine($"Code lines: {repoResult.Value.TotalLines - repoResult.Value.BlankLines - repoResult.Value.BraceLines - repoResult.Value.CommentedLines}");
            Console.WriteLine($"Commented lines: {repoResult.Value.CommentedLines}");
            Console.WriteLine($"Brace only lines: {repoResult.Value.BraceLines}");
        }
        Console.WriteLine($"Blank lines: {repoResult.Value.BlankLines}");
        Console.WriteLine();
    }
    Console.WriteLine("-------------------------------------------------\n-------------------------------------------------\n");
}
{{ /highlight }}
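The listing above passes a fixed path to `GetRepoAndDirectories`. If you would rather pass the root directory on the command line, something like the following could work. This is a minimal sketch, not part of the downloadable source: `ResolveRootDirectory` is a made-up helper, and falling back to the current directory is just one possible default.

```csharp
using System;
using System.IO;

// Hypothetical helper: pick the root directory to scan.
static string ResolveRootDirectory(string[] arguments, string fallback)
{
    // Use the first argument when one is supplied, otherwise the fallback.
    string candidate = arguments.Length > 0 ? arguments[0] : fallback;
    return Path.GetFullPath(candidate);
}

// "args" is the implicit command-line array available to top-level statements.
string rootDirectory = ResolveRootDirectory(args, Directory.GetCurrentDirectory());
if (Directory.Exists(rootDirectory))
{
    Console.WriteLine($"Scanning: {rootDirectory}");
    // The call to repoStatistics.GetRepoAndDirectories(rootDirectory) would go here.
}
else
{
    Console.Error.WriteLine($"Directory not found: {rootDirectory}");
}
```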

Most of the work happens in `RepoStatistics.cs` - 
{{ highlight csharp "linenos=false" }}
public class RepoStatistics
{
    public IList<RepoAndDirectory> GetRepoAndDirectories(string rootDirectory)
    {
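        // Match only directories named exactly ".git"; "*.git" would also pick up bare repos such as "foo.git".
        string[] repositories = Directory.GetDirectories(rootDirectory, ".git", SearchOption.AllDirectories);
        var repositoriesAndDirectories = new List<RepoAndDirectory>();
        foreach (var repository in repositories)
        {
            string[] lines = File.ReadAllLines(Path.Combine(repository, "config"));
            foreach (var line in lines)
            {
                // Remote entries look like "url = https://..."; take everything after the first '='.
                string trimmedLine = line.Trim();
                if (trimmedLine.StartsWith("url", StringComparison.Ordinal) && trimmedLine.Contains('='))
                {
                    repositoriesAndDirectories.Add(new RepoAndDirectory
                    {
                        RepoName = trimmedLine.Substring(trimmedLine.IndexOf('=') + 1).Trim(),
                        DirectoryPath = Path.GetDirectoryName(repository) // parent of the .git directory
                    });
                    break; // only the first remote is recorded
                }
            }
        }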
        return repositoriesAndDirectories;
    }

    public Dictionary<string, FileResult> ProcessDirectory(string directoryPath, string[] fileTypesToSearchFor)
    {
        var directoryResults = new Dictionary<string, FileResult>();
        var fileNames = Directory.GetFiles(directoryPath, "*.*", SearchOption.AllDirectories)
            .Where(file => fileTypesToSearchFor.Contains(Path.GetExtension(file).ToLower())) // match .CS files too
            .ToList();

        foreach (string fileName in fileNames)
        {
            if(fileName.Contains("artifacts") || fileName.Contains(Path.DirectorySeparatorChar + "bin" + Path.DirectorySeparatorChar) || fileName.Contains(Path.DirectorySeparatorChar + "obj" + Path.DirectorySeparatorChar))
            {
                continue;
            }
            string extension = Path.GetExtension(fileName).ToLower();
            FileResult resultForFileType = directoryResults.GetValueOrDefault(extension);
            if (resultForFileType == null)
            {
                resultForFileType = new FileResult();
                directoryResults.Add(extension, resultForFileType);
            }
            AnalyzeFile(fileName, resultForFileType);
        }

        return directoryResults;
    }

    private void AnalyzeFile(string fileName, FileResult fileResult)
    {
        string extension = Path.GetExtension(fileName).ToLower();

        var lines = File.ReadAllLines(fileName);

        // ReadAllLines already returns one entry per line, so no +1 adjustment is needed.
        int totalLinesInFile = lines.Length;
        int blankLinesInFile = 0;
        int braceLinesInFile = 0;
        int commentLinesInFile = 0;

        bool inCommentBlock = false;

        foreach (var line in lines)
        {
            string trimmedLine = line.Trim();
            if (extension == ".cs") // only look for comments in .cs files
            {
                if (trimmedLine.Contains("/*") && trimmedLine.Contains("*/"))
                {
                    commentLinesInFile++;
                    continue;
                }
                if (inCommentBlock)
                {
                    if (trimmedLine.Contains("*/"))
                    {
                        inCommentBlock = false;
                    }
                    commentLinesInFile++;
                    continue;
                }
                if (trimmedLine.Contains("/*"))
                {
                    inCommentBlock = true;
                    commentLinesInFile++;
                    continue;
                }

                if (trimmedLine.StartsWith("//"))
                {
                    commentLinesInFile++;
                    continue;
                }
            }

            if (string.IsNullOrWhiteSpace(trimmedLine))
            {
                blankLinesInFile++;
                continue;
            }
            if (extension == ".cs") // only look for brace lines in .cs files
            {
                if (trimmedLine == "{" || trimmedLine == "}")
                {
                    braceLinesInFile++;
                    continue;
                }
            }
        }
        fileResult.TotalLines += totalLinesInFile;
        fileResult.BlankLines += blankLinesInFile;
        fileResult.BraceLines += braceLinesInFile;
        fileResult.CommentedLines += commentLinesInFile;
        fileResult.FileCount++;
    }
}
{{ /highlight }}
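To see how the line rules in `AnalyzeFile` behave, it can help to run them on a few lines held in memory. The sketch below applies the same checks in the same order, with two differences that are my own assumptions: it returns a tuple instead of updating a `FileResult`, and it counts code lines directly rather than by subtraction. `ClassifyLines` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper applying the same checks as AnalyzeFile to in-memory lines.
static (int Comment, int Blank, int Brace, int Code) ClassifyLines(IEnumerable<string> lines)
{
    int comment = 0, blank = 0, brace = 0, code = 0;
    bool inCommentBlock = false;

    foreach (var line in lines)
    {
        string trimmed = line.Trim();
        if (trimmed.Contains("/*") && trimmed.Contains("*/"))
        {
            comment++; // block comment opened and closed on one line
        }
        else if (inCommentBlock)
        {
            if (trimmed.Contains("*/")) inCommentBlock = false;
            comment++; // everything inside a block counts as a comment, blanks included
        }
        else if (trimmed.Contains("/*"))
        {
            inCommentBlock = true;
            comment++;
        }
        else if (trimmed.StartsWith("//", StringComparison.Ordinal))
        {
            comment++;
        }
        else if (trimmed.Length == 0)
        {
            blank++;
        }
        else if (trimmed == "{" || trimmed == "}")
        {
            brace++;
        }
        else
        {
            code++;
        }
    }
    return (comment, blank, brace, code);
}

string[] sample =
{
    "// a comment",
    "public class Demo",
    "{",
    "    /* start of block",
    "",
    "       end of block */",
    "    int x = 1; // trailing comment",
    "",
    "}"
};

var counts = ClassifyLines(sample);
Console.WriteLine($"comment={counts.Comment} blank={counts.Blank} brace={counts.Brace} code={counts.Code}");
// prints "comment=4 blank=1 brace=2 code=2"
```

Note the crude edges: a line with a trailing `//` comment counts as code, and a string literal containing `/*` would be treated as the start of a comment block.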

And then there are a few simple classes to hold the results -

{{ highlight csharp "linenos=false" }}
public class RepoAndDirectory
{
    public string RepoName { get; set; }
    public string DirectoryPath { get; set; }
    public Dictionary<string, FileResult> RepoResult { get; set; }
}

public class FileResult
{
    public int TotalLines { get; set; }
    public int BlankLines { get; set; }
    public int BraceLines { get; set; }
    public int FileCount { get; set; }
    public int CommentedLines { get; set; }
}
{{ /highlight }}
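If you scan many repositories, a grand total per file type can be handy. Here is a rough sketch, cut down to total lines only: each dictionary stands in for a `RepoResult` with every `FileResult` reduced to its `TotalLines`. `CombineTotals` is a made-up name and is not in the downloadable source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: add up total lines per extension across several repos.
static Dictionary<string, int> CombineTotals(IEnumerable<Dictionary<string, int>> perRepoTotals)
{
    var combined = new Dictionary<string, int>();
    foreach (var repoTotals in perRepoTotals)
    {
        foreach (var (extension, totalLines) in repoTotals)
        {
            // GetValueOrDefault returns 0 for an extension not seen yet.
            combined[extension] = combined.GetValueOrDefault(extension) + totalLines;
        }
    }
    return combined;
}

// Total lines taken from the two sample repos shown earlier (.cs and .csproj only).
var polly = new Dictionary<string, int> { [".cs"] = 94269, [".csproj"] = 510 };
var memoryCache = new Dictionary<string, int> { [".cs"] = 624, [".csproj"] = 89 };

// .cs adds up to 94893 and .csproj to 599.
var grandTotals = CombineTotals(new[] { polly, memoryCache });
foreach (var (ext, lines) in grandTotals)
{
    Console.WriteLine($"{ext}: {lines}");
}
```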

Download full source code.
