In 2016 I was asked to help evaluate a large and ugly codebase for a client, working closely with my colleague at the time, Ra-el Peters. The client had started a big-bang re-write of their legacy code, but that endeavour seemed to be stalling. We needed to quickly review both codebases and answer their key question:
Should they continue with the re-write, or revert to the legacy code and try a different approach?
I'd also read Adam Tornhill's "Your Code as a Crime Scene" and was quite keen to use his code-maat tools - which had the huge advantage of not really needing language parsers to work. Adam pointed out that purely looking at lines of code might be enough - in many cases, file size correlates with complexity.
So Ra-el and I threw together some code, using cloc to count lines of code, code-maat to collect other metrics from source-control such as rate of change and number of authors, and D3 Treemaps to show file sizes.
I tinkered with this for a while - it basically worked, but progress stalled a bit. It was hard to extend beyond what code-maat offered. I added a module for indentation-based complexity, but both the scanning side and the visualisation were lacking. The visualisation was the bigger problem: it was hard to change, and treemaps are just a bit ugly.
TODO: insert a treemap screenshot
And that's where things sat from 2016 to 2018. I did a bit of packaging and cleaning up, and I talked about it at a few ThoughtWorks internal conferences and at UCL's 59th CREST Open Workshop, on Multi-language Software Analysis - and then didn't do much more. (I also had a lovely toddler who took up a lot of my tinkering time!)
(A note if you watch the UCL video: I have read a lot more about metrics and research since then, and I'm not sure I agree with all the thoughts of my past self!)
In February 2019 I was working on a project where I had the chance to play with the Rust programming language - and I really liked the look of it. Not only was it fast, with some very clever memory management and safety features, it also had a great community and a great ecosystem. There were libraries for counting lines of code and libraries for processing git history - all the things I needed for my code scanning.
I then had a gap between projects, and thought it'd be fun to re-implement my Clojure code in Rust - both because I wanted to learn Rust and because the Clojure code was quite slow, and I could tell it was going to get slower and slower the more I dug into other metrics.
So I started what is now the polyglot-code-scanner. It went remarkably smoothly - the existing Rust libraries meant I could ditch code-maat and build everything in one Rust program.
Later in 2019 I worked on a project that used some React, and it became pretty obvious that what I wanted was a new React-based UI. I also found the d3-voronoi-treemap library, which appeared to give me a way to build the layout I wanted, in D3!
Unfortunately, running the Voronoi layout in real time turned out to be too slow and buggy - but I eventually worked out that I could split the layout into a standalone program that ran in Node.js, so speed was less critical.
So in 2019/2020 I slowly built the rest of the tools up. I also decided to do some reading into the research behind Adam's books, which led to a fascinating rabbit-hole of research papers - I've already improved some metrics, and added a first stab at temporal coupling - and there's a lot of room for more growth.