Table of Contents
This is not an exhaustive list - I have a Trello full of tasks and ideas! But I thought it'd be worth sharing things I'm thinking of doing.
I'm also keen to get feedback - if you think I should focus on some areas of this more than others, feel free to contact me - my contact details are on korny.info
This site is nice for getting up and running quickly, but I keep running into little inconveniences using someone else's theme - for instance I was going to add Disqus comments, but it got fiddly making it play nicely with the theme. I probably need to move the site to a plain Gatsby site - it's not that hard to do!
Coupling is really a proof-of-concept at the moment, it needs work.
Probably as a minimum, I should ditch the "put everything in 24-hour buckets" approach - it leads to far too many false positives, especially on active projects where some files get changed daily.
Firefox code uses mercurial not git - and while git is pretty dominant, legacy codebases often use something else. It'd be nice to have scanner logic for other kinds of VCS!
My ideal would be to have a rust algorithm, and then compile it to WebAssembly so it could be called from the browser.
However, I made some attempts to port it to rust, and hit some solid walls. The underlying algorithm is based on (Nocaj and Brandes 2012) which is great, but the algorithms for convex hulls and the like are all heavily dependant on mutation of data structures, which are non-trivial in rust. It'd be easier to do a complete rewrite.
There's a lot of cool research out there - I'd like my plans to be based at least somewhat on what research seems to be showing is relevant, rather than my gut feel.
There are so many things that could be shown! I'm learning more as I read the research, but some things from my list:
- Show user information in more ways - impact of individuals, teams, how people move between teams
- Allow user editing - especially merging identities if someone uses multiple emails
- Show ownership more clearly - distinguish between where 5 people changed a file equally, and where one person made 90% of those changes
- Show more clearly how users change over time - research talks about the impact of new people on a team, we should be able to visualise that
- ~Following renames - the scanner sees file renames, but it doesn't do a great job of matching earlier changes to the final file name!~ - fixed in release 0.2.0!
- More tests - I started with a lot of unit tests, but I haven't always added tests for newer features, especially as some need quite complex test data setup
- Test suites - I want to craft some git repos with real test scenarios, e.g. multiple authors, PRs, rebases. This is quite tricky to do! But it'd improve testing a lot. One option is to base tests on a real-world open-source codebase, but it might make tests a bit fragile.
At the moment the date slider is a bit deceptive - a lot of metrics are based on the current status of files, so looking at an older time interval doesn't really show what the world was like then.
This could be fixed, but it'd be a lot of work! The scanner would need to look at all git deletes, and reconstitute the state of the main branch at each point in time. Also probably you'd need to then calculate all other metrics at regular (monthly? yearly?) snapshots. I'd have to consider if this was worth the effort - though being able to scan backwards through metrics over time would be awesome...
- Add tests for the complex D3 areas - still should be unit testable, but might be more fiddly
- Add journey tests for the overall UI - might not actually be worth it.
I really like keeping things simple - I want the parsing to always work, even for plain text files / obscure languages no-one has looked at.
However, it did occur to me that I could get more language information - method / class lengths, say - from another class of file parser that is widespread: syntax highlighting code. Github has code to highlight a vast range of programming languages, as do libraries like prism.js. I've even seen tools which hook into Visual Studio Code's engine to highlight code: https://github.com/andrewbranch/gatsby-remark-vscode - something like this might provide a way to get more insights into code, without writing thousands of language parsers...
There are lots of other things I'd like to explore - especially more detailed git history visualisations. This will probably have to wait until I have a need to do these as part of my day job!Edit this page on GitHub