polyglot-tools-docs
How to run the Polyglot Code Scanner
The code scanner is a Rust command-line program - which means it is compiled to an executable binary file, which you run from a text command line.
There are two ways to run the scanner - download a binary executable for your platform, or build it yourself from source.
Fetching binary releases
Binary releases are published to https://github.com/kornysietsma/polyglot-code-scanner/releases
You should be able to download the right file for your platform from there. Note I haven't tested this thoroughly on Windows!
What about Mac M1?
Sorry, M1 builds aren't working easily in GitHub Actions yet. Apparently there might be some code in alpha for this? I hope to have M1 builds soon.
Getting binaries to work on mac OSX
OSX has a system to protect you from malware, called 'gatekeeper' - by default on OSX if you try to run the binary app, you'll get an error "app is from an unknown developer". To bypass this I'd have to go through a complex extra stage of signing and bundling my tool as an application, which I'm avoiding for now.
If you are happy that the binary is safe - all the code that builds it is on github, so you should be able to inspect it yourself - you can run it by stripping the extra attributes that OSX adds when you download it, using xattr:
xattr -d com.apple.quarantine polyglot_code_scanner-x86_64-macos
Then you can move the polyglot_code_scanner binary to wherever you want.
If you don't want to trust the binary, follow the steps below to compile it yourself.
If it doesn't work
I haven't been able to test the binary releases myself on many platforms, so if these don't work, you might have to revert to compiling yourself. (Feel free to raise an issue though, and I'll try to fix it)
Fetching code and compiling a binary
Get rust and cargo - instructions are at https://www.rust-lang.org/tools/install
Clone or fork the project from https://github.com/kornysietsma/polyglot-code-scanner
Or you can fetch the source code for a particular release from https://github.com/kornysietsma/polyglot-code-scanner/releases
Test and build with cargo, the Rust packaging tool:
$ cargo test
# ... maybe some warnings but hopefully no failures
$ cargo build --release
# ... lots of messages and noise - compiling can be slow
$ ls -l target/release
total 14368
drwxr-xr-x 125 korny staff 4000 28 Jul 15:09 build
drwxr-xr-x 891 korny staff 28512 28 Jul 15:11 deps
drwxr-xr-x 2 korny staff 64 20 Jun 15:20 examples
drwxr-xr-x 2 korny staff 64 20 Jun 15:20 incremental
-rw-r--r-- 1 korny staff 1003 29 Jun 14:45 libpolyglot_code_scanner.d
-rw-r--r-- 2 korny staff 3062736 28 Jul 15:11 libpolyglot_code_scanner.rlib
-rwxr-xr-x 2 korny staff 4280600 28 Jul 15:11 polyglot_code_scanner
-rw-r--r-- 1 korny staff 1063 29 Jun 14:45 polyglot_code_scanner.d
$ target/release/polyglot_code_scanner -V
polyglot_code_scanner 0.4.0
All you need to care about is the binary polyglot_code_scanner - that's what you need to run. You can run it from the target directory or copy it somewhere useful.
(You can also run the project with cargo run, but that's slower as it runs with debugging and without a lot of optimisation)
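If you want the binary on your PATH, copying it "somewhere useful" could look like the sketch below. The function name install_scanner and the default destination ~/.local/bin are only illustrative assumptions, not something the project provides - use whatever directory suits your setup.

```shell
# Sketch: copy the release binary into a directory of your choice.
# install_scanner and the ~/.local/bin default are hypothetical, not part of the project.
install_scanner() {
    src="${1:-target/release/polyglot_code_scanner}"
    dest_dir="${2:-$HOME/.local/bin}"
    if [ ! -f "$src" ]; then
        echo "no binary at $src - run 'cargo build --release' first" >&2
        return 1
    fi
    # Create the destination if needed, then copy and mark as executable
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/polyglot_code_scanner" || return 1
    chmod 755 "$dest_dir/polyglot_code_scanner"
}
```

Run install_scanner from the project directory after building. If the destination directory is on your PATH, polyglot_code_scanner -V should then work from anywhere.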
Running it
You can run polyglot_code_scanner -h for command-line help.
Usually you will just run:
polyglot_code_scanner --name <projectname> -o json_file_name.json <source directory>
Important options
--name <name> - you need to specify a name, so the Explorer can display the name and use it when saving and loading settings to avoid collisions.
--years <git_years> - scan through this much history. Git scanning can be slow, and often you only care about the last few years.
--coupling - enables temporal coupling. This can be very very slow to calculate - if you don't want it, don't specify this option! See the temporal coupling page for more.
--follow-symlinks - allows the scanner to follow symbolic links
--no-git - do not scan for git information at all - this can speed up processing significantly, and is needed for projects not stored in git
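If you scan several projects, a small wrapper can save retyping the options above. This is only a sketch: the function name scan_project, the SCANNER variable and the choice to name the output file after the project are all assumptions made for illustration. The only scanner options it uses are --name, --years and -o.

```shell
# scan_project NAME SOURCE_DIR [YEARS]
# Hypothetical helper: builds the scanner command line from a project name,
# a source directory and an optional number of years of git history.
# SCANNER can point at a specific binary; it defaults to the one on your PATH.
scan_project() {
    name="$1"
    src="$2"
    years="${3:-}"
    if [ -z "$name" ] || [ -z "$src" ]; then
        echo "usage: scan_project <projectname> <source directory> [years]" >&2
        return 2
    fi
    out="${name}.json"
    if [ -n "$years" ]; then
        "${SCANNER:-polyglot_code_scanner}" --name "$name" --years "$years" -o "$out" "$src"
    else
        "${SCANNER:-polyglot_code_scanner}" --name "$name" -o "$out" "$src"
    fi
}
```

For example, scan_project myproject ~/work/myproject 3 would scan three years of history and write myproject.json. Add --coupling or --no-git to the command inside the function if you need them.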
Handling multiple repositories
The scanner actually works in terms of directories and subdirectories, rather than really caring too much about git repositories. (It's even possible to work with code without git data - in theory, though this is probably broken once you try to visualise the results)
So you can store git repositories under the scanned root directory - it's quite common to check out every repository for a project as children of the root directory, and then use the --circles layout option to show those repositories as top-level circles in the explorer.
You can even go deeper - one project I worked on had a nested tree of directories and projects 2 or 3 levels under the root; the scanner handles this just fine.
You can also use symbolic links to your actual repositories - pass the --follow-symlinks parameter to enable this. (By default symbolic links are ignored as some repositories use symlinks in strange ways)
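As a rough sketch of the symlink approach, the helper below builds a root directory containing one symbolic link per repository. The name link_repos and the paths in the usage note are hypothetical. Only --follow-symlinks, --name and -o come from the scanner itself.

```shell
# link_repos ROOT REPO...
# Hypothetical helper: creates ROOT and adds one symlink per repository directory.
link_repos() {
    root="$1"
    shift
    mkdir -p "$root" || return 1
    for repo in "$@"; do
        # Resolve to an absolute path so the link still works from inside ROOT
        abs=$(cd "$repo" && pwd) || return 1
        ln -sfn "$abs" "$root/$(basename "$abs")" || return 1
    done
}
```

You could then run link_repos ~/scan-root ~/work/frontend ~/work/backend, followed by polyglot_code_scanner --name myproject --follow-symlinks -o myproject.json ~/scan-root.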
Ignoring files
The scanner automatically ignores any files that git ignores, using the standard .gitignore file mechanisms.
You can also add your own ignore files - at the root, or anywhere in subdirectories. The scanner will look for files called .polyglot_code_scanner_ignore anywhere in the codebase; the syntax is the same as for .gitignore files.
So for example if you want to ignore node.js junk, XML and CSV files you could add a .polyglot_code_scanner_ignore file at the root of your project with the content:
node_modules/
package-lock.json
yarn.lock
*.xml
*.csv
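If you'd rather create that file from the command line, something like the following sketch should do it. The function name write_scanner_ignore is just an illustration. It writes the five patterns from the example above, one per line, into the given directory (the current directory by default), overwriting any existing file of that name.

```shell
# write_scanner_ignore [DIR]
# Hypothetical helper: writes the example ignore patterns into DIR.
write_scanner_ignore() {
    dir="${1:-.}"
    # The quoted 'EOF' stops the shell from expanding the * patterns
    cat > "$dir/.polyglot_code_scanner_ignore" <<'EOF'
node_modules/
package-lock.json
yarn.lock
*.xml
*.csv
EOF
}
```

Run write_scanner_ignore at the root of the directory you are going to scan, then edit the file to suit your own project.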
Output file
See Data Format for a description of the output JSON file format.
It's not pretty printed to save space - if you want it pretty, the easiest thing is to install jq and run:
jq . file.json | less