How to run the Polyglot Code Scanner

How to run the Polyglot Code Scanner

The code scanner is a rust command-line program - which means applications are compiled executable binary files, which you run from a text command-line.

There are two ways to run the scanner - download a binary executable for your platform, or build it yourself from source.

Fetching binary releases

Binary releases are published to https://github.com/kornysietsma/polyglot-code-scanner/releases

You should be able to download the right file for your platform here. Note I haven't tested this thoroughly on Windows!

What about Mac M1?

Sorry, M1 builds aren't working easily in github actions yet. Apparently there might be some code in alpha for this? I hope to have M1 builds soon

Getting binaries to work on mac OSX

OSX has a system to protect you from malware, called 'gatekeeper' - by default on OSX if you try to run the binary app, you'll get an error "app is from an unknown developer". To bypass this I'd have to go through a complex extra stage of signing and bundling my tool as an application, which I'm avoiding for now.

If you are happy that the binary is safe - all the code that builds it is on github, so you should be able to inspect it yourself - you can run it by stripping the extra attributes that OSX adds when you download it, using xattr:

xattr -d com.apple.quarantine polyglot_code_scanner-x86_64-macos

Then you can move the polyglot_code_scanner binary to whever you want.

If you don't want to trust the binary, follow the steps below to compile it yourself.

If it doesn't work

I haven't been able to test the binary releases myself on many platforms, so if these don't work, you might have to revert to compiling yourself. (Feel free to raise an issue though, and I'll try to fix it)

Fetching code and compiling a binary

  1. Get rust and cargo - instructions are at https://www.rust-lang.org/tools/install

  2. Clone or fork the project from https://github.com/kornysietsma/polyglot-code-scanner

    Or you can fetch the source code for a particular release from https://github.com/kornysietsma/polyglot-code-scanner/releases

  3. test and build with cargo the rust packaging tool:

$ cargo test
# ... maybe some warnings but hopefully no failures
$ cargo build --release
# ... lots of messages and noise - compiling can be slow
$ ls -l target/release
total 14368
drwxr-xr-x 125 korny staff 4000 28 Jul 15:09 build
drwxr-xr-x 891 korny staff 28512 28 Jul 15:11 deps
drwxr-xr-x 2 korny staff 64 20 Jun 15:20 examples
drwxr-xr-x 2 korny staff 64 20 Jun 15:20 incremental
-rw-r--r-- 1 korny staff 1003 29 Jun 14:45 libpolyglot_code_scanner.d
-rw-r--r-- 2 korny staff 3062736 28 Jul 15:11 libpolyglot_code_scanner.rlib
-rwxr-xr-x 2 korny staff 4280600 28 Jul 15:11 polyglot_code_scanner
-rw-r--r-- 1 korny staff 1063 29 Jun 14:45 polyglot_code_scanner.d
$ target/release/polyglot_code_scanner -V
polyglot_code_scanner 0.4.0

All you need to care about is the binary polyglot_code_scanner - that's what you need to run. You can run it from the target directory or copy it somewhere useful.

(You can also run the project with cargo run but that's slower as it runs with debugging and without a lot of optimisation)

Running it

You can run polyglot_code_scanner -h for command-line help.

Usually you will just run:

polyglot_code_scanner --name <projectname> -o json_file_name.json <source directory>

Important options

--name <name> - you need to specify a name, so the Explorer can display the name and use it when saving and loading settings to avoid collisions.

--years <git_years> - scan through this much history. Git scanning can be slow, and often you only care about the last few years.

--coupling - enables temporal coupling. This can be very very slow to calculate - if you don't want it, don't specify this option! See the temporal coupling page for more.

--follow-symlinks - allows the scanner to follow symbolic links

--no-git - do not scan for git information at all - this can speed up processing significantly, and is needed for projects not stored in git

Handling multiple repositories

The scanner actually works in terms of directories and subdirectories, rather than really caring too much about git repositories. (It's even possible to work with code without git data - in theory, though this is probably broken once you try to visualise the results)

So you can store git repositories under the scanned root directory - it's quite common to check out every repository for a project as children of the root directory, and then use the --circles layout option to show those repositories as top-level circles in the explorer.

You can even go deeper - one project I worked on had a nested tree of directories and projects 2 or 3 levels under the root; the scanner handles this just fine.

You can also use symbolic links to your actual repositories - pass the --follow-symlinks parameter to enable this. (by default symbolic links are ignored as some repositories use symlinks in strange ways)

Ignoring files

The scanner automatically ignores any files that git ignores, using the standard .gitignore file mechanisms.

You can also add your own ignore files - at the root, or anywhere in subdirectories - the scanner will look for files called .polyglot_code_scanner_ignore files anywhere in the codebase, the syntax is the same as .gitignore's.

So for example if you want to ignore node.js junk, XML and CSV files you could add a .polyglot_code_scanner_ignore file at the root of your project with the content:


Output file

See Data Format for a description of the output JSON file format.

It's not pretty printed to save space - if you want it pretty, the easiest thing is to install jq and run:

jq < file.json | less
Edit this page on GitHub