Dialyzing legacy code
Wednesday, 3rd July, 2019
** Introduction
Typespecs are handy documentation for Erlang functions, but they really come to life when used with Dialyzer. Dialyzer analyzes a codebase and checks that functions behave according to their typespecs. This post runs quickly through using Dialyzer on an existing codebase, my own LIGA project.
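For readers who have not met typespecs before, here is a minimal sketch of one. The module spec_demo and the function mean/1 are made up for illustration and are not part of LIGA.

```erlang
-module(spec_demo).
-export([mean/1]).

%% The spec documents that mean/1 takes a non-empty list of numbers
%% and returns a float. Dialyzer can compare this claim with the code
%% and with every call site it can see.
-spec mean([number(), ...]) -> float().
mean(Numbers) ->
    lists:sum(Numbers) / length(Numbers).
```

A caller that passed an atom instead of a list would be the kind of discrepancy Dialyzer is designed to report.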
** Preparation
First, build a PLT (Persistent Lookup Table) for the project. Include a list of Erlang apps the project uses, and provide a project-specific location:
$ dialyzer --build_plt --apps kernel stdlib erts eunit --output_plt .liga.plt
  Compiling some key modules to native code... done in 0m25.63s
  Creating PLT .liga.plt ...
Unknown functions:
  compile:file/2
  compile:forms/2
  compile:noenv_forms/2
  compile:output_generated/1
  crypto:block_decrypt/4
  crypto:start/0
Unknown types:
  compile:option/0
 done in 0m24.01s
done (passed successfully)
This will output a dialyzer PLT in a newly-created local file .liga.plt.
More applications can be added after building:
$ dialyzer --add_to_plt --apps compiler crypto --plt .liga.plt
It is possible to run Dialyzer over a whole codebase in one sweep. The simplest way is to give dialyzer a list of directories to analyse, e.g.:
$ dialyzer -r src/ test/ --src
The --src flag tells dialyzer to find and check .erl files (the default is to check compiled .beam files).
Build tools like erlang.mk have wrappers too, e.g.:
$ make dialyze
When dialyzing a legacy codebase, the above is likely to produce a lot of warnings, so going module-by-module might be more manageable. Here is a simple workflow:
$ dialyzer src/liga_intmap.erl
- [… edit liga_intmap.erl as desired …]
$ make
$ dialyzer --add_to_plt -c ebin/liga_intmap.beam --plt .liga.plt
The last step adds the functions in the module to the PLT. Once the whole codebase has been done for the first time, future checks in batch mode (i.e., after changes to the code) should return with few or no warnings.
When dialyzing a codebase module-by-module, we check each module, make any desired changes, then recompile the code and add the module’s beam file (along with any other updated beam files) to the project’s PLT. Dialyzer will issue warnings for any “unknown functions” (i.e., functions in modules it doesn’t know about). To avoid as many of these as possible, we go through the modules working up the dependency tree, starting at leaf modules (those without dependencies).
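The leaf-first order can also be computed rather than read off a picture. The following is only a sketch: the module dep_order and its input format (a list of {Module, [ModulesItCalls]} pairs) are assumptions made for illustration, and the dependency list would have to be gathered by some other means. It relies on the digraph and digraph_utils modules from stdlib.

```erlang
-module(dep_order).
-export([leaf_first/1]).

%% Deps is a list of {Module, [ModulesItCalls]} pairs (hypothetical input).
%% Returns the modules with dependencies listed before their dependents,
%% or false if the dependencies contain a cycle.
-spec leaf_first([{module(), [module()]}]) -> [module()] | false.
leaf_first(Deps) ->
    G = digraph:new(),
    try
        lists:foreach(
          fun({Mod, Uses}) ->
                  digraph:add_vertex(G, Mod),
                  lists:foreach(
                    fun(Dep) ->
                            digraph:add_vertex(G, Dep),
                            %% Edge from dependency to dependent, so that
                            %% topsort places the dependency first.
                            digraph:add_edge(G, Dep, Mod)
                    end, Uses)
          end, Deps),
        digraph_utils:topsort(G)
    after
        %% digraph is backed by ETS tables, so free it explicitly.
        digraph:delete(G)
    end.
```

With made-up modules where c calls b and b calls a, dep_order:leaf_first([{c, [b]}, {b, [a]}, {a, []}]) gives the order a, b, c, which is the order in which to dialyze them and add them to the PLT.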
Grapherl can render a dependency tree of Erlang modules as a .png file, e.g.:
$ grapherl -m ebin/ liga.png
Here are some example warnings I got when dialyzing LIGA.
liga_labmap.erl
$ dialyzer src/liga_labmap.erl
  Checking whether the PLT .liga.plt is up-to-date... yes
  Proceeding with analysis...
liga_labmap.erl:77: The pattern can never match the type
 done in 0m0.14s
done (warnings were emitted)
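Dialyzer doesn’t see macros (or records) as such; it analyses the code after they have been expanded, so it sees only the value a macro currently expands to. As the line defining ?VERSION as original is commented out, the clause of versioned_weights/3 that matches it can never be reached and appears to be superfluous.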
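A small sketch of how this kind of warning can arise. The module version_demo and its functions are invented for illustration; the value revised is an assumption, since the real definition of ?VERSION in LIGA is not shown here.

```erlang
-module(version_demo).
-export([weights/0]).

%% -define(VERSION, original).
-define(VERSION, revised).

weights() ->
    %% After preprocessing this is simply versioned(revised).
    versioned(?VERSION).

%% versioned/1 is local and only ever called with 'revised', so the
%% first clause is dead code as far as the analysis is concerned.
versioned(original) -> old;
versioned(revised) -> new.
```

Dialyzer would be expected to flag the first clause of versioned/1 here, because the only call it can see passes revised. Switching the commented-out define back in would move the warning to the other clause.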
data_server.erl
$ dialyzer src/data_server.erl
  Checking whether the PLT .liga.plt is up-to-date... yes
  Proceeding with analysis...
data_server.erl:82: The created fun has no local return
data_server.erl:83: The call data_server:get_with_complement(Lab::any(),Ra::any(),{'nm',_},0) breaks the contract (atom(),any(),pcage(),non_neg_integer() | 'all') -> {[labelled_string()],[labelled_string()]}
 done in 0m0.22s
done (warnings were emitted)
“no local return”
– this can mean that the function never returns (a server loop, for example), in which case the typespec can mark the function’s return type as no_return(). It can also (perhaps more often) mean that dialyzer has inferred that the function cannot return normally because a call inside it is bound to fail. In my experience the next warning gives a clue to the cause.
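As a sketch of the first case, here is a receive loop that never returns and says so in its spec. The module loop_demo is made up for illustration.

```erlang
-module(loop_demo).
-export([start/0, loop/0]).

%% Spawns the loop in its own process and returns the pid.
-spec start() -> pid().
start() ->
    spawn(fun loop/0).

%% The loop never returns a value to its caller; no_return() in the
%% spec states that this is intentional.
-spec loop() -> no_return().
loop() ->
    receive
        _Msg -> loop()
    end.
```

To my understanding, the no_return() spec is what tells Dialyzer that the missing return is deliberate, so it should not complain about loop/0 itself.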
“breaks the contract”
– the arguments passed by the calling function do not match the types that the called function’s typespec expects. The error might just be in the typespec annotations, or two parts of the codebase might have gotten out of sync. In either case this is important to resolve.
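A minimal sketch of a call that breaks a contract even though the code runs fine. The module contract_demo and its functions are hypothetical and unrelated to data_server.erl.

```erlang
-module(contract_demo).
-export([run/0]).

%% The spec promises callers that label/1 is only used with atoms.
-spec label(atom()) -> {label, atom()}.
label(Name) ->
    {label, Name}.

run() ->
    %% A string is passed where the spec says atom(). The code itself
    %% works, so either the spec or this call is out of date.
    label("en").
```

For code like this Dialyzer would be expected to report that the call to label/1 breaks the contract, and that run/0 has no local return as a consequence. The fix is either to widen the spec or to change the caller, depending on which side reflects the intended design.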
liga_writer.erl
This warning is from erlang.mk’s ‘make dialyze’.
liga_writer.erl:18: Expression produces a value of type 'true' | {'error','bad_directory'}, but this value is unmatched
The line in question is:
code:add_path("."),
Here the code is calling a library function for its side effects rather than for its return value. The function does return a value though, and the code would be safer and clearer if the expected value were matched:
true = code:add_path("."),
** Conclusion
Dialyzer is quite simple to use, and helps improve the coherence and clarity of a codebase. As well as the documentation, the Dialyzer chapter of Learn You Some Erlang is worth a read.