Why Ocaml?
Kevin Murphy, 3 December 2002.
In Fall 2002,
I started a project
which involved a
good mix of string processing, simple statistics and some simple data
structures like hash tables and trees.
A preliminary prototype in matlab was very slow, so
I wanted to look for a more suitable language to implement it in.
My desiderata are listed below.
The language should
- Have an intepreter for rapid prototyping, ease of
debugging, and maximum fun.
- Have a native code (not just byte code) compiler that produces
fast code that can be run stand-alone or called from the
interactive environment.
- Have good support for vectors, multi-dimensional arrays,
strings,
hash tables, etc. in the standard library.
- Have a free implementation.
- Work under linux and windows.
(so I can transfer code easily between my desktop and my laptop.)
This left (in my mind) Lisp or ML,
both of which meet the above desiderata.
(Lisp is a functional programming language with imperative features;
ML is a strong, statically-typed version of Lisp.)
Deciding between Lisp and ML is harder...
ML vs Lisp
- Popularity/ familiarity.
Lisp is more widely known/used than ML (especially in AI).
There is a lot of code already written in Lisp.
- Type checking.
ML is statically type checked (unlike lisp), which
reduces errors and improves efficiency.
Although Lisp allows one to declare types to improve efficiency,
it is a bit ugly and not as powerful as ML.
In addition, ML has type inference, which means it is not necessary to
explicitly declare types. (The CMU CL compiler also does some type inference.)
- Compilers.
For ML, there are two free compilers:
The Standard ML of New Jersey
and
Ocaml.
The Ocaml
compiler is somewhat more efficient than the SML/NJ compiler
(see
"Do you blow SML/NJ's socks off?").
In addition, Ocaml comes with some excellent libraries,
and support for objects,
making it preferable to SML in my opinion.
For Lisp, there are several compilers.
Allegro lisp compiler is expensive.
GNU Clisp
compiler is free and portable, but has poor floating point
performance.
CMU common lisp
is free and has good floating point performance, but
only has a unix port.
- Speed.
According to The great
computer language shootout,
(see also the newer Computer language
shootout benchmarks)
Ocaml is the second fastest language - slower than C, but faster than C++.
No matter how I changed the weights reflecting relative importance of
speed, memory usage, lines of code, mathematical vs string processing,
etc., it always came out in the top 3.
I was skeptical, but the same results hold true in the
Win32 version of the
shootout, implemented independently.
- Syntax.
I have not yet gotten used to lisp syntax (it is said that lisp stands
for "lost in superfluous parentheses").
On the other hand, Ocaml also has a few quirks, e.g., one must
remember to write +. for real
addition and + for integer addition. However, this seems quite
natural. More importantly, people claim Lisp's macros can be used to
define fancy syntactic sugar. Ocaml also has a
preprocessor, but I haven't learned how to use it yet.
Speed of OCaml
The benchmarks above suggests the Ocaml compiler generates the second fastest code
of any of the currently available compilers (gcc and the Intel C
compilers being first).
Given that Ocaml is also a beautiful language to program in, this is
pretty compelling.
But maybe the benchmarks are unreliable?
See eg
Ocaml is only fast if used imperatively,
Slashdot 14 March 2005.
This is possible.
However, I found several other favorable reports on Ocaml's performance.
e.g., this
example, which implements the
Sieve of Eratosthenes for computing primes in Ocaml and C.
The Ocaml code is faster, even though the C code is well-written.
In addition,
I found
this
quote from Doug McClain, on a detailed comparative study of C++, IDL, Fortran, SML,
Ocaml, Dylan, Erlang, Clean, Haskell, Lisp, Mathematica for scientific
computing:
"And most importantly, the CAML version works, and it works properly
every time. I am assured, having monitored its runtime behavior that
there are no memory leaks. Furthermore, the quality of code generated
by the CAML compiler has been analyzed by the Intel VTune system and
it show no pipeline stalls, maximum parallelism between integer and
floating point units, and machine assembly code that is as good or
better than can be achieved by hand coding."
So I did my own experiment. It involved a lot of simple floating
point arithmetic, plus some string matching.
I found the following speedups relative to intrepreted matlab 6.1:
Ocaml native code compiler: 10 times faster,
Ocaml bytecode compiler: 2 times faster,
Matlab mcc compiler: 1.4 times faster.
(The matlab code has 670 lines of code,
the equivalent Ocaml code has 989 lines.)
Ocaml links
- Ocaml wiki
- Cocan ocaml wiki
-
Doug Bagley's editorial on Ocaml.
- Ocaml vs Ruby
-
Lisp vs Ocaml vs C++. In particular, Ocaml produces much smaller
binaries than lisp.
- PsiLab, Ocaml
package for numerical computation.
-
LACAML, Ocaml interface to LAPACK, BLAS, etc.
- Ocaml interface to
GSL,
the Gnu Scientific library.
- Ocaml
interface to FFTW, the fastest FFT in the West (a C library for
DFTs, which was generated using Ocaml).
-
NML,
a vectorized Ocaml-like language for numerical computing.
(See also the NML announcement.)
No longer updated.
- OcamlMex,
lets you call Ocaml code from Matlab.
- Image processing
in ocaml
-
Tries and other finite-state language processing tools.
-
Tries, bit
vectors, heaps, etc..
- Agrep,
regular expression matching with errors.
-
OcamlDot,
interface to
Dot graph
layout package.
-
Humps, long list of free Ocaml software.
- Making Ocaml run
fast: advice from the creators of the language.
-
Caml Weekly news, a good archive of edited emails
-
Summary of discussion on operator overloading,
with emphasis on matrix operations
- F#,
Microsoft's way of combining Ocaml with C#.
- Ocaml
coding style guidelines, from a caltech class
- IBAL,
Pfeffer's stochastic/ decision theoretic agent language, implemented
in Ocaml
- Ocaml
manual
- Ocaml homepage
- Maxent (logistic
regression) code
Comparison of other languages
Click here for my comparison.
Going back to Matlab...
After making prototypes of my statistical language project
in matlab and Ocaml, I found that
the Ocaml version was about 10 times faster than matlab.
However,
Since January 2003,
I have gone back to matlab
because
I have become more involved in computer vision projects,
for which matlab is ideal
(although Frank Dellaert uses Ocaml for vision),
and because my collaborator uses Matlab.