Theory, Definitions and Algorithms
These are some entry points to the underlying fundamentals of compiler theory, in no particular order.
- Halting problem
- Von Neumann architecture
- Lattice
- Type inference
- Covariance and Contravariance
- List comprehension
- Suffix array
- Currying
- C3 linearization
Some useful sites and articles with state of the art information.
- How OCaml type checker works
- Viewpoints Research Institute
- The Data Compression Resource
- The flow programming language
- Best programming languages
- Understanding Strict Aliasing
- A new class of algorithms for software pipelining with resource constraints
- A short history of btrfs
- Google search results: "data parallel algorithms paper"
- A Quick Survey on Intermediate Representations for Program Analysis
Programming languages around interesting concepts
- A Week with Elixir
- Redex
- Maude
- Go Programming Language
- YouTube - Another Go at Language Design
- How Go message passing relates to the L4 micro-kernel
- The Caml language homepage
- The Agda Wiki
- CellML is an XML-based language to store and exchange computer-based mathematical models.
- Heterogeneous Computing and C++ AMP
- Intel Cilk Plus now available in open-source
- C++ FAQ
- Concurrent Eiffel with SCOOP
Projects
Description | Language | Implementation | License |
---|---|---|---|
GCC, GCC Interactive Compiler | Ada, C, C++, Java, Go | C | GPL |
LLVM, Clang (Clang Language Extensions, CXX status) | C++ | C++ | BSD |
Open64, AMD Open64 page | C, C++, Fortran | C++ | GPL |
Path64, Path64 open source announcement | C, C++, Fortran | C/C++ | GPL (v2, v3 depending on files) |
NVIDIA also released its LLVM-based CUDA compiler as open source in 2011.
There is also an autoparallelizing compiler for shared-memory computers.
Pointers inside compiler source bases
LLVM web resources
LLVM Hello
LLVM can emit machine-independent code (bitcode), which is useful if you want to start playing with object translation tools outside the LLVM source tree.
# The argument to emit machine-independent code is -emit-llvm (with -c to stop before linking)
$ llvm-gcc -emit-llvm -c -o hello.o hello.c
The bitcode libraries to link against are libLLVMBitReader.a and libLLVMBitWriter.a; the headers are in include/llvm/Bitcode. A useful companion tool is the disassembler llvm-dis, which turns bitcode back into readable LLVM assembly.
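When experimenting outside the LLVM tree, a quick sanity check is whether a file such as hello.o really contains bitcode rather than a native object. The sketch below assumes raw (unwrapped) bitcode, which starts with the magic bytes 'B', 'C', 0xC0, 0xDE. The helper names are hypothetical and not part of the LLVM API, and bitcode stored inside a wrapper header would not be recognized.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Raw LLVM bitcode starts with the four magic bytes 'B', 'C', 0xC0, 0xDE.
// hasBitcodeMagic is a hypothetical helper, not an LLVM function.
bool hasBitcodeMagic(const std::vector<std::uint8_t> &bytes) {
  static const std::uint8_t kMagic[4] = {0x42, 0x43, 0xC0, 0xDE};
  if (bytes.size() < 4) return false;
  for (std::size_t i = 0; i < 4; ++i)
    if (bytes[i] != kMagic[i]) return false;
  return true;
}

// Reads at most the first four bytes of a file and checks them.
// Returns false if the file cannot be opened or is too short.
bool fileHasBitcodeMagic(const std::string &path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  if (!in) return false;
  std::vector<std::uint8_t> head;
  char c;
  while (head.size() < 4 && in.get(c))
    head.push_back(static_cast<std::uint8_t>(c));
  return hasBitcodeMagic(head);
}
```

Calling fileHasBitcodeMagic("hello.o") on the file produced above should return true if the compiler really emitted bitcode; a native ELF or Mach-O object fails the check.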
LLVM adding a target
The first step in figuring out what is going on while building LLVM is to enable verbose mode:
cd llvm-3.0-obj/obj && make VERBOSE=1
Each target machine requires a subdirectory in lib/Target that will build a static library linked into different executables. Once you have created your own target subdirectory using a sibling as a template, you will also need to patch the following files to get the new target recognized by the system.
llvm-3.0/CMakeLists.txt
llvm-3.0/configure
llvm-3.0/include/llvm/ADT/Triple.h
The way dependencies and link command lines are generated is quite complex. It all revolves around llvm-config (llvm-3.0-obj/Release/bin/llvm-config, llvm-3.0.src/Makefile.rules:LLVMConfigLibs). The idea is to derive the full dependency tree (needed to build the ld command line) from the direct prerequisites. The catch is that exported symbols are part of that dependency computation: if your source files do not cross-reference symbols in each other, a library might be dropped from the ld command line and you end up with an error like:
llvm[3]: Linking Release executable llc (without symbols)
Undefined symbols for architecture x86_64:
"_LLVMInitializeNameTargetMC", referenced from:
_main in llc.o
ld: symbol(s) not found for architecture x86_64
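To make the failure mode concrete, here is a toy model of a symbol-driven dependency closure. It is only a sketch of the idea described above, not how llvm-config is actually implemented. The Library struct, the linkClosure function, and all library and symbol names are made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a static library: the symbols it defines and the ones it needs.
struct Library {
  std::set<std::string> defines;
  std::set<std::string> references;
};

// Returns every library reachable from `roots` by following
// "references a symbol defined in" edges (a transitive closure).
std::set<std::string> linkClosure(const std::map<std::string, Library> &libs,
                                  const std::vector<std::string> &roots) {
  std::set<std::string> needed;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string name = worklist.back();
    worklist.pop_back();
    std::map<std::string, Library>::const_iterator it = libs.find(name);
    if (it == libs.end() || !needed.insert(name).second)
      continue;  // unknown library, or already visited
    for (const std::string &symbol : it->second.references)
      for (const auto &candidate : libs)
        if (candidate.second.defines.count(symbol))
          worklist.push_back(candidate.first);
  }
  return needed;
}

// Hypothetical example data: NameDesc defines a symbol that no other
// library references, so nothing pulls it into the closure.
std::map<std::string, Library> exampleLibs() {
  std::map<std::string, Library> libs;
  libs["NameCodeGen"] = Library{{"InitTarget"}, {"SelectDAG"}};
  libs["NameDesc"] = Library{{"InitTargetMC"}, {}};
  libs["SelectionDAG"] = Library{{"SelectDAG"}, {}};
  return libs;
}
```

With NameCodeGen as the only root, the closure contains NameCodeGen and SelectionDAG but not NameDesc. If the executable itself then calls a symbol defined in NameDesc, the linker reports it as undefined, which is the same shape as the error above.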
LLVM Understanding the Intermediate Representation
The best way to start understanding the IR is to look for the code that dumps it to stdout or to a file. Such an entry point can be found in lib/VMCore/AsmWriter.cpp.
void Module::print(raw_ostream &ROS, AssemblyAnnotationWriter *AAW) const { ...
void AssemblyWriter::printModule(const Module *M) { ...
void AssemblyWriter::printFunction(const Function *F) { ...
void AssemblyWriter::printBasicBlock(const BasicBlock *BB) { ...
void AssemblyWriter::printInstruction(const Instruction &I) { ...
void AssemblyWriter::writeOperand(const Value *Operand, bool PrintType) { ...
LLVM IR is a classic module / function / basic block / instruction decomposition implemented as structures and pointers. The definitions are in:
llvm-3.0/include/llvm/Module.h
llvm-3.0/include/llvm/Function.h
llvm-3.0/include/llvm/BasicBlock.h
llvm-3.0/include/llvm/Instruction.h
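The nesting is easier to see in a stripped-down form. The following sketch uses toy structs that only mirror the module / function / basic block / instruction hierarchy and the top-down printing walk. They are not the LLVM classes, and the output format is only loosely modeled on LLVM assembly.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for the hierarchy; NOT the real LLVM classes.
struct Instruction { std::string text; };
struct BasicBlock {
  std::string label;
  std::vector<std::unique_ptr<Instruction>> instructions;
};
struct Function {
  std::string name;
  std::vector<std::unique_ptr<BasicBlock>> blocks;
};
struct Module {
  std::string id;
  std::vector<std::unique_ptr<Function>> functions;
};

// Each printer handles its own level and delegates to the level below.
void printInstruction(std::ostream &os, const Instruction &inst) {
  os << "  " << inst.text << "\n";
}
void printBasicBlock(std::ostream &os, const BasicBlock &bb) {
  os << bb.label << ":\n";
  for (const auto &inst : bb.instructions) printInstruction(os, *inst);
}
void printFunction(std::ostream &os, const Function &fn) {
  os << "define @" << fn.name << " {\n";
  for (const auto &bb : fn.blocks) printBasicBlock(os, *bb);
  os << "}\n";
}
void printModule(std::ostream &os, const Module &m) {
  os << "; ModuleID = '" << m.id << "'\n";
  for (const auto &fn : m.functions) printFunction(os, *fn);
}

// Builds a module with one function and one block, returns its printed form.
std::string exampleDump() {
  std::unique_ptr<BasicBlock> bb(new BasicBlock);
  bb->label = "entry";
  bb->instructions.emplace_back(new Instruction{"ret i32 0"});
  std::unique_ptr<Function> fn(new Function);
  fn->name = "main";
  fn->blocks.push_back(std::move(bb));
  Module m;
  m.id = "hello";
  m.functions.push_back(std::move(fn));
  std::ostringstream out;
  printModule(out, m);
  return out.str();
}
```

The chain of print functions follows the same top-down order as the AssemblyWriter methods listed earlier: each level prints its own header and then walks its children.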
Open64
I tried to compile the Open64 source base on OSX 10.6.8 with gcc 4.2.1 but that did not go well. The i386-apple-darwin10.8.0 default target picked up by configure is not supported, which was a little worrying. Looking through the configure script, the only patterns that do not result in an "open64 is not supported on" message are:
- x86_64*-*-linux*
- i*86*-*-linux*
- ia64*-*-linux*
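As a rough illustration of that check, the sketch below tests a target triple against the three patterns with POSIX fnmatch, which approximates shell-style pattern matching. The function name isSupportedTriple is hypothetical and is not taken from the configure script.

```cpp
#include <fnmatch.h>  // POSIX; assumed to be available on the build host
#include <string>

// Returns true when `triple` matches one of the three glob patterns above.
// isSupportedTriple is a hypothetical name used for illustration only.
bool isSupportedTriple(const std::string &triple) {
  static const char *const kPatterns[] = {
      "x86_64*-*-linux*", "i*86*-*-linux*", "ia64*-*-linux*"};
  for (const char *pattern : kPatterns)
    if (fnmatch(pattern, triple.c_str(), 0) == 0) return true;
  return false;
}
```

Under this model the default i386-apple-darwin10.8.0 triple is rejected, while x86_64-unknown-linux-gnu is accepted.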
$ ${srcTop}/contrib/compilers/open64/configure --prefix=${installTop} --with-build-optimize=debug --build=x86_64-unknown-linux-gnu
$ make
...
open64/osprey/driver/table.c:46:20: warning: malloc.h: No such file or directory
An interesting entry point to follow is the Lnoptimizer() function in osprey/be/lno/lnopt_main.cxx. Searching for Start_Timer/Stop_Timer calls is a good way to locate specific optimizer phases and algorithms. The list of opcodes is defined in osprey/common/com/opcode_gen_core.h.
Path64
Browsing through the directory structure and the source, it seems that a lot of the code originally came out of the Open64 project.
I tried to compile the Path64 source base on OSX 10.6.8 with gcc 4.2.1 but that did not go well either.
$ cmake -DCMAKE_BUILD_TYPE=Debug \
-DPATH64_ENABLE_TARGETS=x86_64 \
-DPSC_CRT_PATH_x86_64=/usr/lib \
-DPSC_LIBSUPCPP_PATH_x86_64=/usr/lib \
-DPSC_LIBSTDCPP_PATH_x86_64=/usr/lib \
-DPSC_LIBGCC_PATH_x86_64=/usr/lib/gcc/i686-apple-darwin10/4.2.1 \
-DPSC_LIBGCC_EH_PATH_x86_64=/usr/lib/gcc/i686-apple-darwin10/4.2.1 \
-DPSC_LIBGCC_S_PATH_x86_64=/usr/lib \
-DCMAKE_INSTALL_PREFIX=${installTop} \
${srcTop}/contrib/compilers/path64
$ make
path64/src/csu/elf-x86_64/crtbegin.S:30:unknown section type: @progbits