John Hauser  
2455 Hilgard Avenue #23, Berkeley, CA 94709-1234, USA / (510) 843-6909
jh@jhauser.us

Objective

  My technical interests include computer processor design, computer arithmetic, signal processing, image processing, graphics rendering, operating system kernels, device drivers, computer languages, compilers, and software support libraries.

Education

1988–2000 University of California / Berkeley, California
Ph.D. in Computer Science (specializing in computer architecture, with a minor in mathematics).
M.S. in Computer Science.
1987–1988 University of Colorado / Boulder, Colorado
One year of graduate study.
1982–1987 North Carolina State University / Raleigh, North Carolina
B.S. in Computer Engineering (mostly electrical engineering).
B.S. in Computer Science.

Employment

2010–today iCelero / San Jose → subsequently Bluechip Systems / Santa Clara, California
Hardware and software development for Bluechip Systems’ computer contained in a microSD package (MicroCloud X4 and other variations).
2010–today University of California / Berkeley, California
Part-time development of floating-point software (mostly Berkeley SoftFloat) and hardware floating-point units.
2010 Droplet Technology / Palo Alto, California
C-code optimization of one of Droplet’s image codecs.
2003–2009 3Plus1 Technology / Saratoga, California
Design and implementation of 3Plus1’s CoolEngine, plus various programming tools.
(Technology acquired by iCelero, which later spawned Bluechip Systems.)
1999–2004 Berkeley Design Technology, Inc. (BDTI) / Berkeley, California
Senior DSP Engineer (part-time).
1989–2000 University of California / Berkeley, California
Various Graduate Student Instructor and Graduate Student Researcher positions.
1994–1997 International Computer Science Institute (ICSI) / Berkeley, California
Software programming for ICSI’s T0 vector microprocessor.
summer 1992 Silicon Graphics / Mountain View, California
Implementation of quadruple-precision floating-point in software.

Experience

Computer architecture, hardware design
2010–today Made various contributions to the definition of the new RISC-V instruction set architecture. Most notably, supplied the bulk of the technical details in the current draft for the proposed RISC-V hypervisor extension (as of October 2017).
2010–2014 At iCelero, tracked down bugs in hardware units provided by various suppliers, and devised workarounds in both hardware and software.
2003–2011 For 3Plus1 and iCelero, defined and implemented nearly all of the CoolEngine, a multiprocessor subsystem designed primarily for streaming tasks such as video processing. Each CoolEngine processor can execute multiple operations (scalar and SIMD) per clock cycle from a compressed VLIW machine code. A CoolEngine combines several such processors with an intelligent multichannel DMA for I/O and DRAM access. Was responsible for every aspect of the CoolEngine processors, including the programming architecture, instruction caches, instruction fetch, decode, and dispatch units, datapaths, and all functional units; plus the DMA unit and other CoolEngine components. Implementation was in Verilog and a proprietary language (machine-translated to Verilog). Physical area and timing were optimized using feedback from Synopsys standard-cell synthesis.
2002 At BDTI, advised a major processor design company on specific SIMD-style extensions that could be made to an existing architecture to improve its performance for fixed-point digital signal processing.
1994–2000 For my doctoral dissertation, examined the use of an FPGA-like device as an additional microprocessor functional unit. Defined a novel processor architecture named Garp, and constructed programming tools and a cycle-accurate simulator. Implementation feasibility was studied through SPICE circuit simulations and partial VLSI layout. My research advisor was John Wawrzynek.
System software, firmware
2011–2016 Developed all of the on-chip ROM firmware used for booting Bluechip Systems’ microSD-package computer, in a mixture of C and ARM assembly code. Also created several tools and libraries for similar “bare metal” programming of the platform. With some experimentation, found a working configuration for the device’s complex DRAM controller. Wrote the basic boot code that configures the DRAM controller and MMU and then loads a program from flash memory through an internal SD bus interface. Ported Express Logic’s ThreadX library for multithreading. Wrote the initial program code to load Linux and pass control to it, including an optional cryptographic test of the integrity of the Linux code that, for speed, is performed on the device’s CoolEngine.
2006–2008 For 3Plus1, defined and implemented libraries and related software tools for interprocessor communication within a CoolEngine multiprocessor subsystem.
1996 Created a Solaris device driver to interface with ICSI’s SPERT-II board, an SBus daughter card containing a T0 processor.
Programming languages, compilers, other software development tools
2013 Created a complete and robust GDB bridge to support debugging software on Bluechip Systems’ computer in a microSD package, for code running on both the device’s ARM processor and its CoolEngine.
2004–2008 For 3Plus1, created tools and libraries to provide a practical alternative platform for testing and debugging CoolEngine software, by allowing CoolEngine source code (in C and assembly language) to be compiled and linked to run efficiently on an ordinary desktop computer. In this foreign environment, the CoolEngine’s multiple processors are simulated by transparently spawning multiple processes, and CoolEngine assembly code is handled by transparently invoking a CoolEngine processor simulator whenever an assembly-language function is called.
2004–2005 Invented the CoolEngine’s complex assembly language, and created the first assembler.
2000–2003 Represented BDTI within the ISO working group responsible for Standard C. Contributed in major part to the ISO Technical Report on extensions to Standard C for embedded microcomputers, TR 18037 [late draft]. Was the author of nearly all of clause 5 (named address spaces and registers) and much of clauses 4 and 6 (fixed-point arithmetic, I/O addressing).
1996–1997 Built a C preprocessor as part of a project to construct a complete ISO-Standard C compiler.
1995 Implemented a basic-block instruction scheduler within the GNU assembler (gas) for ICSI’s T0 vector microprocessor. The T0 processor is unusually difficult to schedule for because structural hazards can be held by the vector instructions for literally dozens of clock cycles. Several heuristics were combined to cover different situations. Almost no noticeable execution time was added to gas, even for artificially generated basic blocks containing many hundreds of instructions.
1991–1994 For my Master’s degree, examined the need for exception-handling features in programming languages, and critiqued the main kinds of exception mechanisms that have been implemented or proposed over the years. Special attention was given to the efficient handling of arithmetic exceptions on high-speed processors.
Computer arithmetic
2010–today For U.C. Berkeley, completely rewrote the SoftFloat and TestFloat software packages, adding new features and optimizations. Adaptations of SoftFloat are used in a number of projects worldwide, including Linux for older ARM processors. Also developed hardware functional units for the basic floating-point arithmetic operations, implemented in both Verilog and Chisel. The hardware floating-point units have been used in several fabricated RISC-V processors, both academic projects and commercial products.
2003–2004 Participated in the working group revising the IEEE standard for floating-point arithmetic (IEEE 754).
2001 Created a C++ library for BDTI that fully implements parameterized fixed-point types. Fixed-point formats are specified using C++ template parameters, and the standard arithmetic operators, +, -, *, etc., are overloaded to permit arbitrary fixed-point operands in addition to the usual integers and floating-point. Precise control over rounding and overflow allows bit-identical mimicking of most fixed-point hardware.
1996–1998 Released and updated the original SoftFloat and TestFloat packages, both grown out of work originally done for ICSI (see below). At the time of its release, TestFloat found small flaws in the floating-point of several commercial processors, including a flaw in the Intel Pentium Pro that was rediscovered the next year and dubbed Dan-0411 by Robert Collins of the Intel Secrets Home Page.
1994–1995 Implemented floating-point and other arithmetic functions for ICSI’s T0 vector microprocessor. Functions coded include single- and double-precision IEEE floating-point emulation written in C, and a vector version of single-precision IEEE floating-point written in T0 assembly language.
1992 At Silicon Graphics, coded IEEE-compliant quadruple-precision floating-point in MIPS assembly language.
1992 Discovered an oversight in Digital Equipment Corporation’s Alpha architecture concerning floating-point subnormal numbers. The discovery resulted in a last-minute fix by Digital before the first Alpha machines were shipped.
Image processing, digital signal processing (DSP)
2015–2017 For Bluechip Systems, demonstrated the performance of the CoolEngine by crafting optimized implementations of a selection of image processing functions, notably a standard histogram of oriented gradients (HOG) reduction used for object detection, and some image filters including a Gaussian filter and a 3×3 median filter.
2010 For Droplet Technology, overhauled the C source code of a proprietary image codec to optimize its performance on an ARM processor.
2003–2009 Designed 3Plus1’s CoolEngine for efficient performance of many common DSP functions (such as FFT).
1999–2009 For various BDTI customers, defined and coded numerous DSP functions, and also improved the performance of several DSP applications through profiling and the recoding of critical functions, usually in hand-optimized assembly language. Speed improvements in some cases were as much as a factor of ten. For 3Plus1, performed the same services to demonstrate the CoolEngine’s capabilities.
1999–2003 At BDTI, helped evaluate the DSP performance of a number of processors, either with respect to specific customer needs or in accordance with BDTI’s proprietary benchmarking methodology. Participated in ongoing efforts within BDTI to refine and extend the company’s benchmarking methods.
2001 Together with a colleague at BDTI, converted a customer-supplied software MP3 decoder entirely from floating-point to fixed-point in order to port the decoder to processors supporting only integer operations in hardware.
Other areas
1989 Helped crack the encoding of Adobe Type-1 fonts, before the format was publicly documented by Adobe. My contributions included expanding the set of known byte-code operators and deducing much of the font hinting mechanism.
I have also done some unpublished research on geometric interpolation (splines), on image scaling (changing size and/or resolution), and on dithering color images to a limited palette.

Computer Skills

Operating systems: Microsoft Windows, Linux, older Solaris, MS-DOS, some Mac OS X.
Programming APIs/libraries: Windows, POSIX, Linux, UNIX terminfo database, MS-DOS, some ThreadX, OpenCV.
Programming languages: C, some C++, Common LISP, miscellaneous others.
Software development tools: GCC, ARM DS-5, yacc/bison, GNU make, some Eclipse, GDB.
Hardware description languages: Verilog, some Chisel.
Hardware development tools: Icarus Verilog, Verilator, some GTKWave.
Processor architectures and assembly languages: ARM (mostly v5), RISC-V, Intel x86 (with some SIMD extensions), older MIPS, Motorola 68000, some SPARC, PowerPC (with AltiVec extensions), several DSPs, others.
Hardware buses/protocols: SD and MMC (for SD cards, etc.), some JTAG, AXI, AHB, LPDDR2 (Mobile DDR), SPI, I2C.
Document definition languages: LaTeX, TeX, some HTML, PostScript.
Source control tools: Git, Subversion, CVS.

Publications

“The SFRA: A Corner-Turn FPGA Architecture.” Nicholas Weaver, John Hauser, and John Wawrzynek. In Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field-Programmable Gate Arrays (FPGA ’04, February 2004).
“The Garp Architecture and C Compiler.” Timothy J. Callahan, John R. Hauser, and John Wawrzynek. Computer 33:4 (April 2000).
“A Fixed-Point Recursive Digital Oscillator for Additive Synthesis.” Todd Hodes, John Hauser, John Wawrzynek, Adrian Freed, and David Wessel. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’99, March 1999).
“Garp: A MIPS Processor with a Reconfigurable Coprocessor.” John R. Hauser and John Wawrzynek. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM ’97, April 1997).
“Handling Floating-Point Exceptions in Numeric Programs.” John R. Hauser. ACM Transactions on Programming Languages and Systems 18:2 (March 1996).

John Hauser, 2017 November 2