Knowledge Dumps (Lectures on a Topic of Interest):
Stored Program Architecture
Topics (Learn through hands-on projects):
Put Linux on X
Build an SoC-based system
Power User: Wikis and Wagn
Our goal will be to create an online encyclopedia of resources for digital systems design. We will also create a practical, hands-on course for the emerging generation of makers, where they will learn all about state-of-the-art systems. This knowledge and the related skills are highly marketable, but what we really want is the joy of knowing how things work and of being able to invent new things, or just make personal customizations to express ourselves.
Many of the resources you will need are already well covered by other authors. I highly recommend studying all the material in The Elements of Computing Systems as well as taking the companion course, Nand2Tetris, whether by just studying the open materials, taking the official course, or implementing a home-schooled version (link to resources here, or do this ourselves on this site). Much of this course is available for free, so please use it to supplement what you find here. You can also find resources that take you down into analog devices and physics to understand how gates are constructed in different technologies. Covering all the layers of system abstraction and implementation is part of my goal here, so if you find some material here confusing for whatever reason, please use these and other resources you find to give yourself a solid foundation before beginning to find your own path.
This material is intended to get you building and experimenting with current technologies, and to serve as a guide for finding and exploring your particular interests, which may be off the beaten path. This site is intended to be an invitation to explore the edge of your knowledge and even the edge of what is possible. None of it is that complicated, but there is a lot of it, and this is only the beginning. It will lead directly into in-depth material to get you quickly into the foundational technologies of Linux and the command shell. This will prepare you to dig into projects like the Raspberry Pi and BeagleBone as well as the much simpler Arduino-based projects. There are already large maker communities around these projects, and you will find many resources on other sites for building projects and learning the ropes. We will also show you resources for prototyping with more commercially supported platforms. We might prototype with Android-based hardware, or build an SoC based on industry-standard desktop and laptop architectures; these days that means Intel or ISA-compatible AMD processors.
With these resources, any group of technically savvy people can collaborate, first to learn how it all works, and then to start designing and building the systems of the 21st century. The real bottleneck between where we are and where we need to be is what Fuller calls Universal Design Science, and this material is designed to provide the background needed to build the control systems for these designs. The rest of the necessary background is basic maker knowledge. Makers know how to build things, and as a community they know all of the current building technology as well as the foundational traditional technologies it is based on.
We are not going to reinvent any wheels here; wherever possible we will build on and link to other projects and resources. We participate in the commons of knowledge, learning about signals and systems as we rebuild our world in more healthful and wealthful ways. In this open course you will be invited to learn by doing, on projects that we suggest or ones that you and your community design. We will also build a hub here for such projects to share resources and collaborate. You are invited to make this material your own and to fix and expand it to be more clear and complete. Projects closely related to the chapters can be discussed at the end of each chapter.
I. Abstraction: Computers as Mathematical Machines
A. Formal Systems: Games and Modelling
B. Signs and Systems: Semiotics and Representation
II. Mechanism: from Babbage to Modern Computer Networks
A. Representation: from Positions of Wheels to a Binary Signal
1. Boolean mathematics and logic circuits.
a. NAND/NOR and random logic
b. General N to M and Programmable Logic Blocks
2. Latches and Memory
3. From State Machines to Turing
B. Programs and Data
1. Von Neumann architecture: Programs as Data
a. Modern 16/32/64 bit architectures
2. Programming Languages
a. Machine Language: Just the bits
b. Assembler Languages: Human readable machine language
c. C: The high level assembler
d. Everything else is also language interpretation
C. Data as Logic: the FPGA revolution
1. Full circle: System on a Chip with programmable custom logic.
D. From Components to Systems and Networked Systems
1. Processing Units, Multiple Cores
2. Memory Hierarchy: Processor caches, RAM and Storage
3. Networks and I/O: Other systems and the world outside
In this unit we learn about the atomic bits of computation. Many approaches have been tried over the years (that is the history course), but all modern systems are composed of subsystems built from logic blocks described by Boolean algebra. We will start with simple static logic circuits, and you will learn to use Logisim and other tools to build complex logic elements from nested modules that can be realized directly as physical circuits or programmed into an FPGA.
Abstractly, a gate maps some number of inputs to a single output, and sometimes we generalize to compute multiple outputs from a set of inputs. Starting from the simplest (refer to the Simple Gates circuit in the examples), we have the 1-to-1 functions, and we can see there are exactly four of them (2<sup>2</sup>); then the 2-to-1 functions, with 16 possibilities, including the three familiar examples. In general there are 2<sup>2<sup>N</sup></sup> functions of N inputs. In any given circuit technology, we only have to implement a small number of gates, and we can build the rest from them.
One input is either just a wire (no change), or it is NOT, where the output is the opposite of the input. Well, let's back up a second: there are actually four functions of one input, but two of them are degenerate in the sense of really being zero-input functions, namely always true (1) and always false (0).
Also, a technology will sometimes have a fanout limit, but we can get around that by putting two NOT gates back to back to re-amplify the input so it can drive more loads. There could also be a multistage amplifier between the microscopic parts of a chip and the pads that lead out to the board and the rest of the world. But let's not worry about that in thinking about logic. We can rely on standards and modules to interface with the analog and digital protocols that will integrate our logic elements with other systems and networks.
Let that be the first lesson: sometimes even one bit of logic is complicated, and entire systems can exchange data in multi-layered protocols. But as logic, there are two tables:
0 -> 0, 1 -> 1 and 0 -> 1, 1 -> 0
Then on two inputs, we have a table with four binary values, so there are 16 possible functions. (Add example tables as graphics.) Let's see how that can work. For one bit in, I have a table with two rows, one for each input value, and it has two bits in it, one table entry for each row. In that table, 00 and 11 are the constant functions, and 01 and 10 are the other two, but we have to know what order the rows are in to know which is which. If 0 is first, then 01 is the identity function and 10 is NOT. If the sixteen two-input functions are described in four bits, one for each combination of values of A and B, then again some are reducible to zero- or one-input gates, if you assume a simpler gate where one or more inputs isn't connected to anything. The four bits below give the output for AB = 00, 01, 10, 11 in that order, and the sixteen functions are listed in Gray code order, so each code differs from the next in a single bit. You can examine these in the simulator by grounding or pulling up an input.
A table form for one input would list the four functions from above in four lines, one per function, where each binary digit gives the output for a particular input value. The first column here is the output when the input is zero, and the second when it is one. So we have the four functions of one input:
- 00 Always 0 (False)
- 01 Identity (same as input)
- 10 NOT
- 11 Always 1 (True)
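The counting above is easy to check mechanically. Here is a minimal Python sketch (the function name is mine, not from any course tool) that enumerates every N-input Boolean function as a truth table and confirms there are 2^(2^N) of them:

```python
from itertools import product

def boolean_functions(n):
    """Enumerate every n-input Boolean function as a truth table
    (a tuple of outputs, one per input combination)."""
    inputs = list(product([0, 1], repeat=n))          # 2**n input rows
    return list(product([0, 1], repeat=len(inputs)))  # 2**(2**n) tables

print(len(boolean_functions(1)))  # 4 one-input functions
print(len(boolean_functions(2)))  # 16 two-input functions
print(len(boolean_functions(3)))  # 256 three-input functions
```

The doubly exponential growth is why we only ever name the small cases.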
So, let's look at all sixteen (digits give the output for AB = 00, 01, 10, 11):
- 0000 Always False
- 0001 A AND B
- 0011 A
- 0010 A AND NOT B
- 0110 A XOR B = NOT (A EQ B)
- 0111 OR
- 0101 B
- 0100 NOT A AND B
- 1100 NOT A
- 1101 NOT (A AND NOT B) = NOT A OR B
- 1111 Always true
- 1110 NOT (A AND B) -- Called NAND
- 1010 NOT B
- 1011 NOT (NOT A AND B) = A OR NOT B
- 1001 A EQ B = NOT (A XOR B)
- 1000 NOT (A OR B) -- Called NOR
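The codes above can be verified mechanically. A small Python check (the helper names are invented for illustration) spells out a few of them, using the bit order AB = 00, 01, 10, 11:

```python
def table(code):
    """The 4-bit code as a list of outputs for AB = 00, 01, 10, 11."""
    return [int(bit) for bit in code]

def outputs(fn):
    """Truth table of a two-input function over AB = 00, 01, 10, 11."""
    return [fn(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]

assert table("0001") == outputs(lambda a, b: a & b)        # A AND B
assert table("0111") == outputs(lambda a, b: a | b)        # A OR B
assert table("0110") == outputs(lambda a, b: a ^ b)        # A XOR B
assert table("0011") == outputs(lambda a, b: a)            # just A
assert table("1110") == outputs(lambda a, b: 1 - (a & b))  # NAND
assert table("1000") == outputs(lambda a, b: 1 - (a | b))  # NOR
print("all codes check out")
```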
From Logic to Gates
Logic as above is the abstraction, the math you want your digital circuit to compute. One fact passed over above is that gates are not technically restricted to binary logic, but as a practical matter they are. We build almost everything with binary, synchronous logic (we'll get into what synchronous means when we talk more about timing and clocks later). So the math is all Boolean algebra, as in the brief introduction above.
You won't need to know much about how logic functions are implemented as gates (transistors) at the circuit level unless you are a specialist working for a silicon manufacturer. There are many interesting things to learn about all of that, but these days if you are designing custom logic for a project, you are either part of a big design team producing custom chips, or you are designing with some form of programmable array logic. In these technologies you can get thousands and even millions of simple gates and medium-scale modules, either as a standalone chip or on a system chip with a processor, memory and other standard interface logic.
You should be able to build and test any of the simple logic functions in any gate technology. You can easily show that any more complex logic function can be made from a sufficient set of one- and two-input gates. NAND or NOR on their own are sufficient, as are either AND or OR paired with NOT. Different gate implementations may present optimization opportunities that are somewhat specific to the implementation, but in general you can compute any N-bit to M-bit logic function with any sufficient set of logic gates, and there are straightforward ways to optimize for space and speed. Because of this, much of the work is automated in the form of hardware compilers that can compile hardware either directly onto a custom chip or into "standard" FPGA chips and systems.
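As a concrete check of NAND's sufficiency, here is a small Python sketch (the function names are mine) that builds NOT, AND, OR and XOR from a single nand() primitive and verifies their truth tables:

```python
def nand(a, b):
    """The only primitive gate we allow ourselves."""
    return 1 - (a & b)

# Everything below is wired up from nand() alone.
def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))   # De Morgan's law

def xor_(a, b):
    n = nand(a, b)                  # the classic four-NAND XOR
    return nand(nand(a, n), nand(b, n))

for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor_(a, b) == (a ^ b)
print("AND, OR, XOR built from NAND alone")
```

The same exercise works with NOR as the primitive; try it in the simulator.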
We already know about six of them: the two zero-input degenerates, plus the two one-input degenerates times two, because there are two inputs. Then we have the six symmetric operations: AND, OR and XOR, and their negations NAND, NOR and XNOR (which you could call EQUAL, since it is true when the two inputs are the same). If you invert one of the inputs, there are four more: two ANDs and two ORs. XOR is missing from this last group; why? That's all 16. Note that the two forms on many of the lines are applications of De Morgan's law.
There is a lot more, of course. Next you'll need to learn about flip-flops, simple circuits that hold state, and that will lead to clocks, timing and synchronous systems. At the edges, these blend into analog functions that implement an interface between synchronous logic and a storage or transmission medium.
If there is something not already covered, do what I do: Google it. Propose your own exercises and real problems, document your work, identify and collect resources, and write lessons and knowledge dumps. Use this site as a home to build a shared knowledge base.
In the beginning, logic and control were almost completely distinct from data. In tabulating machines, the data was on cards, and the programming was done by jumpers stored as jumper frames. Then, as electronic computers were developed and data came to be stored electronically, it was realized that the logic and controls could be represented as data and stored in memory right alongside the data. Architecturally it might be worthwhile to treat each store differently, but in principle they can live in one unified memory store. Regardless, code is data: we load binary machine code from the data store to ready it to run, and at some point the system is told to treat this data as code so it can execute on the hardware. Along the way many architectural variations have been tried, but architects have come to rely on the principle that the simpler and more consistent your architecture, the better. The most popular microprocessors have had word sizes that are power-of-two multiples of 8 bits. Generation zero had 8-bit registers and 16-bit addresses, and sometimes extended 16-bit operations. The emergence of the 32-bit architecture was a bit of a watershed: a 32-bit byte address can access 4 GB directly, which is more than big enough for most problems. Currently, the state of the art is 64-bit words and addresses, which covers more memory than we can build for the foreseeable future. You could put a 64-bit processor on an SoC, but mostly it would be overkill; better to leave all that chip real estate for other useful functions.
A Word about Virtualization
In Nand2Tetris, a whole layer of their abstraction stack, called the "Virtual Machine", is the target of the high-level language. While the concept of the virtual machine is actually key to how we can change a lot of the hardware details and still run pretty much the same software, the VM in their architecture stack is really a specific concept in high-level language design that first became widespread with the Java language. The tools for that course are actually implemented in Java, so it is a natural fit, but I want to emphasize that the concept of virtualization is much more general.
The original virtual machines are the mathematical abstractions created by Church and Turing to model computation. It is worth studying their work in detail, but even more important are the general results: first, that the two models are equivalent, expressing the same idea of computation; and second, that there are hard theoretical limits to computability that closely parallel Gödel's work on the incompleteness of mathematics. The Church model is interesting in how it doesn't really reference hardware at all, while the Turing model is practically a physical model of computation. The implication is that the Church model doesn't fully express the costs of computation. In the Turing model you can ask how much tape a run uses, how many steps it takes, how many states are in the controller, and so on. You can only formulate these questions about Church computations after considering implementations in some depth. And yet the two are shown to be equivalent.
When we move from these abstract models to real CPU implementations, the first thing to notice is that the Turing machine has a potentially infinite tape, but as a practical matter we implement CPUs almost universally with a fixed width, mostly 8, 16, 32 or 64 bits now that all the oddball architectures are long gone. That's OK, because we can do the rest in software: if you really need infinite-precision arithmetic, or to address a potentially unbounded amount of memory, software can provide it. This hardware/software tradeoff is important in the Nand2Tetris presentation, and generally throughout the development of leading-edge technologies. The same tradeoff is central to the RISC/CISC debate mentioned below.
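To see how software covers for fixed-width hardware, here is a rough Python sketch of multi-word ("bignum") addition, pretending the machine only has 8-bit words. The limb layout is illustrative, not any particular library's:

```python
WORD = 8           # pretend our machine only has 8-bit words
BASE = 1 << WORD

def to_limbs(n):
    """Split a non-negative integer into 8-bit words, least significant first."""
    limbs = []
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs or [0]

def add_limbs(x, y):
    """Add two limb arrays using only word-sized adds plus a carry bit,
    the way a bignum library would on fixed-width hardware."""
    result, carry = [], 0
    for i in range(max(len(x), len(y))):
        s = (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0) + carry
        result.append(s % BASE)   # keep the low word
        carry = s // BASE         # propagate the carry
    if carry:
        result.append(carry)
    return result

def from_limbs(limbs):
    return sum(limb << (WORD * i) for i, limb in enumerate(limbs))

a, b = 2**100 + 12345, 2**90 + 67890
assert from_limbs(add_limbs(to_limbs(a), to_limbs(b))) == a + b
```

The same carry-chain pattern scales to any word size, which is why the fixed width of the hardware is no real limit.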
Wherever possible this material should refer to current and emerging best practices, and to complete this discussion of the VM layer there are at least two that are very important. Java and its JVM (Java Virtual Machine) have been around for a while, and there are many tools and toolchains to develop and deploy Java-based systems. In the Java model, compilers and tools are supposed to compile code to run on the VM directly, but in practice this isn't always the case: if the code can be translated once from JVM bytecodes to one or more machine languages, it will run faster as native machine code. This is just a case of moving as much work as you can to before you load and run a program. You can interpret the JVM codes "on the fly", but you optimize away the repetitive work of reading and decoding bytecodes and looking up the code fragments needed to evaluate them with a caching scheme, or do all of this even earlier by compiling translations of small blocks of JVM code to native code.
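To make the "interpret on the fly" idea concrete, here is a toy stack-machine interpreter in Python. The opcode set is invented and far simpler than real JVM bytecode, but the shape is the same: decode each operation as it is reached and dispatch to a bit of native code that evaluates it:

```python
def run(code, *args):
    """Interpret a list of stack-machine 'bytecodes' one at a time."""
    stack = list(args)
    for op in code:
        if op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "dup":
            stack.append(stack[-1])
        else:                      # anything else is a literal: push it
            stack.append(op)
    return stack.pop()

# (x + 3) * x for x = 5, written as stack code
program = ["dup", 3, "add", "mul"]
print(run(program, 5))  # 40
```

Every pass over the program repeats the decode-and-dispatch work; a JIT pays that cost once by translating hot blocks to native code.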
The other important technology, and one that we will focus a lot of attention on when we get to that level of the abstractions, is LLVM (originally Low Level Virtual Machine). The idea is that much of the work of optimizing code, as well as link editing and, in general, how the language toolchain is architected, is independent of both the high-level language being compiled and the low-level machine codes being run on the machine. Part of the LLVM toolset is a set of C++ libraries for generating LLVM code, and the LLVM tutorial uses them to build a toy compiler. As an exercise you might try to implement Jack, the toy language from Nand2Tetris, in LLVM by building from this example. Also expect Go (golang) to be an important language for our collaborative work. What makes LLVM so promising is how it is designed to interface with code generators, not so much to be interpreted directly. That means it doesn't need low-level byte codes, and the external representations exist to persist the tree-structured internal representations shared by the toolchain. These are designed to efficiently store and reload the internal structures passed between the toolchain libraries whenever they run.
With virtual memory, CPU architectures include special hardware to make the process efficient, but its purpose is to provide the software a simple and consistent model of main memory. In small architectures with 16 or fewer bits of address space, you rarely need a program to address more memory than the machine physically has, but even some extended 16-bit architectures like the 8086 and 80286 might use more memory than is physically available because of multitasking. Here each "process" gets its own memory space, and segment registers are used to move around the several 64 KB memory windows available to "user mode" processes. As you might imagine, this is a bit of a mess to manage, and when 32- and then 64-bit architectures became available, all of these complex memory models went away.
Now, in systems with virtual memory (and this includes modern smartphones too), there is a hardware- and software-supported virtual machine that provides 1) an ISA (instruction set architecture: a VM instruction set implemented in hardware) and 2) a memory model that supports large flat address spaces and leaves the details of main memory (RAM) and secondary storage (disk or flash) to the system. The ISA is the subset of the CPU's instruction set that is available to user programs and can be targeted by high- and low-level languages. The instructions and features needed to implement other virtualizations (memory, here) are privileged, available only to systems code invoked by hardware traps. One kind of trap is a call to a system function; another services an interrupting I/O device; and for virtual memory there are memory traps, raised in the middle of ISA instructions when a memory address is not present, or is written while marked read-only. Thus a new processor, or one from a different vendor, might need changes to system code because the non-virtualized system instructions and traps are handled differently, but user code runs unchanged.
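A toy model of the memory side of this, sketched in Python (the structures are simplified far beyond real page tables): translation succeeds when a page is mapped, and traps when it is not, which is where the system's fault handler would take over:

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Stands in for the hardware trap into privileged system code."""

def translate(page_table, virtual_addr):
    """Map a virtual address to a physical one via a page table,
    raising a fault when the page is not present."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(f"page {page} not present")
    return frame * PAGE_SIZE + offset

page_table = {0: 7, 1: 3}      # virtual page -> physical frame
assert translate(page_table, 100) == 7 * PAGE_SIZE + 100
assert translate(page_table, PAGE_SIZE + 5) == 3 * PAGE_SIZE + 5
try:
    translate(page_table, 10 * PAGE_SIZE)
except PageFault as fault:
    print(fault)               # the trap the text describes
```

In a real system the fault handler would load the page from secondary storage, update the table, and restart the faulting instruction; user code never sees any of it.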
The word virtual gets quite a workout in computer science because we are doing so much abstraction in so many layers, and having somewhat standardized abstractions helps us manage change. It allows us to work on compilers and optimizations far away from concerns about just where to put the hardware/software split for each feature. The hardware architects can focus on the areas that get them the most boost from the software they are actually asked to run, without requiring that developers contort their processes to low-level hardware optimizations. Virtualization is good for separation of concerns, whether we name it that or not.
Programs in Memory
Even in the abstract Turing model, there is a clear split between control (the state machine that reads and writes the tape) and the data (the tape), but that is a bit misleading. The reality is that any interesting computation will have to put recursive elements into memory (write them to tape), because the finite states of the control logic would otherwise limit the depth of those recursive elements. In other words, if you implemented Church's lambda calculus, you would have to put all the lambda code into memory (on the tape), and the control logic would implement the language specification, which is finite.
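To make the control/data split concrete, here is a minimal Turing machine simulator in Python (the rule encoding is my own): the finite rules are the control, and the extendable tape is the memory. The example rules increment a binary number:

```python
def run_turing(rules, tape, state="start"):
    """A minimal Turing machine: finite control reading and writing an
    extendable tape. rules maps (state, symbol) -> (new_state, write, move)."""
    pos = 0
    while state != "halt":
        symbol = tape[pos] if 0 <= pos < len(tape) else "_"   # blank off-tape
        state, write, move = rules[(state, symbol)]
        if pos < 0:                 # grow the tape on demand
            tape.insert(0, "_")
            pos = 0
        elif pos >= len(tape):
            tape.append("_")
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return tape

# Increment a binary number: scan right to the end, then carry leftward.
rules = {
    ("start", "0"): ("start", "0", "R"),
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),
    ("carry", "0"): ("halt",  "1", "L"),
    ("carry", "_"): ("halt",  "1", "L"),
}
print("".join(run_turing(rules, list("1011"))).strip("_"))  # 1100
```

The rule table is finite, so anything of unbounded depth has to live on the tape, exactly the point made above.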
While it may have been inevitable that we would discover that codes in the data could be "run", or interpreted, by other software and hardware, it is still an important insight that the same digital storage elements being used for data could be interpreted by a control mechanism. Thus machine language (binary code) was born.
Processors I have Known and Loved
Around the time I came on the scene, the 8-bit microprocessors ruled: the 6502 in the Apple II and others, and several hobby machines with 8080 or Z80 processors. My first machine had a Z80, which was a superset of the 8080 architecture. There were many others; the 6800 family from Motorola had some following, but Motorola only had a real hit with the 68000 family, the later entries being among the first true 32-bit machines.
In the meantime, Intel had a big win when IBM selected their 8088 chip for the IBM PC. I started at Victor Business Products at that time, where we had an 8088-based machine, the Victor 9000. This turned out to be the beginning of the PC clone era: Intel-based PCs built around some derivative of the IBM PC I/O bus as it morphed over the years. The 32-bit 80386 achieved an important milestone by enabling commodity PC hardware to run advanced multitasking operating systems. In the 1980s and 90s there were a number of competing 32- and 64-bit processors, many of them RISC processors, but in the end Intel's superior market power and deep pockets consolidated the market for high-end microprocessors.
RISC requires a little more explanation. Reduced Instruction Set Computer vs. CISC (Complex Instruction Set Computer) was a debate in processor architecture over whether you should design a simpler instruction set and invest the hardware resources in other acceleration techniques like pipelining and caching. The argument was that software, in this case mostly the compilers that emit machine-level instructions, can erase any benefit from having complex instructions and addressing modes at the machine instruction level. In the competition of the marketplace this idea pretty much proved out and gave these processors a slight competitive edge, but that edge was overwhelmed by other technological areas where Intel led. Intel could achieve similar acceleration with a few more gates, and they had those extra gates because their process technology was a generation ahead of much of the competition.
So these days we see mainly two desktop processor lines: Intel and ISA-compatible AMD processors. Both companies have many processor models, some for low power and laptops, others with as many cores as they can fit with current technology. AMD has recently started putting two different kinds of core on some chips. I almost forgot to mention video accelerators, which are typically some sort of DSP (digital signal processor). Several video card manufacturers still compete with their own processors, but generally these processors are streamlined for repeated computation, often with a lot of floating-point math, so they are given vector processing features and multiple floating-point units. AMD can put video accelerators alongside their standard cores for greater integration at the system level.
The other important architecture is ARM, which is on pretty much all smartphones and on two project systems we will be interested in: the BeagleBone and the Raspberry Pi both use ARM processors. These are complete computer systems on small boards with extensive expansion capabilities. You can easily prototype custom hardware to interface with whatever you might imagine, and program it from these small but powerful computer systems. For learning projects we can also find products that are more or less a complete Android phone, but just the board, with extendable connections for customizations. Therefore we can entertain projects that extend a custom smartphone.
The final project I want to mention is the Arduino, which is based on an 8-bit, somewhat RISC-like AVR processor. Because of this project and the way the tools have been shared, there is a considerable open tool set targeting this processor, covering the Arduino proper as well as the many Arduino-compatible boards (AVR based but different, bigger, etc.).
The components of a system are arranged in similar patterns regardless of scale and technology. An IBM System/360 was actually a network of system components. Each component could have its own processor and network connections to the main processing complex. Each component would occupy several equipment racks, and board-level components did simple logic and interface functions. In the SoC systems we prototype with, each of those racks is compressed onto sub-chip-level modules of standard chips, often deployed in cell phones and other consumer products. Same thing; different scale, different technology at all levels.
Maybe the mainframe central processor is our desktop, laptop or a server in the cloud, and we can think of our SoC projects as more like an I/O processor. In the mainframe, that might connect the main processor to a network of terminals or to tape and disk devices. A modern disk drive probably has a small processor in it, on a purpose-built SoC that includes all the drive and interface electronics as well. The network for a disk drive could be SATA or USB.
Our SoC has USB, and could have SATA if desired, so we can plug in a drive and access it. Now our little SoC is the main processor. The ARM processor on the SoC is the same as on most smartphones. Plug in a display and keyboard, and as a system it has the same architecture as any desktop or laptop. Plug in a touch-sensitive display and wire it up in a wearable that holds the processor and batteries comfortably and has a display integrated into the sleeve. If you want to experiment and get fancy, add gesture sensors and more.
OK, so what are the main parts of a system? You may have guessed that it has a central processing unit and some networkable components. It will have a memory hierarchy, because the CPU is only the processing part of the stored-program computer; it needs to store the programs and data somewhere. Because processors use more or less the same technology as at least part of the memory hierarchy, as the processors get faster, the memory gets faster too. The memories are also typically growing in size, which affects latency a bit. And at some point the hierarchy reaches a mechanical storage device, where random access time doesn't get much better and transfer speeds grow more slowly than density.
The rest can be considered I/O. As we noted, storage is external, at least in the sense that it is reached over an interface such as SATA or USB.
So far we have covered mostly hardware. Certainly virtualization and systems architecture involve a lot of software, programs to run on general-purpose hardware, but so far the languages have all been machine word formats for low-level instructions. You could program that way, but I haven't seen it done since the hobby days of the '70s, when we didn't have compilers and assemblers and the machines were too small to run them anyway. You certainly can't write very big programs this way. At the lowest level there is assembly language, which maps more or less one-to-one to machine instructions but has mnemonic keywords and symbolic names for program and data addresses to handle some of the low-level chores. At this level we see another tool for low-level accounting: the link editor. This program takes enhanced machine code files, the object files produced by the assembler and the compilers that come up next. The link editor finalizes the locations of code and data and updates the internal links to point to the finalized location values. Many modern operating systems (OSes) defer this linking until load time. This is called dynamic linking, and it allows shared library code to be stored in object form in files called dynamic libraries. This saves a lot of space, because the library code doesn't have to be copied into each executable, making executables smaller. On systems that can use virtual memory features to share the code in memory, you also save a lot of runtime memory, because a library used by many programs only needs to be in memory once.
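Here is a toy sketch of what a link editor does, in Python (the object-file format is invented for illustration; real formats like ELF are far richer): place each object's code, record where its symbols land, then patch the unresolved references:

```python
def link(objects):
    """Two-pass toy linker over a dict of 'object files'."""
    symbols, base, layout = {}, 0, {}
    for name, obj in objects.items():        # pass 1: place code, record symbols
        layout[name] = base
        for sym, offset in obj["defines"].items():
            symbols[sym] = base + offset
        base += len(obj["code"])
    image = []
    for name, obj in objects.items():        # pass 2: patch references
        code = list(obj["code"])
        for offset, sym in obj["refs"]:
            code[offset] = symbols[sym]      # fill in the final address
        image.extend(code)
    return image, symbols

objects = {
    "main.o": {"code": [10, 0, 30], "defines": {"main": 0},
               "refs": [(1, "helper")]},
    "lib.o":  {"code": [40, 50], "defines": {"helper": 0}, "refs": []},
}
image, symbols = link(objects)
assert symbols["helper"] == 3    # lib.o placed after main.o's 3 words
assert image[1] == 3             # the reference now points there
```

Dynamic linking defers exactly this patching step until the program is loaded, which is what lets many programs share one copy of a library.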
That was just a little preview of a more complete toolchain for going from source code to running programs. Languages come in two basic types, compiled and interpreted. Actually it is a bit of a continuum: at one end, completely compiled code that is pretty much like machine or assembler code, where you load and go (maybe after linking); at the other end, a program where each byte or line is read, parsed and executed as it is encountered. Many utility programs use the same basic "read-eval-print" loop even when the language isn't general purpose. Let's skip the digression where all programs are interpreters for some domain-specific language and stick to complete languages. Formally, you might want to specify Turing machine equivalence, but that probably isn't necessary. Languages vary quite a bit in their expressive power, even though they all share a lot of structure and concepts as well.
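A minimal read-eval-print loop, sketched in Python for a tiny postfix calculator "language" (the syntax is invented): read a line, evaluate it, print the result, repeat:

```python
def evaluate(line):
    """Evaluate space-separated postfix arithmetic, e.g. '3 4 + 2 *'."""
    stack = []
    for token in line.split():
        if token in "+-*":
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b}[token])
        else:
            stack.append(int(token))
    return stack.pop()

def repl(lines):
    for line in lines:          # stand-in for reading from a terminal
        print(evaluate(line))   # eval, then print; loop back to read

repl(["3 4 + 2 *", "10 3 -"])   # prints 14 then 7
```

Shells, database clients and language interpreters all share this same loop; only the evaluator in the middle changes.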
Among interpreted languages, the granddaddy is probably BASIC, although LISP is older. BASIC isn't too important today, but new variants of LISP show up all the time, for example Clojure. LISP has its enthusiasts, and some of the new variants are probably great languages, but I don't know them well enough to say more. There is another group of languages that have come around more recently; I would list Perl, Python and Ruby, plus maybe PHP, though it is more a variant of Perl. Of these, only Ruby has a good enough type system, with classes and some decent forms of inheritance.
Develop a pattern to crowdfund class projects. Funders buy shares in the prototypes developed for the class. Create markets where funders bid for production from small prototype batches and for artifacts of the creative process as they become available. This can supplement more typical crowdfunding, where the investment buys a commitment to an end product (say, a copy of a music production). Maybe the invention team commits to some number of prototypes to be allocated to investors.
This is a Commons for Production, for turning DIY projects into production services. The basic idea is to provide the resources for experimentation and prototype work using products and components that are readily available. Many of the commons-based resources will be crowdsourced knowledge about the best resources available, whether that is knowledge of how to do things or of the best suppliers of component technologies (e.g. low-cost, high-quality solar cells, solar heat collector tubes, batteries, LEDs, etc.).
What will make the difference between good systems and great systems is how much we are able to enhance resilience in the systems we design and build. We hear a lot of talk about smart grids and smart homes, but we don't think enough about how to design and build them so they are not vulnerable to hacks and abuse. While these resources will be largely about systems and components that are not that complex or mysterious, the program of bits and things, and the projects and resources related to digital control systems, will be just as important as the other technologies.
The best way to handle information about marketplaces and suppliers may be to create a colony on WikiRate of members from our commons who focus on particular products and markets of interest to them. Our more critical resources will need us to develop our own systems that are likely to be Decko based like WikiRate. We will want to develop the code and support systems to defend the infrastructure we design and build from malware and malicious hacking. Of course this will just be a component of the overall systems functions. The commons will fund and develop advanced control systems that include both open hardware and software. This is not to ignore the mechanical and electrical system designs that we accumulate in our commons.
As a general contractor, I want a network of people who know how to build, install and maintain the systems and services of the 21st century and beyond. In my father's generation you could see how almost all of the technology worked with the naked eye, and you could take it apart and put it back together better with simple tools, and ones you could invent yourself. We want a commons to hold the knowledge base, to develop new shared knowledge and systems designs, and to create and maintain marketplaces for the products and services related to all this knowledge. Who will service your electric car? The dealer, sure, but the success of products like the automobile depends on the aftermarkets of products and services around them. I'd say it is a clear indication of success and maturity when such markets develop. This commons is all about catalyzing this kind of market and taking advantage of the passion of fanatic user communities.
As someone who enjoys tinkering and customizing my things and spaces, I want to be able to get professional knowledge and advice when I'm building and tinkering or to be able to call in some pros for bigger projects. As an expert in systems and ecological design, I want to contribute to the commons and get a bit of income from that work. I know it is speculative so maybe this is all styled as gift economy work and income could come from gifts and bounties that acknowledge my contributions.