Basics

Basics

Overview

A complete instructional manual on the Scheme programming language is beyond the scope of this document. That said, here are a few basic concepts that the reader is assumed to understand by the tutorial text presented here:

History

The Lisp family of languages is one of the oldest “high level” programming languages to be invented. Its syntax and semantics are inspired by Church’s Lambda Calculus, making it a direct embodiment of the formal language that effectively gave rise to the concept of a “programming language” in the first place.

For example, the following expression of the Lambda Calculus

$$ \lambda x \; . + x \; 1 $$

is represented as the following lambda expression in pretty much any dialect of Lisp:

(lambda (x) (+ x 1))

Note: Expressions of the Lambda Calculus are shown with prefix (a.k.a. “Polish”) notation, as Church himself preferred for various practical and theoretical reasons. I.e. the fact that Lisp has traditionally used prefix notation is more than a mere convenience given its list-oriented syntax and semantics but, rather, one physical manifestation of a principle that is at the core of its design.

Ironically, the “excessive” use of parentheses for which Lisp is often criticized actually could be entirely avoided through more rigorous and consistent use of prefix notation. The elimination of the need for most parenthesies was one of the things Church was fond of pointing out when explaining why he preferred it over infix notation.

In both cases, the value of the corresponding expression is a function of one argument, x, which returns the result of adding 1 to x.

Note: The preceding sentence is literally true of Scheme, but is a bit misleading if applied directly to some other Lisp dialects. Common Lisp, for example, does not treat lambda forms as ordinary expressions that evaluate to procedure objects. The equivalent expression in Common Lisp is:

(function (lambda (x) (+ x 1)))

with the alternative syntax

#'lambda (x) (+ x 1))

The ppreceding can be abbreviated in Common Lisp source code to:

(lambda (x) (+ x 1))

only because of a built-in macro that expands to a use of the function special form.

Compared to Scheme, Common Lisp has other “oddities” that derive from this same distinction between the “variable value” and the “function value” of an expression. Which is “better” is somewhat a matter of taste, but the Scheme approach generally results in more stream-lined syntax and simpler logic, more in keeping with the original spirt of the earliest dialects of Lisp and with the Lambda Calculus, itself. For example, there is no need in Scheme for anything like Common Lisp’s funcall form.

Lisp and the Lambda Calculus also share the feature that the variables bound by λ expressions can refer either to data or to functions that operate on data. In other words, the following expression of the Lambda Calculus

$$ (\lambda f y . f \; y) (\lambda x . + x \; 1) 2 $$

depicts functional composition and evaluates to the number 3. Exactly the same is true of the following Scheme expression:

((lambda (f y) (f y))
    (lambda (x) (+ x 1))
    2)

In both cases, the outer λ expression evaluates to a function that takes two arguments, f and y. The value of f is, itself, a function and the value of y is the data to which f is to be applied. This outer λ expression is passed our previous \ref lambda-increment “add 1” example as the value to bind to f and the number 2 as the value to bind to y. When the outer λ function applies the inner one to 2, the result of the whole expression is the number 3.

Given its purpose, the Lambda Calculus does not include any mechanism for assigning “global” names to functions or data. I.e. the only mechanism for giving a name to a function or data value is to bind it to a variable by applying it in a λ invocation. Mathematicians have traditionally used words like “let” and “where” when they wish to give a name to a function or constant, for example:

$$ \text{Let } n = G(F, 2) \\ \text{where } G = \lambda f y.f y \\ \text{and } F = \lambda x . + x \; 1 $$

This practice gave rise to the inclusion of “special forms” in Lisp like let:

(let ((g (lambda (f y) (f y)))
      (f (lambda (x) (+ x 1))))
  (g f 2))

as well as for defining “global” variables:

(define G (lambda (f y) (f y)))

(define F (lambda (x) (+ x 1)))

(define n (G F 2))

In short, if you wish to understand how the Lisp family of languages acquired some of its “quirks,” you need look no further than its deliberate association with the venerable Lambda Calculus.

Practical Data Types

As described above, the Lisp family of languages was inspired by Chuch’s Lambda Calculus. The latter was invented for the purpose of proving certain theorems in a branch of mathematics known as Computability Theory. As such, it really only supports two “data types:” real numbers and functions whose arguments are real numbers or functions.

In order to be used for practical purposes, any programming language must support far more data types than just numbers and functions. Lisp-like languages support the usual array of “primitive” data types familiar to any programmer using any language: integers, floats, booleans, characters, strings etc. They also define a few idiosyncratic primitive data types including symbols and lists. Lisp lists are constructed from another unique data type, cons cells. Scheme adds another “special” data type of its own, keywords.

Symbols

Symbols are simply “atomic” values that are represented as a sequence of characters and glyphs that conform to the language syntax rules. Symbols have no internal structure or pre-defined meaning. They can be used as the names of variables and they can also be used as data values in their own right. Some symbols are effectively reserved for special purposes, like lambda.

The quote special form (which is almost always abbreviated in source code to a single apostrophe, as in 'my-symbol) can be used to tell Scheme to use a symbol or other data structure as a literal data value rather than evaluating it as an expression (the name of a variable, in the case of a single symbol).

Symbols can be tested for equality when used as data values, but not much more. They are different from strings in that the latter function like arrays of characters (e.g. there are built-in functions to extract subsets of the characters from a given string, construct a new string by concatenating existing ones and so on) while symbols are “atomic” values.

Cons Cells

From time immemorial, the basic unit of dynamic memory allocation in Lisp-like languages has been known as a cons cell from the name of the built-in function used to allocate them, cons (short for construct). A cons cell is simply a container for a pair of references to data. The two data elements referenced by a cons cell are accessed using functions which, for historical reasons having to do with the assembly language of the machine on which the earliest versions of Lisp were implemented, are named car and cdr (where cdr is pronounced as “coulder” – which is a statement that only makes sense to speakers of English).

Lists

Lists are created from cons cells using a simple convention: 'nil, \c #nil or the “empty list” (depending on the dialect) is a list, as is the result of applying cons where the second parameter is a list. As with symbols, the quote special form can be used to suppress evaluation of lists. This allows lists of symbols and other types to be used as data rather than evaluated as function invocations in source code.

Keywords

Keywords are a bit like symbols that start with ‘#', and a bit like numbers or strings in that they are “self-quoting” literal constants. Scheme uses a few pre-defined keywords as special constants:

Keyword Description
#t Boolean true
#f Boolean false
#nil The empty list (in some versions of Scheme)

The “official” syntax for the empty list in Scheme is '(). In some implementations, the keyword #nil is a synonym for '(). This is different from Common Lisp and many other dialects that use the symbol nil to mean both the empty list and boolean false. Scheme falls short of having a true boolean data type in that (like most Lisp dialects) any value other than the one which means “false” (the keyword #f, in Scheme’s case) is considered “true” for purposes of the test expressions in forms like if, when, while and so on.

For example, the following returns the number 2:

(car (cdr (cons 'a (cons 2 '()))))

which could be written more idiomatically (as well as much more succinctly) as:

(cadr '(a 2))

In both cases the Scheme run time system will be invoked to construct a list of two elements, the first being the symbol a and the second being the number 2. A combination of a call to cdr followed by a call to car returns the second element of that list. A call to car by itself would have returned the first element in the list, the symbol a. Two calls to cdr in succession would have returned the empty list which is the same as the keyword \c #nil in some versions of Scheme. (Many other dialects of Lisp use the symbol nil to represent the empty list).

Historically, most programmer-defined data structures were created using combinations of “scalar” values constructed from lists. For example, the classic alist (for “associative list”) data structure is simply a list, each of whose elements is itself a list of two elements. The first (car) of each entry in such an “alist” is considered to be a key and the second (cadr) of each “alist” entry is that key’s value. This was the “traditional” way to implement a dictionary data structure in Lisp-like languages.

Scheme provides a wealth of additional “primitive” data types, some of which are the subjects of various other tutorial sections in this document.