If you are not able to explain it with words, you may have to add pictures. And if you still can’t manage it with pictures, you could always make a video.

About this book

In the year 1970, Prof. Niklaus Wirth invented the Pascal programming language as a way to teach his students the fundamentals of computer programming. Although the initial core Pascal language was designed for teaching purposes only, it was soon expanded by commercial vendors and gained some popularity. Later, Wirth presented the language Modula-2 with improved syntax and the module concept for larger projects, and the Oberon language family with additional support for Object-Oriented Programming.

The Nim programming language can be seen in this tradition, as it is basically an easy language suited for beginners with no prior programming experience, but at the same time is not restricted in any way. Nim offers all the concepts of modern and powerful programming languages, combined with high performance and a certain level of universality. Nim can be used to create programs for tiny microcontrollers, large desktop apps, and web applications.

Most books about programming languages focus on the language itself, often assuming that the reader is already familiar with the foundations of computer hardware and has some programming experience. This is generally a valid approach, as most people are taught this fundamental knowledge, sometimes referred to as Computer Science (CS), in school today. However, there are people who, for various reasons, may have missed this introduction in school and later decide that they need some programming skills, perhaps for a technical job. Moreover, some children may not be satisfied with the introduction to computer science taught at school. Therefore, we decided to start this book with a short introduction to fundamental concepts. Most people may skip that part, but you should be really sure that you know these foundations.

This book is divided into seven parts; Part VII is the Appendix. It is possible to read the parts independently of each other in any order, but for Nim beginners, it is recommended to read them mostly in ascending order, perhaps while previewing some interesting sections in the second half of the book early on.

In Part II, we explain the basics of computer programming step by step in a way that should enable even those with no prior experience to learn independently. In this part, we might repeat some of the material that we already mentioned in Part I. We do that intentionally, as some people might skip Part I, and because it is generally beneficial to reinforce the reader’s learning process through repetition.

Part III will give you an overview of Nim’s standard library, which contains many useful functions and data types that we can use in our programs to solve common tasks like input and output operations, using the file system, or sorting data. In Part IV, we will apply what we have learned by solving some common programming tasks, like sorting, searching, or converting numbers from the internal computer format to displayable text. Part V will introduce some useful external packages that can be easily installed using one of Nim’s package managers. Nim already has a few thousand external packages. Some of them may support or replace the standard library, and others offer special or advanced functionalities.

Part VI of the book will finally introduce advanced concepts like asynchronous operations, threading and parallel processing, macros and meta-programming, and, last but not least, Nim’s concept implementation. Some sections that do not integrate well into the other six parts, or that are boring or useful only for a minority of Nim users, have been moved to the Appendix and may not be part of a printed copy of the book. This currently includes a short introduction to Nim’s standard package manager, Nimble.

This book is essentially a traditional textbook — simple yet detailed. It is designed such that individuals aged 14 and above can read and understand it independently, with little or no help from adults. Unfortunately, the English language may still be a challenge for many kids not born in a country with a strong English language tradition. Fortunately, automatic translations are already supported for some languages, and we might be able to offer translated editions of the book later, possibly in Chinese and German.

In the last few decades in the area of computer programming, traditional textbooks have been partly replaced by videos, "Crash course" books, and "Learning by doing" books. Indeed, a good video may help you start with a new language, and it can enable people who have difficulties reading printed texts or concentrating on a topic for a few minutes to learn a programming language. Unfortunately, the quality of most videos is very bad; some are made by kids who have just learned the first steps of computer programming themselves. Furthermore, watching videos does not necessarily improve the reading and concentration issues that people might have.

"Crash course" and "Learning by doing" books may give you a good start, but for that, we already have a lot of textual tutorials. The concern with these types of books is that, while they may help you solve common tasks, they don’t necessarily foster a deeper understanding. Generally, the idea of a "Crash course" or "Learning by doing" is not bad. However, in computer science, starting with a larger example application can be overwhelming, as you have to learn a lot of things simultaneously. It may work for you, but there is the danger that you will quickly forget all the details again. Moreover, these types of books are not very helpful when you need to look something up.

The other concern with "Learning by doing" in computer science is that learning materials may only have examples in which you are not really interested. Of course, we can create a simple chat application, a simple Twitter clone, and do some basic web scraping using async/await. Or create a basic game or a simple GUI with one of the dozen available toolkits. But what if you are not interested in chatting and twittering, or in that single selected toolkit? We believe that in such cases, reading the detailed examples can be very frustrating.

Therefore, we recommend that after reading the first tutorial, and perhaps a few pages of this book, you start coding with topics you are interested in. Perhaps you could do it together with some friends? Whenever you need concrete help, you can find it on the Internet, using search engines, Wikipedia, or a discussion platform of your choice. And if you really have no idea what project to start with, then computer programming might not be the right profession for you.

Although Nim has a JavaScript backend and thus supports web-related development well, this book focuses on native code generation using the C and C++ backends. We will discuss some peculiarities of the JavaScript backend in the second half of the book, and we may provide some examples of the use of the JavaScript backend in the Appendix. If you are strongly interested in web development and the JavaScript backend, then you may also consult the book Nim in Action by Dominik Picheta, which gives some detailed examples for the development of web-based software using the Nim programming language, including a simple chat application and the foundation of a microblogging and social networking service. You may also consult the tutorials and manuals of Nim web packages like Karax, Jester, or Basolato.

This book will not attempt to explain things that are already well-explained elsewhere, or that should have been well-explained elsewhere — at least not in this first edition, where we have many other essential topics to cover. So, for now, we will leave out the following: the installation of the compiler, the process of installing and using text editors or IDEs with special Nim support, the use of Nim package managers such as Nimble and Nimph, the use of the foreign function interface (FFI) to create bindings to C libraries, and internal compiler details like the various memory management options and all the pragmas.[1] Also, we do not intend to fill the book with redundant information, such as tables listing all the Nim keywords or Nim’s primitive data types, as you can easily find all of that in the Nim language manual.

While the creation of graphical user interfaces (GUIs) is an important topic, we cannot provide many details for various reasons. Nim does not have a single, generally accepted GUI library, but there are more than 20 attempts: from pure Nim ones like NimX or Fidget, to wrapped libraries like GTK or QML, to GUIs that try to provide a native look for various operating systems like XWidgets or NiGui, and even web-based GUIs. And for each of these, at least for the more serious ones, we could write a separate GUI book. Therefore, we will only provide a few minimal examples for some of them in Parts IV or V of the book.

Furthermore, we will not delve into game programming, as it is a broad area with numerous existing tutorials.

Maybe in later editions of the book, we will add some more topics, e.g. game programming, as so many people like it. However, we will always have to ensure that a potential printed version of the book does not exceed 500 pages, which may require us to exclude some content in the printed version.

Generally, when learning a new programming language, people start with some short tutorials before delving deeper into the language by following a book. This approach is indeed a good start. So we recommend that you read the short official tutorials, parts 1 and 2, and perhaps also some other tutorials freely available online. Tutorials typically only scratch the surface of the topics, so you may not fully understand them all, but this approach gives you a feel for the language. There also exist some video tutorials, in case you have problems reading. However, if that’s the case, this book might not be of much use to you. If you already have a background in computer science and experience with other languages such as C++, Haskell, or Rust, the tutorials and the Nim language manual might be fully sufficient for you; thus, you may not need this book at all. Or you may prefer the recently published book by Mr. Rumpf, called "Mastering Nim: A complete guide to the programming language", available at Amazon.com.

This book is based on the Nim reference implementation by Mr. A. Rumpf. Most explanations and examples should also be valid for other implementations, like the one at https://github.com/nim-works/nimskull.

Although the initial pages of this book were written in the spring of 2020, the book should be mostly up-to-date with Nim versions 1.6 and 2.0.

Nim version 1.6.14 was released on 27 June 2023 and includes many bug fixes for the 1.6 branch. Nim 2.0, initially announced for early 2023, was finally released on 01 August 2023.

The v2.0 release brings many improvements but does not include any serious breaking changes that would invalidate old code. Only a few minor modifications might be necessary for old code to compile and run again. In this book, we may use and discuss a few Nim 2.0 features, but most code should be compatible with the 1.x series of the compiler or with nimskull (cyo) with no or only minor changes.

The most significant change in Nim 2.0 is that ORC memory management has become the default, indicating that it is considered ready for use in production. ORC gives us GC-like, fully deterministic memory management with minimal overhead compared to manual memory handling. It reduces the maximal memory consumption of apps, avoids GC-generated delays, and may increase the performance of our programs. Additionally, ARC and ORC memory management should bring serious advantages for the creation and performance of threaded and parallel code. We have summarized the most important new features of Nim 2.0 in the appendix titled Changes for Nim 2.0.

Note that incremental compilation (IC) and CPS (continuation-passing style) task scheduling are still in development and not yet fully supported by Nim v2.0. And for parallel and threaded code execution, it may be useful to consider high-quality external libraries rather than those in Nim’s standard library.[2] This may also apply to modules for asynchronous code execution and a few other libraries.[3]

For all the details please refer to the corresponding section in the Appendix: Disclaimer and legal notice

Electronic versions of the book in HTML and PDF formats are available at https://nimprogrammingbook.com.

For a short overview of the Nim programming language, you may also consult the website at https://nimprogramming.com/.

For the latest news about the Nim language, installation instructions, and much more useful information, visit the official homepage at https://nim-lang.org/

An alternative Nim implementation, which may later develop into another language, is available at https://github.com/nim-works/nimskull.

The source code for the book can be found at https://github.com/StefanSalewski/NimProgrammingBook. You can use the GitHub issue tracker to point out mistakes or unclear explanations, which we will strive to address.

About the author

Dr. S. Salewski studied Physics, Mathematics, and Computer Science at the University of Hamburg (Germany), where he earned his Ph.D. in 2005 in the field of laser physics. He has worked in the field of fiber laser physics, electronics, and software development, using languages such as Pascal, Modula-2, Oberon, C, Ruby, and Nim. Some of his software projects, including the Nim GTK GUI bindings and Nim implementations of an N-dimensional RTree and a fully dynamic, constrained Delaunay triangulation, are freely available as open-source projects at https://github.com/StefanSalewski.

ChatGPT and GPT-4

You might have already heard about ChatGPT, an AI (Artificial Intelligence) chatbot developed by OpenAI. ChatGPT was launched as a prototype on November 30, 2022, and it quickly garnered attention for its detailed responses and articulate answers across various domains of knowledge.[4] Although this book was written by a human, parts of it could have potentially been created by GPT, resulting in a more fluent tone and fewer spelling and grammar errors.[5] As such, this book, or at least parts of it, could now be considered obsolete. While ChatGPT still has some serious issues (it’s better not to ask it about the author of this book, the creator of the Nim language, or other Nim core developers), it can provide very valuable information on various topics. Despite Nim being relatively small compared to the current mainstream languages, we were really surprised by how much ChatGPT already knows about it. To utilize ChatGPT, registration with your real name and phone number is required. We think that’s acceptable to prevent misuse. The basic service is still free. A professional version, now also known as GPT-4, is available with a monthly payment, but the free basic version still functions. When you log in to the OpenAI homepage (https://chat.openai.com/auth/login?next=/chat), you can ask the bot questions, have it create a simple Nim program for you, fix errors in your program, or add comments. The bot might even be able to convert source code from other languages to Nim. Or it can rephrase text, so you could feed ChatGPT a section from this book, and then request a rewrite for improved grammar or even expanded content. Some possible tasks you can ask ChatGPT to do:

  • Do you know the Nim programming language?

  • What does the Nim statement 'echo 1 + 2' print to the screen?

  • And can you guess what the Nim statement 'echo 1, 2' would print?

  • Are you sure that the two arguments provided to the echo statement are separated with a space?

  • I just tested it. I ran the Nim program 'echo 1, 2' called t.nim with the command 'nim r t.nim' and it prints "12", so there is no space involved!

  • Please create a Nim program that prints the numbers one to 10, each on its own line.

  • I will insert a small Nim program with issues into the prompt. Please fix it.

  • [insert code here]

  • Can you please add some comments to the source code?

Of course, you should not really trust the advice of ChatGPT. Currently, as of early 2023, it thinks that the echo command automatically adds spaces between the printed values. Of course, that is wrong, and we can convince ChatGPT of that. However, ChatGPT currently cannot remember what it has learned in a user dialog, so it repeats the same errors next time. Note that ChatGPT was trained with data only until 2021, so it may not be aware of Nim 2.0 and other recent changes.
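The behaviour of `echo` discussed above is easy to check yourself. The following is a minimal sketch, assuming a Nim 1.6 or 2.0 compiler; the variable name `joined` is only chosen for illustration. `echo` converts each argument to a string and writes the results one after another, so a space only appears when we pass one explicitly.

```nim
# echo converts every argument to a string and writes the
# results back to back, followed by a single newline.
echo 1 + 2          # one argument, the sum: prints 3
echo 1, 2           # two arguments, no separator: prints 12
echo 1, " ", 2      # a space appears only if we pass one: prints 1 2

# The same text that `echo 1, 2` writes, built by hand with
# the string conversion operator `$` and concatenation `&`.
let joined = $1 & $2
echo joined         # prints 12
```

You can save this as t.nim and run it with `nim r t.nim`, as in the prompt list above.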

Below is a complete dialog with ChatGPT from early 2023:

I have one more question about the Nim programming language. The following code, which is supposed to convert a string to an integer, does not compile. Can you fix it?
`var s: string = "123"; var i: int = int(s)`

Yes, the code you provided is not valid Nim syntax. In order to convert a string to an integer in Nim, you can use the parseInt procedure from the strutils module.

Here is an example of how to use it:
import strutils

var s: string = "123"
var i: int = parseInt(s)
echo i # Output: 123

In this example, we import the strutils module, which provides the `parseInt` procedure for parsing integers from strings. Then, we define a string `s` with the value "123", and use `parseInt` to convert it to an integer, which we store in the variable `i`. Finally, we use `echo()` to print the value of `i`.

Since spring 2023, GPT-4 has been available to paying subscribers, offering more helpful information and less irrelevant content.

We recently used GPT-4 to create a short Nim info page: https://nimprogramming.com/

Part I: Introduction

Give me a video; I get a headache from reading.[6]

Initially, you do not need to know many details to use computers and write computer programs. It’s much like driving a car. Even though a car is a complex machine, children can generally manage to move it after a brief introduction.[7] Nevertheless, professional racing drivers typically require a much deeper understanding of the inner workings of all the technical components, along with extensive practice.

What is a computer?

A computer is primarily a device that runs computer programs by following instructions on how to manipulate data.

Nearly all computers currently in use — from tiny ones integrated into electronic gadgets, to well-known desktop computers (PCs), and large, powerful supercomputers filling entire rooms — work internally with digital data only.[8] Digital data essentially comprises integer (whole) numbers encoded in binary form, which are represented by sequences of the symbols 0 and 1. We will discuss the term digital in more detail in the next section.

The most important part of a digital computer is the CPU, the Central Processing Unit. This tiny device, built of digital electronic circuits, can perform very basic mathematical and logical operations on numbers, such as adding two numbers or determining whether one number is larger or smaller than another. Most computer CPUs can only store a limited number of values internally, which are lost when the power is switched off. Therefore, the CPU is typically electrically connected to a RAM module, a Random Access Memory, which can store many more numbers and allows fast access to these numbers, and to a hard disk or SSD device, which can permanently store the numbers but does not allow such fast access. The stored numbers are most often simply referred to as data; in essence, this data is nothing more than numbers, but it can be interpreted in various ways, such as pictures, sounds, and more.

The traditional hard disk drives (HDDs), which store data electromechanically on rotating magnetic disks, as well as the more modern variants, the solid-state drives (SSDs), which store data using modern semiconductor technologies, can store data persistently for longer time periods, even when no electric power supply is available. Both SSDs and HDDs can be optionally split into multiple partitions; for example, one or multiple OS partitions for executable programs or pure data partitions for passive data such as text files or pictures. Before use, each partition is generally formatted, at which point a file system (FS) is created. These two steps create an internal structure on the storage device, which allows us to store and retrieve individual data blocks like programs, text files, or pictures.

Nearly all of today’s desktop computers, as well as most notebooks and cellphones, contain not just a single CPU, but multiple CPUs, also known as cores. This enables them to run different programs in parallel or parts of a single program on different CPUs to increase performance and reduce total execution time. So-called supercomputers can contain thousands of CPUs. Besides CPUs, most computers also have at least one GPU, a Graphics Processing Unit, that can be used to display data on a screen or monitor, for example for animations in games or for playing video. The distinction between CPUs and GPUs is not clear-cut. Usually, a CPU can also display data on screens and monitors, and GPUs can also perform some data processing tasks that CPUs can handle. However, GPUs are optimized for the data display task.

More visible to the average computer user are the peripheral devices such as a keyboard, mouse, screen, and perhaps a printer. These enable human interaction with the computer, but they are not core components — the computer can function effectively without them. In notebooks, laptop computers, or cell phones, the peripheral devices are closely integrated with the core components. All the physical parts of a computer are also called hardware, while the programs running on that hardware are called software.

A less visible, but equally important, class of computers consists of microcontrollers and so-called embedded devices. These are typically tiny units encased in black plastic with some electrical contacts. The devices can contain all necessary elements, i.e., the CPU, some RAM, and persistent storage that can store programs and data when no electric power supply is available. Although these devices may be limited in computing power and the amount of data they can store and process, they are incorporated in many consumer devices. They control your washing machine, refrigerator, television, radio, and others. Some devices in your home may even contain multiple microcontrollers and often the microcontrollers can already communicate with each other by RF (Radio-Frequency), or access the Internet by WLAN, which is sometimes called the Internet of Things (IoT).

Another class of large, very powerful digital computers — known as mainframe computers or supercomputers — is optimized to process large amounts of data very quickly. The key to their enormous computing power lies in many fast CPUs working in parallel; problems or tasks are split into many small parts that are solved by individual CPUs, and the final result is the combination of all these solved sub-tasks. However, it is not always possible to split large problems into smaller sub-tasks.

Digital computers usually operate based on a clock signal that pulses at a certain frequency; the number of clock pulses per second is called the clock rate. The CPU can perform simple operations, such as the addition of two integers, at each pulse of the clock signal. For more complicated operations, such as multiplication or division, it may need more clock pulses. Therefore, a rough measure of a computer’s performance is the clock rate divided by the number of pulses that the CPU needs to perform a basic operation, multiplied by the number of CPUs or cores that the computer can use.
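The rough performance measure described above can be written down as a tiny calculation. This is only a sketch: the procedure name `roughOpsPerSecond` and all the numbers (3 GHz clock, 4 pulses per operation, 8 cores) are made-up illustration values, not measurements of any real CPU.

```nim
# Rough estimate of basic operations per second:
# clock rate / pulses needed per operation * number of cores.
proc roughOpsPerSecond(clockRateHz, pulsesPerOp: float; cores: int): float =
  clockRateHz / pulsesPerOp * float(cores)

# Hypothetical machine: 3 GHz clock, 4 pulses per operation, 8 cores.
let estimate = roughOpsPerSecond(3.0e9, 4.0, 8)
echo "estimated operations per second: ", estimate
```

Real processors are far more complicated, so such a number is useful at best for a first comparison.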

A completely different type of computer is the quantum computer. This is a large, expensive high-tech device that uses the principles of quantum mechanics to execute many computations simultaneously. Only a few of them exist today, for research at universities and some large commercial institutes. Quantum computers may fundamentally change computing and our entire world someday, but they are not the topic of this book.

Analogue and digital

Whenever we measure a quantity using a base unit, thus providing a certain level of granularity, we operate within the digital realm. Our ordinary money is digital, as the cent is the smallest base unit; you will never pay a fraction of a cent for something. Time can be considered as a digital quantity as long as we accept the second as the smallest unit. Even on so-called analogue watches, the second hand generally moves forward in one-second increments, making it impossible to measure fractions of a second with such a watch.

An obvious analogue property is the thermodynamic temperature, and its classic measurement device is the well-known capillary thermometer, consisting of a glass capillary filled with alcohol or liquid mercury. When temperature increases, the liquid in a reservoir expands more than the surrounding glass and partly fills the capillary. That filling rate is an analogue measure of the temperature.

While the hourglass is considered digital (as you can count the tiny sand grains), the sundial is not.

Most quantities in the real world appear to be analogue, and digital quantities are often perceived as an arbitrary approximation. However, quantum mechanics has taught us that many quantities in our world do have a granularity. In physical terms, quantities such as angular momentum change only in steps set by the tiny Planck constant, and the energy of an oscillation at a given frequency changes only in steps of the Planck constant times that frequency. Or consider electric charge, which is always a multiple of the elementary charge unit of a single electron. Whenever electrical current flows through a conductor such as a wire, an ionized gas, or an electrolyte like saltwater, it does so in multiples of the elementary charge, not in fractions of it. And of course, light and electromagnetic radiation also have some form of granularity, as the photoelectric effect and Compton scattering demonstrate.

An important and useful feature of digital signals and data is their direct correlation to integers (integral numbers).

The simplest form of digital data is binary data, which can only have two distinct values. When you use a mechanical switch to turn the light bulb in your house on or off, you change the binary state of the light. Your neighbor, when watching your house, receives binary signals.[9]

Digital computers generally use binary electric states internally — voltage or current `on` or `off`. Such an on/off state is called a bit. We will discuss more details about bits and binary logic later. One bit can obviously store only two states, which we may map to the numbers `0` and `1`. Larger integer numbers can be represented by a sequence of multiple bits.
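To make the idea of representing a larger integer by a sequence of bits more concrete, here is a minimal sketch that uses the `toBin` and `parseBinInt` procedures from Nim’s `strutils` module. The number 13 and the width of 8 bits are arbitrary choices for illustration.

```nim
import strutils

# Show a small integer as a sequence of 8 bits and convert it back.
let n = 13
let bits = toBin(n, 8)                # "00001101" = 8 + 4 + 1
echo n, " -> ", bits
echo bits, " -> ", parseBinInt(bits)  # back to the integer 13
```

Each position in the bit string stands for a power of two, so `00001101` means 8 + 4 + 0 + 1 = 13.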

The Morse code was an early application used to transmit messages encoded in binary form.

A crucial characteristic of digitally encoded data is its ability to be copied and transmitted without loss of precision. The reason for this is that digital numbers have a well-defined clean state: there is no noise overlaying the data that could accumulate when the data is copied multiple times. Well, this statement isn’t entirely accurate. Under poor conditions, noise can become significant enough to alter the binary state of signals. Imagine trying to transfer some whole numbers encoded in binary form, perhaps by binary states represented as voltage levels `0 Volt` and `5 Volts`, over an electric wire across a long distance. Clearly, the long wire can act as an antenna and pick up electromagnetic noise, which could potentially shift the true `0` Volt data to a voltage closer to `5` Volts, leading to incorrect reception. To detect these types of transmission errors, checksums are added to the actual data. A checksum, derived from the original data using a special mathematical formula, is transferred with it. The receiver applies the same formula to the received data and compares the result with the received checksum. If they do not match, it is clear that the data transmission is corrupted, and a resend is requested.
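The checksum idea can be sketched in a few lines of Nim. The formula used here, the sum of all byte values modulo 256, is a deliberately simple assumption chosen for illustration; real transmission protocols use stronger formulas such as CRCs. The names `checksum`, `message`, `sent`, and `received` are hypothetical.

```nim
# A deliberately simple checksum: the sum of all byte values modulo 256.
proc checksum(data: string): int =
  for ch in data:
    result = (result + ord(ch)) mod 256

let message = "HELLO"
let sent = checksum(message)   # transferred together with the message

# Simulate noise that changed one character during transmission.
let received = "HELMO"

echo "checksum sent: ", sent
echo "transmission ok: ", checksum(received) == sent
```

Note that such a simple sum cannot detect every error; two swapped characters, for example, give the same checksum, which is one reason why real systems use more elaborate formulas.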

The opposite of digital is generally called analogue, a term that is used for data that has or seems to have no granularity. For example, we speak of an analogue voltage when the voltage can assume any value in a given range, and when it does not "jump" but changes continuously.[10] To observe analogue voltages or currents, one can use a moving coil meter, a device in which the current flowing through a coil in a magnetic field causes the magnetic force to move the hand/pointer.

As mentioned in the previous section, nearly all of our current computers work exclusively with digital data. Essentially, this means they work internally with integer numbers, stored in sequences of binary bits. All input for computers must have the form of integer numbers and all output takes the form of integer numbers. Whenever we need to input analogue data into computers, such as analogue voltage, we must convert it into a digital approximation. For that task, special devices called analogue to digital converters (ADC) exist. And in some cases, we have to convert the digital output data of computers to analogue signals, like when a computer plays music: The digital data output from the computer is then converted by a device known as a digital to analogue converter (DAC) into an analogue voltage. This analogue voltage generates a current that flows through a coil in our speakers. This electric current in turn generates a magnetic field, which exerts mechanical forces that move the speaker’s membrane. The resulting oscillating movements produce variations in air pressure that our ears detect, and that we perceive as sound.
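The conversion between analogue values and their digital approximation can be illustrated with a small sketch. It models an idealized converter with a hypothetical reference voltage of 5 Volts and a resolution of 8 bits; the procedure names `adc` and `dac` are our own and do not refer to any real device or library.

```nim
# Idealized n-bit ADC: maps a voltage in the range 0.0 .. vRef
# to an integer code in the range 0 .. 2^bits - 1.
proc adc(voltage, vRef: float; bits: int): int =
  let maxCode = (1 shl bits) - 1
  let clamped = clamp(voltage, 0.0, vRef)      # stay inside the input range
  int(clamped / vRef * float(maxCode) + 0.5)   # round to the nearest code

# Idealized DAC: maps the integer code back to a voltage.
proc dac(code: int; vRef: float; bits: int): float =
  float(code) / float((1 shl bits) - 1) * vRef

let code = adc(2.5, 5.0, 8)
echo "digital code: ", code
echo "reconstructed voltage: ", dac(code, 5.0, 8)
```

The reconstructed voltage is close to, but not exactly, the original 2.5 Volts. This small difference is the price of the digital approximation, and it shrinks when more bits are used.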

What is an operating system?

Most computers, from cellphones to large supercomputers, use an operating system (OS). A well-known OS is GNU/Linux. An operating system can be seen as the initial program that is loaded and started when we switch the computer on, functioning as a kind of supervisor:[11] it can load and execute other programs, distributing resources like CPU cores or RAM among multiple running programs. It also manages user input via the keyboard and mouse, displays output data on the screen in both textual and graphical forms, controls how data is stored in nonvolatile storage media like hard disks or SSDs, and oversees all network traffic, among other tasks. An important role of the OS is enabling user programs to access all the various hardware components, regardless of vendor, in a uniform, high-level manner. An OS can be seen as an intermediary layer between user programs, such as a text processor or a game, and the computer’s hardware. The OS allows user programs to work on a higher level of abstraction, so they do not need to know much about the low-level hardware details.

An important feature of most modern operating systems is their ability to run multiple system and user programs concurrently or in parallel. Concurrent execution of programs means that the execution swiftly switches between all active programs. In this way, the user does not notice when programs pause for short time intervals. All programs appear to be running continuously, though not necessarily at full speed. True parallel execution of programs, meaning they can all run continuously at full speed, is only possible when the computer has multiple CPUs or a CPU with multiple physical cores.

Computer operating systems generally have a close relationship with software libraries. Libraries are software components that provide data types and functions through a well-defined interface, known as an Application Programming Interface (API), and exhibit specific behaviors. Libraries can either be part of the OS, or they can function largely independently of it.

Libraries can be used as shared libraries, which are single binary files stored on a computer’s file system (often with the `.so` or `.dll` file extension) and are accessible by different computer programs simultaneously. They can also be used as static libraries, which are an integral part of individual programs. Shared libraries have some advantages: we need only one instance of them on the file system of the computer, and the library is loaded only once into the computer memory (RAM), even when it is used by different apps simultaneously. This saves space, and when the library has serious errors, it is in principle possible to replace the library with a corrected version, which is then used by all the software on the computer. Shared libraries often come in numbered versions, where a higher number denotes a newer, improved, or extended library version. Sometimes, some of the programs we use may still need an older library version, while other software already needs a newer one. In that case, our file system has to provide multiple versions of a shared library, each of which can be used independently. Statically linked libraries, on the other hand, are built directly into a single computer program. This simplifies the distribution of the program, as it can be shipped as a single entity without the need to ensure that all the necessary shared libraries are available on the destination computer. However, if a statically linked library has serious errors, then we have to replace all the programs that are linked statically with that faulty library.

Small microcontrollers and embedded devices often do not require an operating system, as they generally run only a single user program and typically do not have a wide variety of hardware components that need to be supported.

What is a user interface?

To interact with the OS and the application programs running on the computer, we need some form of user interface. Traditional user interfaces are text-centric and often provided directly by the OS as one single text screen filling the whole display: The user has to enter textual commands and the computer reacts with textual messages. For entering commands and data, a keyboard, whose layout was heavily inspired by the classical mechanical typewriter, is used. For about half a century now, graphical user interfaces (GUIs) have mostly replaced, or at least supplemented, textual user interfaces for desktop computers. Even cellphones and other electronic gadgets now use a form of GUI for user interaction. For large mainframe computers, the textual user interface is still common. Graphical user interfaces display sets of icons or widgets to the user. These are often arranged within rectangular graphical boxes, known as windows. These windows can be moved around, resized, and partially or fully overlapped with other windows. A special type of window, known as a terminal, shell, or console window, behaves like the traditional full-screen textual user interfaces. Graphical user interfaces allow users to interact with the computer through simple actions like clicking on buttons or using drag or swipe gestures, performed directly on a touch-sensitive display or with a device called a mouse, which mirrors its mechanical movement on the table to a graphical cursor on the computer display, and provides a set of pushbuttons that are used to initiate a click action when the mouse pointer hovers over an icon or widget. The main advantage of graphical user interfaces is that the user does not have to remember and type in long command sequences. A set of on-screen buttons labeled with single letters can simulate a traditional keyboard, but a physical keyboard is still used when the input of longer textual data is required. 
Graphical user interfaces are sometimes enhanced by speech recognition systems, which allow users to enter commands or textual messages vocally. Graphical user interfaces may appear to be strongly coupled with the OS, but they are still system programs executed by the OS. For the Microsoft Windows OS and macOS, this distinction is not very obvious, as the same GUI is running permanently. For other operating systems, like Linux, the distinction is more apparent. Linux systems are sometimes used without a GUI, and various desktop environments, such as GNOME, KDE, and many others, are available.

What is computer programming?

Computer programming involves the creation, testing, and optimization of computer programs.

What is a computer program?

A computer program is essentially a sequence of numbers that are meaningful to a computer CPU. The CPU recognizes these numbers as instructions or numeric machine code, such as the instruction to add two numbers. The first computers, built in the 1940s and 1950s, were programmed by feeding sequences of plain numbers to the device. The numbers were often stored on what were known as punch cards. These were made of strong paper, and the numbers were encoded through holes in the cards. The holes could be recognized by electrical contacts to feed the numbers into the CPU. Since plain numbers do not align well with human cognition, more abstract codes were soon developed and used. A very direct code that maps symbols to numerical instructions is known as assembly language. In that language, for example, the character sequence "add A0, $8" may map directly to a sequence of numbers which instructs the CPU to add the constant integer number 8 to CPU register A0, where A0 is a storage area in the CPU where numbers can be stored. As many different types of CPUs exist, each with its own instruction set, there are also many different assembly languages. These have similar, but not identical instructions. The rules that describe how these basic instructions have to look are called the syntax of the assembly language.

Numerical machine code, and its equivalent assembly language, form the most basic instruction set for a CPU. Each command that a CPU can execute corresponds to a well-defined assembly instruction. Thus, any operation that a computer can potentially execute can be represented as a series of assembly instructions. However, complicated tasks may require millions of assembly instructions, which would take humans a significant amount of time to write, modify, proofread, and debug.[12]

A few years after the invention of the first computers, the need for more abstract instruction sets was recognized. These would include features such as repeated execution, composed conditionals, and the ability to use data types beyond plain numbers as operands. As a result, higher-level programming languages such as Algol, Fortran, C, Pascal, and Basic were created.

What is an algorithm?

An algorithm is a detailed sequence of instructions, often abstract, designed to solve a specific task or to reach a goal.

Recipes from cookbooks and car repair instructions are examples of algorithms. The basic math operations children learn in school, such as adding, multiplying, or dividing two numbers with paper and pencil, are also examples of algorithms. Even starting a car follows an algorithm. For instance, if the temperature is below freezing and your vehicle is covered in snow, your first step would be to clean the windows and lights. Similarly, if you’re driving again after a long break, you would have to check the tires before you start the engine. You can execute an algorithm by strictly following its instructions, without necessarily understanding its underlying principles.

So an algorithm is a perfect fit for a computer, as computers are excellent at following instructions without really understanding what they are trying to accomplish.

An algorithm for calculating the sum of the first 100 natural numbers might look like this:

use two integer variables called i and sum
assign the value 0 to both variables

while i is less than 100 do:
  increase i by one
  add value of i to sum

optionally print the final value of sum
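
The steps above can be written almost word for word in Nim, the language of this book. The following is only a minimal sketch; the variable names are taken from the description above, and the syntax itself is explained in the later parts.

```nim
# Sum of the first 100 natural numbers, following the steps above.
var i = 0
var sum = 0

while i < 100:
  i += 1      # increase i by one
  sum += i    # add the value of i to sum

echo sum      # prints 5050
```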

What is a programming language?

Most traditional programming languages were designed to translate algorithms into elementary CPU instructions. Algorithms typically contain nested conditionals, repetition, math operations, recovery from errors, and potentially plausibility checks. A complex algorithm can generally be split into various separate logical parts. These may include reading in data at one point, performing multiple processing steps at another, and storing or displaying data as plain text, graphics, or animation at yet another point. This division into parts is reflected in programming languages through the grouping of tasks into subroutines, functions, or procedures, which accept a set of input parameters and can return a result.
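
As a rough illustration of this division into parts, the following Nim sketch splits a tiny task into three procedures: one that provides the data, one that processes it, and one that displays the result. The names `readData`, `total`, and `report` are hypothetical and chosen only for this example, and the input is a fixed list rather than real user input.

```nim
# Hypothetical helper names; each proc covers one logical part of the task.
proc readData(): seq[int] =
  # Stand-in for reading input: returns fixed sample values.
  @[3, 1, 4, 1, 5]

proc total(values: seq[int]): int =
  # Processing step: add up all values and return the result.
  for v in values:
    result += v

proc report(label: string, value: int) =
  # Output step: display the result as plain text.
  echo label, ": ", value

let data = readData()
report("total", total(data))   # prints total: 14
```

Each procedure accepts its input as parameters and, where it makes sense, returns a result, so the parts can be tested and reused independently.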

As algorithms often work not only with numbers but also with text, it makes sense to have a form of textual data type in a programming language too. Data types can also be grouped in various ways, for example as sequences of multiple values of the same type, like lists of numbers or names. Alternatively, collections of different types can be created, such as the name, age, and profession of a citizen in an income tax database. Programming languages provide support for all these use cases.
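
In Nim, these two kinds of grouping could look roughly as follows. This is only a sketch: the type name `Citizen` and the sample values are made up for illustration.

```nim
type
  Citizen = object          # hypothetical record type for illustration
    name: string
    age: int
    profession: string

# A sequence holds multiple values of the same type.
let numbers = @[2, 3, 5, 7]
let names = @["Ada", "Niklaus"]

# An object groups values of different types under one name.
let c = Citizen(name: "Ada", age: 36, profession: "mathematician")

echo numbers.len        # prints 4
echo names[1]           # prints Niklaus
echo c.name, " is ", c.age, " years old"
```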

Compilers and interpreters

We already learned that the CPU in the computer can execute only simple instructions, which we call numeric machine code or assembly instructions.

To run a program written in a high-level language that includes many abstractions, we need some kind of converter to transform that program into the basic instructions that the CPU can execute. For the conversion process, we essentially have two options: we can either convert the entire program into machine code, store it on disk, and then run it on the CPU, or we can convert it in small portions, maybe line by line, and run each portion as soon as we have converted it. Tools that convert the whole program first are called compilers. Compilers process the program that we have written, incorporate necessary library modules from other sources, check the code for obvious errors, and then generate the machine code, which we can then store and run. Typically, compilers create executables that are customized for a specific CPU architecture and a single operating system. A program compiled for an x86 CPU and the Windows OS cannot be run on a Linux box with an ARM CPU. Often, recompiling the source code for another target architecture is possible, but modifications to the source code may be necessary. Program code that has to be compiled can be distributed as textual source code or as a precompiled binary. For source code distribution, the target system needs a matching compiler, and for binary distribution, the binary has to match the CPU and the OS of the target system.

Tools that process the source code in small portions, like single statements, are called interpreters. They read a line of source code, examine it to check whether it is a valid statement, and then feed the CPU with corresponding instructions to execute it. The difference between compilers and interpreters is similar to two methods of picking strawberries: you can either pick one and eat it immediately, or you can collect them all into a basket to eat later. Interpreted program code is typically distributed as textual source code and can in principle be run on any system with a matching interpreter. But in practice, it is not that easy: the code may use functionality that is only available for a specific OS, or the code may require a specific interpreter version.

Both interpreters and compilers have advantages and disadvantages for particular use cases. Compilers are capable of detecting errors before the program is run, and compiled programs generally execute quickly, as all the instructions are preprocessed and readily available when the programs run. The compiling step takes some time, of course, typically a few seconds, but for some languages and large programs, it may take much longer. This can slow down the software development process because, as you add or change code, you must compile the whole program before you can execute and test it. That can be inconvenient for beginner programmers, as they may have to repeat this editing and testing process very often. Some adopt a programming style that involves changing a tiny bit of the source code, running it, and observing the results. A more common practice, however, is to first thoroughly consider the problem, then write the code which, in most cases, performs nearly as intended. With this style of programming, you don’t need to compile and execute your code as frequently. A significant benefit of compilers is that they can detect many bugs, primarily typing errors, during the compilation phase and provide detailed error messages. Interpreters have the advantage of enabling code modifications and immediate execution without any delay. This feature is beneficial for learning a new language and for conducting quick tests; however, even simple typing errors can only be detected when encountered during program execution. If your test does not attempt to run a faulty statement, there will be no error, but it may surface later. Modern compilers use various techniques to also enable nearly immediate testing when a part of the source code has been modified: fast compilers, often running in parallel on all available CPU cores, combined with caching and incremental compilation, make the compilation step extremely fast.
Additionally, a technique called hot code reloading enables the exchange of parts of the program code without interrupting the program execution.

Generally, the execution of interpreted programs is much slower than that of compiled executables, as the interpreter has to continually process the source code in real-time as it’s being run, while the compiler does it only once before the program is run. To conclude this section, here are a few additional notes:

Compilers are sometimes paired with tools known as linkers. In such instances, the compiler transforms the source code, which may reside in multiple text files, into separate blocks of machine code instructions. Subsequently, the linker combines all these blocks of machine code to form the final executable. Some compilers either do not require the linking step or automatically invoke the linker. Moreover, some interpreters convert the textual source code into so-called bytecode in a very fast, initial preprocessing step ("on the fly"), which can then be interpreted faster. Languages such as Ruby and Python employ this method. The Java language uses a mix of compilation and interpretation: in a first step, the Java source code is compiled into an intermediate bytecode format, which is typically bundled into a JAR file. This JAR file can be distributed and executed by Java’s virtual machine (JVM). The JVM acts as an intermediate layer between the hardware and the user program, and the JVM can even further optimize the code while it is run on the target machine.

Types of programming languages

Software can be crafted in numerous styles. A programming paradigm is a fundamental style of writing software, and each programming language supports a specific set of these paradigms. A popular paradigm is object-oriented programming (OOP), a concept taught in many introductory computer science courses. Other paradigms are procedural and functional programming.

We have already mentioned assembly languages, which provide only the basic operations that a CPU can perform. Assembly languages offer no abstractions, so it’s debatable whether we should categorize them as programming languages at all. Then, there are low-level languages like Fortran or C, which, while providing some basic abstractions, still work close to the hardware. These languages are primarily designed for high performance and low resource consumption (RAM), but they don’t prioritize detecting and preventing programming errors or simplifying the programming process. These languages already support some higher-order data types, like floating-point numbers or text (strings), as well as homogeneous, fixed-size containers (called arrays in C), and heterogeneous fixed-size containers (called structs in C).

A different approach is taken by languages like Python or Ruby, which aim to make writing code easier by offering many high-level abstractions. They provide better protection against errors but are not as efficient. These languages also support dynamic containers, which can grow and shrink, advanced data structures like hash tables (maps), and textual pattern matching with regular expressions (regex).

Another way to differentiate programming languages is by their typing system, which can either be static or dynamic. Ruby, Python, and JavaScript are all examples of dynamically typed languages. This means that they use variables capable of storing any data type. Therefore, the data type that a variable holds can change dynamically during program execution. This appears to be user-friendly and often it is, particularly for brief programs intended for a single use, occasionally referred to as scripts. However, dynamic typing can make discovering logical errors more challenging. For instance, an illegal addition of a number to a letter may only be detected at runtime. Dynamically typed languages generally consume more memory, and their programs tend to run more slowly. It’s akin to owning a set of large, equally sized moving boxes and storing each of our belongings in a separate box.

In statically typed languages, each variable has a well-defined data type such as integer number, real number, a single letter, a text element, and many more. The data type is either assigned by the author of the program with a type declaration, or is detected by the compiler itself when processing the program source code, a process called type inference. In this context, the variable’s type never changes. In this way, the compiler can check for logical errors early in the compile process, and the compiler can reserve memory blocks exactly customized to the variables that we want to store, so total memory consumption and performance can be optimized. Referring again to the box analogy, static typing is akin to using customized boxes for all your belongings.
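
A short Nim sketch may make the difference between an explicit type declaration and type inference more concrete. The variable names are arbitrary; the commented-out line marks an assignment that we would expect the compiler to reject with a type mismatch error before the program ever runs.

```nim
var count: int = 10      # explicit type declaration
var ratio = 2.5          # type inference: ratio is a float
let greeting = "hello"   # type inference: greeting is a string

count = count + 5        # fine: count is still an int
# count = "fifteen"      # rejected at compile time: type mismatch

echo count               # prints 15
echo typeof(ratio)       # prints float
echo typeof(greeting)    # prints string
```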

All these types of programming languages are often called imperative programming languages, as the program specifies exactly what the computer has to do. There are also other types of programming languages, such as Prolog, which primarily provide a set of rules and then allow the computer to solve problems using these rules.

Moreover, there are emerging concepts like artificial intelligence (AI) and machine learning (ML). They rely less on algorithms and more on neural networks, which are trained with extensive data until they can yield the desired results. Nim, the computer language that this book focuses on, is an imperative language. As such, our focus will be on the imperative programming style. However, it’s worth noting that Nim can be used to create AI applications.

Additionally, we can distinguish between languages such as C, C++, Ada, Rust, D, Go, Nim, and many more that compile to native executables and can run directly on the computer’s hardware. In contrast, languages like Java, Scala, Kotlin, Julia, among others, use a large virtual machine (VM) as an intermediary between the program and the hardware, as do interpreted languages like Ruby and Python. Languages that use a virtual machine generally require some startup time when a program is invoked, as the VM needs to be loaded and initialized. Also, interpreted languages are typically slower.[13] The distinction between languages that compile to native executables and those that are executed on a virtual machine is not really sharp. For instance, Kotlin and Julia initially ran on a virtual machine, but they can now compile source code to native executables. And new developments, such as the Mojo language, claim to be able to execute ordinary Python code, as well as to compile code with added type annotations to fast machine code.

An important class of programming languages is the group of so-called object-oriented programming (OOP) languages, which use classes with attached methods, and typically reference semantics, polymorphism, and inheritance with dynamic dispatch. OOP languages became very popular in the 1990s. For some time, it was assumed that object-oriented programming was the ultimate solution for managing and structuring large programs. Java is a prominent example of an OOP language. It requires programmers to use the OOP design, and other languages such as C++, Python, and Ruby also strongly encourage the use of the OOP design. Experience has shown that the OOP design is not the ultimate solution for all computing problems, as it can make the code verbose and might hinder optimal performance. So newer languages, like Go, Rust, and Nim, support some form of OOP but use it only as one paradigm among many others.

Another popular and important class of programming languages includes JavaScript and its more modern extensions, like TypeScript, among others. JavaScript was designed to run in web browsers to support interactive web pages, as well as programs and games running in the browser. In this way, programs become nearly independent of the computer’s native operating system. Note that despite what the name may suggest, JavaScript is not closely related to the Java language. Since Nim can compile to a JavaScript backend, it offers robust support for web development.

Finally, perhaps the most important criterion for choosing a language for a programming task is the handling of memory and other resources. Allocating memory blocks, and releasing them again when they are not needed anymore, can be a serious effort, and doing it wrong can lead to various bugs, like use-after-free errors or memory leaks. The original Pascal compiler had no function to release memory at all, which may have been a simple strategy to avoid this difficult matter. C leaves all memory and resource handling to the programmer, which is one reason why C programming is difficult, and C programs often have serious bugs. The C++ language handles most memory and resource management with scope-based destructors, but still supports manual memory and resource handling like C. Rust is similar to C++ in this regard, but has advanced features like the borrow-checker. Fully automatic memory management is a difficult topic and can generate overhead or delay in program execution. This is why some modern languages, like Zig, Odin, and Jai, avoid automatic memory handling. Other languages like Python, Java, JavaScript, C#, Julia, Go, and D use some form of garbage collector, which makes life for the programmers much easier and avoids most of the memory-management-related bugs.

Nim was initially designed to use a garbage collector, with an option for manual memory management in critical areas. However, since version 1.0, Nim additionally supports ORC/ARC memory handling, a form of scope- and destructor-based automatic memory management. ARC can be used when our data structures contain no reference cycles, which is often the case. ORC can additionally handle cyclic structures. ARC and ORC may not yet provide optimal throughput compared to the older garbage collector, referred to as REFC. However, they avoid delayed deallocation and delays in program execution, making them good choices for critical code like device drivers and games.

Table 1. Overview of popular programming languages. Here we list only languages similar to Nim, and ignore languages with dynamic typing like Python, Ruby, JavaScript, and also Java with its rigid OOP design.

| Language | Paradigm | Typing discipline | Syntax | Execution | Memory Management | Generics | Macros, Meta-programming | Modules |
|---|---|---|---|---|---|---|---|---|
| C | Imperative, procedural, structured | Static, weak | Braces, semicolons | Native | Manual | No | Text preprocessor | No |
| C++ | Imperative, procedural, structured, object-oriented | Static, weak | Braces, semicolons | Native | Destructors, RAII, manual, optional GC | Yes, Templates | Text preprocessor | C++20 |
| Nim | Imperative, procedural, structured, functional, object-oriented | Static, strong, inferred | Python-like (off-side rule) | Native, web browser (JavaScript) | GC, refcount, destructors | Yes | AST based, hygienic | Yes |
| Rust | Imperative, procedural, structured, functional, object-oriented | Static, strong, inferred | Braces, semicolons | Native | Destructors, borrow-checker | Yes | AST based, hygienic | Yes |
| D | Imperative, procedural, structured, functional, object-oriented | Static, strong, inferred, generic | Braces, semicolons | Native | GC, destructors, manual | Yes | Yes | Yes |
| Go | Imperative, procedural, structured, functional, composition | Static, strong, inferred | Braces, semicolons | Native | GC | No | No | Yes |
| Zig | Imperative, procedural, structured, functional, (object-oriented) | Static, strong, inferred, generic | Braces, semicolons | Native | Manual, option types | (Yes) | No | Yes |

Sometimes, source code written in one programming language is converted into another one. A prominent target for such conversions is JavaScript, as JavaScript enables the execution of programs in web browsers. Another important target language is C or C++. Creating intermediate C code, which is then compiled by a C compiler to native executables, has some advantages compared to direct compilation to native executables: C compilers exist for nearly all computer systems including microcontrollers and embedded systems, so the use of the original language is not restricted to systems for which a native compiler backend is provided. And C as intermediate code simplifies the use of system libraries, which typically provide a C-compatible interface. Due to decades of development, C compilers generally can do better code optimizations than young languages may manage to do. Some people fear that intermediate C code carries the problems of the C language, like verbosity, confusing and error-prone code, or undefined behavior, to the source languages. But these well-known concerns of C occur only when humans write C code directly, just as when they write assembly code directly. Automatic conversions are well-defined and well-tested, which means these conversions are free of errors to the same degree as direct machine code generation would be. But indeed there are some small drawbacks when C or C++ is used as a backend for a programming language: C does not always allow direct access to all CPU instructions, which may make it difficult to generate optimal code for some special constructs like exceptions. And C uses wrap-around arithmetic for unsigned integer types, which may not be what modern languages desire. The current Nim implementation provides JavaScript, C, and C++ backends. 
While the JavaScript backend is a design decision to enable web development, the C and C++ backends are a more pragmatic decision and could later be replaced, or at least supplemented, by direct native code generation or use of the popular LLVM backend.[14] When computer languages are converted from one to another, the term transpiler is sometimes used to differentiate the translation process from direct compilation to a binary executable. When program code is converted between very similar languages with nearly the same level of abstraction, then the term transpiler may be justified. However, Nim is very different from C and has a higher abstraction level, and the Nim compiler performs many advanced optimizations. So, even when compiling to JavaScript or the C++ backend, it should not be referred to as a transpiler.
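
The wrap-around arithmetic for unsigned integer types mentioned above can be observed in Nim as well, because Nim’s own unsigned integer types also wrap around instead of reporting an overflow. The following is a minimal sketch with an 8-bit counter; the variable name is chosen only for illustration.

```nim
var counter: uint8 = high(uint8)   # largest 8-bit value, 255
counter = counter + 1'u8           # wraps around to 0 instead of raising an error
echo counter                       # prints 0

counter = counter - 1'u8           # wraps back to 255
echo counter                       # prints 255
```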

Why Nim?

In this section, we use many new Computer Science (CS) expressions but do not explain them. This is intentional; if you already know them, you may gain a better understanding of what Nim is. If you do not know them, you will at least learn that we can describe Nim using complex terms.

Three well-known traditional programming languages are C, Java, and Python. C, created in 1972, is essentially a simple language that operates close to the hardware. Compilers can generate fast, highly optimized native machine code for C. However, C has cryptic syntax, some peculiar semantics, and it lacks the higher concepts of modern languages. Java, created in 1995, strongly encourages the object-oriented style of programming (OOP) and runs on a virtual machine. This makes it unsuitable for embedded systems and microcontrollers. Python, created in 1991, is generally an interpreted language rather than a compiled one, which results in slower program execution. Neither Java nor Python effectively supports writing low-level code that operates close to the hardware, making them unsuitable for device-driver and kernel development. Because many Python libraries are written in highly optimized C, Python can appear quite fast when performing standard tasks, such as sorting data, processing CSV or JSON files, or crawling websites. Therefore, Python is not a poor choice when primarily used for calling library functions. However, its performance deficiencies become evident when custom Python code is required to solve a problem.

Of course, there are many more programming languages, each with its own advantages and disadvantages, and some are optimized for specific use cases.

Nim is a state-of-the-art programming language well-suited for systems and application programming. Its clean Python-like syntax makes programming easy and enjoyable for beginners, without imposing any restrictions on experienced systems programmers. Nim combines successful concepts from mature languages like Python, Ada, and Modula with a few established features of the latest research. It offers high performance with type and memory safety while keeping the source code short and readable. Both the compiler and the generated executables support all major platforms, including Windows, Linux, BSD, and macOS. Cross-compiling to Android and other mobile and embedded devices and microcontrollers is possible, and the JavaScript backend allows creating web apps and running programs in web browsers. The custom package managers, Nimble, Nimph, and Atlas, facilitate the easy and secure use and redistribution of programs and libraries. The C, C++, and LLVM-based backends enable easy OS library calls without additional glue code, while the JavaScript backend generates high-quality code for web applications. The integration of the "Read/Eval/Print Loop" (REPL), "Hot code reloading", and incremental compilation (expected for versions > 2.0), along with support for various development environments, including debugging and language server protocols, makes working with Nim both productive and enjoyable.

Some facts about Nim

  • Nim is a multi-paradigm programming language. Unlike some popular programming languages, Nim doesn’t predominantly focus on the OOP paradigm. It’s primarily an imperative and procedural programming language, but it also supports OOP, data-oriented, functional, declarative, concurrent, and various other programming styles. Nim supports common OOP features, which include inheritance, polymorphism, and dynamic dispatch.

  • The generated executables are small and dependency-free. For instance, a simple chess program with a plain GTK-based graphical user interface is only 100 KB in size,[15] and the Nim compiler executable itself is approximately 6.5 MB. It is possible to shrink the executable size of "Hello World" programs to about 10 KB for use on tiny microcontrollers.

  • Nim is fast, with its performance typically rivaling that of other high-performance languages, such as C or C++. There are still some exceptions: other languages may have libraries or applications that have been tuned for performance for many years, while similar Nim applications are so far less tuned for performance, or are perhaps written with more priority on short and clean code or run-time safety.

  • Nim has a clean, Python-like syntax characterized by significant whitespace. There’s no need for block delimiters such as `{}` pairs or `begin/end` keywords, and no need for statement delimiters like `;`.

  • Safety: Nim programs are type- and memory-safe. The compiler prevents memory corruption as long as unsafe low-level constructs, such as casts, pointers, the address operator, or the `{.union.}` pragma, are not used.

  • Nim boasts a fast compiler capable of compiling itself and other medium-sized packages in less than 10 seconds. The upcoming incremental compilation feature could further increase this speed.

  • Nim is statically typed, meaning each variable or other entity has a well-defined type. This feature catches most programming errors at compile-time, prevents run-time errors, and ensures optimal performance. At the same time, the static typing makes it easier to understand and maintain larger codebases.

  • Nim supports various memory management strategies, including manual allocations for critical low-level tasks, as well as various garbage collectors, including a destructor-based, fully deterministic memory manager.

  • Nim produces native, highly-optimized executables and also has the capability to generate JavaScript output for web applications.

  • Nim has a clean module concept, which helps to structure large projects.

  • Nim features a well-designed standard library that supports a multitude of basic programming tasks. The full source code of the library is included and can be viewed easily from within the HTML-based API documentation.

  • Library modules, such as the `os` module, provide OS-independent abstractions. These allow for the compilation and running of the same program on different operating systems without modifications.

  • The Nim standard library is supplemented by over 1000 external packages for a wide range of use cases. External packages can be installed easily with Nim’s package managers.

  • Nim supports asynchronous operation, threading, and parallel processing.

  • Nim supports all popular operating systems including Linux, Windows, macOS, and Android, as well as various hardware types such as x86, ARM, and RISC-V processors, including embedded systems and microcontrollers.

  • Utilizing external libraries written in C is straightforward, requiring no additional glue code. Moreover, Nim can even work together with code written in other languages. For instance, some Nim-Python interfaces are available.

  • Many popular editors have support for Nim syntax highlighting and other IDE functionality like on-the-fly checking for errors and displaying detailed information about imported functions and data types.

  • In the last few years, Nim has reached some important milestones: Version 1.0, which brought some stability promises, has been released. With the ARC and ORC memory management strategies and full destructor support, fully deterministic memory management comparable to that of C++ or Rust is available. Therefore, problems associated with conventional garbage collectors, such as delayed memory deallocation or extended pauses in programs due to the garbage collection process, are eliminated. In addition, some larger companies have started using Nim in production; the most influential is currently Status Corp. with its Ethereum client development.
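
The point from the list above about calling C libraries without glue code can be sketched in a few lines. This is only a minimal illustration, assuming the C or C++ backend and a standard C library; the variable names `greeting` and `cLength` are our own choice.

```nim
# Sketch: binding the C standard library function strlen() directly.
# The importc and header pragmas tell the compiler which C symbol and
# which header file to use. This works only with the C/C++ backends.
proc strlen(s: cstring): csize_t {.importc, header: "<string.h>".}

let greeting = "Hello, Nim"
let cLength = int(strlen(greeting.cstring))

echo cLength       # length as counted by the C function
echo greeting.len  # length as counted by Nim's own len()
```

Both lines should print the same number, because the string contains only plain ASCII characters and no embedded zero bytes.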

Nim supports many programming styles

We have already mentioned that Nim is a multi-paradigm programming language that supports various programming styles. While Nim can primarily be regarded as an imperative, procedural programming language, it also effectively supports popular functional and object-oriented programming styles.

In classical OOP languages, such as Python, we have the concept of classes with attributes and methods that are tightly bound to the classes:

class User:
  def say(self):
    print("It does not work!")

user = User()
user.say()

In this Python snippet, we define a class, `User`, with a custom method named `say()` attached to it. We then create an instance, `user`, of this class and invoke its `say()` method.

This tight coupling of methods to classes lacks flexibility. For example, extending a class with additional methods can prove difficult or, in some cases, impossible. Another challenge with this class concept is determining the ownership of a method when multiple classes are involved. For instance, if we need a method that appends a single character to a text `string`, would that method belong to the character class or the `string` class?

Nim avoids such a strict class concept, while its generalized method call syntax allows us to use a class-like syntax for all our data types. For example, to get the length of a `string` variable, we can write `len(myString)` in classical procedural notation, or we can use the method call syntax `myString.len()` or just `myString.len`. The compiler treats all these notations as equivalent, making the method syntax available without the restrictions inherent to the class concept. The method call syntax can be used in Nim for all data types, even for plain numbers — so the notation `abs(myNum)` is fully equivalent to `myNum.abs`.
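
A small sketch may make the equivalence of these notations more concrete. The procedure `doubled()` is a hypothetical helper that we define here only to show that self-written procedures get the method call syntax as well.

```nim
let myString = "Nim"
let myNum = -7

# All three notations call the same procedure len().
echo len(myString)
echo myString.len()
echo myString.len

# The same works for plain numbers.
echo abs(myNum)
echo myNum.abs

# A self-written procedure (hypothetical name) can be used the same way.
proc doubled(x: int): int = x * 2
echo myNum.abs.doubled  # 14
```

The last line reads from left to right like a chain of method calls, although `abs()` and `doubled()` are ordinary procedures that are not attached to any class.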

The Python code from above might look like this in Nim:

type User = object

proc say(self: User) =
  echo "It does not work!"

let user = User()
user.say()

Instead of classes, we use `object` types in Nim, and we define procedures and methods that can work on `objects` or other data types.

As an example of the functional programming style in Nim, we could examine a code fragment from a real-world app required to generate a string from four numbers, separated by commas. Using the `mapIt()` procedure imported from the `sequtils` module and the `fmt()` `macro` from the `strformat` module, we may write that in functional programming style in this way:

from std/strutils import join
from std/sequtils import mapIt
from std/strformat import fmt
const DefaultWorldRange = [0.0, 0, 800, 600]
let str = DefaultWorldRange.mapIt(fmt("{it:g}")).join(", ")
echo str # "0, 0, 800, 600"

In the imperative, procedural style, we would write it like

from std/strformat import fmt
const DefaultWorldRange = [0.0, 0, 800, 600]
var str: string
for i, x in pairs(DefaultWorldRange):
  str.add(fmt("{x:g}"))
  if i < DefaultWorldRange.high:
    str.add(", ")
echo str # "0, 0, 800, 600"

Nim is efficient

Nim is a compiled, statically-typed language. Unlike interpreted, dynamically-typed languages like Python, where every statement must be run to check for errors, the Nim compiler catches most errors during the compilation process. The static typing, in conjunction with Nim’s robust type system, allows the compiler to catch a majority of errors, such as undefined operations like adding a number to a letter, during compilation. These errors are reported in the terminal window or directly in the editor or IDE. When no errors are found or after all errors have been fixed, the compiler generates highly optimized, dependency-free executables. This compilation process is typically quite fast; for example, the compiler can compile itself in less than 10 seconds on a modern PC.
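
To illustrate how such an error is detected before the program ever runs, we can ask the compiler itself. The following sketch uses the built-in `compiles()` check, which reports whether an expression would pass the type check; the constant names are our own.

```nim
# compiles() asks the compiler whether an expression would type-check,
# without aborting the build when it would not.
const numberPlusNumber = compiles(1 + 2)
const numberPlusLetter = compiles(1 + 'a')

echo numberPlusNumber  # true
echo numberPlusLetter  # false: there is no `+` for an int and a char
```

In an ordinary program we would not use `compiles()` for this. Writing `echo 1 + 'a'` directly would simply stop the compilation with a type mismatch message.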

Modern concepts such as zero-overhead `iterators`, compile-time evaluation of user-defined functions, and cross-module inlining, in combination with the preference for value-based, stack-located data types, lead to extremely efficient code. Multi-threading, asynchronous input/output operations (async IO), parallel processing, and SIMD instructions including GPU execution are supported. Various memory management strategies exist: selectable and tunable high-performance Garbage Collectors (GC), including a new fully deterministic destructor-based memory management system, are supported for automatic memory management. These can be disabled for manual memory management. This makes Nim a good choice for application development and close-to-the-hardware system programming at the same time. The unrestricted hardware access, small executables, and optional GC will make Nim a perfect solution for embedded systems, hardware drivers, and operating system development.
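
As a minimal sketch of compile-time evaluation of a user-defined function, we can assign the result of a procedure call to a constant. The procedure name `sumUpTo()` is just an illustrative choice.

```nim
proc sumUpTo(n: int): int =
  ## Adds the numbers 1..n with an ordinary loop.
  for i in 1 .. n:
    result += i

# Because the value is assigned to a const, the compiler has to run
# sumUpTo() itself. The executable only contains the finished number.
const precomputed = sumUpTo(100)

# The same procedure can still be called at run time.
let atRuntime = sumUpTo(100)

echo precomputed  # 5050
echo atRuntime    # 5050
```

The procedure is written only once and needs no special annotation. Whether it runs in the compiler or in the final program depends only on where we use it.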

Nim is expressive and elegant

Nim offers a modern type system with templates, generics, and type inference. Built-in advanced data types such as dynamic containers, sets, and strings with full UTF support are complemented by a large collection of library types like hash tables and regular expressions. While Nim supports the traditional Object-Oriented Programming style with inheritance and dynamic dispatch, it doesn’t enforce this paradigm, instead offering modern concepts such as procedural and functional programming. The optional method call syntax enables the use of all data types and functions in an OOP-like fashion; for example, instead of `len(myStr)`, we can also use the OOP style `myStr.len`.[16] The powerful AST-based hygienic `macro` system offers nearly unlimited possibilities for the advanced programmer. This macro and meta-programming system allows compiler-guided code generation at compile-time. This way, the Nim core language can be kept small and compact, while many advanced features are enabled by user-defined macros. For example, the support of asynchronous IO operations has been created with these forms of meta-programming, as well as many Domain Specific Language (DSL) extensions.
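
The following sketch shows generics, templates, and type inference working together. The names `largest` and `twice` are hypothetical examples of ours and not part of the standard library, and `largest()` assumes that it never receives an empty container.

```nim
# Generic procedure: T is filled in by the compiler for each call.
proc largest[T](values: openArray[T]): T =
  result = values[0]
  for v in values:
    if v > result:
      result = v

# Template: the body is substituted at the call site at compile time.
template twice(body: untyped) =
  body
  body

let numbers = @[3, 9, 4]       # inferred as seq[int]
let words = ["pear", "apple"]  # inferred as array[2, string]

var counter = 0
twice:
  inc(counter)

echo largest(numbers)  # 9
echo largest(words)    # pear
echo counter           # 2
```

We never wrote down the types of `numbers`, `words`, or `counter`; the compiler inferred them from the values, and it created one version of `largest()` for integers and one for strings.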

Nim is open and free

Both the Nim compiler and all modules of the standard library are implemented in Nim. All source code is available under the permissive MIT license.

Nim has a community

The Nim forum is hosted at:

and the software running the forum is coded in Nim.

Real-time chat is supported by IRC, Gitter, Discord, Telegram, and others.

Nim also has a presence on Reddit.com and Stackoverflow.com:

Nim is evolving

Initiated over 15 years ago as a small community project by a group of bright CS students under the leadership of Mr. A. Rumpf, Nim is now considered one of the most interesting and promising programming languages. Supported by countless individuals and leading companies in the computer industry, Nim is actively used in the areas of application, game, web, and cryptocurrency development. Nim has made a large amount of progress in the last few years: it reached version 2.0 with some stability guarantees, and a new deterministic memory management system was introduced, which will enhance parallel processing support and the utilization of Nim in embedded systems development.

Nim is not a virus

Because Nim is a powerful yet simple systems programming language, it has been exploited by a few individuals to write malware in recent years. As a result, numerous Nim programs, including the compiler and other official tools, frequently get falsely flagged as viruses on Windows. Unfortunately, this poses a serious issue for newcomers wishing to explore Nim, and it lacks an easy solution. Nim developers have already reported this issue to Microsoft and other related companies, but they appear to show limited concern about it. Advanced Windows users can manually disable virus scans and potentially firewall protection. However, this can be seen as risky should a genuine Nim-related virus ever emerge.

References:

Mr. A. Rumpf initiated the development of Nim in 2008, and since then, he, along with a handful of volunteers, has been diligently advancing its development. In 2018, Nim finally got some significant monetary support from Status Corp., and in 2019, the stable Nim version 1.0 was released. However, Nim is still developed by a small core team and some volunteers, while other languages like Java, C#, Go, or Rust are supported by large companies, or, like C and C++, have a very long history and well-trained users. In addition, there are many competing languages, some with a longer history and some possibly better suited for special purposes, like JavaScript, Dart, or Kotlin for web development, Julia or R for numeric applications, or Zig, C, and Assembly for tiny 8-bit microcontrollers with a small amount of RAM.

While we’ve said that Nim can be used universally, from tiny microcontrollers to large desktop and web applications, we must admit that its use for mobile devices with Android or iOS operating systems is not as easy and well-documented. However, this applies to many other languages, including popular ones like Python, Go, and Rust. The reason simply is that Android and iOS devices are not really open systems. For example, Android is strongly coupled to Java or its new variant, Kotlin. However, using Nim on Android and iOS devices is possible. Games and apps have already been created for these devices. See https://github.com/treeform/glfm as an example.

Currently, Nim does not have a single perfect GUI library. Instead, there are a lot of attempts: Various GTK and Qt bindings, many web-based GUIs, a few simple, pure Nim GUIs, and the Fidget project. The situation is currently not really satisfactory, but the same is the case for most other modern languages like Go, Julia, Rust, and even Python. The exceptions are Dart with Flutter, perhaps C++ with Qt and the Java/Kotlin/Android bundle, and of course the commercial languages Swift and C#.

Some people just prefer languages with full OOP support and true classes. While Nim does support OOP design with heap-allocated reference objects, inheritance, and methods with dynamic runtime dispatch, it does not strongly enforce its use. People educated in the 1990s might still be influenced by the Java OOP hype and argue that classes make structuring larger programs easier.
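
For readers who wonder what this OOP support looks like in practice, here is a minimal sketch with reference objects, inheritance, and a method with dynamic dispatch. The types `Animal` and `Dog` and the method `speak()` are invented for this illustration.

```nim
type
  Animal = ref object of RootObj  # heap-allocated base type
    name: string
  Dog = ref object of Animal      # Dog inherits the name field

method speak(a: Animal): string {.base.} =
  a.name & " makes no sound"

method speak(d: Dog): string =
  d.name & " says woof"

# The variable has the static type Animal, but refers to a Dog.
let pet: Animal = Dog(name: "Rex")
let message = pet.speak()
echo message  # Rex says woof
```

The call `pet.speak()` is resolved at run time according to the actual type of the object, which is the behavior that people coming from class-based languages usually expect.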

Others detest all forms of automatic memory management and might believe that Rust’s borrow checker or Zig’s C-like memory management suffices. In fact, Nim might not always match Rust’s performance completely. And while Nim’s executables are already compact, Zig, being essentially an improved C, provides no overhead to C libraries and might generate even smaller executables.

For some "professional" programmers, Nim’s use of significant white space instead of curly brackets for identifying blocks and scopes could be a reason to avoid Nim. The use of significant white space, also called the Off-side rule,[17] has some tradition in computer textbooks and is used in some other languages, like Python, Haskell, and Scala 3. With Python being the most popular programming language these days, it is hard to believe that programmers really prefer the use of curly brackets. But actually, most professionals started their education with languages like C, C++, or Java, and just feel more professional when they have their curly brackets. Scala introduced significant white space in version 3 of the language, and its designer Martin Odersky said that this improves productivity overall by 10%.[18]

Nim programmers usually import symbols from other modules unqualified (`import std/strutils` instead of `from std/strutils import …`). Fully qualified symbol import is possible (`from std/strutils import nil`), but since Nim doesn’t use classes, this may make it difficult to use imported operators. It could also cause issues with Nim’s method call syntax not working properly (`strutils.toUpperAscii(myStr)` vs. `myStr.toUpperAscii`). People coming from dynamically typed languages like Python sometimes express concern about namespace pollution and symbol conflicts due to unqualified imports. Experience has shown that unqualified import isn’t an actual problem in Nim. This is because procedure overload resolution typically works reliably when the `proc` parameter types are not all identical. Conflicts may only occur in rare situations for constants or enumeration data types. These are reported by the compiler and can easily be resolved by using module name prefixes when necessary. Nevertheless, some people worry and argue that fully qualified names make it easier to see the origin of symbols.[19]
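
A short sketch of the fully qualified variant shows the consequence described above. The variable names `myStr` and `shouted` are only illustrative.

```nim
# Fully qualified import: no symbol of strutils enters our namespace.
from std/strutils import nil

let myStr = "nim"

# The module prefix is now mandatory.
let shouted = strutils.toUpperAscii(myStr)
echo shouted  # NIM

# myStr.toUpperAscii   # would be rejected: toUpperAscii is not in scope
```

With the usual `import std/strutils`, both `toUpperAscii(myStr)` and `myStr.toUpperAscii` would work without the prefix, while `strutils.toUpperAscii(myStr)` would still be allowed for cases where we want to show the origin of a symbol.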

A similar point is the style-insensitivity of Nim: With the exception of the first letter of a symbol, Nim does not distinguish between lower- and upper-case letters and ignores underscores. This approach has some advantages and disadvantages, but in practice, it’s not as problematic as it might seem. We will discuss it later in this book in more detail.
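
As a brief sketch of this rule, assuming the compiler's default settings without strict style checks, the following lines all refer to the same variable, while a different first letter creates a different name.

```nim
var lineCount = 3  # declared in camelCase
line_count += 1    # the same variable, written in snake_case
echo linecount     # the same variable again; prints 4

# The first letter stays case-sensitive, so these are two different names.
let value = 1
let Value = 2
echo value + Value  # 3
```

In real code, we would of course pick one spelling and stick to it. The rule mainly helps when a library uses a different naming style than our own code.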

Not directly related to the Nim language itself, but to the user experience, is the programming environment or tooling: editors, IDEs, REPL (read–eval–print loop), package managers, and debugging and profiling support. All this may not be as perfect as for other popular major languages yet. Indeed, Nim’s language server support (based on nimsuggest) is not very reliable and tends to be slow.

The language server support depends on compile times, as nimsuggest is some form of a Nim compiler variant. So this may improve when Nim eventually receives incremental compilation support (IC), expected in Nim 2.0 or later. Providing good language server support is generally hard for languages with templates, generics, macros, and type inference — the Crystal language has similar issues.[20]

However, all this tooling is more of an implementation detail and not a direct issue of the language. Since Nim is a high-level language with very clear syntax, tooling should not be that important. Programs that compile successfully generally just work, so there may not be a significant demand for robust debugger support. In fact, Nim already has all of this tooling; it just doesn’t function as effectively as it could. [21]

Nim is already supported by more than 1000 external packages which cover many application areas, but that number is still small compared to really popular languages like Python, Java, or JavaScript. However, some current Nim packages might not measure up to the libraries of other languages, which have benefited from years of optimization by hundreds or thousands of full-time developers.

Indeed, the future of Nim is not entirely secure. Core developers might vanish, financial support could stop, or a better language could emerge. However, even if the development of Nim were to cease someday, you would still be able to use it, and many of the concepts you’ve learned with Nim could be applied to other modern languages as well.

Is Nim a good choice as the first language for a beginner?

When you use C as your first language, you may learn a lot about how computers really work, but the learning experience may not be as enjoyable, progress can be slow, and C lacks many concepts of modern programming languages. C++, Rust, and Haskell are often too difficult for beginners. So, currently, many beginners start with Python. While you can efficiently grasp high-level concepts with Python and quickly achieve useful results, you might not learn much about the internal workings of computers. Thus, you might not understand why your code is slow and consumes so many resources; you could also be uncertain about how to improve the program or run it successfully on restricted hardware. It’s like learning to drive a car without any knowledge about how a combustion engine, the transmission, or the brakes really work. Nim has none of these restrictions; it offers high-level concepts like Python, but also provides access to low-level operations, enabling a deeper understanding of internal workings if desired. Although learning resources for Nim are not yet as developed as those for mainstream languages, some good tutorials are already available. Hopefully, this book will also prove helpful to beginners.

Is Nim really a good teaching language?

Generally yes, in the same way as Pascal was in the 1980s, and Modula and Oberon were at the end of the last century. However, Nim still faces the same issues as Wirthian languages: it doesn’t necessarily assist in job seeking. If we teach children Python, JavaScript, or C, they might find entry-level employment, particularly if they have to deviate from their intended educational path for some reason. Unfortunately, this is not the case with niche languages, so teachers should be aware of their responsibility. Furthermore, it doesn’t make much sense to teach against the interests of the kids. When they are keen to learn JavaScript to create visual effects or similar tasks easily, teaching another language that might not be immediately available on their home PC or smartphone becomes challenging.

So, is Nim really the best starting point for me?

Maybe not. If you intend to learn a programming language today and want to make a great video game tomorrow, then Nim is definitely not the best starting point. This is just not possible. While there are nice libs for making games with Nim already available, there exist easier solutions in other languages. With some luck, you might find source code in that language allowing you to patch a few strings, modify colors and background music, and claim it as your game.

After learning Nim, will I still have to learn other programming languages?

Nim is quite a versatile language, making it a good candidate for someone intending to learn only one language. But of course, it is always a good idea to learn a few other languages later. Generally, it’s hard to avoid learning C, given the prevalence of C code worldwide. Most algorithms that have ever been invented are available in a C implementation somewhere, and most libraries are written in C or at least have a C API that you can use with other languages, including Nim. Since C is a compact language without complex constructs, a basic understanding of C is typically sufficient to convert a C program to another language. Often, that conversion process is supported by tools, such as the Nim c2nim tool. So learning some C later is really a good idea, and when you have some basic understanding of Nim and CS in general, learning some C is an easy task. However, learning C before Nim could be an option, as more learning resources exist for C. A few years ago, some people would have recommended learning C or Python before Nim. However, Nim now has sufficient learning resources, so we indeed recommend starting directly with Nim.

Why should I not use Nim?

Perhaps it is simply not the ideal solution for you. Both a racing bicycle and a mountain bike are excellent, but for cycling a few hundred meters to the baker’s shop, neither might be the perfect solution. A standard bicycle would be more suitable. Even though Nim seems to combine the advantages of both a racing bicycle and a mountain bike — high performance and robust design — and isn’t expensive, it might not be the optimal solution for everyone. People who write only small scripts and aren’t concerned about performance can continue using Python. People who are interested solely in specific applications, perhaps just web development or 8-bit microcontrollers, might not necessarily need Nim. Nim can do this and much more well, but for special use cases, better-suited languages may still exist. Additionally, someone who has spent many years mastering C++ might decide to continue using it. Currently, another potential reason for not using Nim could be the absence of certain libraries. If you require certain important libraries for your project that are currently unavailable for Nim, of course, this could pose a significant problem if you lack the skills or time to write them from scratch or at least create high-level bindings to a C library.

How long does it take to learn Nim?

Some people might tell you that you can learn it in just two weeks.[22] Perhaps, if you are very, very bright. However, if it were that easy, the world would be filled with Nim experts. Studying the official tutorials, Part I and II, should really take only a few hours, and afterwards you will already have a basic feeling for the language and can do some simple exercises. In theory, to learn the fundamentals of Nim, reading this book should suffice, and you might even skip Part I and the exercises in Part IV. Thus, you actually only need to read 400 pages, which should be possible in 100 hours. But who can really read 8 hours a day and remember all the details without practicing? Reading the language manual or Mr. Rumpf’s book would also be ways to learn the language.

I started with Nim in 2014, with some prior experience in Pascal, C, Modula, Oberon, Ruby, and assembly language. I learned from all the tutorials, the Nim forum, IRC, and later from the Manning book. I also studied the Nim language manual, the API docs of Nim’s standard library, and a few important external packages. I estimate it took me one year, studying 10 hours a week, to understand the basics and become proficient in the language. In addition to learning, I did some exercises, such as writing a simple chess game. So for me, it actually took more than 500 hours. We believe that with a good book, the learning process could be at least 50% faster. So, if you can dedicate 10 hours a week to learning and a few additional hours to practicing, you could consider yourself a Nim programmer after about six months. Of course, your motivation makes a big difference. Loving the language, having an interesting project for which you intend to use the language, and maybe even a job where you can use it, helps a lot.

Our first Nim program

To maintain our motivation, let’s now present our first tiny Nim program. Ideally, we would delay this section until after installing the Nim compiler on our computer. However, we can already run and test the program by copying it into one of the available Nim online playgrounds like

There are two more unofficial sites that can run Nim code online:

In the section What is an algorithm? we described an algorithm to sum up the first 100 natural numbers. Converting that algorithm into a Nim program is straightforward, resulting in the text file provided below. You can copy it into the playground and run it now if you want. The program uses some basic Nim instructions, which we will briefly describe here. Everything will be explained in much more detail in the next part of this book.

var sum: int
var i: int
sum = 0
i = 0
while i < 100:
  inc(i, 1)
  inc(sum, i)
echo sum

We write Nim programs as plain text files using an editor tool, and you will learn how to create them soon. We call these text files the source code of the program. The source code is the input for the compiler. The compiler processes the source code, checks for obvious errors, and then generates an executable file that contains the final CPU instructions and can be run. Executable files are sometimes called executables or binary files. The term binary could be considered misleading, as all computer files are indeed stored as binary data. However, the expression 'binary' is used to differentiate executable programs from text files, such as Nim source code, which we can read, print, and edit using an editor. Don’t try to load the executable files generated by the Nim compiler into a text editor, as the content is not plain text, but numeric machine code that may confuse the editor. On the Windows OS, executable files typically get a special name extension `.exe`, but on Linux, no special name extensions are used.

Nim source code files are processed by the Nim compiler from top to bottom. In principle, for the generated executable, program execution also starts at the top. However, there are some exceptions to program execution; for example, program code enclosed in functions is not immediately executed where it appears in the source code file but rather when the function is called (invoked). And the program execution is not a linear process — we can use conditional expressions to skip parts of the program, or various loop constructs to repeat the execution of some program segments. In fact, the program execution in Nim is more similar to languages like Python or Ruby than to the C language: A C program always needs a `main()` function with exactly this name, and the execution of a C program always starts with a compiler-generated call to this function.
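
The following sketch makes this execution order visible. It records each step in a list named `trace`; that list and the procedure `greet()` are hypothetical names used only for this demonstration.

```nim
var trace: seq[string]  # records the order in which code runs

trace.add("top of file")

proc greet() =
  # Not executed where it is defined, only when greet() is called.
  trace.add("inside greet")

trace.add("after the definition")
greet()
trace.add("end of file")

for step in trace:
  echo step
```

The line "inside greet" appears third in the output, not second, because the body of the procedure is skipped when execution passes its definition and only runs at the call `greet()`.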

Variables are elementary entities of computer programs and are essentially named storage areas in the computer. As Nim is a compiled and statically-typed language, we have to declare each variable before we can use it. We do that by choosing a meaningful name for the variable and specifying its data type. To tell the compiler about our intention to declare a variable, we start the line with the `var` keyword, followed by the chosen name, a colon, and the data type of our variable. We have to put at least one space character between the `var` keyword and the name of the variable, to allow the compiler to recognize the two separate entities. Usually, we also put a space after the colon that separates the variable name from its data type. But this is only a convention to improve the readability of the source code. For the compiler, the colon already separates the variable name from the data type. The first line of our program declares a new variable named `sum` of data type `int`. `int` is short for integer and indicates that our variable should be able to store negative or positive integer numbers. (Integer numbers are whole numbers without a fraction, like `-1`, `0`, or `1234`. Floating-point numbers, like `3.14159`, represent another important numeric data type that we will use later as well.) The `var` at the start of the line is a keyword. Keywords are reserved symbols that have a special meaning for the compiler. The keyword `var` indicates that we want to introduce a new variable. The compiler recognizes this and reserves a memory location in the computer’s RAM to store the actual value of the variable.
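
A few more declarations in the same style may help to see the pattern. The variables `ratio` and `total` are additional examples of ours and are not part of the program above.

```nim
var sum: int      # whole numbers such as -1, 0, 1234
var ratio: float  # floating-point numbers such as 3.14159
var total:int     # also legal, but the space after the colon reads better

sum = 1234
ratio = 3.14159
total = sum + 1

echo sum
echo ratio
echo total
```

Each variable can only hold values of its declared type. An assignment like `sum = 3.14159` would be rejected by the compiler.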

The second line is nearly identical to the first: we declare another variable, again of `int` type and a simple name, `i`.

Variable names like `i`, `j`, and `k` are typically used when we cannot think of a meaningful name or when we intend to use these variables as (`array`) indices or as counters in loops. Note that in Nim, we can use arbitrary names for variables (with some restrictions) and that the actual name of a variable is not coupled to its data type or behavior. In early Fortran, that was handled differently, as the convention was that variables named `i`, `j`, and `k` were automatically of integer type by default.

In lines 3 and 4 of our program, we initialize the variables, that is, we give them a well-defined initial start value. To do that, we use the `=` operator to assign a value to the variable. Operators are special symbols like `+`, `-`, `*`, or `/` to indicate our desire to do an addition, a subtraction, a multiplication, or a division. Note that the `=` operator is used in Nim like in many other programming languages for assignment, and not like in traditional mathematics as an equality test. The reason for this is that, in computer programming, assignments occur more frequently than equality tests. Some early languages, like Pascal, used the compound `:=` operator for assignment, which aligns more closely with mathematical usage. However, it is more difficult to type on a keyboard and is not visually appealing to most people. An expression like `x = y` assigns the content of variable `y` to `x`. In other words, `x` gets the value of `y`, the former value of `x` is overwritten and lost, and the content of `y` remains unchanged.

After such an assignment, `x` and `y` contain the same value. In the above example, we do not assign the content of a variable to the destination; instead, we use a literal numeric constant with the value `0`. When the computer has executed lines 3 and 4, the variables `sum` and `i` each contain the start value `0`. When we use the `=` operator for an assignment, we usually put a space character on both sides of the operator. However, this is merely a convention to improve the readability of the source code and is not strictly necessary. As a convention, spaces are typically placed on both sides of most Nim infix operators. This includes arithmetic operators, the assignment operator, and relational operators such as `<` or `>`. Also, similar to usage in ordinary text files, when we use a colon or a semicolon to separate two entities from each other, we usually put a space after the punctuation character.
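
The behavior of the assignment operator can be checked with a tiny sketch. It also uses the `==` operator, which is what Nim provides for the equality test that `=` does not perform.

```nim
var x = 1
var y = 2

x = y        # assignment: x gets the value of y, the old value 1 is lost
echo x       # 2
echo y       # 2, unchanged

echo x == y  # equality test with ==, prints true

y = 10       # changing y afterwards does not affect x
echo x       # 2
```

The last two lines show that an assignment copies the value at that moment. It does not create a lasting connection between the two variables.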

Line 5 of our code example is much more interesting: it contains a `while` condition. The line starts with the term `while`, which is again a reserved keyword, followed by the logical expression `i < 100` and a colon. An expression in Nim is something that produces a result, like the math expression `2 + 2`, which yields the integer result `4`. A logical expression doesn’t yield a numerical result; instead, it yields a logical (boolean) result, which can be `true` or `false`. The logical expression `i < 100` depends on the current value of the variable `i`. The two lines following the line with the `while` keyword are each indented by two spaces, meaning that these lines start with two additional spaces compared to the previous line. This form of indentation is used in Nim (and Python) to indicate blocks. Blocks are grouped statements. The complete `while` loop consists of the line containing the `while` keyword followed by a block of statements. The statement block after the `while` condition is executed as long as the `while` condition evaluates to the logical value `true`. For the first loop iteration, `i` has the initial value `0`, the condition `i < 100` evaluates to the boolean value `true`, and the block after the `while` condition is executed for the first time. In this block, we have the `inc()` instruction. The name `inc` is an abbreviation for increment, and it has to be written in lowercase. The call `inc(a, b)` increases the value of variable `a` by `b`, while `b` remains unchanged. So in the above block, `i` is increased by one, followed by `sum` being increased by the current value of `i`. So when that block has been executed for the first time, `i` has the value `1` and `sum` also has the value `1`. At the end of that block, execution starts again at the line with the `while` condition, now testing the expression `i < 100` with `i` containing the value `1`. Again, it evaluates to `true`, so the block is executed again; `i` then gets the new value `2`, and `sum` becomes `3`. 
This process continues until `i` reaches the value `100`, at which point the condition `i < 100` evaluates to `false`, and execution proceeds with the first instruction after the `while` block. That instruction is an `echo` statement, which is used in Nim to write values to the terminal or screen of the computer. Some other languages use terms like `print` or `put` instead of `echo`. You might still be wondering about the colon that terminates line five, which contains the `while` condition. That colon serves solely as a marker to indicate the end of a conditional statement.
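To make the step-by-step description above easier to follow, here is a minimal sketch of the same kind of loop with a smaller limit of `3` (an assumption made only to keep the output short) and an extra `echo` inside the block that reports the state after each iteration.

```nim
var sum = 0
var i = 0
while i < 3:          # the condition is tested before each iteration
  inc(i)              # i is increased by one
  inc(sum, i)         # sum is increased by the current value of i
  echo "i = ", i, ", sum = ", sum
echo "after the loop: i = ", i, ", sum = ", sum
```

The loop body runs three times, and the condition becomes `false` as soon as `i` has reached the limit.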

Don’t worry if you haven’t understood much of this short explanation; we will explain all of it in much more detail later.

If you decide to try the above program, perhaps on a playground Internet page or on your local computer, it is best to copy the source code verbatim instead of typing it from scratch, as tiny typos can cause a lot of trouble for beginners. If you decide to type it with your keyboard, you should try to replicate it exactly as displayed above. All the program code should start directly at the first column. However, the two lines after the `while` keyword should start with two spaces. This strict indentation is used in Nim and some other programming languages, such as Python and Haskell, to structure the program code and mark the extent of code blocks. Some other programming languages like C use a similar alignment of the source code for readability, but that alignment is ignored by the C compiler; instead, blocks have to be enclosed in curly braces `{}`. Note that you really have to use spaces for the indentation, as Nim does not accept tabulator characters in its source files. Also, be aware that the Nim compiler distinguishes between words starting with lowercase and uppercase letters. Nim keywords are always written in lowercase, and when we define a variable as `sum`, we should always refer to it in exactly this notation.[23] Also note that spaces in the Nim source code are important and can change the semantics: while spaces in C are mostly used only to separate distinct symbols, in Nim spaces have some additional functions. For instance, in mathematical expressions, `a - b` and `a-b` are both valid subtractions when `a` and `b` have a numeric type for which an infix subtraction operator is defined, but the code segment `a -b` may give us an error message from the compiler. The reason is that in this case, the `-` sign is directly attached to `b` but separated from `a` by at least one space, so the Nim compiler interprets the `-` sign as a unary operator attached to `b`. Even if such a unary `-` has been defined, the operands `a` and `-b` would then not be separated by an infix operator, which is invalid in Nim. An expression like `a - -b` would instead be valid syntax, with the unary minus attached to `b`, and `a` and `(-b)` separated by an infix `-` operator. In this example, we’ve already learned that the same symbol can have a different meaning in the Nim language, depending on the context. For operators or functions, this concept is called overloading, which most modern programming languages use. This sensitivity to the asymmetrical use of spaces also applies to the 'less than' operator used in the above example: `a < b` or `a<b` is the infix notation that we generally intend for a comparison operation, while `a <b` would mostly be invalid code. For infix operators, we typically put a space on each side to improve readability, although it’s not strictly necessary, and some people opt not to insert these spaces. Unary operators, like the unary `-` sign, should always precede a variable or a literal without a space.
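The following sketch illustrates the spacing rules for the `-` operator described above. The variable names are chosen only for this example, and the asymmetric form is left commented out because the compiler is expected to reject it.

```nim
let a = 7
let b = 2

let symmetric = a - b     # infix minus with a space on both sides
let compact = a-b         # infix minus without any spaces
let mixed = a - -b        # infix minus followed by a unary minus

echo symmetric            # 5
echo compact              # 5
echo mixed                # 9

# let broken = a -b       # asymmetric spacing: the minus is attached to b only
```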

All this might sound a bit complicated, and the compiler error messages about these formatting rules may not always be entirely clear for beginners. Ultimately, it’s akin to handwriting: after the initial learning phase, correct usage will become second nature.

Note that you can easily verify the result of our tiny program: Instead of summing up the first 100 natural numbers, we could simply sum up 50 pairs, each constructed from the first and the last numbers, the second and the second-to-last numbers, and so on. The sum of each pair is always `101`, so the sum of fifty pairs is `50 * 101 = 5050`. This trick is attributed to the famous German mathematician Johann Carl Friedrich Gauss (1777 – 1855), who is said to have used this method as a young schoolboy to quickly solve a similar task given by a teacher.[24]
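The pairing argument can be checked with a few lines of Nim. This is only a sketch that repeats the kind of summing loop discussed earlier and compares its result with the product `50 * 101`.

```nim
var sum = 0
var i = 0
while i < 100:
  inc(i)
  inc(sum, i)

echo sum          # 5050, computed by the loop
echo 50 * 101     # 5050, computed with the pairing trick
```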

Binary numbers

When we write numbers in everyday life, we typically use the decimal system with base 10, which includes the ten available digits 0 through 9. To get the value of a decimal number, we multiply each digit by a power of 10 depending on the position of the digit and sum the individual terms. The rightmost digit is multiplied by 10^0, the next digit by 10^1, and so on. A literal decimal number like 7382, therefore, has the numerical value `2 * 10^0 + 8 * 10^1 + 3 * 10^2 + 7 * 10^3`. Here, we have used the exponential operator `^`, where `10^3 = 10 * 10 * 10`.

Current computers use binary representation internally for numbers. Generally, we do not care much about that fact, but it is good to know some properties of binary numbers. Binary numbers work nearly identically to decimal numbers. The difference is that we have only two available digits, which we write as `0` and `1`. A number in binary representation is a sequence of these two digits. Like in the decimal system, the numerical value results from the individual digits and their position: The binary number `1011` has the numerical value `1 * 2^0 + 1 * 2^1 + 0 * 2^2 + 1 * 2^3`, which is `11` in decimal notation. For binary numbers, the base is 2, so we multiply the binary digits by powers of two.

Formally, the addition of two binary numbers works as we know it from the decimal system: we add the matching digits and take the carry into account, as in `1001 + 1101 = 10110`. We start by adding the two least significant digits of each number, which are both `1`. That addition of `1 + 1` results in the digit `0` and a carry. The next two digits are both zero, but we have to take the carry from the former operation into account, so the result is 1. For the next position, we have to add 0 and 1, which is just 1 without a carry. And finally, we have 1 + 1, which results in 0 with a carry. The carry generates an additional digit, concluding the operation. In the decimal system with base 10, a multiplication by 10 is easily calculated by just shifting all digits by one place to the left and writing a 0 at the now empty rightmost position. For binary numbers, it is very similar: a multiplication by the base, which is two in the binary system, is simply a shift to the left, with the rightmost position filled by digit `0`. [25]
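The positional rules above can be tried out directly. The sketch below assumes the `toBin` procedure from the standard module `std/strutils` for printing binary digits and uses the shift operator `shl` for the multiplication by two; the variable names are made up for this example.

```nim
import std/strutils   # provides toBin

# Positional values written out by hand.
let decimalValue = 2 * 1 + 8 * 10 + 3 * 100 + 7 * 1000
let binaryValue = 1 * 1 + 1 * 2 + 0 * 4 + 1 * 8

echo decimalValue                  # 7382
echo binaryValue                   # 11
echo 0b1011                        # 11, written as a binary literal

# Binary addition with carry, and a left shift as multiplication by two.
echo toBin(0b1001 + 0b1101, 5)     # 10110
echo toBin(0b0101 shl 1, 4)        # 1010
```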

In the binary system, the digits are typically called bits, and these bits are numbered from right to left, starting with `0` for the rightmost bit. For example, the binary number `10010101` is referred to as an 8-bit number because it requires eight digits to be represented in binary form. Often, individual bits are conceptualized as small bulbs, with a `1` bit represented as a lit bulb and a `0` bit represented as a dark bulb. A lit bulb is also referred to as a set bit. For instance, in the binary number `10010101`, bits `0`, `2`, `4`, and `7` are set, and the other bits are unset or cleared.
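As a quick check of the bit numbering, the following sketch collects the positions of all set bits of `10010101`. The names `n` and `setBits` are chosen only for this illustration.

```nim
let n = 0b10010101
var setBits: seq[int]

for pos in 0 .. 7:
  # Shift the bit of interest to position 0 and mask out all other bits.
  if ((n shr pos) and 1) == 1:
    setBits.add(pos)

echo setBits   # @[0, 2, 4, 7]
```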

Groups of eight bits are called a byte, and sometimes, four bits are called a nibble. A word, which is an entity a computer can process in a single instruction, may consist of one, two, four, or eight bytes, depending on the CPU’s capacity. In the case of a CPU with an 8-byte word size, this means that the computer can, for instance, add two variables, each of 8-byte size, in a single instruction.

Let’s investigate some basic properties of binary numbers, starting with the assumption that we have an 8-bit word, also known as a byte. An 8-bit word can have 2^8 different states, as each bit can be set or unset independently of the other bits. That corresponds to the numbers `0` up to `255`. For now, we’ll assume that we’re working with positive numbers only, but we will discuss negative numbers soon. An important property of binary numbers in computers is the wrapping around, which is a consequence of the fact that we have only a limited set of bits available to store the number. Therefore, when we continuously add `1` to a number, all bits eventually become set. This corresponds to the largest number that can be stored with that number of bits. When we then add `1` again, we get an overflow. The run-time system may catch that overflow, so we might receive an overflow error, or the number is just reset to zero, as may happen with the odometer of our car when we manage to drive one million miles, or when an ordinary clock jumps from 23:59 to 00:00 of the next day. A useful property of binary numbers is the fact that we can easily invert all bits, that is, replace set bits with unset ones and vice versa. Let us use the prefix `!` to indicate the operation of bit inversion; then `!01001100` is `10110011`. It is an obvious and useful fact that for each number `x`, we get a number with all bits set when we add `x` and `!x`. This means `x + !x = 11111111` when considering an 8-bit word. Furthermore, if we ignore overflow, it follows that `x + !x + 1 = 0` for each number `x`. This is a useful property that can be applied when considering negative numbers.
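These properties can be observed with Nim's unsigned 8-bit type `uint8`, whose arithmetic wraps around silently. In the sketch below, Nim's `not` operator plays the role of the `!` prefix used in the text, and `toBin` from `std/strutils` is used only for printing; the variable names are illustrative.

```nim
import std/strutils   # provides toBin

var counter: uint8 = 255    # all eight bits set
counter += 1                # unsigned arithmetic wraps around
echo counter                # 0

let x: uint8 = 0b01001100
let inv = not x             # bit inversion
echo toBin(inv.int, 8)      # 10110011

doAssert x + inv == 255'u8      # x + !x has all bits set
doAssert x + inv + 1 == 0'u8    # one more step wraps around to zero
```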

Now, let us investigate how we can encode negative numbers in binary form. In binary representation, each bit has only two states available, 0 or 1, representing an unset or a set bit, respectively. There is no separate minus sign. The sign of a number could be encoded in the most significant bit of a word: if this bit is set, it indicates that the number is negative. Generally, a modified version of this encoding is used, called two’s complement: a negative number is constructed by first inverting all the bits (a 0 bit is turned into a 1 bit and vice versa) and then adding the number 1. That encoding simplifies the CPU construction, as subtraction can be replaced by addition in this way:

Consider the case that we want to do a subtraction of two binary encoded numbers. The operation can be symbolically represented as `A - B` for arbitrary numbers `A` and `B`. Subtraction is, by definition, the inverse operation of addition. In other words, `A + B - B = A`, or `B - B = 0` for every number `B`.

Assume we have a CPU that can do additions and that can invert all the bits of a number. Can we perform a subtraction with that CPU? Indeed, we can.

Remember that for each number `X`, `X + !X + 1 = 0`, provided we ignore overflow. If that relation is true for each number, then it is obviously true for each `B` in the expression `A - B`, and we can write `A - B = A + (B + !B + 1) - B = A + (!B + 1)`, using the associative property of addition and subtraction in mathematics, that is, we can group the terms as we want. But the term in the parentheses is just the two’s complement, which we get when we invert all bits of `B` and add `1`. So, to perform subtraction, we need to invert the bits of `B` and then add `A`, `!B`, and `1`, ignoring overflow. That may sound complicated, but a bit inversion is a very cheap operation in a CPU, which is always available, and adding `1` is also a straightforward operation. The advantage is that we do not need separate hardware for the subtraction operation. Typically, subtraction in this way is not slower than addition because the bit inversion and the addition of `1` can be performed at the same time in the CPU as an ordinary addition.
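The identity `A - B = A + (!B + 1)` can be verified with 8-bit unsigned values, where overflow is silently ignored. The numbers `100` and `58` and the variable names are arbitrary choices for this sketch.

```nim
let a: uint8 = 100
let b: uint8 = 58

let negB = (not b) + 1        # two's complement of b: invert all bits, add 1
echo a + negB                 # 42, the overflow is discarded
echo a - b                    # 42, ordinary subtraction for comparison

# Interpreted as a signed 8-bit value, the same bit pattern is -58.
echo cast[int8](negB)         # -58
```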

From the equation above, indicating `A - B = A + (!B + 1)`, it is obvious that we consider the two’s complement `(!B + 1)` as the negative of `B`. Note that the two’s complement of zero is again zero, and the two’s complement of `00000001` is `11111111`. All negative numbers in this system have a bit set to 1 at the leftmost position. This restricts all positive numbers to bit combinations where the leftmost bit is unset. For an 8-bit word, this means that positive numbers are restricted to the bit patterns `00000000` to `01111111`, which is the range 0 to 127 in decimal notation. The two’s complement of decimal 127 is `10000001`. This seems fine so far, but note that there also exists the bit pattern `10000000`, which is -128 in decimal. For that bit pattern, no positive value exists. If we try to construct the two’s complement of that bit pattern, we end up with the same pattern again. This is an asymmetry of two’s complement representation, which cannot be avoided. It generally is no problem, with one exception: we can never invert the sign of the smallest available integer, as that operation would result in a run-time error.[26]
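The special cases mentioned above can be reproduced with a small helper. The procedure name `twosComplement` is hypothetical and introduced only for this sketch; it works on `uint8` so that the wrap-around happens silently.

```nim
import std/strutils   # provides toBin

proc twosComplement(x: uint8): uint8 =
  ## Inverts all bits of x and adds one, discarding any overflow.
  (not x) + 1

echo toBin(twosComplement(0'u8).int, 8)            # 00000000
echo toBin(twosComplement(1'u8).int, 8)            # 11111111
echo toBin(twosComplement(127'u8).int, 8)          # 10000001
echo toBin(twosComplement(0b10000000'u8).int, 8)   # 10000000, the same pattern again

# Read as a signed value, the pattern 10000000 is the smallest 8-bit integer.
echo cast[int8](0b10000000'u8)                     # -128
```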

Summary: When working only with positive numbers, we can store numbers from 0 up to 255 in an 8-bit word, also known as a byte. In a 16-bit word, we could store values from 0 up to 2^16 - 1, which is 65535. When we need numbers that can also be negative, we have for 8-bit words the range from -128 to 127 available, which is -2^7 up to 2^7 - 1. For a signed 16-bit word, the range would be -2^15 up to 2^15 - 1.
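The ranges from this summary can be queried with the built-in `low` and `high` procedures, applied here to Nim's fixed-size integer types.

```nim
echo low(uint8), " .. ", high(uint8)      # 0 .. 255
echo low(uint16), " .. ", high(uint16)    # 0 .. 65535
echo low(int8), " .. ", high(int8)        # -128 .. 127
echo low(int16), " .. ", high(int16)      # -32768 .. 32767
```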

While we can work with 8 or 16-bit words, for PC programming the CPU usually supports 32- or 64-bit words, so we have a much larger number range available. But when we program microcontrollers or embedded devices we may indeed have only 8- or 16-bit words available, or we may use such small word sizes intentionally on a PC to fit all of our data into a smaller memory area.

An important note to conclude this section is that whenever we have a word with a specific bit pattern stored in our computer’s memory, we cannot directly determine the type of data from the bit pattern. It can be a positive or a negative number, but maybe it is not a number at all but a letter or maybe something totally different. As an example, consider this 8-bit word: 10000001. It could be 129 if we have stored intentionally positive numbers in that storage location, or could be -127 if we intentionally stored a negative value. Or it could be not a number at all. Is that a problem? No, it is not as long as we use a programming language like Nim which uses static typing. Whenever we are using variables, we declare their type first, and so the compiler can do bookkeeping about the type of each variable stored in the computer memory. The benefit is that we can use all the available bits to encode our actual data, without having to reserve any bits to encode the actual data type of variables. For languages without static typing, this is not the case. In languages like Python or Ruby, we can use variables without a static type, so we can assign whatever we want to them. That seems to be comfortable at first but can be confusing when we write larger programs and the Python or Ruby interpreter has to do all the bookkeeping at runtime, which can slow down the program and consume additional memory.
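To illustrate that the bit pattern alone does not determine the meaning, the sketch below stores `10000001` once and reads it with two different types. The `cast` operation reinterprets the bits without changing them; the variable name `pattern` is chosen only for this example.

```nim
let pattern: uint8 = 0b10000001

echo pattern                  # 129 when read as an unsigned number
echo cast[int8](pattern)      # -127 when read as a signed number
echo sizeof(pattern)          # 1: no extra bits are spent on type information
```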

Put another way, to determine if an operation is valid, it’s generally sufficient to know only the data type of the operands. We do not have to know the actual content. The only exception is if we invert the sign of the most negative integer number or if we perform an operation that causes an overflow, as there are not enough bits available to store the result — we may get a run-time error for that case.[27] In a statically-typed language, each variable has a well-defined type, and the compiler can ensure at compile-time that all operations on that variable are valid. If an operation is not valid, the compiler will generate an error message. Then, when these operations are executed at run time, they are always valid operations, and the actual content, like the actual numeric value, does not matter (with the exception of overflow and perhaps a few other invalid math operations like division by zero).
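The compile-time nature of these checks can be made visible with Nim's built-in `compiles` procedure, which reports whether an expression would pass the type check. This is only a sketch; the variables `count` and `name` are invented for the illustration.

```nim
let count = 5
let name = "Nim"

echo compiles(count + 1)       # true: adding two integers is valid
echo compiles(name & "!")      # true: concatenating two strings is valid
echo compiles(count + name)    # false: rejected because of the operand types
```

Note that the decision depends only on the types of `count` and `name`, not on the values they hold.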

Hexadecimal numbers

Hexadecimal numbers, which use the base-16 numeral system, may seem less common than binary numbers, and their technical rationale may not be immediately apparent. They originate from the infancy of computer science, when programming was primarily done with numerical codes rather than with sophisticated programming languages. Despite this historical origin, hexadecimal numbers remain useful in modern computing: they serve as a more human-friendly representation of binary numbers and are still widely used in areas such as programming and networking.[28] To represent the 16 possible values of a hexadecimal digit, the 10 decimal digits `0` up to `9` are supplemented with the characters `A` through `F`. The most significant characteristic of a hexadecimal digit is that it can represent exactly four bits, a unit equivalent to half a byte that is sometimes called a nibble. In the past, when manually entering binary numbers was necessary, it was often easier to encode a nibble using a hexadecimal digit:

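| Decimal | Binary | Hexadecimal |
|---------|--------|-------------|
| 0       | 0000   | 00          |
| 1       | 0001   | 01          |
| 2       | 0010   | 02          |
| 3       | 0011   | 03          |
| 4       | 0100   | 04          |
| 5       | 0101   | 05          |
| 6       | 0110   | 06          |
| 7       | 0111   | 07          |
| 8       | 1000   | 08          |
| 9       | 1001   | 09          |
| 10      | 1010   | 0A          |
| 11      | 1011   | 0B          |
| 12      | 1100   | 0C          |
| 13      | 1101   | 0D          |
| 14      | 1110   | 0E          |
| 15      | 1111   | 0F          |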

The only place where we will encounter hexadecimal characters again in this book will be when we introduce character and string data types. There, control characters like a newline character are sometimes specified in hexadecimal form, such as "\x0A" for a newline character.
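As a brief preview, Nim accepts hexadecimal literals with the prefix `0x`, and the procedure `toHex` from `std/strutils` converts a number back into hexadecimal digits. The following lines are only a sketch of that correspondence.

```nim
import std/strutils   # provides toHex

echo 0x0A                  # 10
echo 0xFF                  # 255
echo 0b1111 == 0xF         # true: one hexadecimal digit covers four bits
echo toHex(10, 2)          # 0A
echo toHex(255, 2)         # FF
echo ord('\x0A')           # 10, the character code of the newline character
```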

Installation of the compiler

We will not go into great detail about installing the Nim compiler, as the process largely depends on your operating system, and the installation instructions may change in the future. We assume that you have a computer with an installed operating system and Internet access, and you are able to do at least very basic operations with your computer, such as switching it on, logging in, and opening a web browser or a terminal window. If that is not the case, then you should really seek help for these basic steps, and possibly with other basic tasks.

Detailed installation instructions are available on the Nim homepage at https://nim-lang.org/install.html.[29] Try to follow these instructions. If they are not sufficient, please seek help in the Nim forum: https://forum.nim-lang.org/

If you are using a Linux operating system, then your system usually provides a package manager, which should make the installation very easy.

For example, on a Gentoo Linux system, you would open a root terminal and simply type `emerge -av nim`. That command would install Nim, including all necessary dependencies, for you. It may take a few minutes as Gentoo compiles all packages fresh from the source code, but then you are done. Similar commands exist for most other Linux distributions. This installation by a package manager installs Nim system-wide, so all users of the computer can now use Nim.

Another solution, which is preferable when you want to ensure that you get the most recent Nim compiler, is compiling directly from the latest git sources. This process is also straightforward and is described here: https://github.com/nim-lang/Nim. However, before you can follow these instructions, you must ensure that Git software and a working C compiler are installed on your computer.

Creation of source-code files

Nim source code, like the source code of most other programming languages, is based on text files. Text files are documents saved on your computer that contain only ordinary letters, which you can type on your keyboard. This means no images or videos, no HTML content with fancy CSS styling. Generally, source code should contain only ordinary ASCII text, that is, no umlauts or Unicode characters.

To create source code, we typically use a text editor, which is a tool designed for creating and modifying plain text files. If you don’t already have a text editor, you could technically use a word processor to write your source code, but this is not recommended, and you would need to ensure that the file is saved as plain ASCII text. Editors typically support syntax highlighting, meaning keywords, numbers, and such are displayed with a unique color or style, making it easier to recognize the content. Some editors support advanced features like checking for errors while you type the program source code.

A list of recommended editors is available at https://nim-lang.org/faq.html

If you do not want to use a specialized editor now, then Gedit or Nano should be available on Linux. For Windows, you can use something like Notepad.

Typically, we store Nim source code files in their own directory, a separate section on your hard drive. If you’re working on Linux in a terminal window, you can type

cd
mkdir mynimfiles
cd mynimfiles
gedit test.nim

You type these commands in the terminal window and press the `return` key after each of the above lines — that is, you type `cd` on your keyboard and then press the `return` key to execute that command. The same for the next three commands. What you have done is the following: you navigated to your default working area (home directory), created a subarea named `mynimfiles`, entered that subarea, and finally, launched the `gedit` editor. The argument `test.nim` tells `gedit` that you intend to create or modify a file called `test.nim`. If `gedit` is not available, or if you work on a computer without a graphical user interface, then you may replace the `gedit` command with `nano`. While `gedit` opens a new window with a graphical interface, `nano` opens only a very simple interface in the current terminal. Notable text editors without a GUI include Vim or NeoVim. These are very powerful editors, but they are difficult to learn, and they might seem unconventional as they have both a command mode and an ordinary text input mode available. For NeoVim, there is very good Nim support available.

If you prefer not to work from a terminal, or if you are using Windows or macOS, you should have a graphical user interface that allows you to create a directory and launch an editor.

Once the editor is open, you can type in the Nim source code from our previous example and save it as `test.nim`. Afterward, you can close the editor.

Note that the `return` key behaves differently in editors than in the terminal window: In the terminal window, you type in a command and finally press the return key to "launch" or execute the command. In an editor, the `return` key behaves similarly to the other keys: if you press ordinary keys in your editor, the corresponding character is added to your text, and the cursor moves one position to the right. And when you press the `return` key, then an invisible newline character is inserted, and the cursor moves to the start of the next line.

Launching the compiler and running the program

If you are working from a Linux terminal, then you can type

ls -lt
cat test.nim

That is, you first list the content of your directory with the `ls` command, and then display the content of the Nim source code file that you have just typed in using the `cat` command.

Now type

nim c test.nim

This command invokes the Nim compiler and instructs it to compile your source code. The "c" letter is called an option or a sub-command. It tells the Nim compiler to compile your program and to use the C backend to generate an executable.

The compiler should display a success message almost immediately. If it displays error messages instead, you should relaunch Gedit or Nano, correct your typing error, save the modified file, and recompile.

When the source text is successfully compiled, you can run your program by typing

./test

In your terminal window, a number will be displayed, which is the sum of the numbers 1 to 100.

You may wonder why you have to type the prefix `./` in front of the name of your generated executable program, as you can launch most other executables on your computer without such a prefix. The prefix is generally needed to protect you and your computer from erroneously launching a program in the current directory while you intended to launch a system command. Imagine you downloaded a zip file from the internet, extracted it, changed into the extracted directory with `cd`, and typed `ls` to see the directory content. Now, imagine that the directory contains an executable named `ls`, which is executed instead of the system `ls`. That foreign `ls` command could potentially damage your system. So, to execute non-system commands, you generally have to use the prefix `./`, in which the period refers to the current directory. Of course, you can install your own programs in a way that you don’t need such a prefix anymore. If you’re unsure how to do this, consider seeking help from someone experienced.

If you haven’t been able to open a terminal to invoke the compiler, you might want to consider installing an advanced editor like VS-Code. These editors typically have the capability to launch the compiler and run the program directly within the editor.

The command

nim c test.nim

is the most basic compiler invocation. The extension `.nim` is optional because the compiler can infer it. This command compiles our program in default debug mode; it uses the C compiler back end and generates a native executable. Debug mode means that the generated executable includes many checks, such as `array` index checks, range checks, `nil` dereference checks, among others. The generated executable may not run very fast, and it will be large, but if your program has bugs, then it will provide a meaningful error message in most cases. Only after you have carefully tested your program should you consider compiling it without debug mode. You may do that with

nim c -d:release test.nim

nim c -d:danger test.nim

The compiler option `-d:release` removes debugging aids such as stack traces and enables the backend optimization by passing the option "-O3" to the C compiler backend, resulting in a much faster and smaller executable file. In current Nim versions, most run-time checks stay active in release mode. The option `-d:danger` includes `-d:release` and additionally removes all run-time checks. You should be aware that compiling with `-d:danger` means that your program may crash without any useful information, or even worse, it may run, but contain uncaught errors like overflows, which could lead to incorrect results. Generally, you should compile your program with plain `nim c` first. After you have tested it well, and if you need the additional performance, you may switch to the `-d:release` option. For games, benchmarks, or other non-critical tasks, you may try the option `-d:danger` to get an executable without any checks for utmost performance.
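If you want to see which of these modes a program was built with, the compile-time test `defined` can be used, since `-d:release` and `-d:danger` define the symbols `release` and `danger`. The sketch below also queries one check with `compileOption`; the procedure name `buildMode` is hypothetical and introduced only for this example.

```nim
proc buildMode(): string =
  ## Reports the build mode, based on symbols defined on the command line.
  when defined(danger):       # tested first, as -d:danger includes -d:release
    result = "danger"
  elif defined(release):
    result = "release"
  else:
    result = "debug"

echo "build mode: ", buildMode()
echo "bound checks enabled: ", compileOption("boundChecks")
```

Compiling this file with plain `nim c`, with `-d:release`, and with `-d:danger` should let you compare the reported settings.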

There are many more compiler options. You can find explanations for them in the Nim language manual, or you can display them using the commands `nim --help` and `nim --fullhelp`. An important new option is `--mm:arc`, which enables the new deterministic memory management. You could combine `--mm:arc` with `-d:useMalloc` to disable Nim’s own memory allocator. This reduces the executable size and enables the use of Valgrind to detect memory leaks. Similar to `--mm:arc` is the option `--mm:orc`, which can additionally deal with cyclic data structures. Another powerful option is `--passC:-flto`. This option is for the C compiler backend and enables link time optimization (LTO). LTO allows the C compiler to inline procedure calls across module boundaries and can significantly reduce the final program size. In recent versions of the Nim compiler, `-d:lto` can be used instead of `--passC:-flto`. Furthermore, for Nim v2.0, `--mm:orc` is the default memory management strategy. It’s worth mentioning that you can also try the C++ compiler backend using the `cpp` sub-command instead of the plain `c` command. Additionally, you may compile with the Clang backend instead of the default GCC backend using the `--cc:clang` option. You can additionally specify the option `-r` to immediately run the program after a successful build. For testing small scripts, the compiler invocation in the form `nim r myfile.nim` can be used to compile and run a program without generating a permanent executable file. Here’s an example of how you can combine several of these options:

nim c -d:release --mm:arc -d:useMalloc --passC:-flto --passC:-march=native board.nim

In this example, `-march=native` is additionally passed to the C compiler backend to enable the use of the most efficient CPU instructions of your computer. This could result in an executable that won’t run on older hardware. Of course, you can save all these parameters in configuration files, eliminating the need to type them in for each compiler invocation. You may find more explanations for all the compiler options in the Nim manual, or in later sections of this book; this includes the options for the JavaScript backend.

Stropping for keywords and operators

Before concluding this introduction, we should mention that the Nim language supports stropping for keywords and operators by enclosing them in backticks. This way, it is possible to use Nim keywords like `type`, `from`, or `object` as ordinary symbols, for example, as variable or field names. Typically, we avoid using keywords as ordinary symbols. However, when interfacing with C libraries, there may be instances where these libraries use symbols that are keywords in Nim. So, instead of renaming the symbols, we could use a notation like `` `object` `` for a `proc` parameter name, or `` `from` `` for a field name. Actually, we have to use stropping when we define procedures and functions that serve as operators, as in the example ``proc `*`(c: char; i: int): string = c.repeat(i)``.
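A minimal sketch of both uses of stropping follows. The type `Message` and its field are invented for this illustration, and `repeat` is assumed to come from the standard module `std/strutils`.

```nim
import std/strutils   # provides repeat

# An operator is defined by stropping its symbol.
proc `*`(c: char; i: int): string = c.repeat(i)

type
  Message = object
    `from`: string    # the keyword `from` used as a field name

var m: Message
m.`from` = "Alice"

echo 'x' * 3          # xxx
echo m.`from`         # Alice
```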

Part II: The Basics

In this section of the book, we will introduce some of the most essential constructs of the Nim programming language. These include statements, expressions, conditional and iterative code execution, as well as functions, procedures, iterators, templates, and exceptions. We will also discuss various basic data types, including the container types: array, sequence, and string.

Declarations

In the Nim programming language, declarations play a significant role by allowing us to define constants, variables, procedures, and even our own data types. Declarations inform both the compiler and the human reader about crucial attributes such as the name and data type of the variable we intend to use. Being a statically and strongly typed language, Nim requires this information for the compiler to function correctly. These declarations are not only useful to the compiler but also prove beneficial for us as programmers. They act as compact references, simplifying the process of understanding and managing the code. This is particularly valuable when collaborating with others, as it ensures clear communication and consistency in coding style, fostering a more effective development environment.[30]

We will explain the type and procedure declarations in later sections. For now, we will focus on constant and variable declarations.

A constant declaration in its simplest form maps a symbolic name to a value, like

const Pi = 3.14159

We use the reserved keyword `const` to inform the compiler that we want to declare a constant named `Pi` and assign it the numeric decimal value `3.14159`. Nim has a small set of reserved keywords such as `var`, `const`, `proc`, and `while`, among others, which tell the compiler that we want to declare a variable, a constant, a procedure, or that we want to use a `while` loop for some repeated code execution. Reserved keywords in Nim are specific symbols that hold special significance for the compiler. Therefore, we should avoid using these symbols as names for other entities such as variables, constants, or functions to prevent confusion for the compiler. The symbol `=` is the assignment operator in Nim; it assigns the value or expression on its right side to the symbol on its left. You have to understand that this assignment operator is different from the equal sign we may use in mathematics to express an equality relation. Some languages, like Pascal, initially used the compound operator `:=` for assignments. However, this can be challenging to type and may confuse individuals unfamiliar with it. Since source code typically contains many assignments, using the symbol `=` is quite sensible. For the actual equality test of two entities, which is not used that often, we use the compound `==` operator in Nim, as in most other programming languages including C and Python. We call `=` an operator. Operators are symbols that execute basic operations, like `+` for addition of two numbers, or `=` for assignment of a value to a symbol. Most operators are used as infix operators between two arguments, as in the expression `2 * Pi`, which denotes the multiplication of the named constant `Pi` with the literal number `2`, resulting in the floating-point value `6.28318`. However, operators can also function as unary operators, such as in `-Pi`, where the unary minus inverts the sign of a numeric value. When declaring named constants, we must always assign a value immediately. That value can never change, but of course, we can use the named constant in expressions to derive different values, as in

const Pi = 3.14
const TwoPi = 2 * Pi
const MinusPi = -Pi

When declaring constants, you can also specify the exact data type of the constant value, as in

const Pi: float = 3.14
const Two: float = 2

Typically, specifying the type isn’t necessary, as Nim employs type inference. From the literal value `3.14`, it is obvious that it is a decimal floating point number. For the second line, type inference would conclude that the constant `Two` is of integer type, as no fractional part is given. In this case, we can specify the desired data type after the name of the constant, separated by a colon. Alternatively, we could write `const Two = 2.0`. When dealing with numeric expressions with constants, the Nim compiler performs intelligent automatic type promotion. For instance, when given the expression `const TwoPi = 2 * Pi`, Nim assumes that what we actually intended was `const TwoPi = 2.0 * Pi`.
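The following short sketch illustrates these inference rules for constants. The constant `Count` is a hypothetical addition that shows what happens without a type annotation.

```nim
const Pi = 3.14            # inferred as float from the literal
const Two: float = 2       # explicit type, the integer literal is converted
const Count = 2            # no annotation: inferred as int
const TwoPi = 2 * Pi       # the literal 2 is promoted to float

echo typeof(Pi)
echo typeof(Two)
echo typeof(Count)
echo typeof(TwoPi)
```

Printing `typeof()` for each constant is a simple way to check which type the compiler has actually chosen.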

For numeric expressions with variables, this type-promotion is stricter. It aims to avoid unnecessary type conversions at runtime and to ensure that the final program truly utilizes the intended data types.
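A small sketch of this stricter behavior, using the hypothetical variable names `count`, `factor`, and `product`: mixing an `int` variable with a `float` variable requires an explicit conversion.

```nim
var count = 2       # inferred as int
var factor = 3.14   # inferred as float

# var bad = count * factor            # would not compile: type mismatch
var product = float(count) * factor   # explicit conversion is required
echo product
```

The commented-out line shows the form that the compiler rejects.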

As mentioned in Part I of the book, we usually place a space on either side of an operator when we use it in infix notation between two operands. This convention improves the readability of the source code. As mentioned before, in Nim, spaces can sometimes change the interpretation of an expression. This is because Nim adheres to the conventions of handwritten notation. For instance, `a + -b` is significantly different from `a+-b`. We will discuss these notations in later sections of the book in more detail.

With the aforementioned constant declaration, we can use the symbol `Pi` in our program’s source code, eliminating the need to remember or retype the exact sequence of digits. Utilizing named constants, such as `Pi` from our previous example, simplifies value modification. If we need more precision, we can update the exact value of `Pi` in one place in our source code, rather than searching for the digit sequence `3.14` throughout our code files.

For numeric constants, such as our `Pi` value, the compiler will substitute the symbol with its actual numeric value in the source code during compilation.

Expressions assigned to constants are already evaluated at compile time. Thus, complicated constant expressions do not negatively impact the program’s performance. The expressions can contain simple operations like basic math, and most Nim functions can be used as well, but functions like `sin()` from external C libraries might currently be unavailable.
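As an illustration of compile-time evaluation, the sketch below assigns the result of an ordinary procedure call to a constant. The procedure `square` and the constants `Side` and `Area` are hypothetical names used only for this example.

```nim
proc square(x: int): int = x * x   # hypothetical helper procedure

const Side = 12
const Area = square(Side)          # evaluated by the compiler

static:
  doAssert Area == 144             # this check runs during compilation

echo Area
```

The `static` block is executed by the compiler, which confirms that the value of `Area` is already known before the program runs.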

Variable declarations are more complex because they require the compiler to reserve a specific named storage location:

var velocity: int

In this case, we place the reserved keyword `var` at the start of the line to indicate to the compiler that we are declaring a variable. We then give the variable our chosen name, followed by a colon and the data type. The `int` type is a predefined numeric data type indicating a signed integer. The storage capacity of an integer variable depends on the operating system of your computer. On 32-bit systems, 32 bits are used, and on 64-bit systems, 64 bits are used to store one single integer variable. This is adequate even for large signed integers: the values range from `-2^31` to `2^31 - 1` on 32-bit systems, and from `-2^63` to `2^63 - 1` on 64-bit systems.
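To see which size applies on your own system, a minimal sketch like the following can be used. It relies only on the built-in `sizeof()`, `low()`, and `high()` procedures.

```nim
var velocity: int

echo sizeof(int), " bytes"         # 4 on 32-bit systems, 8 on 64-bit systems
echo low(int), " .. ", high(int)   # smallest and largest possible int value
echo velocity                      # not yet assigned, so the default value 0
```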

While we generally use lower-case names for variables, the names of constants can start with an uppercase letter as well.

Variables declared using the `var` keyword act as simple containers, storing a value which can be accessed or modified later. We can assign an initial value to the variable immediately when we declare it, similar to how we do it for constants, or we can assign the value later. If no actual value is assigned to the variable, it assumes a default value, which for numeric variables is zero:

var start: int
var stop: int
var delta: int = 3
stop = 10 * start + 1

In the first and second lines, we declare two variables, `start` and `stop`, both of which initially hold the default integer value of zero. In the third line, we declare one more integer variable called `delta`, to which we assign an initial value of `3`. And finally, in the fourth line, we assign an integer expression to the variable `stop`. Nim offers more variants for variable declarations, which we will discuss shortly. These include utilizing type inference when immediately assigning an initial value, using `var` sections to declare multiple variables without repeating the `var` keyword, listing multiple names of the same data type in front of the colon separated by commas, or using the `let` keyword to declare immutable variables.

Nim v2.0 introduces the `strictDefs` pragma, which can enforce variable initialization. This helps avoid errors that might occur when variables default to zero but require a different initial value. The strictDefs pragma, along with other new features of Nim 2.0, is described in detail in the book’s Appendix.

In some Nim documentation, as well as in this book, the terms declaration and definition are used interchangeably, although this isn’t entirely accurate. Specifically, a declaration is a statement that announces the existence of something, whereas a definition provides a more detailed description. In the C programming language, there is a distinction between a function declaration, which only describes the function’s name and the number and types of its parameters, and a function definition, which also specifies the names of the function parameters and the source code of the function body. In Nim, function declarations are not commonly used because they are only necessary when two functions call each other. In such cases, we declare the first function, enabling us to use it in the definition of the other function before we finally also define the first function. For other entities, such as constants, variables, data types, or modules, the distinction between declaration and definition is less meaningful. Therefore, these terms are often used interchangeably.
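The case of two functions calling each other can be sketched as follows. The procedures `isEven` and `isOdd` are hypothetical examples, assumed to be called with non-negative arguments only; procedure syntax itself is explained in later sections.

```nim
proc isOdd(n: int): bool   # declaration only: name, parameters, result type

proc isEven(n: int): bool =
  # The definition of isEven can already call isOdd.
  if n == 0: true else: isOdd(n - 1)

proc isOdd(n: int): bool =
  # Here follows the definition that was announced above.
  if n == 0: false else: isEven(n - 1)

echo isEven(10)
echo isOdd(7)
```

Without the first line, the compiler would reject the call to `isOdd` inside `isEven` as an undeclared identifier.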

Statements

Statements, or instructions, are a core component of Nim programs; they tell the computer what it shall do. Often statements are procedure calls, like the call of the `echo()` or `inc()` procedure, which we have already seen in Part I of the book. We will learn what procedures exactly are in later sections. For now, we can consider procedures as entities that perform specific tasks when we call (or invoke) them. We invoke them by writing their name in our source code file, followed by a list of parameters, or arguments. When we write `echo 7`, `echo()` is the procedure that we call, and `7` is the argument — an integer literal in this case. When the parameter list includes more than one argument, we separate the arguments with a comma and typically add an optional space afterward. As a result of our procedure call, the decimal number `7` is written to the terminal window when we execute the compiled program. The parameter list can be empty, and the parameters can be expressions, that may again contain function calls like `echo sin(0) + 2.0`. In contrast to languages like C, where the parameter list must always be enclosed in brackets, Nim often allows us to omit the brackets — a feature known as the command invocation syntax.

const SquareOfFive = 5 * 5
echo(5 * 5, SquareOfFive) # ordinary procedure call
echo 5 * 5, SquareOfFive # command invocation syntax

The command invocation syntax is typically used with the `echo()` procedure, or when a procedure has only a single argument. For multiple arguments, or when the argument is a complicated expression, the use of brackets is preferable. In some programming languages, like C, coding styles may suggest placing a space between the function name and the opening bracket. For Nim, we should not do that; the reason will become clear when we later explain the `tuple` data type. A few procedures have no parameters at all. When we call these procedures, we always have to use the syntax `myProc()` with an empty pair of brackets to make it clear to the compiler that we want to call that procedure. The statement `res = myProc()` assigns the result of the procedure call to `res`, while `res = myProc` assigns the procedure itself to `res`, which is a significantly different operation.
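The difference between calling a procedure and referring to the procedure itself can be sketched like this. The procedure `answer` and the names `res` and `handle` are hypothetical.

```nim
proc answer(): int = 42    # hypothetical procedure without parameters

let res = answer()         # calls the procedure, res holds the result
let handle = answer        # no call: handle refers to the procedure itself

echo res
echo handle()              # the stored procedure can be called later
```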

Functions are a special form of procedures that return a value or a result. For instance, in mathematics, `sin()` or `cos()` are functions — we pass an angle as an argument and they return the sine or cosine of that angle, respectively. On the other hand, the `echo()` procedure, which prints the arguments, is not a function as it doesn’t return a result.

Let’s examine this minimal Nim program:

var a: int
a = 2 + 3
echo a
echo(cos(0) + 2)

The Nim program above consists of a variable declaration and three statements: in the first line, we declare the variable we want to use. In the next line, we assign the value `2 + 3` to it, and finally, in line 3 we use the procedure `echo()` to display the content of our variable in the terminal window. In the last line, we once again use the `echo()` procedure with a conventional parameter list enclosed in brackets. The parameter list contains a single argument, which is the sum of a function call to `cos(0)` and the literal value `2`. Here, the compiler would first call `cos(0)`, then add the literal value `2` to that result, and finally pass the sum to the `echo` procedure to print the value. [31]

Nim programs are generally processed from top to bottom by the compiler, and they also execute in the same order after successful compilation. A consequence of this is that we have to write the lines of the above program exactly in that order. If we moved the variable declaration down, then the compiler would complain about an undeclared variable because the variable is used before it has been declared. If we exchanged lines 2 and 3, then the compiler would still be satisfied, and we would be able to compile and run the program. However, the output would be significantly different because the default value of the variable `a`, which is zero, would be displayed first and only then would the variable be assigned a value.

When we have to declare multiple constants or variables, we can use a block. That is, we write the keyword `var` or `const` on its own line, followed by the respective declarations as shown:

const
  Pi = 3.1415
  Year = 2020
var
  sum: int
  age: int

These blocks are also referred to as sections, for example, `const` section or `var` section, as is customary in Wirthian languages. Take note of the indentation — the lines following `const` and `var` begin with a few spaces, forming an indented block that allows the compiler to identify the end of the declaration. Typically, we use two spaces for each level of indentation. While other numbers of spaces can be used, it’s essential to maintain consistency in the indentation scheme. Two spaces are generally recommended as they are easily recognizable in the source code and do not consume excessive space; thus, they do not create overly lengthy lines that may not fit on the screen.

Also note that in Nim, we generally write each statement on its own line. The line break indicates to the compiler that the statement has ended. Special statement delimiters such as the `;` in C are not required at the line end, but can be used to separate multiple statements on the same line. There are a few exceptions to this rule: for example, long mathematical expressions can continue on the next line. Generally, when a line ends with a punctuation character, and the next line is indented, the compiler recognizes the continuation (for more details, refer to the Nim manual). The following example shows both cases:

var a: int
echo a; inc(a) # (1)
a = 2 * a + # (2)
  a * a

(1) Here, two statements are separated by a semicolon on a single line.
(2) A longer math expression split over multiple lines. An operator as the last character on a line indicates that the expression continues on the next indented line.

It is also possible to declare multiple variables of the same type in a single declaration, as shown below:

var
  sum, age: int

Alternatively, we can assign an initial start value to a variable as shown in the example below:

var
  year: int = 1900

Nim also currently supports the initialization of multiple variables with the same value:

var
  i, j: int = 1

Here, both `i` and `j` would get the initial value `1`. However, this notation is often avoided as it may not be immediately clear to all readers.

Lastly, we can use type inference for variable declarations when an initial value is assigned, as shown in the example below:

var
  year = 1900

The compiler recognizes in this case that we assign an integer literal to that variable, and so silently gives the variable the `int` type for us. Type inference can be convenient, but it might make the source code more difficult for readers to understand, or the type inference might not always yield the expected results. For example, in the above code, `year` gets the data type `int`, which is a signed 4 or 8-byte number. However, we might prefer an unsigned number or a number that occupies only two bytes in memory. For the final executable, it makes no difference whether a variable received its runtime type through direct user specification or by the use of type inference, as long as the actual data type is the same. Although the use of type inference may slightly increase the compile time for our source code, this increase is typically negligible.

Note: For integral data, we mostly use the `int` data type in Nim, which is a signed type with a 4 or 8-byte size. It usually does not make sense to use many different integral types — signed, unsigned, and types of different byte sizes. Mixing them in numerical expressions can be confusing and potentially even decrease performance, because the computer may have to do type conversion before it can do the math operation. Another problem associated with unsigned types is that mathematical operations on unsigned operands could yield a negative result. Consider the following example, where we use a hypothetical data type "unsigned int" to indicate unsigned integers:

var a, b: unsigned int
a = 3
b = 7
a = a - b

The true result should be `-4`, however, `a` is an unsigned type and cannot contain a negative value. So, what should happen — an incorrect result or a program termination?
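For comparison, here is a minimal sketch of the same calculation with one of Nim’s real unsigned types. It uses `uint8` instead of the hypothetical "unsigned int" so that the result is the same on every system.

```nim
var a, b: uint8
a = 3
b = 7
a = a - b      # no error is raised: the value wraps around modulo 256
echo a         # prints 252, not -4
```

In Nim, arithmetic on unsigned types wraps around silently, so the program continues with a mathematically incorrect value. This is one more reason to prefer the signed `int` type for ordinary calculations.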

Another aspect related to variable declarations is the initial value of variables. Upon declaration, Nim resets all the bits of our variables. This means that numerical variables automatically have an initial value of zero unless we assign a different value in the variable declaration.

In this declaration

var
  a: int = 0
  b: int

both variables get the initial value of zero.

We have already mentioned that Nim 2.0 introduces the strictDefs pragma, which enforces explicit initialization. That is explained in more detail in the Appendix where we summarize all the new 2.0 features.

There is a variant for variable declarations that uses the `let` keyword instead of the `var` keyword. The `let` keyword is used when we need a variable that gets assigned a value only once, while `var` is used when we anticipate changing the content of the variable during the program execution. We say that we use `var` to create mutable variables, and `let` to create immutable variables. A `let` declaration seems to be similar to a `const` declaration, but in `const` declarations, we can only use values that are known at compile time. With `let`, we can assign values that become available only at program runtime, possibly because the value derives from a previous calculation. However, `let` also indicates that the assignment occurs only once, and the content does not change later during the program’s execution. The use of the `let` keyword can aid in understanding the source code and potentially help the compiler optimize for faster or more compact code. For now, we can just ignore `let` declarations and use `var` instead. Later, we may use `let` where appropriate, and the compiler will tell us when `let` will not work and we have to use `var`.
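The three kinds of declarations can be compared in a short sketch. The names `MaxArgs`, `argCount`, and `remaining` are hypothetical; `paramCount()` from the `std/os` module returns the number of command-line arguments and serves here only as an example of a value that is not known before the program runs.

```nim
import std/os   # provides paramCount()

const MaxArgs = 10                 # value known at compile time
let argCount = paramCount()        # value known only at runtime, assigned once
var remaining = MaxArgs            # value may change later

remaining = remaining - argCount   # fine: remaining is mutable
# argCount = 0                     # would not compile: let variables cannot be reassigned
echo remaining
```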

The way we declare constants, variables, types, and procedures in Nim is very similar to what was done in the Wirthian languages Pascal, Modula, and Oberon. Those familiar with languages like C sometimes argue that C’s variable declaration form, `int velocity;`, is more concise and superior compared to Nim’s `var velocity: int`. Indeed, in this case, the declaration is shorter. Some people prefer the data type written first, considering it more important than the variable’s name. This comes down to personal preference, and it should be noted that the C notation wouldn’t adequately distinguish between `var`, `let`, `const`, and `type` declarations.

With the knowledge we have gained in this section, we can rewrite our initial Nim example from Part I as follows:

const
  Max = 100
var
  sum, i: int
while i < Max:
  inc(i)
  inc(sum, i)
echo sum

In the code above, we declare both `int` type variables in a single line and take advantage of the compiler initializing them to `0`. We also use a named constant for the upper loop boundary. Another tiny fix is that we write `inc(i)` instead of `inc(i, 1)`. We can do that because there exist multiple procedures with the name `inc()` — one which takes two arguments, and one which takes only one argument and always increases that argument by one. Procedures with the same name but different parameter lists are referred to as overloaded procedures. Instead of `inc(i)`, we could have written also `i = i + 1`, and instead of `inc(sum, i)` we could write `sum = sum + i`. Either form would generate identical code in the executable, so it’s a matter of personal preference.

Input and output

We have already used the `echo()` procedure for displaying textual output in the terminal window. In previous code examples, we passed integer type arguments to the `echo()` `proc`. This procedure automatically converted these integers into a textual sequence of decimal digits for display in the terminal window. In the Nim programming language, text is represented by a predefined, built-in data type known as a `string`. We will delve into the details of the `string` data type in the next section. For now, it’s sufficient to know that it exists and we can use the `echo()` `proc` to print text strings. The `echo()` procedure is capable of automatically converting other data types, such as numbers or Boolean values (true/false), into human-readable text `strings` for terminal output. Recall that most data types are stored internally in our computer as bits and bytes, which have no true human-readable representation by default. Numbers, like most other data types stored in the computer, are essentially abstract entities. As we’ve learned, all data in a computer is stored internally in binary form, which means it’s stored as a bit pattern of `0s` and `1s`. However, even that bit pattern is an abstraction. We would require a procedure that prints a `0` for each unset bit and a `1` for each set bit to display the content of an internally stored number in binary form in the terminal or elsewhere. Similarly, we require a procedure to print an internally stored number as a human-readable sequence of decimal digits. Even text `strings` are stored internally as abstract bit patterns and require conversion procedures to be rendered as readable text. The `echo()` procedure is capable of accomplishing all this, although we will not delve into these details at this point.

For our subsequent experiments, we may want to input some user data in the terminal. As we do not know much about the various available data types and the procedures that can be used to read them in, we will just present a procedure that can read a text `string` that the user types in the terminal window. We will utilize the `readLine()` function for this task.

echo "Please enter some text"
var mytext = readLine(stdin)
echo "You entered: ", mytext

Please note that the `return` key must be pressed after entering your text.

The first line of our program demonstrates how we can print a literal text `string` with the `echo()` `proc`. To mark text literals unambiguously and to separate them from other literals like numeric literals or from variables, the `string` literals have to be enclosed in quotation marks. In the second line of our example program, we use the `readLine()` function to read textual user input. Note that we call `readLine()` a function, not a procedure, to emphasize that it returns a value. The `readLine()` function requires one parameter to specify the source of the input — for instance, the terminal window or a file. The `stdin` parameter directs the function to read from the current terminal window. Notably, `stdin` is a global variable of the `system` (`io`) module and represents the standard input stream. Finally, in line 3 we use again the `echo()` `proc` to print some text. In this case, we pass two arguments to `echo()`: a literal text enclosed in quotes, and the `mytext` variable, separated by a comma. The `mytext` variable has the data type `string`. In this example, we employed type inference to declare the data type. Since the `readLine()` function always returns a `string`, which is known to the compiler, our `mytext` variable is automatically declared as a `string`. We will learn more about the data type `string` and other useful predefined data types in the next section.

Nim supports the method call syntax, which was previously known as Uniform Function Call Syntax in the D programming language. With this syntax, we can write procedure calls in the form `a.f` instead of `f(a)`. We will discuss this syntax in more detail when we explain procedures and functions. For now, it’s sufficient to be aware of this syntax, as we may utilize it in some places in the subsequent sections. For example, for the length of text `strings`, we generally write `myTextString.len` instead of `len(myTextString)`. Both notations are entirely equivalent.[32]
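A minimal sketch of the equivalent notations, using a hypothetical variable `name`:

```nim
let name = "Nimrod"

echo len(name)     # ordinary procedure call
echo name.len      # method call syntax
echo name.len()    # method call syntax with explicit brackets
```

All three lines call the same `len()` procedure and print the same number.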

When you try the example code from above, you might want a variant that reads the textual input not on its own line but directly after the prompt, such as 'What is your name: Nimrod'. As the `echo()` `proc` always writes a newline character after the last argument has been written, we have to use a different function to get the input prompt on the same line. We can use the `write()` `proc` from the `system` module for this. As `write()` can not only write to the terminal but also to files, it needs an additional parameter that specifies the destination. We can pass the variable `stdout` from the `system` module to indicate that `write()` should write to our terminal window. Often, beginners also desire the ability to read single-character input without the additional need to press the return key. For that, we can use the `getch()` function from the `terminal` module — that function waits (blocks) until a key is pressed and returns the ASCII character of the pressed key:

from std/terminal import getch

stdout.write("May you tell me your name: ")
var answer = readLine(stdin)
if answer != "no":
  echo "Nice to meet you, ", answer
echo "Press any key to continue"
let c = getch()
echo "OK, let us continue, you pressed key: ", c

Don’t be misled by the fact that the first `write()` call and the subsequent `readLine()` call do not appear on the same line in our example. In this case, the actual format of our source code does not influence the program output. We could write both function calls on a single line, separated by a semicolon, but that would make no difference for the program output. The key difference between `write()` and `echo()` is that `write()` prints the text without advancing the cursor to the next line in the terminal window, while `echo()` does so once all arguments have been printed. We say that `echo()` automatically prints a '\n' character, which we call a newline character, after all the arguments have been printed.

Data types

Nim is a statically typed programming language, which means that all variables have a well-defined data type, and this data type does not change during program execution. Moreover, we say that Nim is a strongly typed language, meaning that it does nearly no automatic type conversions when variables are assigned to each other or used in expressions or as arguments in function calls. Automatic type conversion may seem beneficial at first, but it can easily introduce errors or degrade the performance of our programs.

The most fundamental data type — in real life and in computer science — is the integer (whole) number. All other numeric data types, like fractional, floating-point, or complex numbers, and other fundamental types like the boolean type with its two values `true` and `false`, and character and text `string` types, can be represented as integers. For that reason, both the early computers built in the 1950s and today’s smallest microcontrollers work internally only with integer numbers. The integer data type is not only crucial for arithmetic operations, but it is also used as an index to access elements in data structures such as `arrays`. Furthermore, integer numbers are often interpreted as bit vectors to represent `set`-like data types. As all CPUs are able to do basic bit operations like setting or clearing individual bits, and as bit patterns map well to mathematical sets, `set` data types are well-supported by all CPUs, and so `set` operations are generally very efficient. Advanced computers, built in the 1980s, received support for the crucial class of floating-point numbers through specialized floating-point processors for fast numerical computations. Today, these floating-point units are typically integrated into the CPU, and GPUs can even process many floating-point operations in parallel. However, the precision of GPUs is typically limited to the ranges needed for games and graphic animations; that is, 32- or even 16-bit. Modern CPUs often also have some form of support for vector data types to process multiple values in one instruction (SIMD, single instruction, multiple data).

Non-numeric types like characters or text strings are internally represented by integer numbers. In the C language, the data type to represent single characters is called `char`, but it is indeed only an 8-bit integer type that supports all the mathematical operations defined for ordinary integer types. In Nim and the Wirthian languages, most math operations are not directly allowed for the `char` data type, which helps prevent misuse and allows the compiler to catch logical errors.
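The following sketch shows how this looks in practice. Arithmetic on a `char` has to go through the explicit conversion procedures `ord()` and `chr()`; the variable `c` is a hypothetical example.

```nim
let c = 'A'

# echo c + 1            # would not compile: no + operator for char and int
echo ord(c)             # the integer code of the character
echo chr(ord(c) + 1)    # convert back to a char after the arithmetic
```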

Nim also supports several built-in homogeneous container types like arrays and sequences, along with numerous built-in derived types like enumeration types, sub-ranges and slices, `distinct` types, and view types (experimental). The built-in inhomogeneous container types `object` and `tuple`, which allow grouping of other types, are complemented by a `variant` type container, which allows instances of that type to contain different child types at runtime. These inhomogeneous container types are similar to the struct and union types from the C programming language.

Other basic and advanced data types like complex and fractional numbers, types with arbitrary-precision arithmetic, as well as hash sets and hash tables, dynamically linked lists, or tree structures are available through the Nim standard library or external packages. Of course, we are also able to define our own custom data types with our own operators, functions, and procedures working on them.

Note that all the data types that are built into the language, like the primitive types `int`, `float`, or `char`, as well as the built-in container types like `tuple`, `object`, `seq`, and `string`, are written in lower case, while data types that are defined by the Nim standard library or that we define ourselves, by convention, start with a capital letter like the `CountTable` type defined in the `tables` module. Some people may regard this as an inconsistency, while others may say that this distinction allows us to differentiate built-in types from types defined by libraries.

At least, we can agree that using capital notation for common types such as `Int`, `Float`, or `String` would be more difficult to type and wouldn’t look as nice.

Integer types

We’ve already mentioned the `int` data type, a signed integer that can be either `4` or `8` bytes depending on the operating system. The reason for this dependency will become clearer as we explore the concepts of references and `pointers`. For now, here is a brief explanation for readers already familiar with `pointers` and their role in memory addressing; if you are unfamiliar with `pointers`, feel free to skip the rest of this paragraph. A 32-bit OS can generally address `2^32` bytes (which equals 4 GBytes), limiting `pointers` and references to `32` bits. Having more bits wouldn’t be practical. Integers often serve as indices for `arrays` and sequences, interacting with computer memory in ways similar to `pointers` and references. So, in a 32-bit OS with 32-bit pointers, 32-bit integers are sufficient as `array` indices since an `array` cannot have more than `2^32` entries. In contrast, a 64-bit OS, equipped with 64-bit `pointers`, might require 64-bit integers as indices for larger `arrays` and sequences. However, exceptions exist. There could be scenarios where 32-bit integers are sufficient on a 64-bit OS, or situations on a 32-bit OS requiring 64-bit integers, such as for extensive counting tasks. These considerations led some to advocate for a configurable `int` type of either `32` or `64` bits. Similarly, some proposed a user-defined `float` type of `32` or `64` bits. Yet, Nim’s `int` type is OS-determined, and its `float` type is invariably `64` bits. This approach represents a pragmatic solution. For other sizes, one can use the `int32`, `int64`, `float32`, and `float64` data types, which have fixed, well-defined sizes.

Besides the `int` data type, Nim has some more data types for signed and unsigned integers: `int8`, `int16`, `int32`, and `int64` are signed types with well-defined bit and byte size, and `uint8`, `uint16`, `uint32`, and `uint64` are the unsigned equivalents. The number at the end of the type name indicates the bit size; we can calculate the byte size by dividing this value by `8`. Additionally, we have the type `uint`, which corresponds to `int` and has the same size, but stores unsigned numbers only. [33] Generally, we should try to use the `int` type for all integral numbers, but sometimes it can make sense to use the other types. For example, if you have to work with a large collection of numbers, know that each number is not very big, and your RAM is not really that large, then you may decide, for example, to use `int16` for all your numbers. Or when you know that your numbers will be huge and will not fit in a 4-byte integer, then you may use the `int64` type to ensure that the numbers fit in that type even when your program is compiled and executed on a computer with a 32-bit OS.
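To make the sizes and value ranges more tangible, here is a small sketch that uses only the built-in `sizeof()`, `low()`, and `high()` functions. The comparison of `int` with `pointer` reflects the relation described above and is expected to hold on common platforms.

```nim
echo sizeof(int8), " ", sizeof(int16), " ", sizeof(int32), " ", sizeof(int64) # prints 1 2 4 8
echo sizeof(int) == sizeof(pointer)  # int has the size of a memory address
echo low(int8), " .. ", high(int8)   # prints -128 .. 127
echo high(uint16)                    # prints 65535
echo high(int64)                     # prints 9223372036854775807
```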

For integer numbers, we have the predefined operators `+`, `-`, and `*` available for addition, subtraction, and multiplication. Basically, these operations work as one might expect, but it’s important to remember that overflows can occur. For signed integers, we get compile- or run-time errors in that case, while unsigned integers just wrap around, see the example at the end of this section. For the division of integers, we have the operators `div`, `mod`, and `/` available. The `div` operator does an integer division ignoring the remainder, `mod` is short for modulus and gives us the remainder of the division, and `/` finally is currently only predefined for the signed `int` type and gives us a fractional result of data type `float`. That type is introduced in the next section.

It can be challenging to remember how `div` and `mod` behave when either the divisor or dividend is negative, as this behavior may vary across different programming languages. You can find a detailed and justified explanation for this specific behavior in the Nim manual and on Wikipedia.

Integer division for positive and negative operands
Result of i div j (rows: i, columns: j)
   -4 -3 -2 -1  0  1  2  3  4
-4  1  1  2  4    -4 -2 -1 -1
-3  0  1  1  3    -3 -1 -1  0
-2  0  0  1  2    -2 -1  0  0
-1  0  0  0  1    -1  0  0  0
 0  0  0  0  0     0  0  0  0
 1  0  0  0 -1     1  0  0  0
 2  0  0 -1 -2     2  1  0  0
 3  0 -1 -1 -3     3  1  1  0
 4 -1 -1 -2 -4     4  2  1  1

Result of i mod j (rows: i, columns: j)
   -4 -3 -2 -1  0  1  2  3  4
-4  0 -1  0  0     0  0 -1  0
-3 -3  0 -1  0     0 -1  0 -3
-2 -2 -2  0  0     0  0 -2 -2
-1 -1 -1 -1  0     0 -1 -1 -1
 0  0  0  0  0     0  0  0  0
 1  1  1  1  0     0  1  1  1
 2  2  2  0  0     0  0  2  2
 3  3  0  1  0     0  1  0  3
 4  0  1  0  0     0  0  1  0
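A few of the table entries can be checked with a short sketch like the following one. The values `-7` and `2` are arbitrary examples; the last line shows the relation between `div` and `mod` that holds for all operand signs.

```nim
let i = -7
let j = 2
echo i div j                        # prints -3, the result is truncated towards zero
echo i mod j                        # prints -1, the remainder has the sign of the dividend
echo i / j                          # prints -3.5, a float result
echo (i div j) * j + (i mod j) == i # prints true
```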

When performance matters, we generally should try to use the "CPU native" number type, which for Nim is the `int` type. Furthermore, we should try to avoid using math expressions with different types, as the CPU may have to do type conversion in that case before the math operation can be applied. Adding two `int8` values on some CPUs can be slower than adding two `ints`, because the CPU may have to sign-extend the operands before the math operation is performed. But this depends on the actual CPU, and there are important exceptions: Multiplying two `ints` would produce a 128-bit result if the `int` size is 64 bits, which can be slow if the CPU does not support that operation well. Another essential factor to consider for maximum performance is cache usage. If you are performing operations on a large set of data, then you may get a significant performance gain when large fractions of your data fit in the caches of your computer, as cache access is much faster than ordinary RAM access. So using smaller data types, e.g. `int32` instead of Nim’s default `int`, which is 64 bits wide on a 64-bit OS, may increase performance in this special application.

When we use Nim on tiny microcontrollers, maybe even on 8-bit controllers like the popular AVR devices, it is recommended to use only integers of well-defined size like `int8`.

When we write integer literal numbers, we generally use our common decimal notation, as in `var i = 100`. To increase the readability of long number literals, we can use the underscore character as in `1_000`; that underscore character is just ignored by the compiler. We can also write integer literals in binary, octal, or hexadecimal notation. For that, we prefix the literal value with `0b`, `0o`, or `0x`. The leading zero is necessary, and the next letter indicates a binary, octal, or hexadecimal encoding. But such integer literal notation is very rarely used.

What’s more important is the actual size of integer literals, especially when we use type inference. Ordinary integer literals have the `int` type, but integer literals not fitting in 32 bits have `int64` type. We can also specify the type of integer literals by appending the suffix `i8`, `i16`, `i32`, or `i64` for signed types and `u`, `u8`, `u16`, `u32`, or `u64` for unsigned types to the literal. We can separate the actual number from the suffix with a `'` character, although this is not necessary for integer literals.

var
  a = 100 # int literal in decimal notation
  b = 1234567890000 # int64
  c = 5'i8 # 8-bit integer
  d = 7u16 # unsigned integer with 2 byte size
  e = 0b1111 # ordinary integer in binary notation, value is 15 in decimal notation
  f = 0o77 # integer in octal notation, value is 7 * 8^0 + 7 * 8^1 in decimal notation
  g = 0xFF # integer in hexadecimal notation

echo g, " ", typeof(g) # value and type, separated by a space

In arithmetic expressions, integer types of different sizes are generally compatible when all the types are either signed or unsigned. For example, in the code provided above, we could write `echo a + b + c`, and `typeof(a + b + c)` would be `int64`. This means that the expression is promoted to the largest type of all the involved operands. However, `echo a + b + c + d` would not compile because it’s not clear whether signed or unsigned arithmetic should be used when there’s a mix of signed and unsigned operands. It’s also worth noting that `echo typeof(a) is typeof(b)` would print `false`, even on a 64-bit OS.
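When signed and unsigned operands have to be combined, one option is an explicit conversion that states which kind of arithmetic we want. The following is a minimal sketch with two example variables; which conversion is appropriate depends on the value ranges of the actual program.

```nim
var
  a = 100    # int
  d = 7'u16  # uint16
# echo a + d        # would not compile: signed and unsigned operands are mixed
echo a + int(d)     # prints 107, signed arithmetic with type int
echo uint16(a) + d  # prints 107, unsigned arithmetic with type uint16
```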

An important property of the Nim implementation, by A. Rumpf, when used with the C backend, is that unsigned integers do not generate overflow errors but simply wrap around:

var x: int8 = 0

while true:
  inc(x)
  echo x

The code above would print the numbers `1` through `127`, then terminate program execution due to an overflow error. But when we change the data type to `uint8`, we would get a continuous sequence of the numbers `1` up to `255`. After the value `255` is reached, the value wraps around to `0` again and the process continues endlessly. This behavior can lead to strange bugs and is one of the reasons why the Nim team generally recommends avoiding unsigned integers.
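The wrap-around of unsigned integers can also be observed without an endless loop. This small sketch starts at the largest `uint8` value and performs one addition and one subtraction:

```nim
var u = high(uint8) # 255, the largest uint8 value
u += 1
echo u              # prints 0, the value wrapped around
u -= 1
echo u              # prints 255, wrapping works in both directions
```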

For compatibility with external libraries, Nim also has the integer types `cint` and `cuint`, which exactly match the C types `int` and `unsigned int` when we compile for the C or C++ backend. These types may also be available for the JavaScript backend, the LLVM backend, and other backends. For details, you should consult the compiler documentation. For most operating systems and C compilers, the `int` and `unsigned int` types in C are `4` bytes in size. However, there can be exceptions, so it would be better not to write code that depends on the actual byte size of these types. The Nim types `cint` and `cuint` are mainly used only for parameter lists of (C) library functions. To match other C integer types like `char`, `short`, `long`, and `long long`, Nim provides types whose names put the letter `c` in front of the C name, as in `clong`. Again, you should consult the Nim language manual if you need more details, for example, when you create bindings to external libraries.

Floating-point types

Another important numeric data type is `float`, for floating-point numbers. `Floats` are approximations of real numbers. They can also store fractions and are most often printed in the decimal system with a decimal point, or in scientific notation with an exponent. Examples of the use of variables of the `float` data type are

var
  mean = 3.0 / 7.9
  x: float = 12
  y = 1.2E3

The result of the division of two `float` literals is assigned to `mean`. This result is also of the data type `float`, allowing the compiler to infer the same type for `mean`. If we printed the result of the division, there would be a decimal point and some digits following it. For variable `x` we specify the `float` type explicitly and assign the value `12`. We could use type inference if we assigned `12.0`, as the compiler can recognize from the decimal point that we want a `float`, not an `int` variable. In the last line we use scientific notation for the `float` literal that we assign to `y`, and the assigned value is `1.2 * 10^3 = 1200.0`. Literal values like `2E3` are also valid `float` literals; the value would be `2000.0`. But literals with a decimal point and no digits before or after the point, such as `1.` or `.2`, are not valid in Nim.

In the current Nim implementation, `float` variables always occupy 64 bits. Nim also has the data type `float64`, which is currently identical to plain `float`, and `float32`, which can only store smaller numbers and has less precision.[34] `Floats` can store values up to a magnitude of approximately `1E308` with a positive or negative sign, and they typically have a precision of 16 digits.[35] That is, when you do a division of two arbitrary `floats` and print the result, you will get up to 16 valid digits. If you try to print more than 16 significant digits, then the additional decimal places will be just some form of random garbage. Note: The number of significant digits of a floating-point number is the total number of digits before and after the decimal point, not counting leading zeros. Leading zeros are not significant because in the ordinary notation of numbers, we always assume that there is nothing before the first non-zero digit. For our car odometer, `001234.5 km` is identical to `1234.5 km`. And whether we give our body size as `1.80 m` or `180 cm` makes no difference; both values have 3 significant digits.

Generally, we use floating point numbers whenever integers are insufficient for some reason. For example, when we have to do complicated mathematical operations which include fractional operands like `Pi`, or when we have to do divisions and need the exact fractional value.

The `float`, `float32`, and `float64` data types provide the `+`, `-`, `*`, and `/` operators for addition, subtraction, multiplication, and division. Unlike with the `int` types, we never get overflow or underflow errors with the `float` types, and also no error for a division by zero. But the result of an operation on two `float` operands can be a special value, like `system.Inf`, `system.NegInf`, or `system.NaN`. The first two indicate that the result is too large in magnitude to be represented, with a positive or negative sign; this is also what we get when we divide a non-zero value by zero. `NaN` (Not a Number) indicates that the result of an operation is not a valid number at all, such as the result of dividing zero by zero or of calculating the square root of a negative number. This behavior is sometimes called saturated arithmetic. When a variable has one of these special values and we apply further math operations, this value is generally kept. So we can detect at the end of a longer mathematical calculation if something went wrong, and we do not have to check after every single operation.[36] An interesting property of floating-point numbers is that when we test two variables of `float` type for equality, and one has the value `NaN`, then the test is always false. That is, the test `a == NaN` is always false. If we forget this fact, we might initialize a `float` variable to the value `NaN` and later test with `if a == NaN:` to check if we have already assigned a value. However, this is not what we really intend, as that test will always yield a negative result. The actual test for the value `NaN` is `a != a`, which is only true when `a` has the value `NaN`; alternatively, we can use `math.isNaN()`. More useful constants and functions for the `float` data types can be found in the `std/fenv` module, and functions working with `floats` like the trigonometric ones are available from the `std/math` module.[37]
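The special values can be produced and tested with a few lines of code. The following sketch assumes the default compiler settings, where the optional floating-point checks are not enabled. The divisor is stored in a variable, so that the divisions are executed at run time; the variable names are arbitrary.

```nim
import std/math

var zero = 0.0                  # a variable, so the divisions happen at run time
let posInf = 1.0 / zero         # a positive value divided by zero
let minusInf = -1.0 / zero      # a negative value divided by zero
let notANumber = zero / zero    # zero divided by zero

echo posInf == Inf              # prints true
echo minusInf == NegInf         # prints true
echo notANumber == NaN          # prints false, a comparison with NaN is never true
echo notANumber != notANumber   # prints true, only NaN is unequal to itself
echo isNaN(notANumber)          # prints true
echo isNaN(sqrt(zero - 1.0))    # prints true, square root of a negative number
```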

To calculate powers with integral exponents, you can use the `^` operator, but you must import it from the `std/math` module. The expression `x ^ 3` is the same as `x * x * x`. The `math` module contains many more functions like `sin()` or `cos()`, `sqrt()` and `pow()`. The function name `sqrt()` is short for square-root, and `pow()` stands for power, so `pow(x, y)` is `x` to the power of `y` when both operands have type `float`. For performance-critical code you should always keep in mind that `pow()` is an actual function call, maybe a call of a dynamic library that cannot be inlined, so a call of `pow(x, 2)` is typically a lot slower than a plain `x * x`. Even when using the `^` operator, as in `x ^ 3`, we should be a bit critical. But of course, we always hope that the compiler will optimize all that for us.
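As a brief illustration of the functions mentioned above, the following sketch compares the `^` operator, a plain multiplication, and the `pow()` and `sqrt()` functions from `std/math` for one example value:

```nim
import std/math

let x = 2.0
echo x ^ 3        # prints 8.0, integral exponent
echo x * x * x    # prints 8.0, the same computation written out
echo sqrt(16.0)   # prints 4.0
echo pow(x, 0.5)  # x to the power of 0.5, which is the square root of x
```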

The operators `+`, `-`, `*`, and `/` can also be used when one operand is a `float` variable and the other operand is an integer literal. In that case, the compiler knows that we really intend to do a `float` operation and converts the integer literal automatically to the `float` type. However, when one operand is a `float` variable and the other is an integer variable, an explicit type conversion is necessary, such as in `float(myIntVal) * myFloatVal`. For the type conversion, we treat the desired type as a function, as in `float()`. One explanation for why the `int` value is not automatically converted to `float` in this case is that this may result in a loss of precision, as large `int64` values cannot be represented exactly as a `float`. Well, this reasoning does not really apply for `int32`, but there is still no automatic conversion. Indeed, given that Nim is used as a systems programming language, requiring explicit conversions in this case seems to be a sensible decision, as it clarifies the programmer’s intention. Generally, it’s advisable to avoid operations with mixed types, as they may necessitate type conversions and potentially affect performance. If we really do not care, we may import the module `std/lenientops`, which defines the arithmetic operations for mixed operands.
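The difference between integer literals and integer variables in `float` expressions can be seen in this small sketch. The variable names and values are arbitrary examples.

```nim
let n = 3             # an int variable
let f = 2.5           # a float variable
echo f * 2            # prints 5.0, the literal 2 is converted to float
# echo n * f          # would not compile: int and float variables are mixed
echo float(n) * f     # prints 7.5, explicit conversion in function notation
echo n.float * f      # prints 7.5, the same conversion in method call notation
echo int(f * 2) + n   # prints 8, conversion in the other direction
```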

Floating-point literals default to the `float` data type, but, similar to integer literals, we can also explicitly specify the data type: The suffixes `f` and `f32` specify a 32-bit `float` type, and `d` and `f64` specify a 64-bit type. We can separate the suffix from the actual number with a `'` character, but that is not required as long as there is no ambiguity. We can also specify `float` literals in binary, octal, or hexadecimal notation when we append one of these suffixes. In the case of hexadecimal notation, the `'` is obviously needed to separate the suffix, as `f` and `d` are valid hex digits.
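The suffix notation for decimal `float` literals might look like this; the variable names are only placeholders, and `typeof()` and `sizeof()` are used to inspect the results.

```nim
let
  a = 1.5       # plain float literal
  b = 1.5'f32   # 32-bit float, suffix separated by '
  c = 1.5f32    # the same without the separator
  d = 1.5'f64   # 64-bit float
  e = 1.5d      # also a 64-bit float
echo typeof(b), " ", typeof(c), " ", typeof(d), " ", typeof(e)
echo sizeof(a), " ", sizeof(b)  # prints 8 4
```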

Similar to the integer types, Nim also supports the compatible types `cfloat` and `cdouble`, which match the C types `float` and `double` when the C backend is enabled. For most C compilers, C `float` matches Nim’s `float32` and C `double` matches Nim’s `float64`.

Some CPUs and C compilers also support additional floating-point types beyond the common `float32` and `float64`. Intel x86 compatible CPUs generally support `float80` math operations, and the GCC C compiler may support `float128`. However, these types are not yet supported by the Nim compiler developed by A. Rumpf. There may, however, be external packages that support these types by calling C functions when the C backend is used.

Two important properties of `floats` are that not all numbers can be represented exactly, and that math operations are not absolutely accurate. Recall that in our decimal system, some fractions like `1/2` can be represented exactly as `0.5` in decimal notation, while others like `1/3` can be only approximated as `0.3333…`. Like all data, `floats` are stored internally in binary form, following the IEEE Standard for Floating-Point Arithmetic (IEEE 754). In that format, some values, such as `0.1`, cannot be represented exactly. As a consequence, some simple arithmetic operations executed on the computer may not give us the exact result we expect. It’s crucial to remember this fact, and to illustrate it, we will investigate this behavior with a small example program. In this program, we will divide a few small integers, converted to `float`, by another integer `j`, also converted to `float`, and sum the result `j` times:[38]

for i in 1 .. 10:
  echo "--"
  for j in 2 .. 9:
    let a = i.float / j.float
    var sum: float
    for k in 1 .. j:
      sum += a
    echo sum

which generates this output:

--
1.0
1.0
1.0
1.0
0.9999999999999999
0.9999999999999998
1.0
1.0
--
2.0 # for all iterations!
--
3.0 # for all iterations!
--
4.0
4.0
4.0
4.0
4.0
3.999999999999999
4.0
4.000000000000001
--
5.0
5.0
5.0
5.0
5.0
5.0
5.0
4.999999999999999
--
6.0
6.0
6.0
6.0
6.0
5.999999999999999
6.0
6.0
--
7.0
7.0
7.0
7.0
7.000000000000001
7.0
7.0
7.0
--
8.0
8.0
8.0
8.0
7.999999999999999
7.999999999999998
8.0
8.000000000000002
--
9.0 # for all iterations!
--
10.0
10.0
10.0
10.0
10.0
10.0
10.0
9.999999999999998

The `echo()` procedure prints up to 16 significant digits of a `float` value, making the accumulated tiny arithmetic errors visible. Given our previous remarks, this should no longer be surprising; the general solution is to round results to fewer than 16 decimal digits before printing. Various ways to do that will be shown later in the book. A related issue of `float` arithmetic is the loss of small summands, sometimes called absorption. When we add numbers with very different magnitudes, the result can be just the value of the largest number, as in `echo 1.0 == 1.0 + 1e-16`, which prints `true`. The tiny summand is just too small to actually change the result. This is similar to when you switch on a torch on a sunny day; it will not really become brighter. Perhaps more surprising is that calling `echo()` with some simple `float` literals will print a different value, such as `echo 66.04`, which gives `66.04000000000001` for Nim v2.0, while with Python3 we get `66.04` exactly. However, this is only surprising for people who do not fully understand what a statement like `echo 66.04` does: We already know that the value `66.04` is converted by the compiler to an internal binary representation, and then converted back to a decimal `string` when we run the program. Thus, it’s not surprising that some tiny inaccuracies can accumulate in this process. Actually, it should be possible to achieve exactly 16 digits of precision when a sophisticated conversion routine, such as the Ryu or DragonBox algorithm, is used. We may still wonder why Python seems to consistently get it right. There are rumors that Python might be "cheating" with some post-processing to produce the `string` that the user may prefer.
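The effects described above can be reproduced with a few lines. The sketch below also shows one possible way to round for output, using `formatFloat()` from the `std/strutils` module with three decimal places; other ways of formatting exist.

```nim
import std/strutils

echo 1.0 + 1e-16 == 1.0                    # prints true, the tiny summand is lost
echo 0.1 + 0.2 == 0.3                      # prints false, due to rounding errors
echo formatFloat(0.1 + 0.2, ffDecimal, 3)  # prints 0.300, rounded for output
```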

From the above discussions, it should be clear that testing two `floats` for equality is often problematic. Instead of merely testing for equality, we can define a small epsilon value like `eps = 1e-14` and then write `(a - b).abs < eps`. This approach is frequently seen and often works, but not always, because a fixed `eps` is an absolute tolerance that ignores the magnitude of the operands. Imagine you write a program that processes chemical elements, and you work with atomic mass and radii, using the SI system with meter and kilogram as base units. Atomic masses are on the order of `1e-27` to `1e-25` kg, so the difference between any two of them is far below `eps`, and the above test would imply that all atoms in the periodic table have equal mass. With a somewhat larger `eps`, the same would happen for the atomic radii, which are on the order of `1e-10` m. So an equality test like

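let a = 0.1 + 0.2 # two example values that differ only by a rounding error
let b = 0.3
const eps = 1e-16 # an arbitrary relative precision
if (a == 0 and b == 0) or (a - b).abs / (a.abs + b.abs) < eps: # avoid div by zero
  echo "a and b are considered equal"

if (a - b).abs / (a.abs + b.abs + 1e-32) < eps: # a similar check, avoiding also a div by zero
  echo "a and b are considered equal"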

can be a better solution in the general case. Whenever you need to perform a general equality test, consider the problem carefully and conduct some tests. The code provided above is merely an untested possible example.
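To make the difference between an absolute and a relative tolerance more concrete, here is a minimal sketch. The procedure `nearlyEqual()`, its default tolerance of `1e-9`, and the approximate atomic masses are assumptions chosen for illustration only; a real program has to choose a tolerance that fits its data.

```nim
proc nearlyEqual(a, b: float; relEps = 1e-9): bool =
  ## Hypothetical helper: compares a and b with a tolerance that is
  ## scaled by the larger magnitude of the two operands.
  if a == b: # also covers the case that both values are zero
    return true
  abs(a - b) <= relEps * max(abs(a), abs(b))

let massH = 1.67e-27 # approximate mass of a hydrogen atom in kg
let massC = 1.99e-26 # approximate mass of a carbon atom in kg

echo abs(massH - massC) < 1e-14   # prints true, the absolute test is misleading
echo nearlyEqual(massH, massC)    # prints false
echo nearlyEqual(0.1 + 0.2, 0.3)  # prints true, only a rounding error
```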

The term machine epsilon is sometimes used in conjunction with floating-point numbers. This value is the difference between `1.0` and the next value representable by this data type, and is a measure for the floating-point precision of a computer system. Nim’s standard library provides a function, `almostEqual()`, that compares two `float` numbers based on this epsilon.
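A short sketch of how these two tools might be used: `almostEqual()` is provided by `std/math`, and the machine epsilon is available as `epsilon()` from `std/fenv`. The compared values are arbitrary examples.

```nim
import std/[math, fenv]

echo almostEqual(0.1 + 0.2, 0.3)     # prints true, the values differ only by a rounding error
echo almostEqual(1.0, 1.000001)      # prints false
echo epsilon(float64)                # the machine epsilon of the 64-bit float type
echo 1.0 + epsilon(float64) != 1.0   # prints true, the next representable value after 1.0
```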

At the end of this section, some remarks about the performance of `float` data types compared to plain `ints`: On modern hardware like the popular x86 systems, the performance of `floats` and `ints` is very similar for the basic operations; addition, subtraction, and even multiplication are typically done in only one or a few clock cycles, and division can be a bit slower. Even operations like `sqrt()`, which have been regarded as slow in the past, are now close to a plain addition on modern hardware. As the CPU does its `float` arithmetic internally with 64 or even with 80 bits, `float32` is generally not faster than `float64`, unless the operations are memory bound, that is, when large data sets are processed and smaller data types allow more of the data to fit into the cache. For tiny microcontrollers and embedded devices, things are very different, as these devices typically lack floating-point units, so the compiler has to emulate all the `float` arithmetic, maybe by the use of libraries. This is very slow and produces large executables.

So when writing software for modern desktop PCs, there is no reason to try to avoid `float` math when solving the problem with `float` is easier. When the data spans a wide range, for example, from nanometers to millions of kilometers, or when operations like square root or trigonometric functions are needed, there is typically no reason to avoid `float`. In cases where both `floats` and `ints` may work, it is generally a good strategy to initially try using `ints`. `Ints` may still provide better performance for SIMD, threading, and parallel processing, as `ints` may avoid the expensive saving of floating-point CPU registers. For restricted hardware, it would be better to try to avoid `float` math. However, this is a complex topic, and this advice only provides some basic recommendations, which might not apply in every specific case. So finally you have to decide for yourself, and as always it is a good idea to do some performance tests. In the Appendix of this book, you can find a small test for the performance of various `int` and `float` operations in section Performance of multiplication vs. division.

Distinct types

Before we continue with subrange types, we should introduce the `distinct` types. In the real world, there are many quantities for which the set of meaningful mathematical operations is restricted, and these should not be mixed with quantities of other types. For example, we may have physical quantities such as time and distance, measured in seconds and meters respectively, mapped to the `float` or `int` data type. While adding seconds and adding meters is a valid operation, adding seconds to meters makes no sense and would be a program bug if it should occur in the program code. However, dividing a distance by a time period, resulting in the average speed, would be a valid operation. Nim provides the `distinct` keyword, which allows the definition of new data types. These new types are based on existing ones but are not compatible with them or with other `distinct` types. The newly defined `distinct` types have no predefined operations; we have to define all desired operations ourselves.

type
  Time = distinct float # in seconds
  Distance = distinct float # in meters

# var t: Time = 0.2 # not allowed, a float literal is not converted to Time implicitly
var t: Time = Time(0.2) # an explicit conversion is required

To make a `distinct` type usable, we can convert its values to the base type and then use operations of the base type, or we can borrow operations from the base type by use of the `{.borrow.}` pragma. Using `distinct` types can be complicated when the new type should support many operations, but it can make our code safer. For some data types with a very limited set of operations, `distinct` types can be used easily. `Distinct` types are explained in detail in the Nim language manual; we might explain them in more detail in later sections. For now, it is enough that we know about their existence.
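As a first impression of how this could look in practice, the following sketch borrows the addition from the base type and defines the division of a distance by a time ourselves. The additional `Speed` type and the `$` procedure for printing are assumptions made for this illustration.

```nim
type
  Time = distinct float     # in seconds
  Distance = distinct float # in meters
  Speed = distinct float    # in meters per second, introduced for this sketch

# Borrow the addition of the base type for values of the same distinct type.
proc `+`(a, b: Time): Time {.borrow.}
proc `+`(a, b: Distance): Distance {.borrow.}

# A distance divided by a time gives a speed; we convert to the base type and back.
proc `/`(d: Distance; t: Time): Speed =
  Speed(float(d) / float(t))

# Distinct types have no predefined `$`, so we define one for printing.
proc `$`(s: Speed): string =
  $float(s) & " m/s"

let d = Distance(100.0) + Distance(50.0)
let t = Time(10.0) + Time(5.0)
echo d / t    # prints 10.0 m/s
# echo d + t  # would not compile: Distance and Time cannot be added
```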

Subrange types

Sometimes it makes sense to limit the range of numeric variables to only a sub-range. For this, Nim uses the `range` keyword with the following notation: `range[LowVal .. HighVal]`. Values of this type can never be smaller than `LowVal` or larger than `HighVal`. In Nim v2.0 we can also define range types by leaving out the `range[]`, that is, by using just two constants separated by `..`.

type
  Year = range[2020 .. 2023] # software update required at least for 2024!
  Month = range[1 .. 12]
  Day = 1 .. 31 # same as range[1 .. 31]

var a: int = 0
var d: Day = 1 # OK
d = 0 # compile-time error
d = a # run-time test and error
echo d

In the above example, the base type of the defined ranges is `int`. As a result, the ranges are compatible with the predefined `int` type, and we can assign values of `int` type to our `range` types, and vice versa. In our example, the size of the range types is the size of the `int` base type, but of course, we could use other base types, like `type Weekday = 1.int8 .. 7.int8`. If we try to assign to a range type a value that does not fall into the allowed range, then we get a compile-time or run-time range error. This can help us to prevent or to discover errors in our programs. Note that whenever we use range types, the compiler may have to add additional checks to ensure that variables are always restricted to the specified range. This check is active in debug mode and also when we compile with the `-d:release` option. It is only omitted when we compile with `-d:danger` or when we explicitly disable range checks. Therefore, using a large number of range types may increase code size and decrease performance. For the example above, the line with the assignment `d = a` generates a runtime check. An important and often used range type is the data type `Natural`, defined as `range[0 .. int.high]`. This type is compatible with the `int` type and does not wrap around as `uint` would. It is regularly used as the type for `proc` parameters when the arguments must be non-negative. In the procedure body, we sometimes copy arguments of `Natural` type to an ordinary integer. This way, we can ensure a non-negative start value and can avoid many range checks in the procedure body.
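The run-time check and the use of `Natural` as a parameter type can be sketched as follows. The example assumes the default compiler settings, with range checks enabled and catchable defects; the procedure `firstChars()` is a hypothetical helper written only for this illustration.

```nim
type Day = range[1 .. 31]

proc firstChars(s: string; n: Natural): string =
  ## Hypothetical helper: returns at most the first n characters of s.
  ## The Natural parameter guarantees that n is never negative.
  s[0 ..< min(n, s.len)]

var a = 40
var d: Day = 1
try:
  d = a # checked at run time, 40 is not a valid Day
except RangeDefect:
  echo "the value of a is outside of 1 .. 31"
echo d                              # prints 1, the failed assignment did not change d
echo firstChars("Nim language", 3)  # prints Nim
```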

We can also declare sub-range types with `float` base types like `type Probability = range[0.0 .. 1.0]`.

Note that we can still mix different sub-range types:

var d: Day = 13
var m: Month = 3
d = d + m

Such an operation is generally a bug. To prevent it, we can put the `distinct` keyword in front of our ranges. However, we would then have to define the allowed operations ourselves or borrow them from the base type.

Enumeration types

Enumeration types are shortened as `enum` in Nim. While `enums` in C are nothing more than integers with some special syntax for creation, Nim’s `enums` are more complex.

In Nim, `enums` can be used whenever some form of symbols are needed, such as the colors `red`, `yellow`, and `green` for a traffic light, or the directions `north`, `south`, `east`, and `west` for a map or a game.

Most of the time, we declare an `enum` type and the corresponding values by simply listing them like

type
  TrafficLight = enum
    red, yellow, green

We can then use variables of the type `TrafficLight` like

var tl: TrafficLight
tl = green
if tl == red:
  tl = ... # some other enum value

`Enums` support assignment, plain tests for (in)equality and for less or greater. Additionally, the functions `succ()` and `pred()` are defined for `enums` to get the successor or predecessor of an enum value, `ord()` or `int()` deliver the corresponding integer number, and the `$` operator can be used to get the name of an `enum` value. We can also iterate over `enums`, so we can print all the colors of our `TrafficLight` by

for el in TrafficLight:
  echo el.ord, ' ', $el
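The other operations mentioned above can be tried in the same way. This sketch repeats the `TrafficLight` declaration, so that it can be compiled on its own:

```nim
type
  TrafficLight = enum
    red, yellow, green

echo succ(red)        # prints yellow
echo pred(green)      # prints yellow
echo ord(green)       # prints 2
echo red < green      # prints true, enums are ordered by their numeric value
echo TrafficLight(1)  # prints yellow, conversion from an integer
echo low(TrafficLight), " ", high(TrafficLight) # prints red green
```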

Ordinary `enums` start at `0` and use continuous numbers for the internal numeric value, which allows `enums` to be used as `array` indices.[39]

type
  A = array[TrafficLight, string]

var a: A
a[red] = "Rot"
echo a[red]

However, we can also assign custom numbers like

type
  TrafficLight = enum
    red = -1, yellow = 3, green = 8

We should avoid doing this, as these 'enums with holes' generate some problems for the compiler and may later be deprecated. For example, `array` indexing or iterating is obviously not possible for `enums` with holes.

It is also possible to set the `string` that the stringify operator `$` returns, like in

type
  TrafficLight = enum
    red = "Stop"
    yellow = (2, "Caution")
    green = ("Go")

Here the assigned numerical values should be 0, 2, and 3. Currently, the enum’s numerical values must always be specified in ascending order.
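A short check of these values could look as follows. The declaration is repeated to keep the sketch self-contained, with the parentheses around `"Go"` left out:

```nim
type
  TrafficLight = enum
    red = "Stop"
    yellow = (2, "Caution")
    green = "Go"

echo ord(red), " ", ord(yellow), " ", ord(green) # prints 0 2 3
echo red, " ", yellow, " ", green                # prints Stop Caution Go
```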

When there are many `enums` in a program, name conflicts may occur. For example, we may have an additional `enum` type named `BaseColor`, which also has `red` and `green` members. For such cases, the {.pure.} pragma exists:

type
  BaseColor {.pure.} = enum
    red, green, blue

With the pure pragma applied, we can use the fully qualified enum name when necessary, like `BaseColor.red`. But we can still use unqualified names like `blue` when there is no name conflict.
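Both notations can be seen in the following minimal sketch, which contains only the `BaseColor` type, so that no name conflict exists:

```nim
type
  BaseColor {.pure.} = enum
    red, green, blue

let c1 = BaseColor.red # fully qualified name
let c2 = blue          # unqualified name, possible as there is no name conflict
echo c1, " ", c2             # prints red blue
echo c2 == BaseColor.blue    # prints true
```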

With the upcoming Nim 2.0, the compiler will have improved handling of enums: The pure pragma is not needed anymore, and for set expressions like {BaseColor.red, green} the compiler knows that the second set member is a BaseColor as well, so we do not need the prefix anymore. For details, see the Appendix.

Boolean types

Boolean types are used to store the result of logical operations. The type is called `bool` in Nim and can store only two values, `false` and `true`. Although a boolean variable has only two distinct states, so that a single bit would suffice to store a `bool`, a whole byte (8 bits) is generally used for storing a boolean variable. Most other programming languages, including C, do the same. The reason is that most CPUs cannot access single bits in the RAM; the smallest entity that can be directly accessed in RAM is a byte. The default initial state of a boolean variable is `false`, corresponding to a byte with all bits cleared.

var
  age = 17
  adult: bool = age > 17
  iLikeNim = true
  iLikeOtherLanguageBetter = false

In the third line, we assign the result of a logical comparison to the variable `adult`. The next two lines assign the boolean constants `true` and `false` to the variables, with their type `bool` inferred.

Variables of type `bool` support the operators `not`, `and`, `or` and `xor`. `Not` inverts the logical value, `a and b` is only `true` when both values are `true`, and `false` otherwise. And `a or b` is `true` when at least one of the values is `true`, and only `false` when both values are `false`. `Xor` is not used that often. It is called exclusive or; `a xor b` is `false` when both values have the same logical state, i.e., when both are `true`, or both are `false`. When the values are not the same, then the result of the `xor` operator is `true`. The `xor` operator makes more sense for bit operations, which we will learn later — for the boolean type, `a xor b` is identical to `a != b`.
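To see all four operators at once, we can print a small truth table. This is just a sketch; it uses a `for` loop over an `array` literal, which is explained later in the book.

```nim
for a in [false, true]:
  for b in [false, true]:
    # for bool values, xor gives the same result as !=
    doAssert (a xor b) == (a != b)
    echo a, " ", b, " | not a: ", (not a), ", and: ", (a and b),
      ", or: ", (a or b), ", xor: ", (a xor b)
```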

When using conditional execution, some people like to write expressions like `if myBoolExp == false:`, which is identical to `if not myBoolExp:`. While this may be permissible, avoid writing `if myBoolExp == true:` as it is redundant.

Sometimes it is useful to know that `false` is mapped to the `int` value `0`, and `true` to the `int` value `1`. That is similar to the C language, but traditional C has no real boolean type. Instead, the numerical value `0` is interpreted as `false` in conditional expressions, and all non-zero values are interpreted as `true`.

var a: int = 0
var cond: bool
if cond:
  a = 7

a = 7 * cond.int

The effect of the last line is identical to the `if` statement above. In very, very rare cases, working with the actual `int` value of boolean variables may make sense, but generally, we should avoid that. Later in the book, there is a section about branchless code where we will present a procedure that actually may get faster by using such a trick.
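The equivalence of both variants can be checked with two tiny procedures. The names `viaIf` and `viaInt` are hypothetical and only serve this sketch.

```nim
proc viaIf(cond: bool): int =
  ## Returns 7 when cond is true; otherwise result keeps its default value 0.
  if cond:
    result = 7

proc viaInt(cond: bool): int =
  ## Same value, computed from the integer value of cond without an if.
  7 * cond.int

for cond in [false, true]:
  doAssert viaIf(cond) == viaInt(cond)

doAssert int(false) == 0 and int(true) == 1
```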

Characters

The data type for single characters in Nim is called `char`. A variable of this type has 8 bits and is used to store individual characters. Indeed, it stores 8-bit integers which are mapped to characters. The mapping is described by the ASCII table. For example, the integer value `65` in decimal is mapped to the character `A`. When we use single character literals, we have to enclose the letter in single quotes. As only 8 bits are used to store characters, we only have 256 different values, including upper and lower case letters, punctuation characters, and some characters with a special meaning like a newline character to move the cursor in the terminal to the next line, or a backspace character to move the cursor one position backward. In practice, single characters aren’t used frequently. This is because they are typically grouped into sequences known as `strings` to construct text.

The original ASCII table contains only the characters with numbers 0 up to 127. Here is an overview generated with the small program listed in the Appendix:

Visible ASCII Characters

      +0   +1   +2   +3   +4   +5   +6   +7   +8   +9  +10  +11  +12  +13  +14  +15
  0
 16
 32        !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
 48   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
 64   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
 80   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
 96   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
112   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~

The position of a character in the table is calculated by summing the number on the left with the one on top. For instance, character `A` is at position `64+1=65`. This is the value that the Nim standard functions `ord('A')` or `int('A')` would return. The characters with a decimal value less than 32 cannot be printed and are called control characters, like linefeed, carriage return, backspace, audible beep, and such. Character 127 is also not printable and is called DEL. An important property of this table is the fact that decimal digits and upper- and lower-case letters form contiguous blocks. So to test, for example, if a character is an uppercase letter, we can use this simple condition: `c >= 'A' and c <= 'Z'`.
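Because letters and digits form contiguous blocks, simple arithmetic on the ordinal values is enough for some common tasks. The following sketch uses hypothetical helper names (`isUpperLetter`, `digitValue`, `toLowerLetter`); Nim's `strutils` module offers ready-made functions for the same purpose.

```nim
proc isUpperLetter(c: char): bool =
  ## True for 'A' .. 'Z', using the contiguous block in the ASCII table.
  c >= 'A' and c <= 'Z'

proc digitValue(c: char): int =
  ## Numeric value of a digit character; assumes c is in '0' .. '9'.
  ord(c) - ord('0')

proc toLowerLetter(c: char): char =
  ## Lower case letters are located 32 positions after the upper case ones.
  if isUpperLetter(c): chr(ord(c) + 32) else: c

doAssert ord('A') == 64 + 1
doAssert isUpperLetter('Q') and not isUpperLetter('q')
doAssert digitValue('7') == 7
doAssert toLowerLetter('N') == 'n'
```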

Characters with `ord() > 127` are so-called umlauts, exotic characters of other languages, and some special characters. However, these characters can look different on different computers, as their appearance depends on the active code page, which maps positions to the actual character, and there are multiple code pages. When we need more than the plain ASCII characters, then we use `strings` in Nim, which can display many more glyphs by using UTF-8 encoding.

The control characters with a decimal value less than 32 cannot be typed on the keyboard directly, and for some characters with a decimal value greater than 126, it can be difficult to enter them on some keyboards. For these characters, as well as for all other characters, escape sequences can be used. Escape sequences start with the backslash character, and the following characters are interpreted in a special way: The backslash can be followed by a numeric value in decimal or hexadecimal encoding, or by a letter with a special meaning. We mentioned already that the character 'A' is mapped to the decimal value 65, which is its position in the ASCII table. So instead of 'A', we could use the escape sequence '\65' for this character. Or, as decimal 65 is 41 in hexadecimal notation `(4 * 16^1 + 1 * 16^0)`, we can use '\x41', where the x indicates that the following digits are hexadecimal. Since remembering the numeric value of frequently used control characters can be challenging, an alternative notation with a letter following the backslash can be used. For the important newline character, we can use the decimal numeric value '\10', the hexadecimal value '\xA', or the symbolic form '\n'. Here, the letter n stands for newline.

We can consider the backslash character, which initiates escape sequences, as a unique cautionary symbol for the compiler, indicating that the subsequent characters must be interpreted in a special way.

It is important that you understand that all these escape sequences are only a way to help the programmer to enter these invisible control characters — the compiler replaces the control sequences immediately with the correct 8-bit value from the ASCII table, so in the final compiled executable '\65' or '\n' are both only a plain 8-bit integer value:

var a, b: char
a = 'A'
b = '\65'
echo a, ord(a), b, ord(b) # if you don't know the output, read again this section and run this code.

The following table lists a few important control characters:

Decimal  Hexadecimal  Symbolic  Meaning
10       xA           \n, \l    newline or linefeed: move the cursor one position down
12       xC           \f        formfeed
9        x9           \t        tabulator
11       xB           \v        vertical tabulator
92       x5C          \\        backslash
39       x27          \'        single-quote, apostrophe
7        x7           \a        alert, audible beep
8        x8           \b        backspace
27       x1B          \e        Escape, [ESC]
13       xD           \r, \c    return or carriage return: move the cursor to the beginning of the line

The hexadecimal numbers after the `\x` character can be in upper or lower case and can have one or two hexadecimal digits. For symbolic control characters like '\a' for alert, the upper case variant '\A' seems to be identical currently. Entering a single quote as ''' does give an error message, so you have to escape it as '\''. Unfortunately, by supporting this form of escaping it becomes impossible to enter a backslash character directly, so we have to escape the backslash character as '\\' to print a single backslash.
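The entries of the table can be verified with a few assertions. In this sketch we always write two hexadecimal digits after `\x`.

```nim
# Decimal, hexadecimal, and symbolic forms denote the same character.
doAssert '\65' == 'A' and '\x41' == 'A'
doAssert '\n' == '\10' and '\n' == '\x0A' and '\n' == '\l'
doAssert '\r' == '\13' and '\r' == '\c'

# Ordinal values of some symbolic escape sequences.
doAssert ord('\t') == 9
doAssert ord('\v') == 11
doAssert ord('\f') == 12
doAssert ord('\a') == 7
doAssert ord('\b') == 8
doAssert ord('\e') == 27
doAssert ord('\\') == 92    # the escaped backslash
doAssert ord('\'') == 39    # the escaped single quote
```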

For Nim, the most important control character is '\n', which is used to start the output in a terminal window at the beginning of a new line. But '\n' is generally not used as a single character but embedded in `strings`, that is, sequences of characters. We will learn more about `strings` soon. Note that the `echo()` function inserts a newline character automatically after each printed line, but the `write()` function does not:

echo 'N', 'i', 'm'
stdout.write 'N', 'i', 'm', '\n'

It might be slightly confusing that while we use the backslash character as an escape symbol, the table above includes an entry '\e', also referred to as [ESC]. This '\e' control character with decimal value `27` is entirely unrelated to the backslash character that we use to type in control characters. [ESC] is a different special character that starts control sequences. It was used in the past to send special commands to printers or modems, and it can be used to control font style or colors in terminal windows.

Nim’s control characters should, with few exceptions, be identical to the control characters of the C language, so you may also consult C literature for more details.

Ordinal types

In Nim, integers, enumerations, characters, and boolean types are ordinal types. Ordinal types are countable and ordered, and for each of these types, a lowest and largest member exists. Ordinal types support `succ()` and `pred()`, which return the next larger or next smaller value, and `inc()` and `dec()`, which change a variable to its successor or predecessor in place. These operations can produce overflow- or underflow-like errors if applied to the largest or smallest value. The function `ord()` can be used on ordinal types to get the corresponding integer value. Note that unsigned integers are currently not called ordinal types in Nim and that these unsigned types wrap around, instead of generating overflow and underflow errors.
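The properties of ordinal types can be explored with `low()` and `high()`, which return the smallest and largest member of a type. The following lines are a minimal sketch; the `TrafficLight` type is repeated from the enum section, and the variables `i` and `u` are only for illustration.

```nim
type
  TrafficLight = enum
    red, yellow, green

# Each ordinal type has a lowest and a largest member.
doAssert low(TrafficLight) == red and high(TrafficLight) == green
doAssert low(char) == '\0' and high(char) == '\255'
doAssert low(bool) == false and high(bool) == true

# Successor, predecessor, and the corresponding integer value.
doAssert succ('a') == 'b' and pred(green) == yellow
doAssert ord('a') == 97 and ord(true) == 1

var i = 5
inc(i)        # i is now 6
dec(i, 3)     # i is now 3
doAssert i == 3

var u: uint8 = 255
u = u + 1     # unsigned types wrap around instead of raising an error
doAssert u == 0
```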

Sets

In mathematics, sets are considered an unordered collection where we can test membership (x is included in mySet) and perform operations like building the union of multiple sets. In Nim, we can have sets of all the ordinal types and the unsigned integer types, but due to memory restrictions, integer types larger than two bytes cannot be used as set base types. All elements in a set must have the same base type. A set can be empty, or it can contain one or multiple elements. A specific element can either be contained in a given set or not, but it can never be contained multiple times. A very basic set operation is to test if an element is or is not contained in a set. Sets are unordered data types; that is, sets containing the same elements are always equal, regardless of the sequence in which we added the elements. Important set operations are building the union and the intersection of two sets with the same base type: The union of `set` `a` and `set` `b` is a `set` that contains all the elements that are contained in `set` `a` or in `set` `b` (or in both). The intersection of `set` `a` and `set` `b` is a `set` that contains only elements that are contained in `set` `a` and in `set` `b`.

The mathematical concept of `sets` maps well to words and bits of computers, as most CPUs have instructions to set and clear single bits and to test if a bit is set or unset. CPUs can also execute `and`, `or`, and `xor` operations on whole words: `and` corresponds to the intersection, `or` to the union, and `xor` to the symmetric difference of mathematical sets.

Nim supports sets with base type `bool`, `enum`, `char`, `int8`, `uint8`, `int16`, and `uint16`. Note that we need a bit in the computer memory for each member of the base type. The types `char`, `int8`, and `uint8` are 8-bit types and can have `2^8 = 256` distinct values, thus requiring 256 bits in the computer memory to represent such a set. That would be 32 bytes or four 64-bit words. To represent a set of the base type `uint16` or `int16`, we already need `2^16` bits, that is, `2^13` bytes or `2^10` words on a 64-bit CPU. So it becomes clear that supporting base types with more than 16 bits would not make much sense.

While testing whether an element is included in a `set` with the `in` or `notin` operators is always a fast operation, other operations, like building the intersection or union, and `set` comparison operations, may not be as fast with the `int16` or `uint16` base types, as these operations involve processing the whole `set` — that is, `2^10` words on a 64-bit CPU.
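The memory requirements calculated above can be compared with what the compiler reports for `sizeof()`. This sketch assumes the usual C backend, where, as far as we know, large sets are stored as plain byte arrays and tiny sets fit into a single byte.

```nim
# 256 possible members need 256 bits, that is 32 bytes.
doAssert sizeof(set[char]) == 32
doAssert sizeof(set[uint8]) == 32

# 2^16 possible members need 2^16 bits, that is 2^13 = 8192 bytes.
doAssert sizeof(set[int16]) == 8192

# A set with only two possible members fits into one byte.
doAssert sizeof(set[bool]) == 1

echo "set[uint16] needs ", sizeof(set[uint16]), " bytes"
```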

We will start our explanations with `sets` that have a character base type, as these `sets` are both easy to understand and very useful. Let us assume that we have a variable `x` of character type, and we want to test if that variable is alphanumeric, that is if it is a lower or upper case letter or a digit. A traditional test would be `(x >= 'a' and x <= 'z') or (x >= 'A' and x <= 'Z') or (x >= '0' and x <= '9')`. For this test, we use the fact that letters and digits build continuous blocks in the ASCII table. Using Nim’s set notation, we can write that in a simpler form:

const
  AlphaNum: set[char] = {'a' .. 'z', 'A' .. 'Z', '0' .. '9'}

var x: char = 's'
echo x in AlphaNum

Here, we have defined a constant of `set[char]` type that contains lower and upper case letters and decimal digits. We used the range notation to save a lot of typing ({'a', 'b', 'c', …​}). It works only in this case, as we know that all the lowercase letters, uppercase letters, and decimal digits form an uninterrupted range in the ASCII table.

With that definition, we can use a simple test with the `in` keyword. This test is equivalent to the procedure call, `AlphaNum.contains(x)`. Moreover, this `set` membership test should be faster than the test using `<=` and `or`, as mentioned above.

Some older languages, like C, do not have a dedicated `set` data type. However, since `sets` are so useful and efficient, C emulates these operations using bit-wise `and` and `or` operations in conjunction with bit shifts.

Two important operations for sets are building the union and the intersection:

const
  AlphaNum: set[char] = {'a' .. 'z', 'A' .. 'Z', '0' .. '9'}
  MathOp = {'+', '-', '*', '/'} # set[char]

  ANMO = AlphaNum + MathOp # union
  Empty = AlphaNum * MathOp # intersection

The constant `ANMO` now contains all the characters from `AlphaNum` and `MathOp` - that is, letters, digits, and math operators. The constant `Empty` is assigned all the characters that are concurrently contained in `set` `AlphaNum` and in `set` `MathOp`. However, as there isn’t a single common character, the `set` `Empty` is indeed empty. It’s not easy to remember the two operators, `+` and `*`, for union and intersection. For the intersection operator `*` it may help when we imagine the set members as bits, and we assume that we multiply the bits of both operands bitwise, that is we multiply the set or unset bits at corresponding positions each. The resulting bit pattern would have set bits only in positions where both arguments have set bits.
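The bit analogy can be made concrete with a small sketch that performs the same operations once with `sets` and once with plain bit patterns. Here, the set member `n` corresponds to bit number `n`; the names `a`, `b`, `bitsA`, and `bitsB` are only chosen for illustration.

```nim
let
  a = {2'u8, 3'u8}       # members 2 and 3
  b = {1'u8, 3'u8}       # members 1 and 3
  bitsA = 0b1100'u8      # bits 2 and 3 set
  bitsB = 0b1010'u8      # bits 1 and 3 set

# Intersection: only positions that are set in both operands survive.
doAssert a * b == {3'u8}
doAssert (bitsA and bitsB) == 0b1000'u8

# Union: positions that are set in at least one operand.
doAssert a + b == {1'u8, 2'u8, 3'u8}
doAssert (bitsA or bitsB) == 0b1110'u8
```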

We can use the functions `incl()` and `excl()` to add or remove single set members:

var s: set[char]
s = {} # empty set
s = {'a' .. 'd', '_'}
s.excl('d')
s.incl('?')

The result is a `set` containing the letters `a`, `b`, `c` and the characters `_` and `?`. Note that calling `incl()` doesn’t affect the `set` when the value is already included, and similarly, calling `excl()` has no effect when the value isn’t present in the `set`.

Another operation is the difference of two `sets`: `a - b` is a `set` that contains only the elements of `a` that are not contained in `b`. In Nim, there is currently no operator for the complement or the symmetric difference of `sets` available. We can produce a `set` complement by using a fully filled `set` and then removing the elements of which we want the complement. For a character `set`, this would look like `{'\0'..'\255'} - s`, where `s` is the `set` to be complemented. And the symmetric difference of `set` `a` and `set` `b` can be generated by the operation `(a+b) - (a*b)` or by `(a-b) + (b-a)`.

As the `not` operator binds more tightly than the `in` operator, we have to use brackets for the inverted membership test, like `not(x in a)`, or we can use the `notin` operator and write `x notin a`. We can test for equality of `sets` `a` and `b` like `a == b` and for subset relation `a < b` or `a <= b`. `a <= b` indicates that `b` contains all members of `a` or more, and `a < b` indicates that `b` contains all members of `a` plus at least one more element.

Finally, we can use the function `card()` to get the cardinality of a `set` variable, that is the number of contained members.
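The set operations of the last paragraphs are summarized in the following sketch. The two example sets `a` and `b` are arbitrary choices.

```nim
let
  a = {'a' .. 'd'}
  b = {'c' .. 'f'}

# Difference and the two formulas for the symmetric difference.
doAssert a - b == {'a', 'b'}
doAssert (a + b) - (a * b) == {'a', 'b', 'e', 'f'}
doAssert (a - b) + (b - a) == {'a', 'b', 'e', 'f'}

# Complement: a fully filled set minus the members of a.
let notA = {'\0' .. '\255'} - a
doAssert ('a' notin notA) and ('z' in notA)

# Subset relations and cardinality.
doAssert({'a', 'b'} < a)          # proper subset
doAssert(a <= a and not (a < a))  # a set is a subset of itself, but not a proper one
doAssert card(a) == 4
doAssert card(notA) == 256 - 4
```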

It is also worth mentioning that we can have character `sets` that are restricted to a range of characters:

type
  CharRange = set['a' .. 'f']

# var y: CharRange = {'x'} #invalid

var y: CharRange = {'b', 'd'}
echo 'c' in y

In the code above, the compiler detects the first assignment to the variable `y` as invalid.

`Sets` of numbers work in principle in the same way as `sets` of characters. A key detail to note is that in Nim, integer numbers are generally 4 or 8 bytes large, but `sets` can only contain numbers with 1- or 2-byte size. Therefore, we have to specify the type of `set` members explicitly:

type
  ChessPos = set[0'i8 .. 63'i8]

var baseLine: ChessPos = {0.int8 .. 7.int8}
# var baseLine: ChessPos = {0 .. 7} # this also works
var p: int8
echo p in baseLine

In the code above, we defined a `set` type that can contain `int8` numbers in the range `0` to `63`.

We can also use another notation for numeric `sets` when we define an explicit `range` type like in

type
  ChessSquare = range[0 .. 63]
  ChessSquares = set[ChessSquare]

const baseLine = {0.ChessSquare .. 7.ChessSquare}
# or
const baseLineExplicit: ChessSquares = {0.ChessSquare .. 7.ChessSquare}
assert baseLine == baseLineExplicit

An important detail to note is that Nim’s `sets` support negative numbers:

type
  XPos = set[-3'i8 .. +2'i8]

var xp: XPos = {-3.int8 .. 1.int8}
var pp: int8 = -1
echo pp in xp

`Enum` `sets` are also very useful and can be used to represent multiple boolean properties in a single `set` variable instead of using multiple boolean variables for this purpose:

type
  CompLangFlags = enum
    compiled, interpreted, hasGC, isOpenSource, isSelfHosted
  CompLangProp = set[CompLangFlags]

const NimProp: CompLangProp = {compiled, hasGC, isOpenSource, isSelfHosted}
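Working with such a flag `set` could look like the following sketch. The definitions are repeated so that the code compiles on its own, and the variable `other` with its content is purely hypothetical.

```nim
type
  CompLangFlags = enum
    compiled, interpreted, hasGC, isOpenSource, isSelfHosted
  CompLangProp = set[CompLangFlags]

const NimProp: CompLangProp = {compiled, hasGC, isOpenSource, isSelfHosted}

# Each membership test replaces a separate boolean variable.
doAssert compiled in NimProp
doAssert interpreted notin NimProp
doAssert card(NimProp) == 4

var other: CompLangProp = {interpreted}   # some hypothetical other language
other.incl(isOpenSource)

# Properties that both languages have in common.
doAssert other * NimProp == {isOpenSource}
```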

`Enum` `sets` can be used to interact with functions of C libraries, where for flag variables often or’ed `ints` are used. For example, for the Gintro C bindings, there is this definition:

type
  DialogFlag* {.size: sizeof(cint), pure.} = enum
    modal = 0
    destroyWithParent = 1
    useHeaderBar = 2

  DialogFlags* {.size: sizeof(cint).} = set[DialogFlag]

Here, the {.size.} pragma is used to ensure that the byte size of that `set` type matches the size of integers in the C language.
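How such a `set` corresponds to the or'ed integer flags of C can be seen when we reinterpret its bits with `cast`. This is only a sketch under the assumption that bit `n` of the integer represents the `enum` with ordinal value `n`; the helper `toCInt` is a hypothetical name, and the export markers of the original definition are omitted.

```nim
type
  DialogFlag {.size: sizeof(cint), pure.} = enum
    modal = 0
    destroyWithParent = 1
    useHeaderBar = 2

  DialogFlags {.size: sizeof(cint).} = set[DialogFlag]

proc toCInt(flags: DialogFlags): cint =
  ## Reinterprets the bits of the set as a C integer.
  cast[cint](flags)

doAssert toCInt({}) == 0
doAssert toCInt({DialogFlag.modal}) == 1              # bit 0
doAssert toCInt({DialogFlag.destroyWithParent}) == 2  # bit 1
doAssert toCInt({DialogFlag.modal, DialogFlag.useHeaderBar}) == 5  # bits 0 and 2
```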

When we define a `set` of `enums` in this way to generate bindings to C libraries, then we have to ensure that the `enum` values start with zero, otherwise, Nim’s definition will not match with the C definition. For example, in the gdk.nim module we have

type
  AxisFlag* {.size: sizeof(cint), pure.} = enum
    ignoreThisDummyValue = 0
    x = 1
    y = 2
    pressure = 3
    xtilt = 4
    ytilt = 5
    wheel = 6
    distance = 7
    rotation = 8
    slider = 9

  AxisFlags* {.size: sizeof(cint).} = set[AxisFlag]

The first `enum` with ordinal value zero was automatically added by the bindings generator script to ensure type matching. Nim’s developers sometimes recommend using plain (`distinct`) integer constants for C `enums`. That may appear easier, but integer constants provide no namespaces, so names may be `aFlagWheel` instead of `AxisFlag.wheel` or plain `wheel` when there is no name conflict for pure `enums`. And with integer constants, we have to combine flags by an `or` operation like `(aFlagWheel or aFlagSlider)` instead of using the clean `{AxisFlag.wheel, slider}` syntax.

Can we print `sets` easily? As `sets` are an unordered type, it is not fully trivial, but we can iterate over the full base type and check if the element is contained in our `set` like

var s: set[char] = {'d' .. 'f', '!'}

for c in 0.char .. 255.char:
  if c in s:
    stdout.write(c, ' ')
echo ' '

The output is

! d e f

We will learn how the for loop works soon. Note that the sequence in which the `set` members are printed is determined by our query loop, not by the `set` content itself, as `sets` are unordered types.
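A shorter variant may also work: as far as we know, Nim's `system` module provides an `items` iterator and a `$` operator for `sets`, so we can loop over the members directly or pass the whole `set` to `echo()`. The following sketch relies on that assumption; the variable `collected` is only for illustration.

```nim
let s: set[char] = {'d' .. 'f', '!'}

var collected = ""
for c in s:            # visits the members in ascending ordinal order
  collected.add(c)

doAssert collected == "!def"
echo s                 # echo applies the $ operator to the set
```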

At the end of this section, we should mention that Nim’s standard library also has a module called `setutils` that provides a few useful functions and a `template`: The function `[]=` allows us to write `s[x] = false` or `s[x] = true` to exclude value `x` from `set` `s` or to include it, instead of using the `excl` or `incl` notation. The functions `fullSet()` and `complement()` make it easy to get a set that includes all possible members and to complement ("invert") a set. Finally, the `template` `toSet()` can be used to convert other data types to corresponding sets.
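A brief sketch of these helpers, assuming a Nim version that ships the `std/setutils` module (1.6 or later, as far as we know):

```nim
import std/setutils

var s = "hello".toSet          # a set[char] built from the characters of a string
doAssert s == {'e', 'h', 'l', 'o'}

s['x'] = true                  # same effect as s.incl('x')
s['h'] = false                 # same effect as s.excl('h')
doAssert s == {'e', 'l', 'o', 'x'}

doAssert fullSet(bool) == {false, true}
doAssert complement({false}) == {true}
```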

Strings

The `string` data type is a sequence of characters. It is used whenever textual input or output operations are performed. Usually, it is a sequence of ASCII characters, but characters in the `string` can also be interpreted as UTF-8 Unicode characters, which allows the display of a vast range of symbols as long as the necessary fonts are installed on your computer and you can input them. Note that Unicode characters may not always be accessible via a simple keystroke. For now, we will only use ASCII characters, as they are simpler and work everywhere. `String` literals must be enclosed in double quotation marks. Nim’s `string` type is similar to the Nim `seq` data type: both are homogeneous variable-size containers. This means that a `string`, like a `seq`, expands automatically when you append or insert characters or other `strings`. Nim’s `seq` data type is discussed later in the book in some detail. Don’t confuse short `strings` consisting of only one character with single characters: A `string` is a non-trivial entity with an internal state like a data buffer (the characters it actually contains), length, and storage capacity, while a variable of the `char` type is nothing more than a single `byte` interpreted in a specific way. Therefore, a `string` like "x" is fundamentally different from 'x'.

var
  str: string = "Hello"
  name: string
echo "Please tell me your name"
name = readLine(stdin)
add(str, ' ')
echo str, name

In the above example code, we declare a `string` variable called `str` and assign it the initial literal value "Hello". We use the `echo()` `proc` to ask the user for their name and use the `readLine()` procedure to read the user input from the terminal. To demonstrate how characters can be added to an existing `string` variable, we call the `add()` procedure to append a space character to our `str` variable and finally call the `echo()` procedure to print the hello message and the name to the screen. Note that the `echo()` `proc` automatically terminates each output operation with a jump to the next line. If you desire an output operation without a new line, you can use the similar `write()` procedure. But `write()` needs an additional first parameter, for which we use the special variable `stdout` when we want to write to the terminal window.

So we could substitute the last two lines of the above code by

write(stdout, str)
write(stdout, ' ')
echo name

The Nim standard library provides a lot of functions for creating and modifying `strings`. Most of these functions are collected in the `system` and in the `strutils` module. The most important procedures for `strings` are `len()` and `high()`. The `len()` function returns the length of a `string`, namely, the number of ASCII characters or bytes that the `string` currently contains. The empty `string` "" has length zero. Note that the plain `len()` function returns the number of 8-bit characters, not the number of Unicode glyphs, when the `string` should be interpreted as Unicode text. To determine the number of glyphs of Unicode `strings`, you should use the `unicode` module. The `high()` function is very similar to the `len()` function; it returns the index of the last character in the `string`. For each `string` `s`, `high(s) == len(s) - 1`; hence, `high("")` is `-1`. Remember that Nim supports the method call syntax, so we can also write `s.len` instead of `len(s)`.
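The difference between the number of bytes and the number of Unicode characters becomes visible in a short sketch. It uses `runeLen()` from the `std/unicode` module, and the chess symbol is entered with a "\u" escape sequence, which is explained later in this section.

```nim
import std/unicode

let s = "Nim"
doAssert s.len == 3 and s.high == 2
doAssert "".len == 0 and high("") == -1

let king = "\u265A"           # one Unicode character, the black chess king
doAssert king.len == 3        # stored as three bytes in UTF-8
doAssert king.runeLen == 1    # but only one Unicode code point
```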

The most important operators for `strings` are the subscript operator `[]` which allows access to individual characters of `strings`, and the `..` slice operator, which allows access to sub-strings. The first character in a `string` always has the index zero. For concatenation of `string` literals or `string` variables, Nim uses the `&` operator.

var s = "We hate " & "Nim?"
s[3 .. 6] = "like"
s[s.high] = '!'

In the example above, we define the `string` variable `s` by the use of two literal `strings` to show the use of the concatenation operator. In line two we use the slice operator to replace the sub-string "hate", that is, the characters with index position 3 up to 6, by the `string` literal "like". In this case, the replacement has exactly as many characters as the text to replace, but that is not necessary: We can replace sub-strings with longer or shorter `strings`, which includes the empty `string` "" to delete a text area. In the last line of the above example, we use the subscript operator `[]` to replace the single character '?' at the end of our `string` with an exclamation mark. For subscript and slice operators, Nim also supports a special notation that indicates indexing from the end of the `string`. Python and Ruby use negative integers for this purpose, whereas Nim uses the `^` character. So `[^1]` is the last character, `[^2]` the one before the last. So we could have written `s[^1] = '!'` for the last line of our code fragment above. The reason Nim does not use negative integers for this purpose is that Nim `arrays` don’t have to start at index zero; they can start with an arbitrary index, including negative indices. Therefore, for negative indices, it may not always be clear whether a regular index or a position from the end of the `string` is intended. The term `s[^x]` is equivalent to `s[s.len - x]`. We will learn some more details about the slice operator in a later section when we have introduced `arrays` and sequences.
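The following sketch extends the example with a longer replacement, a deletion, and indexing from the end. The replacement text "really love" is an arbitrary choice.

```nim
var s = "We hate " & "Nim?"
s[3 .. 6] = "like"
s[^1] = '!'                    # same as s[s.high] = '!'
doAssert s == "We like Nim!"

doAssert s[0 .. 1] == "We"     # slice from the start
doAssert s[^4 .. ^2] == "Nim"  # slice with positions counted from the end

s[3 .. 6] = "really love"      # the replacement may be longer than the slice
doAssert s == "We really love Nim!"

s[2 .. 13] = ""                # the empty string deletes the slice
doAssert s == "We Nim!"
```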

Another important operator for `strings` is the "toString" or stringify operator `$`. It can be applied to variables of nearly all data types and returns their `string` representation, which can then be printed. Some procedures like `echo()` apply this operator to their arguments automatically. When we define our own data types, it can make some sense to define the `$` for them, in case we need a textual representation of our data, perhaps only for debugging purposes. Note that directly applying the `$` operator on a `string` has no effect and is ignored, as the result would not change.
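A few applications of the `$` operator are sketched below, including a user-defined one. The `Point` type is a hypothetical example; `object` types are introduced later in the book.

```nim
# $ for some built-in types
doAssert $42 == "42"
doAssert $true == "true"
doAssert $'x' == "x"
doAssert $"abc" == "abc"      # applying $ to a string changes nothing

type
  Point = object              # hypothetical type for this sketch
    x, y: int

proc `$`(p: Point): string =
  ## Our own textual representation for Point values.
  "(" & $p.x & ", " & $p.y & ")"

let p = Point(x: 1, y: 2)
doAssert $p == "(1, 2)"
echo p                        # echo applies our $ automatically
```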

`Strings` can contain all characters of the `char` data type, including the control characters. The newline character '\n', which is used at the end, and sometimes as well in the middle, of `strings` to start a new line, is the most essential control character for `strings`. For `strings`, Nim also supports the virtual character "\p" to encode an OS-dependent line break. When compiled for Windows, "\p" is automatically converted to "\r\n", and to a plain '\n' on Linux. Note that "\p" can be used in `strings`, but not as a single character, as it is two bytes on Windows. "\p" is only needed to support very old Windows versions or potentially another exotic operating system, as modern Windows recognizes plain '\n' well.

Since `strings` support UTF-8 Unicode, they can use an escape sequence starting with "\u" to insert Unicode code points. The "\u" is followed by exactly 4 hexadecimal digits or by an arbitrary number of hex digits enclosed in curly braces {}.

Because `string` literals are enclosed in quotation marks, it follows that `strings` cannot directly contain this character. We have to escape it as in "\"Hello\", she said".
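The escape sequences for `strings` can be checked with a few assertions. The code points in this sketch (U+00E9, the letter e with an acute accent, and U+1F600, an emoji) are arbitrary examples.

```nim
let quote = "\"Hello\", she said"
doAssert quote.len == 17          # each escaped quotation mark is one character
doAssert quote[0] == '"'

# \u with exactly four hex digits; the result is stored as UTF-8 bytes.
doAssert "\u00E9" == "\xC3\xA9"
doAssert "\u00E9".len == 2

# Code points with more than four hex digits need the curly braces.
doAssert "\u{1F600}".len == 4

echo quote
```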

It may be worth mentioning that Nim `strings` use copy semantics for assignment. Since we have not yet introduced references or `pointers`, you should expect copy semantics. `Strings` behave just like all the other simple data types we have used before, such as integers, floating-point numbers, enums, and characters:

var
  s1: string
  s2: string
s1 = "Nim"
s2 = s1
s1.add(" is easy!")
echo s1 & "\n" & s2

The output is

Nim is easy!
Nim

The assignment `s2 = s1` creates a copy of `s1`, so the subsequent `add()` operation modifies only `s1`, not `s2`. This might not be surprising to you, but other programming languages may behave differently. For example, the assignment might not copy the textual content but only create a reference to the first `string`, so that modifying one of them also affects the other. We will delve deeper into the concept of references when we introduce the `object` data type.

Entering Unicode characters

UTF-8 is a variable-width character encoding. To cite the introduction section from https://en.wikipedia.org/wiki/UTF-8:

UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" (slash) in filenames, "\" (backslash) in escape sequences, and "%" in printf.

In Nim, there are four ways to enter Unicode characters: by using hexadecimal digits following the "\x", by using a Unicode code point following the "\u", by typing the Unicode sequence directly on your keyboard either as one single keystroke when your keyboard layout supports it, or as a special OS-dependent sequence of keystrokes:

echo "\xe2\x99\x9A \xe2\x99\x94"
echo "\u265A \u2654"
echo "\u{265A} \u{2654}" # {} is only necessary for more than 4 hex digits
echo "♚ ♔"

The code above shows three ways to print the symbols for a black and a white king in a chess game. In the first line, we typed the UTF-8 byte sequence directly as hexadecimal digits. This method is rarely used today. In the second and third lines, we used "\u" to enter the code point directly. We obtained the code from https://en.wikipedia.org/wiki/List_of_Unicode_characters. In the last line, the glyph was entered directly into an editor. For some Linux editors, like Gedit, you can hold down the Shift and Control keys, type `u`, release all keys, and then type the Unicode digits like `265a`, followed by a space. See https://en.wikipedia.org/wiki/Unicode_input for details and other operating systems.
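That the escape notations really produce the same bytes can be verified with a small sketch. It uses `runeAt()`, `runeLen()`, and the `Rune` type from the `std/unicode` module; the variable names are only for illustration.

```nim
import std/unicode

let
  viaBytes = "\xe2\x99\x9A"     # the three UTF-8 bytes of the black king
  viaCodePoint = "\u265A"       # the code point with four hex digits
  viaBraces = "\u{265A}"        # the code point in curly braces

doAssert viaBytes == viaCodePoint and viaCodePoint == viaBraces
doAssert viaCodePoint.runeAt(0) == Rune(0x265A)

let both = "\u265A \u2654"
doAssert both.len == 7          # 3 bytes + 1 space + 3 bytes
doAssert both.runeLen == 3      # two kings and a space
```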

The cstring data type

In the C programming language, `strings` are `pointers` to sequences of characters. [40] The end of such a C `string` is marked with the character '\x0' (written '\0' in C), a null byte with all bits cleared. C functions like `printf()` need these '\x0' characters to determine the end of the C `string`. While Nim `strings` are complex entities that store their current size and other properties and can grow dynamically, the character sequence of a Nim `string` also has a hidden terminating '\x0' character at the end to make it compatible with C `strings`. Nim also has the data type `cstring`, called "compatible `string`" in modern Nim, which matches the `strings` of the C language if we compile as usual with the C backend. The `cstring` data type is used in binding definitions for C libraries, but as `cstrings` cannot grow and support only a few `string` operations, they are rarely used in ordinary Nim source code. The Nim compiler automatically passes the zero-terminated data buffer of Nim `strings` to C libraries whenever we call a C library, so there is no expensive type conversion involved. The other direction is much more expensive: when you have an existing `cstring` and need a Nim `string` with the same content, a simple conversion is not possible, as a Nim `string` is a different, more complex entity. Therefore, we have to create a Nim `string` and copy the content. You can use the stringify operator `$` for this, as in `myNimStr = $myCString`. Generally, `string` creation is an expensive operation compared to simple operations like adding two numbers, so when performance matters, one should try to avoid unnecessary `string` creation and other unnecessary `string` operations as well. This is particularly important in loops, which are executed frequently. We will explain more about the internals of `strings` and why `string` creation and dynamically allocating memory is expensive in later sections of the book.
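A minimal sketch of both directions, assuming the usual C backend. The variable names are only illustrative.

```nim
let nimStr = "Hello"

# Nim string to cstring: no copy, cStr refers to the buffer of nimStr
let cStr = cstring(nimStr)

# cstring to Nim string: a new string is created and the bytes are copied
let copy = $cStr

doAssert copy == nimStr
echo copy  # Hello
```

Note that `cStr` is only valid as long as `nimStr` exists and is not modified.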

When we access text ranges with the slice operator or single characters with the subscript operator, we should never access indices beyond the current last index, which is the index `mystr.high` or `^1`. If we do that, we get an exception, as that index would contain undefined data or would not exist at all. We said earlier that Nim `strings` grow automatically if we insert or append data. But that does not mean that we can use the subscript or slice operator to access characters after the current end of the `string`. Such an operation wouldn’t make much sense. Imagine we have a `string` `var str = "Nim"` and now use the subscript operator and assign a character at position 10 with `str[10] = '!'`. What should be the content of positions 3 to 9? Well, maybe spaces would make some sense, but in fact, such access after the currently last valid character of the `string` is forbidden. You could use `str.add(" !")` for this purpose.
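The following sketch illustrates the valid index range of a short `string` and the use of `add()` to extend it. The out-of-range assignment is left as a comment, as it would fail at runtime in a default build.

```nim
var str = "Nim"

doAssert str.high == 2        # last valid index
doAssert str[^1] == 'm'       # same position, counted from the end
doAssert str[0 .. 1] == "Ni"  # slice within the valid range

# str[10] = '!'               # index 10 is out of bounds, so this would fail

str.add(" !")                 # growing the string is done with add()
echo str  # Nim !
```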

Another operation you should avoid is inserting the '\x0' null byte character somewhere in an existing Nim `string`. Nim stores the actual length of `strings` explicitly and additionally terminates the end of the actual data with a '\x0' to make the `string` compatible with C `strings` and allow passing the data buffer directly to the C library functions. A '\x0' character somewhere in the middle of a Nim `string` would generate an inconsistency, as C library functions like `printf()` would regard '\x0' as the `string` end marker, while pure Nim functions may still assume a longer `string`. In very rare cases, intermediate '\x0' bytes in `strings` can pose a problem when we receive the actual byte sequence from C libraries. For the same reason, a Nim `string` is not identical to or fully compatible with a `seq[char]`, as a `seq[char]` may contain multiple zero bytes, while Nim `strings` should not.
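The inconsistency can be made visible with a short sketch, again assuming the C backend. We deliberately build a `string` with an embedded null byte, which is exactly what ordinary code should avoid.

```nim
var s = "ab"
s.add('\0')   # embedded null byte, only for demonstration
s.add("cd")

# Nim uses the stored length ...
doAssert s.len == 5

# ... while the C view of the same buffer stops at the first null byte
doAssert cstring(s).len == 2
```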

Escape sequences in strings

We learned about control characters already in the section about characters, and earlier in this section, we mentioned that `strings` can also contain control characters. As the use of control characters may not be really easy to understand, we will explain their use in `strings` in some more detail and give a concrete example.

The most important control character for `strings` is the newline character, which moves the cursor in the terminal window to the beginning of the next line. The `echo()` procedure prints that character automatically after each output operation. Indeed, it can be important to terminate each output operation with that character, as the output can be buffered, and writing just a `string` without a terminating newline may not appear at once on the screen, but can be delayed. That is bad when the user is asked something and should respond, but the message is still buffered and not yet visible.

The problem with special characters like backspace or newline is that we cannot enter them directly with the keyboard.[41] To solve that problem, escape sequences were introduced for most programming languages. An escape sequence is a special sequence of characters that the compiler can discover in `strings` and then replace with a single special character. Whenever we want a newline in a `string`, we type it as "\n", which is the backslash character followed by an ordinary letter `n`, "n" standing for newline.

echo "\n"
echo "Hello\nHello\nHello"

The first line prints two empty lines — one because the `\n` generates a jump to the next line, and another because `echo()` automatically adds a newline. The second line prints three lines, each containing the word `Hello`, and the cursor is moved below the last `Hello` because `echo()` automatically adds another newline character.

Historically, older versions of Windows employed a two-character sequence, '\r' (carriage return) and '\n' (linefeed), to initiate a new line. The carriage return would reset the position to the start of the line, and the linefeed would move it downward. You might encounter these control characters in older Windows text files, marking the end of each line. This combination was also common in older printers, facilitating direct text file printing by just copying the file to the printer device on Windows OS. In Nim, we have the "\p" escape sequence, which is known as the platform-dependent newline. On a Windows system, "\p" translates to "\r\n". In other words, when a program is compiled on Windows, the compiler replaces "\p" in our strings with both a carriage return and a linefeed character. Conversely, if the program is compiled on Linux, "\p" is replaced with only a newline character. Modern Windows versions, however, support '\n', allowing us to use this character more universally. The control character '\n' corresponds to the decimal value 10, and Nim provides an alternative control character '\l' with the same value. Similarly, the control character '\r', with a decimal value of 13, can also be expressed as '\c' in Nim. As a result, you may see descriptions indicating that "\p" maps to "\c\l" on Windows, equivalent to "\r\n". Currently, Nim allows the use of capital letters in place of the lowercase ones for these control characters, namely '\L', '\C', '\N', and '\R'.
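The relations between these escape sequences can be checked with a few assertions. This is only a sketch; the test of "\p" is limited to Windows, Linux, and macOS.

```nim
doAssert "\n" == "\l"      # both are the linefeed character
doAssert "\r" == "\c"      # both are the carriage return character
doAssert ord('\n') == 10
doAssert ord('\r') == 13

when defined(windows):
  doAssert "\p" == "\r\n"  # platform newline on Windows
elif defined(linux) or defined(macosx):
  doAssert "\p" == "\n"    # platform newline on Linux and macOS

echo "first line\psecond line"
```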

Raw strings and multi-line strings

In rare situations, you may want to print exactly what you have typed, so you do not want the compiler to replace a '\n' with a newline character. You can do that in two ways: You can escape the escape character, that is, you put one more backslash in front of the existing backslash. When you print the `string` "\\n", you will get a backslash and the `n` character in your terminal. Or, you can use so-called raw `strings`, where you put the character `r` immediately in front of the `string` literal, like:

echo r"\n"
echo "\\n"

Multi-line `strings` are also raw `strings`; that is, contained escape sequences are not interpreted by the compiler. As the name implies, multi-line `strings` can extend over multiple lines of the source text. A multi-line text starts and ends with three quotes, as demonstrated below:

echo """this is
three lines
of text"""

echo "this is\nthree lines\nof text"

Both `echo()` commands above generate the exact same machine code!
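The equivalences described in this section can also be verified directly in code. The variable names in this sketch are only illustrative.

```nim
let raw = r"\n"       # raw string: backslash and n are kept as typed
let escaped = "\\n"   # escaped backslash followed by n
doAssert raw == escaped
doAssert raw.len == 2

let multi = """this is
three lines
of text"""
let single = "this is\nthree lines\nof text"
doAssert multi == single

# escape sequences are not interpreted in multi-line strings
let tripleRaw = """a\nb"""
doAssert tripleRaw == r"a\nb"
```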

Comments

Comments are not a data type, yet they are important. Ordinary comments start with the hashtag character `#` and extend to the end of the line. The `#` character itself and all following characters up to the line end are ignored by the compiler. You can also start the comment with `##`; this designates a documentation comment. It is also ignored by the compiler but can be processed when you use tools to generate documentation for your code. Documentation comments are only allowed in certain places in the source code; often, they are inserted at the beginning of a procedure body to explain its use. There are also multi-line comments, which start with the two characters `#[` and end with `]#`. These forms of comments can extend over multiple lines and can be nested; that is, multi-line comments can again contain plain or multi-line comments.

# this is a comment
## important note for documentation
#[ a longer
but useless comment
]#

Multi-line documentation comments also exist and can be nested as well.

proc even(i: int): bool =
  ##[ This procedure
  returns true if the integer argument is
  even and false otherwise.
  ]##
  return i mod 2 == 0

You can also use the #[ comment ]# notation to insert comments anywhere in the source code where a whitespace character is allowed, but this form of in-source comment is rarely used.

Other data types

There are additional predefined types such as the container types `array` and `seq`, which can contain multiple elements of the same base type, and the `tuple` and `object` types, which can contain data of different types. Nim `tuples` and `objects` are similar to C structs and are not as verbose as Java classes. We will learn more about these types in later sections of the book.

Nim source code

You have already seen a few examples of simple Nim source code. The code essentially consists of a plain text file made up of ASCII characters, that is, the ordinary characters that you can type on your keyboard. Generally, Nim source code can also contain UTF-8 encoded Unicode characters, so instead of using names consisting of ASCII characters for your symbols, you could just use single Unicode characters or sequences of Unicode characters. However, this typically doesn’t make much sense as entering Unicode isn’t easy with a keyboard. Additionally, it can only be displayed correctly on the screen or in the terminal if the editor or terminal properly supports Unicode and all necessary fonts are installed. This may be possible on your local computer, but what happens when someone else edits your source code?

Starting with Nim version 1.6, we received support for Unicode operators, which could be useful for some applications. For details, please see the Nim language manual.

Nim currently does not permit tab characters in your source code, so you must indent blocks using spaces only. Typically, we use two spaces for each indentation level. Other quantities also work, but it’s best to stick to a consistent number.

Identifiers in Nim, as used for modules, variables, constants, procedures, user-defined types, and other symbols, can contain lowercase and uppercase letters, digits, Unicode characters, and additional underscores. However, names must not start with digits or begin or end with an underscore, and one underscore may not immediately follow another underscore.

var
  pos2: int # OK
  leftMargin: int # OK
  next_right_margin: int # OK
  _private: int # illegal
  custom_: int # illegal
  strange__error: int # illegal

Generally, we use camel-case like `leftMargin` for variable names, not snake-case like `left_margin`.

Current Nim has the special property that identifiers are case-insensitive and that underscores are simply ignored by the compiler. The only exception is the first letter of a name; that letter is case-sensitive. So the names `leftMargin`, `leftmargin`, and `left_margin` are identical for the compiler. But `LeftMargin` is different from all the others because it starts with a capital letter. This may sound a bit strange at first but works well in practice. One advantage is that a library author can use `snake_case` in their library for names, while users of the library can freely decide if they prefer `camelCase`. Still, you might think this could generate confusion. In practice, it does the opposite: it prevents confusion. Imagine a conventional programming language that is fully case-sensitive and does not ignore underscores. In a larger program, we could then have names like `nextIteration` and `next_Iteration` or `keymap` and `keyMap`. What happens when both names are visible in the current scope, and we type the wrong one? The compiler may not detect it when types match, but the program may do strange things. Nim would not allow such similar-looking names, as the compiler would regard them as identical and would complain about a symbol redefinition.

You might wonder why the first letter is case-sensitive. This is to allow user-defined types to use capital letters in their names and then write something like `var window: Window`. So we can declare a variable named `window` of a user-defined data type named `Window`. That is a common practice.
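A short sketch of these rules, assuming the default compiler settings without the `--styleCheck` option. The type `Window` with its field `width` is only a placeholder for this illustration.

```nim
type Window = object
  width: int

var leftMargin = 10
left_margin = 20    # same variable, underscores are ignored
leftmargin += 1     # same variable, case is ignored after the first letter
doAssert leftMargin == 21

# the first letter is case-sensitive, so type and variable are distinct
var window = Window(width: 640)
doAssert window.width == 640
```

Mixing the spellings like this is legal, but it is done here only for demonstration; in real code, we would stick to one spelling.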

The case insensitivity and the ignoring of underscores may not be the greatest invention of Nim, but it does not really hurt. The only exception occurs when we create bindings to C libraries where leading or trailing underscores are used, necessitating some renaming.

The only minor disadvantage of Nim’s fuzzy names arises when using tools like Grep or your editor’s search functionality. You cannot be certain if a search for "KdTree" will yield all results; you might have to try "Kd_Tree" or "KDTree" and potentially some other variants as well. To address this issue, Nim provides a tool called nimgrep that conducts a case- and style-insensitive search. Your editor may also support that type of search. You can enforce a consistent naming scheme by calling the compiler with the command-line argument `--styleCheck:error` or `--styleCheck:hint`.

Languages such as C use curly braces to mark blocks, while others, such as Pascal, use begin/end keywords for this purpose. At the same time, blocks are generally indented by tabs or spaces to make it easier for the programmer to recognize the extent of the block. This introduces some redundancy, which is not always helpful: block markers and indentation ranges can contradict each other and potentially lead to strange bugs. Like Python or Haskell, Nim does not need additional block markers; the indent level is enough to mark the block extents for the compiler and the human programmer. This style looks clean and compact and has been used in textbook pseudocode for decades. Some people still argue that this style is less "safe", as the behavior of the code depends on invisible whitespace. However, this argument is rather peculiar, as the whitespace is always visible due to the presence of visible characters on the right. Of course, changing the indentation of the last line of a block would affect the behavior of the code. But such a change is clearly visible. And program code contains many locations where changing one character breaks it. All numeric literals would suffer from adding a digit or deleting a digit. Consider operators like `++` or `+=` from C: the code may still compile after deleting the leading `+`, but the resulting code would be incorrect. Computer programming requires meticulous attention! Indeed, the use of curly braces for blocks has some advantages; e.g. many editors can highlight such blocks well, the editor may support jumping back and forth between the braces, and for really large blocks it may indeed be simpler to discover the whole block range. However, experience has shown that marking blocks solely with indentation works well; most people who have used this method for some time tend to prefer it.

Whenever you convert source code from other programming languages to Nim, you should first ensure that the original code is correctly indented. Some editors can maintain or rectify this, or you can use external tools. If you overlook this aspect and attempt to convert from C to Nim, removing the braces of blocks, you might introduce errors if the initial indentation was not correct.

Blocks, scopes, visibility, locality, and shadowing

Like most other programming languages, Nim has the concept of code blocks or scopes. The bodies of procedures, functions, `iterators`, `templates`, and `macros` as well as those of various loop constructs or code following conditional statements, build indented blocks and create new scopes. In this new scope, we can define variables, named constants, or types with the `var`, `let`, `const` and `type` keywords that are local to this block. These symbols are only visible in this scope, and local variables that require storage are actually created when the program executes the block and are destroyed when the block is exited. This holds true in principle, and at least for ordinary stack-allocated value variables; however, things are a bit more complicated for references and pointer variables. We will discuss this in more detail when we introduce references. Here, we have used the term 'code block' to clearly distinguish it from the `const`, `var`, `type`, and `import` sections, which are different forms of indented blocks. Remember that the compiler processes our program code from top to bottom, so we always have to define symbols before we can actually use them. When we define an entity in a code block, and a symbol with that name was already declared before outside this block, then that symbol is shadowed, that is, the previous declaration becomes temporarily invisible. Let us investigate the following small example program:

proc doSomething =
  type NumType = int
  const Seven = 7
  var a: NumType = Seven
  var b: bool = true
  if b:
    echo a, ' ', b # variables of outer scope are visible
    var a, sum: float # now outer a is shadowed
    a = 2.0
    sum = a * a + 1
    echo a, ' ', sum # local data only visible in if block

  echo a # initial int variable with value 7 becomes visible again

doSomething()

Although we haven’t officially introduced procedures as units for structuring our program code yet, we have intentionally enclosed the above code in the body of a `proc` called `doSomething()` this time. Actually, in real-life programs, nearly all the program code is embedded in `procs`. We will discuss the peculiarity of global code later. By enclosing the program code in a procedure, we can ensure that the two variables `a` and `b` defined in that `proc` are indeed stack-allocated and local to the scope of that procedure. The variables `a` and `b` are created on the stack when the procedure is called, that is, when its execution starts with a statement like `doSomething()`. These two variables are never visible in code outside this procedure, and the storage for these two variables is automatically released when the execution of that procedure ends, in this case when the last line of the `proc` is reached. In the body of the procedure, we also define a new custom type and a named constant, just to demonstrate that it is possible. Both symbols are also local to this `proc` and invisible outside.

The indented code block following the `if b:` statement is sometimes called an "if then" block or just `if` block. In that block we define two other variables called `a` and `sum` of `float` type, which are also stack-allocated. Whether these two variables are already allocated when the `proc` starts its execution, or only when the block following the `if` statement is executed, is actually an implementation detail. As the variable `a` of `float` type in the `if` block has the same name as the outer variable of `int` type, that integer variable is shadowed in the `if` block: the outer variable becomes temporarily invisible as soon as the new symbol is declared. Other symbols of outer scopes remain visible. In the `if` block, as well as in most other indented code blocks, we could also define named constants or custom types; these would be visible only in this block. Indented code blocks can be nested; within one block, we can have additional indented blocks in which all declared symbols are again local and invisible outside. The last `echo()` statement in our code example above is located after the `if` block, so the initial variable `a` of integer type becomes visible again.

Global code

In the introductory sections of the book, we generally used program code at a global level, not embedded in a procedure body. We did this for simplicity, as we hadn’t yet introduced procedures. Global code is sometimes used in small scripts or for special purposes, like program initialization. But for larger programs, most of the code is typically grouped into `procs`. The storage location for variables defined in global code isn’t well-defined; it can depend on the actual Nim compiler implementation and the compiler backend. The performance of global code can be worse than that of code enclosed in procedure bodies, so when performance matters, we should put our code in `procs`. One reason for the suboptimal performance of global code is that global variables are not located on the stack but in the global BSS segment of the program, and the backend cannot optimize global code well. For example, global variables may not be cached in CPU registers. Note that variables that need to exist and retain their value for the entire runtime of the program, and not just for the duration of a single procedure execution, must be defined as global. The same obviously holds for global variables that are used in the code of different procedures, such as the `stdout` and `stdin` variables of the `system` module. An alternative to using global variables, when a variable in a procedure should retain its value between different `proc` calls, is to attach the {.global.} pragma to a local variable within the `proc`. This way, the variable is still only visible in the procedure where it is declared, but it is stored in the BSS segment instead of on the stack, and so its value is preserved between procedure calls.
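A minimal sketch of the {.global.} pragma follows. The procedure name `nextId()` and the variable `counter` are hypothetical and serve only as an illustration.

```nim
proc nextId(): int =
  # counter is only visible inside nextId(), but it keeps its value
  # between calls because of the {.global.} pragma
  var counter {.global.} = 0
  inc counter
  counter

let first = nextId()
let second = nextId()
echo first, " ", second  # 1 2
```

Without the pragma, `counter` would be initialized to zero again for each call, and both calls would return 1.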

Note that structured named constants, such as constant `strings`, are also stored in the BSS segment, even when they are defined only locally within a procedure. So large structured constants can increase the executable size, as the BSS segment is a part of the program executable.

Whitespace, punctuation, and operators

The space character, with decimal ASCII value `32`, is used in Nim program code to indent code blocks and separate different symbols from each other. Nim’s keywords are always separated from other symbols by leading and trailing whitespace, while other symbols are most often separated by punctuation and an additional, optional space character. Whenever the syntax allows a space, we can also insert multiple spaces or a comment enclosed in #[ ]# into the source code. Tabulator characters are not allowed in the Nim source code, but we can use them in comments and of course in `string` literals. We have already mentioned that spaces can make a difference in how operators or function parameters are handled. In expressions like `a+b` or `a + b` the `+` operator is regarded as an infix operator, but in `a + -b` the minus sign is regarded as a unary operator bound to `b`. This way, asymmetric expressions like `a +b` or `a <b` would be invalid, as the operators are interpreted as unary ones attached to `b`, and then, there is no infix operator between the two variables. A procedure call such as `echo(1, 2)` is interpreted as a call to echo() with two integer literal arguments, while a call like `echo (1, 2)` — with a space after the `proc` name — is interpreted in command invocation syntax as a call with a `tuple` argument. Although it’s not uncommon in C code to always insert a space between the function name and its parameter list, we should avoid doing so in Nim for the reason described. We will learn more about procedure calls and the `tuple` data type later.
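The effect of spaces around operators and after procedure names can be tried out with a few lines. This is only a sketch of the cases mentioned above.

```nim
let a = 5
let b = 3

doAssert a+b == 8      # infix operator without spaces
doAssert a + b == 8    # infix operator with spaces on both sides
doAssert a + -b == 2   # the minus sign is a unary operator bound to b

echo(1, 2)   # two integer arguments, prints 12
echo (1, 2)  # one tuple argument, prints (1, 2)
```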

Operators

Nim uses the following punctuation characters as operators:

=, +, -, *, /, <, >, @, $, ~, &, %, |, !, ?, ^, ., :, \

These symbols can be used as single entities or in combination, and we can define our own operators or redefine existing operators. All these symbols can be used as infix operators between two arguments or as unary prefix operators. However, Nim does not support unary postfix operators, so a notation like `i++` from the C language is not possible in Nim. A few combinations of these punctuation characters have special meanings. We will learn more about that and how we can define our own operators later in the book.
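As a brief preview, here is a sketch of two user-defined operators. The operators `+-` and `!` are invented for this illustration and are not part of the standard library.

```nim
# infix operator built from two punctuation characters:
# returns the sum and the difference as a tuple
proc `+-`(a, b: int): (int, int) =
  (a + b, a - b)

# unary prefix operator: the factorial of n
proc `!`(n: int): int =
  result = 1
  for i in 2 .. n:
    result *= i

let pair = 5 +- 3
let fact = !5
echo pair  # (8, 2)
echo fact  # 120
```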

In Nim, these keywords are also used as operators:

and, or, not, xor, shl, shr, div, mod, in, notin, is, isnot, of, as, from.

Operators have different priorities. For example, `*` and `/` have a higher priority than `+` and `-`. In most cases, the priority is as we would expect, with perhaps a few exceptions. If we are unsure, we can group terms with brackets or consult the Nim language manual for details.
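A few assertions can illustrate the priorities. The last one shows one of the exceptions: in Nim, `shl` has the same priority as `*`, while in C the shift operator `<<` has a lower priority than `+`.

```nim
let x = 2 + 3 * 4     # * binds stronger than +
let y = (2 + 3) * 4   # grouping changes the order
let z = 1 shl 2 + 1   # evaluated as (1 shl 2) + 1

doAssert x == 14
doAssert y == 20
doAssert z == 5
```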

Since version 1.6, Nim also allows the definition and use of a few Unicode operators, but these are still considered experimental.

Order of execution

Global program code, or code enclosed in procedures, is generally executed from top to bottom and from left to right, unless control structures enforce a different order. To demonstrate this, we use here a set of four different `procs`, which contain an `echo()` statement each, and return a numeric expression. However, we have not yet formally introduced procedures, so if the code below feels too complex, feel free to skip this section for now and return once you have read the section about `procs`:

proc a(i: int): int =
  echo "a"
  i * 2

proc b(i: int): int =
  echo "b"
  i * i

proc c(i: int): int =
  echo "c"
  i * i * i

proc d(i: int): int =
  echo "d"
  i + 1

echo a(1); echo b(1)
echo b(2) + d(c(3)) # (2 * 2) + ((3*3*3) + 1)
echo "--"
echo a(1) < 0 and b(1) > 0
echo a(1) > 0 or b(1) > 0

It should be no real surprise that the first three `echo()` statements produce this output:

a
2
b
1
b
c
d
32

For the term `d(c(3))`, it is obvious that the inner expression `c(3)` has to be evaluated first before that result can be used to call `proc` `d()`.

The last two lines demonstrate the so-called short-circuit evaluation for expressions with the Boolean `and` or `or` operators. As the expression `a() and b()` is always `false` when `a()` is `false`, in this case, `b()` does not have to be evaluated at all. Similarly, as the expression `a() or b()` is always `true` when `a()` is `true`, in that case, `b()` does not have to be evaluated at all. So in the last two lines of the above code, `b()` is never called at all, and the output is just

a
false
a
true
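Short-circuit evaluation is often used to guard an operation that would otherwise fail. In the following sketch, with illustrative names, the index test protects the character access:

```nim
let s = "Nim"
let i = 5

# s[i] is only evaluated when i is a valid index,
# so no out-of-bounds access can occur here
let found = i < s.len and s[i] == 'm'
echo found  # false
```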

Note that, in Nim as in most other programming languages, the assignment operator `=` behaves differently compared to ordinary operators like `+` or `*`. In assignments such as `let a = b + c()`, obviously, the right side has to be evaluated before the result can actually be assigned to variable `a`.

Control structures

Larger computer programs generally consist not only of code that is executed linearly but also of code for conditional or repeated execution.

The most important control structures of Nim are the `if` statement for conditional execution, the related `case` statement, and the `while` and `for` loops for repetitions. All these statements control program execution at runtime. Nim’s `when` statement, which is syntactically very similar to the `if` statement, is evaluated at compile-time. It can be used to adapt our program code for various operating systems or to compile our code with special options, such as for debugging or testing purposes.

All these control structures can be nested in arbitrary ways, so we can have in one `if` branch other `if` conditions or `while` loops, and in `while` loops again other control structures including other loops.

If statement and if expression

The `if` statement with multiple optional `elif` branches and an optional `else` branch evaluates a sequence of boolean conditions at program runtime. As soon as one condition evaluates as `true`, the corresponding statement block is executed, and thereafter, the program execution continues after the entire `if` construct. That is, at most one branch is executed. If none of the conditions after the `if` or `elif` keywords evaluates to `true`, then the `else` branch is executed if it exists. A complete `if` statement consists of one `if` condition, an arbitrary number of `elif` conditions, and one optional `else` part:

if condition1:
  statement1a
  statement1b
  ...
elif condition2:
  statement2a
  statement2b
  ...
elif condition3:
  statement3a
  statement3b
  ...
elif ...:
  ...
else:
  statementa
  statementb
  ...

The simplest form of an `if` statement is

if condition:
  statement

if age > 17:
  echo "You are of legal age, but remember to drink and smoke responsibly!"

Note that the branches are indented by spaces. We generally use two spaces, but other numbers work as well. Also, note that it is `elif`, not elsif as in Ruby, and that there is a colon after the condition. Instead of a single statement, we can use multiple ones in each branch, all on their own line and all indented in the same way.

Strictly speaking, the terminating colon is not really necessary for the compiler. The compiler could determine the end of the condition without it, as the following statement is indented. However, the inclusion of a colon enhances readability, making it easier for humans to understand the structure of the complete `if` statement. Therefore, the compiler currently expects the colons and will report an error otherwise.

When there is no `elif` and no `else` part, then we can also write the conditional code directly after the colon, like

if age > 17: echo "You may drink and smoke, but better avoid it!"

With an `elif` and an `else` branch, the example from above may look like

var age: int = 7
if age == 1:
  echo "you are really too young to drive"
elif age < 6:
  echo "you may drive a kid's car"
elif age > 17 and age < 75:
  echo "you can drive a car"
else:
  echo "drive carefully"

Note that we perform the age tests in ascending order. It would not make much sense to first test for a condition `age < 6`, and later to test for `age < 4`, because the `if` statement is evaluated from top to bottom. As soon as one condition is evaluated as `true`, that branch is executed, and the program execution continues after the entire `if` construct. So a later test `age < 4` would be useless when that condition is already covered by a prior test `age < 6`.

As the various conditions of the `if` statement are processed from top to bottom until one condition evaluates to `true`, it can be beneficial to place the most likely conditions first for optimal performance. This approach reduces the need to evaluate unlikely conditions in most cases.

Another strategy for larger if/elif constructs is to place the simplest and fastest tests at the top when possible.

We can also have if/else expressions that return a value like in

var speed: float = if time > 0: delta / time else: 0.0 # prevent div by zero error

In C, for a similar construct, the ternary `?` operator is used.

In languages like C or Ruby, the assignment operator `=` is an expression that returns the assigned value, so in C we can write code like

char c;
while ((c = getChar())) { process(c); }

In Nim, the assignment operator is not an expression with a result, but we can group multiple statements in round brackets separated by semicolons, and when the last statement in the bracket is an expression, then the whole bracket has the same value. So we can use conditional terms like

while (let c = getChar(); c != '\0'):
  process(c)

If we declare a variable in this way using the `var` or `let` keyword, then that variable is only visible in the bracket expression itself and in the following indented block.

Note that if-expressions must always return a well-defined value, so they must always contain an `else` branch. A plain `if`, without an `else`, or an if/elif without an `else`, does not work. And as Nim is a statically typed language, and all variables have a strictly well-defined type, the if-expression must return the same type for all branches!

var a: int
var b: bool
a = if b: 1 elif a > 0: 7 else: 0 # OK
a = if b: 1 elif a > 0: 7 # invalid
a = if b: 1 # invalid
a = if b: 1 else: 0.0 # invalid, different types!

The when statement

The `when` statement is syntactically very similar to the `if` statement, but while all the boolean conditions are evaluated during the program runtime for the `if` statement, for the `when` construct all the when/elif/else conditions have to be constant expressions, and are already evaluated at compile-time. In ordinary program code, the `when` statement is not often used. However, it is useful when we write bindings to C libraries and low-level code. Common use cases for the `when` statement include the `isMainModule` condition test and testing for defined symbols, such as `defined(windows)`:

when not defined(gcDestructors):
  echo "You may try to compile your code with option --mm:arc"
when isMainModule:
  doAllTheTests()

The value `isMainModule` is only true for a source code file when that file is compiled directly as the main module, that is, when it is not indirectly compiled because it is imported by other modules. This way, we can easily include test code in our library modules. This test code is ignored when the module is used as a library, but it becomes active when we compile the module directly for testing.

A `when defined()` construct can be used to test for predefined or our own custom options. For example, we may pass the optional argument `-d:gintroDebug` to the compiler and test for this option within the code of the module, like `when defined(gintroDebug)`.
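
A minimal sketch of such a test might look as follows. The symbol `myDebug` and the procedure `debugEnabled` are names made up for this illustration; the symbol is only defined when we pass `-d:myDebug` on the compiler command line.

```nim
# Hypothetical example: compile with `nim c -d:myDebug example.nim`
# to activate the first branch; `myDebug` is a name invented here.
proc debugEnabled(): bool =
  when defined(myDebug):
    result = true    # only compiled when -d:myDebug was given
  else:
    result = false   # compiled in all other cases

echo "debug build: ", debugEnabled()
```

Because the decision is made at compile time, the branch that is not selected does not end up in the executable at all.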

One difference between the `when` and the `if` statement is that the 'then' branches do not open a new scope. This means variables defined there are still visible after the construct has been processed:

when sizeof(int) == 2:
  var intSize = 2
  echo "running on a 16-bit system!"
elif sizeof(int) == 4:
  var intSize = 4
  echo "running on a 32-bit system!"
elif sizeof(int) == 8:
  var intSize = 8
  echo "running on a 64-bit system!"
else:
  echo "cannot happen!"

echo intSize # variable is visible here!

Another peculiarity of the `when` statement is that it can be used inside `object` definitions. We will show an example of that in a later section of the book when we introduce the `object` data type. Just like the `if` construct, `when` can also be used as an expression.
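
As a small sketch of `when` used as an expression, we can initialize a constant depending on a compile-time condition. The constant name `intKind` is made up for this example.

```nim
# The condition is a constant expression, so the compiler selects
# exactly one of the two values at compile time.
const intKind = when sizeof(int) == 8: "64-bit" else: "not 64-bit"
echo "int is ", intKind, " on this system"
```

As with if-expressions, an `else` branch is needed here so that the expression always has a value.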

The case statement

The case statement is not used that often, but it can be useful when we have many similar conditions:

case inputChar
of 'x': deleteWord()
of 'v': pasteWord()
of 'q', 'e': quitProgram()
else: echo "unknown keycode"

To enable optimizations, the case construct has some restrictions compared to a more flexible if/elif statement:

The expression following the `case` keyword must be of an ordinal type, such as `int` or `char`, or a `string`, which is supported although it is not an ordinal type. A `float`, however, would not work. Also, the values following each `of` keyword must be constant, that is, a single constant value, multiple constant values, or a constant range like `'a' .. 'd'` for the first four lowercase letters. Of course, these constants must have a type compatible with the type of the expression after the `case` keyword. A `case` statement must cover all possible cases, so most of the time an `else` branch is necessary.

The `case` statement can also contain optional `elif` branches with arbitrary boolean conditions. This was not the case in the Wirthian languages Pascal, Modula, and Oberon. It makes Nim’s `case` construct very similar to the ordinary if/elif/else.
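
The following sketch combines `of` branches with an `elif` branch. The variable names `n` and `kind` are chosen only for this illustration.

```nim
let n = -5
var kind = ""
case n
of 0:
  kind = "zero"
of 1 .. 9:
  kind = "single digit"
elif n < 0:            # arbitrary boolean condition
  kind = "negative"
else:
  kind = "ten or more"
echo kind              # negative
```

The `of` branches are limited to constant values and ranges, while the `elif` branch may test any boolean expression.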

Unlike the similar switch statement in C, the case statement requires no `break` after each branch. If the value of the `case` expression matches one of the values following an `of` keyword, the corresponding statement or sequence of statements is executed. Afterward, the program execution resumes beyond the entire `case` construct.

The `case` construct can also be used as an expression, as illustrated below:

var j: int
var i: int =
  case j
    of 0 .. 3: 1
    of 4, 5: 2
    of 9: 7
    else: 0

Here, an `else` is necessary to cover all cases. And as you see, we can also indent the block after the `case` keyword if we want.

The while loop

The `while` loop is used when we want to implement conditional repetitions, i.e., when we want to check a condition and execute a block of statements only as long as the condition remains true. If the condition is `false` in advance or becomes false after some repetitions, then the program execution proceeds after the indented loop body block.

A basic `while` loop has the following structure:

while condition:
  statement1
  statementN
firstStatementAfterTheWhileLoop

var repetitions = 3
while repetitions > 0:
  echo "Nim is easy!"
  repetitions = repetitions - 1

The aforementioned loop would print the message three times. Like the condition in the `if`-clause, the condition is terminated with a colon. Note that the condition must change during the execution of the loop, otherwise, when the condition is `true` for the first iteration, it would remain `true` and the loop would never terminate. We decrease the loop counter `repetitions` in the loop. So, at some point, the condition will become `false`, the loop will terminate, and program execution will continue with the first statement after the loop body. Note how we decrement the loop counter. The right side of the assignment operator is evaluated, and once that is done, the new value is assigned to the counter.

Two rarely used variants of a `while` loop exist: The loop body can contain a `break` or a `continue` statement, each of which consists only of this single keyword. A `break` statement within the loop body stops the loop’s execution immediately, and the program execution resumes after the loop body. Alternatively, a `continue` statement within the loop body skips the following statements and returns to the beginning of the loop, at which point the `while` condition is evaluated again.

var input = ""
while input != "quit":
  input = readLine(stdin)
  if input == "":
    continue
  if input == "exit":
    break

The aforementioned code utilizes the `==` and `!=` operators. The `==` operator tests for equality, and `!=` tests for inequality. Both operators work for most data types like integers, `floats`, characters, and `strings`. The literal value of an empty `string` is written as "". In line 2, we test if the variable named `input` does not have the value "quit", and in line 4, we test if that variable is empty, that is, it contains no text at all.

The use of `break` and `continue` disrupts the expected flow in loops, which can make understanding loops more challenging. So we generally avoid their use, but sometimes `break` or `continue` are really helpful. For example, they can be useful when an unexpected error occurs, perhaps due to invalid user input.
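
A small self-contained sketch that needs no user input may make the effect of both keywords easier to follow. The variable names are chosen for this example only.

```nim
var n = 0
var sum = 0
while n < 100:
  inc(n)
  if n mod 3 == 0:
    continue          # skip multiples of three, test the condition again
  if n > 10:
    break             # leave the loop early
  sum = sum + n
echo sum              # 37, that is 1 + 2 + 4 + 5 + 7 + 8 + 10
```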

Nim does not include a repeat loop as found in Pascal, which performs its first check at the end of the loop, after the body has already been executed once. Repeat loops are not used that much in Pascal, and they are sort of dangerous because they check the condition after the first execution of the body, so potentially the body is executed with invalid data for the first iteration. Later, we will see how we can use Nim `macros` to extend Nim by a repeat loop that can be used as if it were part of Nim’s core functionality.
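
Until then, one possible way to get the behavior of a repeat loop is an endless `while` loop with a `break` at the end of its body. This is only a sketch, not the macro-based solution mentioned above; the variable `attempts` is made up for the example.

```nim
var attempts = 0
while true:
  inc(attempts)        # the body runs at least once
  echo "attempt ", attempts
  if attempts >= 3:    # the "until" test at the end of the body
    break
```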

The block statement

The `block` statement can be used to create a new indented code `block`, creating a new scope in the same way that an `if true:` statement would:

block: # create a new scope
  var i = 7
echo i # would not compile, as the variable i is undefined

Blocks can be useful for structuring large code segments when no better ways are available, such as splitting the code into multiple procedures. For testing purposes, blocks can be useful too, to keep the symbols in a local scope. In fact, blocks are most useful when they are assigned names and when we use the `break` statement in a `while` or `for` loop to exit a nested loop:

let names = ["Nim", "Julia", "?", "Rust"]
block check:
  for n in names:
    for c in n:
      if c notin {'a' .. 'z', 'A' .. 'Z' }:
        echo "invalid character in name"
        break check
echo "we continue"

The `break check` statement would immediately exit the nested loops and continue with the first statement after the `block`, which is the last line in the code segment above. Using `break` in such a manner might complicate understanding the code structure, but it can sometimes be very useful.

Before Nim 2.0, it was possible to use a `break` statement in unnamed `blocks`, but this generates a warning in version 2.0 and may yield an error in future versions.

For loops and iterators

`For` loops can be used to easily iterate over containers, collections, ranges, and many other entities. We have not discussed the important `array` and `seq` containers yet, but we know already the `string` container. The characters of an ASCII `string` are numbered starting at `0`, and we can access them using the subscript operator `[]`. So we could print the single characters of a `string` in this way:

var
  s = "Nim is not always that easy?"
  pos = 0
while s[pos] != '?':
  echo "-->", s[pos]
  inc(pos)

It’s clear that the `pos` variable introduces some complexity here — we aim to process all the characters in the `string` sequentially, so the use of a position variable seems unnecessary. This method is susceptible to errors, such as forgetting to increment the `pos` variable within the loop (body). So most modern languages provide us with `iterators` for this purpose:

var
  s = "Nim is not always that easy?"
for ch in items(s):
  echo "-->", ch

This approach is notably shorter. The `for` construct might seem odd at first, but it’s a common pattern for writing iterations, utilized in languages like Python as well. Ruby uses something like s.each{|ch| …​} instead.

`For` loops can be used to iterate over containers or collections, picking each element in sequence during this process. The variable following the `for` keyword is used to access or reference individual elements. That variable automatically has the right type, which is the type of the elements in the container. In each iteration, it gets the value of the next element, starting with the first element in the container and stopping when there is no element left. Here, `items()` is the actual `iterator`, which allows us to access the individual characters in sequence. There’s a convention in Nim that an `items()` `iterator` is automatically called in a `for` loop construct when no `iterator` name is explicitly given, allowing for more concise syntax such as `for ch in s:` in this use case.
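
The following sketch compares the explicit and the implicit form; both loops visit the same characters. The variable names are chosen for this illustration.

```nim
let s = "Nim"
var explicitResult = ""
var implicitResult = ""
for ch in items(s):        # iterator named explicitly
  explicitResult.add(ch)
for ch in s:               # items() is called implicitly
  implicitResult.add(ch)
echo explicitResult == implicitResult  # true
```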

You may recognize that the output of the above `for` loop is not identical to the output of the previous `while` loop. The `while` loop stops when the last character, that is '?', is reached, while the `for` loop processes this last character also. That is intended: the general purpose of the `for` loop is to process all the elements in containers or collections.

The above `for` loop does a read access to the `string`, that is, we get basically a copy of each character, and we can not modify the actual `string` in this way. When we want to modify the `string`, we can use the `mitems` variant:

var
  s = "Nim is not always that easy?"
for ch in mitems(s):
  if ch == '?':
    ch = '!'

Here we use `mitems()` instead of the plain `items()`, where the leading 'm' signifies 'mutable'. In the loop body, we can assign different values to the loop variable and in this way modify the container content.

We can iterate not only over containers but also over many more entities, for example, over lines of a file or integer ranges. We can use predefined `iterators` or create our own ones, and then use the `iterator` in for loops. `Iterators` are similar to functions, but while functions return only once, `iterators` can `yield` results multiple times. Actually, Nim currently provides two types of `iterators` — inline `iterators`, which are currently the default type, and closure `iterators`, which are similar to functions. Inline `iterators` create a hidden `while` loop whenever they are called. In this way, they offer the highest performance, but they have some restrictions and increase the final code size of the executable, much as an explicit `while` loop would do. Closure `iterators` are real entities, like procedures, meaning we can assign them to variables. However, in the `for` loop, each call generates some minimal overhead. We will learn how to create our own `iterators` later in the book after we have learned all the details about procedures and functions.
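
As a short preview, here is a sketch with an integer range and a tiny custom inline `iterator`. The name `downFrom` is invented for this example; the details of `iterator` declarations follow later in the book.

```nim
# `downFrom` is a name made up for this sketch.
iterator downFrom(start: int): int =
  var i = start
  while i > 0:
    yield i      # hands one value to the for loop, then resumes here
    dec(i)

var total = 0
for i in 1 .. 3:          # built-in iteration over an integer range
  total += i
echo total                # 6

var countdownText = ""
for n in downFrom(3):     # our own iterator, used like a built-in one
  countdownText.add($n)
echo countdownText        # 321
```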

Objects

We have worked with basic data types like numbers, characters, and `strings` already. Often it makes sense to join some variables of these basic data types to more complex entities. Assume you want to build an online store to sell computers and build a database for them. The database should contain the most important data of each device type, like the type of CPU, RAM and SSD size, power consumption, manufacturer, quantity available, and actual selling price.

We can create a custom `object` data type with fields containing the desired data for this purpose:

type
  Computer = object
    manufacturer: string
    cpu: string
    powerConsumption: float
    ram: int # GB
    ssd: int # GB
    quantity: int
    price: float

In the first line, we use the `type` keyword to tell the compiler that we want to define a new custom type. Writing the `type` keyword on its own line begins a `type` section where we can declare one or more custom data types. All type declarations in a `type` section must be indented. In the next line, we write our type name, an equal sign, and the keyword `object`. This indicates that we want to declare a new `object` `type` named `Computer`. Here, `Computer` is a type name; in Nim, we use the convention that user-defined type names start with a capital letter. In the following indented block we specify the desired fields of this `object`, each line contains the name of a field and a colon followed by the needed data type. That is similar to a plain variable declaration.

`Objects` in Nim are similar to structs in C. Unlike classes in Java, Nim `objects` contain only the fields, sometimes also called member variables, but no procedures, functions, or methods, and no initializers or destructors as in C++. In Nim, we keep the data `objects` separate from the procedures, functions, methods, and also optional initializers and destructors that work with those data `objects`.

Now that we have defined our own new `object` type, we can declare variables of that type and store content in its fields.

var
  computer: Computer

computer.manufacturer = "bananas"
computer.cpu = "x7"
computer.powerConsumption = 17
computer.ram = 32
computer.ssd = 1024
computer.quantity = 3
computer.price = 499.99

Of course, in real applications, we would fill the fields not in this way, but we would maybe read the data from a file, from a terminal, or maybe from a graphical user interface.

It may look a bit ugly that we have to write `computer.` before each field when we access the fields. Indeed, in recent Nim versions, this is not necessary; you may use the `with` construct instead.

import std/with
var
  computer: Computer
with computer:
  manufacturer = "bananas"
  cpu = "x7"
  powerConsumption = 17
  ram = 32
  ssd = 1024
  quantity = 3
  price = 499.99

We can use the fields like ordinary variables:

computer.quantity = computer.quantity - 1 # we sold one piece
echo computer.quantity

As mentioned earlier, the right side of the assignment operator is evaluated first, then the result is stored in the variable on the left side. But we can also just write `computer.quantity -= 1` or `dec(computer.quantity)`.

`Objects`, like all other data types that we have already used, are value types, which means that when an `object` is assigned to a new variable, all its components are copied as well. In this way, `objects` behave like `strings` — assignment copies the content, with the entities remaining independent of each other. We will learn about reference types soon, which behave differently.

To initialize `object` variables, we can use the `object` type names as a constructor with a syntax like Foo(field: value, …​). Unspecified fields get the field type’s default values:

var
  computer1 = Computer(price: 799.99, quantity: 2)
  comp2: Computer

comp2 = computer1
comp2.price = 999.00

To initialize the variable `computer1`, we used the constructor syntax. In line five, we use the assignment operator to copy the content of variable `computer1` into variable `comp2`, and finally, we overwrite the `price` field in `comp2`. As both variables are distinct instances, the fields of variable `computer1` are not modified this way.

Starting with Nim v2.0, object fields can have custom default values, instead of the binary zero. The syntax for the defaults is the same as the assignment for ordinary variables, as shown below:

type
  Computer = object
    freeShipping: bool = true
    manufacturer = "bananas"
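
A hedged sketch of how these defaults might be used, assuming Nim 2.0 or later. The `quantity` field is added here only to show a field without a custom default.

```nim
type
  Computer = object
    freeShipping: bool = true
    manufacturer = "bananas"
    quantity: int            # no custom default: starts at 0

var c = Computer(quantity: 2)  # unspecified fields get their defaults
echo c.freeShipping   # true
echo c.manufacturer   # bananas
echo c.quantity       # 2
```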

Typically, a computer store would offer many different types of computers, so it would make sense to store all the different devices in a container like a sequence, called `seq` for short in Nim. In the next section, we will learn how we can do that.

Arrays and sequences

Sequences and `arrays` are homogeneous containers. They can contain multiple elements of the same data type, while a plain variable, such as a `float` or an `int`, only contains a single value. In some ways, we can regard `objects` as containers as well because `objects` contain multiple fields. The same holds for `tuples` — `tuples` are a very simple, restricted form of `objects` and also contain fields. But more typical container data types are the built-in `arrays` and sequences, or for example, hash tables, which are provided by the Nim standard library. `Arrays`, sequences, and hash tables can contain multiple elements, but all elements must have the same data type, which we call the base type.[42] The data type of the base type is not restricted; it can even be an `array` or sequence type again, allowing us to build multidimensional matrices in this way. `Arrays` have a fixed, predefined size; they cannot grow or shrink during the runtime of our program. Sequences and hash tables can grow and shrink.

`Arrays` and sequences appear very similar. A sequence seems even more powerful because it can change its size, i.e., the number of elements it contains, at runtime, while an `array` has a fixed size. So why do we have `arrays` at all? The reason is mostly efficiency and performance. An `array` is a plain block of memory in the RAM of the computer, which can be accessed very fast and needs not much care by the runtime system. Sequences require much more effort, especially when we add elements and the sequence needs to grow. When we create sequences, we can specify how many elements should fit in it at least, and the runtime system reserves a block of RAM of the appropriate size. But when our estimation was too small, and we want to append or insert even more elements, then the runtime system may have to allocate a larger block of memory first, copy the already existing elements to the new location, and then release the old, now unnecessary memory block. And this is a relatively slow operation. The reason this process may be necessary is that the initially allocated memory block may not be able to increase in size if the neighboring space in the RAM is already occupied by other data. Now, let us see what we can do with `arrays` and sequences:

var
  a: array[8, int]
  v = 1
for el in mitems(a):
  el = v
  inc(v)
for el in mitems(a):
  el = el * el
for square in a:
  echo square

In the second line of the code above, we declare a variable named `a` of `array` type — we want to use an `array` with exactly `8` elements, and each element should have the data type `int`. To declare a variable of `array` data type we use the `array` keyword followed in square brackets by the number of the elements, and separated by a comma, the data type of the elements. We can also specify the range of the indices explicitly by specifying a range like `array[0 .. 7, int]` or `array[-4 .. 3, int]`. The first specification is identical to the one in the above example program, and the second one would allow us to access `array` elements with index positions from `-4` up to `3`.[43]

When we declare an `array` instance variable, then all the contained elements get the default value binary zero. But we can also explicitly assign initial values like `a: array[8, int] = [1, 2, 3, 4, 5, 6, 7, 8]`. Here the expression on the right is Nim’s `array` constructor. Whenever we use an `array` constructor to initialize an array instance variable, then the number of elements that the constructor provides has to match the size of the array variable, and the element types have to match as well. To specify the element type of an `array` constructor, it is often enough to specify the type of the first element, so `[1.int8, 2]` is equivalent to `[1.int8, 2.int8]`. We can use `for` loops to iterate over all the elements of an `array`, in a similar way as we did for `strings`. The first `for` loop of the above program fills our `array`: for each of the `8` storage places in the `array`, we fill in some well-defined data. We use the `mitems()` `iterator` here because we want to modify the content of our `array`, filling in the numbers `1 .. 8`. In the next `for` loop, we square each storage location, and finally, we print the content. In the last `for` loop, we do not modify the content, so a plain `items()` instead of `mitems()` would work, but we have already learned that we don’t need to write the plain `items()` at all in this case.
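
The declarations discussed above can be tried out with a short sketch like the following. The variable names `b` and `c` are chosen for this example; `low()`, `high()`, and `len()` report the index bounds and the number of elements.

```nim
var a: array[8, int] = [1, 2, 3, 4, 5, 6, 7, 8]  # sizes must match
let b = [1.int8, 2]            # both elements have type int8
var c: array[-4 .. 3, int]     # eight elements, indices -4 .. 3
c[-4] = 10                     # first position
c[3] = 20                      # last position
echo len(a), " ", len(c)       # 8 8
echo low(c), " ", high(c)      # -4 3
echo sizeof(b)                 # 2, two elements of one byte each
```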

Sequences, called just `seq` in Nim, work very similarly to `arrays`, but they can grow:

var
  s: seq[int]
  v = 0
while v < 8:
  inc(v)
  add(s, v)
for el in mitems(s):
  el = el * el
for square in s:
  echo square

We start with an empty `seq` here and use the `add()` `proc` to append elements. After that, we can iterate over the `seq` as we did for the `array`.

In the same way as we access single characters of a `string` with the subscript operator `[]`, we can use that operator to access single elements of an `array` or a `seq`, as in `a[myPos]`. The slice operator is available for `arrays` and sequences too and can be used to extract sub-ranges or to replace multiple elements. Because `arrays` have a fixed length, the slice operator can only replace elements in them, but not remove or insert ranges. The first element position is generally `0` for `arrays` and sequences. `Arrays` can even be defined in a way that the index position starts with an arbitrary value, but that is not used that often. Whenever you use the subscript or slice operator, you have to ensure that you access only valid positions, that is, positions that really exist. `a[8]` or `s[8]` would be invalid in our above example: the `array` has only places numbered `0 .. 7`, and for the `seq`, we have added `8` values which now occupy positions `0 .. 7` as well, so position `8` in the `seq` is still undefined. We would get a runtime error if we tried to access position `8` or above, as well as if we tried to access negative positions. You might think that an assignment for a `seq`, such as `s[s.len] = 9`, is the same as `s.add(9)`, but only the `add()` operation works in this case.
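
A brief sketch of the subscript and slice operators applied to a `seq`. The `@[...]` notation used here is Nim's constructor for sequences; the values are arbitrary.

```nim
var s = @[1, 2, 3, 4, 5, 6, 7, 8]  # @[...] constructs a seq
echo s[0]              # 1, the first element
echo s[2 .. 4]         # @[3, 4, 5], a sub-range
s[0 .. 1] = [10, 20]   # replace two elements in place
s.add(9)               # append a new element at position 8
echo s.len             # 9
echo s[s.len - 1]      # 9, the last valid position
```

After the `add()` call, position `8` exists and can be accessed with the subscript operator like any other position.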

Note that in some languages like Julia `arrays` start at position `1`.[44] Nim `arrays` can have an arbitrary integral start position, including negative start positions, but the start position as well as the highest subscript position are determined in the program source code and can not change at runtime. We say that `arrays` have fixed compile-time bounds. Sequences always start at position `0`, we can specify an initial size, and we can always add more elements at runtime.

`Arrays` and sequences allow fast access to their elements: All the elements are stored in a contiguous memory block in RAM, and the start location of that memory block is well-known. As all the elements have the same byte size, it is an easy operation to find the memory location of each element. The compiler uses the start location of the `array` or `seq`, and adds the product of subscript index and element byte size. The result is the memory location of the desired element, which was selected by the index used in the subscript operator. When the `array` should not start at position `0`, then the compiler would have to adjust the index, by subtraction of the well-known start index. This operation doesn’t take much time, but nonetheless, `arrays` starting at position `0` can be slightly faster. As mentioned earlier, the compiler must perform a multiplication operation between the index position and element size — a task that involves integer multiplication and is consequently quite fast. When the element size is a power of two, then the compiler can even optimize the multiplication by using a simple shift operation, which can be even faster, depending on the CPU being used.
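
The address calculation described above can be checked with a small experiment. This is only a sketch that uses `addr` and `cast`, which are explained in more detail later; the names `base` and `computed` are made up for the example.

```nim
var a: array[8, int]
let base = cast[int](addr a[0])     # start address of the memory block
for i in 0 .. high(a):
  # start address plus index times element size
  let computed = base + i * sizeof(int)
  doAssert cast[int](addr a[i]) == computed
echo "element size: ", sizeof(int), " bytes"
```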

It should not be surprising that the internal structure of sequences is a bit more involved than that of `arrays`. `Arrays` are indeed nothing more than a block of memory, generally allocated on the stack for local data or allocated in the BSS segment for global data. Don’t worry if you do not yet have an idea of what the stack, the heap, and a BSS segment are; we will learn about them soon. The Nim `seq` data type, having a variable size, clearly requires not just a storage location for its elements, but also a counter to track its current number of elements and another counter for its maximum capacity. The element counter must be updated when we add or delete elements, and when the counter tells that there is currently no more space available for more elements, then a new block of memory must be allocated, and the existing elements must be copied from the old location into the newly allocated memory region before the old memory region can be released.[45] Due to this additional effort appending elements to a `seq` by using the `add()` `proc` is not extremely fast. You may wonder why we do not have to save size information for `arrays`. Well `arrays` have fixed sizes, so it is obvious that we never have to adjust something like a size counter, simply because the size would never change. But should we store the desired initial size of the `array`? In a way, yes. However, it is a constant value. During the compilation process, the compiler can already catch some errors for us — if we have an `array` as above with size 8, then the compiler would already be able to recognize some invalid access to `array` elements at compile time — a[9] would surely be a compile-time error. However, at runtime, when we execute our program, access to a non-existent index position may occur, for example, with constructs like `var i = 9; a[i] = 1`, when the `array` is declared as `var a: array[8, int]`. 
For catching that type of error, the compiler has to store the fixed `array` size somewhere and check against that value when an `array` access by using the subscript operator with a non-constant argument occurs, as the `a[i]` above. One related remark: Accessing `array` elements is as fast as ordinary variable access when we use a constant value as an index; that is, a constant literal or a named constant. The reason for this is, that when the index is a constant, then the compiler just knows the exact position of that `array` element in memory, just as it knows the address of plain variables, so there is no need for address calculations at runtime. Indeed, to access an `array` element at a specific constant index position, the compiler only needs to add a constant value to the current stack pointer, given that `arrays` are stored on the stack. To access a constant position in a `seq`, the compiler would have to add a constant to the base address of the memory block that contains the `seq` data.

Typically, if we need a container data type and its size is known at compile time, we use an `array` instead of a `seq`. This is because a `seq` has some minimal overhead and the compiler is better at detecting out-of-range access for `arrays` than for `seqs`. But there is one exception: Array instances declared inside of procedures and functions are stack-allocated, which ensures optimal performance for the allocation. However, we must remember that the stack size of a program is an OS-dependent constant and is generally not very large by default. On Linux, the default stack size is often only 8 MB, so it is clear that we cannot use arrays that are larger. We would use a `seq` in that case. Indeed, Linux users can use the ulimit command to increase the maximum stack size, but this is generally not recommended. Typically, very large stacks are not needed, and a restricted stack makes it easier for the OS to kill a program that does unlimited recursion due to a bug.

We said that appending elements to sequences is not extremely fast — indeed, it is several times slower than accessing an `array` element by its index using the subscript operator. So, when we know that our `seq` will need to contain at least a certain number of elements, it can be more performance-efficient to allocate the `seq` with this size from the beginning and then fill in the content using the subscript operator, rather than appending all the elements one by one. Here is one example:

var s: seq[int] = newSeq[int](8)
var i: int
while i < 8:
  s[i] = i * i
  inc(i)

We use the `newSeq()` procedure to initialize the sequence. The content of the square brackets instructs the `newSeq()` `proc` to create a sequence with a base type of `int`, and the number `8` as an argument indicates that the newly created sequence should contain `8` elements, each with the default value of 0. This procedure is what is known as a generic `proc`, and it requires additional information, specifically, the data type of the elements. Don’t confuse the square brackets in the `newSeq[int]()` call with the subscript operator `a[i]` used for `array` access, as they are completely unrelated. Note that the initialization of the `seq` above does not restrict its use in any way: we can still use it like an uninitialized `seq`, that is, we can use the `add()` `proc` to add more elements, we can insert or delete elements, and so on.
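
The following sketch contrasts `newSeq()` with `newSeqOfCap()` from Nim's standard library, which only reserves memory but creates no elements. The variable names `filled` and `reserved` are chosen for this illustration.

```nim
var filled = newSeq[int](8)         # 8 elements, all with value 0
var reserved = newSeqOfCap[int](8)  # 0 elements, but room for 8 reserved
echo filled.len     # 8
echo reserved.len   # 0
filled[7] = 49      # valid, position 7 exists
reserved.add(49)    # reserved[0] = 49 would fail, position 0 does not exist yet
filled.add(64)      # a pre-sized seq can still grow
echo filled.len     # 9
```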

Deleting elements from an `array` or a sequence can be very slow, particularly when we use the naive approach of moving all the elements located after the element that should be removed one position forward.[46]

This would maintain the order in the container, so sometimes this is the only solution, but of course, moving all the entries is expensive for large containers. Nim’s standard library provides the `delete()` function for this order maintaining delete operation. A much faster way to delete an entry in a `seq` or array is to remove the last entry and replace the one that should be deleted with that last entry. This operation moves the last entry to replace the one that should be deleted, so the order of elements is not maintained. Nim’s standard library provides the `del()` function for this faster, but order-changing delete operation. Naturally, we should use `del()` when the order is not important. The `delete()` and `del()` functions are actually only available for sequences, as `arrays` have a fixed size — but in principle, we could do similar operations with `arrays` as well; we just have to store the actual used size somewhere. [47]
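
The difference between both operations becomes visible when we apply them to the same data. A small sketch, with variable names made up for the example:

```nim
var ordered = @[1, 2, 3, 4, 5]
ordered.delete(1)     # remove the element at index 1, keep the order
echo ordered          # @[1, 3, 4, 5]

var unordered = @[1, 2, 3, 4, 5]
unordered.del(1)      # overwrite index 1 with the last element, then shrink
echo unordered        # @[1, 5, 3, 4]
```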

In the section about `strings`, we mentioned that `strings` have value semantics. In other words, an assignment like `str1 = str2` creates a copy of `str2`, making `str1` and `str2` fully independent entities. As a result, modifying one does not change the content of the other. `arrays` and sequences behave in the same way; both have value semantics too. Indeed, `arrays` are true value types in Nim, as they live on the stack in the same way that plain variables like integers, `floats`, or characters do. Sequences have a dynamic data buffer, which is allocated on the heap, so it would be possible for an assignment like `seq1 = seq2` not to copy the data buffer but to reuse the old one. In that case, the two sequences would not be independent; `seq1` would be an alias for `seq2`. This is referred to as reference semantics, and some languages, such as Ruby, behave in this way. But in Nim, `arrays`, `strings` and sequences have value semantics; an assignment creates an independent copy. We will learn more details about reference semantics and the use of the stack or heap to store data soon when we discuss references to `objects`.

Some details

To conclude this section, let us investigate some internal details of `arrays` and sequences. Beginners who are not yet familiar with the concept of `pointers` should probably skip this subsection and perhaps come back later. We could consult the Nim language manual or the compiler’s source code to learn more details about `arrays` and sequences. Or we can write some code to test properties and behavior. Let us start by investigating an `array`:

proc main =
  var a: array[4, uint64]
  echo sizeof(a)
  a[0] = 7
  echo a[0]
  echo cast[int](addr a)
  echo cast[int](addr a[0])

  var a2 = a
  a[0] = 3
  echo a2[0]

main()

When we run this program, we get this output:

32
7
140734216410384
140734216410384
7

The size of the entire `array` is 32 bytes, as we have 4 elements, each of which is 8 bytes in size. The address of the `array` itself and the address of its first element are identical. This result is expected, as the `array` is a plain block of memory stored on the stack. Remember that the actual address values may differ with each run of our program and will be entirely different on different computers, because the OS chooses the memory area in which our program runs. The `array` also follows copy semantics: when we create a copy called `a2` and later modify `a`, the content of `a2` remains unchanged. That’s not really surprising, so let’s investigate a sequence:

proc main =
  var dummy: int
  var s: seq[int64]
  echo sizeof(seq)
  echo sizeof(s)
  s.add(7)
  echo s[0]
  echo cast[int](addr dummy)
  echo cast[int](addr s)
  echo cast[int](addr s[0])

  var s2 = s
  s[0] = 3
  echo s2[0]

main()

When we run the above code, we get:

8
8
7
140732171249104
140732171249112
140463681433696
7

The first two lines of the output might confuse us, as a size of only 8 bytes could indicate a plain `pointer` value on a 64-bit system. Indeed, the sequence is not a large `object` that contains size and capacity fields, but only a tiny `object` that contains a single `pointer` to the data storage of that sequence. We know that it is not a plain `pointer` or `ref` because we cannot assign `nil` to sequences or test them for `nil`. (But an `object` which contains only a `pointer` is basically identical to a plain `pointer`, as Nim `objects` have no overhead as long as we do not use inheritance and no padding to word size is needed for tiny fields like `int8`.) Capacity and length are also stored in the memory block that is allocated for the elements, as long as the sequence is not empty. Thus, empty sequences don’t consume much memory even when we have many of them, such as in `arrays` or sequences of sequences (matrices).

We use the dummy `int` variable in the code above because we know that plain ints are stored on the stack. When we compare the addresses of our dummy variable and our sequence, we see that they are close neighbors, so the `seq` `object` is also stored on the stack. But the address of `s[0]` is very different, indicating that the data buffer is stored in a different memory region, which is the heap. If we continuously added elements to the `seq`, the address of `s[0]` would eventually change, while the address of `s` would always remain the same. That is because the capacity of the data buffer would become exhausted at some point and a new data buffer with a different address would be used.

Finally, we observe again that the sequence follows copy semantics, as the content of the copy `s2` remains unchanged when we modify the original sequence `s`. We could try to discover some more details of the internals of Nim’s sequences, for example, where the capacity and size are stored. However, these are internal details that might not necessarily interest us, as they could change with new compiler versions or different compilers.

However, if you still have doubts about what we have explained, let’s delve one layer deeper. We strongly believe that a `seq` needs a length and a capacity field, and we assume that their data type should be `int`. We said that both fields should be adjacent to the buffer of the `seq` elements, which means at the start or at the end. Obviously, we cannot access the end as long as we do not know the capacity, so the capacity field should be at the start, and the length field as well. We may find out which one is which by observing the content when the `seq` grows. So let us write some code:

proc main =
  var
    s: seq[int64] = newSeqOfCap[int64](4)
    s2: seq[int64]
    p: ptr int

  var h = cast[ptr int](addr s2) # prove that an uninitialized seq is indeed a pointer with nil (0) value
  echo cast[int](h) # address on stack
  echo h[] # value (0)
  echo ""

  for i in 0 .. 8:
    s.add(i)
    echo cast[int](addr s[0])
    p = cast[ptr int](cast[int](addr s[0]) - 8) # capacity
    echo p[]
    p = cast[ptr int](cast[int](addr s[0]) - 16) # length
    echo p[]

main()

The output when we run the program is:

140725732630192
0

140251431497824
4
1
140251431497824
4
2
140251431497824
4
3
140251431497824
4
4
140251431506016
8
5
140251431506016
8
6
140251431506016
8
7
140251431506016
8
8
140251431510112
16
9

Don’t worry if you do not understand the program and its output yet. You will better understand it when you have read the sections about references, `pointers`, and memory management. The first two output lines show us that an uninitialized `seq` is just a `pointer` with the value `nil`. The remaining output lines show us the address of the first `seq` element, the capacity, and the length of the `seq` whenever we add an element. We started with a `seq` with an initial capacity of `4`, so address and capacity are constant while we add the first 4 elements. Then the capacity of the allocated buffer is exhausted. A new buffer with a different address and doubled capacity is allocated, the already contained elements are silently copied to the start of the new buffer, and so on.

Multidimensional arrays and sequences

Nim does not support multidimensional arrays and sequences (also called matrices or tensors) as default built-in data types. However, we can create ordinary one-dimensional `arrays` and sequences, and each container element can be made an `array` or sequence again. For a two-dimensional matrix, we would then access an element with two indices like `m[i][j]`. To simplify element access, we can define a `template` for ourselves to just write `m[i, j]` instead. We can extend this to more than two dimensions. If you require matrices and tensors, you should also consider the use of external libraries, such as Arraymancer. Arraymancer is optimized for performance and also supports parallel operations like parallel matrix multiplication. In this section, we will present a few simple use cases for creating two-dimensional matrices and accessing their elements. This should be enough to get you started.

First, let’s create a chess board:

const
  Rows = 8
  Cols = 8

type
  Fig = int8
  Col = array[Rows, Fig]
  Board = array[Cols, Col]

var b: Board

const
  a = 0
  rook = 5 # whatever makes sense

b[a][0] = rook
echo b[a][0] # 5

# with user-defined templates we can simplify the index notation
template `[]`(b: Board; i, j: int): int8 =
  b[i][j]

template `[]=`(b: var Board; i, j: int; v: int8) =
  b[i][j] = v

b[a, 0] = rook
echo b[a, 0] # 5

Now, let’s investigate the case where one or both dimensions of the matrix can grow during program runtime, so we make those dimensions a `seq` instead of an `array`.

type
  T1 = array[4, seq[int]]
  T2 = seq[array[2, int]]
  T3 = seq[seq[int]]

var t1: T1
t1[0] = @[1, 2, 3]
t1[1].add(7)
echo t1[0][0] # 1
echo t1[1][0] # 7

var t2: T2
t2.add([1, 2])
echo t2[0] # [1, 2]

var t2x = newSeq[array[2, int]](10) # pre-allocate 10 rows
t2x[7] = [5, 6]
echo t2x[7] # [5, 6]

var t3: T3
t3.add(@[1, 2, 3])
t3.add(newSeq[int](1))
t3[1][0] = 19
for row in t3:
  echo row # @[1, 2, 3], @[19]

If both dimensions are dynamic, you can also use the `newSeqWith()` template from the `sequtils` module. The following example is taken from the documentation of that module:

import std/sequtils
## Creates a seq containing 5 bool seqs, each of length of 3.
var seq2D = newSeqWith(5, newSeq[bool](3))
assert seq2D.len == 5
assert seq2D[0].len == 3
assert seq2D[4][2] == false

## Creates a seq with random numbers
import std/random
var seqRand = newSeqWith(20, rand(1.0))
assert seqRand[0] != seqRand[1]

Using seq/array types to create a matrix makes a lot of sense when the matrix is densely populated. For sparse matrices, using a hash table instead may save memory.
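To make the remark about sparse matrices concrete, here is a minimal sketch that stores only the occupied cells in a hash table keyed by the `(row, column)` pair. The type name `SparseMatrix` and the sample coordinates are assumptions made for this illustration; a real application might prefer a dedicated library.

```nim
import std/tables

# Hypothetical alias for this sketch: only cells that were set are stored.
type SparseMatrix = Table[(int, int), float]

var m: SparseMatrix = initTable[(int, int), float]()
m[(3, 100_000)] = 2.5
m[(70_000, 12)] = -1.0

echo m.len                          # 2, only two cells occupy memory
echo m.getOrDefault((3, 100_000))   # 2.5
echo m.getOrDefault((0, 0))         # 0.0, the default for cells never stored
```

A dense `seq[seq[float]]` covering the same index range would need billions of elements, while the table only grows with the number of cells actually used.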

When iterating over matrices, keep in mind that for memory accesses such as `m[i, j]` and `m[i, j + 1]`, the RAM is accessed sequentially with good cache support. However, when the first index changes, we access memory regions that are far apart, which means poor cache support. We should keep this in mind, as it can significantly impact performance. Sometimes we can optimize loops for matrix access by changing the iteration order, that is, by iterating by rows instead of by columns or vice versa.
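The two iteration orders can be sketched as follows. Both procedures compute the same sum, but the first one varies the second index in the inner loop and therefore walks through memory sequentially. The names `Matrix`, `sumRowWise`, and `sumColumnWise` are made up for this example, and for such a tiny matrix no speed difference will be measurable.

```nim
const
  Rows = 4
  Cols = 3

type Matrix = array[Rows, array[Cols, int]]

proc sumRowWise(m: Matrix): int =
  # Inner loop changes the second index: neighboring memory cells.
  for i in 0 ..< Rows:
    for j in 0 ..< Cols:
      result += m[i][j]

proc sumColumnWise(m: Matrix): int =
  # Inner loop changes the first index: jumps of a whole row per step.
  for j in 0 ..< Cols:
    for i in 0 ..< Rows:
      result += m[i][j]

var m: Matrix
for i in 0 ..< Rows:
  for j in 0 ..< Cols:
    m[i][j] = i * Cols + j   # fill with 0, 1, 2, ..., 11

echo sumRowWise(m)      # 66
echo sumColumnWise(m)   # 66
```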

Slices

Nim slices are `objects` of type `Slice` with two fields, a lower bound (`a`) and an upper bound (`b`). The `system` module also defines the `HSlice` `object`, called a heterogeneous slice, for which the lower and upper bound can have different data types:

type
  HSlice*[T, U] = object   ## "Heterogeneous" slice type.
    a*: T                  ## The lower bound (inclusive).
    b*: U                  ## The upper bound (inclusive).
  Slice*[T] = HSlice[T, T] ## An alias for `HSlice[T, T]`.

As the `Slice` and `HSlice` `objects` are not built-in types, their names start with capital letters. `Slices` are not used that often directly, but mostly indirectly with the `..` range operator, e.g. to access sub-ranges of `strings` and other containers.

One example of its direct use from the `system` module is

proc contains*[U, V, W](s: HSlice[U, V], value: W): bool {.noSideEffect, inline.} =
  result = s.a <= value and value <= s.b

`Slices` are used by functions of the standard library or by user-defined functions to access sub-ranges of `strings`, `arrays`, and sequences. Typically, we do not use an explicit `Slice` `object`, but we create the `Slice` by use of the infix `..` operator, which takes two integers and returns a `Slice` with these bounds.
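A short sketch shows what the `..` operator returns when it is used on its own, outside of a subscript or a `for` loop. The variable name `bounds` is chosen just for this example.

```nim
let bounds = 2 .. 5    # constructs an HSlice[int, int], that is, a Slice[int]
echo bounds.a          # 2
echo bounds.b          # 5
echo bounds.len        # 4, as both bounds are inclusive
echo 3 in bounds       # true, `in` calls contains()
echo 6 in bounds       # false
```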

Applied to container data types, slices look syntactically like sub-ranges:

var m = "Nim programming is difficult."
m[19 .. 28] = "not easy."
echo m
echo "Indeed " & m[0 .. 18] & "much fun!"
var s = HSlice[int, int](a: 0, b: 18)
echo "Indeed " & m[s] & "much fun!" # the same as line four

In line two, we use the slice to replace the sub-string "difficult.", which starts at position `19`, with another `string`. The actual `Slice` `object` is constructed by the `..` operator and the two integer bounds. Note that the replacement can be a longer or a shorter `string`, that is, the slice supports not only overwriting characters but also inserting or deleting operations. In line four, we use the slice to access a sub-string and create a new `string` from it. As we learned earlier in the Strings section already, we can use the `^` operator to access elements counted from the end of the container, so we could have also written line two as `m[19 .. ^1] = "not easy."`. The last two lines in the above example show that we could have instead used a real `HSlice` `object` to access the sub-string.

`Slices` can be used in a similar way for `arrays`, `strings`, and sequences. But we have to remember that `Slices` are only `objects` with a lower and an upper bound, so there must always be a procedure that accepts the container and the `Slice` as arguments to do the real work.
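To illustrate that a slice only does something when a procedure accepts it, the following sketch defines a subscript operator for a small custom container. The `Playlist` type and its `titles` field are hypothetical and exist only for this example.

```nim
type Playlist = object       # hypothetical container type
  titles: seq[string]

proc `[]`(p: Playlist; s: Slice[int]): seq[string] =
  ## Returns copies of the titles whose positions lie within the slice.
  for i in s.a .. s.b:
    result.add(p.titles[i])

let p = Playlist(titles: @["intro", "verse", "chorus", "outro"])
echo p[1 .. 2]    # @["verse", "chorus"]
```

Without this procedure, the expression `p[1 .. 2]` would not compile, as the `Slice` itself knows nothing about the container it is applied to.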

When we are concerned with achieving the utmost performance, we have to be a bit careful with `Slices` as their use can generate copies. Consider this example:

type
  O = object
    i: int

proc main =
  var
    s = newSeq[O](1000000)
  for i in 0 .. (1000000 - 1):
    s[i] = O(i: i)

  var sum = 0
  for x in s[1 .. ^1]:
    sum += x.i

main()

Here, we use the slice construction operator `..` to exclude the first element from our summing operation. Unfortunately, when we use the slice operation in this way, the Nim compiler may create a copy of our sequence, which increases the run-time and memory consumption. At least for Nim versions up to 1.6, this was the case. Newer versions may use view types instead to avoid the copy. We may try to use the new `toOpenArray()` expression and attempt a construct like

  for x in items(s.toOpenArray(1, s.high)):

but that currently does not compile.

One current option is to create a custom `iterator` like:

iterator span*[T](a: openArray[T]; j, k: Natural): T {.inline.} =
  assert k < a.len
  var i: int = j
  while i <= k:
    yield a[i]
    inc(i)

and use

for x in s.span(1, s.high):

Alternatively, we may perform the summing in a procedure and pass that `proc` an `openArray` created with `toOpenArray()`, as shown below:

proc sum(x: openArray[O]): int =
  for el in x:
    inc(result, el.i)
echo sum(s.toOpenArray(1, s.high))

But this is a work in progress, so the situation may improve.

Value objects and references

We have already used different types of variables — integers, `floats`, characters, the custom Computer `object`, and some more. We said that variables are named memory regions or storage locations where the content of our variables is stored. These kinds of variables are sometimes called value types — to distinguish them from pointers and references.

Value types always imply copies when we do an assignment:

var i, j: int
i = 7
j = i
i = 3
echo i, " ", j

Here, we have three assignments: first, we assign the integer literal `7` to the variable `i`; next, we assign the content of variable `i` to variable `j`; finally, we overwrite the old content of variable `i` with the new literal value `3`. The output of the `echo()` statement should be `3` and `7` because, in line 3, we copy the content of variable `i`, which is currently the value `7`, into variable `j`. The new assignment in line 4 in no way touches the content of variable `j`.

In section Objects we saw that the fields of `object` types like our `Computer` data type behave in the same way — assignments copy the content. The `tuple` data type, which has some similarities to `objects`, and which we will introduce later in the book, behaves the same. All these data types are stack-allocated, and we say that the data types have value or copy semantics. Even `strings` and sequences, which actually use a heap-allocated data buffer, behave in the same way in Nim.
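The following sketch repeats the copy experiment for an `object`, a `tuple`, and a sequence. The `Point` type is a hypothetical stand-in for any plain `object` type, and the field values are arbitrary.

```nim
type
  Point = object     # hypothetical stand-in for any plain object type
    x, y: int

var p1 = Point(x: 1, y: 2)
var p2 = p1          # copies both fields
p1.x = 10
echo p2.x            # 1, the copy is unaffected

var t1 = (name: "Nim", year: 2008)
var t2 = t1          # tuples are copied as well
t1.year = 0
echo t2.year         # 2008

var s1 = @[1, 2, 3]
var s2 = s1          # even the heap-allocated buffer is copied
s1[0] = 99
echo s2[0]           # 1
```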

Whenever possible, we should use this simple form of variables, as they are fast and easy to use.[48]

Perhaps that is not too surprising for you, but if we had references instead of plain variables, the situation would be different, as we will see soon. Actually, some other programming languages use reference semantics for entities like `strings` by default. In Ruby, for example, assigning a `string` variable to another variable does not copy the content, so both variables still use the same data buffer, and when we then modify one variable, the content of the other changes too.

However, there are situations where we need some sort of indirection, and that’s when references and `pointers` come into play. For example, when the data entities depend on each other in some form, the elements may build linked lists, trees, or other structures. The entities may have some neighborhood relation, also called a many-to-one relation.

Indeed, value `objects` and references occur in real life also:

Imagine you have baked a cake for your family, and you know that your friendly neighbor loves cakes too. As you still have plenty of all the necessary ingredients and the oven is still hot, you make one more identical cake to give to your neighbor later. We can think of the cake as a value type, and your second cake can be considered a copy. When you give the copy to your neighbor, you still have your own, and when either you or the neighbor eats the cake, the other one still exists.

Now imagine that you know a good car repair shop. You can give the telephone number or location of that car repair shop to your neighbor, so he can use that shop too. So you gave him a reference to the shop, not a copy. You can also give some of your other friends a reference to that shop, which requires nearly no effort for you, while baking a cake for all of them would require significant effort. But there is some danger with references: When one of your friends gets angry and burns down the car repair shop, then you and all your other friends have a serious problem.

You can regard the names of persons as some sort of reference too. Imagine you have a list with the names of all the people you intend to invite to your birthday party and another list with the names of people who owe you money. Some names may appear on both lists, indicating that they refer to the same person.

In computers, dynamic storage, called RAM, consists of consecutive, numbered storage locations, called words. Each individual word has its address, which is a number typically starting at zero and extending to a value, which is defined by the amount of memory available on your computer.[49] These addresses can be used to access the storage locations, that is, to store a value at that address, or to read the content again. Reading generally does not modify the content, you can read it many times and will always get the same value. When you write another value to that storage location, then the old content gets overwritten, and further reads will give you the new value.

Basically, for all the data that you use in your program, you need its address in the RAM in some form. Without the address, you cannot access it. But what about all the plain value variables we have used before? We have never used addresses. That is true: we used only names to access our variables, and the compiler mapped our chosen name to the actual address of the variable in memory whenever we accessed it. For most simple cases, this is the best way to access variables.

Now, let us assume we have such a value-type variable declared in our program. Can we access it without using its name? Once declared, it resides somewhere in the RAM while the program is executed. One way to access its content is to first determine its address from its name; afterwards we can access it either by name or by its memory location. Nim has the `addr()` function for this purpose: we give it the name of our variable as an argument and get its address. But this is rarely useful. If we can already access the variable by name, why should we use its address? One of these rare cases is when we want to call a C function that has an address parameter and pass our variable to it.

Now, let us assume that we do not want to access our variable by name and that we do not know its address. Can we still access it? Well, we could search the whole RAM for the desired content. In practice, we would never do that, as it is stupid and would take very long, but we could do it. But how can we detect our variable? How can we be sure that it is indeed ours? Generally, we cannot. Even if we knew the value stored in that variable, we would only know what bit pattern it should have. Consequently, for most words of the RAM with a different bit pattern, we could say for sure that it cannot be our variable. However, whenever we find the expected bit pattern, it could just be a coincidence, as there could be many more words in RAM with that content. In some way, it is as if you were searching for a person who lives on a long road with numbered houses. If you only know that the person wears brown shoes, but you know neither the house number, nor the name of the person, nor any other unique property, then you will not have much luck.

References and pointers

Introduction to pointers

In Nim, references are some form of smart or managed pointers. We will learn more about references later. The plain `pointer` data type is nothing more than a memory address. It is similar to an (unsigned) integer number. We say that a `pointer` points to an entity when the `pointer` contains the memory address of that entity.

Besides the `pointer` data type, which is just a RAM address, we also have the `ptr` entity. `ptr` is not a data type on its own; it is always used in conjunction with another data type:

var
 p: pointer
 ip: ptr int

Here the variable `p` is of type `pointer`, we could use it to point to some arbitrary memory address. The variable `ip` is of the type `ptr int`, which indicates that it should only point to memory addresses where a variable of data type `int` resides. So a `ptr` is a `pointer` that is bound to a specific data type. Generally, we speak only about `pointers`. Whether we are referring to an untyped `pointer` or a typed `ptr` is typically clear from the context.

When we only declare `pointers` but do not assign a value, then the `pointers` have the value `nil`, which indicates that they are regarded as pointing to nothing. Strictly speaking, a `pointer` can never point to nothing, in the same way that an integer variable cannot contain no number. Just as an integer variable always contains a bit pattern, a `pointer` also always contains a bit pattern. But we are free to define a special pattern as `nil`, and whenever a `pointer` has this special value, then we know that it does not really point to something useful. In C, NULL was chosen instead of `nil` for the same purpose. In practice, `nil` and NULL are typically mapped to `0`, that is, a word with all bits cleared. However, this is more or less an arbitrary decision.

So how can we give our `pointers` above a useful value?

One possibility is to use Nim’s `addr()` function, which provides us with the memory address of each ordinary variable.

var
 number: int = 7
 p: pointer
 ip: ptr int
echo cast[int](p)
echo cast[int](ip)
p = addr(number)
ip = addr(number)
echo cast[int](p)
echo cast[int](ip)

First, we declare an ordinary integer variable called `number` which will reside somewhere in memory when we execute the program, and then we use the `addr()` function to assign the address of that variable to `p` and `ip`. The `addr()` function is a low-level function provided by the compiler. It can be used to determine the memory address of variables and some other entities known to the compiler.[50] We used the `echo()` `proc` to show us the numeric decimal value of the addresses in the terminal. Since it typically doesn’t make much sense to print addresses, `echo()` would refuse to do so. Therefore, we have used the construct `cast[int](someValue)` to instruct `echo()` to regard our `pointers` as plain integers and print them. That operation is called casting. We should mostly avoid it because it destroys type safety, but for learning purposes, it’s acceptable to use it. We will learn more about casts and related type conversions later.

The first two `echo` statements should print the decimal value `0`, as the `pointers` initially have the default value `nil`.

The `echo()` functions in the last two lines should print a value different from `0`, as we have assigned the valid address of an ordinary variable that resides in the RAM when the program is executed. Both outputs should be identical, as we have assigned `addr(number)` to each of the `pointers`.

An interesting fact, perhaps, is that when you run the program multiple times, the last two `echo()` statements may print different values. But that is not really surprising: whenever you launch the program, a storage location in RAM is reserved for our variable `number`, and that location can vary with each new program execution. Just like on your next holiday at the same hotel, you might get a different room.

So when we have the `pointer` `ip` pointing to a valid address, can we recover the content of that memory region? Sure, we use the dereference operator `[]` for that purpose. Whenever we have a typed `pointer` `x`, we can use `x[]` to get the content of the memory location to which the `pointer` points. Note that the operator `[]` is not really related to the subscript operator `[pos]` that we used earlier for `array`, `seq`, and `string` access. Nim uses ASCII characters for its operators, and that set is not very large. And maybe it would even be confusing if we had a different symbol for each operator. We can consider `[]` as some form of content access operator: `mystring[pos]` gives us the character at that position, and `ip[]` gives us the content of the memory location to which `ip` points.

var
 number: int = 7
 ip: ptr int
echo cast[int](ip)
ip = addr(number)
echo cast[int](ip)
echo ip[]

What do you expect the output of the last `echo()` statement to be? Note that for the last `echo()` statement we do not need a cast, as `ip[]` has a well-defined type: `ip` has type `ptr int`, so `ip[]` is of well-defined type `int`, and `echo()` can print the content.

Now, let us investigate how we can use `pointers` to modify the content of variables:

var
 number: int = 7
 ip: ptr int
ip = addr(number)
echo ip[]
ip[] = 3
echo ip[]
echo number

What do you expect for the output of the last `echo()` statement? Well, remember, `ip` points to the location where the variable `number` is stored in RAM. So `echo ip[]` gave us the content of `number`. Now `ip[] = 3` is an assignment, and the right side of the assignment operator is the literal number `3`, which is a value type. Earlier we said that for value types an assignment is a copy operation: the right side of the assignment operator is copied into the variable on the left side. Now `ip[]` stands for exactly the same content as the variable `number`, so assigning to `ip[]` is the same as assigning to `number`, and the last `echo()` statement prints `3`.
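Since a freshly declared `pointer` has the value `nil`, dereferencing it before assigning a valid address would crash the program. The following sketch guards the dereference with a `nil` test. The helper `readOrFallback` and its parameter names are hypothetical and only serve this illustration.

```nim
proc readOrFallback(ip: ptr int; fallback: int): int =
  ## Hypothetical helper: dereferences `ip` only when it is not nil.
  if ip.isNil:
    fallback
  else:
    ip[]

var
  number = 7
  ip: ptr int                       # implicitly nil

echo readOrFallback(ip, -1)         # -1, there is nothing to dereference yet
ip = addr(number)
echo readOrFallback(ip, -1)         # 7
ip[] = 3                            # writes through the pointer
echo number                         # 3
```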

Pointer arithmetic

In low-level programming languages, `pointer` arithmetic can be useful. For example, old C code often iterates with `pointer` arithmetic over `arrays` using constructs such as `sum += *(myIntPtr++)`. This was done to maximize performance. Modern C compilers generally understand statements like `sum += el[i]; i++` and generate very efficient assembly instructions for them. Therefore, `pointer` arithmetic is not as necessary for C as it once was.

Nim does not provide math operations for `pointers` directly, but we can always cast `pointers` to integers and do arbitrary math. And of course, we could define our own operators for that purpose, but typically we should avoid that, as it is dangerous, error-prone, and generally not necessary. As an example, let us sum up some `array` elements:

proc main =
  var
    a: array[8, int] = [0, 1, 2, 3, 4, 5, 6, 7]
    sum = 0
  var p: ptr int = addr(a[0])
  for i in a.low .. a.high:
    echo p[]
    sum += p[]
    echo cast[int](p)
    var h = cast[int](p); h += sizeof(a[0]); p = cast[ptr int](h)
    #cast[var int](p) += sizeof(a[0]) # this compiles but does not work currently

  echo sum
  echo typeof(sizeof(a[0]))

main()

When we do `pointer` arithmetic or similar math to calculate the address of variables in the computer memory, then memory addresses are used like integer numbers, and so it makes some sense that Nim’s integers have the same byte size as `pointers`. Note that for arrays, `addr(a[0])` is identical to `addr(a)`, because an array is just a memory block, and the address of the block is identical to the address of the first element. Actually, in the general case, we should have used `addr(a[a.low])` instead of `addr(a[0])`, since `array` indices don’t necessarily have to start at position zero. For sequences and strings, `addr(s[0])` is not identical to `addr(s)`, as sequences and strings are `objects` that manage a separately allocated data buffer along with other data like the capacity. When we have to pass the data buffer of `strings` or sequences to C functions, we typically pass `addr(s[0])`, or in the case of `strings`, we may pass `s.cstring`.
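We can check both address claims with a small experiment. This is only a sketch: the two procedure names are invented for this example, and the array deliberately uses an index range that does not start at zero.

```nim
proc arraySharesAddress(): bool =
  var a: array[5 .. 8, int]    # indices run from 5 to 8, so a.low is 5
  # The block and its first element start at the same address.
  cast[int](addr(a)) == cast[int](addr(a[a.low]))

proc seqSharesAddress(): bool =
  var s = @[1, 2, 3]
  # The seq variable and its data buffer are two different memory locations.
  cast[int](addr(s)) == cast[int](addr(s[0]))

echo arraySharesAddress()   # true
echo seqSharesAddress()     # false
```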


Allocating objects

In the previous section, we learned the basics of `pointers`. We used the `addr()` operator to initialize the `pointer` by assigning the address of an existing entity. However, this approach isn’t commonly used in practice and can be somewhat risky, as it’s not always guaranteed that the variable we apply `addr()` to will persist for the lifetime of our `pointer`. As a result, our `pointer` might eventually point to a memory location that’s already been freed or is now occupied by a completely different object. For this reason, the use of `addr()` is generally reserved for experienced programmers who have a firm understanding of its implications. Typically, `addr()` is unnecessary except in instances of low-level code, such as when interfacing with external libraries written in C. Instead of using `addr()` to assign a valid address to `pointers`, procedures such as `alloc()` or `create()` are often employed to reserve a block of memory:

var ip: ptr int
ip = create(int)
ip[] = 13
echo ip[] * 3
var ip2: ptr int
ip2 = ip
echo ip2[] * 3
dealloc(ip)

Here, the procedure `create()` is used to reserve a block of memory. The `int` parameter ensures that the block has the size of an integer value. After `ip` has a valid value, we can store a value in that memory location and read it again. Note that multiple `pointers` can point to the same memory location: We declared one more `ptr int` called `ip2`. However, for that `pointer`, we do not allocate a new block; instead, we assign the old block that we allocated for `ip` to `ip2`. Now both `pointers` point to the same object, the `int` value `13`. We may call `ip2` an alias, as it is a different way to access the same entity.

When we use `alloc()` or `create()` to allocate memory blocks, we have to deallocate them when we no longer need them. Otherwise, those memory blocks couldn’t be reused. If we continuously allocated memory blocks and never deallocated, or freed, them, at some point all memory would be occupied, not only for our own program but for all programs currently running on the same computer. We would then have to terminate our program, because when a program is terminated, all its resources are automatically freed by the OS.

The use of procedure pairs like `alloc()` and `dealloc()` is common practice in low-level programming languages like C, but it is inconvenient and dangerous: We can forget to call `dealloc()` and waste resources, or we may deallocate memory blocks but still use them through our `pointers`. The latter would at some point in time crash our program, as we would use memory blocks that are already released and may now be reused for other variables, from our own program or from other programs.[51] Note that in the source code above, there is only one single `dealloc()` call. The reason for that is we only allocated one single memory block in a single `create()` call; `ip2` is merely another `pointer` that points to that block. If we had used an additional `dealloc(ip2)` call, then that would be a so-called double-free error.
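One way to reduce the risk of a forgotten `dealloc()` is to pair the allocation with a `try`/`finally` block, so that the block is released exactly once on every path out of the procedure. This is only a sketch of that idea; the procedure name `tripled` is made up for this example.

```nim
proc tripled(value: int): int =
  var ip = create(int)      # reserve a zero-initialized block for one int
  try:
    ip[] = value
    result = ip[] * 3
  finally:
    dealloc(ip)             # runs exactly once, even if an exception occurs

echo tripled(13)            # 39
```

After the `finally` branch has run, `ip` must no longer be dereferenced, as the block it pointed to has been returned to the allocator.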

As you can see, using `pointers` is inconvenient and dangerous. However, there are situations where plain value type variables do not suffice. The solution of many higher-level programming languages to this problem is a Garbage-Collector (GC). The GC does the dangerous and inconvenient task of deallocating unused memory blocks for us automatically.

To distinguish the GC-managed "pointers" clearly from the manually managed ones, we call them references in Nim; in some other languages, they are called traced `pointers`. References are always typed, like `ptr`; there is no equivalent to the untyped `pointer` type for references.

For references, we still have to allocate the memory ourselves before we can use the references. When we are done using them, the GC automatically frees the corresponding memory block. A typical scenario is that we use references in a procedure or in an otherwise limited block of code: We declare the reference in that code block, allocate it, and use it. When we exit the code block, the GC automatically frees the allocated memory. You might think that the fact that we still have to allocate the memory for our references ourselves is a concern, as we could forget that step. Well, it is not that dangerous; if we forget the allocation step, we would use a reference with the value `nil`, which would immediately result in a runtime error. So we would notice the problem immediately. However, other `pointer` errors, such as missing deallocation or use-after-free, are less obvious and more dangerous. In languages like C, tools like Valgrind are used to check for errors like "use after free". Valgrind is a very helpful tool, but it cannot find all errors that may occur, and its reports can be very verbose. We may use Valgrind as well when we compile our Nim program with `--mm:arc` and `-d:useMalloc`. This can help us verify that our program manages its memory correctly, for instance when we have to use C libraries, and it may help us find the cause of bugs.

With references, we can rewrite our previous example code as follows:

var ip: ref int
new(ip)
echo ip[] # zero
ip[] = 13
echo ip[] * 3
var ip2: ref int
ip2 = ip
echo ip2[] * 3

We have replaced `ptr` with `ref`, and instead of `alloc()` or `create()`, we are using the `new()` `proc`. This procedure takes an uninitialized `ref` as a parameter and allocates a managed memory block for it. After the `new()` call, `ip` refers to a well-defined, managed memory block that can store an integer value. The content of that memory block is cleared initially, so `echo ip[]` would give zero. Again, we can create another reference, `ip2`, and assign to it the value of the other. As a result, both now refer to the same memory block. The advantage here is that we don’t have to worry about deallocating that block; the GC will handle it when appropriate.

To verify that in the example code above, both references really refer to the same `object` in memory, we could add two more lines of code:

ip2[] = 7
echo ip[]
echo ip2[]

Here, we are using the reference `ip2` to assign to the memory block the literal value `7`. After that assignment, both `echo()` statements would display that new content.
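Earlier we noted that a reference for which we forgot the allocation step has the value `nil`. The following minimal sketch shows how we could test for that condition before using the reference; the variable name `r` is only an illustrative choice:

```nim
var r: ref int   # declared, but no memory block allocated yet
echo r == nil    # true: an unallocated reference has the value nil
if r.isNil:      # test before use instead of risking a runtime error
  new(r)         # now r refers to a managed, cleared memory block
r[] = 42
echo r[]         # 42
```

In ordinary code, such a test is rarely needed, as we typically allocate a reference right after declaring it.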

Using references and `pointers` to store basic data types like integers isn’t very common. In most cases, we work with larger `objects` and establish relationships between them. We will try that in the next section.

References to objects

You might still wonder what references are really useful for — they seem to be only a more complicated version of plain value type variables.

Now, let us assume we want to create a list of things or persons, maybe a list of our previously used `Computer` data type, or perhaps a list of persons we will invite to our next party. We will create the party list for now: the `Computer` data type we used before already has many fields, and filling all of them would take some effort. So let us use a new `Friend` data type, which for the beginning stores only the friend’s name; we may add more fields later when necessary. So, we might have

type
  Friend = object
    name: string

With that declaration, we could declare a few `Friend` variables like this:

var harry, clint, eastwood: Friend

But that is not what we want. We need a list of all the friends that we would like to invite to our party; we want to add friends to the list, and potentially, we might also want to delete friends. You may think we could use Nim’s sequence data type for that, and you are right. But let us assume we could not use that predefined Nim data type for some reason. Then we could create a list of linked references to `Friend` objects.

type
  Friend = ref object
    name: string
    next: Friend

Now our `Friend` data type is a reference to an `object`, and the `object` itself has an additional `next` field, which is again of type `Friend`.

This is a sort of recursion. If this seems too strange, imagine you have some numbered paper cards, each with two fields: one labeled 'name' and another labeled 'next'. In the 'name' field, you can fill in a friend’s name, and in the 'next' field, you write the number of the next card. The last card in the chain leaves the 'next' field empty.

In languages like Nim or C, lists (also called linked lists) are dynamically created data structures consisting of elements (called nodes), where each node has a field that is a reference or `pointer` to its successor or predecessor. When the nodes have only a successor field, we call the list a singly linked list, and when they also have a predecessor field, we call it a doubly linked list. Contrary to `arrays` and Nim’s sequences, lists do not allow direct access to arbitrary elements; we can only traverse the list starting from its first element or, for doubly linked lists, also from its last element. The first element of a list is also called its head, and the last element is called its tail. Often, the `head` and the `tail` elements are just plain nodes. However, the `head` can also be an extended node `object` with additional fields that carry information for the whole list, such as an additional `string` field for the list name and an integer field for the actual list length. In this section, we use the simplest form of a list, which is a singly linked list, where the head is just an ordinary node. If the head has the value `nil`, then the list is empty.

Now, let’s create a small Nim program. It will read the names of our friends from the terminal, create a list of all friends, and finally, print the list.

type
  Friend = ref object
    name: string
    next: Friend

var
  f: Friend # the head of our list
  n: string # name or "quit" to terminate the input process

while true:
  write(stdout, "Name of friend: ")
  n = readline(stdin)
  if n == "" or n == "quit":
    break
  var node: Friend # (1)
  new(node)
  node.name = n
  node.next = f
  f = node

var ff = f # save f for later...
while ff != nil:
  echo ff.name
  ff = ff.next

(1) The actual name of this temporary variable is arbitrary; we could have used `el` for element, for example.

This example code doesn’t seem to be that easy. But it is not really difficult, and when you have understood it, you can already call yourself a Nim programmer. Perhaps you should think about the code above for a few minutes, before reading the explanations below.

First, let’s summarize what our program should do: It’s designed to read in the names of friends whom we’d like to invite to our next party. Of course, when entering the names, we would need a way to tell that we are done. In our program, we can do this in two ways: either by entering an empty name — just pressing the return key — or by entering the text "quit" to stop the loop. Unfortunately, this means we can never invite a friend named 'quit' to our parties. When we have terminated the input loop, then the next loop prints all the entries to the terminal.

Let us start with the type and variable declarations: We use a user-defined type named `Friend`, which is a reference to an `object`, that object type has a field `name` of type `string`, and a field `next`, which is again a reference to the same data type.

We are using two variables: one called `n` of type `string`, to read in a name or the quit command from the terminal, and another variable called `f` of type `Friend`. While the variable `f` seems to represent just a single friend, its `next` field means it can actually represent an entire list of friends, with `f` as the starting point or head of that list.

In the code above, we are using a special `while` loop, special because of the construct `while true:` and because the loop contains a `break` statement. Earlier, we said that we should avoid the `break` statement in loops because it interrupts the control flow and can make it more difficult to understand and prove the flow. But in this case, that form makes some sense: For the first loop, we have to first read in a name from the terminal, and only then can we decide what to do, so we cannot really evaluate a condition after the `while` statement at the top. So we use the simple constant condition `true`, which would never terminate the loop. We need a `break` inside the loop body to terminate the loop.

Let’s first investigate the second loop, as it’s relatively straightforward: We use a new variable named `ff` in place of `f` for this loop to ensure the original `f` remains unmodified, preserving it for further use. In the `while` condition, we check if the current value of `ff` is `nil`, indicating that there are no more entries in our list. In that case, we terminate the loop, as we are done. If `ff` doesn’t equal `nil`, then `ff` points to a valid content — i.e., there’s at least one valid name that we can access using the field access operator and print with `echo ff.name`. Note that in Nim the field access operator `.` works in the same way for value `object` types as well as for `ref` `object` types. For `ref` `object` types, we could also use `ff[].name` instead of just `ff.name`. This means we first apply `[]` to `ff` to get the content, then use the `.` operator to access the `name` field. In some other languages like C, we would have to use a special operator -> to access fields of `pointer` or reference types.

The most intriguing statement in the output loop is `ff = ff.next`. We assign the content of `ff.next` to `ff` and proceed with that new content. The content could be a valid reference to one more `Friend` `object`, or it could be `nil`, indicating that our loop should terminate.

The input loop is also not that complicated: To make the process of adding more friends to the list easy, we always add new names at the beginning. First, we ask the user to enter a name. We use `write(stdout)` for this, as `echo()` always generates a new line, but we want to read in the name on the same line. If the name is empty or has the special value 'quit', then we terminate the input loop. In the loop, we use a temporary variable called `node` of type `Friend` and allocate a memory block for it with `new()`.[52] Then we assign the friend’s name that we just read, `n`, to the `name` field. The last two statements in the loop body can be a bit challenging to understand: First, we assign the value of `f` to `node.next`. Now, `node` is basically the start of our list, and its `next` field refers to the first element of the current list. Fine, but we said that the `node` variable is only a temporary variable; we do not intend to use it longer than necessary. However, `node` is currently the head of our list, making it very useful. On the other hand, the former starting point `f` is now redundant, as the current `f` is identical to `node.next`. So the trick is that we just assign the value of `node` to `f`. Now, `f` represents the complete list, and we no longer need `node`. We can reuse the `node` variable in the next loop iteration, but we must allocate a new memory block for the `node` reference. The previous memory block is still in use; it contains the name we just entered and a reference to the next `object` in the list.

Note that we add new elements at the top of the list using this method. We’ve chosen this approach because it’s quite straightforward. For adding at the end of the list, we would have to use one more reference variable that always gives us access to the current end of the list, or we would have to traverse the list from head to tail whenever we want to add an element at the tail.
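The alternative mentioned above, adding at the end with the help of one more reference variable, could look like the following sketch. The variable names `head`, `tail`, and `cur`, as well as the fixed list of names, are only assumptions for this illustration; the names are not read from the terminal here:

```nim
type
  Friend = ref object
    name: string
    next: Friend

var
  head: Friend # first node of the list, nil while the list is empty
  tail: Friend # always refers to the current last node

for n in ["Harry", "Clint", "Sandra"]:
  var node: Friend
  new(node)
  node.name = n
  if head == nil:
    head = node      # first element: it is head and tail at the same time
  else:
    tail.next = node # link the new node behind the current last one
  tail = node        # the new node is now the end of the list

var cur = head
while cur != nil:
  echo cur.name      # the names appear in the order in which they were added
  cur = cur.next
```

The price for keeping the input order is the additional `tail` variable and the special case for the empty list.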

For another exercise, let’s consider deleting entries from our list. Essentially, this operation is straightforward; we would just skip one entry. Let’s incorporate the following code into the previous program:

while f != nil:
  write(stdout, "Name to delete: ")
  n = readline(stdin)
  if n == "" or n == "quit":
    break
  if f.name == n:
    f = f.next # the head matches, so the second node becomes the new head
  else:
    var f1 = f # walk the list with f1, so that the head f is preserved
    while f1.next != nil:
      if f1.next.name == n:
        f1.next = f1.next.next # skip the matching node
        break
      f1 = f1.next

Here, we’re once again using an outer `while` loop to read in the names we want to delete. That loop uses the condition `while f != nil:` because, naturally, we should stop when the list is empty.

In the loop body, we have an `if` statement, and within the `else` branch of this `if` statement, we have another loop. The reason we need the `if` statement is that the case where the name to delete is the first in the list is somewhat special. Let’s examine the inner loop first. Before it starts, we copy the head `f` to the helper variable `f1`, so that we can walk through the list without losing its head. The loop body is executed only when there are at least two elements to look at, `f1` and `f1.next`. We compare the name of the next entry with `n`. If they match, then we have to skip the next entry. We can do that with the statement `f1.next = f1.next.next`. That is, we replace the reference from the current element `f1` to the next list entry, that is `f1.next`, by the next entry of the next element, which is `(f1.next).next`. We do not have to write the parentheses. The `f1.next.next` entry can be `nil`; in that case, it is the end of the list. If we found a matching name, then we terminate the inner loop with a `break` statement, and we are done. Otherwise, we assign to `f1` the value of `f1.next` and continue the loop execution. Now to the special case where the name to delete is the first in the list. We need the first `if` branch for that: if the first element already matches the name to delete, then we just skip the first element by setting the head of the list, `f`, to the next entry, which may or may not be `nil`.

This is one way to solve the task. For operations on lists, there are usually various solutions, some optimized for simple or concise code, some for performance. You may copy the code segment above to the end of the former code, and maybe add one more copy of our printing loop at the end again. Afterwards, you will have a program that reads in a list, prints the contents, asks for names to delete, and ultimately prints the updated list. Perhaps you can improve the code, or maybe you can detect special corner cases where it may fail. What happens, for example, when some of your friends have the same name? Might the program fail in that case? Or you may add more fields to your `Friend` data type. You could include a text field indicating 'male' or 'female', and subsequently report the male-to-female ratio. Could you potentially remove males from the list when there are more males than females?

For references to `objects`, the assignment operator `=` copies the references, but not the `object`. Similarly, the operator `==` used for equality tests compares the references, not the content of the `objects` to which the references point. If you want to compare the content of the `objects`, you can apply the dereference operator `[]` on both references:

type
  RO = ref object
    i: int

var
  ro1 = RO(i: 1)
  ro2 = RO(i: 1)
  ro3 = ro1

echo ro1 == ro2 # false
echo ro1[] == ro2[] # true
echo ro1 == ro3 # true

In modern Nim, we generally use the constructor syntax like `var ro1 = RO(i: 1)` or `var computer1 = Computer(price: 799.99, quantity: 2)` to allocate and initialize ref objects, and avoid explicit `new()` calls for the allocation, followed by explicit field initialization. The constructor syntax is more compact, and the combined construction with initialization may allow the compiler to reason about the code more effectively and to produce better code. As the constructor syntax looks the same for value and ref objects, this may also simplify later changes of the program. Thus, the use of explicit `new()` calls is mostly considered a legacy approach, and it is highly recommended to use `object` constructors instead. In rare instances where a complex constructor call fails to compile, one may resort to using `new()`.[53]
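To compare the two styles side by side, here is a small sketch that reuses the `Friend` type from the list example; the variable names `a` and `b` are only illustrative:

```nim
type
  Friend = ref object
    name: string
    next: Friend

# Legacy style: allocate first, then fill in the fields one by one.
var a: Friend
new(a)
a.name = "Harry"

# Constructor style: allocation and initialization in one expression.
# Fields that are not listed keep their default value.
var b = Friend(name: "Clint", next: a)

echo b.name, " -> ", b.next.name # Clint -> Harry
```

In the first variant, `a.next` was never assigned and is therefore `nil`, which marks the end of this short list.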

Procedures and functions

Procedures and functions, called `proc` and `func` in Nim, are used to structure the program source code. Functions, a subtype of procedures, return a value but do not modify global variables or otherwise change the state of the program. When we talk about procedures in this book, what we say applies to functions as well, unless stated otherwise.

Procedures and functions are typically used to group sequences of statements that perform a specific task.

We can pass parameters to procedures, e.g., data that the procedure should process, and the procedure can return a result. Related sets of procedures can be grouped into library modules, e.g., `procs` that perform various `string` operations. We will discuss the use and creation of modules later in the book.

The terms procedure and function were already used in Pascal and the other languages of Wirth, while C uses the term function only, and Fortran uses the term subroutine instead. Finally, Python and Ruby use the rather unusual keyword `def`, and Kotlin uses `fun`.

Nim’s procedures are fundamentally similar, yet much more advanced than their equivalently named counterparts in the Wirthian languages or the plain functions in the C language. Nim’s `procs` support generics, overloading, named parameters, default values, special parameter types such as varargs and openArray, various methods of returning a result, and multiple calling conventions, including method and command calling conventions.

Introduction

We call or invoke a `proc` by just writing its name, followed by a parameter list enclosed in parentheses. The parameter list can be empty. When we call a `proc`, the program execution continues with that procedure, and when the execution of the procedure terminates, the next statement after that `proc` call is executed. Sometimes we say that we jump into a procedure and jump back when that procedure terminates.

In Nim, functions are a special form of procedures that return a result and do not modify the current state of the program. Modifying a global variable or performing an input/output operation would be examples of modifying the state. We have already used some predefined procedures like `echo()` for output operations, `add()` for appending single characters to `strings`, and `readLine()` for reading in textual user input. And we talked about math functions like `sin()`, `cos()`, `pow()` — these are functions as they accept one or two arguments and return a result, but do not change the state — calling them again with the same arguments would always give the same result. The procedure `readLine()`, despite its name, is not a function, as the result typically varies for each call: We pass a file variable as an argument, which might change its state for each call, possibly because the end of the file is reached. A function is only a special subtype of a procedure. The `func` keyword indicates to the reader of the code and to the compiler some special properties, namely, that a result is returned and that the global state is not changed. Whenever the `func` keyword is used, a `proc` would suffice as well, and in this text, we mostly speak about procedures, even when a function would suffice.

Let us start with a very simple function called `sqr()` for squaring.

func sqr(i: int): int =
  i * i

A procedure declaration consists of the keyword `proc`, a user-selected name, an optional parameter list enclosed in parentheses, and an optional colon followed by the result data type. For a function declaration, we use the keyword `func` instead of `proc`, and as functions return a result, we have to specify the `result` data type.

Note that this is only a declaration so far — the compiler could recognize the construct, its parameters, and its result type. We sometimes call this construct a procedure header.

Typically, we do not only declare a function, but we define it, that is, we add an equal sign to the `proc` header and add an indented procedure body that contains the code that is performed for each invocation.

Pure `proc` declarations can be necessary in rare situations, such as when two procedures call each other. In this case, the procedure defined first would call the other procedure, which is not yet defined, so the compiler may complain about an unknown procedure. We could solve that problem by first declaring the second procedure so that the compiler knows about its existence. We would then define that second procedure later, closer to the end of the program file.
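As a sketch of such a situation, consider two procedures that call each other. The names `isEven()` and `isOdd()` are hypothetical examples, and the code assumes that only non-negative arguments are passed:

```nim
# Pure declaration: only the header, no equal sign and no body.
proc isOdd(n: int): bool

proc isEven(n: int): bool =
  if n == 0: true
  else: isOdd(n - 1) # isOdd() is known due to the declaration above

# The definition of the procedure that was only declared at the top.
proc isOdd(n: int): bool =
  if n == 0: false
  else: isEven(n - 1)

echo isEven(10) # true
echo isOdd(7)   # true
```

Without the first line, the compiler would not know `isOdd()` when it processes the body of `isEven()`.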

The `sqr()` `proc` above accepts an integer argument and returns its square, which is also of the same data type. We would call that `proc` as follows:

var j: int
j = 7
echo sqr(j)

Earlier in this book, we said that the compiler processes our source code from top to bottom and that the final program is executed from top to bottom too. The first statement is indeed true; for that reason, it can be necessary to declare a function at the top and define it below, as we cannot call a `proc` before it is declared or defined.

For the program execution, we have to know that `procs` are only executed when we call them. That is, when we write a `proc` at the top of our source code, then that `proc` is processed by the compiler, but it is not executed during program runtime before we call it. As the Nim compiler supports dead code removal, the code of procedures that we never call would not be included in the final executable.

The procedure body builds a new scope. We can declare entities like variables, constants, types, or other procedures and functions in that scope. These entities are only visible in the procedure body, but not outside the `proc`.
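A minimal sketch of this scoping rule follows; the names `circleArea`, `pi`, and `area` are made up for the illustration:

```nim
proc circleArea(r: float): float =
  const pi = 3.1415      # visible only inside circleArea()
  let area = pi * r * r  # local variable of the procedure body
  area

echo circleArea(2.0)
# echo area  # would not compile: area is unknown outside the proc
```

The commented-out last line shows the kind of access that the compiler rejects.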

Parameter lists of procedures consist of one or more lists of parameter names, separated with commas, followed by a colon and the data type of the parameters. The sub-lists with the same data type are separated by semicolons:

proc p(i, j, k: int; x, y: float; s: string)

While the Wirthian languages would require semicolons to separate the parameter blocks, in Nim we could also use plain commas for that. For the data types of procedure parameters and as the result type all of Nim’s data types are allowed, including structured types, `ref`, `pointer`, and container types. Additionally, we can use the data types `openArray` and `varargs` as parameter types — these two types are not allowed for ordinary variables, and varargs is not valid as a result type. We will learn the details of all these types soon. When we call or invoke a procedure, we can pass literal values, named constants, variables, or expressions to it.

When we call a procedure with multiple arguments, we have to specify the arguments in the order in which they are listed in the `proc` header, separated by commas, and the arguments must have compatible data types:

var i: int = 7
var x: float = 3.1415
p(i, 13, 19, x, 2.0, "We call proc p() with a lot of parameters")

Here, compatible data types mean that for the `i`, `j`, and `k` parameters, which are specified as `int` types in the `proc` definition, variables of smaller `int` types like `int16` would work. For the two parameters of the `float` type, we would have to pass floating-point variables or a `float` literal. As a special case, an `int` literal would also work, as the compiler knows the desired data type and automatically converts the `int` literal into a `float` for us, as long as that is possible without loss of precision. We could pass `2` instead of `2.0`, but passing a very long `int` literal with more than 16 digits may fail at compile time:

proc p(i, j, k: int; x, y: float; s: string) =
  echo s

var
  n: int16
  m: int # int64 would not compile
  z: float32
p(n, n, m, 1234567890, z, "")

Actually, `float32` types and `int` literals up to ten digits seem to work for `float` parameters, but even on 64-bit systems, the `int64` data type is not permitted for `int` parameters. As you can see from the example above, it is possible to pass the same variable multiple times as a parameter, and empty `string` literals are, of course, allowed too.

Nim also supports default values for `proc` parameters and named parameters; that is, we can leave parameters unspecified and use the default value, or use the actual parameter names, like in a variable assignment, when we call a `proc`:

proc p(i: int; x: float; s: string = "") = echo i.float * x, s
p(x = 2.0, i = 3)

Here, we used named parameters when calling the `proc` `p()`. This way, we can freely order the parameters, and as parameter `s` has a default value, we can leave it unspecified and just use the default value.

Functions always return a result, and procedures can return a result, but they don’t have to. In the C language, function results can just be ignored, but in Nim, whenever there is a result, we have to use it at the call site; that is, we have to assign the returned value to a variable, or we have to use it in an expression. Nim enforces this because the returned value is generally important. The returned value can be the actual result, as in a `sin()` call, or it may give us additional information, like the number of read characters when we do text processing, or perhaps an error indication, like the end of the file. For the rare cases when we really intend to ignore the result of a function call, we can call that function as `discard myProcWithResult(a, b, …)`. Another solution is to apply the `{.discardable.}` pragma to the function definition. We will learn more about pragmas later. When a procedure should not return a result, we can use the `void` return type or just leave the return type out; the latter is recommended, as `void` is used only rarely in Nim. When the `proc` has no parameters at all, we can even leave out the empty parameter list in the procedure definition:

proc p1() =
  echo "Hello and goodbye"

proc p2 =
  echo "Hello and goodbye"

proc p3: void =
  echo "Hello and goodbye"
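The `discard` statement and the `{.discardable.}` pragma mentioned above can be sketched as follows. The procedure names `countChars()` and `report()` are hypothetical and serve only as an illustration:

```nim
proc countChars(s: string): int =
  s.len

proc report(msg: string): int {.discardable.} =
  echo msg
  msg.len

let n = countChars("Nim")  # the result is used in an assignment
discard countChars("Nim")  # the result is ignored explicitly
report("done")             # allowed without discard due to the pragma
echo n                     # 3
```

A plain `countChars("Nim")` statement without `discard` would be rejected by the compiler.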

Calling procedures

When we call a procedure or a function, that is, when we intend to execute it, we always have to specify a parameter list enclosed in parentheses, but the parameter list can be empty:

var i = myFunc(7)
var j = myF()
var p = myF # not a function call, but an assignment of the proc to variable p

Note that the last line in the above code is not a call of `myF()`, but an assignment of that function to the variable `p`. We will discuss this use case soon.

We have already learned that we can also use the method call syntax, like `7.myFunc` instead of `myFunc(7)`, that we can use the command invocation syntax like in `echo "Hello"`, and that we should avoid putting a space between the `proc` name and the opening parenthesis, as that would be interpreted as a command call with a `tuple` argument. When the function or procedure expects multiple arguments, we separate the arguments with commas, and we generally put a space after each comma. For the use of the command invocation syntax, there are some restrictions: When the procedure has more than one parameter and returns a result, the command invocation syntax cannot be used:

proc p(i, j: int): int = i + j # command invocation syntax does not work
proc p2(i, j: int) = echo i * j
echo p(1, 2) # ordinary proc call
echo 1.p(2) # method call syntax
p2 1, 2 # command invocation syntax
echo p (1, 2) # argument looks like a tuple, so this would not compile

For the `proc` definition above, we wrote the body statement directly after the equal sign. This is possible and sometimes used for very short procedures. Indeed, here `p()` is a function.

In the examples above, we passed plain integers as parameters to procedures. But of course, `proc` parameters can have any type; we can pass `strings`, `arrays`, `objects`, and more. The method we use to pass the parameters to the `procs` is sometimes called 'pass by value', an old term introduced for the Pascal language, used to indicate that the passed parameter seems to be copied to the `proc`. The `proc` is not able to modify the original instance. In the next section, we will learn about the `var` parameter type, which is used when we want to allow the `proc` to modify the original instance. In the Wirthian languages, the procedure parameters actually get copied, so inside the `proc`, we could modify them, but only the copy is modified, and the original instance remains unchanged. In Nim, it’s a bit different. When we pass parameters by value to a `proc`, we cannot modify them at all in the `proc` body. If we require a mutable copy, we have to create that copy ourselves in the `proc` body. This allows some optimizations: As the `proc` parameters are immutable, Nim does not really need to copy them; it can just work with `pointers` to the original instances internally. In fact, there are rumors that for parameters smaller than `3 * sizeof(float)`, Nim copies the instances, but for larger instances, Nim works internally with `pointers` to the original value. However, this is an implementation detail and a compromise: data copied to the `proc`’s stack allows the fastest access, but on the other hand, the initial copy process can be expensive.
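The following sketch illustrates the immutability of value parameters and the explicit copy. The procedure name `shout()` and the variable names are assumptions made only for this example:

```nim
proc shout(s: string): string =
  # s.add('!')  # would not compile: s is an immutable value parameter
  var t = s     # explicit, mutable copy of the parameter
  t.add('!')
  t

let msg = "Hello"
echo shout(msg) # Hello!
echo msg        # Hello, the original instance is unchanged
```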

Procedure parameters of var type

Our `sqr()` function above accepts only one parameter, and that parameter is a value type, which indicates that we cannot modify it in the procedure body. That fact is useful to know for the caller of a `proc`, as one can be sure that the passed parameter has not been modified and is available unchanged after the `proc` call.[54] But of course, there are situations where we may want a passed parameter to be modified. Let’s assume that we want to "frame" a passed `string`; for example, we might want to pass in the `string` "Hello" and change it to "* Hello *". Furthermore, let’s assume that we might sometimes want to use other characters instead of the asterisk, perhaps a `+` sign.

proc frame(s: var string; c: char = '*') =
  var cs = newString(2)
  cs[0] = c
  cs[1] = ' '
  insert(s, cs)
  add(s, ' ')
  add(s, c)

# we can call that proc like
var message = "Hello World"
frame(message)
echo message

Note: In the Wirthian languages, we actually put the `var` keyword for procedure parameters in front of the parameter name; that is, we would have to write `proc frame(var s: string; c: char = '*') =` for the procedure header.

The `frame()` procedure above accepts two parameters and returns no result. The first parameter has the type `string`; it is not a value parameter but a `var` parameter, which is indicated by the `var` keyword between the colon and the type of the parameter. Note that we use here again the keyword `var` that we used earlier to declare variables. The main reason we use the same keyword again is that we do not want to use a new one — `var` `proc` parameters are different from `var` declarations. Parameters of `var` type can be modified in the procedure body, and that modification is visible after the `proc` call.[55] The second `proc` parameter is a plain value type; it is a character that has the default value '*'. To specify a default value for a parameter, we write an equal sign after the parameter type followed by the actual default value, as we would do in an assignment. Indeed, as in an assignment, we can even leave out the colon with the data type in this case, at least for the case that the compiler can infer the correct data type from the assigned default value. Default values are useful for parameters that have in most cases the same value but can be different sometimes. The advantage is that when calling that procedure, we can simply leave that parameter out. For default values, we have to be a bit careful; only value parameters can have default values. Furthermore, when we call a procedure with many parameters with default values, it may not always be clear which parameter we pass and for which parameter we want a default value.

It should be obvious that passing literals or named constants as `var` parameters, as in `frame("Hello")`, makes no sense and results in an error message from the compiler.
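
As a small illustration of this rule, the following sketch uses a hypothetical helper `appendMark()` (it is not part of the examples above) together with the built-in `compiles()` check to show which arguments a `var` parameter accepts:

```nim
# appendMark is a made-up helper, used only for illustration.
proc appendMark(s: var string) =
  s.add('!')

var greeting = "Hello"   # a mutable variable can be passed
appendMark(greeting)
echo greeting            # Hello!

const fixed = "Hello"
let frozen = "Hello"
# compiles() reports whether the compiler would accept an expression.
echo compiles(appendMark("Hello"))  # false: a literal is not mutable
echo compiles(appendMark(fixed))    # false: a constant is not mutable
echo compiles(appendMark(frozen))   # false: a let variable is not mutable
```

Only the variable declared with `var` is accepted; the three rejected calls would stop compilation if we wrote them directly.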

To generate the frame around the passed-in `string`, we need to insert two characters at the beginning of the `string` and append two more characters. Inserting into `strings` is not a very cheap operation, as it involves moving all the following characters. So we try not to insert two single characters; instead, we first create a short `string` consisting of the passed `c` character and a space character, and then insert that two-character `string` at the beginning of the passed `string`. We use the standard procedure `newString()` with parameter `2` to create a new `string` of length `2` with undefined content, and then fill in the content by using the subscript operator. We could have used the `add()` `proc` to add those two characters to an empty `string`, but that is a bit slower. Then we use the standard `proc` `insert()` to insert our two-character `string` at the front of our passed `string`. Finally, we add a space and the `c` character to the passed `string`. The passed `string` is now modified; it is four characters longer. That modification is visible to the caller of that procedure; in other words, `echo()` will print the modified version. Actually, when we think about it, we might feel that our strategy of first creating the two-character `string` `cs` is a bad idea, as the allocation may cost more time than just inserting the individual characters directly.

Passing mutable arguments to procedures using the `var` keyword was sometimes called "pass by reference" in the old Wirthian languages like Pascal. This leads to confusion for some people, unfortunately. Of course, `proc` `var` parameters are not really related to Nim’s `ref` type. Well, using Nim’s `ref` data types would also allow modification of `proc` arguments, just as using `pointers` would. But we never use `ref` types in Nim just to be able to modify passed data in `procs`, and also not to avoid a possible expensive copy operation for value types. We could create a `ref` instance with `var intRef: ref int = new int`, pass that `intRef` to a `proc`, and thereby allow the modification of the actual value to which `intRef` points, from inside the `proc`. However, this would be unnecessary, as the `var` parameter already allows for this. In Nim, we use reference types when we really need them, such as when we require reference semantics, or when we need to create highly dynamic, many-to-one data types, like tree structures.
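
The difference can be sketched with two hypothetical helpers, `bump()` and `bumpRef()`. Both change the caller's data, but only the `ref` variant needs a heap allocation:

```nim
# Both helper names are hypothetical and only serve the comparison.
proc bump(x: var int) =
  inc(x)          # modifies the caller's variable directly

proc bumpRef(x: ref int) =
  inc(x[])        # modifies the heap value the reference points to

var a = 1
bump(a)

var r = new int   # works, but adds a heap allocation we do not need here
r[] = 1
bumpRef(r)

echo a, " ", r[]  # 2 2
```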

Our `frame()` procedure above modifies the passed `string`. Instead, we could have decided that the procedure should not modify the `string`, but should return a new `string` consisting of the frame with the passed `string` in the center. Generally, when creating `procs`, we have to decide what is more useful — modifying a passed value or returning a modified copy. At times, we also need to consider efficiency. Returning newly created large data types like `strings` can be expensive. A `string` is not a trivial structure since it contains a dynamic buffer for the `string` content that needs to be allocated. On the other hand, for the passed `var` `string` we inserted characters, which involves moving characters and is also not a really cheap operation, and maybe when we insert a lot, the `string` buffer must be even enlarged, which is again expensive. Thus, for this use case, it is unclear which approach is better — we primarily used the `var` parameter to introduce `var` parameters. Let’s investigate how a function that returns a modified `string` might look:

func framed(s: string; c: char = '*'): string =
  var res = newStringOfCap(s.len + 4)
  add(res, c)
  add(res, ' ')
  add(res, s)
  add(res, ' ')
  add(res, c)
  return res

# we can call that proc like
echo framed("Hello World")
echo framed("Hello World", '#')

The above code is one possible solution. We can use the keyword `func` instead of `proc` here, as we only return a result and modify no state. We pass the initial `string` and the character for the frame both as plain value parameters and return a newly created framed `string`. In the function body, we start with a relative of the procedure `newString()` from the `system` module, called `newStringOfCap()`. Unlike `newString()`, which returns a `string` that already has the given length, this `proc` creates an empty `string` of length zero, but it ensures that the data buffer of the new `string` already has the specified capacity. That is an optimization, which makes sense in our use case, as we know that our newly created `string` will have `4` characters more than the passed `string`. So we avoid the result `string` having to be enlarged while we add characters or the initial `string`, and we ensure at the same time that no space is wasted: the data buffer of the new `string` is sized to fit the desired result. The rest of the function body is straightforward: we just `add()` what is needed and return the result. As mentioned earlier, `add()` is not extremely fast. Therefore, if you need to frame millions of `strings` each day, you might consider avoiding `add()`, and you already know enough about Nim to do this. Just try it. You might start with a `string` of the right size containing undefined content created by `newString(s.len + 4)`, and then you could copy in the required data, character by character, in a loop. Or you may use the slice operator to insert the passed `string` into the new `string`.

A possible solution:
func framed(s: string; c: char = '*'): string =
  var res = newString(s.len + 4)
  res[0] = c
  res[1] = ' '
  res[2 .. s.high + 2] = s # we may insert the string by using the slice operator or
  # for p in 0 .. s.high: # we can use a for loop and
  #   res[p + 2] = s[p] # the subscript operator
  res[^2] = ' '
  res[^1] = c
  return res

The situation, where we may need a procedure that works on a `var` parameter in one case and returns a modified copy in another case, is not that rare. For example, Nim’s standard library contains a procedure called `sort()`, which can sort container data types in place, and a procedure called `sorted()`, which returns a sorted copy. This code duplication is not really that nice. Of course, `sorted()` is the more universal solution, as we can always replace `sort(data)` with `data = sorted(data)`. However, the latter creates a temporary copy, which may not be optimal for performance. Since Nim version 1.2, a `dup()` `macro` is available from the `sugar` module that creates copies of variables and then applies one or more in-place `procs` to the copy. Thus, the `procs` `sorted()` or our `proc` `framed()` would be unnecessary. We can use `dup()` as in this example:

from std/sugar import dup

proc frame(s: var string; c: char = '*') =
  var cs = newString(2)
  cs[0] = c
  cs[1] = ' '
  insert(s, cs)
  add(s, ' ')
  add(s, c)

echo "Hello World".dup(frame)
echo "Hello World".dup(frame, frame)
echo "Hello World".dup(frame('#'))

Note that we apply `frame()` twice in the penultimate line. Similarly, we could apply a sequence of different `procs`. The output of the above program is

* Hello World *
* * Hello World * *
# Hello World #
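
For comparison, the `sort()` and `sorted()` pair from the `std/algorithm` module mentioned above can be tried directly. This is only a minimal sketch with made-up sample data:

```nim
import std/algorithm

var data = @[3, 1, 2]
let copy = sorted(data)  # returns a sorted copy, data is unchanged
echo copy                # @[1, 2, 3]
echo data                # @[3, 1, 2]

sort(data)               # sorts the sequence in place
echo data                # @[1, 2, 3]
```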

Returning from a procedure and the implicit result variable

The execution of a procedure terminates once the last statement of the procedure body has been processed. We can also terminate a procedure earlier when we specify a `return` statement somewhere.

Functions and procedures which return a result can also terminate with the last expression of the procedure body, or earlier with a return expression like `return i * i`. Functions and procedures with a return type automatically declare a mutable `result` variable for us, which is of the function’s return type, and we may use or just ignore it. So for our previous `sqr()` function, we have various ways to write it:

func sqr1(i: int): int =
  i * i

func sqr2(i: int): int =
  result = i * i

func sqr3(i: int): int =
  return i * i

For short and simple procedures, the first form is often used. For longer procedures, where the result is constructed in multiple steps, like some `string` operations, using the `result` variable makes sense. Finally, when multiple points exist where we may return, using `return` statements may make sense. One use case involves an early error check, where we might want to `return -1` as a form of error indication when writing a procedure that should calculate the square root of an integer value. (In Nim, we have other and sometimes better ways to handle errors; we will learn about them later.)
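
The following sketch combines the `result` variable with an early `return`. The helper `repeated()` is hypothetical and not a standard library procedure:

```nim
# repeated() is a made-up helper used only for illustration.
func repeated(s: string; n: int): string =
  if n <= 0:
    return            # early exit, result still holds its default ""
  for _ in 1 .. n:
    result.add(s)     # build the result step by step

echo repeated("ab", 3)      # ababab
echo repeated("ab", 0).len  # 0
```

A plain `return` without an expression returns the current content of `result`, which for a `string` starts out empty.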

Generally, we should avoid writing something like

func sqr(i: int): int =
  result = i
  i * i

as it is unclear in this case whether the expression `i * i` is returned or the `result` variable with the value `i`. In Nim v2.0, we will receive a warning or an error message in such cases.

For the performance of our code, it may have a tiny benefit to only use the result variable and fully avoid return statements, as in this case for a function call like `var i = sqr(j)` the result variable may be just an alias for the actual result `i` here, so that the compiler can optimize the code and avoid temporary copies. This is a well-known optimization, called NRVO (Named Return Value Optimization), in languages like C++.[56]

Programmers often prefer to perform early checks at the beginning of a procedure to verify all parameters have valid values and to terminate the procedure execution immediately in case of invalid data by using a `return` statement. This approach avoids deeply nested code in the `proc` body for these checks. In contrast, compiler designers, such as Mr. Rumpf, prefer to avoid these `return` statements and instead use nested `if` clauses, as this approach allows for better control flow analysis and compiler optimization.

Var return type

A procedure, `converter`, or `iterator` may return a `var` type that can be modified by the caller. The Nim language manual provides this basic example:

var g = 0
proc writeAccessToG(): var int =
  result = g
writeAccessToG() = 6
assert g == 6

In this way, we can call a `proc` and immediately assign a new value to the result. In the aforementioned example, this works because the `result` is an alias for the global variable `g`.

Var return types are actually used for `iterators` like `mitems()` or `mpairs()`, which allow modification of the yielded `results`. For details about and restrictions on the var return type, you should consult the section on var return types in the Nim language manual.
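
As a minimal sketch of these iterators, the following lines modify the elements of a sequence in place. The sample data is made up:

```nim
var values = @[1, 2, 3]

for v in values.mitems:     # v is a mutable alias of each element
  v *= 2
echo values                 # @[2, 4, 6]

for i, v in values.mpairs:  # index plus mutable element
  v += i
echo values                 # @[2, 5, 8]
```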

Proc name overloading

Note that we used the `proc` names `sqr1`, `sqr2`, and `sqr3` above. Using the same name with the same argument types multiple times would result in a redefinition error, as the compiler could not know what `proc` body should be executed when that `proc` name is called. Redefining existing procedures, with the same name and the identical parameter list, is not allowed in Nim.

However, Nim supports so-called procedure overloading; that is, we can use the same name when the parameter list is different, as the compiler can select which `proc` has to be called based on the parameters in the `proc` call:

func sqr(i: float): float =
  i * i

We have only changed the parameter and result data types. Now there is no conflict with the `proc`, having the same name, that we defined for integers. Note that Nim uses only the parameter list for overload resolution, but not the result type of a procedure or function. The reason for that is that Nim supports type inference, and this would not work if we had two `procs` with the same name, each accepting an `int` parameter, but one returning an `int` and one returning a `float` number.
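
A minimal sketch of overload resolution by parameter type might look like this. The name `kind()` is chosen only for this illustration:

```nim
# kind() is a hypothetical name; each overload differs in its parameter type.
func kind(x: int): string = "int"
func kind(x: float): string = "float"
func kind(x: string): string = "string"

echo kind(1)      # int
echo kind(1.0)    # float
echo kind("one")  # string
```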

Nim also supports named arguments in procedure calls; for instance, we could invoke the `proc` above with `sqr(i = 2.0)`. Named arguments can be useful when `procs` or functions have many arguments, potentially some with default values, and when we do not remember the order of parameters or want to specify only a few.
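
Named arguments and default values can be combined, as in the following sketch. The function `label()` and its parameters are made up:

```nim
# label() and its parameters are hypothetical.
func label(name: string; width = 10; fill = '.'): string =
  name & ":" & $width & ":" & $fill

echo label("a")                    # a:10:.
echo label("a", fill = '#')        # a:10:#
echo label(width = 3, name = "b")  # b:3:.
```

With named arguments, we can skip `width` and still set `fill`, and we can even pass the arguments in a different order.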

Actually, we can use multiple `procs` with the same name and identical parameter list when we use named arguments for the invocation, as in

proc p(i: int): int =
  i * i

proc p(j: int): int =
  j + j

#echo p(2) # fails to compile, ambiguous call
echo p(i = 3)
echo p(j = 3)

Objects and ref objects as procedure parameters

In the previous section, we learned that we have to use `var` parameters when the procedure should be able to mutate the variable permanently. This also applies when the parameters are `objects`. When a procedure should modify fields of an `object` parameter, then we have to pass that `object` as a `var` parameter. In the following example, `proc` `t1` gives a compiler error because it tries to modify a field of an `object` while the `object` instance is not passed as a `var` parameter. If we remove `proc` `t1`, then we can compile and run the example:

type O = object
  i: int

proc t1(o: O) =
  o.i = 7 # Error: 'o.i' cannot be assigned to

proc t2(o: var O) =
  o.i = 13

proc main =
  var x = O(i: 3)
  echo x.repr
  t2(x)
  echo x.repr

main()

The output is:

O[i = 3]
O[i = 13]

The `proc` `t2` gets a `var` parameter and can modify fields of the passed `object`. Here we used the expression `echo x.repr` to print the whole `object`. `Strings` and sequences are value `objects` in Nim, so you have to pass them as `var` parameters when you want to change their length or when you want to modify elements. This code would give you compile errors unless you add the `var` keyword to make the procedure parameters mutable:

proc t1(s: string) =
  s.setLen(7)
  s[0] = 'x'

proc t2(s: seq[int]) =
  s.setLen(7)
  s[0] = 13
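
For comparison, here is a sketch of the same two procedures with the `var` keyword added, together with a short test. The variable names and sample data are made up:

```nim
proc t1(s: var string) =
  s.setLen(7)
  s[0] = 'x'

proc t2(s: var seq[int]) =
  s.setLen(7)
  s[0] = 13

var text = "hello"
var numbers = @[1, 2, 3]
t1(text)
t2(numbers)
echo text.len, " ", text[0]  # 7 x
echo numbers.len             # 7
echo numbers[0]              # 13
```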

This was not really surprising. But what happens when we use a reference to an `object` and pass it to a procedure as a value or as a `var` parameter? In the code below, `proc` `t1` gets a variable of type `ref object` and the procedure can modify fields of the passed instance. That can be surprising indeed. In this case, passing the `ref object` without the use of the `var` keyword means only that we cannot mutate the `ref` value itself in the procedure, but we are allowed to modify the fields of the `object`. For `proc` `t2`, we pass a `var` parameter. As always, we can modify a `var` parameter in the procedure, so we can assign to it a newly created instance.

type O = ref object
  i: int

proc t1(o: O) =
  o.i = 7

proc t2(o: var O) =
  o = O(i : 11)

proc main =
  var x = O(i: 3)
  echo x.repr
  t1(x)
  echo x.repr
  t2(x)
  echo x.repr

main()

When we compile and run the above code, we get the following:[57]

ref 0x7f054a904050 --> [i = 3]

ref 0x7f054a904050 --> [i = 7]

ref 0x7f054a904070 --> [i = 11]

For a `ref object`, the `repr()` function gives us the address of the `object` instance in memory and the contents of its fields. The first two `echo()` statements show the same address, indicating that `proc` `t1` has modified only a field of our instance; the instance itself (its address in memory) was not changed. But `proc` `t2` has created a new instance and assigned that value to the variable `x` in the `main()` procedure. We notice this as the address stored in variable `x` has changed. The old instance with the address `0x7f054a904050` is now unused and will be freed by the Nim memory management.

Nim v2.0 will provide the strictFuncs pragma, which can be used to ensure that a procedure with a `ref object` parameter is not allowed to modify fields of that `ref object`. For details, see the Appendix of this book or the latest version of the Nim language manual.

Special argument types: openArray and varargs

The `openArray` and `varargs` data types can be used only in parameter lists.[58] The `openArray` is a type that allows passing `arrays` and sequences to the procedure or function. This makes sense, as both `arrays` and sequences store their content in a block of memory, which can be processed uniformly. Although `arrays` generally do not have to start with index number `0`, when passed as `openArray`, the first element is mapped to index `0`, and the index of the last element is available by using the `high()` function on the passed `array` parameter. Whenever we write a procedure that accepts an `array` or a sequence, we should consider using the `openArray` parameter type to allow passing in both data types. `Strings` can also be passed to procedures accepting `openArrays` with `char` base type. Note that a `proc` with an `openArray` parameter type cannot change the length of a passed `seq`, as sequences are handled like `arrays` for the `openArray` parameter type. Thus, in the following code, the procedure `t1` generates a compiler error while `t2` compiles and works fine.

proc t1(x: var openArray[int]) =
  x.setLen(7)

proc t2(x: var seq[int]) =
  x.setLen(7)
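
The read-only use of `openArray` parameters described above can be sketched as follows. The helpers `total()`, `bounds()`, and `countChar()` are hypothetical names for this illustration:

```nim
# total(), bounds() and countChar() are made-up helpers.
func total(xs: openArray[int]): int =
  for x in xs:
    result += x

func bounds(xs: openArray[int]): (int, int) =
  (xs.low, xs.high)

func countChar(s: openArray[char]; c: char): int =
  for ch in s:
    if ch == c:
      inc(result)

var arr: array[5 .. 7, int]   # the index range starts at 5
arr[5] = 10
arr[6] = 20
arr[7] = 30

echo total(arr)               # 60
echo total(@[1, 2, 3])        # 6
echo bounds(arr)              # (0, 2)
echo countChar("hello", 'l')  # 2
```

Inside `bounds()`, the array with index range `5 .. 7` is seen with indices `0 .. 2`, and the same `total()` function accepts both the array and the sequence.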

In fact, since Nim version 1.6, it is possible to use the `openArray` type as the result type of `procs` and even as local variables. However, these `view types` are still experimental, see https://nim-lang.org/docs/manual_experimental.html#view-types.

The `varargs` parameter type is similar to the `openArray` type, but it additionally permits the passing of an arbitrary number of single arguments. The compiler automatically collects the individual arguments into an `array`, allowing us to use it as an `array` within the procedure body, for example, by iterating over it.

proc print(s: varargs[string]) =
  for el in s:
    stdout.write(el)
    stdout.write(", ")
  stdout.write('\n')

print("Hello", "World") # compiler builds the array for us
print(["Hello", "World"]) # we generate the array ourselves

There exists a variant of the `varargs` argument type that performs a type conversion automatically by applying a `proc` on all arguments. For example, ``varargs[string, `$`]`` would apply the stringify operation on the passed arguments automatically. That is what `echo()` does.

`Varargs` arguments may only be allowed as the last argument in a parameter list.
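
A minimal sketch of the converting `varargs` variant might look like this. The procedure `joinAll()` is a hypothetical helper:

```nim
# joinAll() is a made-up helper; `$` is applied to every argument.
proc joinAll(parts: varargs[string, `$`]): string =
  for p in parts:
    if result.len > 0:
      result.add(", ")
    result.add(p)

echo joinAll("a", 1, 'c', true)  # a, 1, c, true
```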

Finally, one might wonder if it makes sense to specify a parameter of type `var varargs`. If we try to pass a constant `string`, this will obviously not work, and if the compiler generates an `array` for us, it does not work either, as the automatically generated `array` seems to behave like a constant `array`. But can we pass an `array` variable? Let’s try:

proc print(s: var varargs[string]) =
  s[0] = "Goodbye"
  for el in s:
    stdout.write(el)
    stdout.write(", ")
  stdout.write('\n')

var msg = ["Hello", "World"]
print(msg)

Surprisingly, this does not compile, although it works when we replace `varargs` with `openArray`.

Procedures bound to a data type

In some other programming languages, such as Python or Ruby, we can define class methods or static methods that are bound to a class or type and can be invoked as `MyType.myProc`. In Nim, we can achieve something similar using the `typedesc` procedure parameter type:

type
  Factory = object
    name: string

proc start(t: typedesc[Factory]) =
  echo "Factory.start"

Factory.start

Here, we use the method call syntax instead of `start(Factory)`. We will learn more about the `typedesc` data type later.
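
The same technique is sometimes used for constructor-like procedures. The following sketch assumes a hypothetical `Point` type with an `origin()` procedure bound to it:

```nim
type
  Point = object
    x, y: int

# origin() is a hypothetical "static" constructor bound to the Point type.
proc origin(t: typedesc[Point]): Point =
  Point(x: 0, y: 0)

let p = Point.origin()
echo p.x, " ", p.y  # 0 0
```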

Scoping, visibility, and locality

Scoping, visibility, and locality are important concepts in computer programming that help to keep the source code clean. Imagine if a variable that we declare at some point in our program were visible everywhere. This could generate significant confusion, even for medium-sized programs — whenever we needed a variable, we would have to carefully check which names were already in use. Furthermore, this would be detrimental to performance, as all variables declared would reside permanently in memory.

So, most programming languages, including Nim, support the concept of locality — identifiers declared inside a procedure body or inside another form of a block are only visible and usable there. We say that they are only visible in that scope. For Nim, we can say that whenever Nim’s syntax requires a new level of indentation, that is a new statement block, then all symbols declared in that block are only visible in that block and in sub-blocks of this block, but not outside that block. Nim has another important concept of visibility, which is called modules and allows the separation of our code into logically separated text files with well-defined visibility rules; we will discuss modules later.

Visibility is indeed a straightforward concept. Consider the following illustrative example:

var e: float = 2.7

proc p1 =
  var x: float = 3.1415
  if x > 1.0:
    var y = 2.0 * x
    echo y # OK
  echo x # OK
  echo y # compile error, y is not visible
  echo e # OK, e is declared globally, so it is visible everywhere

echo e # OK
echo x # ?
echo y # ?

In the first line, we declare what’s known as a global variable, which becomes visible throughout the entire program after its declaration.[59] The variables declared in the `proc` `p1` are referred to as local variables, and they are not visible outside of `proc` `p1`. The variable `x` is declared at the start of the procedure body and is visible everywhere in the procedure, while variable `y` is declared in the `if` block and is visible only there. So, is it clear whether the last two `echo()` statements for `x` and `y` compile correctly? Remember that symbols that we define inside a new scope may shadow symbols that were visible outside the actual block. For example, defining a variable named `e` of arbitrary type in the `proc` `p1` from above would shadow the global variable `e`; that is, the global variable `e` would become invisible until execution of procedure `p1` terminates. We have already discussed shadowing in the introductory section titled scopes, visibility, locality, and shadowing.
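
Shadowing can be observed with a small sketch like the following, where the values are chosen arbitrarily:

```nim
var e = 2.7

proc p1 =
  var e = "shadow"  # shadows the global e inside p1
  echo e            # shadow
  var x = 1
  if x > 0:
    var x = 2       # shadows the outer x inside the if block
    echo x          # 2
  echo x            # 1

p1()
echo e              # 2.7
```

The global `e` keeps its value and type; the local `e` in `p1` is a separate variable that only hides it.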

Related to the visibility of variables is their lifetime, that is the duration of how long they exist and how long they can store a value. Global variables exist for the entire program runtime — when you have assigned a value to it that value can be used everywhere as long as the program runs, and as long as you do not assign a different value, of course. Global variables are generally stored in a special memory region, which is called the BSS region.

Variables of value type defined locally inside a procedure or function only exist for the duration of that `proc`’s execution. In other words, they are created when the procedure is invoked and vanish when the procedure terminates, which is when execution continues with the statement following the `proc` call.

Local variables declared in a procedure reside in a special memory region of the RAM, which is called the stack. The stack is nothing more than an arbitrary part of the whole RAM that is used in some clever fashion: The memory words in it are used in consecutive order. A so-called stack `pointer` is used to indicate the address of the first free area in that stack. So when a procedure is called, which may have `n` bytes of local variables, the compiler can use the area where the stack `pointer` points to for those variables, and when the procedure is called, the stack `pointer` is increased by that size. So the stack `pointer` points again to the next free area of the stack, and another `proc` can be called in the same way from within the current procedure. Whenever a procedure terminates, the stack `pointer` is set back to the value that it had when the `proc` started execution. This method of memory management is simple and fast, but it only works when the total amount of memory that the local variables in a procedure need is known at compile-time so that the compiler can adjust the stack `pointer` accordingly. It does not work for dynamically sized data types like `strings` or sequences.

Note that `pointers` and references are value types themselves. We can regard `pointers` and references as plain integer variables interpreted in a special way — as memory locations. However, the memory blocks to which the `pointers` and references may point, and which are allocated by `alloc()` or `new()`, are different: These memory blocks are not allocated on the stack, but in the ordinary RAM, which we refer to as the heap to distinguish it from the stack.

So, why can’t the stack be used for memory blocks that `alloc()` or `new()` provide for us? An important factor for using the stack to store variables is that the total size needed by a procedure for all the static variables must be a compile-time constant. The stack `pointer` is adjusted by that amount when the `proc` starts, and all the local variables are accessed with a fixed offset to that stack `pointer` then. When we use `alloc()` or `new()` in a `proc`, then we may call that multiple times as we did in our previous list example, and for `alloc()` an additional fact is that the byte size that `alloc()` should reserve can be a runtime value. So the total amount of RAM that `alloc()` or `new()` would allocate is a runtime value, and we can not use the stack for it. Instead, `alloc()` and `new()` allocate a block of memory in a more dynamic fashion, which is basically that they ask the OS for a free block of the right size somewhere in the available RAM. That block is later given back to the OS for reuse by functions like `dealloc()` or automatically by the GC.
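
The following sketch illustrates why the amount of heap memory is a runtime value: the hypothetical helper `buildList()` allocates one `Node` per loop iteration, and the number of iterations is only known when the procedure is called.

```nim
type
  Node = ref object
    value: int
    next: Node

# buildList() and total() are made-up helpers; n is a runtime value.
proc buildList(n: int): Node =
  for i in countdown(n, 1):
    result = Node(value: i, next: result)  # one heap allocation per step

proc total(head: Node): int =
  var cur = head           # cur itself is an ordinary local variable
  while cur != nil:
    result += cur.value
    cur = cur.next

echo total(buildList(4))   # 10
```

The reference variables `result` and `cur` have a fixed size, while the nodes they refer to are created dynamically.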

Let’s explore some special cases at the end of this section:

While in languages like C, we always have a well-defined `main()` function, and all program code is contained in this function or in other functions that are called from this main function, in Nim, we also have global code, as seen in scripting languages like Ruby or Python:

var i: int
while i < 100:
  var j: int
  j = i * i
  echo j
  inc(i)

It should be clear that the global variable `i` resides in the BSS segment. But what about the variable `j` declared in the body of the `while` loop? It is clear that this variable is only visible inside the body of the `while` statement. But does `j` reside on the stack? Since there seems to be no procedure involved, could there possibly be no stack? Could the variable `j` reside in the BSS segment too? This is not really clear and might vary among different Nim compilers. But why should we care about this detail at all? Well, it can be important for performance. Local `proc` variables allocated on the stack are generally optimal for performance, and they are usually well-optimized by the compiler. We will learn more about the reasons for that later when we discuss the data cache. For now, we should only remember that it is a good idea to avoid global code and put all code in `procs`. We may then have an arbitrarily named `main()` procedure and call it only from the global scope. At least for the current Nim v2.0, this seems to be a good idea. Potentially, later versions or other implementations will automatically move all global code into a hidden `proc` for us.

For optimal performance, you should put all your code in procedures or functions, avoid global code, and, when possible, avoid global variables.

Let’s discuss the above while loop again, but this time within the body of a `proc`:

proc p =
  var i: int
  while i < 100:
    let j: int = i * i
    echo j
    inc(i)

When we carefully investigate that procedure with the `while` loop, we may wonder about two points. First, we said earlier that we can and should use the `let` keyword instead of `var` when there is only one assignment to a variable, so the variable can be regarded as immutable. But if the loop is executed 100 times, how can we say there is only a single assignment to the variable `j`? The trick is that `j` is local to the body of the `while` loop, and `j` is virtually newly created for each iteration and receives its value exactly once. Therefore, using `let` is OK and the compiler does not complain. In the same way, a variable declared in the loop body without an initial value starts again with its default value `0` in each iteration.

We can test this fact with this simple program:

proc main =
  var i: int
  while i < 10:
    var a: int
    a = a + 1
    echo a
    inc(i)
main()

The output is `1` for each loop iteration because variable `a` is virtually recreated for each loop iteration.

We used "virtually recreated" because we cannot be sure how the compiler may handle it internally. Is storage for variable `a` already allocated when the procedure is invoked, in the same way that storage for the loop counter variable `i` is allocated on the stack when the `proc` is called? Or is storage for variable `a` reserved for each loop iteration by increasing the stack `pointer` at the start of the loop and resetting it at the end of the loop? We can’t be sure without reading the compiler source code, but ultimately, it doesn’t really matter, so we shouldn’t concern ourselves with it.

Generics

In the previous section, we defined a `sqr()` `proc` for `ints` and one for `float` numbers. Both procedures look nearly identical, only the data types differ. In that case, we can use so-called generic procedures.

func sqr[T](v: T): T =
  var p: T
  p = v * v
  return p

echo sqr(2)
echo sqr(3.1415)

We put a square bracket after the function name, which includes a symbolic name. That name is then used instead of concrete types in the procedure header or in the procedure body.

We can now call this `proc` with parameters of different types, including `int` and `float` types. You may wonder why that works: Nim is a statically typed language, so how can the parameter of function `sqr()` accept an integer as well as a floating-point number? Is there a hidden type conversion involved? No, the trick is that whenever we call that generic `proc` with a different type, a new procedure or function is instantiated. When we call the generic `sqr()` `proc` with an `int` and a `float` parameter, the compiler creates machine code for two separate functions during compile time: one that is called when an `int` is passed as a parameter, and another that is called when a `float` is passed. If we call this procedure again with an `int` or `float` parameter, one of the two existing `procs` would be used. However, for a different, otherwise unused data type like `float32`, a new `proc` would be instantiated again. In this way, generic procedures can lead to some code bloat. Note that calling the generic function with a data type like a character or a `string` would fail, as these types do not support multiplication with themselves.
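
These claims can be checked with a short sketch that uses the `is` operator and the built-in `compiles()` check. The compact one-line body is just a shorter way to write the function:

```nim
func sqr[T](v: T): T =
  v * v

echo sqr(2)                # 4, instantiated for int
echo sqr(1.5)              # 2.25, instantiated for float
echo(sqr(2) is int)        # true
echo(sqr(1.5) is float)    # true
echo compiles(sqr("ab"))   # false: strings have no `*` operator
```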

A slightly different notation is available with so-called `or` types:

func sqr(v: int or float): auto =
  var p: typeof(v)
  p = v * v
  return p

echo sqr(2)
echo sqr(3.1415)

Here, we have limited the parameter types to the `int` or `float` type. We could have also defined a custom type first, like `type MyNum = int or float`, and used that type for the parameter type of our `sqr()` `proc`. These `or types` are also called `type classes`. Instead of the keyword `or`, the `|` character can be used for defining type classes. Again, the compiler would instantiate two separate functions for both data types. As we did not have the symbolic type `T` available here, we have used the keyword `auto` as the return type, and for the type of variable `p` we used the `macro` `typeof()`. The type `auto` for the return type works as long as the function returns a well-defined type. Note that we cannot decide at runtime what type the function should return, so a construct like `if cond: return 2 else: return 3.1415` would not work, at least not when the values are variables of different types. For literal values, it may work, as the compiler might be smart enough to guess that we want to return the `float` literal `2.0`.
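
A sketch of the variant with a named type class, written with the `|` character, could look like this. The name `MyNum` follows the text above, while the compact function body is an assumption made for brevity:

```nim
type MyNum = int | float   # a user-defined type class

func sqr(v: MyNum): auto =
  v * v

echo sqr(3)                # 9
echo sqr(1.5)              # 2.25
echo(sqr(3) is int)        # true
echo(sqr(1.5) is float)    # true
echo compiles(sqr("ab"))   # false: string is not part of MyNum
```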

A bit of care is needed when we define procedures for mutable `or types`:

# proc t(s: var seq[uint8] | var seq[char]) =
proc t(s: var (seq[uint8] | seq[char])) =
  discard # body omitted

Here we try to define a `proc` called `t` which should accept a mutable `seq[uint8]` or a mutable `seq[char]` as a parameter. While the first line compiles fine, the `seq[char]` would be immutable. The correct notation is shown in the second line. This behavior was labeled "won’t fix" in the GitHub issue tracker, so we have to remember this case, see https://github.com/nim-lang/Nim/issues/15063#issue-665553657.

Let’s assume you want to define a `proc` that accepts two numbers of `int` or `float` type and returns a `float`. You may write it in one of these ways:

proc sqrsum(x, y: int | float): float =
  (x * x).float + (y * y).float

proc sqrsum2[T](x, y: T): float =
  (x * x).float + (y * y).float

proc sqrsum3[T1, T2](x: T1; y: T2): float =
  (x * x).float + (y * y).float

var i: int = 2
var x: float = 3.0

echo sqrsum(i, x)
#echo sqrsum2(i, x)
echo sqrsum2(x, 2)
#echo sqrsum2(2, x)
echo sqrsum3(i, x)

The commented-out lines would give you a compiler error. The reason for this is that the `proc` `sqrsum2[T]` defines a generic `proc`, but the compiler enforces that both parameters have the same type.

The expression `sqrsum2(x, 2)` compiles fine: due to the first parameter `x`, the compiler instantiates a `proc` for `float` parameters, and the integer literal `2` is then converted to `float` automatically. But `sqrsum2(2, x)` does not compile: due to the first parameter, which is an integer literal, a `proc` for integer parameters is instantiated, and the second parameter `x` of `float` type is not compatible with the instantiated `proc`.
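If we want to keep the single type parameter `T`, one possible workaround is to convert the arguments explicitly so that both have the same type. The following sketch repeats `sqrsum2()` from the listing and uses the built-in `compiles()` only to confirm that the mixed call is rejected.

```nim
proc sqrsum2[T](x, y: T): float =
  (x * x).float + (y * y).float

var i: int = 2
var x: float = 3.0

echo compiles(sqrsum2(i, x))  # mixed int and float arguments are rejected
echo sqrsum2(i.float, x)      # explicit conversion: both arguments are float
echo sqrsum2(2.0, x)          # a float literal works as well
```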

Generics can become a bit complicated, as we may use multiple different generic types for different procedure parameters. We can also use generics for `object` types. For example, we may create lists as we did for our names list that not only works for `strings`, but can also work with other data types like numbers or sequences in a very similar way. We may explain that in more detail later.

Example for the use of generics

Generics are used extensively in Nim’s standard library. Most container types, like sequences or `tables`, accept generic types, and generic procedures like `sort()` are provided that can easily sort arbitrary data types and `objects`. We only have to provide a `cmp()` `proc` for our user-defined data types, which `sort()` can call to compare the values during the sorting process.
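A minimal sketch of this pattern is shown below. The `City` type, its values, and the name `cmpByPopulation` are hypothetical; `sort()` from `std/algorithm` and the built-in `cmp()` are used as provided by the standard library.

```nim
import std/algorithm

type
  City = object  # hypothetical example type with made-up values
    name: string
    population: int

# Comparison proc that sort() calls to order two elements.
proc cmpByPopulation(a, b: City): int =
  cmp(a.population, b.population)

var cities = @[
  City(name: "B", population: 300),
  City(name: "A", population: 100),
  City(name: "C", population: 200)]

cities.sort(cmpByPopulation)
for c in cities:
  echo c.name, " ", c.population
```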

We will demonstrate the use of generics in library modules with a few small examples: Assume we create a library that should be able to store and process arbitrary data types. The stored values may have well-defined relations, which enables ordering or much more complicated spatial relations. Triangulation of spatial data points or grouping of the data in structures like `RTrees` for fast point location, as well as geometric processing with algorithms like finding the convex hull, are some examples. To make our example simple and compact, we define a generic container type that can store only two values of an arbitrary data type. The container allows for the sorting of the elements by size. The following code example defines a generic container called `MyGenericContainer`, a `proc` to `add()` data `objects` into the container instance and a `sortBySize()` `proc` to sort the two elements:

type
  MyGenericContainer[T] = object
    storage: array[2, T]

proc add[T](c: var MyGenericContainer[T]; x, y: T) =
  c.storage[0] = x
  c.storage[1] = y

# sort by direct field access
proc sortBySize[T](c: var MyGenericContainer[T]) =
  if c.storage[0].size > c.storage[1].size:
    swap(c.storage[0], c.storage[1])

# a simple stringify proc for our container data type
proc `$`[T](c: MyGenericContainer[T]): string =
  `$`(c.storage[0]) & ", " & `$`(c.storage[1])

type
  TestObj1 = object
    name: string
    size: int

proc main =
  var c: MyGenericContainer[TestObj1]
  var a = TestObj1(name: "Alice", size: 162)
  var b = TestObj1(name: "Bob", size: 184)

  add(c, b, a)
  echo c
  c.sortBySize
  echo c

main()

The `sortBySize()` `proc` in the above examples accesses the size field of our data `objects` directly. Therefore, we can use the container for arbitrary data types, provided that the data types have a size field and a `>` `proc` is defined for the data type of the size field. In the above example, we have defined a `$` procedure to convert instances of our container into a `string`, enabling us to call the `echo()` function on it. The output of our program looks like

(name: "Bob", size: 184), (name: "Alice", size: 162)
(name: "Alice", size: 162), (name: "Bob", size: 184)

We can avoid the restriction of a matching field name when we provide getter and setter procedures which the library `procs` can use to access the important fields:

type
  MyGenericContainer[T] = object
    storage: array[2, T]

proc add[T](c: var MyGenericContainer[T]; x, y: T) =
  c.storage[0] = x
  c.storage[1] = y

proc sortBySize[T](c: var MyGenericContainer[T]) =
  if c.storage[0].size > c.storage[1].size:
    swap(c.storage[0], c.storage[1])

proc `$`[T](c: MyGenericContainer[T]): string =
  `$`(c.storage[0]) & ", " & `$`(c.storage[1])

type
  TestObj1 = object # arbitrary field names
    name: string
    length: int

# this getter proc enables sorting
proc size(t: TestObj1): int =
  t.length

proc main =
  var c: MyGenericContainer[TestObj1]
  var a = TestObj1(name: "Alice", length: 162)
  var b = TestObj1(name: "Bob", length: 184)

  add(c, b, a)
  echo c
  c.sortBySize
  echo c

main()

In the above example, our `TestObj1` data type has no `size` field that the `sortBySize()` `proc` could access directly. However, we define a `size()` `proc` for our data type that the library function can use instead. This solution is more flexible, and when we add the inline pragma to the `size()` `proc` or compile with link-time optimization (LTO) enabled, the overhead should be negligible.

Generics are typically used in library modules, which provide some functionality to client modules. For example, a library module can provide a generic `sort()` function, which then can be used by different client modules to sort containers with arbitrary element types. We will discuss modules later in more detail. For now, it is enough to understand that each Nim module is a separate file, and we can use the `import` keyword to incorporate functionality from a (library) module into our main module. One restriction is that we can actually only import symbols marked with the `*` export marker in the imported module.

When we divide the above example into two modules, we might end up with something like:

# module t3.nim
type
  MyGenericContainer*[T] = object
    storage: array[2, T]

proc add*[T](c: var MyGenericContainer[T]; x, y: T) =
  c.storage[0] = x
  c.storage[1] = y

proc sortBySize*[T](c: var MyGenericContainer[T]) =
  if c.storage[0].size > c.storage[1].size:
    swap(c.storage[0], c.storage[1])

proc `$`*[T](c: MyGenericContainer[T]): string =
  `$`(c.storage[0]) & ", " & `$`(c.storage[1])

# main module (separate file)
import t3

type
  TestObj1 = object # arbitrary field names
    name: string
    length: int

proc size(t: TestObj1): int =
  t.length

proc main =
  var c: MyGenericContainer[TestObj1]
  var a = TestObj1(name: "Alice", length: 162)
  var b = TestObj1(name: "Bob", length: 184)

  add(c, b, a)
  echo c
  c.sortBySize
  echo c

main()

Note that all procedures in module `t3` and the generic container data type are marked with the `*` export marker. This ensures that we can use these symbols in the main module that imports them. Split into separate modules, the example with direct field access looks like this:

# module t4.nim
type
  MyGenericContainer*[T] = object
    storage: array[2, T]

proc add*[T](c: var MyGenericContainer[T]; x, y: T) =
  c.storage[0] = x
  c.storage[1] = y

proc sortBySize*[T](c: var MyGenericContainer[T]) =
  if c.storage[0].size > c.storage[1].size:
    swap(c.storage[0], c.storage[1])

proc `$`*[T](c: MyGenericContainer[T]): string =
  `$`(c.storage[0]) & ", " & `$`(c.storage[1])

# main module (separate file)
import t4

type
  TestObj1 = object
    name: string
    size: int

proc main =
  var c: MyGenericContainer[TestObj1]
  var a = TestObj1(name: "Alice", size: 162)
  var b = TestObj1(name: "Bob", size: 184)

  add(c, b, a)
  echo c
  c.sortBySize
  echo c

main()

You may wonder why we do not have to export the `size` field of our `TestObj1` (or maybe the `object` itself also), as it is used from code defined in a different module. We don’t need export markers because `sortBySize()`, while defined in the library module, is a generic procedure and is instantiated and executed in the application module. For the same reason, we did not have to export the `size()` getter procedure before.

Lastly, another way to use generic library modules involves passing procedure variables to the library functions. The passed-in procedures may provide access to properties or attributes of the stored `objects`, or they may offer relations between the `objects`. The latter is often used for sorting purposes:

# module tx.nim
type
  MyGenericContainer*[T] = object
    storage: array[2, T]

proc add*[T](c: var MyGenericContainer[T]; x, y: T) =
  c.storage[0] = x
  c.storage[1] = y

proc sortBy*[T](c: var MyGenericContainer[T]; smaller: proc(a, b: T): bool) =
  if smaller(c.storage[1], c.storage[0]):
    swap(c.storage[0], c.storage[1])

proc `$`*[T](c: MyGenericContainer[T]): string =
  `$`(c.storage[0]) & ", " & `$`(c.storage[1])

# main module (separate file)
import tx

type
  TestObj1 = object
    name: string
    size: int

proc smaller(a, b: TestObj1): bool =
  a.size < b.size

proc main =
  var c: MyGenericContainer[TestObj1]
  var a = TestObj1(name: "Alice", size: 162)
  var b = TestObj1(name: "Bob", size: 184)

  add(c, b, a)
  echo c
  c.sortBy(smaller)
  echo c

main()

Here, we have replaced the `sortBySize()` `proc` of our library module with a `sortBy()` `proc` that takes an additional procedure parameter. In this case, we use a procedure signature that takes two `object` instances and returns a boolean value indicating if the first parameter is smaller than the second. In our application module, we define a matching procedure and pass it to the `sortBy()` procedure. Again we get the desired sorted output:

(name: "Bob", size: 184), (name: "Alice", size: 162)
(name: "Alice", size: 162), (name: "Bob", size: 184)

This final method is commonly used in Nim’s standard library, for instance, for sorting sequences of custom `objects`. Unfortunately, this approach can cost some performance: because the procedure is passed to the called `proc` as a variable, the compiler generally cannot inline it.[60]

Method call syntax

A useful coding style introduced by Object-Oriented Programming (OOP) languages is the method call syntax. It was initially used in OOP for `objects` and later applied by languages like Ruby to all data types. In a way, Ruby regards all data as `objects`. Because the method-call syntax is so useful, we’ve already mentioned it a few times. But as that syntax belongs to the "procedures and functions" section, we will repeat the basic facts here:

Method call syntax means that, for example, for a variable `s` of data type `string`, we write `s.add(c)` instead of `add(s, c)`. Or for an integer variable `i`, we may write `i.abs` instead of `abs(i)`. Specifically, we place the first parameter of the `proc` parameter list before the procedure name, separating them with a period. The Nim compiler regards both notations as equivalent. The advantage of the method call syntax is two-fold: we can save a character, and it becomes clearer which "object" we’re working with, as it is placed before the expression.
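The equivalence of both notations can be tried out with a few lines. The helper `double()` is a hypothetical `proc` added only to show that the syntax also works for our own procedures and that calls can be chained.

```nim
proc double(x: int): int =  # hypothetical helper
  2 * x

var s = "Ni"
s.add('m')         # same as add(s, 'm')
echo s
echo s.len         # same as len(s)

let i = -5
echo i.abs         # same as abs(i)
echo i.abs.double  # chained calls, same as double(abs(i))
```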

Most OOP languages only allow this notation for a class. For instance, the `string` class might declare all possible operations that can be performed with `strings`, using the method-call syntax for these operations. One problem is that it can be difficult to add more operations that can be used in that style, as often all those operations are defined in the class scope; Ruby circumvented this limitation by permitting the so-called reopening of classes, enabling users to add more operations later on.

Like the D language, Nim allows this notation for all data types. In D, it is referred to as the Uniform Function Call Syntax (UFCS).

Procedure variables

Procedures and functions are not always fully static entities. We can assign procedures and functions to variables, pass them as parameters to other procedures or functions, and even generate and return new functions. Let’s investigate how procedure variables work:

var
  p: proc(i: int): int

proc p1(i: int): int =
  i + i

proc p2(i: int): int =
  i * i

p = p1
echo p(7)
p = p2
echo p(7)

The output of the two `echo` statements should be `14` and `49`. In both cases, we called the same `proc` variable with the same parameter, but the `proc` variable `p` was an alias for `p1` in the first call and an alias for `p2` in the second call. Note that when we assign a `proc` to a `proc` variable, we only write the name of the `proc`; there is no `()` involved. This is because we assign that `proc` to the `proc` variable, but we do not call the procedure in this case. Of course, when we assign a `proc` to a procedure variable, the `proc` signatures must match; this means the parameter list and the `result` type must be compatible.
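As a small extension of this idea, the following sketch stores procedures in an array and checks whether a procedure variable has been assigned yet. The array name `ops` is our own choice; `isNil()` is the standard check for unassigned procedure variables.

```nim
proc p1(i: int): int = i + i
proc p2(i: int): int = i * i

var p: proc(i: int): int
echo p.isNil        # nothing has been assigned yet

p = p1
echo p(7)

let ops = [p1, p2]  # procs with matching signatures stored in an array
for op in ops:
  echo op(7)
```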

Now we use a function as a `proc` argument.

type
  EchoProc = proc (x: float)

proc t(ep: EchoProc; x: float) =
  echo "The value is"
  ep(x)

proc ep1(x: float) =
  echo "==> ", x

proc ep2(x: float) =
  echo x

t(ep1, 3.1415)
t(ep2, 3.1415)

A common use case for using a function as a procedure parameter is sorting. We can use the same sort procedure for different data types when we provide a `cmp()` `proc` that can compare that data type.

from std/algorithm import sort

proc cmp(a, b: int): int =
  if a < b:
    -1
  elif a == b:
    0
  else:
    1

proc main =
  var a = [2, 3, 1]
  a.sort(cmp)
  for i in a:
    echo i

main()

The `sort()` procedure is provided by the `algorithm` module. The `sort()` `proc` accepts an `array` or a sequence, and a `cmp()` `proc` that gets two parameters of the same type as the elements in the passed `array`, and that returns -1, 0, or 1 as the result of the comparison. We could easily sort other data types like `strings` or our custom `objects` by an arbitrary key, as long as we can provide a matching `cmp()` procedure. For the `cmp()` `proc` it is important that it returns a well-defined result based on the input, and when both parameters are equal, it should really return `0`. If you were to swap the return values `1` and `-1` in the `cmp()` procedure above, you would invert the sort order.
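The effect of inverting the comparison can be sketched as follows. The name `cmpDesc` is hypothetical; it simply calls the built-in `cmp()` with swapped arguments. The second call uses `sorted()` from the same module with an anonymous comparison `proc` that orders strings by length.

```nim
import std/algorithm

# Swapping the arguments of the built-in cmp() inverts the sort order.
proc cmpDesc(a, b: int): int =
  cmp(b, a)

var a = [2, 3, 1]
a.sort(cmpDesc)
echo a

# sorted() returns a new seq; here strings are ordered by their length.
let words = sorted(["pear", "fig", "banana"],
                   proc(x, y: string): int = cmp(x.len, y.len))
echo words
```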

Nested procedures and closures

While in C, all functions must be defined in the top-level scope and nesting of functions is not permitted, Nim allows procedures to contain other procedures. A special case occurs when the sub-procedures access variables of the outer scope. In this case, the sub-procedure is called a closure:

proc digitScanner(s: string) =

  var pos = 0
  proc nextDigit: char =
    while pos < s.len and s[pos] notin {'0' .. '9'}:
      inc(pos)
    if pos == s.len:
      return '\x0'
    result = s[pos]
    inc(pos)

  var c: char
  while true:
    c = nextDigit()
    if c == '\x0':
      break
    stdout.write(c)
  stdout.write('\n')

digitScanner("ad5f2eo73q9st")

When you run this program, the output should be

52739

This program is not that easy, but when you think about it a bit, you should be able to understand it. The task is to extract from a `string` all the digits and ignore the other characters.

To get the digits, we use a local `proc` that uses the `pos` variable of the enclosing `proc` and also accesses the parameter `s` of the enclosing procedure. The closure `nextDigit()` checks if the position in the `string` is still valid, that is, if it is still smaller than the length of the `string`, and also checks whether the current character is a digit. The first check uses the standard `proc` `len()`, which returns the length of a passed `string` parameter, that is, how many characters the `string` contains. We have used the method call syntax here instead of the ordinary procedure call `len(s)`. The next check tests if the current character is not a decimal digit. For that test we could use a series of comparisons like `if c == '0' or c == '1' or … or c == '9'`. But to make such tests easier and faster, Nim offers one more data type, the `set` type, and the `notin` operator tests whether a value is not contained in a `set` constant. An important point is that the condition of the `while` statement is evaluated from left to right, and that the right operand of `and` is only evaluated when the left operand is true. This is critical here because we have to check first that `pos` is still a valid position before we use the subscript operator `[]` to access the current character and test if it is contained in the `set`. If the check for the valid position did not come first, we might access an invalid position in the `string`, and we would get a runtime index error.

While the position is still valid but the current character is not a digit, we increase the position. The `while` loop can end for two reasons: either the current character is a digit, or we have reached the end of the `string` and have to stop. For the latter case, we use a special stop mark: we return a special character, which we have entered in escape notation as `'\x0'`. This is the character that is used in C to mark the end of `strings`. It is the first character in the ASCII table and has the decimal value `0`. We said earlier that characters are encoded in 8 bits and correspond to the unsigned integer numbers `0` up to `255`; `'\x0'` is just a special notation for the first character, which corresponds to the integer value `0`. When the end of the `string` is reached, we return that character. Otherwise, we return the current character. Remember, from the `while` condition we know that either the end of the `string` is reached or the current character is a digit. As we have already handled the end of the `string`, the current character must be a digit now. But can we immediately return that character? If we did, `pos` would still point to the same digit, and we would get exactly the same character for the next `proc` call! Therefore, we have to move to the next character by incrementing `pos` before we return. For this, the pre-declared `result` variable is useful: we assign the current character to the `result` variable and then increase `pos`. As the last statement in our procedure is not an expression but a plain `inc()` statement, the content of the `result` variable is returned. The other `while` loop in the outer procedure is very simple: we just call the closure in the body of the loop and terminate the loop when we get the special null character.
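The combination of a `set` test with a position check that comes first can also be shown in isolation. The following sketch uses a hypothetical helper `firstDigitPos()` and a hypothetical constant `DecDigits`; it relies on the same loop condition as `nextDigit()`, so `s[pos]` is never evaluated for an invalid position.

```nim
const DecDigits = {'0' .. '9'}  # a set[char] constant

# Hypothetical helper: index of the first digit at or after `start`, or -1.
proc firstDigitPos(s: string; start = 0): int =
  var pos = start
  # The position check comes first, so s[pos] is only evaluated when valid.
  while pos < s.len and s[pos] notin DecDigits:
    inc(pos)
  if pos < s.len: pos else: -1

echo firstDigitPos("ab7c")
echo firstDigitPos("abc")
```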

And finally, an example where one `proc` returns another procedure:

proc addN(n: int): auto = (proc(x: int): int = x + n)

let add2 = addN(2)
echo add2(7)

The output of `echo()` would be `9` in this case. The returned procedure is a closure that remembers the value of `n` it was created with. This construct is sometimes named currying.
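A returned closure can also capture a local variable and modify it between calls. The following sketch is an illustration under our own naming: `Counter` and `makeCounter()` are hypothetical and not taken from the standard library.

```nim
type
  Counter = proc(): int  # hypothetical alias for the returned closure type

proc makeCounter(start: int): Counter =
  var count = start      # captured by the closure below
  result = proc(): int =
    inc(count)
    count

let c1 = makeCounter(0)
let c2 = makeCounter(10)
echo c1()  # 1
echo c1()  # 2
echo c2()  # 11, each closure has its own copy of count
```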

Anonymous procedures

In the section Module sequtils in Part III of the book, we will introduce a few functions which are often used in the functional programming style, like `map()` or `filter()`. These functions take procedures as arguments, which determine how container data types are converted. We can pass a regular named procedure as a second argument to functions like `map()` and `filter()`, or in simple cases, we can just pass an anonymous `proc` or use the `=>` operator provided by the `sugar` module:

import std/[sequtils, sugar]

proc primeFilter(x: int): bool =
  x in {3, 5, 7, 13}

var s = (0 .. 9).toSeq # @[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

echo s.filter(primeFilter) # @[3, 5, 7]
echo s.filter(proc(x: int): bool = (x and 1) == 0) # @[0, 2, 4, 6, 8]

echo s.map(proc(x: int): int = x * x) # always @[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
echo s.map(x => x * x) # from sugar module

Here, we use the `toSeq()` `template` to create our initial sequence with numbers from `0` up to `9`, so we don’t have to type all the numbers in; we will explain `templates` soon. Then we apply the `filter()` `proc` to that sequence. The `filter()` `proc` expects a function, which takes an argument of the seq’s base type and returns a boolean value, as a second argument. We can pass the named function `primeFilter()`, or we can just pass an anonymous `proc` explicitly.

In the last two lines of our example, we use the `map()` function to convert the data of our sequence. The `map()` function expects a `proc`, which takes a parameter of the seq’s base type and returns a `result` of the same type, as a second argument. In the penultimate line, we specify an anonymous `proc` as a parameter, while in the last line, we use the `=>` operator from the `sugar` module to specify the actual conversion.
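A few more variations may help to get used to this style. The sketch below combines `filter()` and `map()` with the `=>` operator, and additionally shows `mapIt()` and `filterIt()` from `sequtils`, which are shortcuts that use an implicit `it` variable instead of a `proc` parameter.

```nim
import std/[sequtils, sugar]

let s = toSeq(0 .. 9)

echo s.filter(x => x mod 2 == 1)            # odd numbers
echo s.map(x => x * x).filter(x => x > 50)  # calls can be chained

# The same kind of operations, written with the implicit `it` variable.
echo s.mapIt(it * it)
echo s.filterIt(it > 6)
```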

Compile-time proc execution

When a function is called with only constant arguments, the compiler can already execute it at compile time:

func genSep(l: int): string =
  debugEcho "Generating separator string"
  for i in 1 .. l:
    result.add('=')

const Sep = genSep(80) # function is executed at compile-time

echo Sep

Here, we use a function called `genSep()` to create a `string` constant at compile time. When we compile the above program, we get the message "Generating separator string". As that `proc` is not executed at program runtime, it is not included in the final executable program. Here we had to use the `debugEcho()` `proc` instead of the ordinary `echo()`, because `echo()` is not really a pure function, and the compiler would complain when we use `echo()` in a pure function. The function `debugEcho()` is not really pure either, but the compiler ignores that fact, which is acceptable for debugging purposes. We could even make `genSep()` a plain `proc` and then use `echo()`; the compiler would not complain. But it would complain if, for instance, we accessed global variables from inside the `genSep()` procedure.
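Compile-time execution is not limited to building strings. As a further sketch, the hypothetical function `fib()` below is evaluated by the compiler because its result is assigned to a constant, and a `static` block checks the value during compilation.

```nim
# Hypothetical example function, evaluated at compile time below.
func fib(n: int): int =
  if n < 2: n else: fib(n - 1) + fib(n - 2)

const Fib20 = fib(20)  # computed by the compiler

static:
  doAssert Fib20 == 6765  # this check runs during compilation

echo Fib20
```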

Inlining procedures

Calling procedures and functions always introduces some overhead — `proc` parameters may need to be put on the stack or loaded into CPU registers, some CPU or FPU registers may need to be saved, the stack `pointer` and the program counter have to be updated, and finally, the instruction cache has to be filled with new instructions.

Thus, for small procedures, the actual call to the `proc` may take more time than processing the code within the `proc`. To avoid this additional effort, procedures and functions can be inlined. The compiler may do this automatically for us, but we can support it by applying the {.inline.} pragma to tiny `procs`.[61] For inlined `procs`, the code is inserted directly at the call site. This may increase the total executable size when the `proc` is used often. Therefore, we should use the inline pragma judiciously. Another option is to compile the entire program with link-time optimization by passing the `-d:lto` option to the compiler. This way, the C backend can automatically inline `proc` code, even for `procs` from imported modules. One more option is to use `templates` instead of tiny `procs`: `templates` always do a plain code substitution, so they can behave very similarly to inline `procs`. We will discuss `templates` later. The following example shows how we can apply the inline pragma to procedures and functions:

proc max(a, b: int): int {.inline.} =
  if a < b: b else: a

Note that functions from shared libraries cannot be inlined, so calling external C functions, either directly or indirectly, can be slower than expected.

Recursion

Procedures and functions can call themselves in a repetitive manner, which is called recursion. Clearly, there must be some condition that eventually stops the recursion. Otherwise, the procedure would continually call itself, storing data on the stack for each call, including at least the `proc` return address. Ultimately, this could lead to a stack overflow and the program would crash. In general, recursion should be used only when it significantly simplifies the algorithm. In Part V of the book, in the section about the various sorting algorithms, we will discover some useful applications for recursion. In most cases, an iterative algorithm is faster than a recursive one, because all the overhead with many `proc` calls is avoided for iterative solutions. But sometimes recursive algorithms are easier to understand, or programming an iterative solution can be really complicated.

As one of the simplest examples, we present the recursive `fac()` function here:

proc fac(i: int): int =
  if i < 2:
    1
  else:
    i * fac(i - 1)

This function should terminate, as it only calls itself again with a decreased argument. Naturally, using recursion in this case isn’t the most efficient approach. It should be relatively straightforward for you to convert the procedure into an iterative solution without recursion. It’s important to note that recursive procedures cannot be inlined!
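One possible iterative version is sketched below. The name `facIter()` is our own choice; the loop simply multiplies all numbers from `2` up to `n`, so no recursive calls are needed.

```nim
# Iterative variant: no recursive calls, just a simple loop.
proc facIter(n: int): int =
  result = 1
  for i in 2 .. n:
    result *= i

echo facIter(5)
```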

Converters

Nim’s `converters` are a special variant of functions that are called automatically by the compiler when argument types do not match.

converter myIntToBool(i: int): bool =
  if i == 0:
    false
  else:
    true

proc processBool(b: bool) =
  if b:
    echo "true"
  else:
    echo "false"

var i = 7
processBool(i)
if i:
  echo "true"
else:
  echo "false"

With the above `converter`, we can pass an integer to a `proc` that expects a boolean parameter, and we can even use an integer as a logical expression in an `if` condition in the same way as it is done in the C language. `Converters` only work in a direct way, meaning automatic chaining is not supported: If we have one `converter` from character to integer and one from `int` to boolean, that does not mean that we can pass a character to a `proc` that expects a boolean. We would have to declare one more `converter` that directly converts a character to a boolean.
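The missing chaining can be checked with a small sketch. The names `charToInt`, `intToBool`, and `describe` are hypothetical; the built-in `compiles()` is used to show that a character is not converted to a boolean in two steps.

```nim
converter charToInt(c: char): int =
  ord(c)

converter intToBool(i: int): bool =
  i != 0

proc describe(b: bool): string =
  if b: "true" else: "false"

echo describe(7)              # int is converted to bool by intToBool
echo compiles(describe('a'))  # char to int to bool would require chaining
```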

Whenever we consider using `converters`, we should think twice — `converters` can be confusing, may have unexpected effects, and could increase compile times.

You might have wondered why we wrote the above `converter` in such a verbose way. That was done intentionally, but you are right, of course: we can write it simply as

converter myIntToBool(i: int): bool =
  i != 0

Object-oriented programming and inheritance

Object-Oriented Programming and Inheritance became very popular in the early 1990s. Although OOP principles had already been introduced by languages such as Simula, Smalltalk, and many others, Java greatly popularized the OOP paradigm, which is also supported by most other popular languages, such as C++, Ruby, and Python.

The idea of OOP is that `objects` and procedures working on these `objects` are grouped into classes, and that classes can be extended with additional data fields and with additional procedures. In OOP, procedures and functions are often called methods, and data fields are called members. Sometimes the members are completely hidden and are accessed only by so-called getter and setter methods. That is called encapsulation. Encapsulation allows hiding implementation details, so that those details may change when necessary without users of the class noticing the change. Getters and setters also help to ensure that class instances are always in a consistent and valid state.

An important property of OOP is dynamic dispatch: When we create various subclasses of a common parent class and define methods for all these subclasses, we can have collections of instances from different subclasses. The compiler can then automatically ensure that the appropriate method for each instance is always called.

A classical example is a drawing program, where we have different geometrical shapes like rectangles, circles, and many more. All the geometrical `objects` are stored in some form of a list. When we want to draw all of them on the screen, we simply call a generic `draw()` method, and dynamic dispatch ensures that the matching `draw()` method is called for each shape. In Nim, that might look like

type
  Shape = ref object of RootRef

  Rectangle = ref object of Shape
    x, y, width, height: float

  Circle = ref object of Shape
    x, y, radius: float

  LineSegment = ref object of Shape
    x1, y1, x2, y2: float

method draw(s: Shape) {.base.} =
  # override this base method
  quit "to override!"

method draw(r: Rectangle) =
  echo "drawing a rectangle"

method draw(r: Circle) =
  echo "drawing a circle"

method draw(r: LineSegment) =
  echo "drawing a line segment"

proc main =
  var l: seq[Shape]
  l.add(Rectangle(x: 0, y: 0, width: 100, height: 50))
  l.add(Circle(x: 60, y: 20, radius: 50))
  l.add(LineSegment(x1: 20, y1: 20, x2: 50, y2: 50))

  for el in l:
    draw(el)

main()

The output of that program is:

drawing a rectangle
drawing a circle
drawing a line segment

Thus, we can have a sequence of the base type, add various subtype instances, and then iterate over the list to draw all these various subtypes. Of course, in the same way, we could do many more tasks like moving, rotating, or storing all the `objects` in one call. The compiler sets up the right dynamic dispatching for us; we just have to provide all the necessary methods. The need for the base method may seem a bit strange, as some other OOP languages do not need it. The base method is marked by a {.base.} pragma; we will discuss the purpose of pragmas later. In the example, we have used only one level of sub-classing, but of course, we can use many levels. For example, we can subclass the `Circle` again by creating a `FilledCircle` subclass with a `color` field.
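A second level of sub-classing might look like the following sketch. To keep it short, only `Shape` and `Circle` are repeated, and instead of `draw()` we use a hypothetical method `describe()` that returns a `string`; the `color` field is stored as a plain `string` for simplicity.

```nim
type
  Shape = ref object of RootRef

  Circle = ref object of Shape
    x, y, radius: float

  FilledCircle = ref object of Circle
    color: string  # hypothetical representation of the color field

method describe(s: Shape): string {.base.} =
  "a shape"

method describe(c: Circle): string =
  "a circle"

method describe(c: FilledCircle): string =
  "a filled circle, color " & c.color

var shapes: seq[Shape]
shapes.add(Circle(x: 0, y: 0, radius: 5))
shapes.add(FilledCircle(x: 1, y: 1, radius: 2, color: "red"))

for s in shapes:
  echo s.describe
```

Note that a `FilledCircle` is still a `Circle`, so a type test like `shapes[1] of Circle` is true for it.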

The OOP coding style can be very convenient for some tasks. One important use case could be graphical user interfaces, where the graphical elements like labels, buttons, and frames naturally form a hierarchical structure. Another typical use case is a drawing application, with code similar to our basic example.

Note that the OOP style only works with `ref` `objects`, but not with value `objects`. The obvious reason is that we can have collections of different subtypes stored in `arrays` or sequences only for `ref` `objects`, as in `arrays` and sequences all elements must have equal size. For references, that is the case, as references are basically `pointers`. But different value types would have different sizes. Linked lists would not be a better solution, as we cannot build such lists with value `objects` either.

For maximum performance, OOP code with `ref` `objects` is generally not optimal, as the dispatching itself needs some time, and the `ref` `objects` are not contained in a single block of memory. Instead, they are distributed throughout the RAM, which is not cache-friendly.

Inheritance for value-objects

type
  Person = object of RootObj # or Person {.inheritable.} = object
    name: string
  Student = object of Person
    id: int

var s1: Student
s1.name = "Alice"
s1.id = 123
var s2 = Student(name: "Bob", id: 124)

Inheritance can also be used for value objects to express some form of parent-child relation. To enable inheritance, we have to start with the `RootObj` data type, or we could use the `{.inheritable.}` pragma to mark the base type as inheritable. Inheritance is typically not used that much with value `objects`, but it might be useful when a set of `objects` have some common fields.

Copying value-objects with subtypes

Assignments between parent and child value types are not often needed, but it is good to know how these assignments behave. With the two data types, `Person` and `Student`, mentioned above, these assignments are possible:

var s1: Student
s1.name = "Alice"
s1.id = 123
var s2 = Student(name: "Bob", id: 124)

var s: Person
s = s1 # copy only the name

#s2 = s # not allowed
s2 = Student(s) # s2.id will get default value zero
s2.id = 3
Person(s2) = s # id field will keep its value!

Remember that assignments for value types copy the content; the source object does not change. A direct assignment like `s = s1` from a subtype to a parent type copies the common fields only. On the other hand, a direct assignment of a parent type to a subtype is not allowed as the new content of the additional fields of the subtype would be undefined in that case. But we can use type conversions, to enable these types of assignments: We can convert the source to the subtype before the content is copied — in this case, the common fields are copied, and the other fields get the default binary zero values. Alternatively, we can convert the destination to the parent type before the copy operation is executed. Then the common fields get copied, and the other fields of the subtype are kept.

Actually, there may still be some issues with these types of partially copied value objects. For instance, with compiler version 1.9.3 (RC for 2.0), we got random content for the field `id` after the statement `s2 = Student(s)`, instead of the expected binary zero. Furthermore, when compiling with `--mm:refc`, the statements `s = s1` and `s2 = Student(s)` gave runtime errors. We will fix this example when the final version 2.0 of the compiler is available.

Content copy of ref objects

As we already learned, assignments for reference and pointer types give us only an alias to access the data, but the content is not copied. But in some cases, we may actually need to copy the content. Assume that you have a CAD tool which shows various objects on the screen — lines, rectangles, circles, and many more. The user should be able to copy a shape to the clipboard and paste it again later. In principle, this is a difficult operation, as we would first have to determine the concrete runtime type of the selected entity, then allocate the destination instance, and finally copy the actual runtime content. Nim provides the `deepCopy()` procedure for this purpose, which simplifies this use case. When we use inheritance, the deepCopy() `proc` determines the concrete runtime type of the `object`, allocates the destination memory, and copies the content. Let us try that with the geometric `ref` types from the earlier example:

type
  Shape = ref object of RootRef

  Rectangle = ref object of Shape
    x, y, width, height: float

  Circle = ref object of Shape
    x, y, radius: float

  LineSegment = ref object of Shape
    x1, y1, x2, y2: float

proc main =
  var x, z: Shape
  var c = Circle(x: 60, y: 20, radius: 50)

  deepCopy(x, c)
  echo x of Circle
  Circle(x).radius = 33
  echo c[]
  echo Circle(x)[]

  deepCopy(z, x)
  echo z of Circle
  Circle(z).radius = 19
  echo Circle(x)[]
  echo Circle(z)[]

main()

We have defined two variables `x` and `z` of the base `Shape` type, and one more variable of the `Circle` subtype. We pass the target parameter as the first argument, and the source parameter as the second to the `deepCopy()` procedure. The call to `deepCopy(x, c)` allocates memory for `x` and copies the actual `Circle` content. Although `x` has the static `Shape` base type, it acquires the actual `Circle` type, which we can verify using type tests with the `of` keyword. We can also use additional `deepCopy()` calls such as `deepCopy(z, x)` between base types, and obtain the correct runtime types again. The fact that this is possible is indeed a bit surprising, as serialization modules, such as the `json` module from Nim’s standard library, cannot automatically determine the correct runtime types.

Note that the compiler option `--deepCopy:on` is currently required for ARC and ORC.
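To appreciate what `deepCopy()` does for us, the following minimal sketch performs the three steps described above by hand: it tests the runtime type, allocates a new instance, and copies the fields. The helper `cloneShape()` is a hypothetical name chosen for this illustration and is not part of Nim's standard library; it has to be extended manually for every new subtype, which is exactly the work that `deepCopy()` avoids.

```nim
type
  Shape = ref object of RootRef
  Circle = ref object of Shape
    x, y, radius: float
  Rectangle = ref object of Shape
    x, y, width, height: float

proc cloneShape(s: Shape): Shape =
  ## Hypothetical helper: hand-written content copy for two known subtypes.
  if s of Circle:                 # step 1: determine the runtime type
    let c = Circle(s)
    # steps 2 and 3: allocate a new instance and copy the fields
    result = Circle(x: c.x, y: c.y, radius: c.radius)
  elif s of Rectangle:
    let r = Rectangle(s)
    result = Rectangle(x: r.x, y: r.y, width: r.width, height: r.height)
  else:
    result = Shape()              # plain base type, no fields to copy

let original: Shape = Circle(x: 60, y: 20, radius: 50)
var duplicate = cloneShape(original)
Circle(duplicate).radius = 33     # modifies the copy only
echo Circle(original).radius      # 50.0
echo Circle(duplicate).radius     # 33.0
```

This variant needs no special compiler option, but it only knows the subtypes that we listed explicitly.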

Other builtin data types

Tuple types

`Tuples` are heterogeneous container types similar to the struct type in C. As Nim’s `object` type creates no overhead and directly corresponds to the C struct type provided we don’t use inheritance, tuples are very similar to Nim’s objects.

The biggest advantage of `tuples` is that we can create anonymous `tuples` and Nim supports the automatic unpacking of `tuple` variables into ordinary unstructured variables.

Compared to `objects`, `tuples` do not support inheritance at all, all the `tuple` fields are always visible, and different `tuple` types are regarded as identical when all the field names and field data types match. Remember that two different `object` types are always distinct in Nim, even when the actual type definition looks identical.

We can define `tuple` types in the same way as we define `objects`, or we can use the `tuple[]` constructor. Additionally, we can define anonymous `tuples` just by enclosing their field types in round brackets. The fields of `tuple` types can be accessed by field names as we do with objects, or we can access the fields with constant indices starting at zero.

type
  Move = tuple # the object definition syntax
    fro: int
    to: int
    check: bool

type Move2 = tuple[fro: int, to: int, check: bool] # equivalent tuple constructor syntax

proc findBestNextMove(): tuple[dest: int; check: bool] =
  discard

proc findBestNextMove2(): (int, bool) =
  discard

let (dst, check) = findBestNextMove()

let (dst2, check2) = findBestNextMove2()

In the code example above, we show two equivalent ways to define a `tuple` type. However, we do not actually use these types at all; instead, `findBestNextMove()` returns a `tuple` with the named fields `dest` and `check`, and `findBestNextMove2()` returns an anonymous `tuple`, which is a plain pair of an `int` and a `bool`.

Using automatic `tuple` unpacking and type inference, our `dst` and `check` variables get the data types `int` and `bool`.
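The properties described above can be tried out in a few lines. This is a small sketch; the type name `Step` and the variable names are chosen only for this illustration. It shows field access by name and by constant index, unpacking, and the fact that two `tuple` types with identical field names and types are compatible.

```nim
type
  Move = tuple[fro: int, to: int, check: bool]
  Step = tuple[fro: int, to: int, check: bool]  # same fields as Move

let m: Move = (fro: 12, to: 28, check: false)
let st: Step = m        # allowed: field names and field types match

echo m.to               # access by field name: 28
echo m[1]               # access by constant index: 28
echo st == m            # true

let (a, b, c) = m       # unpacking into ordinary variables
echo a, " ", b, " ", c  # 12 28 false
```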

`Tuples` are also useful when a function needs to return a value and an error state, or if it might not be able to return anything at all in specific cases. For reference types, we could return `nil` then, but for results of value type like `int` or `float`, we may not have a well-defined error-indicating constant, so we can return a `tuple` with an additional `bool` indicating success or error. But of course, we could use exceptions or Nim’s option type instead. We will learn more about that later.
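As a minimal sketch of this idea, the following procedure returns a position together with a success flag. The name `findChar()` and its field names are assumptions made up for this example, not something taken from the standard library.

```nim
proc findChar(s: string; ch: char): tuple[pos: int, found: bool] =
  ## Hypothetical helper: index of the first `ch` in `s`, plus a success flag.
  result = (pos: 0, found: false)   # default: nothing found
  for i, c in s:
    if c == ch:
      return (pos: i, found: true)

let (pos, found) = findChar("Nim", 'i')
if found:
  echo "found at index ", pos       # found at index 1

let r = findChar("Nim", 'x')
echo r.found                        # false
```

The caller has to test the flag before using the position, as the position is meaningless when nothing was found.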

Here are two examples that use a `tuple` as a `proc` parameter:

proc p1(x: tuple[i: int, j: int]): int =
  x.i + x.j

echo p1((7, 7))

proc p2(x: (int, int)): int =
  x[0] + x[1]

echo p2((7, 7))
echo p2 (7, 7)

The `proc` `p1()` creates a `tuple` type using the `tuple` constructor syntax with named fields, which allows us to access the fields by their names in the procedure body. On the other hand, `proc` `p2()` uses an anonymous `tuple` and thus has to access the fields by constant indices. Both procedures are invoked with an anonymous `tuple` parameter. The last line of the above example code uses the command invocation syntax.

Object variants

Nim’s `object` variants, sometimes also called sum types or algebraic data types (ADTs), are advanced and type-safe variants of the union type known from C. The basic idea is that we can use value types that can store similar but not identical data as elements in containers. Dynamically typed languages like Ruby or Python allow that of course, and we can do it in Nim with `ref` types and inheritance too, as we showed in a previous section with our `Shape` base type and various geometric shapes. We could store these `ref` types in `arrays`, sequences or linked lists and use dynamic dispatch for processing the various subtypes. While this is convenient, it doesn’t provide maximum performance due to dynamic dispatch at runtime and inefficient cache use. Therefore, we might want a value type with different content, allowing us to store all value types in a `seq` with all entities residing in a compact memory block for efficient cache use.

type
  ShapeKind = enum
    line, rect, circ

  Shape = object
    visible: bool
    case kind: ShapeKind
    of line:
      x1, y1, x2, y2: float
    of rect:
      x, y, width, height: float
    of circ:
      x0, y0, radius: float

proc draw(el: Shape) =
  if el.kind == line:
    echo "process line segment"
  elif el.kind == rect:
    echo "process rectangle"
  elif el.kind == circ:
    echo "process circle"
  else:
    echo "unknown shape"

var
  s: seq[Shape]
s.add(Shape(kind: circ, x0: 0, y0:0, radius: 100, visible: true))
for el in s:
  draw(el)

`Object` variants can have common fields like the boolean state `visible` above, but the other fields are not allowed to have the same names. As a result, we used `x0` and `y0` as the names of the center coordinates in the circle variant.

As you can see, we can store all the different `object` variants as value `objects` in a sequence and iterate over it. Note that `object` variants may waste some storage, as all variants are silently enlarged to have the exact same size so that all variant types can be stored in `arrays` or sequences and can be passed as `proc` parameters in the same way to the same procedure. For more details about object variants please consult the Nim language manual.
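Instead of an `if`/`elif` chain, the discriminator field can also be processed with a `case` statement. The following sketch repeats the `Shape` type from above and adds a hypothetical `area()` procedure; for an `enum` discriminator, the compiler then checks that every kind is handled, so no `else` branch is needed.

```nim
type
  ShapeKind = enum
    line, rect, circ

  Shape = object
    visible: bool
    case kind: ShapeKind
    of line:
      x1, y1, x2, y2: float
    of rect:
      x, y, width, height: float
    of circ:
      x0, y0, radius: float

proc area(el: Shape): float =
  ## Hypothetical example proc: one branch per variant.
  case el.kind
  of line: 0.0                              # a line segment has no area
  of rect: el.width * el.height
  of circ: 3.1416 * el.radius * el.radius

let r = Shape(kind: rect, x: 0, y: 0, width: 3, height: 4, visible: true)
echo area(r)   # 12.0
```

If we later add a new value to `ShapeKind`, the `case` statement in `area()` no longer compiles until the new kind is handled, which is an advantage over the `if` chain.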

Iterators

In the section For loops and iterators, we used a `for` loop to iterate over the individual characters of a `string`. `For` loops are useful for various iteration purposes, e.g. to iterate over container types like `strings`, `arrays`, and sequences, or over a numeric `range`, and other countable entities. We could do the same with a `while` loop, but using a `for` loop is often more convenient and less error-prone — we do not have to care for increasing a loop variable and for the stop condition.

Nim’s `for` loops are built on `iterators`; that is, whenever a `for` loop is executed, an `iterator` is used under the hood. Some `iterators` are used explicitly in `for` loops, e.g. `countup()` of Nim’s standard library, others like `items()` or `pairs()` are executed implicitly when no explicit `iterator` name is specified.

The creation and use of `iterators` is very easy in Nim. Before discussing all the details and some restrictions of `iterators`, as well as the important differences between inline and closure `iterators`, let’s look at a small example.

We have already used some of Nim’s standard `iterators` to iterate over the characters of a `string` or the content of a sequence.

In an earlier section of the book, we demonstrated a procedure that extracts all the decimal digits from a `string`. We can accomplish the same task using an `iterator`:

iterator decDigits(s: string): char =
  var pos = 0
  while pos < s.len:
    if s[pos] in {'0' .. '9'}:
      yield(s[pos])
    inc(pos)

for d in decDigits("df4j6dr78sd31tz"):
  stdout.write(d)
stdout.write('\n')

The definition of an `iterator` is very similar to the definition of a procedure or function. However, while a function returns a result only once to the caller, an `iterator` uses the `yield` statement to give data back to the call site multiple times, instead of returning just once.

Whenever a `yield` statement is reached in the body of the `iterator`, the yielded data is bound to the `for` loop variable(s), the body of the `for` loop is executed, and at the end of the `for` loop body, control returns to the `iterator`. In other words, execution continues directly after the `yield` statement. The `iterator’s` local variables and execution state are automatically saved between calls. The iteration process continues until the end of the body of the `iterator` declaration is reached and the `iterator` terminates.

`Iterators` are used in `for` loops to iterate over containers, `ranges`, or other data. After the `for` keyword, we specify one or more arbitrary variable names, which we then can use in the body of the `for` loop to access the yielded value(s). The data types of these iteration variables are inferred from the `iterator’s` return type, and their scope is limited to the body of the `for` loop.
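That the local variables of an `iterator` survive between two `yield` statements can be seen in the following small sketch. The `iterator` name `fibUpTo()` is made up for this illustration; it yields the Fibonacci numbers up to a given limit, and its variables `a` and `b` keep their values while the body of the `for` loop runs.

```nim
iterator fibUpTo(limit: int): int =
  ## Hypothetical example iterator: Fibonacci numbers not larger than `limit`.
  var a = 0
  var b = 1
  while a <= limit:
    yield a            # the for loop body runs here, then we continue below
    let next = a + b
    a = b
    b = next

var collected: seq[int]
for f in fibUpTo(20):
  collected.add(f)
echo collected         # @[0, 1, 1, 2, 3, 5, 8, 13]
```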

Nim’s standard library defines `iterators` named `items()` and `pairs()` for container types like `strings`, `arrays`, and sequences. The `items()` `iterator` is the default when a `for` loop with only one variable is used, and `pairs()` is the default when two variables are used, such as the index position and the character when iterating over a `string`.

In Nim’s standard library, you may find `items()` and `pairs()` `iterators` like these two:

iterator items(a: string): char =
  var i = 0
  while i < len(a):
    yield a[i]
    inc(i)

iterator pairs(a: string): tuple[key: int, val: char] =
  var i = 0
  while i < len(a):
    yield (i, a[i])
    inc(i)

var s = "Nim is nice."
for c in items(s):
  stdout.write(c, '*')
echo ""
for i, c in pairs(s):
  echo i, ": ", c

In the example above, we specified the `iterator` names `items()` and `pairs()` explicitly in the `for` statement, but as these names are the defaults, we can just write `for c in s:` and `for i, c in s:`.
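The same default applies to our own container types: when we define an `iterator` called `items()` for a type, a plain `for` loop over a value of that type uses it. A minimal sketch, with a `Stack` type that exists only for this illustration:

```nim
type
  Stack = object
    data: seq[int]

iterator items(s: Stack): int =
  ## Yields the elements from the top of the stack down to the bottom.
  for i in countdown(s.data.high, 0):
    yield s.data[i]

let st = Stack(data: @[1, 2, 3])
var seen: seq[int]
for v in st:           # implicitly calls items(st)
  seen.add(v)
echo seen              # @[3, 2, 1]
```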

The two `iterators` in the example code from above use a value type as an argument and return single characters as a value type. This way, we cannot modify the `string` content. When we intend to modify the content of a container by use of an `iterator`, we have to pass the container as a `var` parameter and return the elements as `var` also. By convention, for iterating over mutable containers the `iterator` names `mitems()` and `mpairs()` are used, where the leading `m` stands for mutable. We have to specify these names explicitly:

iterator mitems(a: var string): var char =
  var i = 0
  while i < len(a):
    yield a[i]
    inc(i)

iterator mpairs(a: var string): tuple[key: int, val: var char] =
  var i = 0
  while i < len(a):
    yield (i, a[i])
    inc(i)

from std/strutils import toLowerAscii
var s = "NIM"
for i, c in mpairs(s):
  if i > 0:
    c = toLowerAscii(c)
echo s # Nim

Whenever we iterate over a container, we should not delete, insert, or append elements to the container, as that may confuse the loop inside the `iterator` body. `Iterators` of Nim’s standard library check the length of the container and generate an exception when the length changes during the iteration.

Nim differentiates between inline and closure `iterators`. When a `for` loop uses an inline `iterator`, then the actual `iterator` loop is inlined in the `for` loop body in a way that for each `yield` statement in the `iterator` body, the body of the `for` loop is executed. Actually, the `for c in items(s): stdout.write(c, '*')` in our example from above is rewritten by the compiler into a code block like

var i = 0
while i < len(s):
  var c = s[i]
  stdout.write(c, '*')
  inc(i)

That is, the body of the `for` loop is inlined into the `iterator’s` loop.

This results in very fast code with no overhead; however, similar to the use of `templates`, this increases the total code size of the final executable. In fact, when the `iterator` uses multiple `yield` statements, the code of the body of the `for` loop is inserted for each `yield` statement.
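A small sketch of an inline `iterator` with two `yield` statements follows; the name `bothEnds()` is invented for this example. Conceptually, the compiler inserts the body of the `for` loop once for each of the two `yield` statements.

```nim
iterator bothEnds(s: seq[int]): int =
  ## Hypothetical example iterator: yields the first and the last element.
  if s.len > 0:
    yield s[0]         # the for loop body is inserted here ...
    yield s[^1]        # ... and a second time here

var total = 0
for v in bothEnds(@[10, 20, 30]):
  total += v
echo total             # 40
```

For a loop body of a single line, the duplication is irrelevant, but for a large loop body combined with many `yield` statements, the growth in code size can become noticeable.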

Inline `iterators` are currently the default `iterator` type, so the `iterators` of the examples above are all inline `iterators`.

Closure `iterators` behave more like procedures; the `iterator` is actually invoked, which costs some performance. We can use all the `iterators` of the examples from above as closure `iterators` by applying the closure pragma as in `iterator items(a: string): char {.closure.} =`.

Closure `iterators` behave like `objects`; we can assign instances of closure `iterators` to variables and then call the instances explicitly:

iterator myCounter(a, b: int): int {.closure.} =
  var i = a
  while i < b:
    yield i
    inc(i)

for x in myCounter(3, 5): # ordinary use of the iterator
  echo x

echo "---"
var counter = myCounter # use of an iterator instance
while true:
  echo counter(5, 7)
  if counter.finished:
    break

which gives us this output:

3
4
---
5
6
0

Here, we have used the `finished()` function to check if the `iterator` is done.

In fact, `finished()` returns `true` only when the `iterator` has already failed to `yield` a valid value, not when the last valid value was yielded. That is why, in the example above, the last value we get is the invalid value zero.

We can avoid this behavior when we rewrite the loop as

var counter2 = myCounter
while true:
  let v = counter2(5, 7)
  if counter2.finished:
    break # v is invalid
  echo v

Closure `iterators` are resumable functions, so one has to provide the arguments to every call. To get around this limitation, one can capture the parameters of an outer factory proc:[62]

proc mycount(a, b: int): iterator (): int =
  result = iterator (): int =
    var i = a
    while i < b:
      yield i
      inc(i)

var c1 = mycount(5, 7)
for i in c1():
  echo i

echo "---"

var c2 = mycount(2, 5)
while true:
  let v = c2()
  if c2.finished:
    break # v is invalid
  echo v

In this example from the Nim language manual, the `proc` `mycount()` captures the bound for the counter. When we compile and run the code above, we get:

5
6
---
2
3
4

At the end of this section, we will list some properties of `iterators`: `Iterators` have their own namespace, so we can freely use the same names for `procs` and `iterators`. Iterators have no predefined `result` variable and do not support recursion. Inline `iterators` can be used only inside `for` loops and cannot be forward declared because the compiler must be able to inline an `iterator`. (This restriction will be gone in a future version of the compiler.) Closure `iterators` are not supported by the JS backend, and cannot be executed at compile time. Inline `iterators` are second-class citizens and can be passed as parameters only to other inlining code facilities like `templates`, `macros`, and other inline `iterators`. In contrast, a closure `iterator` can be passed around more freely.
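The last point can be illustrated with a short sketch in the style of the examples from the language manual: a closure `iterator` is passed as an argument to an ordinary procedure. The names `upToThree()` and `sumAll()` are assumptions made up for this illustration.

```nim
iterator upToThree(): int {.closure.} =
  yield 1
  yield 2
  yield 3

proc sumAll(it: iterator (): int {.closure.}): int =
  ## Consumes the passed closure iterator and adds up all yielded values.
  for v in it():
    result += v

echo sumAll(upToThree)   # 6
```

With an inline `iterator`, the call `sumAll(upToThree)` would not compile, as an inline `iterator` is not a value that can be passed to a procedure.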

Templates

Nim `templates` are very different from C++ `templates`! In C++ `templates` are used for generic programming — a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters.[63] This is referred to as generics in Nim and other programming languages. We learned about Nim’s generics earlier in this book.

Nim `templates` are a simple, parameterized code substitution mechanism, and are used similarly to procedures. The syntax to invoke a `template` is the same as calling a procedure. However, while procedures build a single block of code that is then called multiple times, `templates` work more like C `macros`, performing a (textual) code substitution. Wherever we invoke a `template`, the `template` source code is inserted at the call site. In this way, Nim `templates` indeed have some similarities to C `macros`. But while C `macros` are executed by the C pre-processor and can do only plain source text substitutions, Nim `templates` operate on Nim’s abstract syntax trees, are processed in the semantics pass of the compiler, integrate well with the rest of the language, and share none of the flaws of C’s preprocessor `macros`.

In some way, Nim `templates` are a simplified application of Nim’s powerful `macro` and meta-programming system, which we will discuss in detail in Part VI of the book.

In C we could use the "#define" preprocessor directive to define two simple C `macros`.

#define PI 3.1416
#define SQR(x) (x)*(x)

The C pre-processor would then replace the symbol `PI` in the C source code with the `float` literal `3.1416` before the code is processed by the C compiler. And as the C pre-processor can recognize some simple form of parameters, it would replace `SQR(a + b)` with `(a+b)*(a+b)`.

In Nim we would define a `const` for `PI` and use a generic `proc` or a `template` for `SQR()`:

const PI = 3.1416
proc sqr1[T](x: T): T = x * x
template sqr2(x: typed): typed = x * x

Here the `sqr2()` `template` uses the special `typed` parameter, which specifies that the parameter has a well-defined type in the `template` body, but that arbitrary data types are accepted. So `sqr1()` and `sqr2()` would work for all numeric types and also for other data types for which we have defined a `*` operation. When there is no `*` operator defined for the passed data type, the compiler will give an error message.

Nim `templates`, like `procs`, accept all of Nim’s ordinary data types, in addition to the abstract meta-types `typed` and `untyped`. The abstract data types `typed` and `untyped` can be used only for the types of `template` and `macro` parameters, but not for parameters of procedures, functions, `iterators`, or to define variables.

We will explain the differences between `typed` and `untyped` in detail later in this section. The short version of the explanation is that `typed` `template` parameters must have a well-defined data type when we pass them to the template, while `untyped` parameters can also be passed as undefined symbolic names.

So we can in principle replace each procedure or function definition with a `template`. The important difference between `procs` and `templates` is that ordinary `procs` are instantiated only once, generic `procs` are instantiated for each data type with which they are used, while `templates` are instantiated for each invocation of the `template`. The compiler creates for each defined `proc` some machine code, which is executed whenever the procedure is called. But for `templates`, the compiler does some code substitution — the source code of the `template` is inserted where the `template` is invoked. This avoids the need for an actual jump to a different machine code block when a procedure is called but increases the total code size for each use of a `template`. So we would typically avoid frequently used `templates` that contain a lot of code.

For each ordinary `proc`, one block of machine code instructions is generated, and when the `proc` is called, program execution has to jump to this block, and back when the procedure execution is done. This jumping involves some minimal overhead, which is noticeable for tiny `procs` called frequently. To avoid this overhead, we may either use `templates` or inlined `procs`, which we discussed in the previous section. The `proc` inlining can be done automatically by the compiler when the procedure is defined in the source code file where it is used, or when we mark the `proc` with the `inline` pragma. Additionally, when we compile our program with `-d:lto`, the compiler can inline all procedures and functions. Generally, the compiler should know well when inlining makes sense, so in most cases, it doesn’t make much sense to use `templates` instead of (small) `procs` merely to avoid the `proc` call overhead.

`Templates` can be used as a form of alias. Sometimes we have nested data structures, and would like to have a shorter alias for the access of fields:

type
  Point = object
    x, y: int

  Circle = object
    center: Point

template x(c: Circle): int = c.center.x

template `x=`(c: var Circle; v: int) = c.center.x = v

var a, b: Circle

a.center.x = 7
echo a.center.x

b.x = 7
echo b.x

The two `templates` simplify the access of field `x`, and as `templates` are pure code substitution, their use costs no performance. Since version 1.6, Nim also has the `with` `macro`, which can be used to save some typing. Note that in the second `template`, we have called the second parameter, the one of type `int`, `v`; calling it `x` would give some trouble:

Error: in expression 'b.center.7': identifier expected, but found '7'

Nim’s `system` module uses `templates` to define some operators like

template `!=` (a, b: untyped): untyped =
  not (a == b)

This way `!=` is always the opposite of `==`, so when we define the `==` operator for our own custom data types, `!=` is available for free.
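A minimal sketch of this effect, using a `Fraction` type that is invented only for this illustration: we define nothing but `==`, and `!=` works immediately.

```nim
type
  Fraction = object
    num, den: int

proc `==`(a, b: Fraction): bool =
  ## Two fractions are equal when cross-multiplication gives the same value.
  a.num * b.den == b.num * a.den

let half = Fraction(num: 1, den: 2)
echo half == Fraction(num: 2, den: 4)   # true
echo half != Fraction(num: 2, den: 4)   # false, via the `!=` template
echo half != Fraction(num: 1, den: 3)   # true
```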

In some situations, using `templates` instead of `procs` can avoid some overhead. Let us investigate a `log()` `template` that can print messages to `stdout` when a global boolean constant is set to `true`:

const
  debug = true

template log(msg: string) =
  if debug: stdout.writeLine(msg)

var
  x = 4
log("x has the value: " & $x)

Here, `log()` is called with the constructed argument `("x has the value: " & $x)`, which implies a `string` concatenation operation at runtime. As we use a `template`, the invocation of `log("x has the value: " & $x)` is actually replaced by the compiler with code like

  if debug: stdout.writeLine("x has the value: " & $x)

So, when `debug` is set to `false`, absolutely no code is generated. For an ordinary, non-inlined procedure, the situation is different: the expensive `string` concatenation operation would always have to be performed, but the `log()` `proc` would immediately return if `debug` is `false`. What exactly would happen when `log()` is an inlined procedure may depend on the compiler backend that is actually used. You may wonder if, inside our `template` from above, we should have used "when" instead of "if". The use of "when" should be possible, as `debug` is a compile-time constant, but we assume that the use of "if" generates the same machine code for this use case.
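The variant with "when" can be sketched as follows. To make the effect observable, the message is built by a helper `proc` that counts its invocations; the names `describe()` and `evaluations` are assumptions introduced for this illustration. As `debug` is `false`, the `template` expands to nothing, so the argument expression is never executed.

```nim
const
  debug = false

var evaluations = 0

proc describe(x: int): string =
  ## Hypothetical helper: builds the message and counts how often it runs.
  inc evaluations
  "x has the value: " & $x

template log(msg: string) =
  when debug:                  # decided at compile time
    stdout.writeLine(msg)

log(describe(4))
echo evaluations               # 0: the argument was never evaluated
```

With an ordinary `proc` instead of the `template`, `describe(4)` would be executed before the call, and the counter would be one.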

Note that the delayed (lazy) parameter evaluation for `template` parameters can have disadvantages. When we modify the `log()` `template` like this:

template log(msg: string) =
  for i in 0 .. 2:
    stdout.writeLine(msg)

var x = 4
log("x has the value: " & $x)

the expensive `string` concatenation operation would be done in principle three times in the `template` body.[64] In contrast, for a procedure, the already evaluated parameter would be passed. So, when we access a parameter multiple times inside a `template`, it can make sense to assign the parameter to a local variable and then use only that variable.
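The following sketch makes the repeated evaluation visible and shows the suggested fix with a local variable. The names `describe()`, `logNaive()`, `logCached()`, and `evaluations` are invented for this illustration.

```nim
var evaluations = 0

proc describe(x: int): string =
  ## Hypothetical helper: builds the message and counts how often it runs.
  inc evaluations
  "x has the value: " & $x

template logNaive(msg: string) =
  for i in 0 .. 2:
    stdout.writeLine(msg)      # msg is substituted and evaluated each time

template logCached(msg: string) =
  let m = msg                  # evaluate the argument only once
  for i in 0 .. 2:
    stdout.writeLine(m)

logNaive(describe(4))
echo evaluations               # 3
logCached(describe(4))
echo evaluations               # 4, only one more evaluation
```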

`Templates` can inject entities defined in the `template` body into the surrounding scope. By default, variables defined in the `template` body are not injected in the surrounding scope, but `procs` are:

template gen =
  var a: int
  proc maxx(a, b: int): int =
    if a > b: a else: b

gen()
echo maxx(2, 3)
# echo a

The call `echo maxx(2, 3)` compiles and works, while `echo a` complains about an undefined symbol.

A very special property of `templates` and `macros` is that we can pass code blocks to them when we use `untyped` for the type of the last parameter.

template withFile(f: untyped; filename: string; actions: untyped) =
  var f: File
  if open(f, filename, fmWrite):
    actions
    close(f)

withFile(myTextFile, "thisIsReallyNotAnExistingFileWithImportantContent.txt"):
  myTextFile.writeLine("line 1")
  myTextFile.writeLine("line 2")

The `template` `withFile()` from the above example has three parameters — a parameter `f` of `untyped` type, a `filename` of `string` type, and as the last parameter one more `untyped` parameter, which we called actions. For this last `untyped` actions parameter, we can pass an indented code block.

When we invoke the `withFile()` `template`, we pass the first two parameters in the well-known way by putting them in a parameter list enclosed in round brackets. However, instead of also passing the final actions parameter in this manner, we put a colon after the parameter list and pass the following indented code block as the last `untyped` parameter. In the body of the above `template`, we have an `open()` call which opens a file with the specified filename and the `fmWrite` mode. The `template` then executes the passed code block and finally closes the file. The first parameter of our `withFile()` `template` has also a special property: As we use `untyped` for the `f` parameter, we can pass the still undefined symbol `myTextFile` to the `template`. In the `template` body, this symbol is used as a variable name, and our two `writeLine()` `proc` calls can use it to refer to the file variable.

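Because we pass the name of the file variable as the `untyped` parameter `f`, the variable declared in the body of our `template` is created under the name `myTextFile` at the call site, which is why the passed code block can use it. Symbols that a `template` declares on its own, without taking the name from a parameter, stay private to the `template`, as Nim `templates` are hygienic; they do not pollute the namespace of the invoking code.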

By passing an integer and a code block to a `template`, we can easily create a function similar to the `times()` construct known from Ruby, to execute a code block `n` times:

template times(n: int; actions: untyped) =
  var i = n
  while i > 0:
    dec(i)
    actions

var x = 0.0
3.times:
  x += 2.0
  echo x, " ", x * x

Of course, instead of `3.times:`, we could have simply used `for _ in 1 .. 3:`.

We can also use `templates` to create new `procs`. An example is lifting procedures like `math.sqrt()`, which accept a scalar parameter and return a scalar value, to work with `arrays` and sequences. The following example is taken from the official tut2 tutorial:

from std/math import sqrt

template liftScalarProc(fname) =
  proc fname[T](x: openarray[T]): auto =
    var temp: T
    type outType = typeof(fname(temp))
    result = newSeq[outType](x.len)
    for i in 0 .. x.high:
      result[i] = fname(x[i])

liftScalarProc(sqrt)   # make sqrt() work for sequences
echo sqrt(@[4.0, 16.0, 25.0, 36.0])   # => @[2.0, 4.0, 5.0, 6.0]

The `template` called `liftScalarProc()` creates a generic `proc` that accepts an `openArray[T]` as a parameter and returns a sequence with the results of applying the scalar `proc` to each element; for `sqrt()` and `float` input, this is a `seq[float]`. Well, we should be able to understand the basic ideas used in that code, but it is still fascinating that it really works.

Typed vs untyped parameters

Parameters passed to `templates` can be of any data type that we can use for `procs`, including special types such as `openarray`, `varargs` and `typedesc`. Additionally, we can use the symbols `untyped` and `typed` as parameter types.

The `typedesc` type can be used to pass type information to the `template`, e.g. when we want to create a variable of a special data type. The "meta-types" `typed` and `untyped` are used when we want to create a form of generic `template` that can accept different data types. In reality, the distinction between `typed` and `untyped` parameters is not as challenging or crucial for `templates` as it is for `macros`. In most cases, it’s evident whether we need the `typed` or `untyped` parameter type for a `template`, or if both will work fine. We discuss the differences between `typed` and `untyped` in much more detail in Part VI of the book, when we discuss `macros` and meta-programming.

The following example demonstrates the use of the `untyped` and the `typedesc` parameter:

template declareInt(n: untyped) =
  var n: int

declareInt(i)
i = 3
echo i

template declareVar(n: untyped; t: typedesc) =
  var n: t

declareVar(x, float)
x = 3.0
echo x

Since the parameter `n` is `untyped`, the compiler allows us to pass an undefined symbol to the `template`. If we changed the parameter type to `typed`, the compiler would complain with a message like "Error: undeclared identifier: 'i'".

For the second `template`, called `declareVar()`, we use an additional parameter of `typedesc` type so that the `template` can create a variable of the passed data type for us.

Citing the manual: "An `untyped` parameter means that symbol lookups and type resolution is not performed before the expression is passed to the `template`. This means that undeclared identifiers, for example, can be passed to the `template`. A `template` where every parameter is `untyped` is called an immediate `template`. For historical reasons, `templates` can be explicitly annotated with an immediate pragma and then these `templates` do not take part in overloading resolution and the parameters' types are ignored by the compiler. Explicit immediate `templates` are now deprecated. For historical reasons, stmt was an alias for `typed` and expr was an alias for `untyped`, but they are removed."

Earlier, we said that Nim’s `templates` are hygienic, so you may wonder why the variable declared inside of the `template` is visible outside. Actually, this is only the case because we pass the symbol `n` as a `template` parameter. An ordinary declaration like `var h: int` would create a variable that is only visible inside the `template` body; it could not be used after invoking the template. We can use the inject pragma to make such ordinary variables visible outside of `templates`. For more details, please consult the language manual.
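A minimal sketch of the difference, with two `template` names that are made up for this illustration: the first `template` declares an ordinary variable, which stays hidden, while the second one marks its variable with the `inject` pragma.

```nim
template declareHidden =
  var h: int = 1               # hygienic: not visible at the call site

template declareVisible =
  var h {.inject.}: int = 2    # injected into the invoking scope

declareHidden()
declareVisible()
echo h                         # 2: refers to the injected variable
```

If we removed the call of `declareVisible()`, the `echo h` statement would no longer compile, as the variable of `declareHidden()` cannot be reached from outside.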

Passing a code block to a template

In the `withFile()` example above, we demonstrated that a block of statements can be passed as the last argument to a `template` using the special `:` syntax. To demonstrate the difference between code blocks of `typed` and `untyped` data types, we will cite the Nim language manual. See https://nim-lang.org/docs/manual.html#templates-passing-a-code-block-to-a-template:

Usually, to pass a block of code to a `template`, the parameter that accepts the block needs to be of type `untyped`, because symbol lookups are then delayed until `template` instantiation time:

template t(body: typed) =
  proc p = echo "hey"
  block:
    body

t:
  p()  # fails with 'undeclared identifier: p'

The above code fails with the error message that `p` is not declared. The reason for this is that the `p()` body is type-checked before getting passed to the body parameter, and type-checking in Nim implies symbol lookups. The same code works with `untyped` as the passed body is not required to be type-checked:

template t(body: untyped) =
  proc p = echo "hey"
  block:
    body

t:
  p() # compiles

Passing operators to templates

Another use case for `templates` with `untyped` parameters involves the generation of math operations for custom data types. Let us assume that we have created a custom `Vector` `object`, for which we have to define addition and subtraction operations. Instead of writing code for both cases, we can use a `template` and pass the actual math operator as `untyped` parameter:

type
  Vector = object
    x, y, z: int

template genOp(op: untyped) =
  proc `op`(a, b: Vector): Vector =
    Vector(x: `op`(a.x, b.x), y: `op`(a.y, b.y), z: `op`(a.z, b.z))

genOp(`+`)
genOp(`-`)

echo `+`(2, 3) # 5

var p = Vector(x: 1, y: 1, z: 1)
var p2 = p + p
echo p2 # (x: 2, y: 2, z: 2)

This works because mathematical operations like `1+2` can be written as `` `+`(1, 2) ``, and such an operator can be passed as an `untyped` parameter to a `template`.

Advanced template use

For more advanced `template` topics, you should consult the Nim language manual.

This includes the symbol binding rules, identifier construction in `templates`, lookup rules for `template` parameters, hygiene in `templates`, use of the `inject` pragma, and limitations of the method-call-syntax.

All this is explained well in the language manual, so there’s no need to repeat it here. It might be more beneficial to consult the manual when you actually encounter problems with the default behavior of `templates` in unique situations.

Casts and type conversions

While C++ has various kinds of casts, Nim supports only one kind of cast, plus type conversions. In Nim, `cast` simply reinterprets the same bit pattern as another data type. For example, the boolean value `false` is internally encoded as a `byte` with all bits cleared, while `true` is encoded as a `byte` with all bits cleared except for the least significant one. We could `cast` a `bool` to an `int8` of the same size and receive a number with a decimal value of `0` or `1`. Casting is not a real operation at all, as nothing is really done: we look at the same bit pattern, just from a different perspective.

But casting is dangerous: it bypasses the safe type system of the language, and it can go very wrong. Can we cast between `float64` and `int64`? Well, they have the same size, and both are numbers. We can `cast`, but the result would be far away from what we may expect. While `int64` has a well-known and simple value encoding, where the rightmost bit stands for `2^0`, the next bit for `2^1`, and so forth, the encoding of floating-point numbers is much more complex and doesn’t follow such a simple scheme. In `floats`, some bits represent the so-called mantissa and some bits represent the exponent. When we `cast`, we may again get a number, but the value is not easily predictable.

We have to be very careful when we `cast` between types of different sizes. Nim may permit that, but we have to think about what may really happen. When we `cast` between a `bool` and an `int64`, in one direction 7 bytes have to be ignored, and in the other direction, padding is necessary for the 7 missing bytes. We perform a `cast` by writing the desired type in square brackets after the keyword `cast`, followed by parentheses enclosing the source variable:

var i: uint8 = cast[uint8](myBoolVar)

Type conversion is totally different from casting. We can convert integers to floating-point numbers without problems. For the conversion, we use the type name like a `proc` call, that is, `int(myFloat)` or `float(myInt)`; of course, we could use method call syntax like `myInt.float` instead. Type conversion requires some effort from the CPU, but most advanced CPUs have fast instructions for basic conversions.
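
The difference between a cast and a conversion can be made visible with a small sketch. The variable names are chosen only for this illustration; the hexadecimal value is the standard IEEE 754 bit pattern of `1.0`:

```nim
import std/strutils

let f = 1.0                   # a float64
let bits = cast[int64](f)     # same 64 bits, now viewed as an integer
echo bits.toHex               # 3FF0000000000000
echo int(f)                   # 1, a real conversion of the value

let flag = true
echo cast[uint8](flag)        # 1
echo cast[float64](bits)      # 1.0, the bit pattern was never changed
```

The cast yields a huge and seemingly unrelated integer, while the conversion yields the number we would expect.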

Nim generally only allows type conversions that do not involve too much effort. So we should not expect something like `var i: string = "1234"; echo i.int * 7` to be available. Such a conversion is expensive: at runtime it costs many CPU cycles, as we would have to extract the digits, multiply them by their weights, and sum them up. For that operation, functions like `parseInt()`, which accept a `string` as an argument and return an `int`, are available from the Nim standard library. There exist different variants of `parseInt()`: the one in `std/strutils` raises a `ValueError` for invalid input, while the one in `std/parseutils` stores the value in a `var` parameter and returns the number of characters it processed, which is zero on failure.
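
As a short sketch of string parsing, we can call the `parseInt()` variants of `std/strutils` and `std/parseutils` side by side. The calls are qualified with the module name here only to make clear which variant is used:

```nim
import std/strutils
import std/parseutils

# strutils variant: returns the value, raises ValueError for invalid input
let a = strutils.parseInt("1234")
echo a * 7                     # 8638

# parseutils variant: stores the value in a var parameter and
# returns the number of characters that were parsed
var n: int
let used = parseutils.parseInt("12ab", n)
echo used, " ", n              # 2 12
```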

Bitwise operations

All systems programming languages, and most other programming languages, have support for bit manipulation operations, which includes querying and setting individual bits of variables, and combining the bits of two or more variables. As the CPU hardware supports these operations directly, these operations are very efficient. In the C programming language, operators like `&`, `|`, `<<`, `>>`, `^`, `~` are used for bit-wise `and` and `or` operations, for shifting all the bits of a variable to the left or to the right, for applying the exclusive-or operation on the bits of two operands, and for inverting all the bits. Actually, for the right shift operation, we have to distinguish between a logical and an arithmetic shift: For a logical shift the bit pattern is only moved right, and the leftmost bit is always cleared. But for an arithmetic shift, the leftmost bit may stay set when it was set before, indicating a negative number in the case of a numeric variable. In C the actual behavior for a `>>` shift right operation can be implementation-dependent.

Nim prefers to use textual operators instead of cryptic symbols, so the logical operators `and`, `or`, and `not` have overloads that work on the actual bit pattern of integer variables instead of on boolean values, and the operators for left and right shifts are called `shl` and `shr`. For `shl`, bits shifted in from the right are always cleared, while `shr` shifts in cleared bits from the left for unsigned arguments, but preserves the leftmost set bit for signed arguments, which corresponds to an arithmetic shift operation. The Nim standard library also provides an `ashr()` function for arithmetic shifts, but that one seems to be a legacy.

from std/strutils import toBin
var i = 1.int8 # 0b00000001
i = i shl 7 # 0b10000000
i = i shr 2 # 0b11100000 as sign is preserved
echo i.toBin(8)
var j: uint8 = 0b11111111
j = j shr 2 # 0b00111111, div 4 for unsigned int
echo j.int8.toBin(8)

The bit-wise operators `and`, `or`, and `not` behave very similarly to the boolean ones, but the operation is performed on all the bit values instead of just two boolean operands. The shift operators require a right-hand operand specifying how many positions the bit pattern of the integer variable on the left should be moved. As the `shr` operator preserves the leftmost sign bit for each individual shift when applied to a signed integer argument, we get a value with the three leftmost bits set in the above example. To show the bit pattern, we used the `toBin()` function in the above code; its second parameter determines how many bits are actually printed. Remember that for unsigned numbers, shifting left (`shl`) by one position is equivalent to multiplying by two, and shifting right (`shr`) by one position is equivalent to dividing by two. Negative numbers are not meaningful for the number of bits to shift: although `i = i shl -1` does compile, the result is always zero. For all the shift operations, performing `n` shifts each by one position would yield the same result as a single shift by `n` positions. For most modern CPU hardware, all the bit shifting operations are very fast and generally take only one clock cycle, independent of how many positions we move the bit pattern and independent of whether it is a logical or an arithmetic shift operation.

We can use the `and` and `or` operators to extract single bits or set single bits:

var a = 3 # two rightmost bits, at position 0 and 1 are set
var b = a and 2 # extract bit 1, so b gets the value 2
b = a and 4 # extract bit 2, which is unset, so result is 0
b = a or (4 + 8) # result is 0b00001111, decimal 15

This information should suffice for understanding the most basic bit operations. We may not use these operations frequently, but it’s important to be aware of their existence. The overloading of the `and`, `or`, and `not` operators for signed and unsigned integer numbers may appear convenient, but it can sometimes lead to confusion when we intend to perform boolean operations and instead operate on bit patterns. It was suggested to call the operators bitand, bitor, and bitnot instead, and indeed the `bitops` module of Nim’s standard library defines operators with these names and provides additional, more useful bit operations, including counting the number of set bits in a variable or determining the number of leading zero bits. These operations are not needed that frequently, but sometimes they can be very useful, and they are supported by fast CPU instructions on modern PC hardware. Note that while we have shown these bit operations on integer numbers only, you can always cast other data types to integers and then apply these operations as well.
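
The following sketch combines the plain operators with a few procedures from the `bitops` module. The variable names and bit patterns are arbitrary examples:

```nim
import std/bitops

var flags = 0b0000_0011'u8
flags = flags or (1'u8 shl 4)        # set bit 4    -> 0b0001_0011
flags = flags and not (1'u8 shl 0)   # clear bit 0  -> 0b0001_0010
let bit1 = (flags shr 1) and 1       # read bit 1   -> 1
echo flags, " ", bit1                # 18 1

let x = 0b0010_1100'u8
echo countSetBits(x)                 # 3
echo countLeadingZeroBits(x)         # 2
echo bitand(x, 0b0000_1111'u8)       # 12
```

Shifting the value `1` to the desired position is a common way to build a mask for a single bit.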

Exceptions

When we execute our program code, sometimes things can go wrong: we may be unable to open a file, encounter an unexpected division by zero or an overflow, or receive invalid user input. There are various strategies to handle such situations. One is to terminate our program. We may do that by a plain `assert()` or `quit()` statement. If we have absolutely no idea how to recover from an error, then that is typically our best option. The user can restart the program, or the program can be restarted by a supervisor program. For more predictable errors, some form of error indicator can be a better solution. For example, a `parseInt()` procedure may return a boolean value indicating success. As we have to return the numerical result for success as well, the `parseInt()` `proc` can return a `tuple` or can use a `var` parameter in which the actual integer value is returned. Whenever a procedure returns a reference, the return value `nil` can be used to indicate some form of error. Alternatively, we may use Nim’s `Option` type to allow the caller to detect if a returned value is invalid.

Another popular strategy to handle error states is the use of `Exceptions`. If an invalid operation is detected somewhere in the code path, that code can `raise` an `Exception` to indicate that a serious error has occurred.

This raised error might be handled elsewhere in the program. If it is not handled at all, the raised `Exception` will finally cause a program termination.

Let us start again with a small example:

proc charToInt(c: char): int =
  if c in {'0' .. '9'}:
    return ord(c) - ord('0')
  raise newException(OSError, "parse error")

proc main =
  while true:
    stdout.write("Please enter a single decimal digit: ")
    let s = stdin.readline
    try:
      echo "Fine, the number is: ", charToInt(s[0])
    except:
      if s.len == 0:
        break
      echo "Try again"

main()

The `charToInt()` `proc` `raises` an `Exception` when the passed character is not a decimal digit. As the main program knows that `charToInt()` may `raise` an `Exception`, it encloses the `charToInt()` call in a `try/except` block: If code in the try block `raises` an `Exception`, then the program execution proceeds in the `except:` block.

The use of `Exceptions` seems to be a good idea to handle certain kinds of rare errors, and most modern programming languages support some form of raising and catching `Exceptions`. However, there has also been some criticism: using exceptions and catching them with try/except blocks can disrupt the regular control flow of the program, making it difficult to reason about all possible code paths. For this reason, the popular Go programming language was initially released with `Exception` handling explicitly omitted, with the developers arguing that it obfuscated control flow. In fact, the Nim compiler can help with the management of all exceptions involved by using its effect system, which is described in detail in the Nim language manual and which we will briefly discuss in the next section. Nim’s `Exception` tracking is part of Nim’s effect system: we can annotate each `proc` with all the various types of `Exceptions` that it may `raise`, and the compiler can help us with this annotation and verify that it is correct.
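
A minimal sketch of such an annotation uses the `raises` pragma. The two procedures are invented for this illustration:

```nim
proc digitValue(c: char): int {.raises: [ValueError].} =
  # The compiler verifies that only ValueError can escape from this proc.
  if c notin {'0' .. '9'}:
    raise newException(ValueError, "not a digit: " & c)
  ord(c) - ord('0')

proc digitOrMinusOne(c: char): int {.raises: [].} =
  # An empty list promises that no tracked exception escapes.
  try:
    digitValue(c)
  except ValueError:
    -1

echo digitOrMinusOne('7')   # 7
echo digitOrMinusOne('x')   # -1
```

If we removed the `try/except` from `digitOrMinusOne()`, the compiler would reject the empty `raises` list, because a `ValueError` could then escape.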

Defects and catchable errors

Nim’s strategy for the handling of `Exceptions` has changed a bit in the last few years. Nim now differentiates between catchable errors and defects, which may not be catchable and are considered to be programming bugs. The prototype of a defect is the `DivByZeroDefect`. If we do an integer division by zero, then the most common CPUs will generate a signal and the OS will abort our program with SIGFPE. So to prevent the program abort by a possible `DivByZeroDefect`, we always have to ensure that for an integer division, the denominator is not zero, or we let the Nim compiler do this check by compiling with the option `--checks:on`, which costs performance and increases code size, as a check instruction is added for each division.

In Nim, all exception types are `objects` that inherit from the `Exception` type of the `system` module, which has a public `name` field of `cstring` type and a public `msg` field of `string` type.

`Exceptions` that indicate programming bugs inherit from `system.Defect` and can be uncatchable, as they can be mapped to operations that terminate the whole process, like a quit, trap, or exit operation. `Exceptions` that indicate other, catchable runtime errors inherit from `system.CatchableError`.

These types are further subclassed into `Defects` like `OverflowDefect` or `OutOfMemDefect`, and `Errors` like `ValueError` or `IOError`.

Raise statement

An `Exception` is raised using the `raise` statement. The `raise` statement expects a heap-allocated reference to an `Exception` `object`, as the lifetime of the `Exception` instance is unknown. Generally, we use the `newException()` `template` to create the `Exception` instance and set the `msg` field:

raise newException(IOError, "IO failed")

In principle, we could also create the `Exception` instance like

var
  e: ref OSError
new(e)
e.msg = "the request to the OS failed"
raise e

If `raise` is invoked without an `Exception` argument, the current `Exception` is re-raised. The `ReraiseDefect` `Exception` is raised if there is no `Exception` to re-raise. It follows that the `raise` statement always raises an `Exception`. Reraising an `Exception` can be useful in an `except` block (see below) when the actual `Exception` type cannot be handled.

Custom exceptions

Instead of raising one of the predefined `Exceptions` from the `system` module, we can also create our own variants and then `raise` them:

type
  LoadError = object of Exception
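
A sketch of how such a custom type might be raised and caught is shown here. Note two assumptions of this example: we inherit from `CatchableError` instead of `Exception`, which is commonly recommended for errors that callers should handle, and the `path` field and the `load()` procedure are hypothetical:

```nim
type
  LoadError = object of CatchableError
    path: string              # hypothetical extra field

proc load(path: string) =
  # Pretend that loading always fails, to show the raise/except round trip.
  let e = newException(LoadError, "cannot load file")
  e.path = path
  raise e

var report = ""
try:
  load("data.bin")
except LoadError as e:
  report = e.msg & ": " & e.path
echo report                   # cannot load file: data.bin
```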

Try statement

In the Nim language manual, we have an example like this one:

import std/strutils
var
  f: File
if f.open("numbers.txt"):
  try:
    let a = f.readLine
    let b = f.readLine
    echo "sum: ", parseInt(a) + parseInt(b)
  except OverflowDefect:
    echo "overflow!"
  except ValueError, IOError:
    echo "value or IO error!"
  except:
    echo "Unknown exception!"
  finally:
    close(f)

The code tries to read two `strings` from a text file that is assumed to contain numeric data and to add them after conversion to integer numbers. Three errors may occur: The reading of the `strings`, the conversion to integers, or the addition may fail, with the last potentially causing an overflow. To catch the possible errors, we use the `try/except/finally` construct. The keywords `try`, `except`, and `finally` are followed by a colon, and each keyword marks the start of a corresponding block of instructions — after the `except` keyword we can list the `Error` and `Defect` types for which the following code block should be executed.

The statements in the `try` block are executed in sequential order until an `Exception` is raised. If an `Exception` is raised and the `Exception` type matches any listed in an `except` clause, the corresponding statements are executed. If no `Exception` types match and an `except` clause with no listed `Exception` types is specified, the following code block is executed. The statements following the `except` clauses are called `Exception` handlers. If there is a `finally` clause, it is always executed after the `Exception` handlers.

An `Exception` is "consumed" in an `Exception` handler. However, an `Exception` handler may `raise` another `Exception` or re-raise the current one, which then may be caught elsewhere or generate a program termination if it remains uncaught. If an `Exception` is not handled, it is propagated through the call stack. This means that often the rest of the procedure - that is not within a `finally` clause - is not executed (if an `Exception` occurs).
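
The order of execution can be traced with a small sketch. The procedures `inner()` and `outer()` and the `trace` variable are invented for this illustration:

```nim
var trace: seq[string]

proc inner() =
  try:
    trace.add "inner try"
    raise newException(ValueError, "oops")
  finally:
    trace.add "inner finally"    # runs although nothing is caught here

proc outer() =
  try:
    inner()
    trace.add "not reached"      # skipped, as inner() raised
  except ValueError:
    trace.add "outer except"     # the exception is consumed here
  finally:
    trace.add "outer finally"

outer()
echo trace
# @["inner try", "inner finally", "outer except", "outer finally"]
```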

Try expressions

Just as we can use the `if` keyword as an expression, we can do the same with the `try` keyword. The data types of the `try` and the `except` branches have to be compatible in this case, and an optional `finally` branch has to return nothing (void):

from std/strutils import parseInt, parseFloat
let x = try: parseFloat("3.14")
        except: NaN
        finally: echo "well we tried." # always executed!
echo x # 3.14

let i = (try: parseInt("133a") except: -1)
echo i # -1

Except clauses

In an `except` block, we can use the function `getCurrentException()` to get the raised `Exception`, or `getCurrentExceptionMsg()` to get only the error message. Or, we can access the current `Exception` in an `except` block using the `as` keyword, as shown below:

try:
  # ...
except IOError as e:
  # Now use "e"
  echo "I/O error: ", e.msg, " (", e.name, ')'
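
The two functions can be used as in this minimal sketch; the variable names are arbitrary:

```nim
var msg = ""
var kind = ""
try:
  raise newException(IOError, "disk full")
except CatchableError:
  msg = getCurrentExceptionMsg()         # only the message text
  kind = $getCurrentException().name     # name of the raised type
echo kind, ": ", msg                     # IOError: disk full
```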

Imported exceptions

It is possible to `raise` and catch imported C++ exceptions. For a detailed example, see the Nim language manual: https://nim-lang.org/docs/manual.html#exception-handling-imported-exceptions

Defer statement

The `defer` statement can be used to ensure that special actions like closing a file or freeing resources are always executed. The `defer` statement is transformed by the compiler into a `try/finally` construct.

proc main =
  var f = open("numbers.txt", fmWrite)
  defer: close(f)
  f.write "abc"
  f.write "def"

Is rewritten to:

proc main =
  var f = open("numbers.txt", fmWrite)
  try:
    f.write "abc"
    f.write "def"
  finally:
    close(f)

Using `defer` is more concise, but `try/finally` makes it more obvious what is happening, so some people recommend not using the `defer` statement. Actually, tasks like closing files should soon be performed by Nim’s destructors automatically, so `defer` may get deprecated.
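
Because each `defer` wraps the rest of its block in a `try/finally`, several `defer` statements run in reverse order. A small sketch, with an invented `work()` procedure, makes this visible:

```nim
proc work(log: var seq[string]) =
  log.add "start"
  defer: log.add "cleanup 1"    # wraps everything that follows
  defer: log.add "cleanup 2"    # nested inside the first one
  log.add "body"

var log: seq[string]
work(log)
echo log   # @["start", "body", "cleanup 2", "cleanup 1"]
```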

Destructors

Destructors and finalizers are used for automatic resource management. For example, files can be closed automatically when a file variable goes out of scope. Similarly, when we create high-level Nim bindings to C libraries, we can use finalizers or destructors to deallocate entities of the C libraries when a corresponding Nim (proxy) `object` is freed. Libraries like the Gintro GTK bindings make use of this.

Finalizers are procedures that can be passed as an optional second parameter to the system `new()` `proc`. That way, the finalizer `proc` is attached to the data type of the variable which we pass as the first parameter to `new()` and that finalizer `proc` is automatically called whenever that variable is freed by the Nim memory management system. As finalizers are passed as a parameter to a `new()` call, and `new()` is only used for references, finalizers work only for `ref` data types.

Destructors do not have this restriction. We define the destructor for a value type, but it is also called for reference types by the compiler.

Starting with version 1.4, Nim introduced scope-based resource management, which is enabled when the program is compiled with `--mm:arc` or `--mm:orc`. In that case, variables are immediately deallocated when they go out of scope, and if a destructor was defined for the data type of that variable, it is called automatically.

In the C++ programming language, it is common practice for resources like files to be closed and released automatically by destructors when they go out of scope. Now, this is also possible in Nim. To make use of destructors for our own data types, we have to define a `proc` called `=destroy` which receives an instance of our data type passed as a `var` value `object`:

type
  O = object
    i: int

proc `=destroy`(o: var O) =
  echo "destroying O"

import std/random

proc test =
  for i in 0 .. 5:
    if rand(9) > 1:
      var o: O
      o.i = rand(100)
      echo o.i * o.i

randomize()
test()

In the `for` loop, we enter a new scope when the `if` condition evaluates to `true`. At the end of the `if` block, we leave the scope, and the destructor is called automatically. Inside the destructor `proc`, we could do some cleanup tasks, close files, and release resources. Destructors are also called when `ref` `objects` go out of scope:

type
  O = ref object of RootRef
    i: int

proc `=destroy`(o: var typeof(O()[])) =
  echo "destroying O"

import std/random

proc test =
  for i in 0 .. 5:
    if rand(9) > 1:
      var o: O = O() # new O
      o.i = rand(100)
      echo o.i * o.i

randomize()
test()

To use destructors, we have to compile our program with the `--mm:arc` or `--mm:orc` option; otherwise, the specified destructor `procs` will be ignored. In our code, we can test for working destructors with a construct like `when defined(gcDestructors):`.
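
Such a compile-time test might look like the following sketch. The constant name `destructorsActive` is made up. Note that since Nim 2.0, ORC is the default memory management strategy, so the first branch is taken unless another option is selected:

```nim
const destructorsActive = defined(gcDestructors)

when destructorsActive:
  echo "compiled with ARC or ORC, destructors are available"
else:
  echo "compiled with another memory management option"
```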

Note that destructors do not work for plain `pointer` types:

type
  O = object
    i: int
  OP = ptr O

proc `=destroy`(o: var O) =
  echo "destroying O"

import std/random

proc test =
  for i in 0 .. 5:
    if rand(9) > 1:
      var o: OP = create(O) # new O
      o.i = rand(100)
      echo o.i * o.i

randomize()
test()

Therefore, using destructors to release data directly from C libraries is not possible. But at least for Nim >= v1.6 destructors work for `distinct` `pointer` types:

type
  O = object
    i: int
  OP1 = ptr O
  OP = distinct ptr O

proc `=destroy`(o: var OP) =
  echo "destroying OP"

import std/random

proc test =
  for i in 0 .. 5:
    if rand(9) > 1:
      var o: OP = OP(create(O)) # new O
      OP1(o).i = rand(100)
      echo OP1(o).i * OP1(o).i

randomize()
test()

A sample run prints the following (the numbers are random and vary between runs):

81
destroying OP
3600
destroying OP
2401
destroying OP
9025
destroying OP

So using destructors to destroy data from C libraries should be possible now.

Destructors and inheritance

When we use an object-oriented programming style with subclassing of `ref` `objects`, it’s useful to know that for subclassed `ref` `objects`, the destructor of the parent class is automatically invoked if we do not define one for our subclassed type. This also works when we import the parent type from another module, at least since Nim v1.6:

# module tt.nim
type
  O1* = ref object of RootRef
    i*: int

when defined(gcDestructors): # check not really needed, as =destroy call is just ignored when condition is false
  proc `=destroy`*(o1: var typeof(O1()[])) =
    echo "destroy O1 ", typeof(o1)

# module t.nim
import tt

type
  O2 = ref object of tt.O1
    j: int

type
  O3 = ref object
    o1: tt.O1

type
  O4 = object
    o1: tt.O1

type
  O5 = ref object of tt.O1
    x: float

when defined(gcDestructors):
  proc `=destroy`(o5: var typeof(O5()[])) =
    echo "destroy O5 ", typeof(o5)
    tt.`=destroy`(o5)

proc main =
  var o1: tt.O1
  new o1
  echo o1.i

  var o2: O2
  new o2
  echo o2.j

  var o3: O3
  new o3
  new o3.o1

  var o4: O4
  new o4.o1

  var o5: O5 = O5(x: 3.1415)
  echo o5.x

main()

When we compile the module `t.nim` with `--mm:arc` or `--mm:orc` and run it, we get this output:

0
0
3.1415
destroy O5 O5:ObjectType
destroy O1 O1:ObjectType
destroy O1 O1:ObjectType
destroy O1 O1:ObjectType
destroy O1 O1:ObjectType
destroy O1 O1:ObjectType

Therefore, when our variables `o1` to `o5` go out of scope, the destructors are called. Module `tt.nim` defines a `ref` `object` type, but the destructor `proc` takes a `var` value parameter. The destructor is called when a value `object` or a `ref` `object` goes out of scope. Our variable `o1` has type `tt.O1`, so it was indeed expected that its destructor from module `tt.nim` is called. Variable `o2` is a `ref` `object` with parent `O1`. As we define no destructor for this type, the destructor of the parent type is called. The variables `o3` and `o4` are of `ref` `object` and of value `object` types, each with a field of type `O1`, and for that field, the destructor for `O1` is called. Finally, for type `O5`, we define our own destructor, which then additionally calls the destructor from module `tt`.

Destructors are mostly used for library implementations, e.g., for a `File` data type, which is automatically closed when a file variable goes out of scope. As you may never have to use destructors yourself, it is not necessary to remember all these details. However, it is good to know that destructors behave as one might expect. If you later want to use a destructor in your own code, you can refer back to this section or, perhaps more helpfully, consult the Nim manual.

Finalizers

In Nim, finalizers are procedures that we can specify as an optional second parameter when we call the system `new()` `proc` to allocate heap memory for a reference type variable. The specified finalizer procedure is then later called by the Nim memory management system when the `ref` variable is freed:

type
  O = ref object of RootRef
    i: int

proc finO(o: O) =
  echo "finalize O"

proc newO: O =
  new(result, finO)

proc main =
  var o = newO()
  var o2 = new(O)
  var o3 = O(i: 7)

main()
GC_fullcollect()

We added a call to `GC_fullcollect()` to ensure that the REFC GC is actually invoked before the program terminates. For ARC/ORC we get this output:

finalize O
finalize O
finalize O

But when we compile with old REFC, we get only two finalizer calls:

nim r --mm:refc t.nim

finalize O
finalize O

For `o3`, the finalizer is not called. We don’t know if that is a bug or feature of v1.9.3.

The output of the above program may be surprising at first: we only call the `newO()` procedure to initialize the variable `o`, which then calls `new()` by passing a finalizer `proc` named `finO()`. For `o2` and `o3`, we allocate memory as usual, without the use of a finalizer `proc`. But when `o2` and `o3` go out of scope, even for these two variables, the finalizer procedure `finO()` is called. The reason for this is, that the system `proc` `new()` binds the optional finalizer procedure to the data type of the passed `ref` variable. This binding process occurs for the first call with a passed finalizer `proc`, and can not be reverted. We can later call `new()` without a finalizer or use the similar `O()` call to initialize the `ref` variable, but that can not undo the binding. Furthermore, using a different finalizer procedure for the same data type would not work anymore. Passing the same finalizer `proc` multiple times is OK and may be a common use case, but it has no real effect, as the first call did the binding already.

The behavior of finalizers in Nim can indeed be a bit confusing and prone to errors. We might pass a finalizer `proc` to `new()` somewhere in a large program and forget about it. Later, we use `new()` without a finalizer or use the `O()` notation to reserve the memory for our `ref` variable. Therefore, we might think that no finalizer is involved, but since a finalizer was used at least once somewhere, it is now bound to all of our allocations of that data type. That can easily lead to bugs, as the unintentionally called finalizers may do things that they should not do with our data.

Finalizer procedures must always be defined in the same module as the type for which they will be used (this restriction appears to have been removed in Nim 2.0):

# module tt.nim
type
  O* = ref object of RootRef
    i: int

proc fin*[T](o: T) =
  echo "finalize T"

proc newO*: O =
  new(result, fin)

# main module, which imports tt.nim
import tt

type
  OO = ref object of tt.O
    x: float

proc finn[T](o: T) =
  echo "finalize O"

proc main =
  var oo: OO
  new(oo, finn)

main()

We import the `tt.nim` module and subclass the `ref object` type `tt.O`. Although the `tt.nim` module defines a generic finalizer `proc` `fin()`, we cannot use that one for our subclassed type `OO`. Instead, we must copy it from the `tt.nim` module into our main module, and we might even need to use a different procedure name. Otherwise, we get the compiler message

Error: type bound operation `fin` can be defined only in the same module with its type (OO:ObjectType)

Whenever we really need a finalizer or a destructor, we should prefer destructors if we can compile our code with the compiler options `--mm:arc` or `--mm:orc`.

Modules

Modules are Nim’s way of dividing a program into multiple source code files, forming clearly separated units that hide implementation details. Nim uses a concept of modules that is very similar to that of Modula-2 or Oberon. All the Nim standard libraries are divided into modules that collect and logically group data types and related procedures. In a sense, modules can be considered as Nim’s classes.

In Nim, each module directly corresponds to one text file. Currently, Nim does not support submodules, known from Ruby, which divide a single text file into multiple modules. Similar restrictions apply to module names as to other Nim symbols, e.g., the hyphen '-' is not allowed in module names. Typically, we use only lowercase and the extension ".nim" for the names of modules. It is strongly recommended to avoid using module names that are identical to symbol (type) names used within that module. Every text file containing Nim source code essentially constitutes a module, which can then be imported and used by other modules.

But all symbols like data types or procedures have to be exported to make them visible and usable by other modules. This is accomplished, as in Oberon, by appending an asterisk character to all symbols (names) that should be exported. These restricted exports allow for the hiding of implementation details — all symbols not exported are private to that module and can be changed and improved at any time without the importing module noticing.

Note that when we append the asterisk to the name of an `object` to export that type, the object’s fields are still hidden and cannot be accessed from within the importing module. You may append an asterisk to selected field names as well, or you may provide exported getter and setter `procs` for the field access. A read-only export, as known from the Oberon language, is currently not possible with Nim.
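
A sketch of both variants, an exported field and a private field with exported getter and setter `procs`, could look like this. The `Account` type, its fields, and the rule applied in the setter are hypothetical; in a real program, the type and the procedures would live in a separate module:

```nim
# Imagine this part saved as a separate module, e.g. account.nim.
type
  Account* = object
    owner*: string      # exported: importers may read and write it
    fBalance: int       # private: hidden from importing modules

proc balance*(a: Account): int =
  ## Exported getter for the private field.
  a.fBalance

proc `balance=`*(a: var Account, value: int) =
  ## Exported setter; it ignores negative values (an arbitrary rule).
  if value >= 0:
    a.fBalance = value

var acc = Account(owner: "Ada")
acc.balance = 50        # calls `balance=`
acc.balance = -5        # ignored by the setter
echo acc.owner, " ", acc.balance   # Ada 50
```

A getter without a matching setter gives importing modules read-only access to a field.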

We can import whole modules, that is, all symbols that are marked for export by the asterisk, or we can import only the symbols that we need by specifying their names. Let us create a module that declares a single procedure to remove all characters from a `string` that are not letters:

# save this textfile with name mystrops.nim
proc remNoneLetters*(s: string): string =
  result = newString(s.len)
  var pos = 0
  for c in s:
    if c in {'a' .. 'z', 'A' .. 'Z'}:
      result[pos] = c
      inc(pos)
  result.setLen(pos)

We save the aforementioned text file containing our Nim source code as mystrops.nim. Note the export marker after the `proc` name. We can import and use that module as follows:

import mystrops

echo remNoneLetters("3h7.5g:8h")

When we import modules, we generally place the import statement at the top of the importing module; this makes it easy to see what modules are imported. The imported symbols can be used in the code following the import statement. Module names should be lowercase and may, like other Nim symbols, contain only letters, decimal digits, and the underscore character. We can import multiple modules with a single import statement when we separate the module names with commas. Starting with Nim v1.6, it is recommended to import modules from Nim’s standard library with the `std` prefix as in `import std/math` or `import std/[strutils, sequtils]`. Importing the same module multiple times is not a problem, and does not increase the file size of the final executable. Note that in the import statement the module names have to be used literally, so this would not work:

const strfuncs = "stringutils"
import strfuncs

Instead of importing whole modules, we can import only single symbols with the `from x import y, z` syntax like

from mystrops import remNoneLetters

echo remNoneLetters("3h7.5g:8h")

Both forms are examples of an unqualified import; that is, we can refer to the `proc` by only its name. We do not need the qualified form with a module name prefix like `mystrops.remNoneLetters()`, as long as there are no name conflicts. But whenever we want, we can use the qualified form also.

Nim programmers tend to prefer importing entire modules and using unqualified names, though this is often considered bad style in some other languages like Python. In dynamically typed languages like Python, unqualified imports may indeed pollute the namespace and generate many name conflicts, but in statically typed languages like Nim unqualified imports seem to generate name conflicts only in very rare cases. Procedures with the same name typically have different parameter lists, so the overload resolution of the compiler can decide which `proc` is to be used. And when a name conflict really does occur, the compiler will tell us, and we can easily fix it by prefixing the procedure name with its module name.
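
As a small illustration of this overload resolution, the following sketch defines its own `proc` that shares a name with one from the `math` module. The string variant of `sqrt()` is a made-up helper that exists only for this example:

```nim
import std/math

# our own proc (hypothetical) shares its name with math.sqrt
proc sqrt(s: string): string =
  "sqrt(" & s & ")"

echo sqrt(16.0)      # float argument: the proc from the math module is selected
echo sqrt("x")       # string argument: our own proc is selected
echo math.sqrt(2.0)  # the qualified form is available whenever we want it
```

Because the parameter types differ, the compiler can pick the matching `proc` without any help from us.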

For data types, constants, or enums, the likelihood of name conflicts might not be so small, potentially necessitating the use of qualified names.

We can also enforce a fully qualified import in Nim by a notation like

from mystrops import nil

In this case, we can use all symbols from that module only in qualified form. However, this approach doesn’t always work seamlessly in Nim, given that unlike Java, Nim doesn’t have classes. Consequently, qualified use of method call syntax or of user-defined or overloaded operators can be challenging. Imagine a call like `strutils.add(s, '\n')`: how should that look with method call syntax?

For imports, we have also the `except` keyword, so we may do something like

import std/strutils except toUpper

The `except` keyword can be used to prevent possible name conflicts, without having to use qualified names.

Note that the `system` module is imported automatically, so we should not import it explicitly. Also, note that Nim always includes only what is truly necessary in the final executable, meaning that importing only a few symbols from a module has no code size advantage over importing the whole module. Still, importing only single symbols may improve the readability of your code when you are sure that you need nothing else from that module, as in `from std/math import Pi`. Note that even in that case you can access other symbols of that module by fully qualified names like `math.sin()`.
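
A minimal sketch of this selective import, using only the `math` module from the standard library:

```nim
from std/math import PI

echo PI               # imported by name, usable without a prefix
echo math.sqrt(16.0)  # not imported by name, but reachable in qualified form
# echo sqrt(16.0)     # would not compile: sqrt is not visible unqualified
```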

With the growing standard library, it may occur that module names of the standard library interfere with your own module names. So Nim now allows and recommends qualified import of modules from the standard library like `import std/strutils`. And for external packages installed by the nimble package manager, imports in the form `import package/[mod1, mod2, mod3]` are permitted.

Finally, you can also import modules under a different name using the `as` keyword as follows:

import std/tables as maps

With the latest Nim compiler, you can also enforce fully qualified import and use of an alternate module name by using an import statement as follows:

from std/tables as maps import nil

With this import statement, you could access symbols from the `tables` module only by use of the `maps` module prefix like `maps.newTable()`.
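
The following sketch shows the plain `as` form in use. The variable name `counts` is chosen only for this example. With the `import nil` variant, only the qualified calls with the `maps` prefix should compile, while the unqualified index assignment would be rejected:

```nim
import std/tables as maps

var counts = maps.initTable[string, int]()  # qualified access uses the alias
counts["nim"] = 1                           # unqualified use still works here
counts["oberon"] = 2
echo maps.len(counts)
```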

Finally, with the `export` keyword, one library module can export other modules, which it imports itself. This may simplify the use of connected modules. As an example, when using the `gintro` bindings for GTK4, we import all the needed modules generally like `import gintro/[glib, gobject, gtk4]`. We may decide to simplify that import statement by creating one more module called `gtkplus` that consists only of these two lines:

# module gtkplus
import gintro/[glib, gobject, gtk4]
export glib, gobject, gtk4

Then, a user of `gintro` could simply write `import gtkplus` to have access to all the modules. However, for GTK, this is not really a good idea. We will discuss the `gintro` module and perhaps one other Nim GUI library in the second half of the book.

Cyclic imports

Typically, we try to arrange our own modules in a tree-like bottom-up structure. A module `x` may define basic types and simple functions working with these types, and a higher-level module `y` may import all symbols from module `x` and extend the functionality. But in rare cases, it could be necessary for the modules `x` and `y` to import each other, as `x` has to use types or functions of module `y`, and vice versa. This case is called cyclic import and is currently not supported by Nim. Indeed, we should generally try to avoid cyclic imports when possible, as cyclic imports make the software design difficult. But sometimes we cannot really avoid these cycles. In that case, currently, the best solution is to put all the concerned data types in a separate low-level module, which is then imported from both other modules. The planned Nim version 2.0 may permit cyclic imports, so this restriction might vanish in the future.

We have already mentioned that the compiler only imports functions, data types, and other symbols from imported modules that are really needed. So a plain `import std/math` is fine; there is no need to write `from std/math import sin, cos, sqrt` to optimize the final executable size. The same is true when whole modules or single symbols from a module are imported multiple times: When modules `a` and `b` each import module `c`, and our top-level main module imports modules `a` and `b`, module `c` is still only imported once; there is no unneeded code duplication. The `import` statement is not merely an instruction to insert some code, but rather a hint to the compiler about which symbols may be needed. But remember that the use of `templates`, inline `iterators`, generics, and inline procedures may indeed lead to code duplication; that, however, is intentional.

Include

The `include` statement should not be confused with the `import` statement. `Include` simply inserts the content of a text file at the position where the `include` statement occurs. The `include` statement can be used to split very large modules into smaller entities.

Part III: Nim’s Standard Library

In this part of the book, we will introduce you to some of the most essential modules of Nim’s standard library. This includes modules for common operations like the serialization of Nim data types, which allows us to write them to external nonvolatile storage and read them back into the program later, or handling command-line options and parameters for programs that are launched from a terminal window. Further, we will introduce you to important container data types such as hash tables (sometimes referred to as hash maps in other programming languages) and various kinds of set data types. We will also introduce modules for working with regular expressions, and we will show how simple modules like the `times` and the `random` module can be used. Most modules mentioned in this part will be from the Nim standard library, so you will not have to install external packages to use them. However, there may be some exceptions, such as certain external Nimble packages with very useful functionality and an easy user interface. One of these exceptions is the `regex` module: Nim’s standard library comes with the `re` and `nre` modules, which both use the PCRE C library. We have decided to introduce the `regex` module instead, which is an external package written completely in the Nim language.

Formally, Nim distinguishes between pure and impure libraries and wrappers. The majority of Nim’s standard libraries consist of pure libraries, which are modules completely written in Nim code. Impure libraries provide a high-level Nim interface and can be used like pure libraries, but use C libraries under the hood. Examples are the two modules `re` and `nre`, which both use the PCRE C library, and some database modules. Impure libraries can be used in the same way as pure ones when the underlying C library is installed. The few wrappers that are shipped with Nim only provide a low-level interface to C libraries, which may use unsafe `pointers` as `proc` parameters and may require the user to do manual memory management. Some impure modules use these wrappers and hide all the ugly stuff for us, but we generally do not use the wrappers directly.

Nim’s standard library is supported by thousands of external packages, which can easily be installed with Nim’s package managers, and then can be used in the same way as the modules of the standard library. The next part of the book will introduce you to the use of external packages and present some of the most important ones out there.

Command-line arguments

When we launch a program from inside the terminal window, we can pass it some additional parameters, e.g. the name of a file to process or option parameters to influence the behavior of the program. We have done so already when we launched the Nim compiler or maybe a text editor from inside our terminal window. Using command line parameters is convenient when we work from inside a terminal and there are parameters that we can know in advance. A more interactive way to collect parameters is reading in input while the program is already running, as we did in Part II of the book when processing the list of our friends. We will learn some more details about this interactive processing of input in the next section.

Nim allows processing command-line arguments in the same basic way as all C programs do, but Nim’s standard library and some external packages also allow much more advanced handling of command-line arguments. For simple cases, the C-like way is sufficient. In C programs, the command-line arguments are even coupled very closely to the language itself: the number of arguments and the list of arguments are the two typical parameters of the C `main()` function and are used in this way:

// C program expecting one command line argument
// Compile with gcc t.c -o t
#include <stdio.h>
int main( int argc, char *argv[] ) {
  printf("Executing program %s\n", argv[0]);
  if( argc == 2 ) {
     printf("The argument supplied is %s\n", argv[1]);
  }   else if( argc > 2 ) {
     printf("Too many arguments supplied.\n");
  }
  else {
     printf("One argument expected.\n");
  }
}

Here `argc` is the number of available arguments, and `argv` is an `array` containing the actual arguments in the form of C `strings`. These values are passed to each C program by the OS when the program is launched from inside a terminal. Actually, the value of `argc` is the number of passed arguments plus one. This means that when we specify no arguments at all, `argc` still has the value of one. Additionally, `argv[0]` is always the name of the executed program. We need to understand that command-line arguments passed to a program are separated by white space, that is, at least one space or tab character. For this reason, we have to enclose single arguments containing white space in double quotes:

$ gcc t.c -o t
$ ./t Nim two
Executing program ./t
Too many arguments supplied.
$ ./t "Nim two"
Executing program ./t
The argument supplied is Nim two

In Nim, the same functionality is available through the `paramCount()` and `paramStr()` `procs`, which we need to import from the `os` module. But `paramCount()` gives us the actual number of parameters, so when we call our program on the command line without any arguments, `paramCount()` will return the value zero. Note that `paramStr()` is not a global `array` variable, but a procedure. `paramStr(0)` gives us the name of our executable, and with arguments greater than zero we get the passed arguments as `strings` in ascending order. Using an index number for an argument that was not provided will cause `paramStr()` to `raise` an exception.

An argument evaluation similar to the one in our earlier C program could look like this:

from std/os import paramCount, paramStr

proc main =
  echo "Executing program ", paramStr(0)
  let argc = paramCount() + 1
  if argc == 2:
    echo "The argument supplied is ", paramStr(1)
    if paramStr(1) in ["-d", "--debug"]:
      echo "Running in debug mode"
  elif argc > 2:
    echo "Too many arguments supplied."
  else:
    echo "One argument expected."

main()

Using this straightforward API is acceptable when we expect one or two arguments, maybe a file name and an option, like the `-d` or `--debug` parameter used in the code above. With more command-line arguments, the process can become complex quickly, as arguments can be passed in arbitrary orders and combinations. So you should try one of the available libraries for that case. One of these is the `cligen` package, which we will present in Part III of the book.
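
Besides external packages, the standard library itself ships the `parseopt` module, which the text above does not discuss. The following is only a minimal sketch of how it could handle the `-d`/`--debug` option together with an arbitrary number of file names; the `Config` type and the `parseArgs()` `proc` are names invented for this example:

```nim
import std/[os, parseopt]

type
  Config = object        # hypothetical settings for this sketch
    debug: bool
    files: seq[string]

proc parseArgs(args: seq[string]): Config =
  var parser = initOptParser(args)
  for kind, key, val in parser.getopt():
    case kind
    of cmdShortOption, cmdLongOption:
      if key in ["d", "debug"]:
        result.debug = true
      else:
        echo "Unknown option: ", key
    of cmdArgument:
      result.files.add(key)   # plain arguments are collected as file names
    of cmdEnd:
      discard

when isMainModule:
  let cfg = parseArgs(commandLineParams())
  echo "debug: ", cfg.debug, ", files: ", cfg.files
```

Because the options and arguments are processed one by one in a loop, their order on the command line no longer matters.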

Reading data from the terminal

While using command-line arguments is convenient for data like filenames or options that we already know when we launch a program from the terminal window, often we have to provide textual user input while the program is already running. Functions for this task are provided by the `io` module, a part of the `system` module, which we do not have to import explicitly. In one of the introductory sections of the book, we already used the `readLine()` and `getch()` procedures to read in a line of text from the terminal and to wait for a single keypress event.

For input and output operations in a terminal window, the `io` module defines the three variables `stdin`, `stdout`, and `stderr` of the `File` data type. Many procedures in the `io` module expect a `File` type variable as the first parameter. We can explicitly open a named file to write data to external media like the SSD, or we can just use the `stdin` and `stdout` variables to read data from the keyboard and to write text to the terminal window. Unlike other named files, we do not have to call `open()` or `close()` on `stdin` and `stdout` to open or close the files, and some other file operations like `setFilePos()` may not work for these file variables:

var s: string = stdin.readLine()
stdout.write(s)
stdout.flushFile

As previously mentioned, the `readLine()` function allows textual user input, including spaces. It’s important to note that you must terminate your input by pressing the return key. This action passes the input `string` to the OS, which then forwards the input to our program. This form of input is sometimes referred to as 'blocking' because while we’re waiting for user input, our program is essentially idle; it cannot perform other tasks until the user has pressed the return key. For single-character input where pressing the return key isn’t necessary, such as for simple `yes/no` input, you may use the `getch()` function from the `terminal` module. This function is also blocking. In a later section of the book, we may show how we can use threading to actually do some useful work while we wait for user input. In the literature, `stdin`, `stdout`, and `stderr` are often called streams, where `stderr` can be used instead of `stdout` for writing error messages. This can be useful in special cases when we have an application where we want to redirect error messages to a file or to separate regular output and error messages. If you need more details about these stream or file variables, and the use of the `stderr` variable, you may consult external literature.

The `io` module does not provide `read()` functions for other basic data types like numeric or boolean types. So we should use `readLine()` to read the user input in `string` form, which we can convert to numeric data with functions like `parseInt()` or `parseFloat()`. Note that parsing `procs` like `parseInt()` are provided by the module `strutils` as well as by the module `parseutils`: the `strutils` variant returns the value and raises an exception for invalid input, while the `parseutils` variant stores the value in a `var` parameter and returns the number of characters it has processed, with zero indicating failure. Of course, we should always handle textual user input carefully and never just assume that the input is actually valid data. Some of the modules that can be used for converting textual input data into other data types, like the `strutils`, `parseutils`, and `strscans` modules, are described in more detail at the end of this part of the book.
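
The two styles of integer parsing can be compared with the following sketch. The helper names `toIntChecked()` and `toIntCounted()` are made up for this example, and instead of calling `readLine()` we use a few fixed input `strings`, so that the program does not wait for keyboard input:

```nim
import std/[strutils, parseutils]

# Variant 1: strutils.parseInt raises a ValueError for invalid input
proc toIntChecked(s: string; value: var int): bool =
  try:
    value = strutils.parseInt(s.strip())
    result = true
  except ValueError:
    result = false

# Variant 2: parseutils.parseInt reports how many characters it has processed
proc toIntCounted(s: string; value: var int): bool =
  var tmp: int
  let consumed = parseutils.parseInt(s, tmp)
  # zero means failure; we also reject trailing garbage like "12abc"
  result = consumed > 0 and consumed == s.len
  if result:
    value = tmp

var n: int
for input in ["42", "12abc", ""]:
  echo "input: ", input, " checked: ", toIntChecked(input, n),
       " counted: ", toIntCounted(input, n)
```

Both helpers leave the `var` parameter untouched when the input is invalid, so the caller can decide whether to ask the user again.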

For advanced user input processing, like cursor movement, colored display, or displaying progress bars, you may also consult the `terminal` module. Finally, to create advanced textual user interfaces (TUIs), we recommend trying external packages, such as the `illwill` library.

Writing text to the terminal window

In previous sections, we have used the `echo()` function to write variables of various data types to the terminal window. The `echo()` function accepts multiple arguments, writes the `string` representation of these arguments to the terminal window, and concludes by writing the `\n` character. This moves the cursor to the beginning of the next line in the terminal window. We have already used the `write()` function from the `io` module for the case that we want to write a single `string` to the terminal without a terminating newline character. The `io` module contains overloaded `write()` functions for other basic data types such as `int`, `float`, and `bool`. It also includes a variant with a `varargs` parameter and applied stringify operator, allowing `write()` to function similarly to `echo()` if we pass `stdout` as the first parameter. The C library function `fprintf()` is used for the actual output operation. Keep in mind that write operations to `stdout` are generally buffered. Thus, the result of `write()` operations might remain invisible until we write a `string` containing a newline character or call the `flushFile()` function to enforce buffer writing.
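
A short sketch of these output functions follows. The message texts are arbitrary, and the last line additionally uses `writeLine()` on `stderr`, which is not discussed above but belongs to the same family of `procs`:

```nim
stdout.write("Processed ")                # no newline is appended
stdout.write(3, " of ", 10, " items")     # several values, converted like echo does
stdout.flushFile()                        # make the partial line visible now
stdout.write('\n')                        # finish the line ourselves
stderr.writeLine("warning: this goes to the error stream")
```

When we redirect only the regular output of this program to a file, the warning should still appear in the terminal window.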

Option types

Option types can be used to encapsulate values in a way that allows marking the value as undefined. This can be especially useful for the return types of functions, which may or may not return a meaningful value.

Let’s assume we have a function called `find()` that searches for the first index position of a character in a `string`:

proc find(s: string; c: char): int =
  result = -1 # not found
  var i = 0
  while i < s.len:
    if s[i] == c:
      return i
    inc(i)

echo "Nim".find('i')

The function returns the index position or `-1` to indicate that the character has not been found. This works because we typically use signed integers in Nim, and the valid `string` index positions are never negative. Hence, a negative result is an obvious indication of an error. Similarly, when a function needs to return a reference or a `pointer`, the special value `nil` can be used to indicate the absence of a value. Actually, in most cases, we can just define a special value as the indication for the absence of a `result` or as an error indicator, for example, `int.low`, `char(0)`, or `NaN` for `float` results.

Other ways to indicate failures include returning a boolean value for success and returning the actual result value as a `var` parameter, returning a `tuple` that encloses a boolean for success indication and the actual result, or returning the result(s) as a sequence that can be empty in the event of no success:

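# Variant 1: boolean result, position returned in a var parameter
proc find(s: string; c: char; pos: var int): bool =
  pos = 0
  while pos < s.len:
    if s[pos] == c:
      return true
    inc(pos)

var p: int
echo "Nim".find('i', p), ": ", p

# Variant 2 (a separate program): tuple with success flag and position
proc find(s: string; c: char): tuple[succ: bool, pos: int] =
  var i = 0
  while i < s.len:
    if s[i] == c:
      return (true, i)
    inc(i)

echo "Nim".find('i')

# Variant 3 (a separate program): sequence of all positions, empty for no match
proc find(s: string; c: char): seq[int] =
  var i = 0
  while i < s.len:
    if s[i] == c:
      result.add(i)
    inc(i)

echo "Nim".find('i')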

For a more formalized way to indicate the absence of a meaningful `result`, many modern programming languages provide the concept of `Option` types, which are sometimes also called `Maybe` types. `Option` types can encapsulate an arbitrary data type and provide functions like `isSome()` or `isNone()` to test for the existence of a valid value, and functions like `get()` to extract the actual value from the `Option` type:

import std/options

proc find(s: string; c: char): Option[int] =
  var i = 0
  while i < s.len:
    if s[i] == c:
      return some(i)
    inc(i)

var res = "Nim".find('i')
if res.isSome:
  echo res.get

The `options` module of Nim’s standard library provides the generic `Option[]` data type along with functions like `some()`, `isSome()`, and `isNone()`. These functions allow creating a new `Option` type encapsulating some data and checking if data is present. In the code above, we use `some(i)` to wrap the integer value in the `Option` type when we have found a match. For no match, the `proc` `returns` the default empty `Option` type instance. When we use the `find()` function with the `Option[int]` result, we first have to call `isSome()` to check if valid data is available, and then call `get()` to retrieve the actual data.
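
The following sketch shows a few more of these functions in action, including `isNone()` and the `get()` variant that takes a fallback value. To avoid any confusion with other `find()` procedures, our function is called `findOpt()` here, a name invented for this example:

```nim
import std/options

proc findOpt(s: string; c: char): Option[int] =
  for i, ch in s:
    if ch == c:
      return some(i)
  # when we fall through, the default result is an empty Option

let hit = "Nim".findOpt('i')
let miss = "Nim".findOpt('x')

echo hit.isSome          # a value is present
echo miss.isNone         # no value is present
echo miss.get(-1)        # get() with a fallback for the empty case
echo hit == some(1)      # Option values can be compared directly
echo miss == none(int)   # none(int) creates an empty Option explicitly
```

Calling the plain `get()` on an empty `Option` raises an exception, so we should either test with `isSome()` first or provide a fallback value.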

Nim’s `Option` types are based on `objects`. The generic `Option[T]` type is an `object` with two fields, a boolean indicating the presence of data, and a field that can store the actual data. Nim also uses an optimization: When the data type is of `ref` or `pointer` type, then the `bool` field is not necessary, as the absence of data is equivalent to a data entity with `nil` value.

The overhead of `Option` types is not that big: a procedure which would return a 4-byte integer would return an `object` instead. The additional boolean field would increase the size of the `result` to 5 bytes, which the compiler generally extends to a multiple of the word size, that is, 8 bytes in total. So, in the worst-case scenario, we have a 100% size overhead. Moreover, the loss of performance due to the encapsulation of data in `Option` types should not be significant in most use cases.
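
We can ask the compiler for the actual sizes with `sizeof()`. The sketch below only prints the values; on a typical 64-bit system, we would expect 4 and 8 bytes for the first two lines, and equal sizes in the last line due to the optimization for `ref` types, but the exact numbers depend on the platform and compiler version:

```nim
import std/options

echo "int32:           ", sizeof(int32)
echo "Option[int32]:   ", sizeof(Option[int32])
echo "ref int:         ", sizeof(ref int)
echo "Option[ref int]: ", sizeof(Option[ref int])
```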

The `options` module provides some more procedures for the handling of `Option` types, but this short introduction should be enough to get you started. You can find an alternative implementation of a Nim Option-Type at https://github.com/arnetheduck/nim-results.

Serialization — storing data permanently on external storage

When you start writing larger programs, these may generate data that you might want to permanently store on external nonvolatile storage devices, such as SSDs or traditional hard disks of your computer. For textual data, this is very easy, as you only have to write and read a stream of unstructured bytes. However, when your program deals with `object` instances, container data types like sequences, or references, the process becomes more complicated. Writing the data is always easy — you can just convert all the fields of your `object` data type to `strings` and write them to a stream or a file. But the reading back part is much more difficult: You would have to read in the data as `strings`, and then process each `string` — maybe converting it to a `float` number — and then assign it to the matching field of an `object` instance.

When your data consists only of value `objects` and no references, then you may consider just writing that data in plain binary form to a file and reading it back. This strategy seems to be simple, and it is very fast, as no type-conversion steps are involved. But at the same time, it has some drawbacks: The stored data cannot be checked with tools like a text editor, it can generally not be used by other programs, and if you change the data types used in your program, you can no longer read back the stored files.
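
A minimal sketch of this binary strategy could look as follows. It assumes a plain value `object` without `strings`, sequences, or references, and the `proc` names `saveLines()` and `loadLines()` as well as the file name are invented for this example. The file content depends on the memory layout of the `object`, which is exactly the drawback described above:

```nim
type
  Line = object
    x1, y1, x2, y2: float

proc saveLines(path: string; lines: seq[Line]) =
  var buf = lines                      # mutable copy, so that we can take an address
  let f = open(path, fmWrite)
  defer: f.close()
  if buf.len > 0:
    let bytes = buf.len * sizeof(Line)
    if f.writeBuffer(addr buf[0], bytes) != bytes:
      raise newException(IOError, "could not write all data")

proc loadLines(path: string): seq[Line] =
  let f = open(path, fmRead)
  defer: f.close()
  # the file size tells us how many records were stored
  result = newSeq[Line](int(f.getFileSize()) div sizeof(Line))
  if result.len > 0:
    let bytes = result.len * sizeof(Line)
    if f.readBuffer(addr result[0], bytes) != bytes:
      raise newException(IOError, "could not read all data")

let original = @[Line(x1: 0, y1: 0, x2: 5, y2: 10),
                 Line(x1: 3, y1: 2, x2: 7, y2: 9)]
saveLines("lines.bin", original)
echo loadLines("lines.bin") == original
```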

So we will first explain how you can store Nim data types in a human-readable text format. Two popular text formats, JSON and YAML, are often used. JSON is a simple format that is easy to parse, but not very readable for humans. YAML is more complicated, but more flexible and very readable for humans. Other popular data formats are XML and TOML.

For Nim, many modules are already available that we can use for storing data in JSON or YAML format. The Nim standard library includes the `marshal` and the `json` module. Both store the data in JSON format, but the `marshal` module cannot split the distinct data fields into multiple lines, which seriously restricts human readability. So we will describe and use the `json` module in this section, which is also easy to use, but offers more functionality and can generate truly human-readable text files by use of the `pretty()` function.

Other available external packages for data serialization are the nim-serialization module set (https://github.com/status-im/nim-serialization) and the very powerful but complicated NimYaml implementation (https://nimyaml.org/). We may describe these packages in Part V of the book. An alternative to the `json` module of Nim’s standard library is the https://github.com/treeform/jsony package, which has the advantage that it can handle default values and missing object fields. Both features are useful when we extend our software and need to process old data files.

When we have to store and read back Nim data to nonvolatile storage media, we have some serious points to consider: First, we have to handle various data types like integers, `floats`, `strings`, `objects` — and even the container types like sequences. And we may have to support reference types and maybe also inherited types and containers filled with heterogeneous, subclassed reference `objects`. The `json` module supports all Nim data types, including containers and references, but not heterogeneous sequences.

For our first JSON example, let us assume that we have written a small tool that lets the user create some geometrical shapes, and we want to store the shapes in a file and read them back. For that, we generally use an intermediate step, which converts the data to a `string`, and the `string` back to the data `object`. The `string` is then written to a file or stream, and read back. Let’s start with the `string` conversion. Storing that `string` and reading it back from the file will be explained subsequently.

import std/json

type
  Line = object
    x1, y1, x2, y2: float

  Circ = ref object of RootRef
    x0, y0: float
    radius: float

  Data = object
    lines: seq[Line]
    circs: seq[Circ]

var
  l1 = Line(x1: 0, y1: 0, x2: 5, y2: 10)
  c1 = Circ(x0: 7, y0: 3, radius: 20)
  d1, d2: Data

d1.lines.add(l1)
d1.circs.add(c1)
d1.lines.add(Line(x1: 3, y1: 2, x2: 7, y2: 9))
d1.circs.add(Circ(x0: 9, y0: 7, radius: 2))

let str1 = pretty(%* d1) # convert the content of variable d1 to a string
echo str1 # let us see how the string looks
d2 = to(parseJson(str1), Data) # read the string back into a data instance
let str2 = pretty(%* d2) # and verify that we got back the original content
echo str2

# assert d1 == d2 would fail
assert str1 == str2

When we run the program, we would get this output:

{
  "lines": [
    {
      "x1": 0.0,
      "y1": 0.0,
      "x2": 5.0,
      "y2": 10.0
    },
    {
      "x1": 3.0,
      "y1": 2.0,
      "x2": 7.0,
      "y2": 9.0
    }
  ],
  "circs": [
    {
      "x0": 7.0,
      "y0": 3.0,
      "radius": 20.0
    },
    {
      "x0": 9.0,
      "y0": 7.0,
      "radius": 2.0
    }
  ]
}
{
  "lines": [
    {
      "x1": 0.0,
      "y1": 0.0,
      "x2": 5.0,
      "y2": 10.0
    },
    {
      "x1": 3.0,
      "y1": 2.0,
      "x2": 7.0,
      "y2": 9.0
    }
  ],
  "circs": [
    {
      "x0": 7.0,
      "y0": 3.0,
      "radius": 20.0
    },
    {
      "x0": 9.0,
      "y0": 7.0,
      "radius": 2.0
    }
  ]
}

As you can see, we converted the instance `d1` of type `Data` to a `string`, and then we converted that `string` back to the variable `d2`, achieving matching content. We have intentionally made `Circ` a `ref` `object` to demonstrate that the conversion works for both value and reference `objects`. In the example program, we applied the `%*` macro to our data instance `d1` to get a `JsonNode`, and finally used the `pretty()` function to get a nice multi-line `string`. To fill the variable `d2` with the content stored in `str1`, we first have to apply `parseJson()` on the `string`, and then use `to()` to unmarshal the JSON node into the matching `object` type.

Now, let us investigate what happens when we try to use the `json` module with a container with heterogeneous `ref` `objects`. For that, we subclass the `Circ` type, creating a new `Arc` type:

import std/json
from std/math import PI

type
  Line = object
    x1, y1, x2, y2: float

  Circ = ref object of RootRef
    x0, y0: float
    radius: float

  Arc = ref object of Circ
    startAngle, endAngle: float

  Data = object
    lines: seq[Line]
    circs: seq[Circ]

var
  d1, d2: Data

d1.lines.add(Line(x1: 0, y1: 0, x2: 5, y2: 10))
d1.circs.add(Circ(x0: 7, y0: 3, radius: 20))
d1.lines.add(Line(x1: 3, y1: 2, x2: 7, y2: 9))
d1.circs.add(Arc(x0: 9, y0: 7, radius: 2, startAngle: 0, endAngle: PI))

echo d1.circs[1] of Arc, " ", Arc(d1.circs[1]).endAngle

let str1 = pretty(%* d1)
d2 = to(parseJson(str1), Data)
let str2 = pretty(%* d2)
echo str2
echo d2.circs[1] of Arc

The output of that program looks like this:

true 3.141592653589793
{
  "lines": [
    {
      "x1": 0.0,
      "y1": 0.0,
      "x2": 5.0,
      "y2": 10.0
    },
    {
      "x1": 3.0,
      "y1": 2.0,
      "x2": 7.0,
      "y2": 9.0
    }
  ],
  "circs": [
    {
      "x0": 7.0,
      "y0": 3.0,
      "radius": 20.0
    },
    {
      "x0": 9.0,
      "y0": 7.0,
      "radius": 2.0
    }
  ]
}
false

While our initial instance `d1` contains a run-time value of `Arc` type, and so we can access the `endAngle` field, we get `false` as result for the `of Arc` test for the `d2` instance. So run-time type information is lost.

When we have to store different data types in one container, then one solution is to use `object` variants, which should work with the `json` module. Another obvious possibility is to just copy the data into containers with the appropriate static type before storing them in an external medium, and copy them back when we read the data back from external storage. We will show an example of that now:

import std/json
from std/math import PI

type
  Line = ref object of RootRef
    x1, y1, x2, y2: float

  Circ = ref object of RootRef
    x0, y0: float
    radius: float

  Arc = ref object of Circ
    startAngle, endAngle: float

  Data = object
    elements: seq[RootRef]

  Storage = object
    lines: seq[Line]
    circs: seq[Circ]
    arcs: seq[Arc]

const
  DataFileName = "MyJsonTest.json"

var
  d1, d2: Data
  storage1, storage2: Storage
  outFile, inFile: File

d1.elements.add(Line(x1: 0, y1: 0, x2: 5, y2: 10))
d1.elements.add(Circ(x0: 7, y0: 3, radius: 20))
d1.elements.add(Line(x1: 3, y1: 2, x2: 7, y2: 9))
d1.elements.add(Arc(x0: 9, y0: 7, radius: 2, startAngle: 0, endAngle: PI))

for el in d1.elements:
  if el of Arc:
    storage1.arcs.add(Arc(el))
  elif el of Circ:
    storage1.circs.add(Circ(el))
  elif el of Line:
    storage1.lines.add(Line(el))
  else:
    assert(false)

let str1 = pretty(%* storage1)

if not open(outFile, DataFileName, fmWrite):
  echo "Could not open file for storing data"
  quit()
outFile.write(str1)
outFile.close

if not open(inFile, DataFileName, fmRead):
  echo "Could not open file for recovering data"
  quit()
let str2 = inFile.readAll()
inFile.close

assert str1 == str2

storage2 = to(parseJson(str2), Storage)

for el in storage2.lines:
  d2.elements.add(el)
for el in storage2.circs:
  d2.elements.add(el)
for el in storage2.arcs:
  d2.elements.add(el)

for el in d2.elements:
  if el of Arc:
    echo "found arc with endAngle: ", Arc(el).endAngle

For this example program, we use the object-oriented programming style and keep all the geometric `object` instances as references in a single sequence. Note that doing this is not always a good idea, as this OOP style with the use of references and dynamic run-time dispatch can be slower due to many small heap allocations for each `ref` `object` and due to the overhead of the dynamic type tests (`if el of …`). Using multiple, homogeneous sequences with value types for each of our data types can be a better solution, and in that way, you have more control whenever you process the data, for example when drawing them on the screen or handling user interaction. Maybe you want to draw all the lines first? But there can be situations where we really need to have all the `objects` as references in a single container. A typical situation is that we use an `RTree` for fast `object` location. RTrees are data structures that can store two-dimensional or multidimensional geometric `objects` and their rectangular bounding boxes in a tree-like fashion for fast `object` location. This may be used in a drawing program so that the coordinates of a user’s mouse click can be quickly matched to an `object`. For such a use case, we would prefer having all the `object` instances available in a single `RTree` instead of using one `RTree` data structure for each `object` shape.

Our program defines an additional `Storage` data type, which contains a homogeneous sequence for each possible geometric shape. We then copy all our `ref` `objects` from the sequence of elements into the matching sequences of the storage `object`, using the dynamic `of` type query to select the matching sequence.

After that, we can use the already known JSON functions to serialize the storage `object` into a `string`, store the `string` in a file, read it back, and deserialize the data again into a different variable of `Storage` data type. Finally, we use simple `for` loops to copy the `ref` `objects` from the temporary storage `object` into a `Data` variable called `d2`. For storing the data on an external nonvolatile medium, we use the `File` data type and the related functions `open()`, `close()`, `write()`, and `readAll()`. Their use should be obvious: We pass an uninitialized variable of `File` data type, a file name, and a file mode to `open()`, use `write()` to write the whole `string`, and use `readAll()` to read the data back. When done with each file, we use `close()` to close the file. The `File` data type is part of the `io` module, which in turn is part of the `system` module, so we don’t have to import these modules. As an alternative, we could have used the `streams` module. You will learn some more details about the `File` data type and the `streams` module in later sections of the book.

We should mention that, unfortunately, life is not always that easy, as sometimes we cannot freely select the textual output format. Imagine you are creating a CAD (computer-aided design) tool that needs to be compatible with another existing tool. In this case, the textual storage format is already defined by the existing tool, and generally, that format does not match the JSON or YAML file format. Even if the format were one of these, matching it exactly would be difficult. While writing out our own data in that foreign format is still not really difficult, as we can just write single matching `strings`, reading in the textual data is more complicated: Typically, we would read the input file line by line, and we would have to inspect and interpret each input `string`, maybe by the use of regular expressions or a custom parser. That generally includes handling missing or invalid data.
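
To make the last point more concrete, here is a minimal sketch of how a single line of such a foreign format might be interpreted. The record layout `LINE x1 y1 x2 y2`, the type `ParsedLine`, and the proc `parseLineRecord()` are assumptions made up for this illustration and do not belong to any real CAD tool. Missing or invalid data is reported by a boolean result instead of an exception:

```nim
import std/strutils

type
  ParsedLine = object # hypothetical record type, only for this sketch
    x1, y1, x2, y2: float

proc parseLineRecord(s: string; res: var ParsedLine): bool =
  ## Interprets a hypothetical record of the form "LINE x1 y1 x2 y2".
  ## Returns false for missing or invalid data.
  let fields = s.splitWhitespace()
  if fields.len != 5 or fields[0] != "LINE":
    return false # wrong keyword or wrong number of fields
  try:
    res = ParsedLine(x1: parseFloat(fields[1]), y1: parseFloat(fields[2]),
                     x2: parseFloat(fields[3]), y2: parseFloat(fields[4]))
    result = true
  except ValueError: # a field is not a valid number
    result = false

var rec: ParsedLine
for input in ["LINE 0 0 10.5 20", "LINE 1 2 3", "LINE a b c d", "CIRC 0 0 5"]:
  if parseLineRecord(input, rec):
    echo "parsed: ", rec
  else:
    echo "skipped invalid record: ", input
```

A real importer would probably also report the line number of each rejected record, so that the user can repair the input file.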

Streams and files

In the previous section, we learned how to store structured data, like a sequence of `objects`, in a human-readable form on nonvolatile media by use of the `json` module.

Text in the form of a single `string`, or in the form of a container holding multiple `strings`, constitutes a kind of unstructured data that we can write directly to nonvolatile storage media and read back later. We can do the same with containers of basic, unstructured data types like integer or floating-point numbers, and with some restrictions, we can even write `tuples` or `objects` directly as raw bits and bytes to external storage and read them back later. Of course, in this manner, the stored data becomes a binary blob, which cannot be read or modified by other tools, such as a text editor. But access by other tools may not be intended or needed at all: Perhaps we are conducting scientific data processing with a single tool and simply want to temporarily store the data to continue processing it later.

Files

For storing unstructured data, Nim provides the `io` module with the `File` data type and related procedures, and the `streams` module with the `Stream` data type and related procedures. While a `File` in Nim is currently only a `pointer` to a C file, the `streams` module operates at a higher abstraction level. Although the Nim language does not directly support interfaces, the `Stream` data type of the `streams` module is some form of an interface, which is implemented by a `StringStream` and a `FileStream` data type. Internally, this interface concept is realized by storing a set of function `pointers` in the `Stream` instance.

When we have to store unstructured data like text, it is not always clear whether we should use `Files` or `Streams`. `Streams` may be the better choice when we (also) want to use a `string` as a data source in the same way as a file, or when we need the `peek()` functions of the `streams` module to access data without advancing the position in the stream.
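
As a small preview, the following sketch uses a `StringStream`, so that a plain `string` serves as the data source, together with `peekLine()` to look at data without consuming it. The proc name `peekThenRead()` is only chosen for this illustration:

```nim
import std/streams

proc peekThenRead(text: string): tuple[peeked, read: string] =
  ## Peeks at the first line of `text`, then reads that line for real.
  var s = newStringStream(text) # a string used as data source, like a file
  defer: s.close()
  result.peeked = s.peekLine() # does not advance the stream position
  result.read = s.readLine() # returns the same line and advances

let (p, r) = peekThenRead("first line\nsecond line\n")
echo p
echo r
```

Both `echo` statements print the same text, because the peek operation leaves the stream position unchanged.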

We will use the `File` data type of the `io` module first. As the `io` module is part of the `system` module, we do not have to import it before we can use it. The basic usage of files is that we call the function `open()` to open a file with the given name, call some procedures to write or read data, and finally `close()` the file. Although Nim supports destructors when we compile with --mm:arc or --mm:orc, the `io` module does not yet use them, so we have to call `close()` ourselves to close the file.

Historically, a file is a one-dimensional data type, which is accessed in sequential order. Up to the end of the twentieth century, it was not uncommon for large files to be stored on magnetic tapes, which could be read or written only slowly in sequential order. Read or write operations could take place only at the current position, and available functions like `f.setFilePos()` were very slow as they involved moving the tape. The introduction of hard disks and solid-state disks removed this restriction, and modern operating systems often buffer files in RAM for longer time periods, so that files may actually have similar performance to `arrays` or sequences. Interestingly, with modern CPU caches, ordinary RAM storage can appear similarly slow and sequential compared to the extremely fast cache, much like magnetic tapes in the past.

from std/os import fileExists
proc main =
  const FN = "NoImportantData"
  if os.fileExists(FN):
    echo "File exists, we may overwrite important data"
    quit()
  var f: File = open(FN, fmWrite)
  f.write("Hello ")
  f.writeLine("World!")
  f.writeLine(3.1415)
  f.close
main()

Running that program will create a text file with this content in the current working directory:

Hello World!
3.1415

At the start of our `main()` proc, we check if a file with that name already exists in the current working directory by using the function `os.fileExists()` to ensure that we do not overwrite important data.

Module `io` provides multiple overloaded `open()` procedures. Here we use a variant that returns a file and raises an exception in the unlikely case of an error. The necessary parameters are a file name and a file mode. As we want to create a new file, mode `fmWrite` is used.

Note that `fmWrite` would clear the content of an existing file, so we cannot use `fmWrite` to append data to an existing file. We would have to use `fmReadWriteExisting` or `fmAppend` to append data to an already existing file. As this `open()` `proc` can `raise` an exception, it may make sense to enclose it in a try/except block, or we could use an `open()` variant which returns a boolean value to indicate success instead. When the file is successfully opened, we can use procedures like `write()` or `writeLine()` to write text `strings` to the file. Both `procs` accept multiple arguments and apply the stringify operator `$` to them before writing the content. `writeLine()` writes a '\n' after the last argument to start a new line. When done, we call `close()` to close the file. The operating system will close the file for us when our program terminates, so calling `close()` may not seem important. However, if we open many files without closing them, we may eventually encounter errors from the operating system about too many open files, causing our program to fail or terminate.
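
The following sketch combines two of these points: it uses the `open()` variant that returns a boolean value, and it opens the file in `fmAppend` mode so that existing content is kept. The helper `appendLine()` and the file name are assumptions for this example, and the file is placed in the temporary directory so that no important data is touched:

```nim
import std/os

proc appendLine(path, text: string): bool =
  ## Appends `text` as a new line; returns false if the file can't be opened.
  var f: File
  if not open(f, path, fmAppend): # variant with bool result, no exception
    return false
  f.writeLine(text)
  f.close()
  result = true

let path = getTempDir() / "nim_append_demo.txt" # hypothetical file name
writeFile(path, "first\n") # create the file with some initial content
if appendLine(path, "second"):
  echo readFile(path) # both lines are still there
else:
  echo "Could not open file for appending"
removeFile(path)
```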

The `close()` `proc` does not receive the file as a `var` parameter, so it cannot set the file value to `nil`. When the file has the value `nil`, the `close()` call is ignored, but when we call `close()` multiple times with a non-`nil` argument, we get a program crash. We may use the `try/finally` or the `defer` construct to ensure that we really close the file when done.
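
One way to guarantee exactly one `close()` call per opened file is Nim’s `defer` statement, which executes its body when the enclosing scope is left, also in the case of an exception. The proc `countLines()` and the file name below are only assumptions for this sketch:

```nim
import std/os

proc countLines(path: string): int =
  ## Counts the lines of a text file and always closes the file.
  let f = open(path, fmRead) # raises an exception if the file can't be opened
  defer: f.close() # executed when the proc is left, even on an exception
  for _ in f.lines:
    inc result

let demoPath = getTempDir() / "nim_defer_demo.txt" # hypothetical file name
writeFile(demoPath, "a\nb\nc\n")
echo countLines(demoPath)
removeFile(demoPath)
```

As the `defer` statement is placed after the `open()` call, `close()` is only executed when the file was actually opened.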

The `io` module provides some procedures like `writeBuffer()`, `writeBytes()`, or `writeChars()`, which return the actual number of bytes written. This return value should generally match the requested number of bytes to write but can be smaller when the write operation fully or partially failed, e.g. because the storage medium had no capacity left.

When performance really matters, we should note that passing non-string arguments to the `write()` or `writeLine()` `procs` using their optional auto-stringify feature involves the allocation of new `strings` and incurs some performance cost. When we already have a `string` variable available in our program, it can be faster to first convert our data into that variable and then pass it to the `write()` or `writeLine()` `procs`.

Reading `strings` from a file works very similarly:

proc main =
  var f: File
  try:
    f = open("NoImportantData", fmRead)
    echo f.readLine
    echo f.readLine
  finally:
    if f != nil: # test for nil not really necessary, close() would ignore the call for f == nil
      f.close
main()

The `readLine()` procedure reads in a line of text. The `LF`, `CR`, or `CRLF` line end markers are not part of the returned text `string`. Of course, we may get back an empty `string` with length zero when we read a line that immediately starts with `LF`, `CR`, or `CRLF`, or we may get back a `string` with no visible characters but only a few spaces or tabulator characters '\t' when a line contains only white space. When our read operations have moved the current file I/O position to the end of the file, and we try to read more content, an `EOFError` exception is raised.

The `io` module provides a `readLine()` procedure that returns a newly allocated `string`, and another one that takes an existing `string` as a `var` parameter. The latter should be a bit faster, as it can avoid the allocation of a new buffer when the passed `string` has already enough capacity.

The `io` module provides a function called `endOfFile()` with a boolean result, which we can use to check if the end-of-file position is already reached. The provided functions `readBuffer()`, `readBytes()`, or `readChars()` return the actual number of bytes read, which can be smaller than the requested value when the end of the file is reached earlier. Currently, `readChars()` checks if the passed `openArray[char]` has enough capacity for the request, but `readBytes()` does no check!
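
A minimal sketch of how `endOfFile()` and the `readLine()` variant with a `var` `string` parameter can work together is shown below. The proc `longestLine()` and the file name are assumptions for this example. A single buffer is reused for all lines:

```nim
import std/os

proc longestLine(path: string): string =
  ## Returns the longest line of a text file.
  let f = open(path, fmRead)
  defer: f.close()
  var buf: string # reused for every line to avoid new allocations
  while not f.endOfFile:
    if not f.readLine(buf): # false when nothing more could be read
      break
    if buf.len > result.len:
      result = buf

let linesPath = getTempDir() / "nim_eof_demo.txt" # hypothetical file name
writeFile(linesPath, "ab\nabcd\nabc\n")
echo longestLine(linesPath)
removeFile(linesPath)
```

Since this `readLine()` variant already reports the end of the file by its boolean result, the additional `endOfFile()` test is not strictly necessary here. It is included only to show both checks.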

We can also use the `lines()` `iterator` to iterate over the lines of a text file or use the `readLines()` procedure to read the content line by line.

proc main =
  var f: File
  f = open("NoImportantData", fmRead)
  for str in f.lines: # iterator
    echo str
  f.setFilePos(0) # read again from start index 0
  var s: string
  while f.readLine(s): # proc
    echo s
  f.close
  var sq = readLines("NoImportantData", 2) # read lines to seq of strings
  echo sq
main()

As iterating over the complete file line by line moves the current file position to the end of the file, we need to call `setFilePos()` to return to the start position. The `readLines()` procedure takes a filename and the number of lines to be read as parameters, and returns a `seq` of `strings`. When the file does not contain at least the number of requested lines, an `EOFError` exception is raised. Another provided procedure is `readAll()`, which reads the entire file content into a returned `string` variable. For `readAll()` to work, the current file position has to be the start of the file. In case of an error, an exception is raised.

We can also write and read binary data directly to a file, without converting it to (human-readable) `strings` first:

proc main =
  var f: File
  f = open("NoImportantData", fmWrite)
  var i: int = 123
  var x: float = 3.1415
  assert f.writeBuffer(addr(x), sizeof(x)) == sizeof(x)
  assert f.writeBuffer(addr(i), sizeof(i)) == sizeof(i)
  f.close
  f = open("NoImportantData", fmRead)
  assert f.readBuffer(addr(x), sizeof(x)) == sizeof(x)
  assert f.readBuffer(addr(i), sizeof(i)) == sizeof(i)
  f.close
  echo i, " ", x
main()

Of course, these are low-level, dangerous operations. While `writeBuffer()` should never crash our program, `readBuffer()` can do that easily when we specify wrong sizes or destination addresses, as that may overwrite other data unintentionally. So we would generally not use these procedures directly but write safer helper `procs`, when we really need or want this form of binary file access. A potential use case may be quickly storing big data sets with limited hardware resources. For example, storing a `float32` only requires 4 bytes on the storage medium, and file I/O is fast. However, the same number, when represented as human-readable digits, may require more than 8 bytes (1.234567E3), and the process of converting to a `string` and parsing it back can be time-consuming.
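
Such helper `procs` might look like the following sketch. The names `writeValue()` and `readValue()` are assumptions for this example and not part of the `io` module. The generic parameter restricts the helpers to plain numeric types, the size is always derived from the type itself, and short reads or writes raise an exception instead of being ignored:

```nim
import std/os

proc writeValue[T: SomeNumber](f: File; v: T) =
  ## Writes the raw bytes of a single number.
  var tmp = v # we need a variable to take the address
  if f.writeBuffer(addr tmp, sizeof(T)) != sizeof(T):
    raise newException(IOError, "could not write all bytes")

proc readValue[T: SomeNumber](f: File; v: var T) =
  ## Reads the raw bytes of a single number; the size always matches `v`.
  if f.readBuffer(addr v, sizeof(T)) != sizeof(T):
    raise newException(IOError, "could not read all bytes")

let binPath = getTempDir() / "nim_binary_demo.bin" # hypothetical file name
var f = open(binPath, fmWrite)
f.writeValue(3.1415'f32)
f.writeValue(123'i64)
f.close()

var x: float32
var i: int64
f = open(binPath, fmRead)
f.readValue(x) # values must be read in the order they were written
f.readValue(i)
f.close()
removeFile(binPath)
echo x, " ", i
```

With these helpers, wrong sizes or destination addresses are no longer possible, but we still have to read the values with the same types and in the same order as they were written.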

In the same way, we can use `writeBuffer()` and `readBuffer()` to store `tuples`, `objects`, `arrays`, or sequences of these directly in binary form:

type
  O = object
    x: float
    i: int
    b: bool

proc main =
  var s: seq[O]
  s.add(O(x: 3.1415, i: 12, b: true))
  var f: File
  f = open("NoImportantData", fmWrite)
  assert f.writeBuffer(addr(s[0]), sizeof(O) * s.len) == sizeof(O) * s.len
  f.close
  f = open("NoImportantData", fmRead)
  var s2 = newSeq[O](1)
  assert f.readBuffer(addr(s2[0]), sizeof(O) * s2.len) == sizeof(O) * s2.len
  f.close
  echo s2[0]
main()

The output should look like this:

(x: 3.1415, i: 12, b: true)

But of course, this is dangerous and fragile. We present this example because beginners often inquire about it and may want to try it at least once. Obviously, this can only work when the `tuples` or `objects` contain only plain data types; that is, no `strings`, no references, and certainly no other nested container types like sequences or tables. And reading back data may fail when we use a different OS or a different compiler version.

The `io` module provides the `File` variables `stdin`, `stdout`, and `stderr`, which are the standard input, output, and error streams. Sometimes we use `stdout.write()` instead of the common `