Assembly school: operating system development. Flight of the Hummingbird. What an OS written entirely in assembly language is capable of. Assembly and compiling

I recently decided to learn assembler, but I wasn’t interested in writing lines of code for nothing. I thought that while studying assembler I should also master some subject area, so my choice fell on writing a bootloader. The result of my findings is here on this blog.

I would like to say right away that I love theory combined with practice, so let’s start.

First I'll show you how to create a simple MBR so that we can enjoy the result as soon as possible. As the practical examples get more complex, I will add the theory behind them.

First, let's make a bootloader for a USB flash drive!

Attention! Our first assembler program will work both for a flash drive and for other devices such as a floppy disk or HDD. Later, so that all the examples work correctly, I will add a number of clarifications about how the code behaves on different devices.

We will write in FASM, since it is considered the best assembler for writing loaders such as an MBR. The second reason for choosing FASM is that it makes assembling files very easy: no directives, no command line, and none of the other nonsense that can completely discourage you from learning assembler and achieving your goals. So, at the initial stage we will need two programs (FASM and the DMDE disk editor, both used below) and some spare flash drive of minimal size. I dug up a 1 GB one (it formats quickly, and it’s no great loss if anything goes wrong). After our bootloader is written to it, the flash drive will no longer function normally. My Windows 7 refuses to format such a flash drive. To bring the flash drive back to life, I recommend the HP USB Disk Storage Format Tool (HPUSBFW.EXE) or another utility for formatting flash drives.

We install them and throw the corresponding shortcuts onto the desktop or wherever you like.

Preparation is complete, so let's move on to action.

Open Fasmw.exe and type in the following. We will sketch out the bare minimum of code needed to see a result, and analyze what is written here later. For now my comments are brief.

FASM code: ============= boot.asm ==============

org 7C00h ; label addresses are calculated relative to this load address
use16 ; generate 16-bit code
cli ; disable interrupts while the segment registers and stack are changed
mov ax, 0
mov es, ax ; int 10h function 13h reads the string from ES:BP
mov ds, ax
mov ss, ax
mov sp, 7C00h
sti ; enable interrupts (after changing addresses)
mov ax, 0003h ; set the video mode for displaying a line on the screen
int 10h
mov ax, 1301h ; the actual output of the string: function 13h of int 10h (more details later)
mov bp, stroka ; address of the output string
mov dx, 0000h ; row and column in which the text is displayed
mov cx, 15 ; number of characters in the output string
mov bx, 000eh ; 00 - video page number (better not to touch), 0e - character attributes (color, background)
int 10h
jmp $ ; tread water (loops the program at this point)
stroka db "Ok, MBR loaded!"
times 510 - ($ - $$) db 0 ; fill the space between the previous byte and byte 510 with zeros
db 0x55, 0xAA ; last two bytes

Compile this code (Ctrl+F9) in FASM and save the resulting binary file as boot.bin in some convenient place. Before writing our binary to a flash drive, a little theory.

When you plug a flash drive into the computer, it is not at all obvious to the BIOS that you want to boot from it, so in the BIOS settings you need to select the device to boot from. So we choose to boot from USB (you will have to figure out how to do this yourself, since BIOS interfaces vary; you can google the BIOS settings for your motherboard. As a rule, there is nothing complicated there).

Now that the BIOS knows that you want to boot from a flash drive, it must make sure that sector zero on the flash drive is bootable. To do this, the BIOS checks the last two bytes of sector zero, and only if they are equal to 0x55 0xAA does it load the sector into RAM. Otherwise, the BIOS will simply bypass your flash drive. Having found these two magic bytes, it loads sector zero into RAM at address 0000:7C00h, then forgets about the flash drive and transfers control to this address. Now all power over the computer belongs to your bootloader, and, running from RAM, it can load additional code from the flash drive. Now we will see how this very sector looks in the DMDE program.
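
To make the signature check concrete, here is a small Python sketch of the test described above. It is only my illustration of the rule, not BIOS code; the name is_bootable is made up for this example.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"  # byte 510 is 0x55, byte 511 is 0xAA


def is_bootable(sector: bytes) -> bool:
    """Return True if a 512-byte sector ends with the 0x55 0xAA signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# A sector of zeros is not bootable; the same sector with the signature is.
blank = bytes(SECTOR_SIZE)
signed = blank[:510] + BOOT_SIGNATURE
print(is_bootable(blank), is_bootable(signed))
```

The same function can be pointed at the first 512 bytes of boot.bin to confirm that the assembler really placed the two bytes at the end of the sector.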

1. Insert your flash drive into the computer and make sure that it does not contain any information you need.

2. Open the DMDE program. Read all further steps in the pictures:

After watching this comic, you will have the skill of loading your MBR onto a flash drive. And this is what the long-awaited result of our loader looks like:


By the way, if we talk about the minimum bootloader code, it might look like this:

org 7C00h
jmp $
db 508 dup (0)
db 0x55, 0xAA

Such a bootloader, having received control, simply hangs the computer by executing one meaningless jmp $ command in a loop. I call it treading water.
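
The byte arithmetic of this minimal loader can be checked with a short Python sketch. It assumes that jmp $ is encoded as the two-byte short jump EB FE (a jump to itself), which is what gives 2 + 508 + 2 = 512 bytes; the function name is hypothetical.

```python
def minimal_boot_sector() -> bytes:
    """Build the same 512 bytes as the four-line listing above."""
    code = b"\xEB\xFE"        # jmp $ : short jump back to its own address
    padding = bytes(508)      # db 508 dup (0)
    signature = b"\x55\xAA"   # db 0x55, 0xAA
    return code + padding + signature


sector = minimal_boot_sector()
print(len(sector))
```

If a longer instruction sequence replaces jmp $, the padding has to shrink by the same number of bytes, which is exactly what the times 510 - ($ - $$) db 0 line in the first listing calculates automatically.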

I posted a video on YouTube that might help you:

Finally, a few brief facts about the bootloader:

1. The bootloader, also known as the boot loader, also known as the MBR, has a size of 512 bytes. Historically, this condition must be met to support older media and devices.
2. The bootloader is always located in sector zero of a flash drive, floppy disk, or hard drive, from the point of view of the DMDE program or other hex editors that can work with devices. To load a binary (our boot.bin) onto one of the listed devices, we do not need to think about their internal physical structure. The DMDE program simply knows how to read the sectors on these devices and displays them in LBA mode (it simply numbers them from 0 to the last sector). You can read about LBA
3. The bootloader must always end with the two bytes 0x55 0xAA.
4. The bootloader is always loaded into memory at address 0000:7C00h.
5. The operating system begins with the bootloader.


Original: AsmSchool: Make an operating system
Author: Mike Saunders
Published date: April 15, 2016
Translation: A. Panin
Translation date: April 16, 2016

Part 4: With the skills you've gained from reading the previous articles in this series, you can start developing your own operating system!

What is it for?

  • To understand how compilers work.
  • To understand CPU instructions.
  • To optimize your code for performance.

Over the course of several months, we have traveled a difficult path, which began with the development of simple programs in assembly language for Linux and ended, in the last article in the series, with the development of self-contained code that runs on a personal computer without an operating system. Well, now we will try to bring all this information together and create a real operating system. Yes, we will follow in the footsteps of Linus Torvalds, but first we need to answer the following questions: “What is an operating system? Which of its functions will we have to recreate?”

In this article, we will focus only on the basic functions of the operating system: loading and executing programs. Complex operating systems perform many more functions, such as virtual memory management and processing network packets, but their correct implementation requires years of continuous work, so in this article we will consider only the basic functions present in any operating system. Last month we developed a small program that fit into the 512-byte sector of a floppy disk (its first sector), and now we will modify it a little to add the function of loading additional data from the disk.

Boot Loader Development

We could try to reduce the size of our operating system binary code as much as possible to fit it into the first 512-byte sector of the floppy disk, the one that is loaded by the BIOS, but in this case we would not be able to implement any interesting functions. Therefore, we will use these 512 bytes to house the binary code of a simple system boot loader, which will load the OS kernel binary code into RAM and execute it. (After this, we will develop the OS kernel itself, which will load the binary code of other programs from disk and also execute it, but we will talk about this a little later.)

You can download the source code for the examples discussed in this article at www.linuxvoice.com/code/lv015/asmschool.zip. And this is the code of our system boot loader from a file called boot.asm:

BITS 16

jmp short start ; Jump to label, skipping the disk description
nop ; Padding before the disk description

%include "bpb.asm"

start:
mov ax, 07C0h ; Load address
mov ds, ax ; Data segment

mov ax, 9000h ; Stack preparation
mov ss, ax
mov sp, 0FFFFh ; The stack grows down!

cld ; Clear the direction flag

mov si, kern_filename
call load_file

jmp 2000h:0000h ; Jump to the OS kernel binary code loaded from the file

kern_filename db "MYKERNELBIN"

%include "disk.asm"

times 510-($-$$) db 0 ; Pad the binary code with zeros up to 510 bytes
dw 0AA55h ; Boot loader binary end marker

buffer: ; Start of buffer for disk contents

In this code, the first CPU instruction is the jmp instruction, which is located after the BITS directive that tells the NASM assembler that 16-bit mode is being used. As you probably remember from the previous article in the series, execution of the 512-byte binary code loaded by the BIOS from disk starts from the very beginning, but we have to jump to a label to skip a special set of data. Last month we simply wrote the code to the beginning of the disk (using the dd utility) and left the rest of the disk space empty.

Now we will have to use a floppy disk with a suitable MS-DOS file system (FAT12), and in order to work correctly with this file system, we need to add a set of special data near the beginning of the sector. This set is called a BIOS Parameter Block (BPB) and contains data such as disk label, number of sectors, and so on. It should not interest us at this stage, since more than one series of articles could be devoted to such topics, which is why we have placed all the instructions and data associated with it in a separate source code file called bpb.asm.

Based on the above, this directive from our code is extremely important:

%include "bpb.asm"

This is a NASM directive that allows the contents of a specified source file to be included in the current source file during assembly. This way we can make our bootloader code as short and understandable as possible by placing all the details of the implementation of the BIOS parameter block in a separate file. The BIOS parameter block must be located three bytes after the start of the sector, and since the jmp instruction takes up only two bytes, we have to use the nop instruction (its name stands for "no operation" - this is an instruction that does nothing but waste CPU cycles) to fill the remaining byte.

Working with the stack

Next we use instructions similar to those discussed in the last article to prepare the registers and the stack, as well as the cld (stands for "clear direction") instruction, which clears the direction flag so that string instructions such as lodsb increment the value in the SI register when executed rather than decrement it.

After that, we put the address of the string into the SI register and call our load_file function. But think about it for a minute - we haven't developed this feature yet! Yes, this is true, but its implementation can be found in another source code file we included called disk.asm.

The FAT12 file system used on floppy disks that are formatted in MS-DOS is one of the simplest file systems available, but working with its contents still requires a considerable amount of code. The load_file subroutine is about 200 lines long and will not be shown in this article: we are considering the process of developing an operating system, not a driver for a specific file system, so it would not be very wise to spend magazine pages on it. In general, we included the disk.asm source code file almost at the end of the current source code file and can forget about it. (If you are still interested in the structure of the FAT12 file system, you can read the excellent overview at http://tinyurl.com/fat12spec, and then look at the disk.asm source code file - the code contained in it is well commented.)

In any case, the load_file routine loads the binary code from the file whose name is pointed to by the SI register into segment 2000h at offset 0, after which we jump to the beginning of it for execution. And that's all - the operating system kernel is loaded and the system bootloader has completed its task!

You may have noticed that our code uses MYKERNELBIN instead of MYKERNEL.BIN as the operating system kernel file name, which fits well into the 8+3 naming scheme used on floppy disks in DOS. In fact, the FAT12 file system uses an internal representation of file names, and we save space by using a file name that is guaranteed not to require our load_file routine to implement a mechanism to look up the dot character and convert the file name to the internal representation of the file system.
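
To show what that internal representation looks like, here is a rough Python sketch of the conversion that the bootloader avoids by hard-coding MYKERNELBIN. It assumes the usual FAT directory layout of an 8-character base name and a 3-character extension, both uppercase and space-padded, with no dot stored; to_fat_name is a hypothetical helper, not something from disk.asm.

```python
def to_fat_name(filename: str) -> bytes:
    """Convert a name like 'mykernel.bin' to the 11-byte form b'MYKERNELBIN'."""
    base, _, ext = filename.upper().partition(".")
    if not base or len(base) > 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"{filename!r} does not fit the 8+3 scheme")
    # Both parts are padded with spaces; the dot itself is never stored.
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


print(to_fat_name("mykernel.bin"))
print(to_fat_name("test.bin"))
```

Because MYKERNEL is exactly eight characters long, no padding is needed and the two parts simply run together, which is why the string in boot.asm has no dot and no spaces.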

After the line with the directive for connecting the disk.asm source code file, there are two lines designed to pad the binary code of the system boot loader with zeros up to 512 bytes and include the end mark of its binary code (this was discussed in the previous article). Finally, at the very end of the code is the "buffer" label, which is used by the load_file routine. Basically, the load_file routine needs free space in RAM to do some intermediate work while searching for a file on disk, and we have plenty of free space after loading the boot loader, so we place the buffer here.

To assemble the system boot loader, use the following command:

nasm -f bin -o boot.bin boot.asm

Now we need to create a virtual floppy disk image in MS-DOS format and add our bootloader binary code to its first 512 bytes using the following commands:

mkdosfs -C floppy.img 1440
dd conv=notrunc if=boot.bin of=floppy.img
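
For readers without dd at hand, the effect of the second command can be sketched in Python: overwrite the first 512 bytes of the image in place and leave the rest of the file untouched, which is what conv=notrunc asks for. This is only an illustration under that assumption; install_boot_sector is a hypothetical name.

```python
import os
import tempfile

SECTOR_SIZE = 512


def install_boot_sector(image_path: str, boot_path: str) -> None:
    """Copy a 512-byte boot sector over the start of an existing disk image."""
    with open(boot_path, "rb") as f:
        boot = f.read()
    if len(boot) != SECTOR_SIZE:
        raise ValueError(f"boot sector must be {SECTOR_SIZE} bytes, got {len(boot)}")
    # "r+b" opens the image for in-place update, so it is not truncated.
    with open(image_path, "r+b") as img:
        img.seek(0)
        img.write(boot)


if __name__ == "__main__":
    # Demonstration on throwaway files: a blank 1440 KB image and a dummy sector.
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "floppy.img")
        boot = os.path.join(tmp, "boot.bin")
        with open(image, "wb") as f:
            f.write(bytes(1440 * 1024))
        with open(boot, "wb") as f:
            f.write(b"\xEB\xFE" + bytes(508) + b"\x55\xAA")
        install_boot_sector(image, boot)
        print(os.path.getsize(image))
```

The image keeps its full size after the copy, so the FAT12 structures that follow the first sector stay where mkdosfs put them.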

At this point, the process of developing a system boot loader can be considered complete! We now have a bootable floppy disk image that allows us to load the operating system kernel binary code from a file called mykernel.bin and execute it. Next, a more interesting part of the work awaits us - the development of the operating system kernel itself.

Operating system kernel

We want our operating system kernel to perform many important tasks: display a greeting message, accept input from the user, determine whether the input is a supported command, and execute programs from disk when the user specifies their names. This is the operating system kernel code from the mykernel.asm file:

mov ax, 2000h
mov ds, ax
mov es, ax

loop:
mov si, prompt
call lib_print_string

mov si, user_input
call lib_input_string

cmp byte [si], 0
je loop

cmp word [si], "ls"
je list_files

mov ax, si
mov cx, 32768
call lib_load_file
jc load_fail

call 32768
jmp loop

load_fail:
mov si, load_fail_msg
call lib_print_string
jmp loop

list_files:
mov si, file_list
call lib_get_file_list
call lib_print_string
jmp loop

prompt db 13, 10, "MyOS > ", 0
load_fail_msg db 13, 10, "Not found! ", 0
user_input times 256 db 0
file_list times 1024 db 0

%include "lib.asm"

Before looking at the code, you should pay attention to the last line with the directive to include the lib.asm source code file, which is also located in the asmschool.zip archive from our website. This is a library of useful routines for working with the screen, keyboard, strings and disks that you can also use - in this case we include this source code file at the very end of the main source code file of the operating system kernel in order to make the latter as compact and beautiful as possible. Refer to the "lib.asm library routines" section for additional information about all available subroutines.

In the first three lines of operating system kernel code, we fill the segment registers with data to point to segment 2000h, into which the binary code was loaded. This is important to guarantee the correct operation of instructions such as lodsb, which must read data from the current segment and not from any other. After this, we will not perform any additional operations on the segments; our operating system will work with 64 KB of RAM!

Further in the code there is a label corresponding to the beginning of the loop. First of all, we use one of the routines from the lib.asm library, namely lib_print_string, to print the prompt. Bytes 13 and 10 before the prompt text are the carriage return and line feed characters, thanks to which the prompt is not displayed immediately after the output of any program, but always on a new line.

After this, we use another routine from the lib.asm library called lib_input_string, which takes the user's keyboard input and stores it in a buffer pointed to in the SI register. In our case, the buffer is declared near the end of the operating system kernel code as follows:

user_input times 256 db 0

This declaration allows you to create a buffer of 256 characters, filled with zeros - its length should be enough to store commands for a simple operating system like ours!

Next we validate the user input. If the first byte of the user_input buffer is zero, then the user simply pressed the Enter key without entering any command; don't forget that all strings end with null characters. So in this case we should just go to the beginning of the loop and print the prompt again. However, if the user enters any command, we first have to check whether it is the ls command. Until now, you could only observe comparisons of individual bytes in our assembly language programs, but do not forget that it is also possible to compare two-byte values, or machine words. In this code, we compare the first machine word from the user_input buffer with the machine word corresponding to the string ls and, if they are identical, jump to the code block below. Within this block of code, we use another routine from lib.asm to get a comma-separated list of files on disk (which is stored in the file_list buffer), print that list to the screen, and jump back into the loop to process user input.
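
The word comparison can be modelled in a few lines of Python. This is a sketch that assumes the x86 little-endian byte order, so the two characters "ls" form the 16-bit value 736Ch; first_word and LS_WORD are names invented for the illustration.

```python
import struct


def first_word(buffer: bytes) -> int:
    """Read the first two bytes as a little-endian 16-bit word, like word [si]."""
    return struct.unpack_from("<H", buffer)[0]


# 'l' (6Ch) lands in the low byte and 's' (73h) in the high byte.
LS_WORD = struct.unpack("<H", b"ls")[0]

user_input = b"ls" + bytes(254)  # a zero-filled 256-byte buffer holding "ls"
print(hex(LS_WORD), first_word(user_input) == LS_WORD)
```

One consequence worth keeping in mind: only the first two characters take part in the comparison, so in this model any input that merely starts with ls matches as well.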

Execution of third party programs

If the user doesn't enter the ls command, we assume they entered the name of the program from disk, so it makes sense to try to load it. Our lib.asm library contains an implementation of the useful lib_load_file routine, which parses the FAT12 disk file system tables: it takes a pointer to the beginning of a line with the file name through the AX register, as well as an offset value for loading binary code from a program file through the CX register. We already use the SI register to store a pointer to the string containing user input, so we copy this pointer to the AX register, and then place the value 32768, which is used as an offset to load the binary code from the program file, into the CX register.

But why do we use this particular value as the offset to load binary code from a program file? Well, this is just one of the memory map options for our operating system. Because we are working in a single 64 KB segment and our kernel binary is loaded at offset 0, we use the first 32 KB of memory for kernel data and the remaining 32 KB for the data of loaded programs. Thus, offset 32768 is the middle of our segment and allows us to provide sufficient RAM to both the operating system kernel and loaded programs.

The lib_load_file routine then performs a very important operation: if it cannot find a file with the given name on disk, or for some reason cannot read it from disk, it simply exits and sets a special carry flag. This is a CPU state flag that is set during the execution of some mathematical operations and should not interest us at the moment, but at the same time we can test for this flag to make quick decisions. If the lib_load_file routine sets the carry flag, we use the jc (jump if carry) instruction to jump to a block of code that prints the error message and returns to the beginning of the user input loop.

If, on the other hand, the carry flag is not set, we can conclude that the lib_load_file subroutine has successfully loaded the binary code from the program file into RAM at offset 32768. All we need in this case is to initiate the execution of the binary code loaded at this address, that is, start executing the user-specified program! And after the ret instruction is used in this program (to return to the calling code), we simply return to the loop for processing user input. Thus we have created an operating system: it consists of the simplest command parsing and program loading mechanisms, implemented in about 40 lines of assembly code, albeit with a lot of help from routines from the lib.asm library.
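
The memory map can be checked with a little arithmetic. The Python sketch below assumes the standard real-mode rule that a physical address is the segment multiplied by 16 plus the offset; the function and constant names are made up for the example.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address calculation: segment * 16 + offset."""
    return (segment << 4) + offset


KERNEL_SEGMENT = 0x2000
PROGRAM_OFFSET = 32768  # 8000h, the middle of a 64 KB segment

print(hex(physical_address(KERNEL_SEGMENT, 0)))               # kernel start
print(hex(physical_address(KERNEL_SEGMENT, PROGRAM_OFFSET)))  # program start
print(hex(physical_address(0x07C0, 0)))                       # boot sector
```

The last line shows why the bootloader can load 07C0h into DS: segment 07C0h with offset 0 is the same physical location as 0000:7C00h, where the BIOS places the boot sector.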

To assemble the operating system kernel code, use the following command:

nasm -f bin -o mykernel.bin mykernel.asm

After this we will have to somehow add the mykernel.bin file to the floppy disk image file. If you are familiar with the trick of mounting disk images using loopback devices, you can access the contents of the floppy.img disk image that way, but there is an easier way using GNU Mtools (www.gnu.org/software/mtools). This is a set of programs for working with floppy disks that use MS-DOS/FAT12 file systems, available from the software package repositories of all popular Linux distributions, so all you have to do is use apt-get, yum, pacman, or whatever utility you use to install software packages on your distribution.

After installing the appropriate software package, you will have to run the following command to add the mykernel.bin file to the floppy.img disk image file:

mcopy -i floppy.img mykernel.bin ::/

Note the funny symbols at the end of the command: colon, colon, and slash. Now we're almost ready to launch our operating system, but what's the point if there are no apps for it? Let's correct this misunderstanding by developing an extremely simple application. Yes, now you will be developing an application for your own operating system - just imagine how much your authority will rise among the ranks of geeks. Save the following code in a file named test.asm:

org 32768

mov ah, 0Eh
mov al, "X"
int 10h

ret

This code simply uses the BIOS function to print an "X" on the screen, and then returns control to the code that called it - in our case, that code is the operating system code. The org line that begins the application's source code is not a CPU instruction, but a NASM assembler directive telling it that the binary code will be loaded into RAM at offset 32768, and therefore all offsets must be recalculated to account for this.

This code also needs to be assembled, and the resulting binary file needs to be added to the floppy disk image file:

nasm -f bin -o test.bin test.asm
mcopy -i floppy.img test.bin ::/

Now take a deep breath, get ready to contemplate the unparalleled results of your own work, and boot the floppy disk image using a PC emulator such as Qemu or VirtualBox. For example, the following command can be used for this purpose:

qemu-system-i386 -fda floppy.img

Voila: the system bootloader boot.bin, which we integrated into the first sector of the disk image, loads the operating system kernel mykernel.bin, which displays a prompt. Enter the ls command to get the names of the two files located on the disk (mykernel.bin and test.bin), and then enter the name of the second file to execute it and display an X on the screen.

It's cool, isn't it? Now you can start refining the command shell of your operating system, add implementations of new commands, and add more program files to the disk. If you want to run this operating system on a real PC, you should refer to the section “Running the bootloader on a real hardware platform” from the previous article in the series - you will need exactly the same commands. Next month, we'll make our operating system more powerful by allowing loaded programs to use system functions, introducing a code-sharing concept to reduce code duplication. Much of the work is still ahead.

lib.asm library routines

As mentioned earlier, the lib.asm library provides a large set of useful routines for use within your operating system kernels and individual programs. Some of them use instructions and concepts that have not yet been touched upon in articles in this series, others (such as the disk routines) are closely related to the design of file systems, but if you consider yourself competent in these matters, you can read their implementations yourself and understand how they work. However, it is more important to figure out how to call them from your own code:

  • lib_print_string - accepts a pointer to a null-terminated string via the SI register and prints the string to the screen.
  • lib_input_string - accepts a pointer to a buffer via the SI register and fills this buffer with characters entered by the user using the keyboard. After the user presses the Enter key, the line in the buffer is null-terminated and control returns to the calling program code.
  • lib_move_cursor - moves the cursor on the screen to a position with coordinates transmitted through the DH (row number) and DL (column number) registers.
  • lib_get_cursor_pos - this subroutine should be called to obtain the current row and column numbers using the DH and DL registers, respectively.
  • lib_string_uppercase - takes a pointer to the beginning of a null-terminated string using the AX register and converts the characters of the string to uppercase.
  • lib_string_length - takes a pointer to the beginning of a null-terminated string via the AX register and returns its length via the AX register.
  • lib_string_compare - accepts pointers to the beginning of two null-terminated strings using the SI and DI registers and compares these strings. Sets the carry flag if the strings are identical (so that you can follow the call with the jc instruction) or clears this flag if the strings are different (so that you can use the jnc instruction).
  • lib_get_file_list - takes a pointer to the start of a buffer via the SI register and puts into that buffer a null-terminated string containing a comma-separated list of file names from disk.
  • lib_load_file - takes a pointer to the beginning of a string containing the file name using the AX register and loads the contents of the file at the offset passed through the CX register. Returns the number of bytes copied into memory (that is, the file size) using the BX register, or sets the carry flag if a file with the given name is not found.

I’ll say right away, don’t close the article with the thoughts “Damn, another Popov.” He just has a polished Ubuntu, while I have everything from scratch, including the kernel and applications. So, continuation under the cut.

OS group: Here.
First I'll give you one screenshot.

There are no more of them, and now let’s talk in more detail about why I am writing it.

It was a warm April evening, a Thursday. Since childhood I had dreamed of writing an OS, when I suddenly thought: “Now I know C++ and asm, so why not make my dream come true?” I googled sites on this topic and found an article on Habr: “How to start and not quit writing an OS.” Thanks to its author for the link to the OSDev Wiki at the bottom. I went there and started working. All the information about a minimal OS was there in one article. I started building cross-gcc and binutils, and then rewrote everything from there. You should have seen my joy when I saw the inscription “Hello, kernel World!” I jumped right out of my chair and realized that I would not give up. I wrote a "console" (in quotes; I didn't have access to a keyboard), but then decided to write a window system. In the end it worked, but I still did not have access to the keyboard. And then I decided to come up with a name based on the X Window System. I googled Y Window System - it exists. As a result, I named it Z Window System 0.1, included in OS365 pre-alpha 0.1. And yes, no one saw it except me. Then I figured out how to implement keyboard support. Screenshot of the very first version, when there was nothing yet, not even a window system:

The text cursor didn't even move, as you can see. Then I wrote a couple of simple applications based on Z. And here is release 1.0.0 alpha. There were a lot of things in it, even system menus. The file manager and the calculator just didn't work.

I was outright terrorized by a friend who cares only about beauty (Mitrofan, sorry). He said: “Add the 1024*768*32 VBE mode, add it, add it! Come on, just add it!” Well, I got tired of listening to him and implemented it after all. The implementation is described below.

I did everything with the help of my bootloader, namely GRUB. With it, you can set the graphics mode without complications by adding a few magic lines to the Multiboot header.

.set ALIGN, 1<<0
.set MEMINFO, 1<<1
.set GRAPH, 1<<2
.set FLAGS, ALIGN | MEMINFO | GRAPH
.set MAGIC, 0x1BADB002
.set CHECKSUM, -(MAGIC + FLAGS)
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM
.long 0, 0, 0, 0, 0
.long 0 # 0 = set graphics mode
.long 1024, 768, 32 # Width, height, depth
And then from the Multiboot information structure I take the framebuffer address and screen resolution and write pixels there. VESA made everything very confusing: RGB colors must be written in reverse order (not R G B, but B G R). For several days I couldn’t understand why the pixels weren’t displayed. In the end, I realized that I had forgotten to change the values of the 16 color constants from 0...15 to their RGB equivalents. As a result, I released it and at the same time added a gradient background. Then I made a console and 2 applications and released 1.2. Oh yes, I almost forgot - you can download the OS at
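
Two details of the above can be checked with a short Python sketch: the CHECKSUM field of the Multiboot header, which must bring the sum of magic, flags and checksum to zero modulo 2^32, and the B G R byte order of a pixel. The pixel layout is an assumption taken from the description above (one unused fourth byte per 32-bit pixel); a real kernel should take the layout from the mode information that the bootloader passes. The function names are hypothetical.

```python
MAGIC = 0x1BADB002
FLAGS = (1 << 0) | (1 << 1) | (1 << 2)  # ALIGN | MEMINFO | GRAPH


def multiboot_checksum(magic: int, flags: int) -> int:
    """Value that makes magic + flags + checksum equal 0 modulo 2**32."""
    return -(magic + flags) & 0xFFFFFFFF


def pack_pixel(r: int, g: int, b: int) -> bytes:
    """Bytes of one 32-bit pixel in memory order: blue, green, red, unused."""
    return bytes((b, g, r, 0))


checksum = multiboot_checksum(MAGIC, FLAGS)
print(hex(checksum))
print(pack_pixel(255, 0, 0).hex())
```

The same reversal is what the 16 color constants needed: each palette index from 0 to 15 has to become a full color value before it is written to the framebuffer.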

Assembler

Assembler (from the English assemble) - a compiler from assembly language into machine language commands.
There is an assembler for each processor architecture and for each OS or OS family. There are also so-called “cross-assemblers” that allow you to assemble programs for another target architecture or another OS on machines with one architecture (or in the environment of one OS), and obtain executable code in a format suitable for execution on the target architecture or in the target environment OS.

x86 architecture

Assemblers for DOS

The most famous assemblers for the DOS operating system were Borland Turbo Assembler (TASM) and Microsoft Macro Assembler (MASM). The simple assembler A86 was also popular at one time.
Initially, they only supported 16-bit instructions (until the advent of the Intel 80386 processor). Later versions of TASM and MASM support 32-bit instructions, as well as all instructions introduced in more modern processors and architecture-specific instruction sets (such as MMX, SSE, 3DNow!, etc.).

Microsoft Windows

With the advent of the Microsoft Windows operating system, a TASM extension called TASM32 appeared, which made it possible to create programs to run in the Windows environment. The latest known version of Tasm is 5.3, which supports MMX instructions, and is currently included in Turbo C++ Explorer. But officially the development of the program has completely stopped.
Microsoft maintains a product called Microsoft Macro Assembler. It continues to develop to this day, with the latest versions included in the DDKs. But the version of the program aimed at creating programs for DOS is not being developed. Additionally, Stephen Hutchesson created a MASM programming package called "MASM32".

GNU and GNU/Linux

The GNU operating system includes the gcc compiler, which includes the gas assembler (GNU Assembler), which uses AT&T syntax, unlike most other popular assemblers, which use Intel syntax.

Portable assemblers

There is also an open-source assembler project called NASM (Netwide Assembler). Versions are available for various operating systems, and it can produce object files for those systems.
YASM is a rewrite of NASM under the BSD license (with some exceptions).
FASM (Flat Assembler) is a young assembler under a BSD license modified to prohibit relicensing (including under the GNU GPL). There are versions for KolibriOS, GNU/Linux, MS-DOS and Microsoft Windows. It uses Intel syntax and supports AMD64 instructions.

RISC architectures


MCS-51
AVR
At the moment there are two assemblers produced by Atmel (in AVR Studio 3 and AVR Studio 4). The second is an attempt to fix the not very successful first one. The assembler is also included in WinAVR.
ARM
AVR32
MSP430
PowerPC

Assembly and compiling

The process of translating a program in assembly language into object code is usually called assembly. Unlike compiling, assembly is a more or less unambiguous and reversible process. In assembly language, each mnemonic corresponds to one machine instruction, while in high-level programming languages, each expression can hide a large number of different instructions. In principle, this division is quite arbitrary, so sometimes the translation of assembly programs is also called compilation.
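To make the "unambiguous and reversible" point concrete, here is a minimal sketch in Python. It is not taken from any real assembler: the names ENCODINGS, assemble and disassemble are made up for this illustration, and it covers only five 16-bit x86 instructions with fixed encodings (B4 ib for mov ah, imm8; BA iw for mov dx, imm16; CD ib for int imm8; 90 for nop; C3 for ret). Numbers are written as Python-style literals (0x21), not in the 21h form.

```python
# Toy subset of 16-bit x86: (mnemonic, register or None) -> (opcode, immediate size).
ENCODINGS = {
    ("mov", "ah"): (0xB4, 1),   # mov ah, imm8
    ("mov", "dx"): (0xBA, 2),   # mov dx, imm16
    ("int", None): (0xCD, 1),   # int imm8
    ("nop", None): (0x90, 0),
    ("ret", None): (0xC3, 0),
}
# Reverse table: opcode byte -> ((mnemonic, register), immediate size).
DECODINGS = {opcode: (key, size) for key, (opcode, size) in ENCODINGS.items()}


def assemble(line):
    """Translate one instruction, e.g. 'mov ah, 9', into machine bytes."""
    mnemonic, _, rest = line.strip().lower().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    # In this subset a register only appears as the first of two operands.
    register = operands[0] if len(operands) == 2 else None
    opcode, size = ENCODINGS[(mnemonic, register)]
    immediate = int(operands[-1], 0) if size else 0
    # x86 stores multi-byte immediates in little-endian order.
    return bytes([opcode]) + immediate.to_bytes(size, "little")


def disassemble(code):
    """Translate machine bytes back into a list of instruction strings."""
    lines, i = [], 0
    while i < len(code):
        (mnemonic, register), size = DECODINGS[code[i]]
        immediate = int.from_bytes(code[i + 1:i + 1 + size], "little")
        operands = [register] if register else []
        if size:
            operands.append(hex(immediate))
        lines.append((mnemonic + " " + ", ".join(operands)).strip())
        i += 1 + size
    return lines


program = ["mov ah, 9", "mov dx, 0x0109", "int 0x21", "int 0x20"]
code = b"".join(assemble(line) for line in program)
print(code.hex())         # b409ba0901cd21cd20
print(disassemble(code))
```

Disassembling and then reassembling gives back exactly the same bytes, although the text may differ in small details such as how a number is written (9 versus 0x9). That is the sense in which the process is only "more or less" reversible.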

Assembly language

Assembly language is a low-level programming language: a notation for machine instructions that is convenient for humans to read. For brevity it is often called simply "assembler", which is not accurate.

Assembly language commands correspond one to one to processor commands and, in fact, represent a convenient symbolic form of recording (mnemonic code) of commands and their arguments. Assembly language also provides basic software abstractions: linking program parts and data through labels with symbolic names (during assembly, an address is calculated for each label, after which each occurrence of the label is replaced by this address) and directives.
Assembler directives let you include blocks of data in a program (described explicitly or read from a file); repeat a fragment a specified number of times; assemble a fragment conditionally; set a fragment's execution address to something other than the address at which it is located in memory; change label values during assembly; use macro definitions with parameters; and so on.
Each processor model, in principle, has its own set of instructions and a corresponding assembly language (or dialect).
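The label mechanism described above (compute an address for each label, then replace every occurrence of the label with that address) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: PROGRAM and resolve_labels are hypothetical names, the instruction sizes are supplied by hand rather than derived from real encodings, and the origin of 0x100 plays the role of an ORG 100h directive for a DOS .COM program.

```python
import re

# (label or None, size in bytes, source text).
# Sizes are written by hand here; a real assembler derives them from encodings.
PROGRAM = [
    ("start", 2, "mov ah, 9"),
    (None, 3, "mov dx, msg"),
    (None, 2, "int 0x21"),
    (None, 2, "int 0x20"),
    ("msg", 14, 'db "Hello World", 13, 10, "$"'),
]


def resolve_labels(program, origin):
    # Pass 1: give each label the address of the line it marks.
    symbols, address = {}, origin
    for label, size, _ in program:
        if label is not None:
            symbols[label] = address
        address += size

    # Pass 2: replace every occurrence of a known label with its address.
    # (Naive: it would also rewrite a label name inside a string literal.)
    def substitute(match):
        name = match.group(0)
        return hex(symbols[name]) if name in symbols else name

    resolved = [re.sub(r"\b[A-Za-z_]\w*", substitute, text)
                for _, _, text in program]
    return symbols, resolved


symbols, resolved = resolve_labels(PROGRAM, origin=0x100)
print(symbols)      # {'start': 256, 'msg': 265}
print(resolved[1])  # mov dx, 0x109
```

Two passes are needed because msg is used before it is defined: on the first pass the assembler does not yet know where the string will end up.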

Advantages and disadvantages

Advantages of assembly language

Minimal redundant code, that is, the use of fewer instructions and memory accesses, allows for increased speed and reduced program size.
Ensuring full compatibility and maximum use of the capabilities of the desired platform: the use of special instructions and technical features of this platform.
When programming in assembly language, special capabilities become available: direct access to hardware, input/output ports and special processor registers, as well as the ability to write self-modifying code (that is, metaprogramming, without the need for a software interpreter).
Note, however, that modern operating-system security features restrict self-modifying code, since they prevent a memory region from being writable and executable at the same time (W^X in BSD systems, DEP in Windows).

Disadvantages of Assembly Language

Large amounts of code and a large number of additional small tasks make programs very hard to read and understand, so debugging and modification become more difficult. Programming paradigms and other conventions are also hard to enforce, which complicates joint development.
Fewer libraries are available, and those that exist are poorly compatible with each other.
Not portable to other platforms (except binary compatible ones).

Application

Directly follows from the advantages and disadvantages.
Since large programs in assembly language are extremely inconvenient to write, they are written in high-level languages. In assembler, small fragments or modules are written, for which the following are critical:
performance (drivers);
code size (boot sectors, software for microcontrollers and processors with limited resources, viruses, software protection);
special capabilities: working directly with hardware or machine code, that is, operating system loaders, drivers, viruses, security systems.

Linking assembly code to other languages

Since usually only fragments of a program are written in assembly language, they need to be linked to the parts written in other languages. There are two main ways to do this:
At compile time: assembler fragments are inserted into the program (inline assembler) using special language directives, which includes writing whole procedures in assembly language. This method is convenient for simple data transformations, but it cannot produce full-fledged assembly code with data and subroutines, including subroutines with multiple entry and exit points, which high-level languages do not support.
At link time, or separate compilation: for the separately built modules to interact, it is enough that the functions connecting them follow the required calling conventions and data types. Individual modules can be written in any language, including assembly language.

Syntax

There is no generally accepted standard for the syntax of assembly languages. However, there are standards that most assembly language developers adhere to. The main such standards are Intel syntax and AT&T syntax.

Instructions

The general format for recording instructions is the same for both standards:

[label:] opcode [operands] [;comment]

where opcode is the mnemonic of the processor instruction itself. Prefixes can be added to it (repetition, change of addressing type, etc.).
The operands can be constants, register names, addresses in RAM, etc. The differences between the Intel and AT&T standards relate mainly to the order in which the operands are listed and their syntax for different addressing methods.
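The operand-order difference is easiest to see side by side. The following Python sketch converts a very small subset of Intel-syntax lines to AT&T syntax. The function name intel_to_att is made up for this illustration, and it handles only 32-bit registers and numeric immediates. Memory operands, where the two syntaxes differ the most, are deliberately left out.

```python
REGISTERS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_to_att(line):
    """Convert e.g. 'mov eax, 5' (Intel) to 'movl $5, %eax' (AT&T)."""
    mnemonic, _, rest = line.strip().lower().partition(" ")
    operands = [op.strip() for op in rest.split(",")]

    def convert(operand):
        if operand in REGISTERS_32:
            return "%" + operand   # registers get a % prefix
        int(operand, 0)            # raises ValueError unless it is a number
        return "$" + operand       # immediates get a $ prefix

    # AT&T lists the source first and the destination last, and marks
    # the 32-bit operand size with an "l" suffix on the mnemonic.
    return mnemonic + "l " + ", ".join(convert(op) for op in reversed(operands))


print(intel_to_att("mov eax, 5"))    # movl $5, %eax
print(intel_to_att("add ebx, eax"))  # addl %eax, %ebx
```

Both lines mean the same thing to the processor; only the notation differs.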
The mnemonics used are usually the same for all processors of the same architecture or family of architectures (among the widely known are mnemonics for Motorola, ARM, x86 processors and controllers). They are described in the processor specifications. Possible exceptions:
If the assembler uses cross-platform AT&T syntax (original mnemonics are converted to AT&T syntax)
If initially there were two standards for recording mnemonics (the command system was inherited from a processor from another manufacturer).
For example, the Zilog Z80 processor inherited the Intel i8080 instruction set, extended it, and changed the mnemonics (and register names) in its own way. For instance, it changed Intel's mov to ld. Motorola Fireball processors inherited the Z80 instruction set, cutting it down somewhat. At the same time, Motorola officially returned to Intel mnemonics. As a result, half of the assemblers for Fireball currently work with Intel mnemonics and half with Zilog mnemonics.

Directives

In addition to instructions, a program may contain directives: commands that are not translated directly into machine instructions but control the operation of the assembler. Their set and syntax vary considerably and depend not on the hardware platform but on the assembler used (which gives rise to dialects of the language within one family of architectures). A typical set of directives includes:
definition of data (constants and variables)
managing program organization in memory and output file parameters
setting the compiler operating mode
all kinds of abstractions (i.e. elements of high-level languages) - from the design of procedures and functions (to simplify the implementation of the procedural programming paradigm) to conditional constructs and loops (for the structured programming paradigm)
macros

Example program

An example of a Hello world program for MS-DOS for x86 architecture in the TASM dialect:

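.MODEL TINY
CODE SEGMENT
ASSUME CS:CODE, DS:CODE
ORG 100h                      ; .COM programs are loaded at offset 100h
START:
    mov ah,9                  ; DOS function 09h: print a $-terminated string
    mov dx,OFFSET Msg         ; DS:DX points to the string
    int 21h                   ; call DOS
    int 20h                   ; terminate the program
Msg DB "Hello World",13,10,"$"
CODE ENDS
END START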

Origins and criticism of the term "assembly language"

This type of language gets its name from the translator (compiler) for such languages, the assembler. The translator in turn got its name because the first computers had no higher-level languages, and the only alternative to writing programs with an assembler was programming directly in machine codes.
In Russian, assembly language is often called simply "assembler" (and things related to it are described with an adjective derived from that word), which is incorrect going by the English term but fits the rules of the Russian language. The assembler itself (the program), however, is also called simply “assembler” and not “assembly language compiler” or the like.
The use of the term "assembly language" may also lead to the misconception that there is a single low-level language, or at least a standard for such languages. When naming the language in which a specific program is written, it is advisable to clarify what architecture it is intended for and what dialect of the language it is written in.

Today our cabinet of curiosities features an unusual exhibit: an operating system written in pure assembler. Together with drivers, a graphical shell, and dozens of preinstalled programs and games, it takes up less than one and a half megabytes. Meet the exceptionally fast and predominantly Russian-made KolibriOS, the “Hummingbird”.

The development of "Hummingbird" proceeded quite quickly until 2009. The bird learned to fly on different hardware, minimally requiring the first Pentium and eight megabytes of RAM. The minimum system requirements for Hummingbird are:

  • CPU: Pentium, AMD 5x86 or Cyrix 5x86 without MMX with a frequency of 100 MHz;
  • RAM: 8 MB;
  • video card: VESA-compatible with support for VGA mode (640 × 480 × 16).

Modern “Hummingbird” is a regularly updated “nightly build” of the latest official version, released at the end of 2009. We tested build 0.7.7.0+ dated August 20, 2017.

WARNING

In the default settings, KolibriOS does not have access to disks that are visible through the BIOS. Think carefully and make a backup before changing this setting.

The changes in the nightly builds, although individually small, have added up to quite a lot over the years. The updated "Hummingbird" can write to FAT16/FAT32 and ext2 to ext4 partitions, and it supports other popular file systems (NTFS, XFS, ISO-9660) in read-only mode. Support for USB and network cards has been added, along with a TCP/IP stack and audio codecs. In general, you can now actually do something in it, rather than just take one look at an ultra-light operating system with a GUI and be impressed by how fast it starts.



Like previous versions, the latest “Hummingbird” is written in flat assembler (FASM) and occupies one floppy disk - 1.44 MB. Thanks to this, it can be placed entirely in some specialized memory. For example, craftsmen wrote KolibriOS directly into Flash BIOS. During operation, it can be entirely located in the cache of some processors. Just imagine: the entire operating system, along with programs and drivers, is cached!

INFO

When you visit kolibrios.org, the browser may warn you of danger. The reason, apparently, is the assembler programs in the distribution. VirusTotal now rates the site as completely safe.

"Hummingbird" boots easily from a floppy disk, hard drive, flash drive, Live CD or in a virtual machine. For emulation, just specify the OS type “other” and allocate one processor core and some RAM to it. Attaching a disk is not necessary, and if you have a router with DHCP, “Hummingbird” will connect to the Internet and the local network instantly. Immediately after booting you will see a corresponding notification.


One problem is that the browser built into Kolibri does not support HTTPS. So it was not possible to view the site in it, nor to open Google, Yandex, Wikipedia, Sberbank... or practically any other familiar address. Everyone switched to the secure protocol long ago. The only site with old-school plain HTTP that I came across was the “Russian Government Portal”, and even that did not look its best in a text browser.



The appearance settings in Hummingbird have improved over the years, but are still far from ideal. A list of supported video modes is displayed on the Hummingbird loading screen when you press the "a" key.



The list of available options is small, and the resolution you need may not be there. If you have a video card with an AMD (ATI) GPU, you can add custom settings right away. To do this, pass the ATIKMS loader the -m parameter in the form -mWIDTHxHEIGHTxREFRESH, for example:

/RD/1/DRIVERS/ATIKMS -m1280x800x60 -1

Here /RD/1/DRIVERS/ATIKMS is the path to the loader (RD stands for RAM disk).

While the system is running, the selected video mode can be viewed with the vmode command and (in theory) switched manually. If “Hummingbird” is running in a virtual machine, this window stays empty, but when booted on real hardware, Intel video drivers can be added, from i915 up to and including Skylake.

Surprisingly, KolibriOS manages to fit in a ton of games. Among them are logic and arcade games, the fifteen puzzle, snake, tanks (no, not WoT): a whole “Game Center”! Even Doom and Quake have been ported to Kolibri.



Another important thing was the FB2READ reader. It works correctly with Cyrillic and has text display settings.



I recommend storing all user files on a flash drive, but it must be connected via a USB 2.0 port. Our USB 3.0 flash drive (in a USB 2.0 port) with a capacity of 16 GB with the NTFS file system was identified immediately. If you need to write files, then you should connect a flash drive with a FAT32 partition.



The Kolibri distribution kit includes three file managers, utilities for viewing images and documents, audio and video players and other user applications. However, its main focus is on assembly language development.



The built-in text editor has ASM syntax highlighting and even allows you to immediately launch typed programs.



Among the development tools there is the Oberon-07/11 compiler for i386 Windows, Linux and KolibriOS, as well as low-level emulators: E80 - ZX Spectrum emulator, FCE Ultra - one of the best NES emulators, DOSBox v.0.74 and others. All of them were specially ported to Kolibri.

If you leave KolibriOS for a few minutes, the screensaver will start. Lines of code will appear on the screen, in which you can see a reference to MenuetOS.
