2.5 – Languages and Integrated Development Environments (IDEs)

2.5.1 – Languages

Fact: Computers only understand binary. That is their language. Everything else is meaningless to them.

It might seem odd, then, that there are hundreds of programming languages available today and none of them look anything like binary. How is this possible? Why do they exist?

Computer programming is a problem that we have successfully abstracted over the years. In other words, we have hidden the underlying complexity of having to write using only binary. Don’t be fooled – originally, when computers were first invented, someone absolutely had to write code in pure binary to get them to work, but since then we have developed something magical – translators.

Translators allow us to write computer code in a structured form that looks something like a mix of logic and English. The translator then converts this into binary for us so that it can be understood and executed by a computer.

To summarise – even the most modern and complex of computers can only execute binary instructions. We use a programming language (C#, VB, Python, Java) to write code in a form we understand and can quickly read, then use a translator to convert this code into binary on our behalf. Simple!

Low level languages

A low level language is a language which is either binary, or very close to binary instructions. We call it “low level” because it is said to be “close to the hardware” meaning it is exactly, or very nearly a language which the processor itself can understand.

Machine Code

Obviously, the lowest level language is binary itself. When we use binary to form program instructions, this is called machine code. Machine code is:

  • Pure binary instructions and data
  • The only form of code a CPU can understand and execute.
  • Directly executable by the CPU without translation.

Code written in machine code is extremely efficient but also hideously difficult to write and understand. Writing a program in machine code today would be one of the most pointless exercises you could undertake – it would be an extremely slow and painful process and you would make countless errors along the way.

Good enough for Bill Gates…

The computer in the picture above is a MITS Altair 8800. Released in 1975, it was one of the first computers sold as a home computer “kit.” Enthusiasts would build them themselves and then marvel at the sheer wonder that lay in front of them. Maybe.

In reality, once you’d got your Altair working you were faced with a daunting prospect – programming it. Look at the front panel and you will notice rows of switches and lights. These switches relate directly to binary bits being on or off. To program the Altair, there was no keyboard and no screen – you had to click the switches into the positions that represented the binary for a single instruction or piece of data, then flick another switch to commit that instruction to memory. Repeat the process several hundred times and you had a program. This was pure machine code programming. I can imagine lots of Altairs now have dents in the top from various programmers’ heads.

Assembly Language

Clearly, something needed to change to make it easier for programmers to create code. The first stage was something called Assembly Language. Assembly language takes each CPU instruction and gives it a 3 or 4 letter mnemonic. A mnemonic is a complicated way of saying “abbreviation.” Some assembly language instructions are very obvious – ADD, for example, does exactly what you’d expect.

Assembly language makes heavy use of hexadecimal numbers. You will (or should) recall from Unit 1 that hexadecimal is used solely to make life easier for us as humans; the computer neither uses nor understands it. Hexadecimal reduces each group of 4 binary bits down to a single digit, 0-F. This makes it far less likely that we will make mistakes when writing out numbers, especially larger numbers.
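The “4 bits become 1 hex digit” idea can be sketched in a few lines of Python. This is only an illustration; the function name binary_to_hex is made up for this example.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a string of binary digits to hexadecimal, 4 bits at a time."""
    # Pad on the left with zeros so the length is a multiple of 4 bits.
    padded_length = (len(bits) + 3) // 4 * 4
    bits = bits.zfill(padded_length)
    hex_digits = "0123456789ABCDEF"
    result = ""
    for i in range(0, len(bits), 4):
        nibble = bits[i:i + 4]                 # one group of 4 bits
        result += hex_digits[int(nibble, 2)]   # becomes one hex digit
    return result


print(binary_to_hex("11111101"))          # FD
print(binary_to_hex("1010111100000001"))  # AF01
```

Sixteen binary digits shrink to four hexadecimal digits, which is much harder to mistype.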

A typical assembly language instruction consists of two parts – an operator (often called the opcode) and an operand. Operators are the instructions; operands are the data or addresses they act on.

In the example above, the operator is the instruction “Compare” which would compare the value in the Accumulator with the given operand – in this case the hexadecimal number FD.

Assembly language is far, far easier to write and understand than machine code. Code written in assembly language is just as efficient as that written in machine code. The only difference is that assembly is easier to read!

There’s still a disadvantage to assembly language – although it is easier to read, that doesn’t mean that it is easy to write programs in assembly. CPU instructions are incredibly simplistic, and doing even basic things such as writing a loop or an IF statement requires many, many more instructions than would be used in a high level language.
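To get a feel for how much work a simple loop takes at this level, here is a rough sketch of a pretend CPU written in Python. The instruction names (LDC, ADDC, DEC, JNZ, HLT) are invented for this example and do not belong to any real processor. Each instruction is an operator paired with an operand.

```python
def run(program):
    """Run a list of (operator, operand) pairs on a made-up, very simple machine."""
    acc = 0        # accumulator: holds the running result
    counter = 0    # a second register, used here as the loop counter
    pc = 0         # program counter: index of the next instruction
    while pc < len(program):
        operator, operand = program[pc]
        pc += 1
        if operator == "LDC":       # load a value into the counter
            counter = operand
        elif operator == "ADDC":    # add the counter to the accumulator
            acc += counter
        elif operator == "DEC":     # subtract 1 from the counter
            counter -= 1
        elif operator == "JNZ":     # jump if the counter is not zero
            if counter != 0:
                pc = operand
        elif operator == "HLT":     # stop the program
            break
    return acc


program = [
    ("LDC", 5),      # 0: counter = 5
    ("ADDC", None),  # 1: acc = acc + counter
    ("DEC", None),   # 2: counter = counter - 1
    ("JNZ", 1),      # 3: if counter is not 0, go back to instruction 1
    ("HLT", None),   # 4: stop
]

print(run(program))      # 15
print(sum(range(1, 6)))  # 15, the same job in one line of a high level language
```

Five low level instructions, including a manual jump, do the work of a single high level expression.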

An example of X64 Assembly Language

Currently, there is very little need for any programmer to write programs using assembly language. Certain languages, especially C, with good compilers are capable of producing code which executes almost as quickly as hand-written assembly/machine code. Assembly might still be necessary if a brand new piece of hardware was being designed, or if absolute control over a piece of code was required – for example, when timing matters and you need to know exactly how many CPU cycles a certain block of code will take to execute. Other than this, the vast majority of software is written in a high level language and translated into machine code.

One final, very important note. Assembly language is not understood by the CPU! It still requires translation using a special program called an assembler which converts the assembly into machine code.

Please do not make the mistake that many students make of stating that high level languages are converted into assembly. For your exam, high level languages are translated into machine code. Assembly is a very niche language, which you should compartmentalise as “its own thing.”

To summarise:

  • Machine code is the lowest level language.
  • Machine code is the only language the CPU understands.
  • All languages above machine code must be translated (converted) into machine code.
  • Assembly language uses short mnemonics to refer to each CPU instruction.
  • Data or addresses in assembly are written in hexadecimal.
  • Assembly makes it much easier to write low level/machine code programs.
  • It is less prone to errors and mistakes and can be debugged much more easily than machine code.
  • It must be assembled using an assembler in order for it to be run by the CPU.

High level languages

A high level language is one where program code is written in structured English. The ultimate high level language would be one where the programmer simply describes what they are trying to achieve – until recently this was an age old pipe dream, but the advent of advanced AI tools like ChatGPT has brought this one step closer to reality. It is now possible to ask ChatGPT to write you program code and it will do a pretty decent job. It isn’t perfect – it’s based on code freely shared on the internet, it cannot handle more complex problems and sometimes the code produced isn’t great or just doesn’t work, but it is absolutely a sign of the future. I’m fairly sure that you will be the generation that finally gets to use “programs that generate programs”, which was always the vision of the future back in the 1980s!

Back to the present day and your GCSE, a high level language is one which:

  • Is written using structured English.
  • Requires translation in order to be executed.
  • Uses key words such as IF… WHILE… PROCEDURE etc.
  • Has a rigidly and logically defined set of grammar rules called “syntax.”

Examples of high level languages are numerous (literally hundreds) and you’ve probably heard of or used some of the more popular ones:

  • C#
  • Python
  • Java
  • Visual Basic
  • C++

High level languages allow us to write code in a form that usually makes sense to most people; in other words, it is accessible. High level languages have the following advantages:

  • Code is easier to read.
  • Code is easier to understand.
  • Code is more maintainable than low level code.
  • Code may be cross platform compatible – meaning you write code in one language, once, but it may run on many different types of system or CPU.

A program written in a high level language (C#) which implements the Caesar Cipher
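For comparison, here is a short Caesar cipher sketched in Python, another high level language. This is not the C# program referred to in the caption, just a minimal version written for illustration; the function name caesar_encrypt is made up. Notice how much of it reads like structured English.

```python
def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift every letter along the alphabet by 'shift' places."""
    result = ""
    for character in plaintext:
        if character.isalpha() and character.isascii():
            # Work out whether we are in the upper or lower case alphabet.
            base = ord("A") if character.isupper() else ord("a")
            # Wrap around from Z back to A using the remainder after dividing by 26.
            result += chr((ord(character) - base + shift) % 26 + base)
        else:
            result += character  # leave spaces and punctuation alone
    return result


print(caesar_encrypt("Hello, World!", 3))   # Khoor, Zruog!
print(caesar_encrypt("Khoor, Zruog!", -3))  # Hello, World!
```

A negative shift reverses the encryption, so the same function can be used to decrypt.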

There are always some downsides:

  • High level languages must be translated – either through compilation or interpretation.
  • High level languages can generate large amounts of machine code.
  • High level languages often rely on libraries or “run time environments” that may not be available or up to date on an end user system.

Translation – Compilers and Interpreters

We have repeatedly referred to the word “translation” when talking about assembly language and high level languages. Translation means converting code into pure binary (machine code). It is worth repeating – no code of any sort can be run on a computer unless it is converted to machine code.

When translating code from a high level language into machine code, we have two choices – compilation or interpretation. Both options achieve the same objective of converting high level languages into machine code, however they approach the task in totally different ways. To further muddy the waters, some languages implement both forms of translation! Each option has its own particular strengths and weaknesses.

Compilers

Compilation is a process which takes the entire code for a program and converts it all, at once, into a single binary executable file.

There are some distinct advantages to compiling code:

  • Compilation only needs to take place once – when the binary file is generated, it can then be run without further translation.
  • Compiled code tends to run faster as there are no overheads when running compiled code – it is ready to run.
  • Once compiled, there is no need for any extra programs, translation, interpretation etc – the file will run as a stand alone program.

Compilation tends to be used more with languages that sit closer to the hardware, such as C and C++, and, once compiled, the resulting executable file can only run on the intended target system. This leads us nicely onto the disadvantages:

  • Compilers target a specific operating system and type of CPU. If you wish your code to run on a different system (say a Mac instead of a Windows machine) you would need to re-compile the code, telling the compiler to target this new system. This means that compiled code is not cross-platform compatible and results in a fair bit of extra work to get a program working across multiple systems.

Cross-platform compatibility is a problem for programmers. In an ideal world, you would write one single program and it would run on any system regardless of CPU and operating system. Whilst this is possible in many cases, complex programs tend to rely on CPU or OS specific optimisations to help them run more quickly or cope with large data sets, for example. This leads programmers to make two or three different versions of the same program, targeted at different operating systems and therefore dramatically increasing the amount of work they have to do.

In the case of popular applications, there may be little choice. Photoshop would be a good example – Adobe have to maintain Mac, PC and iOS versions of their software. They do not make versions for Linux even though this would be more than possible. This is a business decision: there are currently not enough Linux users to justify the cost of making a third or fourth code base to target this operating system.

In other situations, developers target one system exclusively and do not produce any versions for other operating systems. Examples would include games that are made exclusively for one console and no other system, or Microsoft Windows, which runs predominantly on Intel/AMD based PCs.

The solution to this cross platform problem is usually to use an interpreted language.

Interpreters

Lots of modern languages are interpreted at run time – Python is one very popular example. There are at least three different methods of interpretation that are commonly used by modern programming languages, but all are based on the same core idea:

Interpreters translate program code into machine code line by line, at run time, as the program executes.
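The line by line idea can be sketched with a toy interpreter. The mini language below (SET, ADD, PRINT) is invented purely for this example, and a real interpreter is far more sophisticated, but the behaviour is the important part: each line is dealt with and run before the next one is even looked at.

```python
def interpret(source: str) -> list:
    """Execute a made-up mini language one line at a time."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        # Each line is translated and executed before the next is examined.
        if words[0] == "SET" and len(words) == 3:      # SET name value
            variables[words[1]] = int(words[2])
        elif words[0] == "ADD" and len(words) == 3:    # ADD name value
            variables[words[1]] += int(words[2])
        elif words[0] == "PRINT" and len(words) == 2:  # PRINT name
            output.append(variables[words[1]])
        else:
            output.append(f"Error on line {line_number}: {line}")
            break  # stop at the first line that cannot be run
    return output


program = """SET score 10
ADD score 5
PRINT score
JUMP nowhere
PRINT score"""

print(interpret(program))  # [15, 'Error on line 4: JUMP nowhere']
```

The first three lines run successfully before the problem on line 4 is discovered. A compiler, by contrast, would translate the whole program first and report the error before anything was run.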

The difference between compilation and interpretation.

There are many advantages to interpreted languages:

  • Code may be written once and run on many platforms – it is usually cross platform. This is because each platform will have its own, specific interpreter which will deal with machine specific translations. JavaScript is probably the most cross platform code of all – it runs in any modern web browser on any platform which can run a browser!
  • Interpreters can be paused or interrupted making them ideal for debugging. Interpreters allow line by line execution and inspection of the code during run time in order to monitor a program and find errors.
  • Code written in interpreted languages usually results in smaller file sizes. By definition, it is not compiled and therefore will not come with all the resources necessary for the program to be executed, making interpreted code much smaller and easier to distribute.

Disadvantages of interpretation are:

  • Overhead! There is a certain amount of processing power and time required to run an interpreted program each and every time it is executed.
  • An interpreter must be written for each target platform.
  • An interpreter must be installed on the target platform.

The line between compiled languages and interpreted languages is extremely blurry these days. Languages like Java and .NET languages (such as C#) do actually feature some level of compilation. Java, for example, uses an imaginary “virtual machine.” When you write Java code, it is first compiled into what is known as Byte Code. This is an intermediary language – it is not high level, but it is also not machine code. This compiled Byte Code can then be fed into a Java Virtual Machine which then interprets that code and executes it. This intermediary byte code / compilation speeds up the execution of interpreted code as some of the work is already complete before execution.
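The standard version of Python works in a broadly similar two stage way, which makes the idea easy to try out. The sketch below uses Python’s built-in compile and exec functions; the variable names are just for illustration, and this shows Python rather than Java.

```python
source = "total = 2 + 3"

# Stage 1: compile the source text into bytecode (this is not machine code).
code_object = compile(source, "<example>", "exec")
print(type(code_object.co_code))  # <class 'bytes'>

# Stage 2: hand the bytecode to Python's virtual machine to be executed.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 5
```

The bytecode is the same whichever computer it was produced on; it is the virtual machine installed on each platform that deals with the real CPU.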

Fortunately, you don’t need to know this level of detail for your GCSE, but you will if you go on to do A-Level!

2.5.2 – The Integrated Development Environment (IDE)

Visual Studio is one example of an IDE – software which contains all the tools necessary to create and debug program code.

What is an IDE?

In the early days of computer programming there were very few tools available to developers, usually just a compiler and if you were very lucky, perhaps some form of debugging tool. As computers grew in capability and complexity, so did the programs that were created to run on them. The more complex a program is, the more difficult it becomes to manage.

Lots of tools have been developed to help programmers to write, debug and maintain their code. Just focussing on the act of writing code, we now have tools which colour code to make it clearer and easier to understand, code completion so we don’t make spelling mistakes and pretty printing so code is laid out nicely and auto indented.

Modern programs are split into many different files. Some may contain code, others assets that are used in the program such as graphics and sound. These files may also be edited and worked on by teams of people, some of whom are not even in the same building. Again, we have project management tools and source code control tools to help keep code organised and easy to manage.

It should be obvious that with so many tools available to programmers it would be cumbersome to have these all split up on your computer or installed separately. An IDE is the solution – bringing all of these tools together in one place, making them work together in a seamless fashion.

IDE stands for “Integrated Development Environment” and is a collection of tools for the creation, maintenance and debugging of program code.

IDEs allow you to create program projects, manage the source code, write code, debug code and ultimately package that code so that it can be released and installed on other computers. There are lots of examples of IDEs – some are standard applications installed on your machine, such as Visual Studio, IDLE and Eclipse. Some even run through a web browser and allow you to develop code in an entirely online environment.

Tools and features of an IDE

Any IDE will contain three core features:

  • Code Editor
  • Compiler/Run Time Environment
  • Debugging tools

There may well be other features available, but these are the ones that we need to focus on for the exam.

The code editor

The code editor is the place where you write the actual program code. The code editor serves the following purposes and has the following features:

  • Create, edit, modify code.
  • Line numbering – so you can quickly identify a specific line of code (especially when correcting errors).
  • Pretty printing – usually the indentation of code so that you can see which code “belongs” to each block in the program.
  • Syntax highlighting – colouring the code so key words, variable names and so forth stand out. This helps debugging because if a syntax or grammar error is made then the subsequent code will be the wrong colour, thus highlighting the error.
  • Code completion – suggestions automatically pop up as you are typing to suggest variable names, function or procedure names and so forth. This can speed up development but more importantly helps to avoid syntax and typing errors.
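Syntax highlighting can be sketched very roughly in Python. A real code editor uses a much more thorough analysis of the code, so treat this as a simplified illustration only. The function name mark_keywords is made up, and square brackets stand in for colour.

```python
import keyword
import re


def mark_keywords(line: str) -> str:
    """Wrap every Python key word in [square brackets] to stand in for colouring."""
    def mark(match):
        word = match.group(0)
        return f"[{word}]" if keyword.iskeyword(word) else word

    # The pattern matches whole words made of letters, digits and underscores.
    return re.sub(r"\b[A-Za-z_]\w*\b", mark, line)


print(mark_keywords("if score > 10: return True"))
# [if] score > 10: [return] [True]
print(mark_keywords("fi score > 10: return True"))
# fi score > 10: [return] [True]
```

In the second line the mistyped “fi” is not marked, which is exactly how a missing colour in the editor hints that something is wrong.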

The run time environment

An IDE will provide tools that enable you to quickly run and test your programs as you develop them. Usually this is a single button press or keyboard shortcut that begins program execution. The tools an IDE uses will vary but, in most cases, they will include an interpreter, a compiler or both. Even for languages that are compiled, an IDE may have a built-in interpreter that enables you to execute program code one line at a time to help when debugging.

Run time environments do more than just compile and execute code, however. A decent IDE will monitor the program whilst it is running to give you some idea of how it performs, the resources it is using (CPU, memory, disk access etc) and provide tools to pause/stop execution at any time.

Debugging Tools

One of the most valuable assets of any IDE is its set of debugging tools. Most IDEs will provide all or some of the following:

  • Breakpoints
  • Code stepping
  • Variable watch
  • Error messages and diagnostics

Breakpoints are used to pause the program at a certain line of code. The IDE then allows you to drop back into the code editor and see what is happening whilst the program is running. For example, you could hover the mouse over a variable name and it would show you the contents.

You can then make use of tools such as code stepping which allow you to execute one line of code at a time. Each time a line is executed you can view the output, look how variables have changed or simply observe where the flow of execution goes to make sure that the program is running in the order you expected it to.

Variable watch is a window in the IDE which displays a list of selected variables and their contents at run time. By observing these outputs you can determine if a variable holds the value expected or not and when it changes to the incorrect value to give you an idea of which piece of code is causing unexpected results.
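An IDE does this for you automatically, but the idea behind a watch window can be imitated by hand. In the sketch below, the list called watch is a stand-in for the watch window and the function contains a deliberate logic error; all of the names are made up for this example.

```python
def buggy_total(prices):
    """Intended to add up every price, but contains a deliberate logic error."""
    watch = []  # stands in for the IDE's watch window
    total = 0
    for index, price in enumerate(prices):
        total = price  # BUG: this should be total += price
        watch.append((index, price, total))  # record the values after each step
    return total, watch


total, watch = buggy_total([5, 10, 20])
for index, price, running_total in watch:
    print(f"index={index} price={price} total={running_total}")
# index=0 price=5 total=5
# index=1 price=10 total=10
# index=2 price=20 total=20
print(total)  # 20, not the expected 35
```

Watching total at each step shows that it is being replaced rather than added to, which points straight at the faulty line.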

Error messages come in many forms but usually alert you to the type of error and where it occurred in the program. For example, if you attempted to access part of an array which didn’t exist you might get an “index out of range exception” which tells you fairly clearly what the problem was. This makes solving run time or logic errors much easier!
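Here is a small Python sketch of that situation. Python calls this error an IndexError; other languages use slightly different names and wording for the same problem.

```python
scores = [72, 85, 91]  # valid indexes are 0, 1 and 2

try:
    print(scores[3])  # there is no fourth item in the list
except IndexError as error:
    message = f"{type(error).__name__}: {error}"
    print(message)  # IndexError: list index out of range
```

The message names both the type of error and the cause, which is usually enough to find the faulty line quickly.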