AI News, Difference between revisions of "Foundations of Computer Science"

Difference between revisions of "Foundations of Computer Science"

Have you ever wondered what computing is and how a computer works?

If we attempt to understand the concepts by studying the technologies result from researches in computing and computers, it can be overwhelming as technologies can be very complicated and constantly changing.

Another approach is to try to strip away all unnecessary complexity and focus on the fundamental ideas and underlying principles.

Because they are fundamental they should help develop a deeper understanding of discipline, for instance, we study general biology to get a big picture of biology.

Due to the abstract nature of the material we will use lots of analogies and metaphors to illustrate concepts to draw new concepts into our frame of understanding.

In this book we will try to unpack and illustrate big ideas, such as the nature of computing, representations of information, algorithms, abstraction, recursion, and computability, and use modern technologies to .

In this course, we try to focus on computing principles (big ideas) rather than computer technologies, which are tools and applications of the principles.

Computing is defined by a set of principles or ideas, which underlies a myriad of teachnologes that are created based on the principles.

In the second half of the course, we will study various technologies to demonstrate the power of computing and how principles are applied.

In addition to principles of computing and technologies there are practices of computing - what professionals do to advance computing.

As professionals in the field of computing we need to know the two ends and everything in the middle - the practices (activities and skills that make computing useful and effective).

One of the big ideas of computing is that information processes can be carried out purely mechanically via symbol manipulation.

Toward the end of the book we will see this is true for all modern computers - digital computers manipulate two symbols (zero and one) blindly according to instructions.

We can define a procedure P that takes two symbols ('a' through 'j') as the input and produces two symbols in the same set as the output.

Internally, the procedure uses the first input symbol to find a row that starts with the same symbol, then uses the second input symbol to find a column with the same symbol at the top, and then report/return the symbols at the cross point.

It is not hard to imagine such a table lookup procedure can be done purely mechanically (blindly) by a simple agent (e.g.

For example, if the symbols 'a' through 'j' represent quantities of 0 through 9 respectively, this procedure performs single decimal digit addition.

For instance, p(d, f) = p(3, 5) = ai = 08, which is the correct result of 3+5.

Now that we have a simple procedure P that can instruct a simple agent to add two single digit decimal numbers, we can devise a new procedure P1 that can add three single-digit decimal numbers as shown in the following chart.

The new procedure P1 employs three instances of procedure P to add three decimal digits and return two digits as the result.

We can view the procedures as machines with inputs and outputs and the lines are pipes that allow the symbols to go from one place to another place.

Note that the dotted rectangle represents the new procedure P1 made up of instances of P and the answer given by P1 for the sample inputs is correct.

Again the symbols used in the process can be any set of symbols because internally simple table lookups are performed.

P2 uses P1 to add two double-digit numbers, in fact we can simply add more P1s to the design to deal with any numbers of digits.

If we follow the same line of reasoning, it is not hard to imagine we can create increasingly more complex procedures to instruct the simple machine to do progressively more intelligent things, such as

In summary, from this example we can see that simple symbolic operations can be assembled to form lager procedures to perform amazing activities through computational processes.

If we can represent abstract ideas as symbols (as we represent abstract quantities as concrete numbers) and device procedures to manipulate the symbols according to the relations among the ideas we can model reasoning as computational processes.

This is what computer science is fundamentally about - information processes with two essential components: representations and a sequence of rules for manipulation of the representations.

The machine that carries out such processes does not need to know the meaning of the symbols and why the process yields correct results.

As an example, you can read about a mechanical computer (difference engine) designed by Charles Babbage that can tabulate polynomial functions: Richard Feynman used another similar analogy (file clerk) to explain how computers work from the inside out in his Computer Heuristics Lecture (1 hour 15 mins):

A computer's ability to perform amazing tasks depends on its ability to manipulate symbols according to well defined rules.

Gottfried Leibniz (1646-1716), a German philosopher, is considered the first person to dream of reducing reasoning to calculation and building a machine capable of carrying out such calculations.

His observed that in arithmetics we represent abstract quantities using symbols and manipulate the symbols to get useful results according to rules.

He dreamed that we could represent abstract ideas using symbols and reason with the ideas according to the logics between the ideas via similar concrete symbol manipulation as we do in arithmetics.

Such manipulations give use correct results not because whoever does the manipulation is intelligent but because the rules of manipulation mirrors relationships between quantities and logics between ideas.

A computer is fundamentally a physical device that can manipulate symbols following very simple logic rules.

Computer science is fundamentally about the information process (dealing with abstract ideas) that takes place through symbol manipulation, which follows a recipe (a set of rules).

No wonder so many computer programming books are called cookbooks :) In computer science we study how to represent information and how to design and apply algorithms to get meaningful results.

The engineering disciplines we try to apply in the software development process to produce quality software is called software engineering.

Practices, sometimes called 'know-hows', define someone's skill set and the level of competency: beginner, competent, and expert.

Programming an integral part of computer science because it allows us to explore abstract ideas in computer science in concrete ways.

In this course we will program in a very high-level graphical programming environment to explore ideas in computer science.

If we can represent information using symbols and know how to process the symbols and interpret the results properly we can get valuable new information.

Google published its self-driving car video on March 28 2012 to showcase its technology in building autonomous cars.

For example, we receive information when we read a book, listen to a story, watch a movie, or dream a dream.

We give information when we write an email, draw a picture, act in a show or give a speech.

For instance, a conversation on the phone communicates information but the information is represented by sound waves, electronic signals and etc.

Therefore before any information can be processed or communicated it must be quantified/digitalized - a process that turn information into (data) representations using symbols.

Shannon is considered the father of information theory because he is the first person who studied and built mathematical models for information and communication of information.

His seminal paper “A mathematical theory of communication” (1948) changed our view of information laying the foundation for the information age.

Shannon discovered that the fundamental unit of information is a yes or no answer to a question or one bit with two distinct states, which can be represented by only two symbols.

He also founded the design theory of digital computer/circuit by proving that propositions of Boolean algebra can be used to build a 'logic machine' capable of carrying out general computation (manipulation of two types of symbols).

The external representation is used for communication between human and computers, but internally all information is stored and processed using a more unified representation.

Everything we see on a computer (monitor) screen, whether it is text, image, or motion picture, is a representation of certain information.

Computers also represent information externally using sound and other media, such as touch pad for the blind to read text.

This makes it easy to represent bits physically - any device capable of having two distinct states works, e.g.

With ten fingers humans conveniently adopted the base ten (decimal) numbering system, which requires ten different symbols, for instance the arabic numerals uses 0 through 9.






In fact we commonly use other base systems to represent quantities of different nature: base 7 for drays in a week, base 60 for minutes in an hour, 24 for hours in a day, 16 for ounces in a pound, and etc.

It is not hard to imagine base 2 (two symbols) is the simplest base system because with less than two symbols we can not represent change therefore no information.

All base systems work in the same way - the right most digit represent quantity of the base raised to the zeroth power (always 1), and each digit to the left represents a quantity that is base times larger than the one represented by the digit immediately to the right.



When we use different base systems it is necessary indicate the base as the subscript to avoid confusion, for example it becomes clear that



The computer science unplugged organization produced an class activity to demonstrate the idea of binary numbers using a deck of cards.



In studying computing we often need to convert between decimal representation, which we are most familiar with, and binary representation, which is used internal by computers.

Converting the binary representation of a non-negative integer to its decimal representation is straight-forward process - summing up the quantities each binary digit represents yields the result.








by dividing 23 by 2, writing down the remainder in the third column, and repeating the same process on the quotient till it becomes zero.

In addition to numbers bits are used internally to represent many other types of information or external representations of information.

piece of text can be viewed as a stream of symbols can be represented/encoded as a sequence of bits resulting in a stream of bits for the text.


perceived image is the result of light beams physically coming into our eyes and triggering nerves to send signals to our brain.

In computing an image is simulated by a rectangular grid of dots (pixels - picture element), each of which has a particular color.

In fact, the computer screen use such a grid of pixels to display images and text, each symbol of which is also represented by a grid of pixels externally but internally each symbol is represented as an ASCII or Unicode code.

In addition the encoding of an image includes other meta data, such as the size of the image, the encoding standard, and the date time when it is created.

It is not hard to imagine that videos can be encoded as series of image frames with synchronized audio tracks also encoded using bits.

For instance, if you were to represent a decimal number by writing it down on a piece of paper the size of the paper and the size of the font limit how many digits you can put down.












Note that they are all powers of twos because they are limited by the number bits available for the counting (representing the counts):







For example, one third can never be represented precisely by a decimal format with a fractional part because there will be an infinite number of threes after the decimal point.




The following video demonstrates how limits in the binary representation can cause numerical errors in computing precisely because computers use binary representation internally: This is a limit that is hard to overcome completely but there are practical means to deal with such round-off error so that the computational results are still useful.

We learned that computers use the binary system internally to represent everything as sequence of bits - zeros and ones.

Chapter 1 of the Blown to Bits book talks about the digital explosion of bits as the result of the innovations in computing and technologies, which enable us to turn information into bits and shared them with unprecedented speed.

We will learn that information processes start with conceptual solutions to problems and then can be implemented (coded) in ways understandable to machines.

You are already familiar with many algorithms, such as tying your shoes, making coffee, send an email, and cooking a dish according to a recipe.

For an algorithm to be useful, it must be correct - the steps must be logical and specific for machines to carry out - and efficient - it must finish in a reasonable amount of time.

Alan Turning is the first person who studied algorithms mathematically by creating a universal machine model, later called Turing machine.

He also proved that computation is unavoidable in circumstances - we will have to perform the computation to the result, which separates computing from mathematics (the birth of computer science).

The turing machine can represent/encode information in a standard form and interpret and updates the representation according to rules (algorithms), which is part of the representation.

For instance, we can view each algorithm as a state machine: an algorithm always starts with a state - represented by a data representation of the input and the internal information - and move through a number of states to its final state as the result of performing operations prescribed in the algorithm.

When the number of possible initial states approach infinity the same algorithm can generate potentially infinite number of computation.

In computing we deal with representations of information (data), so a step can be adding two integers and storing the result to a variable.

Machine languages are written in instructions consisting of zeros and ones (binary bits) because computers are fundamentally machines that can manipulate two types of symbols.

Each different type of machine is designed to understand its own native language -- patterns of zeros and ones -- because they can be very different computing hardware.

Normally we write programs to express our algorithms in high level languages - languages that are close to our natural language, e.g.

Then we use tools (compilers and interpreters) to translate our programs in higher level languages to machine languages, which is analogous to using a personal interpreter when we travel abroad and don't understand the native language.

High level languages hides the differences between machines to allow us to write programs in a machine independent way, which is a huge time saver.

So the steps or units of work must be defined in terms of a higher abstraction - a set of common data structure, operations and control structures that all languages support.

By creating algorithms conceptually using abstractions allows us humans to think on a higher level closer to the problem domain we know.

The following diagram shows that the same algorithm can be implemented in a variety of programming languages and the result programs can be executed on different machine models (maybe after some translation).

We know how to carry it out manually, but we would certainly not solve the problem using the process expressed in psedo code.

The psedo code, even though in natural language, must use the aforementioned constructs (data structures, operations, and control structures) that are common to computers.

Even we, as human beings, will not be able to scan a long list, say a million numbers, and find the largest number at a glance.

The algorithm we write must be expressed in terms of what a computer can do and are scalable to inputs (data sets) of arbitrary sizes.

In fact, it makes little difference to a computer whether the list has three numbers or three million numbers.

The top-most conditional defines a repetition (loop) because their an unconditional branching back to the conditional as expressed in the arrow with no label.

In summary constructing and studying algorithms allows us to study (algorithmic) solution to problems in a language and computing environment neutral way.

We could use the 'finding largest number' algorithm as a single block to form more complex algorithms, but we need to keep in mind this block is not a unit of work as the number of steps involved depends on the input size (the length of the list).

We will revisit this topic in the future when we talk about functional decomposition (to keep algorithms simple) and algorithm complexity analysis (count the cost of algorithms).

For instance, if we have a list of students we can search a student by last name, birthdate, or shoe size, which are search keys.

However, if the list is ordered by the search key we can use a much better algorithm by taking advantage of this ordered property of the data.

For instance, home phone numbers in a phone books are usually ordered by owner's last names and business phone numbers are ordered by business types.

If the list of numbers are random but ordered increasingly we can always guess the target number is in the middle of the search range (initially the whole list).

If we are not lucky, we can eliminate half of the candidates - if the number in the middle is greater than the search target the new search range will be the first half of the previous search range, otherwise the second half.

When solving a Rubik's cube (of any size), it is important to know the step by step instructions or algorithm to reach an efficient solution.

The top down approach to design is starting by examining the full problem or one way to think of it is to look at the big/whole picture first.

The general will breakdown a mission by assigning each soldier with a specific task to complete that in turn contributes to a critical part of the overall mission.

Using this method ensures that the smallest unit has been successfully tested and therefore, when you start solving or implementing the next sub-solution that it will work due to the previous layers working successfully.

The following four basic logic structures describe the type of blocks used for creating algorithms:

According to Wikipedia, a constructed language 'is a language whose phonology, grammar, and vocabulary has been consciously devised for human or human-like communication, instead of having developed naturally'.

Computing algorithms can be described/defined in terms of the common units of work that computers can do so that they are neutral to programming languages and the execution environment (computers).

If we liken a computer to a car, programmers are drivers who can make the car do different things (to a certain extent) and users are like passengers taking advantage of the things the car can do.

This type of errors are easy to fix, in fact most modern program editors can detect and warn us about such errors.

Another more subtle bug may cause the computer program to never finish, known as the infinite loop, which is obviously not what we wanted.

If the actual output matches the desired output the program passes the test, otherwise their is a bug in the program or in the test (tests can be buggy too).

When a program becomes larger and more complex the number of tests need to cover all possible cases grows very fast.

There are usually always multiple ways to solve the same problem and, therefore, multiple algorithms to make computers solve the problem for us.

In computing we care the most about the solution speed because a fast algorithm can solve larger problem or more problems in the same amount of time.

One possible approach is to implement the algorithm, run the result program, and measure its execution time - elapsed time between the start and the end of a program run.

Secondly, to run two programs to compare their execution time we must subject them to the same input (a.k.a a workload, e.g a list of one million numbers to be sorted) and the size of the input is never ideal.

The amount of food needed will surely magnify the difference - as we increase the amount of ingredients the lengthy recipes will take even longer to follow.

If a recipe involves beating eggs can instruct the cook to break each egg and scramble it individually or break all eggs first and scramble them together.

Recall that algorithms must be defined in terms of the units of work (steps) computers can perform so that it is straightforward to implement them in programming languages.

The way the steps are ordered and organized in algorithms can significantly affect the execution time of the programs implement the algorithms.

In this approach we take an algorithm described in pseudo code or flow chart, count the number of steps (units of work), which is always a function of the input size.

In the aforementioned example recipe the time it takes to follow the first method (break each egg and scramble it individually) is directly proportional to the number of eggs involved.

Instead of measuring the steps taken for a particular input size, we focus on the relationship function between the number of steps and the input size, which shows the pattern in which the amount of work (cost) grows as the input size increases.

Then, we apply asymptotic analysis, which compare functions as inputs approach infinity, to simplify the functions because as the input size approaches infinity the difference between the units of works disappears (we can assume breaking an egg and scrambling it take the same amount of time) and the cost of most 'complex' part of the task will dominate (a part that repeats a step 10 time will dwarf any step that is only done once) the total cost.

For example, in a recipe we have a one quick step (takes 0.0001 seconds per serving) that repeats 10 times and a slow step (takes 10 seconds per serving) doesn't repeat, an amount of serving of N (input size) would cost

In asymptotic analysis we can ignore the slow step because its contribution to the total cost is negligible when N approaches infinity.

With simplify growth functions we can put them into categories considering algorithms in each category to have similar performance characteristics.

This type of analysis is not completely precise but it can be very useful in studying the nature of algorithms and predicting their performances.

The following script implements algorithm 1 using a block that iterates through all the numbers in the range and adding them to the sum one at a time.

The same problem can be solved using algorithm 2, which use a function to calculate the result as shown in the following script and the report block.

If you run the programs and try different input size you will observe that as you increase the input size the execution time of the first program increases steadily whereas the second program shows not change in execution time.

The first algorithm loops through the numbers to get the sum, therefore the relationship function between cost (the number of steps - additions) and the input size is

We assign algorithms with this type of growth function into the leaner algorithm to the leaner this type of functions linear time category denoted by

The second algorithm always takes the same amount of time because it performs the same set of operations regardless of the input size.

When we analyze algorithms we focus on worst cases because the performance of an algorithm is dependent from the actual input and we should not rely on luck!

because in the worst case (all numbers are unique) the algorithm has to compare each number to all the other numbers in the list before it arrive at the answer.



If you are interested you can reproduce and run the following script (an implementation of the algorithm) to verify the predicted performance characteristics of the algorithm.

This example is a little contrived but it demonstrate a new performance category and is very useful when we discuss encryption techniques.

Assuming outputting a number is the unit of work (it takes the same time output a number) the cost to generate all one digit numbers is 10.





One technique we use to keep our algorithms and programs simple is abstraction, which is an idea widely used in many fields such as art, math and engineering.

The internal working of a car is unnecessary detail to drivers unless they are race car drivers who need this type knowledges to drive the car most efficiently.

This car interface extracts common car features (what a driver needs to know to drive a car) from all kinds of cars.

We have learned that algorithm design is a integral part of problem solving using computing and a first step in programming.

you can implement an algorithm as a block, which then can be used anywhere in your script as long as you can call the block with a proper sequence of parameters according to the interface.

Layering separates the functional units (blocks) into layers to separate concerns and simply interactions patterns to make the complexity more manageable.

On the contrary if any arbitrary interaction is allowed we may end up with tightly coupled system as shown in the following figure.

The machine was built through abstraction - we construct larger building blocks using smaller (more elementary) ones and treat each block as a unit ignoring the internal details.

In order to make a sprite to draw different equal-lateral shapes we can create a block for drawing each type of shapes such as triangles, squares, pentagons, and etc.

The draw triangle block repeat three times to draw each side with a turn of 120 degrees.

Through this example, we have demonstrated defining blocks to abstract away details of a task and generalizing a solution to solve more problems.

Similarly, a directory tree of a file system on a computer and an ancestry tree genealogy exhibit a similar pattern.

can be calculated from 4!, which is 4 times 3!, which is 3 time 2!, which is 2 times 1!, which is 1 by definition.

If the problem we are solving is self-similar - solving the whole problem is a matter of solving the parts that are similar to the whole - the solution we are defining for the whole can be used to solve the subproblems(recursion).

To design a recursive solution we practice wishful thinking - as we describe the solution to whole problem we wish it is fully defined and can be used to solve the smaller subproblems.

To program recursively, we describe the solution to a problem as a function/procedure/block, in which we break the bigger problems into smaller ones and use the function we defining to solve the smaller problems.

By the time our program reaches all base cases, we would have solved the whole problem because all the subproblems are solved including the problem we start with.

If both parts exists and are structured properly, the algorithm (function) can solve problems of any size by asking clones of itself to solve partial problems.

Recursive problem solving is a powerful idea because it simples our thinking: if you can define a problem recursively you can solve it recursively.

To search one half of the ordered list is just a smaller version/part of the overall problem so we can apply the same algorithm wishing the algorithm is fully defined.

higher order functions offer a more powerful ways to generalize solutions to problems by allowing blocks to take blocks as parameters and returning a block as a return value.

An example higher order function in math is the derivative function which takes a function as the input and produces another function (the derivative of the first function) as the output.

In computer science, a map function takes an arbitrary function and a data set (e.g.

Another example is the reduce (or fold) function, which takes a input function and data set and produces the aggregation of all items in the data set using the function.

For instance, if the input function is the addition the reduce function returns the sum of all items in the data set as the output.

It doesn't simply programming because someone has to write this map block (check the source code to see how complicated it is), but it makes programmers happier because it keeps the thinking part simpler.

Note that the reduce (combine with) function can take any function with two input parameters to aggregate a list of values to a single value.

By using higher order functions we can create generalized solutions that can be customized to solve a larger set of problems.

In this block two reporter blocks are taken in as parameters, which are then used to form the report value - a new block.

To use the composed function we can call the compose function with the two input functions and use the 'call' block to execute the composed block against an input value as shown in the following script.

With this 'compose' block we define new functions by combining existing functions on the fly - when we need them.

In 1936 Alan Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist, i.e.

More specifically, the halting problem is undecidable because it is the simplest type of question that answers a yes or no (decision) question: whether a program will eventually halt (stop) given an input.

Obviously it is infeasible to actually run the program with the input to see whether it will halt because it may take forever if there is an infinite loop in the program.

All proofs by contradiction start with an assumption that the proposition we are trying to proof is false, follow a logical sequence of valid steps, and arrive at a conclusion that is clearly false or contradicts, which is also false because a statement can be either true or false but never both.

If we assume the halting problem is decidable/solvable we should be able to design an algorithm and implement it as a program that solve the problem for us.

The program should take a program and the input to the program as a input parameters and return the answer - whether the program will halt or not on the input.

takes a block (program) as input date and an interpreter program takes the source code of a program as data and runs the program.

On the contrary if the best known algorithm we know takes a long time to solve a problem, it is hard because computers cannot solve it fast.

If the big-O notation of a category contains only polynomial terms, the problems solvable using algorithms in this category are called P problems (Polynomial time solutions exist), such as



For example, integer factorization (or prime decomposition) problem has no known polynomial time solution but given an answer we can verify it very quickly by simply multiplying the factors and comparing the result to the integer.

Collectively we call problems that take too long to solve intractable problems, which include problems with best algorithm in exponential time (






Obviously P is a subset of NP because NP is defined only by polynomial answer verification time and being able to solve a problem in polynomial time (P) certainly qualifies for that.

All problems in this category are NP problems sharing one special property - ALL other NP-complete problems can be translated/reduced to each of the NP-complete problems in polynomial time.

Most encryption algorithms are computationally secure because breaking them are NP-problems - there is no known efficient polynomial time solution.

Intro to Algorithms: Crash Course Computer Science #13

Algorithms are the sets of steps necessary to complete computation - they are at the heart of what our devices actually do. And this isn't a new concept. Since the ...

Algorithm using Flowchart and Pseudo code Level 1 Flowchart

Algorithm using Flowchart and Pseudo code Level 1 Flowchart By: Yusuf Shakeel 0:05 Things we will learn ..

Programming Basics: Creating an algorithm/flowchart and then adding a counter.

via YouTube Capture.

Representing Numbers and Letters with Binary: Crash Course Computer Science #4

Please take our PBS Digital Studios Survey! Today, we're going to take a look at how computers use a stream of 1s ..

Early Computing: Crash Course Computer Science #1

Hello, world! Welcome to Crash Course Computer Science! So today, we're going to take a look at computing's origins, because even though our digital ...

Algorithms lecture 2 -- Time complexity Analysis of iterative programs


Time complexity analysis - How to calculate running time?

See complete series on time complexity here In this lesson, we will see how to ..

1. What is Computation?

MIT 6.0001 Introduction to Computer Science and Programming in Python, Fall 2016 View the complete course: Instructor: Dr. Ana ..

Quantum Algorithm - The Math of Intelligence #10

Quantum Computing offers hope for computing progress as we approach the limits of transistor density on silicon hardware. We're going to talk about the theory ...