CS2106 in the news: OS for Cells

Bid to program new life forms with ‘operating system’ for cells.
http://edition.cnn.com/2011/11/18/tech/innovation/program-living-cells/index.html

“For each single application you have in mind, you have to start from scratch, and you have to start all of the design of the biological roots from scratch,” he said. “The analogy in the computer industry would be that each time you write a computer program you have to write the entire operating system.”

That’s why Krasnogor hopes his team will be able to create a line of cells running a generic “cellular operating system” that could be re-programmed with different applications.

Questions from Final Exam 2010/11

The number of comments discussing the past year questions hit a record of 100 today. I thought it would be useful to summarize the confusing points from the discussion in a post here.

Q1

  • There is some confusion over Choice A.  A microkernel fits the description in A, but we can still implement things in user space without resorting to a microkernel architecture.  One example is scheduling threads in user space instead of kernel space (which we did not cover this semester).

Q2

  • Running exec() does not change the pid of the current process.  Only the core image is replaced.
  • Since exec() replaces the core image of the process, any line of code after exec() will not be executed (or printed) unless exec() fails with an error.
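To make the second point concrete, here is a minimal sketch (the program being exec'ed is just an example, not from the exam): the pid printed before exec() belongs to the same process that runs ls afterwards, and the lines after execlp() run only if the call itself fails.

#include <stdio.h>
#include <unistd.h>

int main()
{
    printf("before exec, pid = %d\n", (int)getpid());

    /* replace the core image of this process with ls; the pid is unchanged */
    execlp("ls", "ls", "-l", (char *)NULL);

    /* reached only if execlp() fails (e.g., the program cannot be found) */
    perror("execlp");
    return 1;
}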

Q3

  • A context switch can happen when you make a system call (Lecture 2) and when a page fault occurs (Lecture 9).

Q4

  • The confusion is over Choice D.  Disk I/O takes longer if there are multiple processes contending for the same disk.  The disk scheduler would have more requests to serve, and a process's disk I/O request would have to wait longer.

Q8

  • The confusion is basically over the meaning of “when making a function call.”   The most direct interpretation (to me, at least) is when we are already inside the function f.  If you interpret it any other way (e.g., somewhere between the caller and the callee), then it becomes ambiguous.

Q10

  • Kernel code and data structures are normally pinned in physical memory, since the kernel is accessed very often and we want to avoid irresolvable situations (imagine if the page fault handler is paged out: who will handle the page fault then?).

Q11

  • Two things you need to consider are: (i) how to access the end of a huge file quickly, and (ii) how to support reading a huge file backward (from end to beginning).

Q12

  • There is some confusion over Part (c).  As discussed in the tutorial, the size of the swap space does not affect the number of page faults.

Q15

  • In RR, reducing the time quantum improves the response time.  This question extends this principle to MLFQ: if we reduce the time quantum of the low-priority jobs, the high-priority jobs get their turn faster as well.

Lab 8: Comments

Lab 8 has been graded. The grades and comments are available through IVLE gradebook.

As usual, please take a look and let me know if you see any unpleasant surprises.

Most of you did not set up the pipes properly. When I print out what you pipe/dup/close, things are pretty messy: some create extra pipes, some do not close unused ends, some close everything, some dup the file descriptors in the wrong order, etc.
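For reference, here is a minimal sketch of the usual order of operations for a single pipe (two hard-coded commands, not the general Lab 8 setup): create the pipe before forking, dup2() the correct end in each child, and close every descriptor that is no longer needed, including in the parent.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* a minimal sketch: run "ls | wc -l" with one pipe */
int main()
{
    int fd[2];

    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    if (fork() == 0) {               /* first child: writer (ls) */
        dup2(fd[1], STDOUT_FILENO);  /* stdout goes into the pipe */
        close(fd[0]);                /* close the unused read end */
        close(fd[1]);                /* already duplicated, no longer needed */
        execlp("ls", "ls", (char *)NULL);
        perror("ls"); exit(1);
    }

    if (fork() == 0) {               /* second child: reader (wc) */
        dup2(fd[0], STDIN_FILENO);   /* stdin comes from the pipe */
        close(fd[1]);                /* close the unused write end */
        close(fd[0]);                /* already duplicated, no longer needed */
        execlp("wc", "wc", "-l", (char *)NULL);
        perror("wc"); exit(1);
    }

    /* parent: close both ends so the reader sees EOF, then wait */
    close(fd[0]);
    close(fd[1]);
    wait(NULL);
    wait(NULL);
    return 0;
}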

We deduct marks depending on how serious your bug is.

It’s very sad that, despite multiple warnings, students still do not follow instructions, resulting in extra time needed to grade the labs and delaying the release of marks for everyone. As promised in the lab sheet, 3 marks are deducted for these students *heart pain*

Errata: Implementation of Soft Link using i-node

I would like to correct a factual error I made in class.

Suppose we have a file P, which is a soft link to another file Q. During the lecture, I incorrectly said that the address inside the i-node of P contains the i-node number of Q. This is WRONG. The i-node of P contains the addresses of P's data blocks on disk, just like for a normal file. To implement the link, we store, in those data blocks on disk, a string containing the path to Q.
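As a quick illustration (a sketch with hypothetical file names, assuming a POSIX system and that a regular file Q.txt already exists in the current directory): readlink() returns exactly the stored path string, and lstat() shows that the link has its own i-node, distinct from Q's.

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main()
{
    char target[256];
    struct stat p, q;

    /* P is a soft link to the (assumed existing) regular file Q.txt */
    if (symlink("Q.txt", "P") == -1)
        perror("symlink");

    /* the data stored for P is just the path string "Q.txt" */
    ssize_t n = readlink("P", target, sizeof(target) - 1);
    if (n != -1) {
        target[n] = '\0';
        printf("P stores the path: %s\n", target);
    }

    /* lstat() examines the link itself; P and Q.txt have different i-nodes */
    if (lstat("P", &p) == 0 && stat("Q.txt", &q) == 0)
        printf("i-node of P = %lu, i-node of Q = %lu\n",
               (unsigned long)p.st_ino, (unsigned long)q.st_ino);
    return 0;
}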

Here is a video that re-explains this: http://goo.gl/3pbt3

What are the pros and cons of these two different approaches?

If we store the path to Q, then we need to take extra steps navigating through the file hierarchy to find the i-node of Q, whereas if we store the i-node number of Q, we can access Q directly. On the other hand, if we store the path to Q, Q can be a file residing in a different partition.

But here is the deal breaker: what happens after Q is deleted? In both approaches, when we access P, we will encounter an error. But if we store the i-node number of Q, that i-node number can be reused by another file! So P could unintentionally point to another file at a later time. This is not good.

Sorry for the confusion caused by this error.

Lab 8: Solution

We are in the middle of grading Lab 8. In the meantime, take a look at one possible solution below:

int main()
{
     // code removed for brevity         
        :
        :   
     } else {
    //  loop through each commands, setup pipes.
            int i;
            int fd[2*MAX_PIPE_LENGTH];
            for (i = 0; i < num_of_cmds; i++) {
                // for k cmds, there should be k-1 pipes.
                // the i-th cmd reads from fd[2i-2] and writes to fd[2i+1].
                if (i < num_of_cmds - 1) {
                    pipe(&fd[2*i]);
                }
                pid = fork();
                int read_from = (i<<1)-2;
                int write_to  = (i<<1)+1;
                switch (pid) {
                    case -1: perror("fork"); exit(1);
                    case  0: 
                        if (i != 0) {
                            if (dup2(fd[read_from], STDIN_FILENO) == -1)
                                perror("dup2");
                            close(fd[read_from+1]);
                        } 
                        if (i != num_of_cmds-1) {
                            if (dup2(fd[write_to], STDOUT_FILENO) == -1)
                                perror("dup2");
                            close(fd[write_to-1]);
                        }
                        execvp(args[i][0], args[i]);
                        // reached only if execvp() fails; only close
                        // the descriptors this command actually has
                        perror(args[i][0]);
                        if (i != 0)
                            close(fd[read_from]);
                        if (i != num_of_cmds-1)
                            close(fd[write_to]);
                        exit(1);
                    default:
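                        // parent does not use the pipes itself: once
                        // command i has been forked, both ends of the
                        // previous pipe can be closed in the parent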
                        if (i > 0) {
                            if (close(fd[2*i-2]) == -1) {
                                perror("close");
                            }
                            if (close(fd[2*i-1]) == -1) {
                                perror("close");
                            }
                        }
                        break;
                }
            }
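            // wait for all the child processes to terminate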
            for (i = 0; i < num_of_cmds; i++)
                waitpid(-1, NULL, 0);
        }
        free(command);
    }
    return 0;
}

Lab 6: Comments, Part II

The grading for Lab 6 is done. As usual, please email me if the grade is not what you expected or if you have any questions about the remarks given.

Disappointingly, many of you throw away precious points, just like that, by not following the instructions on how to name your files and how many pages to submit.

Many of you also did not explain the graphs, but merely described what the graphs look like in words. Describing pictures is easy: given the graphs, anyone can describe what they look like without knowing what RR or MLFQ is. The lab question specifically asks you to explain (i) and (ii). If you describe the graphs rather than explain them, you lose two points.

Here are the key points to mention:

  1. Turnaround/waiting time increases as the number of CPU-bound jobs increases. WHY: CPU-bound jobs spend more time computing than blocking for I/O, so the CPU is kept busy, the queues are longer, and jobs take longer to complete.
  2. Response time increases slowly as the number of CPU-bound jobs increases. WHY: response time matters more for I/O-bound jobs, so increasing the number of CPU-bound jobs has little effect.
  3. The increase is larger for RR than for MLFQ. WHY: RR does not differentiate between I/O-bound and CPU-bound jobs, so an I/O-bound job has to wait behind more CPU-bound jobs as the percentage of CPU-bound jobs increases. MLFQ gives priority to I/O-bound jobs, which tend to stay at high priority while CPU-bound jobs sink to lower priority, so adding more CPU-bound jobs has no significant effect on the I/O-bound jobs (see the sketch after this list).
  4. MLFQ has a smaller turnaround/waiting time than RR. WHY: MLFQ approximates the behavior of shortest job first, so short jobs finish faster, leading to smaller turnaround/waiting times.
  5. MLFQ and RR have almost the same turnaround/waiting time at 10% and 90%. WHY: when almost all jobs are I/O-bound (or almost all are CPU-bound), the jobs fall into the same queue in MLFQ.
  6. MLFQ has a much lower response time than RR. WHY: MLFQ gives priority to I/O-bound jobs.
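To make the MLFQ intuition concrete, here is a minimal sketch (my own illustration, not the Lab 6 simulator; the three-queue setup and the structure names are assumptions) of the core rule: a job that uses up its entire quantum is demoted to a lower-priority queue, while a job that blocks for I/O before the quantum ends keeps its current, high priority. CPU-bound jobs therefore sink to the bottom queue, and I/O-bound jobs keep getting scheduled ahead of them.

#include <stdio.h>

#define NUM_QUEUES 3        /* assumption: 3 priority levels, 0 = highest */

typedef struct {
    int priority;           /* current queue of the job */
    int used_full_quantum;  /* 1 if the job ran until the timer interrupt */
} job_t;

/* core MLFQ rule: demote a job that used up its quantum;
   a job that blocked for I/O early keeps its current priority */
void on_quantum_expiry_or_block(job_t *j)
{
    if (j->used_full_quantum) {
        if (j->priority < NUM_QUEUES - 1)
            j->priority++;  /* CPU-bound behavior: sink to a lower queue */
    }
    /* else: the job blocked for I/O before the quantum ended,
       so it stays at its current (high) priority */
}

int main()
{
    job_t cpu_job = {0, 1};  /* always uses its full quantum */
    job_t io_job  = {0, 0};  /* always blocks for I/O early */

    for (int round = 1; round <= 3; round++) {
        on_quantum_expiry_or_block(&cpu_job);
        on_quantum_expiry_or_block(&io_job);
        printf("after round %d: cpu-bound job in queue %d, "
               "i/o-bound job in queue %d\n",
               round, cpu_job.priority, io_job.priority);
    }
    /* the CPU-bound job ends up in the lowest queue, while the
       I/O-bound job stays at the top and is always scheduled first */
    return 0;
}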