CS 475 - Operating Systems
Hwk 4: David Shell (dsh)
A shell is an interactive command-line environment in which users can issue commands to the OS. Even with modern OS’s support of point-and-click GUIs, many still prefer to use the shell. Today, many shells exist and are supported by Unix-based systems, but the Bourne-Again Shell (bash) is probably the most widely used.
Your goal for this assignment is to create your very own shell, David Shell (dsh
), named after your favorite OS Professor.
Student Outcomes
- To understand how a simple command-line interface to the OS is implemented.
- To work with environment variables with
getenv()
- To become familiar with process creation with
fork()
andexecv()
- To become familiar with parent-child synchronization with
wait()
Starter Code
Starter code for this assignment is provided on the github repo. You must do these steps in order to submit your work to me on github.
-
Login to github, and go here: https://github.com/davidtchiu/os-dsh.
-
Click on the green Use this template button
and select the Create new repository option. In the next page, give your repository a good name (the “suggestion” they give is fine). My only request is that you don’t name it to be the same as mine. This is hide your homework solution from Google searches.
-
This will create your own copy of the repository with the starter code I provided! Copy the URL of your repo from the browser window.
-
Now from VS Code, open a terminal, and *clone* your new Github repo down to your local working directory using:
1
git clone <your-github-url-for-this-project>
Working Solution
I have included a working solution of my program along with the starter code. The binary executable file is called dshSol
. You can run it from the terminal by first navigating in to the Hwk directory and typing the command ./dshSol
.
Preliminary: Access of Environment Variables
Before we get started, we should first cover what environment variables are within shells. I tend to think of shells to be a suped-up command-line interface. They are the user interface to the OS before there were windows and desktops. There are things that the shell needs to remember about you so that it’s customized to your liking. Many of these customizations are stored in the shell’s environment variables. On one hand, these variables can hold values that make the shell more convenient to use (for instance, remembering the path to your “home” directory), and on the other, they store values that customize the command-line environment to your personal taste (for instance, assigning various colors to files and directories.)
There are certain environment variables that are pretty standardized, including $HOME
, $PATH
, $SHELL
, $PWD
, and so on.
-
If you’re curious to see what’s stored in an environment variable, you can use the
echo $<VAR>
command. For instance, the following example shows that/home/dchiu
is stored inHOME
.1 2
$ echo $HOME /home/dchiu
-
Try echoing
$SHELL
(what shell am I using?),$PWD
(where am I currently?),$PATH
(what are the known paths to executable files?), and so on, … -
Importantly for this assignment, to return the value of an environment variable as a string, you can use the function getenv(), provided in
#include <stdlib.h>
.
Handling Input
-
As soon as
dsh
starts, it should repeatedly provide the user with a command-line prompt:dsh>
Check out the example below:1 2
$ ./dsh dsh>
From the Terminal, I run the David Shell program using
./dsh
and it immediately vives me thedsh>
prompt. I am now in the David Shell environment, and am able to issue commands just like other when you’re inside other shell. -
At the
dsh>
prompt, the user would be able to enter a command which David Shell should attempt to execute as a separate process.- Some commands may be followed by a list of arguments (for instance,
ls -l ../
andps au
), so you will have to read in the entire line as a string and tokenize (split) it by whitespace. You may assume that arguments are separated by a single space, ignoring tabs, consecutive spaces, etc. For accepting a whole line of input, I recommend looking into fgets(), which has the following signature:
1 2 3 4 5
char* fgets( char* str, /* OUT */ int num, /* IN */ FILE* stream /* IN */ )
This function will read
num
characters from thestream
, and place it in a pre-allocated string bufferstr
. Thestream
, in our case, is simply input asstdin
(the standard input device). The method will return a pointer tostr
. In your program, if the user’s input is longer than 256 characters, you should ignore it and output an error (command too long).Example:
1 2 3 4 5 6
char *line = (char*) malloc(256); // create an empty buffer to store the input // reads up to 256 characters into the buffer if (fgets(line, 256, stdin) == NULL) { exit(0); // exit the program if EOF is input }
- You too may assume that user input will occupy no more than 256 characters.
- Keep in mind that
fgets(..)
will gobble up everything you enter, including the trailing newline character from hitting theenter
key! fgets()
will also returnNULL
if you give it an end-of-file (EOF) signal on input. This can be done usingcontrol+d
. When we detect this, we simplyexit(0)
.
- Some commands may be followed by a list of arguments (for instance,
-
Once you have the input line, you should trim it (that is, remove all preceding and trailing white space). If the trimmed input is an empty string, you should simply re-prompt the user. To this end, I would write a function to trim the input string, and test its length.
Understanding Program Execution and Full-Path Resolution
After the line of input is read, dsh
will need to verify that the command entered is valid. There’s an easy way to tell if it’s valid, but first, we need to understand how file system paths work. To create a process and run a program, you need to construct its full path in the file system. An full path to a program, say ls
, might look like this:
1
/bin/ls
To locate the ls
program, the OS needs to first traverse to the “root” directory or folder (that’s the initial /
), and from there, traverse into the bin
directory. Then from /bin
, to look for a file named ls
in there. It’s really nice when the user types out the full path to the program they want to run, but that’s usually not going to be the case. (Try typing /bin/ls
in your terminal. It works as advertised!) We typically only type ls
, and it works, which means that David Shell must do some behind-the-scenes work to figure out where to look for ls
.
This is all pointing to a couple of modes of execution we need to support. The user could either give the full path to an executable file, or the user might simply give the name of the executable file they want to run.
-
Mode 1 (full path is given): Suppose the user types in the full path to an executable. You know they typed a full path if their input starts begins with a
'/'
character! First, check to see if the given path even exists. To do this in C, I would first include theunistd.h
file, and use itsaccess()
function:1 2 3 4 5 6 7
if (access(path, F_OK | X_OK) == 0) { // File exists and is executable! Can run! } else { // No good! File doesn't exist or is not executable! // Alert the user and re-prompt }
-
If the executable isn’t found or if it can’t be executed (see above snippet), then simply output an error to the screen, and re-display the
dsh>
prompt. -
If the executable is found, then there are two further options for execution:
-
Run in foreground: Execute the file using
fork()
andexecv()
as you learned in class. The call toexecv()
requires the full path to the executable, which the user already gave you. Commands may have an arbitrary number of arguments that follow it, which are delimited by whitespace. You’ll need to input the command-line arguments into a NULL-terminated array of strings, and pass it along toexecv()
. When running a process in the foreground, the parent process (that is,dsh
) mustwait()
for the child process to finish. Therefore, you would not expect to see thedsh>
shell prompt again until the child process terminates.- This is the usual mode of execution when you’re on the terminal!
-
Run in background: If the last character in a valid command is an
&
symbol, it indicates that the executable file is to be run in the background. In other words, when the David Shell forks a child, the shell should not wait for the child to terminate. After forking, the OS will run the new process concurrently withdsh
. This means that you’ll seedsh>
being re-displayed immediately by the parent (dsh
) process, leading to possibly strange output to the screen: If the child process also prints to the terminal, the outputs will interleave.- Running processes in the background is useful when you’re multitasking, and need to spin off a program, but get your David Shell prompt back so you can run other programs!
-
-
-
Mode 2 (full-path construction): This case triggers when the input does not start with a
/
. Now we’ve got some work to do before we can evenfork
andexec
! We need find the true location of the given command, and we’ll use the following steps. The technical term for this process is called Path Resolution:-
First, we check to see if the executable file can be found in the current working directory. That is, the location of where your shell thinks you’re in. Look into using
getcwd()
, defined inunistd.h
. Concatenate the user’s input to the the current working directory, and see if it exists. If so, fork and execute it right away as per the procedure given above. If not, then move on to the next step.- For instance, if I typed
ls2
and my current working directory is/home/dchiu
, then the first place my program would be/home/dchiu/ls2
. Of course, this executable file may not exist in my current directory… read on!
- For instance, if I typed
-
If it cannot be found in the current working directory, then there are other paths in which it can be found. These paths are stored in the environment variable
PATH
. -
For example, a
PATH
might hold this value:1
/opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/texbin
What you’re seeing here is a
:
-delimited string that separates all the different locations to look for the executable file. You need tostrcpy()
this string and split it up by:
usingstrtok()
. Next, concatenate the user’s input string to the end of each token.So, if I typed
ls2
, then the first location to try would be/opt/local/bin/ls2
, the second place to try is/opt/local/sbin/ls2
, and so on, until you find it. As soon as you detect that a full path to the executable exists, then fork and execute it! If you’ve tried all the paths without finding the executable, then you must output some kind of aCommand not found
error message and re-prompt for the next command.
-
Support of Built-in Commands
Some commands are not meant to invoke another program. Instead, they are to be handled directly by the shell by calling functions that you built-in. For example, when a user inputs exit
, your shell is not supposed to attempt to find an executable named exit
in the user’s PATH! In this case, dsh
should simply terminate by exiting the loop. These special commands are called “builtins.”
Here is a list of built-in commands that David shell needs to support.
-
exit
should exit David Shell. Don’t forget to clean up any allocated memory! -
pwd
should print the current working directory. Look intogetcwd()
, defined inunistd.h
. -
cd [path]
should change the current working directory to the optionally given path. If path is not given, then David Shell should change the working directory to the user’s home directory, stored in the environment variableHOME
. Look into thechdir()
function, defined inunistd.h
. If the given path does not exist, you should output an error,"<directory>: no such file or directory".
Flowchart of Activities
Phew! That’s a lot to take in. The figure below shows the abstract flowchart for this program. This should (hopefully) give you a better idea of what all needs to be done.
Hints and Tips
This assignment can be tricky to get started, because there are so many pieces that need to come together. You are warned to start early. If I were doing this assignment, I’d probably work on things in this order:
-
First, you need to figure out how to allocate an array of strings. This is important for invoking the
execv()
system call later on, and for thesplit()
function (see next bullet point). Because strings are themselves arrays, then an array of strings is essentially a 2D array (or a pointer to pointers tochars
). You can create an array ofnum
strings as follows:1 2 3 4 5 6 7 8 9 10 11 12 13
// this creates num pointers to strings char **array = (char**) malloc(num * sizeof(char*)); // this loops through each array element and instantiates // an array of chars of length CAPACITY for (int i = 0; i < num; i++) { array[i] = (char*) malloc(CAPACITY * sizeof(char)); } // now I can assign strings to individual array elements for (int i = 0; i < num; i++) { strcpy(array[i], "hello world"); }
-
Write a function
char** split(char *str, char *delim)
, that exhibits similar behavior as Java String’ssplit(..)
. Your function should input a stringstr
, and a string delimiter, and return an array of substrings (tokens) split on the given delimiter.-
First, figure out the number of tokens that you will end up with. I would do this by counting the number of instances of the delimiter found in
str
. The number of tokens is just 1 plus that number and call this valuenumTokens
. -
Now use the previous bullet point to allocate
numTokens+1
pointers to chars. Then write your loop to mallocnumTokens
arrays of chars. (Yes, it is crucial that you only loop up tonumTokens
, and notnumTokens+1
). -
Use the string function
strtok()
to loop through all of the tokens, and assign each to a corresponding element in your new array by usingstrcpy()
. -
Because the user of your function wouldn’t know the size of the array that you’re returning, make sure you set the final element of your array to
NULL
. (This is why I had you mallocNUMTOKENS+1
spots in the first place). Here’s how you might use your new method:
1 2 3 4 5 6 7 8 9 10
//split cmd on whitespace! char cmd[] = "git add ."; char **terms = split(cmd, " "); // print out all the tokens int i = 0; while (terms[i] != NULL) { printf("%s\n", terms[i]); i++; }
Should result in the output below:
1 2 3
git add .
This function is a huge workhorse for this assignment.
-
-
Write the main command-prompt loop to repeatedly accept input. Test the
split(..)
function you just wrote on various inputs, including empty string.- You should be using
fgets()
to accept user inputs. Remember that the “enter” key is logged as a'\n'
character at the end of the string! You’ll probably want to truncate that newline character as soon as you obtain the user input, and that’s as simple as putting the'\0'
character in its place.
- You should be using
-
Work on handling the built-in commands next.
-
Work on command execution when given the full-path to an executable. (Mode 1)
-
Finally, work on execution when given just the name of an executable. (Mode 2: path resolution)
Simplifying Assumptions
To make your lives easier, you may make the following assumptions.
-
A line of input, which includes the newline and termination character, shall be no more than
MAXBUF
length.MAXBUF
is given insidedsh.h
to be 256. You may assume that no tests greater than that length will be made when grading. -
You may assume that, if the trailing
&
is given, it always follows a whitespace. That is, it shall not be attached to a command or argument. -
In most shells, multiple commands can be given on the same line, separated by semi-colon. For instance,
cd /home; ls
. Your shell does not have to deal with multiple inline commands. -
In most shells, the
~
character is replaced with your home directory. For instance,ls ~/os-shell
would really bels /home/david/os-shell
, andcd ~
would always take you to your home directory. You do not need to handle the special~
character.
Example Output
This example assumes there’s a file named feelGood.c
in my current working directory. (Yes, this file
is given to you as part of the initial code base for the assignment.) The full path to the executable, cat
, is given. This causes Mode 1 to run, meaning that the shell will not do a search of the PATH environment variable for cat
.
Further, the argument feelGood.c
is given in an array to execv()
. Specifically the array should contain
["/usr/bin/cat", "feelGood.c", NULL]
. The cat
executable will simply print the contents of feelGood.c
to
the terminal. As there is no trailing &
(ampersand), /usr/bin/cat
is executed and David Shell waits for it
to finish before re-prompting. Therefore, you will see all the outputs of cat
before the next dsh>
prompt.
1
2
3
dsh> /usr/bin/cat README.md
#os-shell
dsh>
Next, I enter NotARealCommand
, which David Shell to invoke Mode 2 and search for
NotARealCommand
in all the known paths of my PATH
environment variable. At this point, the shell has also determined that NotARealCommand
is not a built-in command. This file is not found in any of the paths (and is not a built-in), and therefore the error is printed and the shell re-prompts for the next input.
1
2
3
dsh> NotARealCommand -o
ERROR: NotARealCommand not found!
dsh>
In the snippet below, I enter ls -l
, which causes my shell to invoke Mode 2 and search for the
ls
binary in all the known paths listed in my PATH
environment variable. The ls
binary is of course
found (on my machine, inside /usr/bin/
). Further, the -l
is given in a NULL-terminated array to execv()
. Specifically the array should contain ["/usr/bin/ls", "-l", NULL]
. As there is no trailing &
, the ls
is executed and the shell waits for it
to finish before re-prompting. (Therefore, you will see all the outputs of ls -l
before the next dsh>
prompt.)
1
2
3
4
5
6
7
8
9
10
dsh> ls -l
total 56
-rw-rw-r-- 1 dchiu dchiu 112 Feb 15 2023 Makefile
-rw-rw-r-- 1 dchiu dchiu 17 Jan 13 2023 README.md
-rw-rw-r-- 1 dchiu dchiu 323 Feb 15 2023 dsh.c
-rw-rw-r-- 1 dchiu dchiu 128 Feb 15 2023 dsh.h
-rwxrwxr-x 1 dchiu dchiu 30656 Feb 15 2023 dshSol
-rw-rw-r-- 1 dchiu dchiu 149 Feb 15 2023 feelGood.c
-rw-rw-r-- 1 dchiu dchiu 301 Jan 13 2023 main.c
dsh>
Assuming that the given feelGood
binary is in the current working directory. This sequence below runs feelGood
in the background (notice that the trailing &
is given), so David Shell will not wait for feelGood
to finish (and it won’t finish, as feelGood
runs an infinite loop that prints a message that makes me feel good about myself every second… heh heh heh…) Note that I am given the dsh>
prompt immediately, but feelGood
continues to output independently of what I’m doing in David Shell. In fact, even after I exit the shell, feelGood
continues to run as an independent process!! You should figure out how to terminate feelGood
>:)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
dsh> ./feelGood &
dsh> ls -l
total 56
-rw-rw-r-- 1 dchiu dchiu 112 Feb 15 2023 Makefile
-rw-rw-r-- 1 dchiu dchiu 17 Jan 13 2023 README.md
-rw-rw-r-- 1 dchiu dchiu 323 Feb 15 2023 dsh.c
-rw-rw-r-- 1 dchiu dchiu 128 Feb 15 2023 dsh.h
-rwxrwxr-x 1 dchiu dchiu 30656 Feb 15 2023 dshSol
-rwxrwxr-x 1 dchiu dchiu 149 Feb 15 2023 feelGood
-rw-rw-r-- 1 dchiu dchiu 149 Feb 15 2023 feelGood.c
-rw-rw-r-- 1 dchiu dchiu 301 Jan 13 2023 main.c
Students think you're inspiring!
Students think you're inspiring!
dsh> exit
$
Students think you're inspiring!
Students think you're inspiring!
Students think you're inspiring!
Students think you're inspiring!
Grading
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
This assignment will be graded out of 80 points:
[40pt] User input is properly handled, and invalid commands (not found in PATH or
current working directory) generates an error.
[25pt] Running a valid command works as expected (in-foreground vs. in-background too).
[5pt] cd [path] works as expected, by changing the current directory to the path
(if given), or $HOME (if not given).
[3pt] exit and pwd works as expected
[2pt] Your program observes good style and commenting.
[5pt] Your program is free of memory leaks and dangling pointers.
Submitting Your Assignment
-
Commit and push your code to your Github repo. Make sure your repo is public.
-
On canvas, simply submit the URL to your Github repo. No other form of submission is accepted.
Credits
Written by David Chiu. 2022.