UNIX System Administration – Study Guide (Another Semester Draws to a Close)
So as another semester draws to a close, the questions about a “study guide”, “mock finals” and such arise again. This guide is a brief outline in that regard for CSCI 4418 (UNIX System Administration) – the undergraduate + graduate computer class that I taught at GWU during Spring 2012. This is not an exhaustive guide, and there are almost always some questions in the final which are not touched upon here – the lectures and the class discussions are a superset of this document. Students should feel free to discuss these topics and questions amongst themselves, but I do not provide answers to these. And the answers change every year anyway.
Topics
Scripting and Shell, Access Control, Process management, File system, User management, Software management, Setting up Cron, Backup – Dump and Restore, Drivers, Devices and Kernel, Networking, Routing and DNS, Web Applications, Performance Monitoring
Broad Questions
There is really only one basic question that I want to ask at the end of my class, and that question is the same at the end of every class and the end of every semester (or a weekend, if I happen to be teaching a weekend seminar class). That question is: How has this class changed your outlook on the subject matter covered, and what are some of the things that you may do differently now, with the benefit of this class?
Since that question has the abstractness of Eliot’s poetry in it, sometimes I use other questions in its stead. These are some examples of questions that are intentionally vague. The focus is on checking how broadly you think. Real-life problems are usually this vague. In each of these, the setting is that you are the system administrator for the IT system (the set of servers, etc.) for that organization.
- A user complains that another user has been able to read his private files. What measures would you take to prevent this from happening again?
- Your applications team wants to build and deploy a web application that will allow the employees to see how much vacation time they have left. What are some of the security considerations you will check/ensure/discuss with them?
- Some of the servers seem to slow down suddenly at different times. What debugging would you perform to check the server performance, see who & what is causing the slowness, and then to possibly improve it?
- You have many users who are business analysts, and they have secretly confided in you that their definition of computer usage involves Facebook more than it involves UNIX/Linux. When they log into a UNIX server that you manage, they go through a list of items: set system color settings so they can read the system outputs, connect to an Oracle database, run some reports, download those reports, etc. They would like to not have to do all this repetitive work. The reports that they run can change from day to day, but the rest of it is pretty much the same. How can you help them get through the repetitive work?
- Your users occasionally delete some files and ask you to “restore” them. How can you create a scheme or a framework for preparing for these kinds of requests? You may not be able to prevent data loss completely, but your goal is to create a framework for “reasonable” backups. For example, if the total size for files for all users is 20 TB, approximately how much space are you allocating for backups?
- Your users would like to share some files. What options will you consider?
- Your application team is using a database which has many reads and writes. They have heard that RAIDing is a “good idea”. What are some of the RAID choices that you may consider, and what are their advantages and disadvantages?
- When might you need to write a device driver?
Scripting Questions (1 Example)
Write a script that counts all unique words that appear in any of *.txt files that exist anywhere inside the folder /home/courses/cs4418. For example, suppose these are the two files:
/home/courses/cs4418/a/b/c/d/e/f/abc.txt: {Justin Bieber is a great singer.}
/home/courses/cs4418/de/f/xyz.txt: {Lady Gaga is a superb singer.}
Then, your script should print the answer that there are a total of 9 unique words in 2 files.
Objective Questions
Objective questions for this class typically test the student’s knowledge of specific commands and specific concepts. For example:
- How many times would a given cron job run between two given dates?
- ______ command does x.
- Command y does ____.
- What is kernel space?
- What are two broad kernel architectures?
Teaching Unix System Administration at GWU
During spring 2012, I am teaching Unix system administration at GWU, which is a mixed undergraduate and graduate class. Most of the students are graduate students, and they bring a lot of interesting perspectives to this class. When we finish this class in May, there is a small likelihood that the following scenario plays out somewhere:
Defining the center – Mean, Average, Geometric Mean, Median, Mid Range, or something else?
Frequently, the need arises to define the central value of a given population. While the problem is a bit harder in sociology (for example, what does an average American want?), the problem is easier when considering the scientific world, where the population is given by a sequence of numbers.
Here are some of the many concepts that are used to define the central value. A central value tries to represent the population, and is almost never perfect. It may be tricky to decide which of these concepts to use to define the central value.
- Arithmetic mean (also sometimes simply called mean or average) is the SUM of all numbers divided by n, the count of numbers. For example, the arithmetic mean of 1, 2, 3, 4, 5 is 3.
- Median is the middle entry in the sorted sequence. For example, the median of 1, 3, 4, 10, 17 is 4. If there is an even number of values, then take the average of the two middle values. For example, the median of 1, 3, 8, 10, 21, 25 is the average of 8 and 10, that is, 9.
- Geometric mean is the nth root of the PRODUCT of all n numbers. For example, the geometric mean of 1, 2, 3, 4, 5 = 5th root of (1 * 2 * 3 * 4 * 5), that is, 5th root of (120). That comes to about 2.6.
- Mode is the number that appears most often. For example, given 1, 1, 2, 2, 2, 3, 3, the mode is 2.
- Range is the pair of the minimum and the maximum, and therefore it is not really one central value. For example, the range of 1, 4, 1000 is [1, 1000].
- Mid-range is the middle value of the range, that is, the average of the minimum and the maximum. For example, for 1, 4, 1000, the mid-range is 500.5.
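The definitions above can be checked with a short Python sketch. It assumes Python 3.8 or later (for `statistics.geometric_mean`); the helper name `mid_range` is made up for this illustration and is not part of the standard library.

```python
import statistics

def mid_range(values):
    """Midpoint between the smallest and the largest value (hypothetical helper)."""
    return (min(values) + max(values)) / 2

# Arithmetic mean: sum of all numbers divided by how many there are
print(statistics.mean([1, 2, 3, 4, 5]))                        # 3

# Median: middle entry of the sorted sequence (average of the two middle entries if even)
print(statistics.median([1, 3, 4, 10, 17]))                    # 4
print(statistics.median([1, 3, 8, 10, 21, 25]))                # 9.0

# Geometric mean: nth root of the product of n positive numbers
print(round(statistics.geometric_mean([1, 2, 3, 4, 5]), 1))    # 2.6

# Mode: the value that appears most often
print(statistics.mode([1, 1, 2, 2, 2, 3, 3]))                  # 2

# Range and mid-range
sample = [1, 4, 1000]
print((min(sample), max(sample)))                              # (1, 1000)
print(mid_range(sample))                                       # 500.5
```

Note that the geometric mean is only defined here for positive numbers; `statistics.geometric_mean` raises an error for zero or negative input.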
Here are some observations on these central values:
- When you have numbers that are in an arithmetic series, like 1, 2, 3, 4, 5 (step size is 1) or 2, 5, 8, 11, 14 (here step size is 3), then the average is exactly the middle value (the median). For example, for population [1,2,3,...99,100], the average is 50.5.
- Similarly, for all numbers from 300 to 500, the average is 400. (Obviously).
When to use Arithmetic Mean versus Geometric Mean
- The arithmetic mean is a good measure when numbers are of the same order of magnitude – like students’ scores on a test.
- Geometric mean would be appropriate if the numbers are in different ranges (ballparks) entirely and you do not want one very large number to affect things that much.
- For example, if you have the following numbers: 1, 10, 100, 1000, 10000, the average is more than 2200. But a more appropriate “middle” number is 100 in this case. And 100 is the geometric mean here.
- Real world example: 0.98, 8.7, 121, 1400, 9000. From these 5 numbers, arithmetic mean is about 2100. That number hardly means anything. Geometric mean is about 105, which represents more of a central point.
- Interestingly, median is also a good measure in the previous case. But one problem with median is that if the largest number (9000) were changed to 10,000 or 10 million, then the median would still be 121, and would not change at all. However, the geometric mean would change (increase) somewhat. Arithmetic mean would change TOO much.
- Yet another scenario where geometric mean is more appropriate than arithmetic mean is when the numbers are given as percentage increases or decreases, rather than absolute values. For example, if the housing market rose 40% one year and dropped 40% the next year, then an appropriate representation of the average growth rate can be found by taking the geometric mean of 1.4 and 0.6, which comes to about 0.92, that is, about an 8% drop per year. The arithmetic mean would have conveyed a 0% average change.
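The comparisons in this list can be reproduced with a small Python sketch, assuming Python 3.8 or later. The function names `compare_centers` and `average_growth_factor` are made up for this illustration.

```python
import statistics

def compare_centers(values):
    """Return (arithmetic mean, geometric mean, median) for positive numbers."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.median(values),
    )

def average_growth_factor(factors):
    """Average per-period growth factor, e.g. 1.4 for +40% and 0.6 for -40%."""
    return statistics.geometric_mean(factors)

# Numbers spread over several orders of magnitude
powers_of_ten = [1, 10, 100, 1000, 10000]
print(compare_centers(powers_of_ten))    # arithmetic mean 2222.2, geometric mean about 100, median 100

measurements = [0.98, 8.7, 121, 1400, 9000]
print(compare_centers(measurements))     # arithmetic mean about 2106, geometric mean about 105, median 121

# Inflate the largest value: the median stays at 121, the geometric mean rises,
# and the arithmetic mean jumps by orders of magnitude
print(compare_centers([0.98, 8.7, 121, 1400, 10_000_000]))

# Up 40% one year, down 40% the next
print(average_growth_factor([1.4, 0.6]))  # about 0.917
```

The last line matches the direct calculation: 1.4 * 0.6 = 0.84, so after two years the market is at 84% of its starting value, and the square root of 0.84 is the equivalent constant yearly factor.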
Grading on a curve, “Yahan ka system, hi hai kharab” and Sir Ken Robinson
A generation ago, I was a student at IIT Delhi. But, I wasn’t a brilliant student. Had a fundamental disconnect with the grading process there – each of the A, A-, B, B-, C, C-, D and F grades had to be given to roughly 12.5% of students (allowing for rounding etc.). (All grades other than F are considered passing grades.) While there were some professors who deviated from that significantly (and certainly didn’t hand out F grades), that still was the suggested grading policy. That bothered me significantly. Should I hope for my friends to mess up royally, so that I can get a better grade? Should I help someone else study, and in the process, screw my own grade? Should I lend you the book so you can push me to a different color of the pie chart? Even more innocuously, should I really study hard, and in the process push you down one bit? (Really, what kind of friend am I?)
Call it the Buridan’s ass, the obstinacy of the teenage years, or give it a fancier name, but when the system bothers you, you write and sing songs like purani jeans aur guitar and repeat “yahan ka system, hi hai kharab”. 
And now, how the times have changed. Some of my students at GWU ask me, “Would you be grading on a curve?” There is optimism in their voice. The answer they are looking for is “YES!”. “Most definitely!!” “¡Cómo no!” I kind of understand that. The students who are hoping that I grade on a curve are suggesting that their score may be lower than my “flat passing line”, so if I grade on a curve, they stand to benefit. But alas, I deny them that very easy pleasure. I remind my students that I am going to grade everyone individually, and that I am not bound to give 20% As, 40% Bs and 20% Cs etc. All of them may finish with an A. Or, all of them may finish with a B. One student’s grade is independent of how well the entire class does. The simple reason I do this is to promote goodwill among the students, and to encourage collaboration (wherever appropriate).
In the collaborative aspect, I am in extremely distinguished company. In this awesome video, Sir Ken Robinson, international adviser on education in the arts, talks about how the current education system stifles creativity, rather than encouraging it.
Locating a Distribution Center
Locating a Distribution Center is perhaps the most commercial application of the general facility location problem in computer science. Consider a retailer which imports goods for selling in three cities, Indianapolis IN, Columbus OH, and Lexington KY. Further, suppose that this retailer uses an ocean carrier, and the port of call is Norfolk. In that case, where should the distribution center be located? Here is a map view:
