|The LINUX.COM Article Archive|
|Originally Published: Monday, 20 November 2000||Author: Kevin Cullis|
|Published to: featured_articles/Featured Articles|
Software Quality 101: Essential Concepts
We carry a lot of content on Linux.com, and a large portion of it has to do with software development. While most of it deals with linkers, compilers and programming languages, our weekly feature this week is written from the quality assurance aspect. Kevin Cullis has been in QA for years, and although he comes from the business perspective, he makes a few extremely valid points to the hacker community.
Linux and the quality of Linux have become the topic of late in the media, both print and on the Web. As a two-year user, I find that Linux has met most of my computing needs. However, if Linux wants to increase the user base, improvements in the developmental process are needed in the operating system and software. Linux software needs to solve problems by providing solutions not only to the developer, but also to the business and hobby users.
Linux solved my reliability problems, but created other problems in usability and software availability. These should be addressed in the near future. Linux developers and users in the community need to realize that the process developed determines the level of quality it will achieve. Below are some key concepts that may help Linux increase the rate of world domination.
Since the first incidence of a broken bow in the pursuit of early Man's next meal, the concept of quality has been important. With the advent of mass production, the level of quality has become increasingly important, especially in the past 30 years. In the 1920's, Dr. Walter Shewhart designed the first Control chart for the manufacturing process, and later with Dr. W. Edwards Deming's and other people's help, a new approach to quality improvement was born. Since then, Total Quality Management, Process Improvement, Quality Assurance, Quality Control, Kaizen, Reengineering, Quality Improvement, Quality is Job One, and Six Sigma Quality are all "programs" started by various organizations to improve products and services for their customers. The key point in each initiative is continuously improving the quality of a product or service over time to achieve reduced costs, increases in sales or adoption, and ultimately, the ability to stay in business. The focus is always the same: getting better at business! The "how to do it" will depend on people and the skills and knowledge they possess and will learn, the trust, understanding, and communication levels of all those involved, and the market. However, in most cases, quality takes a back seat to features. If Linux doesn't keep pace with improvements in all phases of software development, Microsoft or the Mac OS will!
Some organizations have made their process improvement programs work. They include Ford's Quality is Job One, Tata Consultancy Services of India achieving the Software Engineering Institute's Capability Maturity Model Level 5 certification, the Toyota Production System, Motorola's Six Sigma, and the Ritz-Carlton winning the Malcolm Baldrige National Quality Award a second time. Others have spent lots of time and money and never got it "right." A recent article in American Society for Quality Software revealed that for every $1 invested in process improvements, there resulted a $4-7 return in savings. Regardless of successes or "failures," the principle of process improvement still applies. When you improve the process, you increase productivity, with the result of reducing costs and increasing sales and profits.
Today's software industry attitude as a whole is reminiscent of Detroit during the 80's when Japanese auto quality began eating away at the American automobile market share. While Linux may not have the exact same quality issues as other systems, evidence shows that new skills and habits need to be learned by the IT industry in order to reach the next level of quality.
Why should an organization focus on a Process Improvement Strategy? Where do they focus their effort? What does it take to move forward? First things first:
There are three parts of the "Big Picture" to providing a solution and solving a problem: Product, Process, and People. These comprise an organization's internal customer system and the external system consists of their customers and suppliers. People design and oversee the processes put into place that produce the products or services to be sold. The customer's expectations determine whether the product or service adds value, i.e. solves a problem, which then becomes a sale. It can also be seen in the "small picture" as well: a small applet, which works with GNOME or KDE, such as Xmcd.
Linux is the product! A customer may not be satisfied with the quality of a certain product or service and looks elsewhere for solutions. Or, a customer's expectations may be satisfied with what is currently produced in the market, but someone comes along and says, "I can do it better!" At that moment of decision of either the customer or the producer, the forces of need and necessity come together to increase expectations and reach another level. In a Free Market, producers raise the bar while the customer benefits from that competition. That competition produces different levels of quality.
Level of Quality and Market
Most of us talk about the quality of other operating systems and how Linux has better reliability. Keep in mind there are different levels of quality. There is a Saks Fifth Ave, a Nordstrom, and a K-Mart level of quality. How you define your level of quality and design your processes will determine the level of quality you achieve and the target market to aim for.
Cost of Quality
When undertaking a process improvement effort, care must be given to the Cost of Quality, which includes three areas: prevention (of software defects with better designs), appraisal (of quality by testing, vendor quality, and documentation), and failure (internal, or external when dealing with the customer). Just look at Detroit in the 80's, when superior Japanese auto quality began eating away at its market share: that is the Cost of Poor Quality, the price of "good enough quality" when better quality arrives.
A system is the sum total of all of an organization's processes. For example, a Linux software developer such as TurboLinux would purchase computers, testing software, or a LAN from suppliers to help produce their distribution for customers. Usually, there must be constant scanning of product, process, and people to find weaknesses as well as bottlenecks to ensure there is constant improvement in delivering quality. From a TurboLinux customer's perspective, a system can be defined as the sum total of all of the company's and suppliers' processes put together to produce one product or service. For example, one CD of a Linux distribution is the sum total of everything in that system; so is a service that configures and sets up Web servers, or a program such as ipchains. A system can also be described as the life cycle of software development: requirements gathering, architectural design, coding, compiling, and testing. What developers usually leave out of that list in normal conversation is documentation of the software for the users, but it should be included. However the system is defined determines the results, and the Linux user is the end result of the software development system.
Everything within a company begins with a process. At a high level, finance, operations, sales, warehouse, human relations, and research and development are all processes within a system. Within each of these divisions, there are processes to hire people, calculate commissions for sales, take a customer's order, select a supplier, determine a company's product or service within a given market, produce the product or service, market the company's product or service, collect and pay taxes, and even a process to order supplies from a local office supply store. In software development, it's coding or writing code comments to increase programmer participation. Whatever needs to be done, there is a process, which is hopefully defined and followed to accomplish the intended results, whether it is written or verbal. In some cases, processes are in place and followed, while in others they are not followed exactly because of the "need for speed" or to "get it done!" Whichever the case, processes are designed and followed to meet or exceed someone's expectations, either the customer's or the producer's. The larger the organization, such as Red Hat or SuSE, or the project, such as the Linux Documentation Project, the more essential the communication of the process becomes to those willing to participate and/or contribute.
A process is a step by step, thought out design and/or course of action to produce an intended result and to ensure consistency of quality. A process can also be considered a mathematical formula, such as 1+1+1+1 = 4. The formula and the answer are precise: the results will always be the same given the exact same inputs and operators. The expectations are the same within a company. However, when the operators or inputs are changed, much like a person getting sick or a computer going down, the results will change. As long as the process doesn't change, it will always give the same answer (see Variation below for more info).
One of the most crucial principles of process design is that in today's business climate, tasks or jobs are no longer the benchmark of business; process design has taken their place. Smart companies ensure that workers focus on contributing to processes, not doing a job.
Nevertheless, the working world is not as precise as a mathematical formula, and variation [1] enters into the equation. Variation is the inherent change in a process, which scatters actual results around the process capability. The average person may take about 30 minutes to drive from home to work. Since this is the real world, that person doesn't arrive precisely 30 minutes after leaving every time, but the drive usually centers around 30 minutes because of variation in the driving-to-work process. Every process has variation because of problems that occur, and two terms further subdivide it: common causes and special causes of variation. Common causes are the most predictable; they occur "naturally" within the process and should be eliminated over time. Special causes are a different matter altogether and are far less predictable.
Both common and special causes can be eliminated, but special causes are the most difficult because of their unpredictability, such as a lightning strike or an earthquake. When considering a variation, focus on why it occurred and on its root cause (not the symptom). Is it predictable, and can it be eliminated? Whatever the case, the question is why problems occur; the goal is to find the root cause of the variation and reduce or eventually eliminate it.
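As a rough illustration of telling common-cause variation from a possible special cause, the sketch below applies three-sigma limits, in the spirit of a Shewhart control chart, to the driving-to-work example. The commute times are invented for illustration, the function names are hypothetical, and the limits are estimated from a baseline period that is simply assumed to be stable. A real control chart involves more care than this.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Return (lower, center, upper) three-sigma limits from baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    return center - 3 * sigma, center, center + 3 * sigma

def special_cause_points(observations, lower, upper):
    """Return the observations that fall outside the control limits."""
    return [x for x in observations if x < lower or x > upper]

# Hypothetical commute times in minutes (not from the article).
baseline_commutes = [29, 31, 30, 28, 32, 30, 31, 29, 30, 30]
new_commutes = [31, 28, 55, 30]  # 55 could be an accident or a storm

lower, center, upper = control_limits(baseline_commutes)
flagged = special_cause_points(new_commutes, lower, upper)

print(f"center = {center:.1f} min, limits = ({lower:.1f}, {upper:.1f})")
print("possible special causes:", flagged)
```

Points inside the limits are treated as ordinary variation of the process; only the points outside them are worth a separate root-cause investigation.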
An additional point of process improvement is that it should include traceability. Traceability is the documentation of a process so that quality is assured, errors are prevented over time, and the investigation of product or service failures is simplified. Documenting the traceability of a process allows people to discover loose ends of a process and reduce or eliminate non-value-added effort that could affect the outcome of the process. While it might seem that the process moves slower, in the long term it prevents heroics, maintains consistency of production, and shows what ripple effect changes have on the rest of the organization, i.e. changes to the system.
Metrics and Statistics
Metrics are the various measurements of a process. While most companies measure sales as a result of a process, most do not measure the sales process itself to determine where and how to make improvements. The sales process usually follows these steps: Lead, Quote, Order, Sale, Backorder, and Delivery. Most companies don't drill down far enough to utilize that information as a source of opportunities for improvement or as competitive intelligence. How many sales were lost and why? How many Quotes were given out and why weren't they turned into Orders? Why was there such a long Backorder time? As the saying goes, "What gets measured gets done." If a process is not measured, then there can be no determination of the value of the process. Sales are a metric that is constantly used, but they only tell part of the story. Were customers satisfied with the value of the product that they received from the company? Notice the issue is value, not price. Everyone would love to buy a computer for $1, but it won't happen. So the question becomes: are they satisfied with the value of what they received? If they are, then you have a true sale. However, if what you sell them doesn't meet their value expectations, then you have a potential loss of a customer.
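To make the idea of measuring the sales process itself more concrete, here is a minimal sketch that computes the conversion rate between adjacent stages. It covers only the first four stages named above; the counts are hypothetical and the function name is an assumption made for this illustration.

```python
# Hypothetical monthly counts for the first four stages of the sales process.
funnel = [("Lead", 200), ("Quote", 80), ("Order", 40), ("Sale", 36)]

def stage_conversions(stages):
    """Return (from_stage, to_stage, conversion_rate) for each adjacent pair."""
    rates = []
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rate = count_b / count_a if count_a else 0.0
        rates.append((name_a, name_b, rate))
    return rates

conversions = stage_conversions(funnel)
for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.0%}")

# The step with the lowest conversion is a natural place to start asking "why?"
weakest = min(conversions, key=lambda c: c[2])
print("weakest step:", weakest[0], "->", weakest[1])
```

The numbers alone do not say why Leads fail to become Quotes; they only point to where the question should be asked first.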
While metrics are the measurement of a process, statistics are the quantification of the measurement data as the basis for inference and induction. Statistics do not have to be ANOVA, Poisson, or Weibull statistics, but can be very simple counts, maximums, averages, or minimums used in a standard checklist. I once talked with a Quality Director who told me not to talk about statistics because a Big Eight accounting firm had scared them about the cost of the process improvement effort. Needless to say, if you don't measure anything, how do you know if you're getting better? In addition, when designing metrics, there needs to be an emphasis not only on what the statistics say; consideration must also be given to what they do not say, as in the customer satisfaction example above. Data is collected to clarify a problem or opportunity, so ensure that it represents typical conditions and reveals the desired information, not just what is easy to obtain. Collecting data is easy; sifting through the data to find the gold mine of information that can be translated into a competitive advantage can be time consuming, but is well worth the effort.
Here are some actual numbers from a company's server farm and the metrics that were used in the analysis (while Unix is the actual OS used by this company, Linux can be equally substituted for greater potential savings).
Server Admins Salary / each # of machines
Now, most managers would look only at the salary of an individual and say that it's cheaper to run NT than Unix. In Quality Assurance circles, this type of focus is called sub-optimization: looking only at the smallest part and not at how it performs within the total system. This is the prevalent thought pattern of people who would fight tooth and nail to make sure costs are cheaper and cheaper. In QA circles, this is focusing on COST only, not productivity! Let's look at the REAL numbers and the administrators' salaries.
Server Admins Salary / each # of machines Ratio Cost
So, one Unix administrator takes care of 15 machines, whereas an NT administrator takes care of six. Now, let's look at the total cost of the salaries. If you were to take the 1:15 ratio and convert the NT servers to Unix, what would the cost be (180 servers divided by the ratio of 1:15 administrators)?
Server Admins Salary / each # of machines Ratio Cost
Bottom line: The difference (savings) between the NT and Unix (new) is $600,000 in first year savings alone! In addition, that's a 50% increase in a person's salary! This does not take into consideration that the NT servers may not be able to handle the traffic as the Unix systems can, and therefore fewer than 180 may be needed. It also does not consider security issues, etc.; it is only used to address metrics and analysis and to show fallacies that are strictly adhered to (Your mileage may vary!).
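The staffing arithmetic behind this comparison can be sketched as follows. The server count (180) and the two administrator ratios (1:6 and 1:15) come from the text above; the salary figures are placeholders chosen only for illustration, so the output is not expected to reproduce the $600,000 figure. The function names are hypothetical.

```python
import math

def admins_needed(servers, servers_per_admin):
    """Administrators required to cover `servers` at a given ratio, rounded up."""
    return math.ceil(servers / servers_per_admin)

def staffing_cost(servers, servers_per_admin, salary):
    """Total yearly salary cost of administering `servers`."""
    return admins_needed(servers, servers_per_admin) * salary

SERVERS = 180          # from the article
NT_RATIO = 6           # servers per NT administrator, from the article
UNIX_RATIO = 15        # servers per Unix administrator, from the article
NT_SALARY = 50_000     # hypothetical placeholder, not from the article
UNIX_SALARY = 70_000   # hypothetical placeholder, not from the article

nt_cost = staffing_cost(SERVERS, NT_RATIO, NT_SALARY)
unix_cost = staffing_cost(SERVERS, UNIX_RATIO, UNIX_SALARY)

print("NT admins:", admins_needed(SERVERS, NT_RATIO))
print("Unix admins:", admins_needed(SERVERS, UNIX_RATIO))
print("difference in yearly salary cost:", nt_cost - unix_cost)
```

Even with the higher per-person salary assumed here, the total cost is driven by the ratio of servers to administrators, which is the point about sub-optimization.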
Cycle time is the time it takes to complete one process and/or the complete effort of the total system, depending on the definition. A system produces one system cycle whereas a process produces a process cycle. Back to the driving to work example: driving to and/or from work is one cycle, depending on whether the cycle is defined as to work, from work, or the round trip. Once the cycle has been identified, improvements can be pursued and metrics established. But notice one thing: a process has to be defined before cycle time can be measured. Without a process being defined, there can be no cycle time demarcation to establish improvement efforts.
Once you understand process, variation, cycle time, and metrics, the emphasis shifts to predicting the outcome of a process over a specific frame of time.
Once processes are in place and documented and the company's system defined, bottlenecks of the system can be identified. The purpose of defining a system with its myriad of processes is to accurately determine all of the critical processes that contribute to the final product and begin to locate and eliminate the bottlenecks of a system. Without good system definitions, efforts may be applied to inaccurate or illogical problems that may cost time and money, which could be better spent on competing in the market.
In Example 3, software defects are mostly concentrated in the coding part of the development process. For this example, 138 defects were found in coding, while the other processes have fewer than 40 defects each. Upon investigation, we find that the bottleneck is in the coding. While improvements can be made on other processes within the system, most of those improvements won't directly improve the overall production of the system, hence the necessity for documenting all of the company's processes and knowing the location of the system's bottleneck. Once production is increased in coding through the reduction of defects, focus can be placed on other areas: if software defects in coding are reduced to under 30, then the next area to focus on is testing. The most interesting thing is that without the documentation and measurement of a company's processes, the managers of an organization may be making good decisions on bad or incomplete information. For example, if there were no measurements of defects in the compile process and the true number of defects there were around 60, then after the reduction of defects in coding from 138 to 40, the effort would wrongly focus on testing and would waste valuable time and effort.
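The reasoning in this example can be sketched in a few lines. Only the coding count (138) and the rough compile figure (about 60) come from the text; the other defect counts are hypothetical values chosen to stay under 40, and the function name is an assumption. An unmeasured phase is represented as `None`.

```python
def bottleneck(defects_by_phase):
    """Return the measured phase with the most defects; None means unmeasured."""
    measured = {p: n for p, n in defects_by_phase.items() if n is not None}
    return max(measured, key=measured.get)

# coding=138 is from the article; the other counts are hypothetical (< 40 each).
before = {
    "requirements": 12,
    "design": 25,
    "coding": 138,
    "compile": None,  # not measured
    "testing": 35,
}

after_fix = dict(before, coding=28)         # coding reduced to under 30
with_compile = dict(after_fix, compile=60)  # compile finally measured

print("initial focus:", bottleneck(before))
print("focus after coding is fixed:", bottleneck(after_fix))
print("focus once compile is measured:", bottleneck(with_compile))
```

With the compile phase unmeasured, the second call points at testing; once the missing measurement is added, the same good decision rule points at compile instead, which is the "good decisions on incomplete information" problem described above.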
Implied in a system are two factors: definitions of the process and the demarcation of the processes. Both are essential in helping to determine the location and corrective action necessary to improve quality.
Effective versus Efficient
Being effective is delivering the right product to the right customers at the right time. Being efficient is delivering the right product to the right customer at the right time at the best value. However, being efficient without being effective is delivering the wrong product to the right customer at the wrong time at the lowest cost, i.e. poor quality. Your Cost of Quality just increased because you have to gain back the customer you lost! The focus of companies should be to be effective first, then become efficient.
Process Improvement Skills and Habits
Two of the most important aspects of a quality/process improvement strategy for software developers are the skills, habits, and training they have in getting the job done to meet the customer's expectation, and where they fit within the system of software development. Linux developers have the greatest opportunity for improving Open Source Software (OSS) because it is OSS. The complete understanding of process improvement must be taught or learned, expected, championed, and allowed to be used in order for a company to get better at doing business. Machines used in business don't have the ideas, people do, and without their involvement, productivity would be at a standstill or even get worse. A definition of insanity is appropriate here: Insanity is doing the same thing over and over, but expecting different results. Companies that don't involve and reward the people will ultimately develop "insanity." Process improvement knowledge, skills, and habits are in addition to the normal, everyday skills a programmer would use, and are a part of the development of Linux software. Programmers don't need to be QA gurus, but they do need to understand their place in software development and how those skills can turn into better programming habits.
Ownership of the Process
Owning the process is taking responsibility for one's actions. Any time an organization removes that responsibility, it may encounter disagreements and stress. In the quality improvement field, there was a tendency to establish a Quality Assurance branch or division. The result of those reorganizations was the "passing off" of responsibility for the process and the level of quality to the Quality Office, thus burdening the limited office personnel with too much work and not enough time to finish. By placing the ownership of the process at the level at which it belongs, with the person doing the job, quality is increased. But to continuously improve an organization's processes, one or more Quality Coaches must be identified to do research, train, investigate, assist, facilitate, advise, and support the organization's personnel in finding better ways of getting the work done: everyone is supported by the Quality Coach! Without a Coach, the effort can wallow.
Admittedly, not everyone wants to be an owner of a process, but if the question is not asked of the person doing the job, the answer will never be known and the opportunity to improve can be lost. Ownership of the process does require responsibility and some may not be ready or willing to accept it, for whatever reason. Given time, most will come around when they're ready.
Levels of Empowerment
While empowerment is a significant word in the lexicon of an organization's value system, there are different levels of empowerment. The CEO of a company will have different levels of empowerment than an administrative assistant or the director of a division. Linux developers are no different.
During an interview with the Quality Director of the Ritz-Carlton in Aspen, Colorado, for a Master's thesis, the Quality Director described learning about empowerment within their organization. They surveyed all of their front line employees, cooks, cleaning crews, and others and found that 90% of the decisions they would make solving a customer's problem were the same ones they were coming to the supervisor to help solve. So, the hotel chain empowered each employee up to $2,000 to solve a customer's problem and any amount above that would require the supervisor's involvement. With the supervisors now having more free time, they were not fired or let go, but allowed to concentrate on more strategic concerns and quality improvement efforts which added to the esteem of the workers and the supervisors. As one gets more experience, more responsibility and empowerment is allowed.
Process Improvement is not a Witch Hunt. If at any time a process improvement program becomes a witch-hunt, the efforts toward improvement will be hindered. Why? Because most people don't come to work with the express purpose of failing. Most want to do a good job and just need coaching to improve. Managers should blame the process first before ever blaming the process owners.
Company employees must be adequately compensated and rewarded for their efforts. Rewards may be intrinsic or extrinsic, but they must be given to thank those who have contributed to the improvements.
The evidence is overwhelming that process improvement is a necessity for organizations, not only to remain in business, but to remain competitive in the marketplace. Training is first, or the understanding of the why of process improvement. People require a complete understanding of a process improvement philosophy and should obtain training in these new skills and habits. Tools and techniques are second, or understanding the how of process improvement. These must be understood to do the right job with the right tools. Time is last, or can be considered the reallocation of "recovered" resources when there is an "investment" of resources up front. It is important that time be applied to this new training and these tools to ensure that waste, both in time and money, is eliminated so that resources can be better utilized elsewhere within the organization.
During the thesis process of my master's program, for the survey I was completing, I was seated in the office of Denver General Hospital's Chief of Neurology. I watched as a secretary hit 30-40 keys or so on an old text version of WordPerfect to bring up the memo template. I asked her how often she brought up the memo template: up to 30 times a day was her reply. I timed her keystrokes and found it took her about 30 seconds to complete that task. So, up to 30 times a day at 30 seconds each amounted to about 15 minutes a day spent keying the same information. I taught her the keys to the macro recorder, asked her to retype the key sequence, and saved the macro. I asked her to hit the two macro keys at the same time and watched for her reaction as the template immediately popped up. Her smile said everything I knew process improvement was meant to be.
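The arithmetic in this story is simple enough to write down. The 30 seconds and 30 uses a day come from the account above; the number of working days per year is an assumption added here to show how a small daily saving might add up.

```python
SECONDS_PER_USE = 30      # from the article
USES_PER_DAY = 30         # from the article ("up to 30 times a day")
WORKDAYS_PER_YEAR = 250   # assumption, not from the article

minutes_per_day = SECONDS_PER_USE * USES_PER_DAY / 60
hours_per_year = minutes_per_day * WORKDAYS_PER_YEAR / 60

print(f"{minutes_per_day:.0f} minutes a day")
print(f"{hours_per_year:.1f} hours a year, under the assumed workdays")
```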
The fundamental principle behind process improvement is not the controlling of the code that is being written, but the communication of the contribution of the code, whether it is good or bad. A library's card catalog does not limit an author's right to publish, but allows those looking for the author to find their work with greater ease and in context with other works. So too, process improvement does not limit a programmer's effort, but allows those who can contribute to your project greater access and thus greater participation in improving Linux software development. Point #9 of Deming's Fourteen Points of Management is to Break Down Barriers, and Point #12 is Permit Pride in Workmanship, so keep on doing this in the Linux community by adopting this new approach to continuously improving Linux.
Think - Understand - Do!
"We have learned to live in a world of mistakes and defective products as if they were necessary to life. It is time to adopt a new philosophy in America." ~W. Edwards Deming
Common causes: causes of variation that are inherent in a process over time. They affect every outcome of the process and everyone working in the process (see also "special causes"). 4
Cost of poor quality: the costs associated with providing poor-quality products or services. There are four categories of costs: internal failure costs (costs associated with defects found before the customer receives the product or service); external failure costs (costs associated with defects found after the customer receives the product or service); appraisal costs (costs incurred to determine the degree of conformance to quality requirements); and prevention costs (costs incurred to keep failure and appraisal costs to a minimum). 5
Empowerment: a condition whereby employees have the authority to make decisions and take action in their work areas without prior approval. For example, an operator can stop a production process if he detects a problem or a customer service representative can send out a replacement product if a customer calls with a problem. 6
Plan-Do-Check-Act cycle: a four-step process for quality improvement. In the first step (plan), a plan to effect improvement is developed. In the second step (do), the plan is carried out, preferably on a small scale. In the third step (check), the effects of the plan are observed. In the last step (act), the results are studied to determine what was learned and what can be predicted. The plan-do-check-act cycle is sometimes referred to as the Shewhart cycle because Walter A. Shewhart discussed the concept in his book Statistical Method From the Viewpoint of Quality Control and as the Deming cycle because W. Edwards Deming introduced the concept in Japan. The Japanese subsequently called it the Deming cycle. 7 Too often, organizations follow a Do-Act-Plan-Check approach. While an organization is starting out, the larger it gets, the more defined the process becomes.
Process capability: a statistical measure of the inherent process variability for a given characteristic. 8
Red bead experiment: an experiment developed by W. Edwards Deming to illustrate that it is impossible to put employees in rank order of performance for the coming year based on their performance during the past year because performance differences must be attributed to the system, not to employees. Four thousand red and white beads, 20% red, in a jar and six people are needed for the experiment. The participants' goal is to produce white beads, because the customer will not accept red beads. One person begins by stirring the beads and then, blindfolded, selects a sample of 50 beads. That person hands the jar to the next person, who repeats the process, and so on. When everyone has his or her sample, the number of red beads for each is counted. The limits of variation between employees that can be attributed to the system are calculated. Everyone will fall within the calculated limits of variation that could arise from the system. The calculations will show that there is no evidence one person will be a better performer than another in the future. The experiment shows that it would be a waste of management's time to try to find out why, say, John produced four red beads and Jane produced 15; instead, management should improve the system, making it possible for everyone to produce more white beads.
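As a worked illustration of the "limits of variation" in the red bead experiment, the sketch below computes three-sigma limits for the number of red beads in a sample of 50 when 20% of the beads are red. It assumes the true proportion is known and uses the binomial approximation; in practice the limits are usually estimated from the observed counts instead. The function name is hypothetical.

```python
import math

def np_chart_limits(sample_size, defect_rate):
    """Three-sigma limits for the count of defects in a fixed-size sample."""
    center = sample_size * defect_rate
    spread = 3 * math.sqrt(sample_size * defect_rate * (1 - defect_rate))
    return max(0.0, center - spread), center, center + spread

# 50 beads per sample and 20% red are from the glossary entry above.
lower, center, upper = np_chart_limits(sample_size=50, defect_rate=0.20)
print(f"expected red beads: {center:.0f}, limits: {lower:.2f} to {upper:.2f}")

# John's 4 and Jane's 15 red beads, the counts used in the entry above.
for name, red in [("John", 4), ("Jane", 15)]:
    inside = lower <= red <= upper
    print(f"{name}: {red} red beads, within system limits: {inside}")
```

Under these assumptions both counts fall inside the limits, so the difference between the two workers can be attributed to the system rather than to the people.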
Special causes: causes of variation that arise because of special circumstances. They are not an inherent part of a process. Special causes are also referred to as assignable causes (see also "common causes"). 9
Structural variation: variation caused by regular, systematic changes in output, such as seasonal patterns and long-term trends. 10
Tampering: action taken to compensate for variation within the control limits of a stable system. Tampering increases rather than decreases variation, as evidenced in the funnel experiment. 11
Variation: a change in data, a characteristic, or a function that is caused by one of four factors: special causes, common causes, tampering, or structural variation. 12