Huawei's internal hardware development and design process
[Reposted from the Internet, I hope you can gain something from reading it]
In 2007, with two years of work experience, I interviewed at a small company. After the written test, the interviewer was very impressed with me. But he said: "I need to hire someone who has worked in a large company, preferably someone who knows the hardware development process and specifications. Although you answered the questions well, we need someone with rich experience, preferably someone who has worked at Huawei."
At that time, I was wondering, "What are Huawei's standards and processes like?" Later, I went to Huawei, and I would like to share with you a few different points about Huawei's hardware development that I can think of.
NO.1 Documentation, review, design
When I first joined the company, three people were working on one circuit board. Although the circuit was fairly complex, there was still spare manpower, so I was assigned to write the logic for a PCI-to-UART bridge.
I was a new employee at the time and eager to prove myself. I used my weekends, and in about a week I had finished the code and started simulation. I expected praise from my mentor and supervisor, but got none. Instead I was asked, "Why didn't you call everyone together to discuss it first? Then write a plan and have it reviewed? And only then start writing the code?" I didn't understand at the time. I thought: I can do this by myself, so why mobilize so many people?
On reflection, I came to see the reasons:
First, from the supervisor’s perspective, he doesn’t know the new employee’s personal abilities, so he will only feel at ease if you can explain clearly what needs to be done.
Second, from the company's perspective, there is a set of processes to ensure project delivery. This means delivery no longer depends heavily on any individual's ability, and the departure of any one person will not affect it. This is also the most remarkable thing about Huawei: it breaks complex projects down into very small pieces, so that delivering them does not require especially talented people. This is why Huawei's engineers earn only one tenth of what Cisco's do.
Third, in terms of results, one person's ideas are limited. Documenting your ideas is a way of organizing your thoughts; discussion is a way of collecting ideas you had not thought of; the formal review is how everyone reaches agreement. Discussing in advance and letting the relevant people take part in your design is much better than having someone point out a fatal problem after the design is finished.
Because Huawei breaks tasks down this finely, communication, documentation, review, and discussion become very important. The disadvantages of this working model are also obvious: high communication costs and low work efficiency.
NO.2 Personnel composition in the hardware field
There are many roles within Huawei. Hardware people are responsible for the end-to-end product development phase. Being a single-board hardware engineer can cover the most areas, but it is also the type of job with the most diverse work content, the most contact with people, and the most disputes.
However, because dedicated people handle PCB layout, EMC, power supply, and logic, all of which would normally be the hardware engineer's responsibility, some say hardware engineers have lost all their skills and merely "connect wires".
In fact, this is not the case. Precisely because each person works in a small field and no one leads across them, a good hardware manager is very important and is the key role running through all fields and all processes. As someone on Huawei's internal forum once said, hardware engineers are like the "Cache" in a processor, a transit station for all the links. Large companies divide work this finely to prevent a group of people from mastering too many of the company's core technologies and striking out on their own.
In fact, many people know that Huawei's IPD process comes from IBM. My personal understanding is that the IPD process has been modified in Huawei, combining the characteristics of the Chinese people and Huawei's corporate characteristics. If Huawei rigidly applies IBM's process, it will definitely not be so successful.
So let's summarize Huawei's hardware development process:
Requirements analysis → overall design → topic analysis → detailed design → detailed logic design → schematic diagram → PCB → inspection → logic bonding → board release for fabrication → trial production → debugging of returned boards → unit testing → professional experiment → system joint debugging → small batch trial production → hardware stabilization → maintenance.
The essence of the process is that only after one stage is finished can we move on to the next. The stages themselves are not much different from those at other companies, except that the criteria for entering the next stage are strictly controlled. What bothers hardware engineers the most is that "no node corresponds to the 'board'".
The system that supports Huawei's IPD process is PDM (jokingly nicknamed "crawls slowly").
PDM stands for Product Data Management. PDM is a technology used to manage all product-related information (including part information, configuration, documents, CAD files, structure, permission information, etc.) and all product-related processes (including process definition and management). All Huawei device data, product components, tools, documents, schematics, PCBs, logic code, and so on are stored in this system. However, the system is too complex and hard to use, and its archives are easily confused with the server archives and the SVN archives.
Hardware engineers generally understand that they should choose the lowest-cost components and the smallest number of components on a board to facilitate centralized purchasing and processing. However, other companies may not be so meticulous and strict in the work of component normalization.
First, because Huawei uses such a wide variety of devices, eliminating a single part number (item code) is worth anywhere from RMB 100,000 to several million, a benefit that other companies may not be able to realize. So if a part number can be eliminated, it can be worth choosing a device that costs more per unit. This still has to be checked by comparing the extra direct cost (per-unit cost difference × annual shipment volume) against the savings in part-number maintenance cost plus processing cost. In addition, once devices are normalized, the price can be renegotiated with the supplier, and this benefit compounds over time. As a result, even when the direct cost comparison is unfavorable, the conclusion often leans toward normalizing the device. For example, resistors with 5% tolerance are gradually being removed and normalized to 1%.
Second, component normalization requires a dedicated analysis. Some engineers normalized components without fully analyzing the circuit principles, and the normalization itself introduced problems. So my department kept an Excel spreadsheet, "Component Normalization Analysis.xls", which recorded each normalized component, the original selection, the normalized selection, and the reasons for the change. First, this makes every employee who does normalization think the analysis through; second, all problems are recorded for easy review; and third, if something goes wrong, it is easier to hold someone accountable.
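The cost comparison described in the first point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the four parameters, and all the numbers are hypothetical, and the supplier renegotiation effect is deliberately left out.

```python
def normalization_net_benefit(extra_unit_cost, annual_shipments,
                              part_number_saving, processing_saving):
    """Annual net benefit (RMB) of switching to a normalized part.

    All inputs are hypothetical illustration parameters:
      extra_unit_cost    - how much more the normalized part costs per unit
      annual_shipments   - units per year that use the part
      part_number_saving - yearly saving from retiring one part number
      processing_saving  - yearly saving in purchasing and processing effort
    """
    direct_cost_penalty = extra_unit_cost * annual_shipments
    overhead_saving = part_number_saving + processing_saving
    return overhead_saving - direct_cost_penalty


# Made-up example: a 1% resistor costs 0.001 RMB more than the 5% part,
# 20 million are used per year, and retiring the 5% part number is
# assumed to save 100,000 RMB per year plus 5,000 RMB in processing.
benefit = normalization_net_benefit(0.001, 20_000_000, 100_000, 5_000)
print(f"net annual benefit: {benefit:.0f} RMB")
```

A positive result argues for normalizing; a small negative result may still be acceptable if a later price renegotiation is expected to close the gap.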
In addition to device normalization, a higher level of normalization is single board normalization. (Let me clarify the concept of a single board. When I first arrived at Huawei, I also thought the term was strange. Communication equipment is composed of a chassis, a backplane, and circuit boards for the various functional modules. The circuit board of each functional module is called a "single board", and hardware engineers are generally said to work on "single board hardware".)
The benefits of single board normalization are, first of all, fewer types of circuits. There are three benefits of fewer types of circuits:
First, the production cost is reduced;
Second, hardware maintenance costs are reduced;
Third, the cost of software development and maintenance is reduced.
First, the prerequisite for single board normalization is processor normalization. In fact, some Huawei products are not good at this. X86, MIPS, ARM, and PPC are all used. Therefore, a hardware platform needs to be equipped with various software personnel, N sets of operating systems, VxWorks and Linux, and various BIOS matching.
Second, single board normalization requires attention to derivative products. If the functions implemented by a board in the first version of the chassis are needed by later products, the board should be directly reusable without further development. If this is neglected, then when the second version is released the first-version board turns out to be incompatible with it, or has to be modified to fit the new version. Sometimes it is even worse: the board is completely incompatible and has to be redeveloped. Single board planning is very important.
Third, sometimes the circuit is compatible after normalization but the structural parts are not. For the marketing staff who configure orders, there are still two configurations, so this also counts as a failure.
If you find that different hardware platforms have similar architectures and functions, then the chassis can also be normalized. You only need to make different circuit function modules to achieve different functional requirements.
However, different hardware forms all have their own meanings. If they are forced to be unified, the market may not accept it. For example, using an operator's platform to unify a product for enterprise applications or home applications may not be successful.
NO.4 Network architecture normalization
This statement is my own idea. As early as 2008, Huawei was discussing the "cloud-pipe-end strategy", but I didn't quite understand it at the time. But when our operator platform department merged with the "server" department, I seemed to understand something.
When the X86 processor is powerful enough, all operations, regardless of whether they are the most cost-effective, will be sent to the cloud for processing, and all intermediate storage and computing will become unimportant. Then the structure of the entire network is terminal + pipeline + cloud storage and cloud computing.
I think many hardware engineers have a misunderstanding: they believe their core competitiveness lies in knowing a few software tools (Cadence, Protel) and drawing schematics and PCBs. This was true of me in my early jobs. My greatest skill was copying the demo board and previous mature circuits. If I met a new circuit design, I would usually draw it from the reference circuit first, then debug it, try it out, and solve problems as I ran into them.
My current belief is that the most valuable thing about a hardware engineer is that he or she understands hardware principles, circuit analysis, analog and digital electronics principles, and electromagnetic field theory, rather than being able to use drawing software.
So how does Huawei do circuit design? Why is there a term called special analysis? Why do we need to do special analysis when designing circuits?
Second, when we meet a new problem during circuit design, one the team has not seen before or one considered key or difficult, we do a special analysis of that point. Examples from our own work include dual BIOS boot, the infrared LED driver for a camera, and master-slave switching. We analyze the point thoroughly, and only then start drawing the schematic.
Third, when developing hardware, the demo board is only a reference. Everything is based on the datasheet. Besides reading the chip datasheet, you must also carefully read the errata and check for differences between the datasheet and the demo. If the device has a design checklist, you must check that too. On one AMD design, the datasheet, demo, and checklist did not match. There was also a problem that was hard to reproduce. Later, after checking the errata, we found that the manufacturer had revised the chip and fixed the bug, but we were still purchasing the old revision.
Fourth, since the project itself has a delivery time requirement, it is actually impossible to thoroughly investigate every problem point within a limited time. So here comes the question:
How do you handle it? First of all, each project has an "Issue Tracking Sheet". The hardware team has a lot of loose ends to handle, so this sheet must be used well, otherwise things routinely get dropped. I have even applied this sheet to my home renovation. The principle is very simple: record the content of each problem, the person responsible, the completion status, and the completion time. As long as you keep using it, you will find that you no longer lose track of problems, you work in a more organized way, and you gain a sense of accomplishment. With this sheet, when you find a problem you record it first. Even if you do not solve it now, you still decide whether it needs to be solved and when.
Secondly, prioritize the problems. Every project moves forward carrying risks, so identify the high-risk problems, solve them first, and deal with the low-risk ones afterwards. This is also one of the reasons why "0 ohm" resistors appear so often in Huawei circuit designs: when a risk has been identified but not analyzed clearly, or cannot be analyzed in time, you have to make the design compatible with both options. I have to sigh here: if you are careless during design and do not analyze a problem clearly, it will definitely be exposed in the end.
Therefore, for a hardware engineer in the "Chrysanthemum Factory", "topic analysis" is the core work of hardware design, not drawing schematics. With this method, you spend 1 to 2 months on circuit analysis and 1 to 2 weeks drawing schematics, instead of drawing, debugging, and redesigning over and over. It is impossible to achieve more, faster, better, and cheaper all at once, so hardware engineers have a responsibility to make good compromises and trade-offs.
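The issue tracking sheet described above is simple enough to sketch in Python. This is a minimal illustration, not Huawei's actual template: the field names, the three-level risk scale, and the sample entries (owners and dates included) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Issue:
    description: str
    owner: str
    risk: str                    # "high", "medium" or "low" (assumed scale)
    due: Optional[date] = None   # None: recorded, but not yet scheduled
    done: bool = False


RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def open_issues_by_priority(issues: List[Issue]) -> List[Issue]:
    """Unresolved issues, highest risk first, then earliest due date."""
    pending = [i for i in issues if not i.done]
    return sorted(pending,
                  key=lambda i: (RISK_ORDER[i.risk], i.due or date.max))


# Hypothetical entries, loosely based on the topics mentioned in the text.
sheet = [
    Issue("Dual BIOS boot sequence not analyzed", "Zhang", "high", date(2024, 3, 1)),
    Issue("Silkscreen label typo", "Li", "low"),
    Issue("Master-slave switching glitch", "Wang", "high", date(2024, 2, 15)),
    Issue("Fan connector pinout check", "Li", "medium", date(2024, 2, 20), done=True),
]

for issue in open_issues_by_priority(sheet):
    print(issue.risk, issue.owner, issue.description)
```

The point of the sort is the one made in the text: every problem is recorded as soon as it is found, and the high-risk ones surface first even when they cannot be solved right away.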
NO.6 Special Topic: Device Selection Specifications
1. About “Device Selection Specifications”:
When I joined Huawei, the entire company was in the middle of a "standardization" movement. Everything was written into specifications, and everyone wrote specifications. Everything, including positions, performance, and technical levels, was based on specifications. (Large companies steer by KPIs, which can easily turn into a "movement".) So at that time, many people wrote device selection specifications for each type of device. When reviewing schematics, the phrase I heard most often was "That's how the specification is written." There are some problems with this:
1. The person who writes the specification may not be highly skilled, or may not write it in detail. If errors occur, it will be even more harmful.
2. Specifications sometimes inhibit the thinking of developers. If everything is done according to the specifications, it may not be suitable for the actual design scenario. For example, if I need a low-cost design, but the specifications emphasize high quality, they may not be applicable.
3. With a specification in hand, some developers stop thinking. For example, the specification requires pF-level capacitors for power supply filtering on crystal oscillators above 50 MHz, but not below 50 MHz. No one asks why, so naturally no one knows why. As another example, network port transformer protection circuits, indoor or outdoor, can be drawn directly from the design requirements of the various EMC standards, but few people think about the reasons or know the test results, so they are at a loss when they run into trouble in practice. Specifications do sometimes improve work efficiency and product quality, but as tools develop, people's skills inevitably degenerate.
4. Some device selections are not suitable for writing specifications because the devices are developing too fast and may become obsolete by the time you finish writing the specifications. For example, after the X86 processor entered the communications field, the processor selection specifications became redundant.
Specifications do bring benefits. However, not all work is suitable for being constrained by specifications. Hardware engineers need to be able to think beyond the "reference circuit" and "specifications" and think about problems and designs from the principle.
Of course, specifications are still very useful means, which are the essence of a lot of theoretical analysis + experience accumulation + practical data. I think the specification I read the most at that time was the "Derating Specification for Component Selection", which is based on a large number of tests and actual cases, and summarizes the content that needs to be considered when selecting components.
For example, for aluminum electrolytic capacitors the steady-state operating voltage must stay below 90% of the rated withstand voltage; for tantalum capacitors the steady-state derating requirement is 50%; and for ceramic capacitors it is 85%. These figures take several factors into account, such as the failure modes of some devices, the worst-case environment (high temperature, low temperature, maximum power consumption), the difference between steady-state and transient power, and so on.
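The derating figures quoted above lend themselves to a quick automated check. The sketch below is an illustration only: it reads each percentage as the maximum allowed ratio of steady-state operating voltage to rated voltage, which is an assumption about how the specification is meant, and the dictionary keys and function names are made up for the example.

```python
# Maximum steady-state operating voltage as a fraction of rated voltage.
# The three figures are the ones quoted in the text; interpreting them as
# "operating <= fraction * rated" is an assumption.
DERATING = {
    "aluminum_electrolytic": 0.90,
    "tantalum": 0.50,
    "ceramic": 0.85,
}


def max_operating_voltage(cap_type, rated_voltage):
    """Highest steady-state voltage allowed for this capacitor type."""
    return DERATING[cap_type] * rated_voltage


def passes_derating(cap_type, rated_voltage, operating_voltage):
    """True if the operating voltage respects the derating limit."""
    return operating_voltage <= max_operating_voltage(cap_type, rated_voltage)


# A 16 V rated part on a 12 V rail: acceptable as a ceramic capacitor,
# not acceptable as a tantalum one.
print(passes_derating("ceramic", 16.0, 12.0))
print(passes_derating("tantalum", 16.0, 12.0))
```

A check like this only covers the steady-state voltage rule; transient conditions and temperature, which the text also mentions, would need separate limits.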
2. Factors to consider when selecting devices:
In Huawei's PDM system, devices have several levels of preference, such as "preferred", "non-preferred", "forbidden", "terminal-only", etc. Engineers can intuitively feel whether a device is preferred based on this preference level.
So what factors are considered when assigning a device its preference level?
1. Availability: This matters especially for manufacturers like Huawei that ship in large volumes. Be cautious about devices in the declining phase of their life cycle, and do not use discontinued devices. I designed a circuit in 2005 by copying someone else's circuit. When it went to production, I found the device could not be bought at all. It had been discontinued, so I could only buy refurbished parts in the electronics market. For key components, there must be at least two brands with interchangeable models, and some even need a solution-level replacement. This is very important. If a part is single-source, it requires reporting, decision-making, and risk assessment at every level.
2. Reliability:
Heat dissipation: For power devices, prefer packages with low junction-to-ambient thermal resistance (RjA) and a higher maximum junction temperature (Tj). For processors, if the performance is adequate, choose the device with lower power consumption. However, if it is a part that Intel monopolizes, you can only put up with it and add a heat sink and a fan.
ESD: The anti-static capability of the selected components must be at least 250V. For special components such as RF components, the anti-ESD capability must be at least 100V, and the designer must add anti-static measures. (Note: Huawei has strict requirements and prohibits handling boards with bare hands. I didn't understand this at first, but when I led a team, I found that my teammates spent a lot of time repairing boards. Our team became very strict about this. It seems to reduce efficiency, but it actually improves it. At least you don't have to keep suspecting that a device was damaged by static electricity.)
Moisture sensitivity: The moisture sensitivity level of the selected components should also be taken into consideration.
Safety: The materials used must meet the requirements of anti-static, flame retardant, anti-corrosion, anti-oxidation and safety regulations.
Failure rate: Avoid components with high failure rate, such as labeled DIP switches. Try not to choose bare die components, which are prone to cracking. Do not choose glass encapsulated components. Do not choose ceramic capacitors with large packages.
Failure mode: Consider how a device fails, whether as an open circuit or a short circuit, and evaluate what consequences that would cause. This is also an important reason for being careful when selecting tantalum capacitors.
3. Manufacturability: Do not use components with package size smaller than 0402.
Try to choose surface mount devices, which only require one reflow process to complete the soldering, without the need for wave soldering. If some plug-in devices are unavoidable, it is necessary to consider whether the through-hole reflow process can be used to complete the soldering. This can reduce the soldering process and cost.
4. Environmental protection: Since a large share of Huawei products are shipped to Europe, the environmental protection requirements are relatively strict. When the EU introduced its lead-free requirement, almost all hardware engineers in the company were working on lead-free rectification.
5. Consider normalization: If an existing product already uses a device and ships it in large quantities, that device is sometimes chosen even when it is not the ideal fit, because the added volume allows the price to be renegotiated and the device has already been verified in volume, so it can be used with confidence. This is also why we tend to choose mature devices and are cautious about devices in the introduction or decline stage of their life cycle.
6. Industry management: For a certain category, such as power supply, clock, processor, memory, Flash, etc., there are dedicated people to plan and coordinate the use of the entire company, conduct market research, analysis, and write specifications in advance. They will also participate in the selection of new devices.
7. Device Department: There are colleagues in the device department who will analyze the failure causes of devices, perform reliability analysis, take X-rays of devices, evaluate device life, and so on.
8. Cost: If none of the above factors is a showstopper, then all of them are secondary, and the one to keep a close eye on is this eighth one.
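The heat dissipation item in the list above (low RjA, high Tj) rests on the standard steady-state estimate Tj = Ta + P × RjA, where Ta is the ambient temperature and P the dissipated power. The Python sketch below applies that formula; the package names and all numeric values are hypothetical and chosen only to show the comparison.

```python
def junction_temperature(ambient_c, power_w, r_ja_c_per_w):
    """Steady-state junction temperature estimate: Tj = Ta + P * RjA."""
    return ambient_c + power_w * r_ja_c_per_w


def thermal_margin(ambient_c, power_w, r_ja_c_per_w, tj_max_c):
    """Degrees of headroom below the maximum junction temperature."""
    return tj_max_c - junction_temperature(ambient_c, power_w, r_ja_c_per_w)


# Hypothetical comparison: the same 1.5 W regulator in two packages,
# at 55 C ambient, with an assumed 125 C maximum junction temperature.
for name, r_ja in [("package A", 40.0), ("package B", 62.0)]:
    margin = thermal_margin(55.0, 1.5, r_ja, 125.0)
    verdict = "ok" if margin > 0 else "needs heat sink or a different package"
    print(f"{name}: margin {margin:.1f} C, {verdict}")
```

In practice RjA depends heavily on board layout and airflow, so a datasheet value gives only a first estimate.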
Meeting Part 1: "Huawei's Meeting"
1. First of all, big companies have many meetings. Because the company is large, there are many departments and responsibilities are finely divided, so any one matter requires many participants and disputes arise easily. When I first arrived at Huawei, I was very uncomfortable. Everything had to be documented, reviewed, and discussed in a meeting. I was not used to so many meetings and got bored in them. All my high scores in Snake were set during that period.
2. There is always a person in charge of everything. Huawei gives the person in charge enough power to promote the development of things and coordinate resources. For example, marketing is strong enough to promote R&D to meet customer needs. Product managers and account managers are still very powerful and can directly communicate with the R&D director to promote R&D to do this and that.
3. All problems will eventually be recorded, tracked, and resolved. This is why customers are still willing to use Huawei equipment even when the quality and performance of some of it do not satisfy them, and why operators like Huawei equipment. When a problem arises, Huawei's people rush in before it is even clear which vendor is at fault. In one meeting, two people from China Unicom and six from Huawei attended, and through testing and evidence they showed that the problem lay with Juniper equipment. Then they delivered a full report telling the customer that it was not our problem, but the problem of XXX manufacturer.
4. If the forest is big, there will be all kinds of birds. So naturally, things like pushing, dragging, and blaming others will always happen. This requires a strong and clear performance evaluation system to guide employees to take the initiative to undertake tasks, rather than to draw clear boundaries. This kind of "clear division of responsibilities" is also inevitable. Otherwise, there will be no water to drink for three monks. Note: Huawei's practice of fully discussing everything is applicable in the field of telecom operators, but it is often not applicable in the consumer field or even the enterprise IT field because there is not enough profit margin to support this. So when I talked about some of Huawei's advantages, Huawei mobile phone users don't have to complain to me:-)
5. Meetings often go wrong in one of two ways: they are either too divergent or too conservative. In meetings at the product definition stage, we are often reminded not to converge too early while ideas are still being generated; in problem-solving meetings, we are often reminded not to diverge and to stay focused on the problem. The person who gives these reminders is often very important. Of course, sometimes it becomes a formality. Readers can look at the next case, "Huawei's internal discussion on how to improve Sun Yang's posture". In that meeting, people kept reminding everyone to focus, but the discussion still wandered quite a bit.
Part II: Robert's Rules of Order
What are Robert's Rules of Order?
More than a hundred years ago, there was a good young man named Henry Martyn Robert, 25 years old, what Chinese calls a "lengtouqing" (a hothead). He graduated from West Point Military Academy and, during the Civil War, was asked to preside over a local church meeting. The result was a mess. People argued happily, but there was no conclusion. Holding this meeting was worse than not holding it. This young man was a bit stubborn. He said he would study the matter and work out a set of rules, otherwise he would never hold a meeting again. He studied thousands of years of meeting practice and came to a conclusion: humans are probably animals that love to argue and are the hardest to convince by reason. Once a disagreement arises, it is difficult to win the other party over through words in a short time. Left unchecked, people argue for days and nights without result, and the more they argue, the more each side feels that it is right and the other is a fool. Therefore, there must be a mechanism for both parties to find common ground and reach a conclusion. He treated this research as a war and regarded people's argumentative nature as the enemy. In the end, this young man won.
The result of the victory was Robert's Rules of Order, published in 1876. He published it at his own expense and bought a thousand copies to give away. In 1915 Robert, who had by then risen to the rank of general, published a revised edition. At first, people didn't take the rules seriously. How could a beardless youngster be trusted? Unexpectedly, they really worked. Once people adopted the rules, the quarrels stopped and meetings ran smoothly. Ink bottles and benches were no longer flying around. As a result, Robert's Rules of Order became the most popular rules of order in the world.
There are four common problems in meetings.
First, going off topic: you mention Jet Li, I bring up Jackie Chan; I mention Zhu Bajie, you bring up Wen Jiabao and Li Peng. There is no end to it. Older people especially love to tell anecdotes: they start with a story, and before long they are talking about lunch.
Second, the one-man show: this happens when the leader loves to talk. Whoever is the leader talks endlessly, and once he starts, he does all the talking. A second form is that some people, as you find in the countryside, simply love to talk, while others never speak at all. Our great party congresses and people's congresses are all like this.
Third, wild arguments: in the middle of discussing a problem, someone says that you claimed five yuan too much last time, that you are not a good kid, and starts questioning other people's morals. They seize on one word out of a hundred sentences. They even come to blows. The meeting cannot go on.
Fourth, interruption: others should not be interrupted while they are speaking.
One of Robert's Rules of Order is that the moderator should deal with the problems above. However, in most enterprises, when the leader is present, the moderator will not tell the leader, "You are off topic," "You are dominating the discussion," or "You should not interrupt others while they are speaking." This is a typical example of how some foreign theories and methods are often not suited to China's soil and cannot be applied mechanically.
In fact, at Huawei, in most meetings, when "going off topic, talking alone, interrupting, or being uncivilized" occurs, the host will remind and bring the meeting back on track. However, this cannot be achieved in some meetings, for example: the leader is strong, the leader is the host, the host is a bootlicker, and some politically sensitive issues cannot disrupt harmony. I will not go into details here.
So how does Huawei solve these problems?
1. "Customer-centric", so no matter how powerful the leader is, he is not as powerful as the customer, and all customer needs must be promised and met. So everyone is trying to satisfy the customer, and there will be no major disagreements on principled issues.
2. Performance-oriented: everything is evaluated by results. So if the leader proposes a plan that may carry major hidden dangers, the subordinates have a responsibility to point this out and oppose it. Otherwise, once serious consequences follow, the leader cannot escape blame and will punish the subordinates as well. They are all grasshoppers tied to the same rope. When a colleague puts forward a valuable opinion that differs from the leader's, that colleague is recognized on the basis of the performance results. This educates employees, encourages objections, and encourages correcting the leader's mistakes.
3. Educate supervisors. Huawei advocates wolf culture. All supervisors who are promoted are usually full of wolf spirit, good at speaking, energetic, and talk a lot in meetings and when communicating with employees. This will easily lead to a one-man show or go off topic. Therefore, when training supervisors, we will educate the team leaders to listen, communicate, and grasp the rhythm and measure when communicating.
Part 3: Reduce Ineffective Meetings
I once supported the network construction of China Construction Bank (CCB) for a period of time. When I first went there, I had a meeting with their IT planning department. It was a typical "one-man show". One of their leaders came in and yelled: "Why is your Huawei equipment so bad? Your Cisco equipment is shit too. Your Siemens service is terrible..." The CCB staff and the equipment vendors were stunned by his scolding and just listened to his complaints. After scolding the vendors, he started scolding his own employees. No one knew what he wanted. He could not say what kind of equipment, performance, or service he was after. Then he left angrily.
A one-man show, going off topic, and bad manners are not fatal; the most fatal thing is an "ineffective meeting". After the leader left, everyone went on discussing according to their own ideas and methods, and then spent 2 minutes discussing how to deal with this leader. So meetings are necessary, but there is a method to holding them effectively.
So how do we do it?
First, routine meetings have topics. For a weekly meeting, for example, the topics should be arranged in advance rather than improvised. Set the topics and the time allotted to each one so that the meeting does not go off topic.
Second, every meeting must have minutes. The host and the minute-taker of each meeting must be clearly designated. Taking minutes is important and demands real skill: you need to follow the discussion closely and record the key points rather than keep a running account.
Third, the meeting minutes should be divided into:
Conclusion (the conclusion of the meeting shall not be changed at will);
Remaining issues (must comply with SMART principles);
There must be someone responsible;
Required completion time, etc.
There is a template for minutes, reminding everyone that the minutes must comply with the SMART principles.
Fourth, follow up frequently and close the loop. All outstanding issues are reviewed at the next meeting to check whether they have been completed; if one is delayed, an explanation must be given. Of course, if the original task assignment turns out to have been flawed, the issue is closed or suspended based on an evaluation.
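The minutes structure and the follow-up rule above can be sketched as a small data structure. This is only an illustration, not Huawei's actual template: the class names, field names, and status values are all my own assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """One remaining issue from a meeting (field names are illustrative)."""
    description: str
    owner: Optional[str] = None   # there must be someone responsible
    due: Optional[date] = None    # required completion time
    status: str = "open"          # assumed values: "open", "closed", "suspended"

    def problems(self) -> List[str]:
        """Return the reasons this item cannot be tracked yet."""
        found = []
        if not self.owner:
            found.append("no owner")
        if self.due is None:
            found.append("no completion date")
        return found


@dataclass
class Minutes:
    topic: str
    host: str
    recorder: str
    conclusions: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def open_items(self) -> List[ActionItem]:
        """Items that the next meeting has to review."""
        return [a for a in self.action_items if a.status == "open"]

    def overdue(self, today: date) -> List[ActionItem]:
        """Open items whose completion date has passed; these need an explanation."""
        return [a for a in self.open_items()
                if a.due is not None and a.due < today]


# Example data, entirely made up.
minutes = Minutes(
    topic="Weekly meeting",
    host="Team leader",
    recorder="Engineer A",
    conclusions=["Board bring-up starts next week"],
    action_items=[
        ActionItem("Fix UART overflow", owner="Engineer B", due=date(2024, 1, 10)),
        ActionItem("Evaluate new connector"),
        ActionItem("Update schematic review list", owner="Engineer C",
                   due=date(2024, 1, 5), status="closed"),
    ],
)
```

With this shape, an item with no owner or no date is easy to catch before the minutes are sent out, and the next meeting only has to walk through `open_items()` and ask about whatever `overdue()` returns.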
Fifth, all decisions must be based on reason and evidence, not on impulse. If a decision is made off the top of someone's head beforehand, the follow-through will be improvised too, and in the end someone simply walks away from it. Decided this way, it is not a matter of subordinates obeying superiors or the minority obeying the majority. Of course, this brings efficiency problems, because some issues cannot be studied thoroughly in a short time and no decision can be made. This is where the CCB comes in. (Here CCB does not mean Construction Bank. In CMMI (Capability Maturity Model Integration), CCB stands for Change Control Board. A CCB can be one small group or several different groups, responsible for deciding which proposed requirement changes or new product features to implement. A typical change control board also decides which defects to fix in which versions. In a system integration project the CCB represents the owner's interests and is responsible for deciding which changes to accept. It is made up of several people involved in the project, usually including decision-makers from both the user and the implementer. The CCB is a decision-making body, not an operating body: its job is usually to decide through review whether a change to the project is allowed, not to propose change plans. At the very least, it ensures that a decision reflects collective wisdom.)
1. Comparison of Huawei and Xiaomi tests from the perspective of schedule
According to the weekly release schedule of Xiaomi UI, there is one day of internal testing on Thursday. I cannot fit that into Huawei's process.
The doubts are:
1. Does internal testing refer to self-testing by developers or testing by testers?
2. If it means self-testing by developers, then where do testers test?
3. If it is the testers who test, what about the developers’ self-testing? Where is the point of transition from development to testing?
Friends with a Huawei background will definitely ask: How is it possible for testers to complete the test in one day?
Some people may say that Xiaomi is just very efficient.
Then let's take a look at Huawei's testing process, and you will know whether the relevant tests can be completed in one day.
First of all, Huawei's software departments, including the UI and website development teams, also develop in small steps. Once a product is stable, new requirements are split into small versions so that development and testing cycles are as short as possible. It is possible that Huawei's ability to break down requirements is weaker than Xiaomi's, but here we are only talking about the testing process.
Testing is an essential part of the product development process. Among Huawei's R&D personnel, nearly one-third are testers.
Huawei's testing system started early in China and has gone through the following stages:
1) Bronze Age: Handcrafted Workshop Test
In 1996, the R&D and testing team was established to conduct R&D and testing in a workshop-style manner.
2) Iron Age: IPD and CMM Stage
In 1998, Huawei began cooperating with IBM and started to introduce the IPD process.
The CMM concept was introduced around 1999.
The IPD-CMMI process was formed.
In 2004, the PTM process was developed based on IPD, and automated testing was carried out on a large scale.
Around 2006-2007, PTM became more mature
Note: The meanings of the TR points in the above figure are as follows:
HLD: High-Level Design document;
LLD: Low-Level Design (detailed design) document;
1. UT
The object of unit testing is the program unit or module defined in LLD, which is also the largest unit that can be tested in unit test case design. The test object may consist of one or more functions or classes, and test design is to design test cases for the test object.
The purpose of UT is to check, by running the functions, that the module code complies with the LLD document, and to verify that the input and output behavior of each function matches what the detailed design document defines for it. The function is the most basic unit of product implementation, and the next unit up is the module. From the testing point of view, the hope is that once UT is finished every function is solid and reliable. The next step, IT, can then focus on whether the functions work together to meet the requirements allocated to them, without worrying about the input and output behavior of the functions themselves.
Unit testing is more suitable for developers.
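To make the idea concrete, here is a minimal sketch of a unit test in this sense. The function `even_parity_bit` and the table of input/output pairs are hypothetical: I made them up to stand in for a function and its definition in an LLD.

```python
def even_parity_bit(byte: int) -> int:
    """Hypothetical unit under test: return the parity bit that makes the
    total number of 1 bits (data plus parity) even."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    return bin(byte).count("1") % 2


# Input/output pairs as an imagined detailed design document might list them.
LLD_CASES = [
    (0x00, 0),
    (0x01, 1),
    (0x03, 0),
    (0x7F, 1),
    (0xFF, 0),
]


def run_unit_tests() -> int:
    """Check every specified input/output pair; return the number of cases run."""
    for data, expected in LLD_CASES:
        actual = even_parity_bit(data)
        assert actual == expected, f"{data:#04x}: expected {expected}, got {actual}"
    # Out-of-range input must be rejected, not silently accepted.
    try:
        even_parity_bit(256)
    except ValueError:
        pass
    else:
        raise AssertionError("256 should have been rejected")
    return len(LLD_CASES)
```

The point is that each case comes from the design document, not from reading the code, so the test checks the function against its specification.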
2. IT
Integration testing refers to the testing of assembling several units that have passed unit testing together. Integration testing should be based on HLD and mainly find errors or imperfections in interfaces and dependencies. The object of integration testing is a combination of several unit test objects, at least two.
The purpose of IT is to decompose modules according to module design, starting from verified functions and integrating upwards layer by layer to obtain an executable module.
IT can be done by developers or testers. It is not hard to see that UT tests each unit on its own, while IT tests the interfaces between units. UT and IT can both be classified as "unit-level" testing.
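As a rough sketch of the difference, the example below integrates two units and tests only the interface between them. Both `encode_frame` and `decode_frame`, and the frame layout itself, are hypothetical and assumed for illustration; each is taken to have passed its own unit test already.

```python
def encode_frame(payload: bytes) -> bytes:
    """Unit A (hypothetical): prefix a length byte and append an 8-bit checksum."""
    if len(payload) > 255:
        raise ValueError("payload too long")
    checksum = sum(payload) % 256
    return bytes([len(payload)]) + payload + bytes([checksum])


def decode_frame(frame: bytes) -> bytes:
    """Unit B (hypothetical): validate length and checksum, return the payload."""
    if len(frame) < 2 or frame[0] != len(frame) - 2:
        raise ValueError("bad length")
    payload, checksum = frame[1:-1], frame[-1]
    if sum(payload) % 256 != checksum:
        raise ValueError("bad checksum")
    return payload


def integration_round_trip(payload: bytes) -> bool:
    """Integration check: what unit A emits must be accepted by unit B unchanged."""
    return decode_frame(encode_frame(payload)) == payload


def integration_detects_corruption(payload: bytes) -> bool:
    """Flip one bit between the two units; unit B must reject the frame."""
    frame = bytearray(encode_frame(payload))
    frame[1] ^= 0x01  # corrupt the byte right after the length field
    try:
        decode_frame(bytes(frame))
    except ValueError:
        return True
    return False
```

Neither check looks at how the units work inside. They only ask whether the two sides agree on the frame format, which is exactly the kind of mismatch that unit tests on each side cannot find.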
3. ST
System testing defined by CMM: System testing is an overall test of the software system developed by the software project team. It runs the software system as a whole or implements a clearly defined subset of software behaviors. The main testing method used is black box testing, that is, regardless of the internal implementation logic of the program, to verify whether the input and output information meets the requirements specified in the specification. It can be seen that the test object of ST is the specification, or more precisely, the module requirement specification, so it is generally also called MST. The module SRS document gives the corresponding requirements for the input and output of the module. After MST, each module is solid and usable.
4. BBIT
BBIT is an inter-module interface test that verifies whether the interfaces between modules can work together. Sometimes it is mixed up with joint debugging, but the purposes are actually different. The purpose of BBIT is to decompose the system according to the system design, starting from the verified modules, and integrating upward layer by layer to obtain a runnable system. Joint debugging generally involves software, hardware, or coordination testing between different products. MST and BBIT can be classified as "module-level" testing, one verifies the module, and the other verifies the interface between modules.
The UT, IT, MST, and BBIT described above are generally completed by developers. Once they are done, the system can basically run, and testers can carry out SDV, SIT, and SVT.
5. SDV
Although SDV is a system test conducted by testers, it is to some extent gray-box testing, because SDV verifies whether the subsystems working together meet the design requirements (DR). It still pays attention to the internal implementation and verifies whether the integration of multiple modules meets the design requirements.
6. SIT
SIT also verifies whether the design requirements are met. Unlike SDV, SIT completely treats the system as a black box for testing and does not care about the specific internal implementation. In actual applications, although SDV and SIT are both system-level tests, they are often tested separately by testers from different project groups (subsystems). They only focus on their own subsystems, so it is better to classify SDV and SIT as "subsystem-level" tests.
7. SVT
SVT is an acceptance test, and its test object is the product package requirements (OR). The product package requirements define the scope of the product and describe the system from the perspective of the environments in which the product may be used. The purpose of SVT is to confirm (or accept) that the product can handle the application scenarios given in the product package requirements.
Even for web development projects, outsourcing projects, and terminal projects, Huawei's testing still goes through the following testing stages:
SIV: System Integration Verification
SDV: System Design Verification
SIT: System Integration Test
SVT: System Verification Test (system simulation test)
After the iteration is completed, before the official release, all the stories implemented in the previous iterations will be tested again. The main body of the test is the tester, including functional and non-functional, and a test report must be given. This activity is called SIT or release testing.
If the Story tests and the iteration SDV tests are both automated, this test mainly runs the automated cases, adds supplementary tests where earlier testing was insufficient, and performs detailed performance tests. If the level of test-case automation is low, a subset of cases is selected for this test. A test report is required after the test is completed.
SIT test focus: After all iterative development is completed, the testers in the iterative development team will complete the regression test of the whole system to meet the quality standard of TR4A. The remaining issues must meet the DI (defect density) target of TR5.
4) Army Group Era: IPD-RD-I&V Stage
Agile was promoted around 2008, and R&D organizations evolved into PDUs.
Introducing the iterative development model to form the IPD-RD-I&V process
System Integration and Verification Process: IPD-RD-I&V (I&V: Integration and Verification)
After the Test Plan is written, it needs to be reviewed. The participants include the project manager, test manager and system engineer. The test team leader needs to modify the Test Plan based on the review opinions and upload it to VSS, which will be managed by the configuration administrator.
After the developers have finalized and baselined the SRS, the test team leader starts organizing the test members to write the test plan. For each requirement point in the SRS, the test plan must give a brief description of the requirement, the test approach, and the detailed test method. After the test plan is written, it also needs to be reviewed. The reviewers include the project manager, developers, test manager, test team leader, test members, and system engineers, who return their review comments. The test team leader organizes the test members to revise the test plan, and only after the review is passed does the work enter the next stage - writing test cases.
Test cases are written according to the "Test Plan". After the "Test Plan" stage, testers have a detailed understanding of the entire system requirements. Only then can the use cases be written to ensure that they are executable and cover the requirements. Test cases need to include test items, use case levels, preconditions, operation steps and expected results. The operation steps and expected results need to be written in detail and clearly. Test cases should cover test plans, and test plans cover test requirements, so as to ensure that customer requirements are not missed. Similarly, test cases also need to be reviewed by developers, testers, and system engineers. The test team leader also needs to organize testers to modify the test cases until the review is passed.
By the time the test cases are being written, the developers have basically finished the code and completed unit testing. The version is then handed over to the test department, which goes straight to system testing. The test department first pre-tests the newly delivered version. If the software does not achieve 10% of the CheckList, the test department rejects the version. Otherwise, the version is accepted for system testing. Following the schedule in the "Test Plan", the test team leader runs multiple rounds of testing. After each round, the test team leader writes a test report covering test case execution, the distribution of defects, the causes of defects, the risks found during testing, and so on. At this point the testers revise and add test cases. After the developers have fixed the bugs and delivered a new test version, the test department starts the second round of system testing: first the items on the problem list are regression-tested, then testing continues and the second-round test report is written. This cycle repeats until system testing is complete. During system testing, the testers also need to write acceptance manuals, acceptance cases, and data test cases.
Defects on the problem list are fixed until the specified defect density is met, so that the relevant TR point can be passed.
If the defect rate found during acceptance is within the range specified in the SOW, then the acceptance is successful. If the defect rate exceeds the specified range, quality traceback is required.
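A gate like this boils down to one comparison. The sketch below is only an illustration under my own assumptions: it measures the defect rate as defects per thousand lines of code, and the limit is an invented number. The actual metric and range would be whatever the SOW specifies.

```python
def defect_rate(defect_count: int, size_kloc: float) -> float:
    """Defects per thousand lines of code (the unit is an assumption)."""
    if defect_count < 0:
        raise ValueError("defect_count must not be negative")
    if size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    return defect_count / size_kloc


def acceptance_decision(defect_count: int, size_kloc: float,
                        limit_per_kloc: float) -> str:
    """Compare the measured rate with the agreed limit.

    Returns "accepted" when the rate is within the limit,
    otherwise "quality traceback".
    """
    rate = defect_rate(defect_count, size_kloc)
    return "accepted" if rate <= limit_per_kloc else "quality traceback"
```

For example, with an assumed limit of 0.5 defects per KLOC, 3 defects in 10 KLOC would be accepted, while 8 defects in the same code would trigger a quality traceback.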
Lei Jun said:
So if we look at Huawei's hardware testing process, we will know where the costs come from.
First, the process of full test participation:
Second, multi-level testing and experimentation
For circuit design, there are unit tests, whole-machine tests, small-batch trial production, HALT tests, environmental tests, EMC tests, and thermal tests; after the product enters the production stage, HASS testing is carried out. Special equipment also undergoes salt spray tests and sulfur corrosion tests. The mechanical structure of the whole machine is tested as well: drop, compression, twisting, and so on.
HALT (Highly Accelerated Life Test). HALT is a defect discovery process that accelerates the exposure of defects and weaknesses in the test samples by applying progressively stricter environmental stresses, and then analyzes and improves the exposed defects and failures from the design, process, and materials angles, so as to improve reliability. Its most distinctive feature is that the environmental stress is set higher than the design operating limit of the sample, so that failures are exposed in far less time than under normal reliability stress conditions.
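The "progressively stricter stress" idea can be sketched as a simple step-stress loop. This is a toy model, not a real HALT procedure: the temperatures, the step size, and the `survives` callback (which stands in for actually running the sample at that stress level) are all assumptions for illustration.

```python
from typing import Callable, List, Optional


def step_stress_levels(start: int, step: int, stop: int) -> List[int]:
    """Progressively stricter stress levels from start up to and including stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, stop + 1, step))


def first_failure(levels: List[int],
                  survives: Callable[[int], bool]) -> Optional[int]:
    """Apply each level in order; return the first level at which the sample
    fails, or None if it survives every level."""
    for level in levels:
        if not survives(level):
            return level
    return None


# Assumed example: design limit 85 degrees C, stepped by 10 up to 125.
levels = step_stress_levels(85, 10, 125)


def sample_survives(temp_c: int) -> bool:
    """Stand-in for a real test run: this imaginary sample fails at 110 or above."""
    return temp_c < 110


failure_level = first_failure(levels, sample_survives)
```

In this made-up run the sample survives 85, 95, and 105 and fails at 115. The failure found beyond the design limit is the weak point that then has to be analyzed and improved.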
Environmental testing is an activity conducted to ensure that the product maintains functional reliability during the specified lifespan in all environments of expected use, transportation or storage. It is to expose the product to natural or artificial environmental conditions to evaluate the performance of the product under the actual use, transportation and storage conditions, and analyze and study the impact of environmental factors and their mechanism of action.
HASS (Highly Accelerated Stress Screening) is applied in the production stage of the product to ensure that all the improvement measures identified in HALT are actually implemented. HASS can also ensure that no new defects are introduced by changes in production processes and components.
Hardware engineers fear HALT testing the most because it pushes past the limits of the device. But why do we do it? The aim is to find the weakest point of the entire device and then improve it. However, because the test goes beyond the device's allowable working range, many abnormal situations appear and their causes are complicated. Even so, the specifications require that each one be analyzed clearly and that optimization measures be given. This is very mentally taxing work. Many classic problems come out of HALT testing.
Since I am not a tester, please correct me if I am wrong. Now Xiaomi is no longer as glorious as it used to be. Thinking of Lei Jun's 5%, I wrote this.