The Brave New World Of Big Data Testing

As testers, we often have a love-hate relationship with data.  Processing data is our applications’ main reason for being, and without data we cannot test.  Yet data is often the root cause of testing issues: we don’t always have the data we need, test cases get blocked, and defects come back marked as “data issues”.

Data has grown exponentially over the last few years and continues to grow.  We began testing with megabytes and gigabytes; now terabytes and petabytes have joined the data landscape.  Data has become the elephant in the room, so where is it leading us?  Testers, welcome to the brave new world of Big Data!

What is Big Data?

Big Data has many definitions; the term is used to describe both volume and process.  Wikipedia defines it as “an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”  Gartner defines big data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”  Big Data usually refers to at least five petabytes (5,000,000,000 megabytes) of data.  Sometimes the term is also used for the approaches and tools used to process data at that scale.

However, Big Data is more than just size. Its most significant aspects are the four “V’s”: volume, the sheer amount of data; velocity, the speed at which new data is generated and transported; variety, the many types of data; and veracity, the accuracy and quality of the data.

Testers, can you see some, make that many, test scenarios here?  Yes, big data means big testing. In addition to ensuring data quality, we need to make sure that our applications can effectively process this much data. However, before we can plan our big testing, we need to learn more about the brave new world of big data.

Big Data is usually unstructured, which means that it does not have a defined data model; it does not fit neatly into organized columns and rows.  Although much of this unstructured data comes from social media, such as Facebook posts and tweets, it can also take audio and visual forms.  These include phone calls, instant messages, voice mails, pictures, videos, PDFs, geospatial data and slide shares.  So it seems our big testing SUT (system under test) is actually a giant jellyfish!

Challenges of Big Data Testing

Testing Big Data is like testing a jellyfish: because of the sheer amount of data and its unstructured nature, the test process is difficult to define.  Automation is required, and although there are many tools, they are complex and demand technical skills for troubleshooting.   Performance testing is also exceedingly complex given the velocity at which the data is processed.

Testing the Jelly Fish

At the highest level, the big data test approach involves both functional and non-functional components.  Functional testing includes validating both the quality of the data itself and the processing of it.  Test scenarios for data quality include completeness, correctness, lack of duplication and so on.  Data processing can be done in three ways: interactive, real-time and batch; however, all three involve movement of data.   Therefore, big data testing strategies are based on the extract, transform and load (ETL) process: validating the quality of the data coming from the source databases, validating the transformation or processing through which the data is structured, and then validating the load into the data warehouse.
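As a concrete illustration, here is a minimal sketch of a completeness and duplication check, assuming hypothetical JDBC connection strings and an orders table staged as staged_orders; substitute the details of your own source and staging environments.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StagingRowCountCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection strings, credentials and table names; substitute your own
    try (Connection source = DriverManager.getConnection("jdbc:postgresql://src-db/orders", "tester", "secret");
         Connection staging = DriverManager.getConnection("jdbc:postgresql://stage-db/orders", "tester", "secret")) {

      long sourceCount = count(source, "SELECT COUNT(*) FROM orders");
      long stagedCount = count(staging, "SELECT COUNT(*) FROM staged_orders");
      long duplicates  = count(staging,
          "SELECT COUNT(*) FROM (SELECT order_id FROM staged_orders GROUP BY order_id HAVING COUNT(*) > 1) d");

      System.out.printf("source=%d staged=%d duplicates=%d%n", sourceCount, stagedCount, duplicates);
      if (sourceCount != stagedCount || duplicates > 0) {
        System.out.println("Data completeness/duplication check FAILED");
      }
    }
  }

  // Run a single-value COUNT query and return the result
  private static long count(Connection conn, String sql) throws Exception {
    try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(sql)) {
      rs.next();
      return rs.getLong(1);
    }
  }
}
```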

ETL testing has three phases.  The first phase is data staging, which is validated by comparing the data coming from the source systems to the data in the staged location.  The next phase is MapReduce validation, or validation of the transformation of the data.  MapReduce is the programming model most commonly used to process unstructured big data; its best-known implementation is in Hadoop.  This testing ensures that the business rules used to aggregate and segregate the data are working properly.  The final ETL phase is output validation, in which the output files from MapReduce are ready to be moved to the data warehouse; here we verify that data integrity is preserved and that the transformation is complete and correct.  ETL testing, especially at the speed required for big data, requires automation, and luckily there are tools for each phase of the ETL process; among the best known are MongoDB, Cassandra, Hadoop and Hive.
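For testers new to the model, the classic word-count job shows what MapReduce actually does: the mapper emits a key/value pair for each token, and the reducers aggregate by key.  This is a minimal Hadoop sketch for orientation only, not a production job; the input and output paths are assumed to be supplied on the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reducer: sums the emitted counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // staged input data
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // transformed output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```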

Do You Want To Be A Big Data Tester?

Testers, if you have a technical background, especially in Java, big data testing may be for you.  You already have strong analytical skills; now you will need to become proficient in Hadoop and other Big Data tools.  Big Data is a fast-growing technology, and testers with this skill set are in demand.  Why not take the challenge, be brave and embrace the brave new world of big data testing!


The Brave New World of Accessibility Testing

Testers, as you well know, technology, especially the web, has opened up new worlds for everyone who uses it.  But have you ever thought about how the ability to access technology impacts the lives of those with special needs?  Imagine being blind yet able to read, unable to hear or speak yet able to chat, or being completely paralyzed but able to travel the world.  Technology has made all this possible for those with special needs, enriching their lives in ways they never imagined.

According to the US Census Bureau, 19 percent of the population in 2012 had a disability, and half of those reported a severe disability; accessibility testing will therefore continue to grow in importance.  So testers, welcome to the brave new world of accessibility testing!

People with special needs use assistive technologies, including screen readers, screen magnification software, speech recognition software and special keyboards, for communication, work and personal fulfillment, yet not all websites are friendly to these users.  Accessibility testing is a subset of usability testing that is geared toward users of all abilities and disabilities; its focus is to verify not only usability but accessibility.

So how do we test accessibility?  As with any usability testing, focus on the users.  This means not only users with various disabilities and severities thereof, but also those with limited computer literacy, infrastructure, access and equipment.  We can look for standards against which to measure, as well as legal requirements that must be satisfied.  In the United States, Section 508 of the Rehabilitation Act requires that all of the federal government’s electronic and information technology be accessible to everyone; however, this applies to federal agencies only.  The World Wide Web Consortium (W3C), the main international standards organization for the web, has also created guidelines for making web content accessible to people with disabilities.

Web Content Accessibility Guidelines (WCAG) 2.0

The Web Content Accessibility Guidelines provide recommendations for making web content more accessible to people with disabilities.  They provide conditions for testing in the form of success criteria based on four principles: content must be perceivable, operable, understandable and robust.

In order to be perceivable, web content must provide alternatives for non-text content and time-based media.   Examples include providing options for braille translation and captions for audio-only or video-only recordings.  Content should be able to be presented in different formats, and foreground should be separated from background for easier reading.

Operability requires that all actions can be executed from a keyboard and that time limits for actions can be extended.  Flashing should be limited, as it is known to cause seizures.  Finally, navigation help should be provided in various contexts so that users know where they are in the application and are able to find content.

Understandable content is easy to read: it limits jargon and abbreviations and is written at lower levels of reading ability.  In addition, web pages should appear and behave in predictable ways, and functionality should be provided to help users correct their mistakes.

Robustness means that the web content should be able to be interpreted by current and future technologies including assistive technologies.

WCAG 2.0 goes one step further by breaking down the success criteria into levels of conformance.  Level A is the minimum level of conformance.  Level AA requires meeting all Level A success criteria as well as the success criteria set at Level AA, or providing an alternate version of the web content; this is the level recommended for most websites.  Level AAA is the highest level of conformance, and it is not possible for all web content to satisfy its success criteria.

Web Accessibility Testing

How do testers determine whether the website under test meets the WCAG success criteria?  The good news is that there are automated tools available for this.  These tools evaluate the syntax of the website’s code, search for known patterns that cause accessibility issues, and identify elements on web pages that could cause problems, flagging both actual and potential accessibility issues.  Interpreting the test results, however, requires knowledge and experience in accessibility.
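Tool results can also be supplemented with simple scripted checks.  The sketch below uses Selenium WebDriver to flag images with missing alt text, one of the most common failures against WCAG’s non-text-content criterion; the URL is a placeholder, and a real scan would cover far more element types.

```java
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class AltTextCheck {
  public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
      driver.get("https://example.com"); // hypothetical site under test
      List<WebElement> images = driver.findElements(By.tagName("img"));
      for (WebElement img : images) {
        String alt = img.getAttribute("alt");
        if (alt == null || alt.trim().isEmpty()) {
          // Flag a potential failure of the non-text-content success criterion
          System.out.println("Image missing alt text: " + img.getAttribute("src"));
        }
      }
    } finally {
      driver.quit();
    }
  }
}
```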

However, as with all types of testing, especially usability, accessibility testing cannot be completely automated.  It is also important that all testers consider accessibility as we execute our functional tests.  For example, try turning off the mouse and track pad to make sure all functions are operable from the keyboard, and try turning on Windows High Contrast Mode to see how the application works for low-vision users.  And what happens when images are turned off?  Can you still understand the context of the content in the application?  Testers, always remember: our job is evaluating the quality of the application, and that means ALL users must be able to access and derive value from the applications we test.
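The keyboard check can also be partially scripted.  Here is a minimal, hypothetical Selenium sketch that tabs through a page and logs which element receives focus at each step, so a tester can confirm that every important control is reachable without a mouse.

```java
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

public class KeyboardNavigationCheck {
  public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
      driver.get("https://example.com/login"); // hypothetical page under test
      Actions keyboard = new Actions(driver);
      // Tab through the first ten focusable elements and log what receives focus,
      // simulating a user who cannot operate a mouse
      for (int i = 0; i < 10; i++) {
        keyboard.sendKeys(Keys.TAB).perform();
        WebElement focused = driver.switchTo().activeElement();
        System.out.println(focused.getTagName() + " id=" + focused.getAttribute("id"));
      }
    } finally {
      driver.quit();
    }
  }
}
```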

Mobile Testing – How Much is Enough?

We all know that mobile apps can never be fully tested.  There are too many devices, too many OS versions, and too many different types of apps.  Yet testers continue to struggle to develop test plans and test cases the same way they did for traditional applications.  How do testers determine the correct scope of testing for an app so as to minimize quality and security risks for users and the organization?

First, look at organizational expectations.  Does your organization expect its software to work almost perfectly?  And does it fund and support development and testing to make that a realistic expectation?

If the answer to the first question is yes, but the second one is no, then the best you can do is set realistic expectations that may not be in line with what the organization wants.  If quality isn’t a high priority, then testing can be focused on high-priority areas only.

Second, is the app business-critical?  Does the organization depend on it to make money, service customers, or be more agile?  Or is it a marketing or public relations tool?  Does it provide valuable information to users, or is it simply nice to have?  If it is business-critical, then flaws could hurt the bottom line and quality becomes a higher priority.  And that’s true in all parts of the app, not just its operational aspects.  If users find poor quality in any part, they are unlikely to trust it for business.

Third, consider the implications of a flaw for both users and the organization.  A game or other entertainment app often has a high threshold of failures before most users decide it’s not worth the effort; the same goes for a free app with limited extrinsic value.

But if the app performs an important function, or one that is counted on by users, a major flaw can have serious or even disastrous effects.  One example is the infamous iPhone time-change bug in 2010, in which the clock failed to move off Daylight Saving Time on the appointed day.  Thousands of people were reportedly late for work or appointments because their alarms failed to go off on time.  It was an iOS flaw fixed by Apple a few weeks later.

Fourth, does the app use external services?  Many apps make use of other services within the enterprise, or of commercial services for information such as weather or sports scores.  While testers don’t have to test the services themselves, they should read and understand what the service level agreements (SLAs) say about performance and capacity, and periodically test to make sure those commitments are being fulfilled.
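One lightweight way to do that periodic check is to time a call to the service and compare it against the SLA figure.  The sketch below is a rough example, assuming a hypothetical endpoint and an 800 ms response-time commitment; a real check would run repeatedly and track percentiles rather than a single call.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SlaCheck {
  // Hypothetical SLA figure; use the value from your service agreement
  private static final Duration MAX_RESPONSE_TIME = Duration.ofMillis(800);

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.example.com/weather")) // hypothetical external service
        .timeout(Duration.ofSeconds(5))
        .build();

    long start = System.nanoTime();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    Duration elapsed = Duration.ofNanos(System.nanoTime() - start);

    System.out.printf("Status %d in %d ms%n", response.statusCode(), elapsed.toMillis());
    if (response.statusCode() != 200 || elapsed.compareTo(MAX_RESPONSE_TIME) > 0) {
      System.out.println("SLA check failed");
    }
  }
}
```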

 

Now About Security

It goes without saying that security flaws are also quality flaws, and many security flaws have significant consequences for both the user and the organization.  If there is no data, or only trivial data, to protect, security testing may not be important.  But if the app handles names, email addresses, financial data, or any other identifying information, that data needs to be protected at all costs.

In particular, testers have to know what data is being collected and where it is stored.  On mobile devices, that can be a challenge: data may sit in internal storage or on a SIM, it may not be readily apparent that data is being stored at all, and in any case it often can’t easily be accessed.

In the era of hundreds of different devices and OS versions, as well as BYOD, it’s unrealistic to limit the kinds of devices that an app can be used on, even for internal users.  Testers have no control over the device or OS for testing or deployment purposes, and testers simply can’t test all combinations.

But teams likely have control over how and where the device stores data locally.  They also control how data is transferred to and from the device and how it’s stored on the back end.  That’s where testers have to focus: is any personal or identifying data encrypted on the device, and is it encrypted during transmission?
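To make “encrypted on the device” concrete, here is a minimal sketch of encrypting a piece of personal data with AES-GCM before it is written to local storage.  It is illustrative only: in a real mobile app the key would live in the platform keystore rather than being generated inline, and the tester’s job is to verify that something equivalent actually happens.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class LocalDataEncryption {
  public static void main(String[] args) throws Exception {
    // In a real app the key would come from the platform keystore, not be generated inline
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(256);
    SecretKey key = keyGen.generateKey();

    // Fresh random IV for every encryption
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);

    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ciphertext = cipher.doFinal("user@example.com".getBytes(StandardCharsets.UTF_8));

    System.out.println("Encrypted " + ciphertext.length + " bytes; plaintext never written to storage");
  }
}
```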

Testing on mobile devices requires a combination of techniques, from traditional test cases to risk-based testing to device-farm testing and, if appropriate, crowdsourcing.  The test plan should be designed to use these techniques to cover different aspects of the application.  Test cases, for example, can address traditional quality measures as well as security, while device farms can be used for in-depth testing on the most popular devices.

Overall, test results should provide a clear picture of the quality of important aspects of the app given its purpose, and an overview of quality in general.

Testing Wearables:  The Human Experience

I became interested in testing wearables in a rather unusual way:  I ran the Boston Marathon.  So, you ask, what does the Boston Marathon have to do with wearables and the testing of them?  Well, every runner was wearing at least one “wearable”.  Wearables are electronics that can be worn on the body as an accessory or as part of one’s clothing.  One of the major features of wearable technology is its ability to connect to the Internet, enabling data to be exchanged between the device and a network, and wearables often contain monitoring and tracking functionality.

Wearables have become a part of most runners’ gear; they wear sports watches with GPS functionality and often carry smartphones.  Yet every runner in the 2011 Boston Marathon also had another wearable attached to their clothing: a bib with their name and registration number.  Today, the bib also contains an RFID chip.  The chip records the runner’s exact race time by connecting to a series of mats with RFID readers at the starting line, along the course and at the finish.   The first time this was tried there was only one glitch: not all of the RFID chips registered with the readers.

Although this failure did not create a life-threatening situation, it caused a great deal of consternation and disappointment among those runners whose race times did not get recorded.  For runners who had run a qualifying time and/or a personal record, elation and joy at the finish line turned to grief and anguish when they found out that their times did not register.  And yes, I was one of those runners.

As a tester, I began to question not only what had and had not been tested, but I also became keenly aware of the impact that the failure of this wearable had on the user.  I realized that what all wearables have in common is a purpose or function, coupled with human interaction, that provides value by enabling the user to achieve a goal.  Unless the runner ran the course and stepped on the mats, the chip in the runner’s bib had no way of providing any value.

This analysis led me to realize that the human user must be an integral part of the testing.  Furthermore, the more closely a device integrates with a human, the more important the human’s role in testing becomes.   When a networked device is physically attached to us and works with us and through us, the results of that collaboration matter to us physically and emotionally.  From this experience, I devised a framework for testing this collaboration, which I call Human Experience testing.

The Brave New World of COTS Testing

Testing a COTS system?  Why would we need to test a COTS package?  Often, project managers and other stakeholders mistakenly believe that one of the benefits of purchasing COTS software is that little, if any, testing is needed.  This could not be further from the truth.

COTS (Commercial Off-The-Shelf) software comprises applications that are sold or licensed by vendors to organizations that wish to use them.  This includes common enterprise applications such as Salesforce.com, Workday, and PeopleSoft.  The code delivered to each purchasing organization is identical; however, there is usually an administration module through which the application can be configured to more closely match the needs of the buyer.  The configuration is usually done by the vendor or by an integrator hired by the purchasing organization. Some COTS vendors also make customizations, which involve changes to the base code, to accommodate purchasing organizations.  SaaS (Software as a Service) products are usually COTS software.

Testing COTS software requires a different focus from traditional testing approaches.  Although no COTS package will be delivered free of bugs, the focus of testing from the purchasing organization’s perspective is not on validating the base functionality.  Since the COTS software was not developed to meet user-defined requirements, requirements-based testing is not straightforward.  To plan the testing effectively, test managers and testers need to focus on the areas where changes in the end-to-end workflow are made. The major areas of focus for COTS testing are customizations and configurations, integrations, data and performance.

The focus of traditional functional testing when implementing a COTS package is on the customizations, if any, and the configurations.  Customizations, since they involve changes to the actual code, carry the highest risk; however, configurations are vitally important because they are the basis of the workflows.  Testers need to understand which parts of the workflow involve configurations versus base code or customized code.  Although the integrators sometimes provide this information, often the test team must obtain it from vendor documentation.   Business workflows will often need to change in order to achieve the same results through the COTS software, and testers must consider this as they develop their test cases.

Integrations are a critical area of focus when testing a COTS package.  COTS packages are often large Customer Relationship Management or Enterprise Resource Planning systems, and as such they must be integrated with many legacy systems within the organization.  The legacy systems often have older architectures and different methods of sending and receiving data.  Adding to the complexity, new code is almost always needed to connect to the COTS package.  Understanding these architectures and testing through APIs and other methods of data transmission is a new challenge for many testers.
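A starting point for API-level integration testing is a simple contract check against the COTS interface.  The sketch below assumes a hypothetical REST endpoint and field name; it verifies only the status code, content type, and the presence of a field the legacy system depends on, which is the kind of check testers can build before full end-to-end flows are available.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CotsIntegrationCheck {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();

    // Hypothetical endpoint exposed by the COTS package for legacy integrations
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://cots.example.com/api/v1/customers/12345"))
        .header("Accept", "application/json")
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    // Contract checks: status, content type, and a field the legacy system relies on
    boolean ok = response.statusCode() == 200
        && response.headers().firstValue("Content-Type").orElse("").contains("application/json")
        && response.body().contains("\"customerId\"");

    System.out.println(ok ? "Integration contract satisfied" : "Integration contract FAILED");
  }
}
```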

Data testing is extremely important to the end-to-end testing of COTS software.  Testers must understand the data dictionary of the new application, since data names and types may not match those of the existing software.   Often, as with configurations, the testers must work with the vendor or integrator to understand the data dictionary.  In addition, the tester must understand the ETL (extract, transform and load) mechanisms, which can be especially complicated if a data warehouse is involved.  Since a data migration will likely be needed, the data transformations will need to be thoroughly tested.

ETL testing requires a completely different skill set from that of the manual, front-end tester.  Often, the organization purchasing the COTS package will need to contract resources with the appropriate skills.  SQL knowledge and a thorough understanding of how to simulate data interactions using SOAP or XML are required for data testing.   An understanding of SOA (Service Oriented Architecture) and of the tools used to test web messages is also quite helpful.

Performance testing is another area requiring a different approach.  Many systems, especially web applications, require a focus on load testing, or validating that the application can handle the required number of simultaneous users.  With large COTS applications used internally within an organization, however, the focus is on the speed and number of transactions that can be processed rather than on the number of users.  The test scenarios for this type of performance testing can be huge in number and complexity, and the more complex scenarios are also data-intensive.  This testing not only requires testers with solid technical performance-test skills, but also requires a detailed data coordination effort across the integrations.
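As a simple illustration of transaction-oriented performance testing, the sketch below drives a hypothetical transaction-posting call from a pool of worker threads and reports throughput in transactions per second; a real test would use a proper load tool, realistic data, and coordinated test environments.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThroughputCheck {
  public static void main(String[] args) throws Exception {
    int workers = 20;
    int transactionsPerWorker = 50;
    AtomicInteger completed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(workers);

    long start = System.nanoTime();
    for (int w = 0; w < workers; w++) {
      pool.submit(() -> {
        for (int t = 0; t < transactionsPerWorker; t++) {
          postTransaction(); // hypothetical call that posts one business transaction
          completed.incrementAndGet();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);

    double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
    System.out.printf("%d transactions in %.1f s (%.1f tx/s)%n",
        completed.get(), seconds, completed.get() / seconds);
  }

  private static void postTransaction() {
    // Placeholder standing in for a real call into the COTS application
    try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}
```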

From beginning to end, testing the brave new world of COTS software requires a completely different approach focusing on configurations, integrations, data and performance.  This new approach offers new challenges and provides opportunities for testers to develop new strategies and skill sets.

To what extent should testers and QA engineers be involved in software design?

Traditionally testers and QA engineers have had minimal involvement with software design. Design has been the role of the software architect, or team lead, for many years. Depending on the team, input from testers at this stage of the software development lifecycle isn’t always valued.

But in some circumstances that is changing. In particular, testers have a real contribution to make when one of the product goals is “design to test”. Architects who recognize that contributions can come from a variety of sources are soliciting testing feedback when creating an overall design.

And testers have even more design contributions to make in Agile projects, especially when using Test-Driven Development (TDD). Testers typically have a more complete picture of user needs, based on their in-depth understanding of user stories and interactions with the Product Owner.

Because design is something that grows with the application in Agile, testers can always look at what the developers are doing. If the team starts letting the design get complex, or difficult to test, it’s time to have a talk with the developers about making the design more straightforward. It may require a hardening sprint or two, but it will keep the debt down.

For testers, here are some of the things you might consider as you share your expertise with architects and developers.

Do:
• Provide feedback on design for testability. You don’t want to accumulate testing debt.
• Get deeply involved in TDD projects. This is your area of expertise.
• Provide feedback on design decisions during an Agile project.

Don’t:
• Attempt to give advice outside of your area of expertise.
• Reject feedback on your design ideas. Everyone has something to contribute.

Testing in a Brave New World: The Importance of Data Masking

As testers today, we face a brave new world. Our conundrum, providing effective testing with less time, is more difficult than it has ever been. Challenges from disruptive technologies such as cloud, mobile devices and big data have taken testing to a whole new level of complexity. At the same time, we are also challenged with the “need for speed” as agile methodologies evolve into continuous delivery and continuous deployment. We can engage in only so much risk-based testing, so we are often tempted to use production data to speed up the test process. Ironically, those very same technologies make this practice increasingly dangerous. So what gives?

If production data is also privacy-protected data, our use of it in testing may be illegal. At the very least, it opens up the data for compromise.

Testers must collaborate with security professionals to develop a test data privacy approach which is usually based on data masking. Data masking involves changing or obfuscating personal and non-public information. Data masking does not prevent access to the data; it only makes private data unrecognizable. Data masking can be accomplished by several methods depending upon the complexity required. These range from simply blanking out the data to replacing it with more generic data to using algorithms to scramble the data. The challenge of data masking is that the data not only has to be unrecognizable, but also still useful for testing.
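The three methods mentioned above can be sketched in a few lines of code. The example below is illustrative only: it blanks a field, substitutes a generic value, and scrambles a value with a salted hash so that related records still line up while the original cannot be trivially recovered; production masking tools do considerably more, such as preserving formats and referential integrity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class MaskingExamples {
  // Method 1: blank out a field entirely
  static String blank(String value) {
    return "";
  }

  // Method 2: replace with a generic value that still looks like test data
  static String genericEmail(int rowNumber) {
    return "user" + rowNumber + "@example.test";
  }

  // Method 3: scramble deterministically with a salted hash so joins still line up,
  // but the original value cannot be trivially read back
  static String scramble(String value, String salt) throws Exception {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] hash = digest.digest((salt + value).getBytes(StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 8; i++) {
      sb.append(String.format("%02x", hash[i]));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(blank("555-12-3456"));
    System.out.println(genericEmail(42));
    System.out.println(scramble("jane.doe@example.com", "per-project-salt"));
  }
}
```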

There are two main types of data masking – static and dynamic. The usual approach is static data masking, where the data is masked prior to loading into the test environment. In this approach, a new database is created (which is especially important when testing is outsourced). However, the masked database may not contain the same data, or data in the same states, as the production database, issues which are very important in testing.

In dynamic data masking, production data is masked in real time as users request it. The main advantage of this approach is that even users who are authorized to access the production database never see the private or non-public data. Furthermore, dynamic data masking can be role-specific: what data is masked depends upon the entitlements of the user who is requesting it.

Automated software tools are required to mask data efficiently and effectively. When evaluating data masking tools, it is important to consider the following attributes. Most important, the tool should mask the data so that the masking cannot be reversed and the result is still realistic enough for testing. Ideally, the tool should provide both static and dynamic data masking and possibly data redaction, a technique used to mask data in PDFs, spreadsheets and documents. The tool should also mask data across distributed platforms, including the cloud. Here is a brief look at a variety of the vendors in this arena. As with any tool evaluation, organizations must consider their own specific needs when choosing a vendor.

According to Gartner’s Magic Quadrant, IBM, Oracle and Informatica are the market leaders in data masking for privacy purposes. All offer both static and dynamic data masking as well as data redaction. IBM offers integration with its Rational suite. Oracle offers an API tool for data redaction and provides templates for Oracle eBusiness Suite and Oracle Fusion. Both the IBM and Oracle products are priced relatively high compared to other vendors.

Informatica offers data redaction for many types of files and is a top player in dynamic data masking for big data, offering Dynamic Data Masking for Hadoop, Cloudera, Hortonworks and MapR. Informatica’s product is integrated with PowerCenter and its Application Information Lifecycle Management (ILM) suite, which makes it a good choice for organizations that use those products.

Mentis offers a suite of products for static and dynamic data masking and data redaction, as well as data access monitoring and data intrusion prevention, at a reasonable cost. One of the most notable features of these products is usability: not only are templates available for several vendor packages, including Oracle eBusiness and PeopleSoft, but the user interface is designed for use by the business as well as IT. Mentis was rated as a “Challenger” by Gartner in 2013.

One of the least expensive products on the market, Net 2000 offers usability as its main feature. It provides only static data masking, for Oracle and SQL Server. Gartner rated it a “Niche” player in 2013. This tool is a good choice for a small organization with a simple environment.

Data privacy is one of the most important issues facing test managers and testers today. Private and non-public data must not be compromised during testing; therefore, an understanding of data masking methodologies, approaches and tools is critical to effective testing and test management.