Experimenting with what works

Drawing of anthropomorphic A and B holding hands while a W watches on at a distance.

A lot has changed about whitney.org over the last year. This includes the entire platform underpinning the site, and a number of major usability and user interface improvements, from reworked navigation, to new mobile experiences for audio and video, to the incorporation of outside voices in our exhibition content. And with growing distance from the complexities of launching a new website, our data-related work has been picking up steam as we’ve been able to devote more time and mindshare to it, which in turn has begun to more deeply impact our design thinking and decision making processes. A major aspect of that impact is in an increasing ability to reevaluate our assumptions, and to better understand how visitors are actually interacting with us…rather than just how we might think they are.

In the beginning…

With whitney.org being rebuilt from the ground up, there was the opportunity (and requirement) to reconsider what data we ought to be collecting and analyzing through Google Analytics (GA) and similar tools. I joined the Whitney Digital Media team part way through this project, and so initially my primary goal was simply to get a handle on what was being tracked, and where we had major gaps.

Previously I had interned with the Museum of Modern Art on a data analytics and ingestion project and I hoped to build on that effort at the Whitney. Google Analytics can give you a lot just from incorporating the tag on your website, but to really understand our audience we needed to make sure we were also tracking important interactions, like use of our navigation, audio and video plays, and various call-to-action (CTA) clicks that wouldn’t necessarily be visible in the default pool of data. Without that tracking, we wouldn’t have a baseline to compare to when we were ready to start considering more major design and structural changes. Looking forward meant getting our house in order, today.

In addition to putting together the pieces for an interaction-baseline in Google Analytics, I was also interested in reviewing how accurate our GA-reported ecommerce numbers were compared to the organization’s internal accounting. This necessitated reaching out to colleagues in other departments. And while those conversations were brief, they have been incredibly important in framing our work. Seeing how our numbers compare to their numbers gives context to the fuzzy reporting of Google Analytics, and helps us to know what of level of confidence to ascribe to our figures…which is vital to separating the signal from the noise.

From passive data to A/B testing

The next stage in the Whitney’s digital data lifecycle has been to start investigating why our numbers are what they are. It’s one thing to know that we got 100 clicks on a button last week, it’s another to tease apart why it was 100 and not 50, or 200, or 10,000. To that end, we’ve begun to evaluate more and more of our core digital content through A/B testing. Altering certain elements of pages and considering those changes in relation to basic metrics like session duration, pageviews, and ecommerce transactions has been hugely helpful in terms of figuring out what aspects of our design and layout are effective at achieving their intended purpose. It has also been a gradual process, characterized by a growing willingness to experiment.

We began by making small changes to individual elements using Google Optimize: experimenting with the color of a few CTA elements, shifting the placement of a few links (and adding others), all in the hope of driving more conversions. As we became more familiar with the process of building and running tests, we started experimenting with larger aspects of the site, including a wholesale replacement of the footer, and a number of tests designed to start teasing out the effects of utilizing different kinds of content to promote exhibitions (or broadly, the institution) on our homepage. In the case of the latter, we’ve varyingly given users videos of artworks, images of artworks, experiential videos of exhibition installations, and images of people in those exhibitions.

The question of what content works is incredibly important to us as the museum’s Digital Media department. It’s our job to determine how to best reach new and returning audiences, and a major driver of our ability to do that is the media we choose to forefront. A/B testing is one more tool we can use to help make those decisions.

The conventional web wisdom tends to be that video will result in higher user engagement than still images. Video is also inherently more expensive than images, so the choice to go with one over the other doesn’t come without a few tradeoffs. In the effort to set a baseline for our both our data and any data-driven strategy, we wanted to test the conventional wisdom against our specific circumstances as a mission-driven arts institution. Given the added cost, we wanted to be sure that we understood the implications of each approach, and the metrics they affect.

And while it’s still too early to say what the definitive value difference may or may not be for us, the initial results have been interesting, and continue to warrant further investigation, and suggest new avenues to explore for homepage content.

Incorporating user feedback

Closely connected with our push into A/B testing, we’ve been working to get more qualitative feedback on various aspects of our platforms. This has included both quick in-person terminology surveys, and an incredibly rigorous two-part in-depth usability study of whitney.org run by an outside consulting firm. Both kinds of testing have been extremely useful, while also demonstrating why it can be worth it ($$$) to work with experts in the field. In-house user testing has made a lot of sense for us as small and focused endeavors, while the external usability study brought a level of expertise and specialized tooling we would never otherwise have had access to. There’s much more to say on both, but that deserves its own post.

It’s worth mentioning that all of our user testing efforts at some level remind us of what we aren’t doing as well as we’d like. And while it would be easy to discouraged by that, I think it’s important not to be. There is no end state in usability work. There is no point at which everyone will convert, everyone will understand our UI, and everyone will have a positive experience the first time, and every time after. Rather our goal is the constant evolution of our products, reflecting the changing needs and expectations of our audience.

T H E F U T U R E, and moving to product development in tandem with evaluation

So what comes next for our data efforts? Likely, it’s wrapping together all the helpful experiences we’ve had over the last year into a more consistent, repeatable process. In effect, moving to something akin to user–test driven development, where new features are planned alongside user-focused evaluatory measures. Whether that means A/B testing aspects of design or functionality, or small scale in-house user surveys, or even larger external evaluation, our actions should depend on the nature of the feature and be determined in advance, rather than only as a post-launch reactionary measure.

TL;DR

Just because we measured something doesn’t mean we measured what we thought we did.
User testing is useful in all kinds of forms.
User-test driven development is good development.

[Read this on Medium]