Why Self-Service Solutions don't always work for stakeholders

Self-service solutions collect digital dust more often than they should.

A few weeks ago, I wrote a piece related to product data literacy. I covered the mindset of data analysts vs. product managers when it comes to metrics. I don’t know if this will become a series, but today, I’m writing another piece on business analysts and self-service.

Self-service is one of the most popular buzzwords in any analytics organization. It’s a great concept in theory. It gives stakeholders the ability to pull their own reports and do their own analysis without relying on data analysts to prioritize their requests. As a result, analysts are freed up to work on other analyses that will help the company. It’s a win-win situation.

In practice, it’s not that simple.

I’ve been the liaison on numerous projects between stakeholders (product managers, marketing managers, etc.) and business analysts, who typically build the self-service solutions. Sometimes, a very technically minded business analyst is working directly with a stakeholder without a liaison between the two. The process usually goes something like this:

1. Meeting to get use case and requirements
2. Emails exchanged for clarification
3. Timeline extended because of complications
4. More emails exchanged
5. Meeting to present and go through solution

This process can vary, but you probably get the idea. On a few occasions, that final meeting has been deflating. The self-service solution that was built doesn’t get the stakeholder exactly what they need. Sometimes the solution technically works, but getting what they need takes more effort than should reasonably be expected of the stakeholder.

This is extremely frustrating because a lot of time is spent planning these solutions and waiting for them to get built, and they end up not being used as much as they should be.

This happens for a few reasons. First, the communication during the planning phase is poor. I’ve been guilty of this early in my career, but I’ve learned to cross every ‘t’ and dot every ‘i’. That makes a big difference in the end. I mentioned it before, but sometimes you have a very technical business analyst working directly with a stakeholder who is very non-technical. This can lead to a lot of details getting lost in translation. Poor communication might be a post for another day, but it’s not the focus right now.

The second reason is the way teams store their data. Many analytics teams use data warehouses to store and query their data. In more recent years, tools like Amazon Redshift, Google BigQuery, and Snowflake have also become more popular.

The benefit of data warehouses is that they take messy data from different source databases and pull it into one place after it’s cleaned and organized. If you want the details, read more about the ETL process. Data warehouses are also built to be very efficient. Queries run faster, and less time is spent getting to the data. This process is immensely valuable for data analysts, and it enables them to do their work.
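To make the ETL idea concrete, here’s a toy sketch: rows get extracted from a source database, cleaned up, and loaded into a warehouse table. The table names, columns, and cleaning steps are all made up for illustration; real pipelines use dedicated tooling.

```python
import sqlite3

# Hypothetical source database with messy values.
users_db = sqlite3.connect(":memory:")
users_db.executescript("""
CREATE TABLE users (id INTEGER, email TEXT);
INSERT INTO users VALUES (1, ' Ann@Example.com '), (2, 'bob@example.com');
""")

# Hypothetical warehouse table where cleaned data lands.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (customer_id INTEGER, email TEXT)")

# Extract from the source, transform (trim and lowercase the messy
# emails), and load into the warehouse.
for cid, email in users_db.execute("SELECT id, email FROM users"):
    warehouse.execute("INSERT INTO dim_customer VALUES (?, ?)",
                      (cid, email.strip().lower()))

print(warehouse.execute("SELECT email FROM dim_customer").fetchall())
# [('ann@example.com',), ('bob@example.com',)]
```

Once the cleaned data is in one place, analysts can query it without touching the source systems.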

However, this can create a false sense of accomplishment. When data analysts are removed from the picture and business analysts try to get teams all the way to self-service capabilities, they can fall short.

It’s no fault of their own. They are building the best data warehouse for the data to be used. Sometimes, what stakeholders need isn’t efficient from a data perspective. This is something stakeholders aren’t aware of because it’s not part of their role. There’s friction when this happens, because business analysts need to balance the (sometimes inefficient) needs of the stakeholder with the performance needs of the data warehouse.

Here’s an example of how this all might play out in reality…

CompanyA is a SaaS company with a monthly subscription. It has separate databases for user info, billing, product usage, marketing campaign history, and login history, among others. All of that data is combined into a data warehouse.

CompanyA is struggling to hit its quarterly customer goals. To help with this, at the end of March 2020 it ran a campaign (called RetCampaignMAR20) reaching out to cancelled customers, promoting a newly built and launched product feature. The analytics team gets two questions after the campaign. One is from the marketing manager who ran the campaign. The other is from the product manager responsible for the feature.

Marketing Manager

“Hi, I want to look at the number of customers who reactivated their accounts from the campaign. We’ll use this data to figure out ROI on the campaign and compare this to other possible campaigns in the future.”

Product Manager

“Hello analytics team!
Now that the new feature launched, we want to look at returning customers and see if it’s being used. We’ll use this data to see if the feature will help improve retention. If so, we will probably expand this feature in the future.”

After talking with both stakeholders, it’s pretty clear that very similar data will be used to answer both questions. In addition, both teams will be doing similar activities in the future. There’s enough of a use case to build this into a self-service tool.

These two questions are pretty simple. Building a self-service tool for this isn’t.

If this were done as an ad-hoc analysis, the steps would look like this:

  1. Pull all CUSTOMER_IDs associated with the campaign name RetCampaignMAR20 from the marketing table
  2. Look at invoices for those CUSTOMER_IDs after March 2020 in the billing table to see if they reactivated their account
  3. Look at the product usage table for those CUSTOMER_IDs to see if they used the feature after reactivating their account
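As a rough sketch, the three steps above could be a single query with subqueries. The schema here (tables `marketing`, `billing`, `product_usage` and their columns) is hypothetical, built on a tiny in-memory database just to show the shape of the query:

```python
import sqlite3

# Hypothetical warehouse tables with sample data; names are
# illustrative, not CompanyA's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marketing (customer_id INTEGER, campaign_name TEXT);
CREATE TABLE billing (customer_id INTEGER, invoice_date TEXT);
CREATE TABLE product_usage (customer_id INTEGER, feature_name TEXT, usage_date TEXT);

INSERT INTO marketing VALUES (1, 'RetCampaignMAR20'), (2, 'RetCampaignMAR20'), (3, 'OtherCampaign');
INSERT INTO billing VALUES (1, '2020-04-05'), (2, '2020-02-10');
INSERT INTO product_usage VALUES (1, 'NewFeature', '2020-04-10'), (2, 'NewFeature', '2020-04-12');
""")

# Steps 1-3 chained together: campaign recipients -> reactivation
# invoices after March 2020 -> feature usage after the reactivation.
rows = conn.execute("""
SELECT DISTINCT u.customer_id
FROM product_usage u
JOIN billing b ON b.customer_id = u.customer_id
WHERE u.customer_id IN (
        SELECT customer_id FROM marketing
        WHERE campaign_name = 'RetCampaignMAR20')
  AND b.invoice_date > '2020-03-31'
  AND u.feature_name = 'NewFeature'
  AND u.usage_date > b.invoice_date
""").fetchall()

print(rows)  # [(1,)] -- the one campaign recipient who reactivated, then used the feature
```

Written this way, every date and name is hardcoded, which is exactly why it works for an ad-hoc question and nothing else.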

This is pretty straightforward and can be done with a few subqueries. When it’s done as a self-service solution that can be re-used in the future, here are the requirements for the tool:

  1. Need the ability to pull CUSTOMER_IDs associated with a given campaign name from the marketing table
  2. Need the ability to see if CUSTOMER_IDs reactivated their account after receiving a campaign
  3. Need the ability to choose different product features from the product usage table and see if they are used by a certain CUSTOMER_ID
  4. Need the ability to look at this at CUSTOMER_ID level and aggregate by campaign
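To show how those requirements generalize the ad-hoc version, here’s a sketch of a parameterized report where campaign, feature, and cutoff date all become inputs. As before, the schema and every name in it are hypothetical stand-ins:

```python
import sqlite3

# Hypothetical warehouse tables with sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marketing (customer_id INTEGER, campaign_name TEXT);
CREATE TABLE billing (customer_id INTEGER, invoice_date TEXT);
CREATE TABLE product_usage (customer_id INTEGER, feature_name TEXT, usage_date TEXT);
INSERT INTO marketing VALUES (1, 'RetCampaignMAR20'), (2, 'RetCampaignMAR20');
INSERT INTO billing VALUES (1, '2020-04-05');
INSERT INTO product_usage VALUES (1, 'NewFeature', '2020-04-10');
""")

def campaign_feature_report(conn, campaign_name, feature_name, cutoff_date):
    """Requirements 1-4: pull a campaign's customers, check reactivation
    after the cutoff, count uses of a chosen feature, and report at the
    CUSTOMER_ID level so the result can be aggregated by campaign."""
    per_customer = conn.execute("""
        SELECT m.customer_id,
               MIN(b.invoice_date)  AS reactivation_date,
               COUNT(u.usage_date)  AS feature_uses
        FROM marketing m
        LEFT JOIN billing b
          ON b.customer_id = m.customer_id AND b.invoice_date > ?
        LEFT JOIN product_usage u
          ON u.customer_id = m.customer_id AND u.feature_name = ?
        WHERE m.campaign_name = ?
        GROUP BY m.customer_id
    """, (cutoff_date, feature_name, campaign_name)).fetchall()
    # Campaign-level aggregate on top of the customer-level rows.
    reactivated = sum(1 for _, react, _ in per_customer if react is not None)
    return per_customer, reactivated

per_customer, reactivated = campaign_feature_report(
    conn, "RetCampaignMAR20", "NewFeature", "2020-03-31")
print(reactivated)  # 1 of the 2 campaign recipients reactivated
```

Even this small sketch hides judgment calls (which invoice counts as the reactivation, how far back usage should be counted) that an analyst would normally make per question.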

In many cases, self-service tools pull in data directly from tables with a little bit of freedom to manipulate the data. One issue that could occur in this example is around timing.

Because we need to be flexible with dates as time goes on, there may not be a good way to confirm whether the product feature was used after the reactivation. A self-service tool may allow a stakeholder to pull in invoices for a customer after a certain date and do the same for product usage, but it won’t logically tell them if one date occurred AFTER the other. The stakeholder may get an output that includes CUSTOMER_ID, INVOICE_DATE, and PRODUCT_USAGE_DATE, but they’ll have to do the date logic on their own. That’s not ideal.
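Here’s roughly what that leftover date logic looks like from the stakeholder’s side, using a made-up export with those three columns:

```python
from datetime import date

# Hypothetical export from a self-service tool: one row per customer
# with an invoice date and a feature-usage date, but no logic saying
# whether the usage happened AFTER the reactivation.
rows = [
    # (customer_id, invoice_date, product_usage_date)
    (1, date(2020, 4, 5), date(2020, 4, 10)),   # used feature after reactivating
    (2, date(2020, 4, 12), date(2020, 4, 1)),   # used feature before reactivating
]

# The date comparison the stakeholder is left to do themselves:
used_after_reactivation = [cid for cid, inv, usage in rows if usage > inv]
print(used_after_reactivation)  # [1]
```

A one-line comparison for an analyst, but in practice this usually means the stakeholder is hand-filtering rows in a spreadsheet.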

In some cases, a business analyst can build different views of the data that handle this logic for the stakeholder, but those views may be inefficient in terms of database performance. Business analysts will always try to avoid doing this, and rightfully so.

This is just one example that isn’t even that complex. Most of the time, questions and requirements will be a lot messier, with more details needing to be figured out.

If you work at a company that has a goal of improving self-service analytics, this is something you need to consider. It’s very important to make sure you have (business and/or data) analysts who can communicate well with the business.

If you have any additional thoughts or experience on how to deal with this, I’d love to hear from you. I’m most active on Twitter @mattpupa, but I’ll respond to emails as well.