Cohort Visualizer

A handy tool for browsing cohort datasets

Drag/drop your CSV file (or paste it here)

Note: No data will go to the server. This is all client-side in your browser.

CSV format explanation

Format is comma-separated with quotes like Excel. A guide to columns:

  1. Cohort group type: Think of this as an enum type name.
  2. Cohort group value: Think of these as enum values within an enum type.
  3. Cohort day: Day to which the entry belongs, in the format MM/DD/YY (American short), MM/DD/YYYY (American long), or YYYY-MM-DD (ISO 8601). For October 25, 2012, the formats respectively are: 10/25/12, 10/25/2012, 2012-10-25.
  4. State1: Number of users exclusively in state 1.
  5. State2: Number of users exclusively in state 2.
  6. ... Additional state columns

First row is used to name the various user states you have. It should be in the format "Cohort group type,Cohort group value,Cohort day,State1 name,State2 name,...".
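
As a rough illustration of the layout described above, here is a minimal Python sketch that reads such a file and accepts the three day formats. It is not the tool's own parser (which runs in the browser); the names `parse_cohort_day`, `parse_cohort_csv`, and `sample` are made up for this example, and the sample rows are illustrative.

```python
import csv
import io
from datetime import date, datetime

# The three day formats listed in the column guide.
DAY_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d")


def parse_cohort_day(text: str) -> date:
    """Try each documented day format in turn."""
    for fmt in DAY_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized cohort day: {text!r}")


def parse_cohort_csv(text: str):
    """Return (state_names, rows) from CSV text in the documented layout."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    state_names = header[3:]  # columns after type, value, day name the states
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        group_type, group_value, day = record[:3]
        counts = [int(cell) for cell in record[3:]]
        rows.append((group_type, group_value, parse_cohort_day(day), counts))
    return state_names, rows


# Illustrative input that mixes all three day formats.
sample = """Cohort group type,Cohort group value,Cohort day,Born,Updated profile
Sign-up referrer,Search,10/25/12,5,5
Sign-up referrer,Email,10/26/2012,5,0
Total,,2012-10-26,20,10
"""

state_names, rows = parse_cohort_csv(sample)
```

Python's `csv` module handles Excel-style quoting by default, so a state name containing a comma can be wrapped in double quotes in the header row.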

The dataset may have the same user appear in many rows that are of different group types. What's important is that, within each group type, the user is counted in only one row with one group value; that is, the tuple (group value, cohort day) should identify a unique row for each cohort group type. Order of the rows does not matter.
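
The uniqueness rule can be checked before loading a file. The Python sketch below is one possible way to do it, not something the tool provides. The helper names are hypothetical, and it assumes a row counts as "negative" when any of its values is below zero, so that the signed-row pairs described under "Negative values" are not flagged.

```python
from collections import Counter


def row_key(row):
    """Build the identity of a row: group type, group value, day, and sign."""
    group_type, group_value, cohort_day, counts = row
    # Assumption: a row is "negative" when any of its values is below zero.
    is_negative = any(count < 0 for count in counts)
    return (group_type, group_value, cohort_day, is_negative)


def find_duplicates(rows):
    """Return the keys that appear on more than one row."""
    tally = Counter(row_key(row) for row in rows)
    return [key for key, seen in tally.items() if seen > 1]


rows = [
    ("Sign-up referrer", "Search", "10/25/12", [5, 5, 5]),
    ("Sign-up referrer", "Email", "10/25/12", [5, 5, 5]),
    ("Favorite feature", "Chat", "10/25/12", [0, 5, 10]),
    ("Sign-up referrer", "Search", "10/25/12", [1, 0, 0]),  # hypothetical clash
]

duplicates = find_duplicates(rows)
```

Here only the repeated ("Sign-up referrer", "Search", "10/25/12") row is reported; the "Email" and "Chat" rows share the day but differ in group value or group type.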

Example CSV

Here's a simple example for a fake social network with "Sign-up referrer", "Favorite feature", and "Total" cohort group types. On any given cohort day, it counts a user towards only one sign-up referrer row, one favorite feature row, and one total row. In the case of "Total" the only valid grouping value is the empty string, which this tool treats specially.

Cohort group type,Cohort group value,Cohort day,Born,Updated profile,Sent first message
Sign-up referrer,Search,10/25/12,5,5,5
Sign-up referrer,Search,10/26/12,10,10,0
Sign-up referrer,Email,10/25/12,5,5,5
Sign-up referrer,Email,10/26/12,5,0,5
Favorite feature,Chat,10/25/12,0,5,10
Favorite feature,Chat,10/26/12,15,5,0
Favorite feature,Reading news,10/25/12,10,5,0
Favorite feature,Reading news,10/26/12,5,5,5
Total,,10/25/12,10,10,10
Total,,10/26/12,20,10,5

Negative values

The sign of a column value is treated specially. The cohort data may include two rows with the same (group value, cohort day) tuple, as long as one has all positive column values and the other has all negative column values. Example usage: When a user joins your service, record a positive value on their birthday (the day they joined). When the same user leaves the service, record the same value as a negative on their leave date. This lets you see the rate of sign-ups and drop-offs independently. If you view both positive and negative values cumulatively, you will plot your peak active users over time.
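
To make the arithmetic concrete, here is a small Python sketch of one reading of the cumulative view: add the positive and negative rows per day, then keep a running total. The numbers are hypothetical, and the tool's actual rendering may differ.

```python
from itertools import accumulate

# Hypothetical daily counts for a single state of one cohort group.
joins = [10, 20, 5]    # positive rows: users who joined on each day
leaves = [0, -4, -6]   # negative rows: users who left on each day

# Viewed separately, the two series show sign-ups and drop-offs on their own.
cumulative_joins = list(accumulate(joins))
cumulative_leaves = list(accumulate(leaves))

# Viewed together, the per-day net is accumulated into a running total.
net_per_day = [joined + left for joined, left in zip(joins, leaves)]
running_total = list(accumulate(net_per_day))
```

With these numbers the net per day is 10, 16, -1, so the running total goes 10, 26, 25: the third day loses more users than it gains.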

Why cohort analysis?

A cohort is a group of people who share a common characteristic or experience within a defined period (Wikipedia). It's used a lot in health to track how different groups of patients respond to disease and medication. It can be used in business to track progress in the funnel.

For software and websites, cohorts are useful because they let you measure the impact of your product changes over time. Simple example: Using cohorts you can see the conversion rate of new users from two months ago and compare it to new users of today. Ideally, this would let you judge if your software is getting better over time, users are getting happier, etc. Sometimes you'll see that things are getting worse.

Cohort analysis can apply to more than just users. For example, you could treat a set of articles on a blog as the source dataset; the levels of traffic or reshares could be mutually exclusive states; and you could treat common tags as a group type, or author as a group type. This tool would let you drill down and compare all of those groupings pretty easily.

One of the most interesting things about cohort analysis is the graphs change over time. If you take a snapshot of your cohort data today and then take another snapshot two months from now, you'll see that the older cohort bars have changed. This happens because users who signed up two months ago remain active and continue to make progress in your funnel over extended periods. It's useful to save your cohort datasets after you collect them, so you can compare to the past.


Calculations

Here's a quick guide to the calculations presented on the right. A "bar" is every cohort for a particular day. A "bar segment" is one part of the bar for a day in a single color, corresponding to a particular "cohort state" (like "Sent first message" above), which is usually some level of progression in the funnel.

∑↑ / ∑↕ "Percentage here and up"
Sum the bar your mouse is over vertically upward, including the bar segment your mouse is on top of. Then divide by the total sum for that bar vertically. Answers: "On this day, what percentage of users are beyond and including this cohort state in the funnel?" For the bottom bar segment this will be 100%.
∑↓ / ∑↕ "Percentage here and down"
Sum the bar your mouse is over vertically downward, including the bar segment your mouse is on top of. Then divide by the total sum for that bar vertically. Answers: "On this day, what percentage of users are before and including this cohort state in the funnel?" For the top bar segment this will be 100%.
X / ∑← "Percentage here of cumulative past sum"
Sum all bar segments from the cohort state you have your mouse over going back in time to the left, including the one your mouse is over. Divide the bar segment you have your mouse over by that sum. Answers: "What percentage of users does the highlighted bar segment represent as part of the whole past for that cohort state, including this day?"
X / ∑→ "Percentage here of cumulative future sum"
Sum all bar segments from the cohort state you have your mouse over going forward in time to the right, including the one your mouse is over. Divide the bar segment you have your mouse over by that sum. Answers: "What percentage of users does the highlighted bar segment represent as part of the whole future for that cohort state, including this day?"
X / ∑↔ "Percentage here of cumulative sum over time"
Sum all bar segments for the cohort state you have your mouse over for all days. Divide the bar segment you have your mouse over by that sum. Answers: "What percentage of users does the highlighted bar segment represent over all time for that cohort state?"
∑← / ∑↔ "Contribution of past to cumulative sum over time"
Sum all bar segments from the cohort state you have your mouse over going back in time to the left, including the one your mouse is over. Divide that sum by the sum of all bar segments for that cohort state over all days. Answers: "What percentage of users over all time got into the highlighted cohort state before and including the highlighted day?" For the last day this will be 100%.
∑→ / ∑↔ "Contribution of future to cumulative sum over time"
Sum all bar segments from the cohort state you have your mouse over going forward in time to the right, including the one your mouse is over. Divide that sum by the sum of all bar segments for that cohort state over all days. Answers: "What percentage of users over all time got into the highlighted cohort state after and including the highlighted day?" For the first day this will be 100%.
X / Max ↔ "Percentage of maximum single day ever"
Find the biggest bar segment for the cohort state you have your mouse over for all time. Divide the bar segment you have your mouse over by the biggest amount. Answers: "How big is this day for users to get into the highlighted cohort state compared to all other days ever?" The biggest day will be 100%.
1 - X / Max ↔ "Delta from maximum single day ever"
Find the biggest bar segment for the cohort state you have your mouse over for all time. Divide the bar segment you have your mouse over by the biggest amount. Subtract that from 100%. Answers: "How much bigger is the biggest day ever for this cohort state compared to this day?" The biggest day will be 0%.
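
The calculations above can be sketched in a few lines of Python. This is an illustration of the definitions, not the tool's source. The function name, the `grid` layout, and the result keys are made up for this example, and it assumes state 0 is drawn at the bottom of each bar with days running left to right in list order. The sample grid uses the "Total" rows from the example CSV.

```python
import math


def ratio(numerator, denominator):
    """Divide, returning NaN for a zero denominator instead of raising."""
    return numerator / denominator if denominator else math.nan


def hover_calculations(grid, day, state):
    """Compute the hover values as fractions (1.0 means 100%).

    grid[d][s] is the bar segment for day d and cohort state s.
    Assumption: state 0 is the bottom segment of each bar.
    """
    bar = grid[day]                         # vertical slice: one day
    series = [row[state] for row in grid]   # horizontal slice: one state
    x = bar[state]                          # the hovered bar segment
    past = sum(series[: day + 1])           # this day and everything before
    future = sum(series[day:])              # this day and everything after
    all_time = sum(series)
    biggest = max(series)
    return {
        "here_and_up": ratio(sum(bar[state:]), sum(bar)),
        "here_and_down": ratio(sum(bar[: state + 1]), sum(bar)),
        "x_of_past": ratio(x, past),
        "x_of_future": ratio(x, future),
        "x_of_all_time": ratio(x, all_time),
        "past_of_all_time": ratio(past, all_time),
        "future_of_all_time": ratio(future, all_time),
        "x_of_max": ratio(x, biggest),
        "delta_from_max": 1 - ratio(x, biggest),
    }


# "Total" rows from the example CSV: Born, Updated profile, Sent first message.
grid = [
    [10, 10, 10],  # 10/25/12
    [20, 10, 5],   # 10/26/12
]

# Hover over "Updated profile" on 10/26/12.
result = hover_calculations(grid, day=1, state=1)
```

For that hover position, "here and up" is 15 / 35 and "here and down" is 30 / 35, since the bar for 10/26/12 sums to 35. The sketch does not treat negative values specially.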

About

Copyright 2012-2015 Brett Slatkin

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Built using D3, jQuery, Web Fonts, and Subtle Patterns