Process and Summarize Excel Data in Groups

Published on 20 Feb, 2019 by Milos Gregor


If is often the case that we have a larger dataset in Excel that needs to be processed statistically. It is also necessary to carry out this summation (average or sum) on groups that are defined by a categorical variable.

These summaries are very easy to accomplish for example through SQL commands if we have data in the database. But if we have them in the spreadsheet, there is a problem. Simple summaries can be done using the Pivot or Group feature. However, the problem occurs if you want to calculate a specific statistic value or you want to select Top N records from each group.

Here is a convenient use of the Reshape.XL add-in. Add-in (within the Grammar of Data implementation) has built-in support for data processing in groups, defined by one or more categorical variables. For data processing in groups, we use Group and Ungroup functions.

The character of these functions is similar to the Group By command in SQL. The difference is in workflow. Within one sequence of commands, you can optionally redefine grouping, deactivate it and then again reactivate. At the same time, implementation of grouping is far wider. The basic overview of the data processing in groups provides the following video tutorial.

As is clear from the video-tutorial, data processing in groups is very simple, fast and efficient using the Reshape.XL add-in. More about the functionality of the Group and Ungroup commands you can learn in the documentation.