DEV Community

Cover image for Analyzing data in PowerShell and Spreadsheet in SpreadCommander
Viatcheslav V. Vassiliev
Viatcheslav V. Vassiliev

Posted on

Analyzing data in PowerShell and Spreadsheet in SpreadCommander

Disclaimer: I am author of SpreadCommander

SpreadCommander is an office program with tools for data analysis.

Traditional output from scripts is console. SpreadCommander provides output to rich console, i.e. console that allows formatted output with images, charts and more. However often data analysis requires to work with tabular data. Most convenient and common way to work with tables is Spreadsheet. In addition to rich console, SpreadCommander's console includes Spreadsheet console and allows output data from script (data tables, lists, collections) directly into spreadsheet where these data can be analyzed.

Data for this article were downloaded from https://www.kaggle.com/heesoo37/olympic-history-data-a-thorough-analysis?select=athlete_events.csv, downloading requires registration.

Cmddlet Out-SpreadTable

This cmdlet takes DataTable, list or collection of objects as data source and generates new sheet and table in Spreadsheet console output.

$dt = Import-DelimitedText '~\Data\athlete_events.csv';

,$dt | Out-SpreadTable -SheetName:'Athlete Events' `
    -TableName:'Athlete Events' -TableStyle:Medium19 -Replace;

$dt | 
    group  -Property:'Sport' | 
    sort   -Property:'Count' -Descending |
    select -Property:@(
        @{ Label='Sport';        Expression = { $_.Name }},
        @{ Label='EventCount';   Expression = { $_.Count }},
        @{ Label='TeamCount';    Expression = { [Linq.Enumerable]::Count([Linq.Enumerable]::Distinct([Linq.Enumerable]::Select($_.Group, [Func[object, object]]{ $args[0].Team }))) }},
        @{ Label='AthleteCount'; Expression = { [Linq.Enumerable]::Count([Linq.Enumerable]::Distinct([Linq.Enumerable]::Select($_.Group, [Func[object, object]]{ $args[0].ID }))) }} ) |
    Out-SpreadTable -SheetName:'Sports' `
        -TableName:'Sports' -TableStyle:Medium19 -Replace `
        -Formatting: `
            "format table [Sports] column [EventCount] with IconSet=Signs3;
             format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
             format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White';
             format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen";
Enter fullscreen mode Exit fullscreen mode

Table

Sending DataTable inside collection (,dt instead of just dt) allows cmdlet to process DataTable itslef, otherwise PowerShell sends list of DataRow converted to PSObject as data source. Result is the same but variant with collection is much faster.

Cmdlet New-SpreadPivot

Once data are sent to Spreadsheet they can be analyzed different ways. One of these ways is to generate Pivot table. SpreadCommander allows to generate Pivot table from PowerShell script.

New-SpreadPivot -DataTableName:'Athlete Events' `
    -PivotSheetName:'Athlete Events - Pivot' -PivotTableName:'Athlete Events - Pivot' `
    -RowFields:'Team' -ColumnFields:'Sport' -DataFields:'ID' -SummarizeValuesBy:Count `
    -Layout:Outline -MergeTitles -Style:Medium19 `
    -Formatting:'format with DataBar=Red, Gradient=true' -Replace;
Enter fullscreen mode Exit fullscreen mode

Pivot

Cmdlets New-SimpleSpreadChart, New-SpreadChart

Another way to analyze data is to add Chart to Spreadsheet. SpreadCommander has 2 cmdlets to add charts - one has simpler syntax, another one is more powerful.

New-SimpleSpreadChart -DataTableName:'Sports' `
    -ChartSheetName:'Sports - SimpleChart' -Replace -ChartType:LineMarker `
    -Arguments:'Sport' -Values:'EventCount','TeamCount','AthleteCount' `
    -AxisGroups:'Primary','Secondary','Primary' `
    -SeriesTypes:LineMarker -Style:ColorGradient `
    -Title:'Sport - Chart' -TitleFont:'Tahoma,12,Bold,Italic';

New-SpreadChart -DataTableName:'Sports' `
    -ChartSheetName:'Sports - Chart' -Replace `
    -ChartType:LineMarker -Style:ColorGradient -BackColor:Beige `
    -Title:'Sports' -TitleFont:'Segoe,12,Bold,Italic,Blue' `
    -Series:@(
        @{Name='Events'; Arguments='Sport'; Values='EventCount'; Type='LineMarker'}, `
        @{Name='Athletes'; Arguments='Sport'; Values='AthleteCount'; Type='LineMarker'}, `
        @{Name='Teams'; Arguments='Sport'; Values='TeamCount'; Type='LineMarker'; AxisGroup='Secondary'} );
Enter fullscreen mode Exit fullscreen mode

SimpleChart

Chart

Currently setting Title in New-SpreadChart does not work correctly, will be fixed in next version

New functions in Spreadsheet

SpreadCommander adds some new functions to Spreadsheet. These new functions add more power to Spreadsheet, particularly I find to be useful functions related to Regular Expressions. These functions are specific to SPreadCommander and do not work in Microsoft Excel.

Regular Expressions

  • REGEX.ISMATCH Indicates whether the regular expression finds a match in a specified input string.
  • REGEX.MATCH Searches the specified input string for the first occurrence of the regular expression.
  • REGEX.MATCHES Array function. Searches the specified input string for all (up to size of the array) occurrences of a regular expression.
  • REGEX.NAMEDMATCH Searches the specified input string for the first occurrence of the regular expression and returns value of the named group.
  • REGEX.NAMEDMATCHES Array function. Searches the specified input string for all (up to size of the array) occurrences of a regular expression and returns value of the named group.
  • REGEX.REPLACE Within a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.
  • REGEX.SPLIT Splits the specified input string at the positions defined by a regular expression pattern.

String

  • STRING.FORMAT Replaces one or more format items in a specified string with the string representation of a specified object.

Hash

  • HASH.MD5 Calculates MD5 hash of the string.
  • HASH.SHA1 Calculates SHA-1 hash of the string.
  • HASH.SHA256 Calculates SHA-256 hash of the string.
  • HASH.SHA384 Calculates SHA-384 hash of the string.
  • HASH.SHA512 Calculates SHA-512 hash of the string.

Path

  • PATH.CHANGEEXTENSION Changes the extension of a path string.
  • PATH.COMBINE Combines multiple strings into a path.
  • PATH.GETDIRECTORYNAME Returns the directory information for the specified path string.
  • PATH.GETEXTENSION Returns the extension of the specified path string.
  • PATH.GETFILENAME Returns the file name and extension of the specified path string.
  • PATH.GETFILENAMEWITHOUTEXTENSION Returns the file name of the specified path string without the extension.

GUID

  • NEWID Returns as new instance of GUID formatted using optional format specifier (N, D, B, P, X).

Formatting

Output to Spreadsheet and grid allow to specify conditional formatting.

format table [Athlete Events] column [Event] for "Contains([Event], 'Gymnastics')" with BackColor=Green, ForeColor=White;
format table [Sports] column [EventCount] with IconSet=Signs3;
Enter fullscreen mode Exit fullscreen mode

Following are supported:

Expression

  • Determines condition to select rows that should be formatted.
format table [Athlete Events] column [Event] for "Contains([Event], 'Gymnastics')" with BackColor=Green, ForeColor=White;
Enter fullscreen mode Exit fullscreen mode

DataBar

  • Applies format using a DataBar. Bar length changes proprotionally to a cell value.
format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
Enter fullscreen mode Exit fullscreen mode

IconSet

  • IconSet allows to classify values into ranges and display specific icon according to the range.
format table [Sports] column [EventCount] with IconSet=Signs3;
Enter fullscreen mode Exit fullscreen mode

ColorScale

  • Allows to disploay data distribution and variation using a gradation of 2 or 3 colors.
format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White'
Enter fullscreen mode Exit fullscreen mode

AboveAverage, BelowAverage, AboveOrEqualAverage, BelowOrEqualAverage, Unique, Duplicate

  • Applies format if value corresponds specified condition.
format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen
Enter fullscreen mode Exit fullscreen mode

Above conditions work slow on large tables.

Top, Bottom

  • Applies format if value belongs to top or bottom N values. N can be a number or a percent.
format table [Sports] column [Weight] with Rule=Top, Rank='10%', BackColor=LightGreen
Enter fullscreen mode Exit fullscreen mode

Above conditions work slow on large tables.

DateOccuring

  • Applies a format if value refers to specific date or date interval relative to today. Day intervalus include Beyond, BeyondThisYear, Earlier, EarlierThisMonth, EarlierThisWeek, EarlierThisYear, Empty, LastWeek, LaterThisMonth, LaterThisWeek, LaterThisYear, MonthAfter1, MonthAfter2, MonthAgo1, MonthAgo2, MonthAgo3, MonthAgo4, MonthAgo5, MonthAgo6, NextWeek, PriorThisYear, SpecificDate, ThisMonth, ThisWeek, Today, Tomorrow, User, Yesterday.
format table [Table1] column [Value] with DateOccuring=Today, BackColor=Yellow
Enter fullscreen mode Exit fullscreen mode

Rule, Comparison

  • Applies a format if value meets specified comparison condition. Comparison types include Between, Equal, Expression, Greater, GreaterOrEqual, Less, LessOrEqual, NotBetween, NotEqual.
format table [Athlete Events] column [Season] with Condition=Equal, Value1='Summer', BackColor=Green, ApplyToRow=true
Enter fullscreen mode Exit fullscreen mode

Most format conditions> (except DataBar, IconSet and ColorScale) support properties that specify
appearance These properties include:

  • BackColor
  • BackColor2 (to use with Gradient)
  • BorderColor
  • Font
  • Gradient (requires BackColor2>, one of following values: Horizontal, Vertical, ForwardDiagonal, BackwardDiagonal)

Top comments (0)