Disclaimer: I am author of SpreadCommander
SpreadCommander is an office program with tools for data analysis.
Traditional output from scripts is console. SpreadCommander provides output to rich console, i.e. console that allows formatted output with images, charts and more. However often data analysis requires to work with tabular data. Most convenient and common way to work with tables is Spreadsheet. In addition to rich console, SpreadCommander's console includes Spreadsheet console and allows output data from script (data tables, lists, collections) directly into spreadsheet where these data can be analyzed.
Data for this article were downloaded from https://www.kaggle.com/heesoo37/olympic-history-data-a-thorough-analysis?select=athlete_events.csv, downloading requires registration.
Cmddlet Out-SpreadTable
This cmdlet takes DataTable, list or collection of objects as data source and generates new sheet and table in Spreadsheet console output.
$dt = Import-DelimitedText '~\Data\athlete_events.csv';
,$dt | Out-SpreadTable -SheetName:'Athlete Events' `
-TableName:'Athlete Events' -TableStyle:Medium19 -Replace;
$dt |
group -Property:'Sport' |
sort -Property:'Count' -Descending |
select -Property:@(
@{ Label='Sport'; Expression = { $_.Name }},
@{ Label='EventCount'; Expression = { $_.Count }},
@{ Label='TeamCount'; Expression = { [Linq.Enumerable]::Count([Linq.Enumerable]::Distinct([Linq.Enumerable]::Select($_.Group, [Func[object, object]]{ $args[0].Team }))) }},
@{ Label='AthleteCount'; Expression = { [Linq.Enumerable]::Count([Linq.Enumerable]::Distinct([Linq.Enumerable]::Select($_.Group, [Func[object, object]]{ $args[0].ID }))) }} ) |
Out-SpreadTable -SheetName:'Sports' `
-TableName:'Sports' -TableStyle:Medium19 -Replace `
-Formatting: `
"format table [Sports] column [EventCount] with IconSet=Signs3;
format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White';
format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen";
Sending DataTable inside collection (,dt instead of just dt) allows cmdlet to process DataTable itslef, otherwise PowerShell sends list of DataRow converted to PSObject as data source. Result is the same but variant with collection is much faster.
Cmdlet New-SpreadPivot
Once data are sent to Spreadsheet they can be analyzed different ways. One of these ways is to generate Pivot table. SpreadCommander allows to generate Pivot table from PowerShell script.
New-SpreadPivot -DataTableName:'Athlete Events' `
-PivotSheetName:'Athlete Events - Pivot' -PivotTableName:'Athlete Events - Pivot' `
-RowFields:'Team' -ColumnFields:'Sport' -DataFields:'ID' -SummarizeValuesBy:Count `
-Layout:Outline -MergeTitles -Style:Medium19 `
-Formatting:'format with DataBar=Red, Gradient=true' -Replace;
Cmdlets New-SimpleSpreadChart, New-SpreadChart
Another way to analyze data is to add Chart to Spreadsheet. SpreadCommander has 2 cmdlets to add charts - one has simpler syntax, another one is more powerful.
New-SimpleSpreadChart -DataTableName:'Sports' `
-ChartSheetName:'Sports - SimpleChart' -Replace -ChartType:LineMarker `
-Arguments:'Sport' -Values:'EventCount','TeamCount','AthleteCount' `
-AxisGroups:'Primary','Secondary','Primary' `
-SeriesTypes:LineMarker -Style:ColorGradient `
-Title:'Sport - Chart' -TitleFont:'Tahoma,12,Bold,Italic';
New-SpreadChart -DataTableName:'Sports' `
-ChartSheetName:'Sports - Chart' -Replace `
-ChartType:LineMarker -Style:ColorGradient -BackColor:Beige `
-Title:'Sports' -TitleFont:'Segoe,12,Bold,Italic,Blue' `
-Series:@(
@{Name='Events'; Arguments='Sport'; Values='EventCount'; Type='LineMarker'}, `
@{Name='Athletes'; Arguments='Sport'; Values='AthleteCount'; Type='LineMarker'}, `
@{Name='Teams'; Arguments='Sport'; Values='TeamCount'; Type='LineMarker'; AxisGroup='Secondary'} );
Currently setting Title in New-SpreadChart does not work correctly, will be fixed in next version
New functions in Spreadsheet
SpreadCommander adds some new functions to Spreadsheet. These new functions add more power to Spreadsheet, particularly I find to be useful functions related to Regular Expressions. These functions are specific to SPreadCommander and do not work in Microsoft Excel.
Regular Expressions
- REGEX.ISMATCH Indicates whether the regular expression finds a match in a specified input string.
- REGEX.MATCH Searches the specified input string for the first occurrence of the regular expression.
- REGEX.MATCHES Array function. Searches the specified input string for all (up to size of the array) occurrences of a regular expression.
- REGEX.NAMEDMATCH Searches the specified input string for the first occurrence of the regular expression and returns value of the named group.
- REGEX.NAMEDMATCHES Array function. Searches the specified input string for all (up to size of the array) occurrences of a regular expression and returns value of the named group.
- REGEX.REPLACE Within a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.
- REGEX.SPLIT Splits the specified input string at the positions defined by a regular expression pattern.
String
- STRING.FORMAT Replaces one or more format items in a specified string with the string representation of a specified object.
Hash
- HASH.MD5 Calculates MD5 hash of the string.
- HASH.SHA1 Calculates SHA-1 hash of the string.
- HASH.SHA256 Calculates SHA-256 hash of the string.
- HASH.SHA384 Calculates SHA-384 hash of the string.
- HASH.SHA512 Calculates SHA-512 hash of the string.
Path
- PATH.CHANGEEXTENSION Changes the extension of a path string.
- PATH.COMBINE Combines multiple strings into a path.
- PATH.GETDIRECTORYNAME Returns the directory information for the specified path string.
- PATH.GETEXTENSION Returns the extension of the specified path string.
- PATH.GETFILENAME Returns the file name and extension of the specified path string.
- PATH.GETFILENAMEWITHOUTEXTENSION Returns the file name of the specified path string without the extension.
GUID
- NEWID Returns as new instance of GUID formatted using optional format specifier (N, D, B, P, X).
Formatting
Output to Spreadsheet and grid allow to specify conditional formatting.
format table [Athlete Events] column [Event] for "Contains([Event], 'Gymnastics')" with BackColor=Green, ForeColor=White;
format table [Sports] column [EventCount] with IconSet=Signs3;
Following are supported:
Expression
- Determines condition to select rows that should be formatted.
format table [Athlete Events] column [Event] for "Contains([Event], 'Gymnastics')" with BackColor=Green, ForeColor=White;
DataBar
- Applies format using a DataBar. Bar length changes proprotionally to a cell value.
format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
IconSet
- IconSet allows to classify values into ranges and display specific icon according to the range.
format table [Sports] column [EventCount] with IconSet=Signs3;
ColorScale
- Allows to disploay data distribution and variation using a gradation of 2 or 3 colors.
format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White'
AboveAverage, BelowAverage, AboveOrEqualAverage, BelowOrEqualAverage, Unique, Duplicate
- Applies format if value corresponds specified condition.
format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen
Above conditions work slow on large tables.
Top, Bottom
- Applies format if value belongs to top or bottom N values. N can be a number or a percent.
format table [Sports] column [Weight] with Rule=Top, Rank='10%', BackColor=LightGreen
Above conditions work slow on large tables.
DateOccuring
- Applies a format if value refers to specific date or date interval relative to today. Day intervalus include Beyond, BeyondThisYear, Earlier, EarlierThisMonth, EarlierThisWeek, EarlierThisYear, Empty, LastWeek, LaterThisMonth, LaterThisWeek, LaterThisYear, MonthAfter1, MonthAfter2, MonthAgo1, MonthAgo2, MonthAgo3, MonthAgo4, MonthAgo5, MonthAgo6, NextWeek, PriorThisYear, SpecificDate, ThisMonth, ThisWeek, Today, Tomorrow, User, Yesterday.
format table [Table1] column [Value] with DateOccuring=Today, BackColor=Yellow
Rule, Comparison
- Applies a format if value meets specified comparison condition. Comparison types include Between, Equal, Expression, Greater, GreaterOrEqual, Less, LessOrEqual, NotBetween, NotEqual.
format table [Athlete Events] column [Season] with Condition=Equal, Value1='Summer', BackColor=Green, ApplyToRow=true
Most format conditions> (except DataBar, IconSet and ColorScale) support properties that specify
appearance These properties include:
- BackColor
- BackColor2 (to use with Gradient)
- BorderColor
- Font
- Gradient (requires BackColor2>, one of following values: Horizontal, Vertical, ForwardDiagonal, BackwardDiagonal)
Top comments (0)