Disclaimer: I am author of SpreadCommander
SpreadCommander is an office program with tools for data analysis. This articles requires SpreadCommander 1.3 or higher.
SQL (Structured Query Language) script is an important tool to analyze relational data. It is more convenient to review data than programming script languages (PowerShell, Python, R). SpreadCommander adds client-side commands to SQL script, these commands specify how to display output retrieved from SQL Server and include naming tables, data formatting, computed columns, adding relations.
Convert CSV file to SQLite database
At first data shall be converted to SQL database format. In this sample SQLite database will be created.
Import-DelimitedText '~\Data\athlete_events.csv' -Columns:@(
@{ ColumnName = 'ID'; ColumnType = [int] },
@{ ColumnName = 'Name'; ColumnType = [string] },
@{ ColumnName = 'Sex'; ColumnType = [string] },
@{ ColumnName = 'Age'; ColumnType = [int]; NullValues = @('NA') },
@{ ColumnName = 'Heigth'; ColumnType = [float]; NullValues = @('NA') },
@{ ColumnName = 'Weigth'; ColumnType = [float]; NullValues = @('NA') },
@{ ColumnName = 'Team'; ColumnType = [string] },
@{ ColumnName = 'NOC'; ColumnType = [string] },
@{ ColumnName = 'Games'; ColumnType = [string] },
@{ ColumnName = 'Year'; ColumnType = [int]; NullValues = @('NA') },
@{ ColumnName = 'Season'; ColumnType = [string] },
@{ ColumnName = 'City'; ColumnType = [string] },
@{ ColumnName = 'Sport'; ColumnType = [string] },
@{ ColumnName = 'Event'; ColumnType = [string] },
@{ ColumnName = 'Medal'; ColumnType = [string]; NullValues = @('NA') }
) -AsDataReader |
Export-TableToDatabase 'sqlite:~\Data\athlete_events.db' `
-TableName:'Athlete_Events' -SkipAutoID -Replace `
-CreateTableScript: `
@'
create table [Athlete_Events]
(
ID integer not null,
Name nvarchar(200) null,
Sex char(1),
Age integer,
Heigth float,
Weigth float,
Team nvarchar(100),
NOC nvarchar(50),
Games nvarchar(50),
Year integer,
Season nvarchar(10),
City nvarchar(100),
Sport nvarchar(100),
Event nvarchar(100),
Medal nvarchar(10)
)
'@ `
-PostScript: `
@'
create index IX_ID on [Athlete_Events] (ID);
create index IX_Team on [Athlete_Events] (Team);
create index IX_NOC on [Athlete_Events] (NOC);
create index IX_Year on [Athlete_Events] (Year);
create index IX_Season on [Athlete_Events] (Season);
create index IX_Sport on [Athlete_Events] (Sport);
create index IX_Medal on [Athlete_Events] (Medal);
'@;
Invoke-SqlQuery 'sqlite:~\Data\athlete_events.db' `
'select * from Athlete_Events' -AsDataReader |
Out-Data -TableName:'Athlete Events' -Replace;
Connection
SpreadCommander supports SQLite, MS SQL Server, MySQL, ODBC and OLEDB data sources. Connection string can be specified at top of the SQL script.
--#connection 'sqlite:~\Data\athlete_events.db'
sqlite:
First part of connection string is Data Source. For example 'sqlite:~\Data\MyDb.sqlite' opens
connection to database located in file Data\MyDb.sqlite under
current project's directory.mssql:
First part of
connection string is Database>. Default server can be stored
in settings; if User ID is provided - it will be used, if no -
Windows authentication will be used. Connection to default server
allows to not specify it in connection string. For example
'mssql:Northwind' opens connection to database Northwind located
at default server (it has to be configured in settings) using
Windows authentication; 'mssql:Northwind;Server=RemoteServer\ServerName;User ID=user;Password=pwd'
connects to database Northwind at server specified in connection string.mysql:
First part of connection string is
Database. Default server can be stored in settings. Connection to
default server allows to not specify it in connection string. For example
'mysql:world' opens connection to database World located
at default server (it has to be configured in settings);
'mysql:World;Server=127.0.0.1;User ID=user;Password=pwd'
connects to database World at server specified in connection string.odbc:
Requires full connection string.oledb:
Requires full connection string.
Cmdlets Invoke-SqlQuery, Invoke-SqlScript, Out-Data
From PowerShell script output of SQL query or SQL script can be sent to Grid or Spreadsheet.
Invoke-SqlQuery 'sqlite:~\Data\athlete_events.db' `
'select * from Athlete_Events' -AsDataReader |
Out-Data -TableName:'Athlete Events' -Replace;
Invoke-SqlScript 'sqlite:~\Data\athlete_events.db' `
'~\Query.sql' |
Out-DataSet -Replace -NoCopyTables;
Formatting
Output to Spreadsheet and grid allow to specify conditional formatting. Formatting works in client part of SQL script too.
--#table Sports
--#computed column [AvgAthletesInTeam] float in [Sports] = 'ToFloat([AthleteCount]) / ToFloat([TeamCount])';
--#format table [Sports] column [EventCount] with IconSet=Signs3;
--#format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
--#format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White';
--#format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen
--#relation Rel_Sport_Events [Sports] ([Sport]) - [Athlete_Events] ([Sport])
select Sport, count(*) as EventCount,
count(distinct Team) as TeamCount,
count(distinct ID) as AthleteCount
from Athlete_Events
group by Sport
order by count(*) desc;
Following are supported:
Expression
- Determines condition to select rows that should be formatted.
format table [Athlete Events] column [Event] for "Contains([Event], 'Gymnastics')" with BackColor=Green, ForeColor=White;
DataBar
- Applies format using a DataBar. Bar length changes proprotionally to a cell value.
format table [Sports] column [TeamCount] with Rule=DataBar, BackColor='LightGreen', BackColor2='LightGray', Gradient=Horizontal;
IconSet
- IconSet allows to classify values into ranges and display specific icon according to the range.
format table [Sports] column [EventCount] with IconSet=Signs3;
ColorScale
- Allows to disploay data distribution and variation using a gradation of 2 or 3 colors.
format table [Sports] column [AthleteCount] with ColorScale='Red,LightGreen,White'
AboveAverage, BelowAverage, AboveOrEqualAverage, BelowOrEqualAverage, Unique, Duplicate
- Applies format if value corresponds specified condition.
format table [Sports] column [EventCount] with Rule=AboveAverage, BackColor=LightGreen
Above conditions work slow on large tables.
Top, Bottom
- Applies format if value belongs to top or bottom N values. N can be a number or a percent.
format table [Sports] column [Weight] with Rule=Top, Rank='10%', BackColor=LightGreen
Above conditions work slow on large tables.
DateOccuring
- Applies a format if value refers to specific date or date interval relative to today. Day intervalus include Beyond, BeyondThisYear, Earlier, EarlierThisMonth, EarlierThisWeek, EarlierThisYear, Empty, LastWeek, LaterThisMonth, LaterThisWeek, LaterThisYear, MonthAfter1, MonthAfter2, MonthAgo1, MonthAgo2, MonthAgo3, MonthAgo4, MonthAgo5, MonthAgo6, NextWeek, PriorThisYear, SpecificDate, ThisMonth, ThisWeek, Today, Tomorrow, User, Yesterday.
format table [Table1] column [Value] with DateOccuring=Today, BackColor=Yellow
Rule, Comparison
- Applies a format if value meets specified comparison condition. Comparison types include Between, Equal, Expression, Greater, GreaterOrEqual, Less, LessOrEqual, NotBetween, NotEqual.
format table [Athlete Events] column [Season] with Condition=Equal, Value1='Summer', BackColor=Green, ApplyToRow=true
Most format conditions> (except DataBar, IconSet and ColorScale) support properties that specify
appearance These properties include:
- BackColor
- BackColor2 (to use with Gradient)
- BorderColor
- Font
- Gradient (requires BackColor2>, one of following values: Horizontal, Vertical, ForwardDiagonal, BackwardDiagonal)
Computed columns
If SQL dialect does not support functions or to allow to modify function, computed column can be generated at client.
--#computed column [AvgAthletesInTeam] float in [Sports] = 'ToFloat([AthleteCount]) / ToFloat([TeamCount])';
To modify expression of computed column - right-click on computed column's header and choose "Expression Editor" in popup menu. To review and edit all computed columns in table - select "Data\Computed Columns" in ribbon. Expression Editor allows to build expressions for computed columns in UI.
Top comments (0)