Awasume Marylin

MY DATA ANALYTICS JOURNAL

WEEK 2 OF MY DATA ANALYTICS JOURNEY
This week I learnt all about the PREPARE PHASE OF DATA ANALYTICS and DATA EXPLORATION. After asking the right questions, which we covered in the previous days, PREPARING DATA CORRECTLY is the next step.
N.B. Data works best when it is organized.
Every piece of information is data, and it is generated as a result of our activities in the world. We have SURVEY DATA, which is generated using forms. INTERVIEWS can also generate data.
 
💠 Forms, questionnaires and surveys are commonly used ways to collect and generate data.
💠 Data that is generated online is not always collected directly. Websites use COOKIES, which help them remember your preferences.
 
⭕ COOKIES are small files stored on a computer that contain information about users.
There are 3 different data sources, which include:
➡️ First-party data, which is data collected by an individual or group using their own resources. This is typically the preferred method of collecting data because you know exactly where it came from.
➡️ Second-party data, which is data collected by a group directly from its audience.
➡️ Third-party data, which is data collected from outside sources who did not collect it directly. This source type might not be very reliable.
⭕ A SAMPLE is a part of a population that is representative of the whole population.
I learnt that selecting the right data is of key importance in data analysis.
💠 When solving your business problem, be sure to select data that can answer your problem question.
💠 How much data to collect: if you are collecting your own data, make reasonable decisions about sample size.
💠 Time frame: if you are collecting your own data, decide how long you would like to collect it, especially if you are tracking trends over a long period of time.
 
DATA COLLECTION CONSIDERATIONS
‼️ Select the right type of data.
‼️ Determine the time frame.
‼️ Decide how the data will be collected (collection of new data), then decide how much data to collect.
OR
‼️ Choose a data source (use existing data), then decide what data to use.
I also learnt about the different data formats, which include:
➡️ DISCRETE DATA, which is data that is counted and has a limited number of values.
➡️ CONTINUOUS DATA, which is data that can have any numeric value.
➡️ NOMINAL DATA, a type of qualitative data that is categorized without a set order.
➡️ ORDINAL DATA, a type of qualitative data with a set order or scale.
➡️ INTERNAL DATA, which is data that lives within a company's own systems.
➡️ EXTERNAL DATA, which is data that does not live within a company's own systems.
➡️ STRUCTURED DATA, which is data organized in a certain format, such as rows and columns.
➡️ UNSTRUCTURED DATA, which is data that is not organized in a certain format, such as audio, video and images.
 
A DATA MODEL is a model used for organizing data elements and showing how they relate to one another. Structured data works nicely within a data model.
DATA ELEMENTS are pieces of information such as people's names, account numbers and addresses.
PHYSICAL DATA MODELING depicts how a database operates. A physical data model defines all the entities and attributes used, e.g. it includes the table names, column names, and data types for the database.
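As a minimal sketch (the table and column names are hypothetical), a physical data model for the data elements above could be spelled out in SQL like this:

```sql
-- A physical data model names the table, its columns,
-- and a data type for each column.
CREATE TABLE accounts (
  account_number INT,
  holder_name    VARCHAR(100),
  address        VARCHAR(255),
  opened_date    DATE
);
```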
2 common methods of approaching data modeling include:
‼️ ENTITY RELATIONSHIP DIAGRAMS (ERD), which are a visual way to understand the relationships between entities in the data model.
‼️ UNIFIED MODELING LANGUAGE (UML) diagrams, which are very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations and their relationships.
A DATA TYPE is a specific kind of data attribute that tells what kind of value the data is. It tells you what type of data you are working with. Data types can be different depending on the query language you are using.
➡️ BOOLEAN DATA TYPE is a type of data with only 2 possible values: TRUE and FALSE. Boolean conditions are combined with the operators AND, OR and NOT (see the sketch after this list).
⭕ The AND operator lets you stack your conditions so that both must be met.
⭕ The OR operator lets you move forward if either one of your 2 conditions is met.
⭕ The NOT operator lets you filter by subtracting specific conditions from the results.
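Here is a minimal sketch of the three operators in a WHERE clause (the orders table and its columns are hypothetical):

```sql
-- AND: both conditions must be true
SELECT customer_id, product, quantity
FROM orders
WHERE product = 'sugar balls' AND quantity > 10;

-- OR: at least one of the conditions must be true
SELECT customer_id, product
FROM orders
WHERE product = 'sugar balls' OR product = 'cupcakes';

-- NOT: subtract rows that match the condition from the results
SELECT customer_id, product
FROM orders
WHERE NOT product = 'sugar balls';
```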
WIDE DATA is a data set in which every data subject has a single row, with multiple columns holding the values of the subject's various attributes.
LONG DATA is data in which each row represents one observation per subject, so each subject is represented by multiple rows (see the sketch below).
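As a minimal sketch, standard SQL can reshape wide data into long data, assuming a hypothetical scores_wide table with one row per student and one column per test:

```sql
-- Wide: student_id | test1_score | test2_score   (one row per subject)
-- Long: student_id | test_name   | score         (one row per observation)
SELECT student_id, 'test1' AS test_name, test1_score AS score FROM scores_wide
UNION ALL
SELECT student_id, 'test2' AS test_name, test2_score AS score FROM scores_wide;
```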
DATA TRANSFORMATION is the process of changing the data's format, structure, or values. It includes the following (see the sketch after this list):

💠 Adding, copying and replicating data
💠 Deleting fields or records
💠 Standardizing the names of variables
💠 Renaming, moving or combining columns in a database
💠 Joining one set of data with another
💠 Saving a file in a different format
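A minimal sketch of a few of these transformations in SQL, assuming hypothetical customers and orders tables:

```sql
SELECT
  UPPER(TRIM(c.customer_name)) AS customer_name, -- standardizing values
  c.cust_id AS customer_id,                      -- renaming a column with an alias
  o.order_total
FROM customers AS c
JOIN orders AS o
  ON c.cust_id = o.cust_id;                      -- joining one set of data with another
```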
There are goals for data transformation. They are:

➡️ Data organization
➡️ Data migration
➡️ Data enhancement
➡️ Data comparison
DATA BIAS is a type of error that systematically skews results in a certain direction. There are 4 types of bias:
- SAMPLE BIAS is when a sample isn't representative of the population as a whole.
- OBSERVER BIAS (experimenter bias / research bias) is the tendency for different people to observe things differently.
- CONFIRMATION BIAS is the tendency to search for or interpret information in a way that confirms pre-existing beliefs.
- INTERPRETATION BIAS is the tendency to always interpret ambiguous situations in a positive or negative way.
In order to identify good data sources, you need to follow these practices:
⭕ Your data must be RELIABLE (from a good source)
⭕ Your data must be ORIGINAL
⭕ Your data must be COMPREHENSIVE
⭕ Your data must be CURRENT (up to date)
⭕ Your data must be CITED
If you don't follow these practices, you will have bad data.
Just as humans have ethics, data also has ethics. DATA ETHICS is a well-founded standard of right and wrong that dictates how data is collected, shared and used.
 
There are various aspects of data ethics, which include:
- Ownership: individuals own the raw data they provide, and they have primary control over its usage, how it's processed and how it's shared.
- Transaction transparency: all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
- Consent: an individual's right to know explicit details about how and why their data will be used before agreeing to provide it.
- Currency: individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
- Privacy: preserving a data subject's information and activities any time a data transaction occurs.
DATA ANONYMIZATION is the process of protecting people's private or sensitive data by eliminating PERSONALLY IDENTIFIABLE INFORMATION (PII), which is information that can be used by itself or with other data to track down a person's identity.
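As a minimal sketch, assuming a hypothetical sales table, PII can be dropped or hashed before the data is shared (SHA256 and TO_HEX are BigQuery functions):

```sql
SELECT
  TO_HEX(SHA256(email)) AS hashed_email, -- a one-way hash replaces the raw identifier
  purchase_amount,
  purchase_date
FROM sales; -- the name, phone and address columns are simply never selected
```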
OPEN DATA and its features.
OPEN DATA refers to free access, usage and sharing of data. Its features include:
- Open data must be available as a whole.
- It must be provided under terms that allow reuse and distribution, including the ability to use it with other data sets.
 DATA INTEROPERABILITY is the ability of data systems and services to openly connect and share data.
A DATABASE is a collection of data stored in a computer system. Databases have 4 key features, which include the following (see the sketch after this list):

⭕ PRIMARY KEY, which is a unique identifier of each record within a table.
⭕ FOREIGN KEY, which is a field within a table that is a primary key in another table.
⭕ RELATIONAL DATABASE, which is a database that contains a series of tables that can be connected to form relationships. Basically, they allow analysts to organize and link data based on what the data has in common.
⭕ NORMALIZATION, which is a process of organizing data in a relational database, for example creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase integrity and reduce complexity in a database.
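A minimal sketch of a primary key and a foreign key in generic SQL (the tables are hypothetical, and exact syntax varies by database):

```sql
CREATE TABLE customers (
  customer_id   INT PRIMARY KEY, -- primary key: uniquely identifies each record
  customer_name VARCHAR(100)
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT,               -- foreign key: the primary key of another table
  order_total DECIMAL(10, 2),
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```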
I also learnt about METADATA. METADATA is data about data; it is used in database management to help data analysts interpret the contents of the data within the database.
There are 3 types of metadata:
➡️ DESCRIPTIVE METADATA describes a piece of data and can be used to identify it at a later point in time.
➡️ STRUCTURAL METADATA indicates how a piece of data is organized and whether it is part of one or more than one data collection.
➡️ ADMINISTRATIVE METADATA indicates the technical source of a digital asset.
METADATA provides information about other data and helps data analysts interpret the contents of the data within a database. It tells the Who, What, When, Where, Which, Why and How of the data. You can find metadata in PHOTOS, EMAILS, SPREADSHEETS AND ELECTRONICALLY CREATED DOCUMENTS.
Elements of metadata include:
 
💠 File or document type
💠 Date, time and creator
💠Title and description
💠 Geolocation
💠 Tags and categories
💠 Who last modified it and when
💠 Who can access or update it.
The benefit of metadata is that it makes data RELIABLE and CONSISTENT.
METADATA REPOSITORIES help data analysts ensure their data is reliable and consistent. They describe where the metadata came from and store it in an accessible form with a common structure.
BIGQUERY is a data warehouse on the Google Cloud Platform used to query and filter large datasets.

Most importantly, I learnt how to use BigQuery and pull data from datasets using the keywords:
⭕ SELECT
⭕ FROM
⭕ WHERE
These are STRUCTURED QUERY LANGUAGE (SQL) keywords that help you easily get specific data (e.g. a customer ID, or the number of customers who bought sugar balls from the bakery) from a large range of data, hence making your analysis process easier. They also help you retrieve and manipulate data in a database.
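Here is a minimal sketch of such a query (the project, dataset and table names are hypothetical):

```sql
SELECT
  customer_id                -- SELECT: the columns you want back
FROM
  `my_project.bakery.orders` -- FROM: the table the data lives in
WHERE
  product = 'sugar balls';   -- WHERE: keep only the rows that match
```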
ORGANIZING and SAVING DATA.
 
There are best practices organisations can follow when organizing data. They are listed below:
⭕ NAMING CONVENTIONS, which are consistent guidelines that describe the content, date, or version of a file in its name. This means using logical and descriptive names for your files to make them easier to find and use.
⭕ FOLDERING, which is organizing files into folders to help you easily find your files.
⭕ ARCHIVING CONVENTIONS, which can be used to move old projects to a separate location to create an archive and cut down on clutter.
⭕ ALIGN your naming and storage practices with your team.
⭕ Develop METADATA practices, like creating a file that outlines project naming conventions for easy reference.
Storing data in different places (a database or a spreadsheet) can make the data contradict itself and also takes up a lot of space.
A RELATIONAL DATABASE can help you avoid data duplication and store data more efficiently.
File-naming conventions help you organize, access, process and analyze data because they act as quick reference points to identify what's in a file.
One important practice is to decide on file-naming conventions as a team or company early in a project. It's also critical to ensure that file names are meaningful, consistent, and easy to read. File names should include:

➡️ Project name
➡️ File creation date
➡️ Revision version
➡️ Consistent style and order
For example, a hypothetical name like SalesReport_20231003_v02.csv covers all four.
DATA SECURITY is protecting data from unauthorized access or corruption by adopting safety measures.
- Excel and Google Sheets both have features that help you protect spreadsheets from being edited, from the entire worksheet right down to single cells in a table.
- They both have access-control features like password protection and user permissions.
 
We can keep our data safe and still have access to it any time we want to use it for analysis. Security measures that can help companies do both are:
 
‼️ ENCRYPTION uses a unique algorithm to alter data and make it unusable by users and applications that don't know the algorithm. This algorithm is saved as a 'KEY', which can be used to reverse the encryption, so if you have the key you can still use the data in its original form.
 
‼️ TOKENIZATION replaces the data elements you want to protect with randomly generated data referred to as a TOKEN. The original data is stored in a separate location and mapped to the tokens (see the sketch below).
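A minimal sketch of the idea in BigQuery-style SQL, assuming a hypothetical customers table (GENERATE_UUID() is a BigQuery function; other databases have their own equivalents):

```sql
-- The mapping lives in a separate location, away from the working data
CREATE TABLE secure_dataset.token_map AS
SELECT
  email,                   -- the sensitive element being protected
  GENERATE_UUID() AS token -- a randomly generated stand-in value
FROM my_dataset.customers;

-- Analysts query the tokenized form, never the raw PII
SELECT t.token, c.purchase_amount
FROM my_dataset.customers AS c
JOIN secure_dataset.token_map AS t USING (email);
```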
 
VERSION CONTROL enables all collaborators within a file to track changes over time. You can understand who made what changes to a file, when they were made and why.
