DEV Community

Posted on • Originally published at Data Breach Analysis


SHEIN (also spelled SheIn), a US-based online retailer of women's fashion, apparently suffered a data breach in June 2018, but the company only discovered it in late August 2018. SHEIN stated that the intruders gained access to customers' email addresses and encrypted passwords.

What data is at risk?

When the breach was discovered, SHEIN stated that the hackers had gained access to the email addresses and encrypted passwords stored in its systems. The leaked data, however, shows no signs of encryption, so it is likely that the passwords were decrypted or cracked before the data was published.

Email addresses

The email addresses in this data breach come from a very wide array of email providers. Let's take a look:

# Email Domain Quantity
1 13,679,190
2 4,019,832
3 2,192,258
4 729,539
5 HOTMAIL.FR 528,368
6 526,108
7 403,258
8 401,925
9 318,435
10 313,180
11 297,622
12 226,573
13 195,841
14 182,469
15 156,084
16 131,248
17 110,296
18 106,191
19 104,798
20 104,342
21 102,292
22 97,320
23 95,561
24 94,348
25 85,934
26 84,970
27 83,819
28 81,731
29 73,624
30 72,114
31 68,649
32 68,064
33 63,208
34 63,159
35 57,809
36 52,470
37 51,544
38 46,147
39 45,914
40 44,002
41 gmail.con 43,782
42 38,499
43 38,245
44 37,974
45 37,381
46 35,944
47 33,423
48 32,642
49 31,715
50 31,586
51 31,243
52 30,401
53 30,082
54 28,461
55 28,223
56 26,302
57 26,123
58 26,116
59 25,320
60 25,123
61 24,858
62 24,577
63 24,125
64 24,023
65 23,852
66 23,120
67 20,452
68 19,979
69 19,002
70 hotmail.con 18,627
71 18,206
72 18,109
73 18,080
74 17,910
75 17,908
76 17,629
77 16,991
78 16,093
79 16,037
80 14,134
81 13,796
82 12,907
83 12,612
84 12,477
85 12,295
86 12,278
87 11,201
88 11,021
89 9,594
90 9,006
91 8,947
92 8,857
93 8,549
94 8,492
95 8,491
96 8,306
97 7,984
98 7,926
99 7,674
100 7,624

The length of the email addresses in this data breach also varies widely. Counting the addresses at or beyond a series of length thresholds, and ordering the groups from the smallest to the largest, we can see that:

  • 7 email addresses were 100 characters or longer (the smallest group);
  • 11 email addresses were 5 characters or shorter;
  • 13 email addresses were 90 characters or longer;
  • 25 email addresses were 80 characters or longer;
  • 117 email addresses were 70 characters or longer;
  • 178 email addresses were 60 characters or longer;
  • 385 email addresses were 50 characters or longer;
  • 10,183 email addresses were 40 characters or longer;
  • 16,755 email addresses were 10 characters or shorter;
  • 843,073 email addresses were 30 characters or longer;
  • 9,848,312 email addresses were 20 characters or shorter;
  • 22,322,666 email addresses were 20 characters or longer.
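Threshold counts like these could be reproduced with a short script. The following is only a minimal sketch, not the tooling used for this analysis: the function name `length_summary`, the threshold tuples and the sample addresses are all assumptions made for illustration.

```python
from typing import Iterable


def length_summary(
    emails: Iterable[str],
    at_most: tuple[int, ...] = (5, 10, 20),
    at_least: tuple[int, ...] = (20, 30, 40, 50, 60, 70, 80, 90, 100),
) -> dict[str, int]:
    """Count addresses at or below / at or above each length threshold."""
    counts = {f"<={n}": 0 for n in at_most}
    counts.update({f">={n}": 0 for n in at_least})
    for email in emails:
        size = len(email.strip())
        for n in at_most:
            if size <= n:
                counts[f"<={n}"] += 1
        for n in at_least:
            if size >= n:
                counts[f">={n}"] += 1
    return counts


# Hypothetical sample data, not taken from the breach.
sample = ["a@b.c", "jane.doe@example.com", "x" * 30 + "@example.org"]
summary = length_summary(sample)
print(summary)
```

Both kinds of threshold are inclusive here, so an address that is exactly 20 characters long is counted in both the "20 or shorter" and the "20 or longer" group. With inclusive thresholds, those two groups can add up to more than the total number of addresses.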

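Looking at the top-level domains (TLDs) of the email addresses, we can also build a rough list of the countries that SheIn users were likely using the service from. A TLD only tells us where the email domain is registered, so this is an approximation: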

# Email Domain Quantity Purpose / Country
1 .com 17,699,022 Commercial / United States
2 .edu 1,813 Education
3 .net 85,934 Network Infrastructure
4 .de 403,258 Germany
5 .fr 754,941 France
6 .au 24,125 Australia
7 .it 110,296 Italy
8 .ru 526,108 Russia
9 .uk 313,180 United Kingdom
10 .es 83,819 Spain
11 .pl 45,119 Poland
12 .con 43,782 None, probably misspelled
13 .br 12,295 Brazil
14 .ca 8,492 Canada
15 .nl 17,908 The Netherlands
16 .mx 11,021 Mexico
17 .co 8,491 Colombia
18 .no 5,712 Norway
19 .be 2,130 Belgium
20 .in 35,944 India
21 .se 7,674 Sweden
22 .at 6,910 Austria
23 .ch 4,639 Switzerland
24 .dk 2,675 Denmark
25 .nz 2,321 New Zealand
26 .pt 2,243 Portugal
27 .ar 2,229 Argentina
28 .tw 63,159 Taiwan
29 .ae 1,532 United Arab Emirates
30 .cz 12,907 Czech Republic
31 .cn 1,393 China
32 .bg 7,984 Bulgaria
33 .gr 4,178 Greece
34 .cim 3,815 None, probably misspelled
35 .ua 828 Ukraine
36 .hu 3,141 Hungary
37 .eu 2,393 European Union
38 .cm 1,945 Cameroon, but probably a misspelling of .com
39 .sk 1,813 Slovakia
40 .sa 30,082 Saudi Arabia
41 .ie 1,496 Ireland
42 .ro 1,330 Romania
43 .fm 1,221 Federated States of Micronesia
44 .id 1,206 Indonesia
45 .cl 1,200 Chile
46 .om 1,188 Oman
47 .lv 6,980 Latvia
48 .comm 1,177 None, probably misspelled
49 .me 1,029 Montenegro
50 .qa 1,003 Qatar
51 .clm 853 None, probably misspelled
52 .fi 840 Finland
53 .ee 773 Estonia
54 .ph 2,847 The Philippines
55 .by 736 Belarus
56 .cpm 714 None, probably misspelled
57 .cat 703 Catalonia
58 .hr 699 Croatia
59 .XOM 621 None, probably misspelled
60 .fe 598 None, probably misspelled
61 .vn 2,206 Vietnam
62 .cok 586 None, probably misspelled
63 .il 2,202 Israel
64 .te 562 None, probably misspelled
65 .jp 1,928 Japan
66 .come 1,858 None, probably misspelled
67 .vom 1,615 None, probably misspelled
68 .hk 16,093 Hong Kong
69 .col 1,517 None, probably misspelled
70 .sg 1,464 Singapore
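A tally like the one above could be produced by taking everything after the last dot of each address's domain. This is a hedged sketch under assumptions: `tld_counts` is a hypothetical helper, the sample addresses are invented, and lowercasing the TLD is a design choice that the table above does not appear to make (it lists ".XOM" in upper case).

```python
from collections import Counter
from typing import Iterable


def tld_counts(emails: Iterable[str]) -> Counter:
    """Tally the last dot-separated label of each address's domain."""
    counts: Counter = Counter()
    for email in emails:
        # Split on the last "@" so odd local parts do not confuse the parser.
        _, sep, domain = email.strip().rpartition("@")
        if not sep or "." not in domain:
            continue  # skip entries that do not look like an email address
        tld = domain.rsplit(".", 1)[1].lower()
        if tld:
            counts["." + tld] += 1
    return counts


# Hypothetical sample data, not taken from the breach.
sample = [
    "anna@example.com",
    "BOB@EXAMPLE.COM",
    "chloe@example.fr",
    "dave@example.con",  # a typo such as ".con" ends up as its own entry
    "not-an-email",
]
tlds = tld_counts(sample)
print(tlds.most_common())
```

Misspelled endings such as ".con" or ".cim" are not filtered out, which is why they show up as separate rows in a table of this kind.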

Here are the letters that email addresses begin with. When the analysis is run on the database with duplicates, 29,026,175 email addresses begin with a letter. The most popular letter is A, followed by M and then S. Email addresses beginning with letters make up about 99.06% of the entire user base:

# The letter an email address begins with Quantity
1 A 3,206,739
2 B 1,187,451
3 C 1,770,137
4 D 1,195,226
5 E 1,053,108
6 F 670,340
7 G 842,864
8 H 872,318
9 I 567,572
10 J 1,497,023
11 K 1,524,405
12 L 1,795,120
13 M 3,133,130
14 N 1,267,323
15 O 300,603
16 P 997,513
17 Q 56,536
18 R 1,308,177
19 S 3,101,369
20 T 1,007,586
21 U 107,682
22 V 635,428
23 W 293,056
24 X 96,957
25 Y 306,122
26 Z 232,390

Now that letters have been covered, we can also take a look at the digits. Email addresses beginning with a digit are far less prevalent than those beginning with a letter: combined, there are just 213,390 of them, or about 0.73% of the total entries in the SheIn data breach.

The number an email address begins with Quantity
0 17,052
1 63,972
2 39,719
3 17,427
4 11,964
5 9,447
6 8,266
7 15,081
8 14,165
9 16,337

About 0.21% of the email addresses in the SheIn data breach did not start with a digit or a letter. That is exactly 62,108 accounts if we check the records against the database with duplicate entries, or roughly 58,457 accounts if the same share is applied to the database without duplicate entries (an estimate, which is why the computed figure, 58,457.41, is not a whole number).
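The shares quoted in this section follow from the three totals for the database with duplicates. The sketch below recomputes them; the classifier `first_char_class` is a hypothetical helper, and restricting it to ASCII letters and digits is an assumption about how the original analysis treated first characters.

```python
def first_char_class(email: str) -> str:
    """Classify an address by its first character: 'letter', 'digit' or 'other'."""
    first = email.strip()[:1]
    if first.isascii() and first.isalpha():
        return "letter"
    if first.isascii() and first.isdigit():
        return "digit"
    return "other"


# Totals quoted in the article (database with duplicates).
counts = {"letter": 29_026_175, "digit": 213_390, "other": 62_108}
total = sum(counts.values())
shares = {name: count / total * 100 for name, count in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}% of {total:,} addresses")
```

The three groups add up to 29,301,673 addresses, which is the total the percentages are measured against.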


Passwords

The password distribution in the SheIn data breach is very interesting: hundreds of different passwords were each used by many different people. There are the ordinary combinations, of course, but thousands of users also chose passwords like "sheinside", which they probably thought of on the spot, or "shein18" and "Shein2018", which may mean that they created their accounts in 2018. There were also 293,688 users who used multiple empty spaces as their passwords. Here's the list:

# Password Quantity
1 290,394
2 123456 89,122
3 123456789 41,637
4 1234567890 22,968
5 12345678 20,673
6 Shein123 13,773
7 shopping 11,664
8 1234567 11,634
9 password 11,298
10 123123 11,155
11 aa123456 11,072
12 sheinside 10,063
13 shein 7,978
14 1234 7,297
15 12345 7,153
16 11223344 6,767
17 shein1 6,679
18 112233 5,874
19 0987654321 5,415
20 111111 5,281
21 1122334455 5,071
22 123321 4,781
23 Aa123123 4,742
24 qwerty 4,737
25 Shein2018 4,715
26 sheinshein 4,403
27 qwert 3,949
28 qwertyuiop 3,904
29 123123123 3,902
30 Aa112233 3,881
31 Aa11223344 3,785
32 1234512345 3,737
33 shein2017 3,682
34 onedirection 3,542
35 password1 3,473
36 iloveyou 3,295
37 3,294
38 qwer1234 3,156
39 12344321 3,086
40 azerty 2,979
41 12345678910 2,934
42 chocolate 2,920
43 motdepasse 2,885
44 abc123 2,784
45 sunshine 2,754
46 princess 2,745
47 asDF1234 2,662
48 asdfghjkl 2,586
49 000000 2,567
50 shein@123 2,554
51 2,547
52 loulou 2,524
53 SheIn2016 2,522
54 Mm123456 2,515
55 1234554321 2,456
56 as123456 2,401
57 987654321 2,399
58 qwerty123 2,389
59 shein1234 2,381
60 justinbieber 2,363
61 112233445566 2,354
62 abcd1234 2,330
63 shopping1 2,329
64 chouchou 2,313
65 doudou 2,289
66 654321 2,276
67 passwort 2,267
68 hallo123 2,254
69 chocolat 2,246
70 121212 2,204
71 forever21 2,176
72 hellokitty 2,165
73 Aa12341234 2,126
74 ichliebedich 2,110
75 clothes 2,092
76 ss123456 2,024
77 fashion 1,934
78 incorrect 1,888
79 shopping123 1,881
80 Aa123456789 1,877
81 hello123 1,849
82 12345678900 1,842
83 soleil 1,778
84 12341234 1,766
85 charlotte 1,756
86 compras 1,735
87 michelle 1,715
88 11111111 1,707
89 butterfly 1,704
90 Rr123456 1,701
91 azertyuiop 1,661
92 shein18 1,651
93 sheinpassword 1,633
94 Password123 1,621
95 charlie 1,620
96 Aa1234567 1,618
97 zxcvbnm 1,600
98 20092012 1,592
99 123456aA 1,590
100 welcome1 1,586
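A frequency table like the one above is straightforward to build with Python's `collections.Counter`. This is only an illustrative sketch: `top_passwords` and `count_whitespace_only` are hypothetical helper names, and the sample list is invented rather than taken from the dump.

```python
from collections import Counter
from typing import Iterable


def top_passwords(passwords: Iterable[str], n: int = 100) -> list[tuple[str, int]]:
    """Return the n most frequently used passwords with their counts."""
    return Counter(passwords).most_common(n)


def count_whitespace_only(passwords: Iterable[str]) -> int:
    """Count non-empty passwords that consist of nothing but whitespace."""
    return sum(1 for p in passwords if p and not p.strip())


# Hypothetical sample data, not taken from the breach.
sample = ["123456", "  ", "123456", "shein", " ", "123456", "shein"]
ranking = top_passwords(sample, 2)
blank = count_whitespace_only(sample)
print(ranking, blank)
```

Whitespace-only passwords look like empty cells once a table is rendered, which would explain rows that have a quantity but no visible password.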

It should also be noted that the system contained 3,294 one-character passwords, so it is probably safe to assume that SheIn did not enforce many password-strength rules.

Judging by the passwords that users chose, we can reasonably assume that the service has been in operation since at least 2015 and has grown steadily since then: the password "shein2015" was chosen by 699 users, "SheIn2016" by 2,522 users, "shein2017" by 3,682 users and "Shein2018" by 4,715 users.

This lets us estimate that the number of users choosing a year-based password grew by 1,823 in 2016, by 1,160 in 2017 and by 1,033 in 2018, an average growth of about 1,338.67 users per year. Adding that average to the 2018 increase gives a rough projection of approximately 2,372 new users choosing a year-based password in 2019, and adding it once more gives approximately 3,711 in 2020.
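The arithmetic behind these figures can be checked in a few lines. This is a sketch of one way to arrive at the same numbers, not necessarily how they were originally computed. In particular, rounding the 2019 estimate before projecting 2020 is an assumption, made because it is the only way the 2020 figure comes out at 3,711.

```python
# Users per year-based password, as quoted in the article.
counts = {2015: 699, 2016: 2522, 2017: 3682, 2018: 4715}

years = sorted(counts)
# Year-over-year increase in users choosing a year-based password.
increases = {cur: counts[cur] - counts[prev] for prev, cur in zip(years, years[1:])}
average = sum(increases.values()) / len(increases)

# Naive linear extrapolation: add the average to the latest increase.
projected_2019 = round(increases[2018] + average)
projected_2020 = round(projected_2019 + average)

print(increases)
print(f"average increase: {average:.2f}")
print(projected_2019, projected_2020)
```

This is a very rough extrapolation from three data points, so the projected values are best read as an order of magnitude rather than a forecast.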

More interesting password choices include very short passwords like "&", "S", "43" and "(". In addition, the word "sonnenschein" was used 1,356 times, "papillon" 1,131 times, "1q2w3e4r5t" 1,065 times and "ritinhasantos4" 1,021 times.

We can also see that multiple passwords were used exactly the same number of times. There are 73 such cases, and the "Password Repeat Times" column appears to show how many different passwords share each quantity:

# Password Quantity Password Repeat Times
1 estrella 1,001 2
2 00000 1,060 2
3 happy123 1,062 2
4 ; 1,095 2
5 Iloveshein 1,131 2
6 Aa1122334455 1,356 2
7 10203040 616 3
8 123456788 619 2
9 jesus123 625 2
10 999999 627 4
11 samantha1 631 2
12 123123AA 634 3
13 chicken 635 2
14 2 639 2
15 Computer 642 3
16 Aa100100 648 3
17 alessandro 649 3
18 Daisy123 652 4
19 lolipop 655 2
20 family 656 2
21 purple123 657 2
22 love2shop 666 3
23 ashley 667 2
24 monkey123 673 2
25 ( 676 2
26 justine 679 3
27 11223344556677 684 2
28 angela 692 2
29 123456789Aa 697 2
30 fuckyou 698 2
31 michelle1 699 2
32 224466 702 2
33 1234abcd 705 2
34 7654321 712 2
35 Mm11223344 715 2
36 123098 717 4
37 aa12345 718 3
38 131313 720 3
39 alessia 721 2
40 elizabeth1 724 2
41 beatrice 725 2
42 cooper 730 2
43 a1234567 731 2
44 buddy123 733 3
45 amandine 738 4
46 motherlode 739 2
47 090909 740 3
48 fatima 746 2
49 banana 751 2
50 hannah123 754 2
51 lovelove 757 2
52 barbie 759 2
53 88888888 773 2
54 asd123 779 3
55 asdfgh 783 2
56 112233445566778899 796 3
57 12121212 800 2
58 pepper 811 2
59 00000000 823 2
60 009988 824 3
61 aB123456 842 2
62 123456a 847 2
63 87654321 853 2
64 cocacola 860 2
65 coucou 874 2
66 123654 884 4
67 1 885 2
68 lalala 897 2
69 d 925 2
70 123455 952 2
71 Asd12345 964 2
72 marina 981 2
73 patricia 998 2

Our best guess is that these passwords were created by users who had more than one account in the system, in which case the number of times a password repeats would match the number of accounts that user had.
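Passwords that share the same usage count can be found by first counting each password and then grouping the passwords by that count. The sketch below shows one way to do it; `passwords_sharing_a_count` is a hypothetical helper and the sample data is invented.

```python
from collections import Counter, defaultdict
from typing import Iterable


def passwords_sharing_a_count(
    passwords: Iterable[str], min_group: int = 2
) -> dict[int, list[str]]:
    """Map each usage count to the passwords used exactly that many times,
    keeping only counts shared by at least `min_group` passwords."""
    frequency = Counter(passwords)
    by_count: dict[int, list[str]] = defaultdict(list)
    for password, times_used in frequency.items():
        by_count[times_used].append(password)
    return {
        times_used: sorted(group)
        for times_used, group in by_count.items()
        if len(group) >= min_group
    }


# Hypothetical sample data, not taken from the breach.
sample = ["a", "a", "b", "b", "c", "c", "c", "d"]
ties = passwords_sharing_a_count(sample)
print(ties)
```

In this sample, "a" and "b" are each used twice, so they form one group. In a dump with millions of distinct passwords, some ties of this kind would be expected even without any link between the accounts.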

Apart from this, we can also look at the characters that passwords begin with. Here is the list of passwords that begin with letters:

# The letter the password begins with Quantity
1 A 1,992,010
2 B 1,298,884
3 C 1,455,374
4 D 964,988
5 E 710,719
6 F 803,599
7 G 789,006
8 H 887,892
9 I 660,077
10 J 978,300
11 K 906,390
12 L 1,342,680
13 M 2,109,455
14 N 904,379
15 O 458,681
16 P 1,213,355
17 Q 349,112
18 R 940,811
19 S 2,286,583
20 T 955,001
21 U 289,858
22 V 536,327
23 W 492,077
24 X 255,748
25 Y 381,001
26 Z 391,859

Here is the list of passwords that begin with numbers:

The number the password begins with Quantity
0 656,343
1 1,423,879
2 613,947
3 299,329
4 236,096
5 231,729
6 232,289
7 235,043
8 247,879
9 341,314

The passwords in the data dump also vary widely in length:

  • 408,406 passwords are 5 characters or shorter;
  • 20,919,888 passwords are 10 characters or shorter;
  • 29,187,461 passwords are 20 characters or shorter;
  • 65,519 passwords are 20 characters or longer;
  • 40,642 passwords are 30 characters or longer;
  • 48 passwords are 40 characters or longer.

It is likely that the passwords of 20 characters or more were generated by password managers.


To summarize, the SheIn data breach, although relatively small compared to the biggest ones, did a lot of damage to the company and its customers. The good news is that SheIn notified all of its customers that their data was at risk. The company also worked with cybersecurity investigators, who monitored the network and tried to ensure that future data breaches can be prevented.
