<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: R Sanjabi</title>
    <description>The latest articles on DEV Community by R Sanjabi (@rsanjabi).</description>
    <link>https://dev.to/rsanjabi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F238924%2F1e8f9fb9-85ee-4217-85a5-e351dcfbdfca.JPG</url>
      <title>DEV Community: R Sanjabi</title>
      <link>https://dev.to/rsanjabi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rsanjabi"/>
    <language>en</language>
    <item>
      <title>Supporting work...</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Wed, 18 Dec 2019 01:13:58 +0000</pubDate>
      <link>https://dev.to/rsanjabi/supporting-work-d51</link>
      <guid>https://dev.to/rsanjabi/supporting-work-d51</guid>
      <description>&lt;p&gt;&lt;i&gt;My periodic accountability report for a self-study approach to learning data science. December 17, 2017&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;I've written about assessment recently, but studying and project work aren't the only activities helping me toward my goal of becoming a data scientist. I talk about some of those other things here. Since my last check-in two weeks ago, I've done the following:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attended two meetups.&lt;/strong&gt; One was an R Ladies event with lightning talks; the other was deeplearning.ai's "Pie and AI" on how to break into data science/machine learning engineering. Evening events aren't my favorite, but I attended these two nights in a row as I start networking in earnest. "Pie and AI" was particularly interesting because it was targeting people very much like me. I've taken deeplearning.ai's Deep Learning specialization and blogged about how much I appreciate Workera's offerings for data/AI folks to test their skills. Deeplearning.ai is also attempting to offer some sort of four-month, part-time mentorship for machine learning engineers in the new year. They were short on details, and it sounds like it's still in the formative stage. I did put my name on the list for more information and am curious to see what they offer, but I'm not holding my breath, since I plan to start looking for full-time work in the new year. I will say that I'm very grateful for all the opportunities out there and feel especially fortunate to live in the Bay Area (I mean, except for the outrageous cost of living, so perhaps it's a wash luck-wise).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two webinars.&lt;/strong&gt; First, ODSC's &lt;a href="https://odsc.com/webinar-calendar/#previous"&gt;Evolutionary AI is the New Deep Learning&lt;/a&gt; by Babak Hodjat. This is a topic I've found interesting since grad school (genetic algorithms, anyone?), so I wanted to learn a little more about it. I also attended a webinar offered by &lt;a href="https://www.pathforward.org/"&gt;Path Forward&lt;/a&gt;, a non-profit that offers returnships with participating companies. Returnships are a bit like mentorships or internships for people returning to the workforce after an extended break due to caregiving. It's mostly moms returning to work, but their interpretation of who can participate and what counts as caregiving is broad. The webinar covered general advice for restarting your career and included a panel discussion with two women who had benefited from a returnship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Studying.&lt;/strong&gt; I've reviewed some machine learning techniques like logistic regression, k-nearest neighbors, support vector machines, and the kernel trick. I've also done some more algorithmic coding (recursion) and watched some videos on techniques for technical interviews. I own &lt;em&gt;Cracking the Coding Interview&lt;/em&gt; by Gayle Laakmann McDowell, but its code samples are all in Java, which I can muddle through but haven't touched in years. For now, I prefer skipping around the book to read the strategy sections while practicing on HackerRank or LeetCode in Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Work.&lt;/strong&gt; I took a break from studying to do some personal project work. I'm not ready to blog about it here, since I want to keep this journal entry short, and if I start talking about it, I won't stop. I will say I'm proud of my skills in scraping data, and happy that I've come far enough to know how to architect a recommendation engine from data gathering to deployment (at least at a high level; fingers crossed on the execution). This is certainly not something I could have conceived of even six months ago, let alone a year ago. I've been reading and digesting and asking questions for a while, trying to catch up with the state of tech, but I also credit the Full Stack Deep Learning boot camp with filling in the missing spots in my knowledge.&lt;/p&gt;

&lt;p&gt;Between the interview prep and the Path Forward seminar, I have some more thoughts on what to prepare for when interviewing. I'll be spending some time thinking about what these points should be.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I should be prepared to do a quick walkthrough of my resume and present a coherent and compelling path despite my time away. I've already been tweaking my LinkedIn profile to play up that trajectory.&lt;/li&gt;
&lt;li&gt;I should be doing mock interviews and have stories around two or three projects that make the case for me as a compelling hire.&lt;/li&gt;
&lt;li&gt;I should be prepared for a phone screen to make sure I'm excited about a company as well as understand what its process is.&lt;/li&gt;
&lt;li&gt;Finally, I should remember to talk openly about failure. Part of me is struggling with fears of not belonging, so this one requires a certain amount of courage and equanimity. Thinking of interviewing as an iterative process is helpful, and I'm trusting that repeated exposure to it will be a good thing.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>devjournal</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>03.12.19 - Turn, return, turn</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Wed, 04 Dec 2019 00:03:40 +0000</pubDate>
      <link>https://dev.to/rsanjabi/03-12-19-turn-return-turn-2j8k</link>
      <guid>https://dev.to/rsanjabi/03-12-19-turn-return-turn-2j8k</guid>
      <description>&lt;p&gt;&lt;i&gt;My weekly accountability report for my self-study approach to learning data science.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;This past week has been about downtime, sweet, sweet, downtime. But also returning to old things.&lt;/p&gt;

&lt;p&gt;Even though I don't attend school and am not even following a scheduled curriculum at the moment, when Wednesday of last week rolled around, the day before Thanksgiving, I felt like I was back in high school. My mind would not settle. Focus laughed in my face while concentration flipped the bird at me. I gave up on trying to be productive and did the holidays with family. After that, the post-holiday chill stretched into post-holiday weekend lethargy, and not a lot happened until Monday kicked me in the seat. I do believe in the power of R&amp;amp;R (I'm too old to believe otherwise), BUT I still wish I had more to report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F2KcQPyt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/vy229fxmlud6u2jyip6b.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F2KcQPyt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/vy229fxmlud6u2jyip6b.gif" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
Me, chilling over the holiday watching my best-laid study plans from afar



&lt;p&gt;In some ways, I've let go of trying to focus too much on what I should know to get hired and instead have given in to the fact that I know quite a bit about some things and nothing about quite a few things. So I'll just keep returning to study whatever seems most pressing at any given second. Right now that means reviewing my results and the suggested study materials from Workera. It's not exactly what I would have chosen to focus on, but giving it a modicum of effort will hopefully pay off if I score better on retries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HiBx36kl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/srhd5eq0yrbm3ugdamr7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HiBx36kl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/srhd5eq0yrbm3ugdamr7.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
Image credit:&lt;a href="https://www.flickr.com/photos/ducttapeavenger/457047490"&gt;Brendan&lt;/a&gt;



&lt;p&gt;I do think feedback from interviewing will point me toward weak spots and areas to focus on. Or perhaps I won't get any interviews, in which case I may conclude that I should spend more time on portfolio work to get said interviews and worry about passing them at some later point in time.&lt;/p&gt;

&lt;p&gt;So more specifically, what does my studying look like? Mostly returning to concepts I've already been exposed to. I sped through the videos from Full Stack Deep Learning again and went through my Coursera Deep Learning notes. Things I covered included CNN layer mathematics, ResNet and Inception architectures, loss functions, and infrastructure for development, training, and deployment (serverless functions vs. instances vs. on-premises hardware). I also spent time reviewing this table. We covered it in the 365 Data Science MOOC I did on Udemy, but nine months and some freelance work later, it holds more significance than it did earlier.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3AHVxmpz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zowi48or35tt7kfa7443.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3AHVxmpz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zowi48or35tt7kfa7443.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
Image Source:&lt;a href="https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers"&gt;Wikipedia: Evaluating Binary Classifiers&lt;/a&gt;
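&lt;p&gt;&lt;i&gt;As a refresher for myself: every metric in that table falls out of the four confusion-matrix counts. A quick sketch with made-up counts (not numbers from the table):&lt;/i&gt;&lt;/p&gt;

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)   # 0.85: fraction of all predictions that are correct
precision = tp / (tp + fp)                   # 0.80: of predicted positives, how many are real
recall = tp / (tp + fn)                      # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(round(recall, 3), round(f1, 3))
```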



&lt;p&gt;I ended up taking the Full Stack Deep Learning assessment test today but haven't heard back yet on how I did.&lt;/p&gt;

&lt;p&gt;I'm also returning to B-trees (trying LeetCode this time after doing HackerRank earlier in the year) because I haven't looked at them in decades. (Fun fact: B-trees are older than me, but not by much.) I find it fascinating that I'm reviewing both deep learning (which wasn't even a thing when I took AI in graduate school, where I learned about neural nets and expert systems and genetic algorithms) and introductory computer science algorithms.&lt;/p&gt;
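&lt;p&gt;&lt;i&gt;For anyone else dusting off tree problems: a minimal recursive traversal of the binary-tree flavor that dominates those sites (an illustrative warm-up, not tied to any specific problem):&lt;/i&gt;&lt;/p&gt;

```python
class Node:
    """A bare-bones binary tree node."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

# For a binary search tree, inorder traversal yields the values in sorted order.
root = Node(2, Node(1), Node(3))
print(inorder(root))  # [1, 2, 3]
```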

&lt;p&gt;Moral of the story, no one &lt;del&gt;ever really dies&lt;/del&gt; No. &lt;del&gt;no one ever really gets it on the first try&lt;/del&gt; Still no. &lt;del&gt;If at first you don't put your computer science degree to use, try try again?&lt;/del&gt;  Eh, maybe take a break now and then and watch The Mandalorian.&lt;/p&gt;
Header image is from my course notes from CSCI 323 - Advanced Artificial Intelligence, Spring 1996



</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>selftaught</category>
      <category>devjournal</category>
    </item>
    <item>
      <title>26.11.19 - Testing, testing</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Wed, 27 Nov 2019 00:45:00 +0000</pubDate>
      <link>https://dev.to/rsanjabi/26-11-19-testing-testing-2lgk</link>
      <guid>https://dev.to/rsanjabi/26-11-19-testing-testing-2lgk</guid>
      <description>&lt;h4&gt; Test, quizzes, and other measures of knowledge.&lt;/h4&gt; &lt;i&gt;My weekly accountability report for my self-study approach to learning data science.&lt;/i&gt;

&lt;p&gt;I'm still feeling positive about the &lt;a href="https://fullstackdeeplearning.com/"&gt;Full Stack Deep Learning&lt;/a&gt; boot camp. I check the Slack channel with alumni postings almost every day, even though I'm not usually a Slack person (or, more accurately, haven't felt a need to be a Slack person previously), so that probably says something about my level of connection with the group.&lt;/p&gt;

&lt;p&gt;And I've been studying for the FSDL alumni test, which I hope to take by the end of the week. This feels like the same sort of studying I would do to prepare for an interview, so it's an interesting use of my time: shifting from learning things, or attempting to produce content for portfolio reasons, to "can I convince someone I know stuff." There are lots of different modes to work through in trying to become a data scientist, and this one is a bit new to me.&lt;/p&gt;

&lt;p&gt;My approach is to review my notes from the Coursera Deep Learning Specialization as well as the lecture slides from the boot camp. One thing I've noticed is that my memory is not what it was the first go-around in graduate school. So that's fun. I'm glad I took mostly good notes; I did go back and redo some of the convolutional ones where clearly I was tired and my basic math was not checking out. I'm reviewing things like activation functions, initialization of weights, CNN and LSTM architecture, and structuring/evaluating projects (which I realize I really, really enjoy).&lt;/p&gt;

&lt;p&gt;This time, I'm transcribing the most salient points into question-and-answer flashcards, which I can review in five-minute bits of time when I don't have the space for deeper coding or learning projects. It's interesting that while I draw knowledge from many sources (blog posts, MOOC videos, books online and otherwise), I do need to have it all in one central spot in order to review and study it. There's a lot I don't like about Anki cards, but at least they are all together.&lt;/p&gt;

&lt;p&gt;I also tried out &lt;a href="https://workera.ai/candidates/"&gt;Workera&lt;/a&gt;, a deeplearning.ai company that administers online tests in the AI/data science space. For self-directed learners, I highly recommend it. For jobseekers it's free. Take a handful of tests, get graded, and then you are given a course of study to improve, an analysis of which role might be a good fit for you, and a list of job postings and direct referrals that you qualify for.&lt;/p&gt;

&lt;p&gt;The test areas are machine learning, deep learning (optional), data science, mathematics, object-oriented programming, algorithmic coding, software engineering (optional), and communication ability (optional). The communication test is a one-way video call and the algorithmic coding test uses a Python coding environment, but the rest are multiple-choice questions. Once you complete them all, they will suggest a primary and secondary role from the following: Data Analyst, Data Scientist, Machine Learning Engineer, Machine Learning Researcher, Software Engineer - Machine Learning Engineer, and Software Engineer.&lt;/p&gt;

&lt;p&gt;It recommended data scientist as the closest role based on my scores, followed by data analyst. My top scores were in machine learning, data science, and software engineering, while my deep learning, mathematics, and algorithmic coding scores were all low. But it also included study guides (yay! tell me what you think I should study, and maybe I will and maybe I won't, but at least I have some input!).&lt;/p&gt;

&lt;p&gt;I took the tests cold and didn't do so hot, which I was expecting, as I've done very little interview prep so far. The algorithmic coding included basic things like tree traversal, and honestly, it's been over twenty years, so I'm not surprised at my results. But I am excited to have some actual concrete feedback on where I'm at after what I've been doing for the last year. I can take the tests two more times, then try again in 90 days. This works well with what I'm trying to focus on between now and the end of the year, with the plan of applying for work in Q1 of next.&lt;/p&gt;

&lt;p&gt;There are a few job postings listed (a dozen for San Francisco), and if you meet or exceed the company's scores, you are eligible for a direct referral. As an entry-level person, anything that gives me a leg up over cold applying on LinkedIn or a company website seems like a great idea. Here's an example of a radar chart for a position listed as Data Scientist (which Workera classified as a machine learning engineer role), showing the company's performance requirements in green vs., umm, some blue dots that most definitely aren't my scores.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ol12kwyc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/dvcrfpgr9dcenbfjti3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ol12kwyc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/dvcrfpgr9dcenbfjti3e.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is nice, if only because it gives you an idea of what the recruiter thinks is the relative importance of a set of skills. That being said, there could still be a mismatch going on.&lt;/p&gt;

&lt;p&gt;My general impressions are really positive even if it's still a bit rough around the edges. I'm not convinced it's a perfect example of what people in those specific roles should know, so as a self-taught individual I still take complete and total responsibility for what I'm learning. And as someone who has opted out of traditional schooling on numerous occasions and side-eyes any standardized test, I'm probably not going to be thrilled if this somehow becomes an industry standard. &lt;/p&gt;

&lt;p&gt;Also, I'm new to things, but there were a few aspects that didn't jibe with my experience of the industry. For example, I was a bit surprised at the data science questions that were asked (they seemed more focused on basic probability than I was expecting). There were no questions about SQL (that I recall?). I completely bombed the mathematics portion, which was mostly linear algebra, a dash of calculus, and some functional analysis where I choked on notation. I have been reviewing these within the context of deep learning and machine learning, but maybe I need to be able to answer them as stand-alone interview questions. It also seemed engineering-heavy, with a lot less statistics than I was expecting. I'm guessing this reflects the background of the deeplearning.ai folks, who seem to be coming at it from an engineering and research angle. It certainly doesn't match my experience of professional data science Twitter, for example, or maybe I just follow a lot of statisticians.&lt;/p&gt;

&lt;p&gt;But despite those, I am thrilled to have this asset in my self-directed learning toolbox. If you've used this site, I'd love to hear your experience.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>selftaught</category>
      <category>devjournal</category>
    </item>
    <item>
      <title>18.11.19 - boot camp</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Wed, 20 Nov 2019 05:08:04 +0000</pubDate>
      <link>https://dev.to/rsanjabi/18-11-19-boot-camp-1j99</link>
      <guid>https://dev.to/rsanjabi/18-11-19-boot-camp-1j99</guid>
      <description>&lt;p&gt;&lt;i&gt;My weekly accountability report for my self-study approach to learning data science.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;Shhhh. I skipped last week. Don't tell on me, ok?&lt;/p&gt;

&lt;p&gt;Where am I at?&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;The biggest thing is participating in &lt;a href="https://fullstackdeeplearning.com/"&gt;Full Stack Deep Learning&lt;/a&gt;. This was a 2-day boot camp in Berkeley. It was intense in a good way and an uncomfortable way. In some ways it was straightforward: show up, listen, learn, and network. It was a stretch for me (I'm noticing this is a pattern), but I ultimately decided that it was a good one. I met a bunch of nice people, and I'm looking forward to putting all the things I learned into practice. I think it would help me a lot to share what I've learned, so as I work through the labs and study, I might try to pick out some highlights from the lectures and blog about them.
&lt;p&gt;&lt;br&gt;
It was great because it covered things that I might otherwise get only in bits and pieces by paying attention to blog posts and tweets. The advantage of curriculum-based instruction is relying on someone else's expertise about what is relevant to learn; when you're self-taught, you have to make sure you trust your teachers. The instructors (Pieter Abbeel, Josh Tobin, and Sergey Karayev) were fabulous, and the breadth and depth of knowledge were excellent.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
Besides the lectures and labs, one of the features of the boot camp is the ability to take an exam and, if you score high enough, gain access to the recruiter network. You have a month and can take the test multiple times. That being said, I haven't looked at the questions yet, and I'm not sure if I'll pass. Ultimately I think it's a great idea, and there is no penalty for failing; the time I spend studying for it or taking it won't be wasted.&lt;br&gt;
&lt;/p&gt;


&lt;/li&gt;
&lt;li&gt; My other studies weren't terribly focused last week. Freelancing was good. I'm getting to flex my SQL skills and am playing around with some NLP libraries. In general, both the boot camp and the work I'm doing are making me very aware of the ways in which engineering with data, given its probabilistic nature, is quite different from traditional software engineering. And that's requiring an adjustment.
&lt;/li&gt;
&lt;li&gt; Finally, does anyone have any suggestions on how to write blog posts more quickly? Or should I just keep carving out more time than I think I should for this?
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devjournal</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>2.11.19 - Sprinting</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Thu, 07 Nov 2019 05:05:25 +0000</pubDate>
      <link>https://dev.to/rsanjabi/2-11-19-sprinting-3ced</link>
      <guid>https://dev.to/rsanjabi/2-11-19-sprinting-3ced</guid>
      <description>&lt;p&gt;&lt;i&gt;My weekly accountability report for my self-study approach to learning data science.&lt;/i&gt;&lt;/p&gt;


&lt;p&gt;Things that I worked on this week:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;b&gt;Studying concepts and interview questions.&lt;/b&gt; One of my goals was to study flashcards every day, and that happened three times: an improvement over the previous week but still short of my goal. I'm doing around 25 questions a day, which takes about 15 minutes, and the Anki app is cool because once I start mastering a question, it will swap it out for a new one. I currently have around 500 questions, including Chris Albon's &lt;a href="https://machinelearningflashcards.com/"&gt;Machine Learning Flashcards&lt;/a&gt;, and I'm also slowly adding new concepts as I encounter them. I wish I had started this when I first began learning, but in some ways I wasn't ready: I didn't yet have a good idea of how to wrap my head around all the things I needed to learn. My decks are divided between Python &amp;amp; Coding, Machine Learning, Deep Learning, and the hand-drawn Machine Learning cards.
&lt;/li&gt;
&lt;li&gt; &lt;b&gt;Freelancing.&lt;/b&gt; Work is going well. I'm learning stuff and getting to work with another cloud service provider. Also Postgres! It's reassuring that my rusty database skills are coming back. I'm also excited to do some named entity recognition at some point soon.
&lt;/li&gt;
&lt;li&gt; &lt;b&gt;SPRINT!&lt;/b&gt; The big news this week is that I participated in the &lt;a href="https://github.com/WiMLDS/bayarea-2019-scikit-sprint"&gt;WiMLDS scikit-learn Open Source Sprint&lt;/a&gt; this weekend. It was, frankly, a stretch for me, but I learned a lot, mostly around the open-source process, testing, and using git and GitHub to open a PR. I also became more familiar with the types of challenges that open-source contributors face. &lt;p&gt;The issue I worked on was getting docstrings into numpydoc format, first by running a script that flagged inconsistencies and then attempting to fix the problems. But there's only so much I wanted to take on in my first attempt. I think I changed three minor things, like removing a blank line and rearranging a section. Yet somehow that took the entire day, and I had questions at every step. Fortunately, a Twitter friend was there, and we helped each other with moral support. The day went by insanely fast because I was concentrating so much. &lt;br&gt;
&lt;/p&gt;


&lt;/li&gt;
&lt;li&gt;I didn't do any &lt;b&gt;fast.ai&lt;/b&gt;, and I might put that on the back burner, at least while I'm doing freelance work. The eternal question is what I need to know to get my first job in data science AND how to balance my time, since I can't learn it all; depending on who you ask, you'll get different answers. While I really enjoy deep learning, I think more job opportunities are looking for a wide variety of machine learning skills.
&lt;/li&gt;
&lt;li&gt;Which brings me to my &lt;b&gt;reading list&lt;/b&gt;. I keep hearing that understanding the production process is critical for data scientists. Vicki Boykis wrote about it in this "oh-so-helpful-to-the-self-study-data-science-learner" article &lt;a href="http://veekaybee.github.io/2019/02/13/data-science-is-different/"&gt;Data Science is Different Now:&lt;/a&gt;

&lt;blockquote&gt;
Along with data cleaning, what’s become more clear as the hype cycle continues its way to productivity is that data tooling and being able to put models into production has become even more important than being able to build ML algorithms from scratch on a single machine, particularly with the explosion of the availability of cloud resources.
&lt;/blockquote&gt;

&lt;p&gt;This article has so many good parts that I keep returning to it every few months or so on my journey. It also seems to confirm &lt;a href="https://www.sharpestminds.com/faq"&gt;Sharpest Minds'&lt;/a&gt; portfolio approach, where mentors help ensure that mentees have a project that demonstrates an understanding of the complete pipeline from data acquisition through deployment. From their FAQ:&lt;/p&gt;

&lt;blockquote&gt;
Normally you'll focus on industry best practices in deploying ML models to production, devops, writing clean code, and doing proper data engineering and data cleaning.&lt;/blockquote&gt; 

&lt;p&gt;Knowing this, I keep looking for opportunities to get better at git, cloud and docker tools, development environments, etc. I've also signed up for &lt;a href="https://fullstackdeeplearning.com/"&gt;Full Stack Deep Learning Bootcamp&lt;/a&gt; which takes place in a few weeks. The fear I always have is that I'll be over my head, but watching a few videos of previous sessions, I'm hopeful that while it will be challenging it won't be overwhelming.&lt;/p&gt;

&lt;p&gt;I have a couple of other links that I've come across recently that I'm putting on my reading list as well. Chip Huyen had a really great thread on blog articles that discuss platforms and deployment:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;
&lt;p&gt;To learn how to design machine learning systems, I find it really helpful to read case studies to see how great teams deal with different deployment requirements and constraints. Here are some of my favorite case studies.&lt;/p&gt;— Chip Huyen (@chipro) &lt;a href="https://twitter.com/chipro/status/1188650188392390656?ref_src=twsrc%5Etfw"&gt;October 28, 2019&lt;/a&gt;
&lt;/blockquote&gt; 

&lt;p&gt;This O'Reilly book caught my eye: &lt;a href="http://shop.oreilly.com/product/0636920215912.do"&gt;Building Machine Learning Powered Applications: Going from Idea to Product&lt;/a&gt; by Emmanuel Ameisen. Finally, Ben Weber is working on a book and is sharing it chapter by chapter as blog posts called &lt;a href="https://towardsdatascience.com/data-science-in-production-13764b11d68e"&gt;Data Science in Production&lt;/a&gt;, which I can't wait to start reading.&lt;/p&gt;




&lt;center&gt;&lt;h5&gt;&lt;i&gt;Cover Photo via &lt;a href="https://www.goodfreephotos.com/"&gt;Good Free Photos&lt;/a&gt;&lt;/i&gt;&lt;/h5&gt;&lt;/center&gt;
&lt;/li&gt;
&lt;/ul&gt;
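&lt;p&gt;&lt;i&gt;For anyone curious about the numpydoc work from the sprint: the format standardizes docstring sections like Parameters and Returns. A rough illustration on a made-up function (my example, not code from scikit-learn):&lt;/i&gt;&lt;/p&gt;

```python
def clip(values, low, high):
    """Clip values to the closed interval [low, high].

    Parameters
    ----------
    values : list of float
        Numbers to clip.
    low : float
        Lower bound of the interval.
    high : float
        Upper bound of the interval.

    Returns
    -------
    list of float
        A clipped copy of ``values``.
    """
    return [min(max(v, low), high) for v in values]
```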

</description>
      <category>devjournal</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>28.10.19 - In which many wheels were spun</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Tue, 29 Oct 2019 02:26:15 +0000</pubDate>
      <link>https://dev.to/rsanjabi/28-10-19-in-which-many-wheels-were-spun-33mb</link>
      <guid>https://dev.to/rsanjabi/28-10-19-in-which-many-wheels-were-spun-33mb</guid>
      <description>&lt;h1&gt;And little forward momentum occurred...&lt;/h1&gt;
&lt;br&gt;
&lt;i&gt;My weekly accountability report for my self-study approach to learning data science.&lt;/i&gt;



&lt;p&gt;Things that I worked on this week:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; I reviewed some high-level documentation of Scikit-learn, in order to have some level of preparedness for the WiMLDS sklearn open-source sprint this coming weekend. I've never been involved in open source or a sprint, so I'm excited about that. But I definitely need to do a bit more studying/review in that arena.
&lt;/li&gt;
&lt;li&gt; I did some more fastai. I returned to week three because I wanted to understand better what I was doing. I worked on protein multilabel classification and got more CUDA errors, but I was able to figure out a scaling approach that seemed to work and avoided more crashes by swapping in resnet34, using smaller image sizes, and incrementally increasing them while shrinking the batch sizes. My resulting F-score beat the top public scores, proof that I have no clue what I'm doing. I was at the point of reading the forums to understand people's various approaches when the opportunity came to do some freelance work. So I haven't been back to fastai, and I'm not sure exactly when that will happen, as the next few weeks are filling up between sklearn, work, and family constraints.
&lt;/li&gt;
&lt;li&gt; Something I wish I understood: why does my VM on GCP sometimes quit, usually not long after I first launch it? Very mysterious...
&lt;/li&gt;
&lt;li&gt; I did manage to study polynomial regression and pull together several hundred flashcards in machine learning, coding, and statistics. I like being able to review a little at a time in a very methodical way. I don't feel like it's necessary for a job, but I do think it will be helpful for interviews.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FUqgMW5f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/29fes64h5ptljkfy1sao.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;There are many things about Python that tripped me up when I first started learning it, like list comprehensions and how iterators work in for loops, but once I write them of my own volition, I feel like I attain the next level of pythonicness. This weekend I had the opportunity to say "you know what would be good here? a generator," and then I wrote it, and I'm honestly wondering why I was scratching my head for so long. It's probably because I still have to remind myself that functions are objects.
&lt;/li&gt;
&lt;li&gt;I feel like I should have gotten more stuff done this week, but damn if work &amp;amp; study/life balance isn't hard. Life has a way of demanding my attention, which I don't mind; it's just hard to say no to. I think I'll try to set my alarm every morning and see if I can't get some consistency around that.
&lt;/li&gt;
&lt;/ul&gt;
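&lt;p&gt;&lt;em&gt;That generator moment, sketched in plain Python. This is a made-up log-filtering example, not the actual code from that weekend:&lt;/em&gt;&lt;/p&gt;

```python
def matching_lines(path, keyword):
    """Lazily yield lines containing keyword, one at a time.

    Because this is a generator, the file is read line by line
    instead of loading everything into memory at once.
    """
    with open(path) as f:
        for line in f:
            if keyword in line:
                yield line.rstrip("\n")

# A generator is itself an iterator, so it drops straight into
# a for loop (or into list(), sum(), next(), ...):
# for line in matching_lines("notes.txt", "TODO"):
#     print(line)
```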

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>devjournal</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>21.10.19 - Proteins and Laptop Repair</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Mon, 21 Oct 2019 23:55:21 +0000</pubDate>
      <link>https://dev.to/rsanjabi/10-21-19-proteins-and-laptop-repair-27i4</link>
      <guid>https://dev.to/rsanjabi/10-21-19-proteins-and-laptop-repair-27i4</guid>
      <description>&lt;p&gt;Here's my weekly accountability report for my self-study approach to learning data science.&lt;/p&gt;

&lt;h2&gt;Fast.ai Week 3&lt;/h2&gt;

&lt;p&gt;I made progress on the week 3 instruction, which covered datablocks, multi-class labeling, image regression, and image segmentation.&lt;/p&gt;

&lt;p&gt;The foundational class of fast.ai is the datablock (which is used to create a databunch). A datablock essentially strings together a series of preprocessing steps: where the data is located, how to import it and organize it so it can be labeled, how to split it into training and validation sets, which transformations and augmentations to apply, and finally how to produce an object we can use for model building. I also learned why last week's Aurebesh project was behaving oddly: the default augmentation was mirroring the letters horizontally. That makes sense for classifying cats but not so much for letters. I quickly reran my model from last week and was able to lower the error rate a bit.&lt;/p&gt;

&lt;p&gt;After that, I dug into the multi-label problem by searching Kaggle for datasets to explore. I settled on &lt;a href="https://www.kaggle.com/c/human-protein-atlas-image-classification/overview"&gt;Human Protein Classification&lt;/a&gt;, a competition from nine months ago (while skipping over the interesting but not-immediately-obvious-how-to-go-about-modeling-it datasets on the &lt;a href="https://www.kaggle.com/danofer/dbpedia-classes"&gt;Hierarchical Taxonomy of Wikipedia Articles&lt;/a&gt; and &lt;a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview"&gt;The Toxic Comment Classification Challenge&lt;/a&gt;). I was able to get the data into a datablock, in part because of the competition's helpful preprocessing kernels. (Note to self: competition datasets will not only be cleaner but have far more forum posts and kernels to learn from.) For example, there are four files per image, one for each of the four channels (RGBY), but the green is the most relevant for proteins, so in the interest of getting something up and running, I ignored the other three. With that pointer and a few lines of code, I was able to get a model running. On the first attempt, its F-score was 0.63, high enough to place in the top 40 (out of 2000+ entries), and that was both reassuring and not. Jeremy's comment was that if you're in the top 10%, you're doing well, because the folks there know what they're doing. This was reassuring until I realized &lt;i&gt;I&lt;/i&gt; don't feel like I know what I'm doing. I tried to rerun it using more detailed versions of the images, and in the process ran out of memory, got CUDA errors, and broke the kernel. And then I ran out of time to investigate further.&lt;/p&gt;
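&lt;p&gt;&lt;em&gt;For intuition about that 0.63: as I understand it, the competition scores with a macro F-score, i.e. a per-label F1 averaged over all the label classes. A minimal pure-Python sketch with toy labels, not the real data:&lt;/em&gt;&lt;/p&gt;

```python
def f1(tp, fp, fn):
    """F1 for one label: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, n_labels):
    """Average the per-label F1 over every label.

    y_true / y_pred are lists of label sets, one set per image;
    {0, 2} means labels 0 and 2 are both present in that image.
    """
    scores = []
    for label in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_labels
```

&lt;p&gt;&lt;em&gt;One consequence of the macro average: rare labels count just as much as common ones, so a model that ignores the rarest proteins gets punished hard.&lt;/em&gt;&lt;/p&gt;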

&lt;p&gt;Starting this week, I now have to decide whether to spend more time really grokking what's happening or to move forward. I would love to linger and really &lt;i&gt;get&lt;/i&gt; what's going on, but there's also something to be said for keeping moving. I do wonder how folks who choose self-study make those decisions.&lt;/p&gt;

&lt;h2&gt;Other Studies&lt;/h2&gt;

&lt;p&gt;I'm working my way through a Udemy Machine Learning course. It's review for me, but this time I'm taking notes and benefiting from a better perspective on how all the pieces fit together, plus more solid coding skills. This week covered common preprocessing steps, as well as simple and multiple linear regression using StatsModels and scikit-learn. I made loads of flashcards.&lt;/p&gt;
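&lt;p&gt;&lt;em&gt;A flashcard-style refresher on what those libraries compute under the hood: simple (one-feature) linear regression has a closed-form least-squares solution. A quick sketch with made-up numbers:&lt;/em&gt;&lt;/p&gt;

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x.

    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x).
    This is what StatsModels / scikit-learn reduce to in the
    single-feature case.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return my - slope * mx, slope

# Points that lie exactly on y = 1 + 2x:
intercept, slope = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
```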

&lt;p&gt;And today I had the opportunity to chat with an industry veteran to just pick his brain. Grateful for all the senior folks who take 30 minutes out of their day to share what they know. &lt;/p&gt;

&lt;h2&gt;Real-Life&lt;/h2&gt;

&lt;p&gt;A crimp in my learning flow occurred this week when my laptop went from occasionally acting erratically to constantly acting erratically. This required some sleuthing on my part. IT repairs are so not my thing; gimme data to work with. But as with my experience with memory issues, there's a whole ecosystem at play in doing data science, and as much as I dislike hardware and systems-level issues, it helps to be resilient and knowledgeable enough to work through the problem. In this case, a shoutout to all those mom-and-pop repair shops who put out "how to fix your X" YouTube videos. This week I was able to repair not only my MacBook Pro but also my sewing machine that went kaput mid-Halloween-costume assembly. The real lesson (besides having the right tools) is that there are a lot of layers of technical components involved in doing tech. Knowing who to turn to for help while not panicking isn't limited to machine learning models.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>devjournal</category>
      <category>machinelearning</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>15.10.19 - This Week Brought to You by the Letter Aurek</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Thu, 17 Oct 2019 01:22:47 +0000</pubDate>
      <link>https://dev.to/rsanjabi/15-10-19-this-week-brought-to-you-by-the-letter-aurek-53in</link>
      <guid>https://dev.to/rsanjabi/15-10-19-this-week-brought-to-you-by-the-letter-aurek-53in</guid>
      <description>&lt;p&gt;Here's my weekly accountability report for my self-study approach to learning data science.&lt;/p&gt;

&lt;h3&gt;Week 2 Fast.ai&lt;/h3&gt;

&lt;p&gt;I finished week 2 by building a model that predicts which character of the Aurebesh (Star Wars alphabet) a handwritten letter is. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1VBsPY9T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/uga3zsl1hv39yk71g0tz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1VBsPY9T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/uga3zsl1hv39yk71g0tz.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
Printed Aurebesh and Corresponding Latin Characters



&lt;p&gt;The model achieved 97% accuracy on a small dataset (a subset of the Omniglot dataset - &lt;a href="http://www.sciencemag.org/content/350/6266/1332.short"&gt;Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction.&lt;/a&gt; &lt;em&gt;Science&lt;/em&gt;, 350(6266), 1332-1338.) using a ResNet-34 architecture. But when I deployed it, the real-world accuracy was much lower. You can see it in &lt;a href="https://aurebesh.onrender.com/"&gt;action on Render&lt;/a&gt;, and my GitHub repo has both my model and the fast.ai Docker container for the frontend.&lt;/p&gt;

&lt;p&gt;Some thoughts (I should just call them &lt;em&gt;#$@#ing issues&lt;/em&gt;, but &lt;em&gt;thoughts&lt;/em&gt; gives the impression that I have some professional distance). &lt;br&gt;
&lt;/p&gt;
&lt;ol&gt;

&lt;li&gt;The sample image of the alphabet is broken. I double-checked index.html; it works locally and is present in my &lt;a href="https://github.com/rsanjabi/aurebesh"&gt;GitHub repo&lt;/a&gt;. So maybe it's my &lt;a href="https://github.com/rsanjabi/fastai-v3"&gt;Docker container&lt;/a&gt; or my Render service? Both frameworks are new to me, and I don't have a deep grasp of &lt;em&gt;what&lt;/em&gt; is doing &lt;em&gt;what&lt;/em&gt;. There is so much to learn that frontend and DevOps... well, I try a bit and then I need to move on. But hey, my model works... kinda... sorta.
&lt;/li&gt;

&lt;li&gt;

It turns out my model's real-world accuracy is much, much lower. How much lower? I don't know; I didn't bother calculating. I expected it wouldn't be great, so it's not really a surprise. But hey, I made an attempt. I suspect having more than 20 black-and-white images of each character would help, given that real-world test cases use color pictures taken in real-world situations.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Wucizk3V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/hs3tl1eumyvobwlii6mm.png" alt="Alt Text"&gt;Say hello to my friend the letter Esk (aka 'E')



&lt;p&gt;I would like to gather more samples of handwritten letters, and I figure I know enough fans to help me crowdsource, say, scores of samples per letter (vs. 20). Some questions I have: how important is it to get clean letters vs. noisy ones? I suspect the more they represent what you would find in the wild, the better. Alternately, I could augment what I've got. And I would like to train it on more dirty printed characters as well. &lt;br&gt;
&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt; A nice goal would be to get it to translate complete lines of text, which requires character/line detection. That might also help with my accuracy: the training set's characters take up the bulk of each image, while the stuff I've been throwing at it takes up much less. That seems like a reasonable constraint to put on the training/test sets, and a possible next step for gathering data.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DX1itd9z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/09lfltio3lujfd61fxng.jpg" alt="Alt Text"&gt;"Warning. Rendering Fat" At least that's what I think it says. Sure would be nice to have a model that would do the translation for me. Source: &lt;a href="https://www.mercurynews.com/2019/05/30/inside-disneylands-star-wars-galaxys-edge-these-50-photos-show-you-what-its-like/"&gt;Mercury News&lt;/a&gt;

 




&lt;/li&gt;

&lt;li&gt;

Finally, Render is pooping out after an image or two. I think it's related to running out of memory, but again, I haven't had a chance to investigate and understand why.
&lt;/li&gt;

&lt;p&gt;I would love to get it to the point where it detects complete written lines of Aurebesh and deciphers them, making it easy to snap pics from the movies, shows, Galaxy's Edge, books, etc. without having to actually know Aurebesh. But all of this is in service of learning the skills I need to get a job, so we'll see how it fits in with the rest of my studying. For now, fast.ai encourages just keeping swimming with the uncertainty, so that's my plan.&lt;/p&gt;

&lt;p&gt;And if anyone knows exactly why Render is choking or my image isn't loading or how to design a dataset better, I welcome feedback.&lt;/p&gt;


&lt;/ol&gt;

</description>
      <category>devjournal</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>selftaught</category>
    </item>
    <item>
      <title>Self Taught Data Science Journal</title>
      <dc:creator>R Sanjabi</dc:creator>
      <pubDate>Wed, 16 Oct 2019 00:53:19 +0000</pubDate>
      <link>https://dev.to/rsanjabi/self-taught-data-science-journal-3m86</link>
      <guid>https://dev.to/rsanjabi/self-taught-data-science-journal-3m86</guid>
      <description>&lt;p&gt;Learning is hard. &lt;/p&gt;

&lt;p&gt;Finding a community of people to share your trials with can make it less hard. &lt;/p&gt;

&lt;p&gt;So I'm blogging about my experiences. &lt;/p&gt;

&lt;p&gt;I decided to switch careers to data science in January 2019, using a self-study approach to learn the material. I've been doing MOOCs, reading blog posts and books, attending meetups and conferences, and following folks on social media. The funny thing is that I'm finally confident enough to talk about how much I don't know instead of complaining about it to my friends and family. It's only taken 8 months to get here. Guess what, world? &lt;strong&gt;I don't know squat and that's ok!&lt;/strong&gt; Whew, got that off my chest.&lt;/p&gt;

&lt;p&gt;My background is in computer science but it's fair to say it's been &lt;em&gt;years&lt;/em&gt; since I last coded. I left in no small part because I didn't feel I belonged, and it's with some trepidation that I'm returning to tech. I don't know how much the culture has changed for women, but I do know people are at least talking about it. &lt;/p&gt;

&lt;p&gt;I chose data science over other software engineering paths because I like data. I like asking questions. I like the idea of taking an empirical approach and learning something about the nature of the world. And with data scientists coming from a variety of backgrounds besides computer science, I suppose I hope it will be less problematic for women. It's also a relatively new field with huge demand, so I'm hopeful that will make it easier for someone my age starting out (again).&lt;/p&gt;

&lt;p&gt;What do I plan for this series? I'm hoping to do a very informal weekly writeup of the things I'm discovering and learning, but also where I'm struggling and could use some assistance. It feels good to share that process, and it might help anyone else who's at the same stage as I am. And anyone who is at the stage right before me can, hopefully, offer guidance and support.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>devjournal</category>
      <category>selftaught</category>
    </item>
  </channel>
</rss>
