Showing posts with label online learning. Show all posts

Hashing: How and Why to Check a File's Hash Value

Consider the following situation. You have been working for days on a PowerPoint presentation for work or school, and have been keeping the file on a shared computer, a network drive or even a personal flash drive. You put the final touches on your presentation the night before it’s due, save the file and get ready for a good night's sleep. The next day, you confidently begin your presentation. But imagine your surprise when you and your audience see the following image on your third slide:


You’ve been pranked. If you're lucky, everyone got a good laugh out of it. If not, there may be more serious consequences, depending on the situation. This sort of everyday scenario raises an obvious question. Short of opening the file and manually perusing each slide in the presentation, how could you be sure that it had not been modified by any of the pranksters you may share your computer or network with? More seriously, how can we verify the integrity of a file that may or may not have been modified by a malicious individual seeking to infect our computer or network with a dangerous piece of malware?

In this article, we’ll consider these questions and discuss the pros and cons of one simple means by which we can verify a file’s integrity to ensure that it has not been tampered with, namely, by verifying its hash value. We’ll conclude with a quick tutorial on how to verify a file’s hash value on Mac, Linux and Windows systems, and provide some links to a few lectures on cryptographic hash functions culled from the series of courses listed in our collection of free online computer science courses. Our primary sources along the way will be Everyday Cryptography by Keith M. Martin, and Applied Cryptography by Bruce Schneier.

Malware comes in many different guises. As the Electronic Frontier Foundation writes in their Surveillance Self-Defense Project, malware is frequently spread by "trick[ing] the computer user into running a software program that does something the user wouldn't have wanted." Let's say you decide to download a file from a website you know and trust, and from which you have safely downloaded files in the past. How do you know, for example, that the file you have downloaded onto your computer is in fact the one intended by the trusted website? How do you know it was not altered in transit? How do you know it was not swapped for another file by a malicious attacker? And how can you determine this without running the file first? 

One simple way to verify a file's integrity is by confirming its hash value. In Everyday Cryptography, Martin writes: “Hash functions can be used to provide checks against accidental changes to data and, in certain cases, deliberate manipulation of data . . . As such they are sometimes referred to as modification detection codes or manipulation detection codes” (emphasis in original, Martin, p. 188). In our opening example, a suitable hash function would have allowed you to detect that your presentation had been modified in some way without ever opening it.

So, what is a hash function? The primary practical property of a hash function is that it compresses arbitrarily long inputs into a fixed length output (Martin, p. 189; Schneier, section 2.4). Furthermore, slight differences in the input data result in large differences in the output data. “A single bit change in the pre-image [i.e. the file you’re hashing] changes, on the average, half of the bits in the hash value,” (Schneier, section 2.4). Two of the most commonly used cryptographic hash functions are known as MD5 and SHA1. Schneier quotes NIST’s description of the SHA hash function as found in the Federal Register:
The SHA is called secure because it is designed to be computationally infeasible to recover a message corresponding to a given message digest, or to find two different messages which produce the same message digest. Any change to a message in transit will, with a very high probability, result in a different message digest. (Schneier, section 18.7.)
Here’s a simple example. I have created a plain text file named hello.txt on my Desktop. The file contains a single line that reads: “Hello there.” Applying the well-known sha1 hash function to the file produces the following hash value:
4177876fcf6806ef65c4c1a1abf464087bfbf337.

If I edit the file and remove the period from the end of the line so that it reads “Hello there”, the hash function now returns an entirely different value: 33ab5639bfd8e7b95eb1d8d0b87781d4ffea4d5d.

If I then return the file to its original state by adding the period back in to the end of the sentence, the hash value of the newly edited file will be the same as the original hash. And we would have seen much the same result (though it would have taken a good bit longer to compute!) if my original file had been a copy of the complete works of Shakespeare from which I then removed a period.  
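
The same experiment can be sketched in Python with the standard library's hashlib module. This is only an illustration, not the command used to produce the values quoted above: the helper name sha1_hex is made up for this example, and the strings are hashed directly rather than read from a file, so the digests will match the quoted ones only if the file's bytes are exactly the same (a trailing newline, for instance, would change them).

```python
import hashlib

def sha1_hex(text):
    """Return the SHA-1 digest of a string as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

original = sha1_hex("Hello there.")
edited = sha1_hex("Hello there")      # period removed
restored = sha1_hex("Hello there.")   # period added back

print(original)
print(edited)
print(original == edited)    # False: a one-character edit changes the digest
print(original == restored)  # True: identical input, identical digest
```

Whatever the length of the input, the digest is always 40 hexadecimal characters long, which is the fixed-length property described above.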

Let’s consider a more practical example. The Electronic Frontier Foundation provides a number of recommendations on how to reduce your risk of malware infection in its Surveillance Self-Defense Project. At the top of their list, we read: “Currently, running a minority operating system [their examples are Linux and  MacOS -ed.] significantly diminishes the risk of infection because fewer malware applications have been targeted at these platforms. (The overwhelming majority of existing malware targets only a single particular operating system.)” This is more security through obscurity than anything else, but it’s still fun to try out new things, so after a bit of reading you decide to download a copy of the latest version of Ubuntu from an online repository.

How can you check to make sure that the file you’ve downloaded is the official one intended by Ubuntu’s developers and has not been manipulated or corrupted in transit? One way is to confirm that the file’s hash value is equivalent to the one provided by the developers. So you go to the page that lists the download’s hash value and make a note of it. Next, you run the hash function on the file you downloaded. If the resulting value is equivalent to the expected one, you have successfully verified the file’s hash.
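
The comparison step can also be scripted. What follows is a minimal sketch rather than any official verification tool: the function names hash_file and matches_published_hash are invented for this example, and the default algorithm of sha256 is an assumption, so pass whichever algorithm the download page actually lists.

```python
import hashlib
import hmac

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in fixed-size chunks so a large download need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Compare the computed digest with the value copied from the download page."""
    computed = hash_file(path, algorithm)
    # Normalize whitespace and letter case picked up when copying the value
    return hmac.compare_digest(computed, published_hex.strip().lower())
```

Calling matches_published_hash with the path to the downloaded file and the hash value you noted from the site returns True only when the two values agree.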

However, it is critical to note here that verifying a file’s hash value by itself can only establish a relatively weak form of data integrity, in comparison with more robust mechanisms such as digital signature schemes which can provide a stronger form of integrity verification and even authentication. (Martin, pp. 186-189.) This is because a hash value such as we are discussing here cannot tell us anything about the origin of a digital file. For example, assume that unbeknownst to you, the site you’ve downloaded your file from has itself been compromised, and the attacker has: 1) replaced the download file with a piece of malware, and 2) also replaced the corresponding hash value that you use to check the file’s integrity with the hash value of the malware.

If you then verify the hash value of your downloaded file, you have done nothing more than verify the integrity of the malware! And you’re none the wiser because the site itself was compromised! At the same time, however, if you found out through another source that the site and file were compromised, you could then identify the malicious file and distinguish it from the legitimate source file. In a digital signature scheme, as mentioned above, the developer could digitally sign the legitimate hash value with a trusted key. In this way, the question of trust is then displaced to the question of signature authentication.

A second concern regarding this method of determining data integrity is the security of the hash functions themselves. There are known practical and theoretical vulnerabilities in two hash functions that are among the most common in use for these exact purposes on the web today: MD5 and SHA1. A discussion of these vulnerabilities is beyond the scope of the present article, but more information can be easily found online.

Still, as Bruce Schneier states, “we cannot use [one-way hash functions] to determine with certainty that the two strings are equal, but we can use them to get a reasonable assurance of accuracy.” (Schneier, section 2.4). In other words, hash functions can help us establish a basic level of data integrity. In our opening example, simply making a note of the hash and then checking it the next day would have sufficed to establish that the file had been tampered with. But, of course, if the file had been secured or encrypted to begin with, it never would have even been an issue in the first place.

Finally, how does one actually compute the hash value of a file? It is rather simple, but the specifics depend on your choice of operating system. MacOS and Linux systems come bundled with basic functionality to check any file’s hash value, as do recent versions of Microsoft Windows, while older Windows systems require you to download a piece of software to accomplish the task. Two of the most common functions used to verify file hashes are known as MD5 and SHA1. We’ll consider each operating system in turn.

MacOS
1) Open up a command line Terminal.
2) Type “openssl md5 </path/to/file>” into the terminal and press enter.
2A) As an alternative to #2, you can also type “openssl md5 ” into the terminal, then drag and drop the target file into the Terminal window, and press enter.
3) The terminal will then return the MD5 hash value of the given file.

To compute the hash value of the file using a different hash function, type the name of that function into the terminal command in place of “md5”. For example, to compute the sha1 hash of a file, you would type: “openssl sha1 ” followed by the file path. To see a list of all the message digest commands available on your machine, type “openssl help” into the command line terminal.

Linux (Debian-based)

1) Open up a command line Terminal.
2) Type: “md5sum </path/to/file>”. Then press enter.
3) The terminal will return the MD5 hash value of the given file.

To compute the hash value of the file using a different hash function, type the appropriate command into the terminal in front of the path to the target file. For example, “sha1sum </path/to/file>” will compute the file’s sha1 hash value. To see what other hash functions are available on your system, type “man dgst” into the terminal. 

Windows
Recent Windows systems do come bundled with built-in utilities to check hash values. From the Command Prompt, type “certutil -hashfile <path\to\file> MD5” (or SHA1 in place of MD5) and press enter. In PowerShell 4.0 and later, “Get-FileHash <path\to\file> -Algorithm MD5” does the same job. For older systems, there are a number of different pieces of software you can download to accomplish the task. Microsoft Support lists the File Checksum Integrity Verifier, but warns that this is not supported by Microsoft and is only of use on Windows 2000, Windows XP and Windows Server 2003. This discussion at superuser provides a number of different extant options.
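
Since Python runs on all three platforms, the same check can also be written once and reused everywhere. The following is a rough sketch, assuming Python is installed; the function name digests_for is made up for this example, and it computes MD5 and SHA1 by default only because those are the two functions discussed above.

```python
import hashlib

def digests_for(path, algorithms=("md5", "sha1")):
    """Return a dict mapping each algorithm name to the file's hex digest."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read the file once and feed every chunk to every hasher
        for chunk in iter(lambda: handle.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}

# Algorithms that hashlib guarantees on every platform
print(sorted(hashlib.algorithms_guaranteed))
```

Passing a different tuple of names, such as ("sha256",), computes other digests in the same way.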

Video Lectures on Hash Functions
As always, comments, questions, suggestions and angry tirades are welcome below.

Python: What to do after you've finished Code Academy?

One of the most common beginner questions I see in Python programming forums is from people asking what to do once they’ve completed Code Academy’s Python training course. The possibilities are virtually limitless and may seem overwhelming at first, especially if you have little prior programming experience. It’s a bit like finding yourself in a foreign country where you know enough of the local language to just get by, but not enough to really find your way around. I certainly don’t claim to have the map, but in this post I’ll try to point out a few landmarks that might help some folks get their bearings. 

The Code Academy course covers the basics of Python syntax and data structures, and provides a quick introduction to more advanced topics like the use of list comprehensions, bitwise operators, classes, file input/output etc. Along the way, the student also completes a handful of small projects to demonstrate how this newly acquired knowledge can be put to use, for example, a pig latin translator, a Battleship game simulator and so on.

But the big question is: what next?! There is no straightforward answer to this question, as it depends on a number of highly individual variables such as your level of prior programming knowledge and experience, your interests, your goals and motivations for learning programming in general and Python in particular, not to mention the amount of time you are able to devote to study and practice, to name just a few. For the sake of simplicity, what follows is targeted to a beginner who has recently completed an introductory Python crash course such as Code Academy’s Python Track, or Learn Python the Hard Way, has little or no prior programming experience, and can devote a modest amount of time to study and practice on a regular basis. 

As a natural language instructor, I can almost always tell when a student has not done any homework or practice over a long weekend: they are already starting to get rusty! Probably the single most important thing to do after an introductory course like Code Academy’s is to reinforce the lessons learned, and to do it on a consistent basis. This will help to shore up all that newly acquired knowledge and provide a sturdier basis to extend it and expand on it. This could be anything from reading a textbook to watching a series of lectures, or following along with another tutorial, exploring other areas of the Python universe, tinkering with your own little programming projects, or some combination of these, or even all of the above. 

You’ll likely also find that these activities are themselves mutually reinforcing: while working on your own projects, you’ll realize when you’ve hit a wall and need to consult some documentation or a textbook, or seek out a new library or framework to help achieve your goal; reading through a textbook you’ll be exposed to new ideas that you can experiment with in the interpreter or in your own little projects; working through a tutorial, you might find a piece of code that interests you and which you start to tweak on your own to see how it works and to experiment with extending it or expanding on it in some way, shape or form.  

If Code Academy was your first exposure to programming in general, it might be a good idea to consider working through a general introductory textbook (or even an introductory course!) on computer science. This will provide you with a basis in the fundamentals of the discipline as a whole, things that are more or less the same across all programming languages. 

So far as introductory textbooks go, many people, myself included, highly recommend Think Python: How to Think Like a Computer Scientist, which is freely available online.   This book is required reading for a number of well known introductory computer science courses such as MIT’s Introduction to Computer Science and Programming, and was written for the express purpose of introducing students to the field. It is highly readable, provides a review of basic syntax and covers intermediate as well as more advanced topics, along with a series of chapters on object-oriented programming and design. 

Along similar lines, if you have the time to devote to it, I highly recommend MIT’s Introduction to Computer Science class. All the lectures, recitation sections and course materials are freely available online in their entirety, and the course uses Python as its pedagogical language of choice. For more information on this course, see our previous post Teach Yourself Python in Less Than Four Months, which provides a learning plan that uses the MIT course as its guide.  

Okay, but what if you are not the type who likes to curl up with a good textbook, and don’t have the time to slog through a college level introduction to computer science course, but want to delve more deeply into Python itself? What then? In this case, you might consider working through another general introductory tutorial on Python programming. This will help consolidate the knowledge you’ve already gained and also likely expose you to more beginner and intermediate level aspects of the language and the programming process. There are tons of such resources available online. Here are a few that I've found quite helpful:
"Bah," some may say, "I'm bored of mechanically typing out tutorial code! I want to experiment, but I'm not sure where to begin!" Not to worry, there's tons of stuff out there, you just have to know where to look. For those who want to jump right in to real problem solving, your first stop should be the Programming Mega Project List. This is a list of around 100 practical programming projects that can be solved in any programming language. The difficulty level of the various projects ranges from beginner to advanced and the list is broken down into basic categories such as math, algorithms, networking, text, web, and so on. Find one that interests you, tackle it and repeat.

Other people may find it rather uninteresting to solve problems for the sake of problem solving, and would rather explore Python itself. The standard library is your friend! One of the great things about programming is that it can make your life a whole lot easier. If you stick with programming, one thing you will learn rather quickly is that programmers are lazy and proud of it. If I had a dime for every time I’ve come across a talk or article on programming which proclaimed that programmers are lazy, I’d probably have like $10 by now.  I guarantee there is some absurd, repetitive task that you have to complete on a regular basis that can be automated with a relatively simple Python script. For these everyday routines, there is also very likely a standard library module that can aid you in your endeavor. Relevant xkcd:


Maybe you work in an office and have a tedious spreadsheet task you have to complete on a regular basis. Automate it with the csv module. Perhaps you’re in a band and hate writing up set lists. Write a random set list generator with the random module. Maybe you like sports or finance, and are constantly looking up scores or quotes. Write a command line app to grab them from your favorite online source using urllib without having to open a browser. If you’re a news junkie, you could consider writing your own RSS headline aggregator with urllib and one of the XML modules. The possibilities are literally limitless. 
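
To give a sense of how small such a script can be, here is one possible take on the set list idea. It is just a sketch: the function name make_set_list and the song titles are placeholders, and the optional seed is there only so that a list can be reproduced later.

```python
import random

def make_set_list(songs, length, seed=None):
    """Pick `length` distinct songs in random order."""
    # A fixed seed makes the result repeatable; None draws on system randomness
    rng = random.Random(seed)
    return rng.sample(songs, length)

repertoire = ["Song A", "Song B", "Song C", "Song D", "Song E"]  # placeholder titles
for position, title in enumerate(make_set_list(repertoire, 3), start=1):
    print(position, title)
```

Because random.sample never picks the same item twice, no song is repeated within a set.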

Last but not least, as a beginner Python programmer, you will most definitely want to begin checking out the many great frameworks that have been built around the language.  "A software framework is a universal, reusable software platform to develop software applications, products and solutions," says Wikipedia. At the most basic level, a software framework is a library or set of libraries that provide generic functionality for routine tasks to aid in the development of applications and programming projects. In the Python universe there are tons of frameworks to explore, such as web frameworks for the development of web applications, GUI frameworks for development of graphical user interfaces for desktop applications, and so on. Some of my favorites:
Well, that concludes our tour of some noteworthy landmarks in the Python programming space. As always, feel free to provide your own favorite resources or suggestions in the comments.

Unit Testing and Test-Driven Development in Python

There are both advantages and disadvantages to being self-taught in any given discipline. In certain cases, the advantages and disadvantages can overlap or even coincide. For example, when you are self-taught, you are not confined by institutional structures and courses of study. On the one hand, this allows for a distinct measure of freedom to pursue one’s own interests in the field, which would not necessarily be afforded to a person following a traditional disciplinary curriculum. On the other hand, this also means that it can be quite easy to develop gaps in one’s basic knowledge of the discipline, for the simple reason that these areas of study did not fall within your area of interest.

I discovered one such gap in my study of programming in general, and Python in particular, a number of months ago when I came across a quote online that went something like this: “Code that is not tested is broken by definition.”  Testing? “You mean running the code to see if it works?” I thought to myself. Within the next hour I had my first exposure to the method of test-driven development and the Python unittest module.

This was literally the exact opposite of how I had approached my own programming projects up until then, which might be termed “error-driven development”: write some code; run it; see if it works; if it doesn’t work, tinker at random until it does; write some more code and repeat. I quickly realized that, according to the above quote, all my code was broken, by definition. 

The test-driven development model is the reverse of this: write a test, run it and watch it fail; write some code to make the test pass; refactor; write another test and repeat. It was an enlightening experience to attempt writing even a simple program under a test-driven model, as it was immediately obvious that I had only the vaguest notions about things that I thought I knew fairly well.
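
A tiny illustration of that cycle with the unittest module might look like the following. It is a sketch under simplifying assumptions: pig_latin is a hypothetical function that handles only a single leading consonant or a leading vowel, and in a real test-first session the test class would be written and run (and seen to fail) before the function body existed.

```python
import unittest

def pig_latin(word):
    """Move one leading consonant to the end and add 'ay'; vowel-initial words get 'way'."""
    if word[0].lower() in "aeiou":
        return word + "way"
    return word[1:] + word[0] + "ay"

class PigLatinTest(unittest.TestCase):
    # Written first in a test-driven workflow; pig_latin above is the code
    # that was added afterwards to make these tests pass.
    def test_consonant_start(self):
        self.assertEqual(pig_latin("python"), "ythonpay")

    def test_vowel_start(self):
        self.assertEqual(pig_latin("apple"), "appleway")

# Run the tests programmatically and keep the result object for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PigLatinTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The next step in the cycle would be to add a failing test for a case the function does not yet handle, such as a word beginning with two consonants, and then extend the code until it passes.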

Since then, I’ve re-written a number of programs I’d created for myself under a completely test-driven developmental model, and have integrated testing into my everyday coding practice. I’ve also collected a bunch of resources that I've found helpful along the way, which you can find below. Also, as you may know, of late there has been something of a controversy brewing on the merit and value of test-driven software development. Some links on this are supplied at the end. As always, further recommendations are welcome in the comments!

Overview of Test-Driven Development (Video Lectures)

Unit Testing in Python (Video Lectures)

Python Unittest Module Docs

Python Unittest Intro Tutorials

Test Driven Development in Python

Unit Testing Today

Online Learning: An Intensive Bachelor's Level Computer Science Program Curriculum, Part II (Updated - Dec 2020)

Last month, we published a piece providing a basic template for a bachelor’s level computer science curriculum composed entirely from college or university courses that are freely available online. To date, this has been the most popular post on the blog, and we received a ton of great feedback, both positive and negative, in the comments and from around the web.

The original post was based on a learning plan that I had worked out for myself after I jumped into the study of programming and computer science just over a year ago on something of a whim. As I’ve mentioned before, I do not have any formal background in computer science beyond the handful of courses from this list that I have worked through myself. However, I do have years of experience in teaching and in curriculum design for natural and foreign language acquisition at the college level, and consulted the computer science curricula from a number of universities around the country when putting the plan together.

The idea was not to provide a substitute for an actual college or university education (that would typically also require a large amount of alcohol at the very least, which, unfortunately, is not freely available online), but rather to aggregate resources that have been made freely available online from disparate institutions and organize them into the sort of logical structure one would likely find in a general bachelor’s level computer science program.

On the basis of the feedback from that post, we’ve put together a new list of course offerings that covers a lot more ground. In the process, I’ve also loosened up a number of implicit strictures on resources for inclusion in the present listing. For example, some of these courses require registration at a particular website and/or may not yet be available in full (ex. Coursera), a couple others are actually compiled from other resources freely available online (ex. Saylor). But all of them are still free.

Whereas the first post was intended to provide a general overview of the field along with a generic curriculum and necessary resources suitable for an absolute beginner (containing 27 courses altogether), the present listing is much more extensive and intensive in scope, representing 72 courses from 30 different institutions. While we have added a number of new introductory level courses, there is a lot more that may be of interest to intermediate level folks and perhaps even some who are highly advanced and are considering a refresher course or two.

The course listing is broken down into three major divisions: Introductory Courses, Core Courses and Intermediate/Advanced Courses.  Individual courses are then listed by category within each division. 

Last but not least, thanks to everyone who provided feedback and offered suggestions on how to improve the original listing. Special thanks to Pablo Torre who provided a ton of links in the comments to the first post, many of which are included here. 


Introductory Courses 

Intro to Computer Science:
Mathematics:
Programming:
Theory of Computation:
Data Structures and Algorithms:

Core Courses 

Theory:
Algorithms and Data Structures:
Mathematics:
Operating Systems:
Computer Programming:
Software Engineering:
Computer Architecture:
Data Management:
Networking and Data Communications:
Cryptography and Security:
Artificial Intelligence:

Intermediate and Advanced Courses

Algorithms and Data Structures:
Systems:
Programming:
Software Engineering:
Mobile App Development:
Web Development:
Databases and Data Management:
Security:
Cryptography:
Artificial Intelligence and Machine Learning:
Natural Language Processing:
Digital Media:
Networking and Communications:
Statistics and Probability:
Leave any suggestions for improvements or additions in the comments!

Online Learning: A Bachelor's Level Computer Science Program Curriculum (Updated - Dec 2020)

Introduction
[Update: See also the follow-up post to this piece, An Intensive Bachelor's Level Computer Science Curriculum Program.]

A few months back we took an in-depth look at MIT’s free online Introduction to Computer Science course, and laid out a self-study time table to complete the class within four months, along with a companion post providing learning benchmarks to chart your progress. In the present article, I'll step back and take a much broader look at computer science course offerings available for free on the internet, in order to answer a deceptively straightforward question: is it possible to complete the equivalent of a college bachelor’s degree in computer science through college and university courses that are freely available online? And if so, how does one do so?

The former question is more difficult to answer than it may at first appear. There are, of course, tons of resources relating to computer science and engineering, computer programming, software engineering, etc. that can easily be found online with a few simple searches. However, despite this fact, it is very unlikely that you would find a free, basic computer science curriculum offered in one complete package from any given academic source. The reason for this is fairly obvious. Why pay $50,000 a year to go to Harvard, for example, if you could take all the exact same courses online for free?

Yet, this does not mean that all the necessary elements for such a curriculum are not freely accessible. Indeed, today there are undoubtedly more such resources available at the click of a button than any person could get through even in an entire lifetime of study.  The problem is that organizing a series of random lecture courses you find on the internet into a coherent curriculum is actually rather difficult, especially when those courses are offered by different institutions for different reasons and for considerably different programs of study, and so on. Indeed, colleges themselves require massive advisory bureaucracies to help students navigate their way through complicated degree requirements, even though those programs already form a coherent curriculum and course of study. But, still, it’s not impossible to do it yourself, with a little bit of help perhaps.

The present article will therefore attempt to sketch out a generic bachelor’s level curriculum in computer science on the basis of program requirements distilled from a number of different computer science departments at top universities from around the country.  I will then provide links to a set of specific college and university courses that are freely available online which, if taken together, would satisfy the requirements of our generic computer science curriculum.

A Hypothetical Curriculum
So, what are the requirements of our hypothetical computer science program?  Despite overarching similarities, there are actually many differences between courses of study offered at different colleges and universities, especially in computer science.  Some programs are more geared toward electrical engineering and robotics, others toward software development and programming, or toward computer architecture and hardware design, or mathematics and cryptography, or networking and applications, and on and on.  Our curriculum will attempt to integrate courses that would be common to all such programs, while also providing a selection of electives that could function as an introduction to those various concentrations. 

There are essentially four major parts to any bachelor’s level course of study, in any given field: pre-requisites, core requirements, concentration requirements and electives. 

Pre-requisites are what you need to know before you even begin. For many courses of study, there are no pre-requisites, and no specialized prior knowledge is required or presumed on the part of the student, since the introductory core requirements themselves provide students with the requisite knowledge and skills. 

Core requirements are courses that anyone in a given field is required to take, no matter what their specialization or specific areas of interest within the field may be.  These sorts of classes provide a general base-level knowledge of the field that can then be built upon in the study of more advanced and specialized topics.

Concentration requirements are classes that are required as part of a given concentration, focus or specialization within an overall curriculum.  For example, all students who major in computer science at a given university may be required to take two general introductory courses in the field, but students who decide to concentrate on cryptography may be required to take more math classes, while students interested in electrical engineering may take required courses on robotics, while others interested in software development may be required to study programming methodologies and so on.

Finally, electives are courses within the overall curriculum that individuals may decide to take at will, in accordance with their own particular interests. Some people may prefer to take electives which reinforce sub-fields related to their concentration, while others may elect to sign on for courses that may only be tangentially related to their concentration.

Our hypothetical curriculum will simplify this model. We will assume no prerequisites are necessary other than an interest in learning the material and a basic high school education.  Our curriculum will also not offer any concentration tracks in the traditional sense, as that would require specialized resources that are not within the scope of our current domain.  Instead, our planned curriculum shall provide for introductory courses, general core requirements, and a choice of electives that may also serve as a basis for further concentration studies.

Basic Requirements
A quick survey of curricular requirements for programs in computer science at a number of the country’s top colleges and universities reveals a wide spectrum of possibilities for our proposed curriculum, from a ten course minor in computer science to a twenty-five course intensive major in the field along with an interdisciplinary concentration. (See, for example, MIT, Carnegie Mellon, Berkeley, Stanford and Columbia, or the comp-sci page for a college or university near you.) 

Our proposed curriculum will attempt to stake out a space between those two poles, and aim for a program that consists of about 15 courses: 3 introductory classes, 7 core classes and 5 electives. The required topics and themes of a generic computer science degree program are fairly easy to distill from the comparison: introduction to the field, data structures, algorithms, programming languages, operating systems, networking, data communications, systems engineering, software development, and so on.  Our program will consist of university or college level courses from around the world that cover our basic requirements and are freely available in full online.

Note: I have, unfortunately, not watched every single video from all of the courses below.  However, I have completed three of them in full, viewed a handful of lectures from a number of the other courses, and spot-checked the videos from the rest for quality.


Introductory Courses 

Intro to Computer Science, pick two of three:
Basic mathematics, pick one of two:

Core Courses 

Data Structures and Algorithms, pick one of two:
Operating Systems:
Programming Languages and Methodologies:
Computer Architecture:
Networking:
Data Communications:
Cryptography and Security:

Electives

Web Development:
Data Structures:
Systems:
Programming Languages:
Security:
Cryptography:
App Development:
Artificial Intelligence:
Graphics:
Math:
Leave any suggestions for improvements or additions in the comments!

UPDATE: There has been a ton of great feedback on this post, with suggestions for additions, critiques of the overall form, identification of "glaring holes" and more.  Thanks everyone!  However, rather than address them one by one in the comments, or include them all into an update of some sort, I think I may just begin work on a new version of the piece which provides a more intensive track of study and tries to incorporate as many of those suggestions as possible, assuming that examples of such courses are available for free in full online from a college or university.  So be sure to check back in future!

UPDATE II:  See also the companion post to this piece, An Intensive Bachelor's Level Computer Science Curriculum Program.

Online Learning: Free Lecture Courses on Data Communications, Networking, Cryptography and Computer Security

I've been meaning to bring these resources together into a post for some time now.  There are a ridiculous number of free university level courses on communications, networking, cryptography and computer security available online.  Here are some of the better courses, lectures and video tutorials that I've come across over the last six months, all of which are appropriate for people who are looking for in depth introductions to these fields, or more experienced folks who would like a refresher on the fundamentals.

Lecture Series


Steve Gordon's Lecture Courses
Steve Gordon is an Associate Professor at Sirindhorn International Institute of Technology (SIIT), Thammasat University, Thailand.  On his YouTube page, you can find four complete lecture series on Security and Cryptography, IT Security, Data Communications and Networks, and Internet Technologies and Applications.

•  Introduction to Cryptography
Christof Paar, a Professor at Ruhr University, Bochum, Germany, provides an introduction to modern cryptography in this series of 24 lectures.

•  Cryptography and Network Security
Prof. D. Mukhopadhyay, from the Department of Computer Science and Engineering at the Indian Institute of Technology provides a broad introduction to Cryptography and Network security in this series of 41 lectures.  Production quality could be better, but the video lectures are substantive in nature.

•  Computer System Engineering
This undergraduate course, taught by Prof. Robert Morris and Prof. Samuel Madden from MIT, covers the basics of networking and computer security.  The first few lectures are not available.  But the units on networking and cryptography are available in full beginning with lecture 9.  

•  Fundamentals of Computer Networking 
This series contains over 30 lectures by Professor Parviz Kermani of the Department of Electrical & Computer Engineering at Manhattan College, and provides an in-depth introduction to the basics of computer networking.


Miscellaneous Video

•  Whitfield Diffie on the History of Public Key Cryptography
•  Google Tech Talks on Cryptography (Assorted lectures and seminars from the Google Tech Talk series relating to cryptography and computer security)
•  Intro to Network Scanning (Basic introduction to network scanning tools)
•  Intro to Pentesting (10 short tutorials)

50 Python Resources for Beginner and Intermediate Programmers

This is the third post in our recent series for beginning Python programmers.  In the first post, I detailed a self-study time table for beginner Python programmers.  The second post then laid out learning benchmarks for the project on the basis of MIT's Introduction to Computer Science course.  Today's installment provides a categorized list of Python resources for beginner to intermediate programmers.  Add any others you've found helpful in the comments and I'll update the list.  Enjoy!

Textbooks
Think Python: How to Think Like a Computer Scientist
The Art and Craft of Programming: Python Edition
A Byte of Python
Code Like a Pythonista: Idiomatic Python
Python Programming WikiBook
Python Style Guide
The Hitchhiker's Guide to Python
Building Skills in Python: A Programmer's Introduction to Python

Tutorial Textbooks
Learn Python the Hard Way
Dive Into Python
Hacking Secret Ciphers with Python
Invent Your Own Computer Games with Python
Making Games with Python and Pygame
A Beginner's Python Tutorial: Civilization IV

Intro Web Tutorials
Learn Python in Ten Minutes
Code Academy: Python Track
Python-Course: Intro to Python
Google Developers: Python Introduction
pGuides: Python
New Coder Python Tutorials
Tutorials Point: Python

Video
Python Video Index
43 Short, Targeted Intro Python Video Tutorials 
A Hands-on Introduction to Python for Beginning Programmers 
Python for Programmers: A Project-Based Tutorial
Google Developers' Python Class
Learn Python Through Public Data Hacking
Growing Python with Spreadsheets
Python for Hackers: Networkers Primer

Targeted Web Tutorials
How to Use the Reddit API in Python
Intro to Python Web Scraping
Python Network Programming
Sockets in Python: Into the World of Python Network Programming
Sockets Programming in Python
Python gnupg (GPG) Example

GUI Programming
An Introduction to Tkinter
Getting Started with wxPy
Creating an Application in Kivy 

Web Programming
Hacked Existence Full Django Website Tutorial Series
How to Tango with Django

Targeted Textbooks (Advanced)
Natural Language Processing with Python 
Data Structures and Algorithms with Object-Oriented Design Patterns in Python 

Reference
Python Standard Library
Python Package Index
Effbot Guide to the Python Standard Library
Python Module of the Week
Python Cheat Sheet (quick reference guide)
Ivan Idris' Almost a Hundred Python Resources

Projects and Sample Code
Karan's Python Mega Project List
Active State: Popular Python Recipes

Benchmarks: Teach Yourself Python in Less than Four Months, Part II

In the first post of this series, I developed a self-study time table for beginner Python programmers, using MIT's free online Introduction to Computer Science course as my general guide.  The present article will look more closely at the MIT course to set up learning benchmarks on the basis of the course's problem sets and quizzes.  The next post in the series will provide links to related but alternative text and video resources available for free on the web.

Before jumping in though, let's take a step back for a moment.  MIT is consistently rated one of the top universities in the world for computer science and information systems.  Its courses are challenging, to say the least. But, more importantly for the current context, we should also keep in mind that its courses are geared toward students who are completing degrees in engineering, physics, biology, chemistry and the like. This orientation is reflected in the Introduction to Computer Science class's focus on scientific computing, and in the choice of topics emphasized in its problem sets and quizzes.

Working through the course on their own, many people may find this aspect of the course intimidating or uninteresting, or simply irrelevant to their own individual learning goals.  The next post in this series will therefore provide alternative resources to supplement the course materials that may prove of interest to people whose primary focus is not on scientific computing. 

In the previous post, I worked out a time table for completion of the course which began by assuming a person would devote 10 hours of work to this project every week, and  finish the course in 15 weeks.  (You can also consult that post for alternative time lines.)  That ten hour weekly work load was broken down in the following manner:

      • Watch the lectures (2 @ 50 mins): ~2 hours
      • Textbook and background reading: ~2 hours
      • Recitation/discussion video tutorial: ~1 hour
      • Homework problems and exercises: ~2 hours
      • Free study tutorials or reading: 1-2 hours
      • Free study independent projects: 1-2 hours

With that in mind, let's set up some learning benchmarks using the course's 3 quizzes as our primary guide, with the interstitial space filled in by its 12 problem sets.  To begin, let's note when the quizzes are scheduled and what topics they cover.  Then we'll have a general sense of how much work-time is necessary to grasp those topics.

Quizzes

Quiz #1 follows the 9th lecture in the course.  You can find its topic list here.  It covers:
  • Basic computer science terms and definitions: syntax, semantics, straight line vs. branching vs. looping programs etc.
  • Basic aspects of Python: values, types, expressions, statements, control-flow, functions, scope.
  • Basic algorithmic techniques: guess and check, linear, bisection, approximation, Newton's method.
  • Binary representation of numbers
  • Debugging protocols
  • Orders of growth
Quiz #2 follows the 19th lecture in the course.  You can find its topic list here.  It covers:
  • Big O notation and orders of growth
  • Sort and search methods and algorithms
  • Python: values, types, (im)mutability, control flow, functions/methods, recursion, objects/classes, simulations
  • Basics of statistics: standard deviation, confidence, linear regression
  • Data abstraction, debugging
Quiz #3 is the course final, and follows the 26th lecture in the course.  You can find its topic list here.  It covers those topics found in the first and second quizzes, and it adds the following:
  • Call stacks, exceptions, polymorphism
  • Algorithms: divide and conquer, basing, orders of growth
  • Simulations
  • Basic statistics and computational models
  • Optimization strategies

Problem Sets

The course also has 12 problem sets.  Here we'll simply note when each is due, and what it covers:
  • 0) Due lecture 2: install python, set up IDLE, write a basic program to get user info, print out that info
  • 1) Due lecture 4: simple debt calculator, bisection search
  • 2) Due lecture 6: successive approximation and a word game, i.e. Newton's Method and Hangman
  • 3) Due lecture 7: debugging, implementing two versions of game introduced in lecture
  • 4) Due lecture 10: implementing a version of the Caesar Cipher
  • 5) Due lecture 12: implementing an RSS feed filter
  • 6) Due lecture 14: simulating a Roomba, using classes
  • 7) Due lecture 16: simulating spread of disease and virus population
  • 8) Due lecture 18: optimization, topic cont'd from previous assignment
  • 9) Due lecture 20: schedule optimization
  • 10) Due lecture 22: clustering to analyze census data
  • 11) Due lecture 24: optimization, finding most direct route between two points
Note: Detailed information on each of the problem sets can be found on the page for the lecture when that problem set is due.  So, the information for problem set 0, which is assigned in lecture 1, is actually on the page for lecture 2.

Benchmark Summary

Let's now cross-reference the quiz schedule with the problem set schedule, and estimate the number of hours necessary to complete those assignments on the basis of our time table above.  We see that:
  • quiz #1 coincides with problem set #4 and lecture 10: ~ 40-50 hours of work
  • quiz #2 coincides with problem set #9 and lecture 20: ~ 90-100 hours of work
  • quiz #3 coincides with problem set #11 and lecture 26: ~ 150 hours of work
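The hour estimates above follow from the time table: at two lectures and ten hours of work a week, each lecture stands for about five hours of total study.  Here is a minimal Python sketch of that rule of thumb.  The function name hours_by_lecture and the constants are my own labels for illustration, not anything from the MIT course.  Note that for the final, which follows lecture 26, the rule gives 130 hours; the ~150 figure above is the full 15-week budget, which presumably leaves the last couple of weeks for review.

```python
HOURS_PER_WEEK = 10    # weekly study budget from the time table
LECTURES_PER_WEEK = 2  # two 50-minute lectures a week

# Lecture that each quiz follows or coincides with, per the summary above
QUIZ_LECTURES = {"quiz #1": 10, "quiz #2": 20, "quiz #3": 26}


def hours_by_lecture(lecture_number):
    """Approximate cumulative study hours by the time a lecture is reached."""
    return lecture_number / LECTURES_PER_WEEK * HOURS_PER_WEEK


if __name__ == "__main__":
    for quiz, lecture in QUIZ_LECTURES.items():
        print(f"{quiz}: about {hours_by_lecture(lecture):.0f} hours by lecture {lecture}")
```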
In the next post in this series, I will detail alternative text and video resources that can be used to supplement the materials offered in the MIT course.  

Online Learning: Teach Yourself Python in Less Than 4 Months, Part I

The purpose of this article is to lay out a general time management template for anyone who wants to jump into programming and computer science with little or no experience in the field.  A future article will flesh out the details, providing links to learning resources and other materials freely available online.  [Edit: See the second article in the series, which covers learning benchmarks for beginner Python programmers.]

For starters, I should say up front that I do not have any formal background in Computer Science. I'm a language teacher by trade and training, and never really considered myself a "computer person."  But some time back, after I expressed some interest in programming to a programmer friend, he challenged me to try and pick up a programming language. The gist of his argument was fairly straightforward: if you can understand English, then with a bit of effort you can understand a programming language; it's just syntax and semantics.  That made it sound pretty simple, and my interest was piqued, so I set to work.

After doing a bit of background research, I decided that I would focus on the Python programming language, using MIT's Introduction to Computer Science and Programming course – all the materials for which are available for free online – as my general guide.  I finished that course within three months, supplementing it with tutorials and readings that were more in line with my own particular interests. The skills and knowledge that I acquired in that time have proven to be indispensable in my daily life, for both work and play, so much so that I wonder how it is that I was able to get by for so long without them! 

As stated above, I do not have any formal background in computer science.  However, I have over ten years of experience in planning, developing and teaching natural language learning curricula, from task-based lessons to overarching course goals, in two languages.  This article will lay out a general time-plan for self-guided study of the Python programming language for absolute beginners, using the MIT Introduction to Computer Science class as its overarching framework and scaffold.   

To begin our assessment, let's take a closer look at the MIT course. The class has 26 lectures, each about 50 minutes long, for a total of 1300 minutes, or just under 22 hours of time, less than a single day.  In theory, you could easily blow through the whole course's lecture series over a long weekend, if you did it like it was your job, or a marathon of your favorite television series on Netflix.

Obviously, that does not mean you can learn all the material covered in those lectures in a three day period.  The process of learning requires things to sink in, as it were, and that just takes time.  Furthermore, it just wouldn't make any sense to simply blow through all the lectures in this way, because we still have to account for the recitation/discussion sections associated with the course, as well as for the independent study necessary to complete homework assignments, which would be normal for any university course. 

In a serious course of study at any college or university, and even for graduate level work,  disciplined students should expect to devote around ten hours a week to study for each course they take.  Assuming a full time work week of 40 hours, this would make taking four college or university classes the labor equivalent of a full time job. 

To begin working out our time table, let's therefore assume that a person should devote 10 hours a week to this project.  A college semester is about 15 weeks long, so that comes out to 150 hours of total work to successfully complete a course like that offered by MIT.  Assuming you did nothing else except this, as if doing the work for this single course were a full time job at 40 hours a week, you could complete it within a month. This is doable, but very intensive. To finish in 3 months, you'd have to devote about 12 hours to it a week.  To finish in six months, you would have to spend about 6 hours on it a week.  To finish it in a year's time, you could spend just 3 hours of work on it a week.
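The trade-off between weekly effort and calendar time is just division, and a few lines of Python make it easy to try other durations.  The 150-hour total comes from the paragraph above; the function name hours_per_week and the 52/12 weeks-per-month figure are assumptions I've made for illustration, so the results may differ slightly from the rounded figures quoted in the text.

```python
TOTAL_HOURS = 150          # 10 hours a week over a 15-week semester
WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month


def hours_per_week(months, total_hours=TOTAL_HOURS):
    """Weekly hours needed to finish total_hours of work in the given months."""
    return total_hours / (months * WEEKS_PER_MONTH)


if __name__ == "__main__":
    for months in (1, 3, 6, 12):
        print(f"{months:>2} months: {hours_per_week(months):.1f} hours a week")
```

Swapping in a different total, or a different set of durations, is a one-line change.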

For the sake of simplicity, let's assume that we have 10 hours a week to devote to this project, taking our benchmarks and cues from the syllabus for the MIT course.  (We'll work out alternative time lines at the end of the post.)  What do we do with all this time?  The answer is deceptively simple:  watch the lectures, read, do tutorials and exercises, and begin work on your own individual programming projects.  Let's flesh this out a bit.   

With 26 lectures at 50 minutes each, that comes out to 100 minutes of lectures a week, the equivalent of the time you might spend watching a bad movie you wish you hadn't watched to begin with.  In a university course, each week you are also going to spend around another hour in your discussion/recitation section, reviewing materials covered in the corresponding lectures.  That leaves us with around 7 hours and 20 minutes of time for independent study.  How should one spend that time?  Reading, research and practice. 

Let's assume that in a given week, the professor covers more or less the same materials that can be found in the course textbook, in more or less the same amount of time that it would take you to read those sections of the text(s).  So now we have a ballpark figure of 1.5 hours to devote to reading, leaving us with just under 6 hours of time left for the week.

Doing the reading is not an end in itself; there are also homework assignments that need to be completed. In the MIT course, the homework and problem sets reinforce the lessons covered in the lectures. However, as you complete such exercises, you will find that there are things in the textbook or from the lecture that you did not understand, or you will come across a problem that requires looking into something that has not yet been covered in the lectures or readings at all, and you will therefore have to inquire into these things a bit more closely. So homework will also necessitate more reading, research and tutorials.

Let's assume that doing the homework requires about as much time as you would normally spend in class, including discussion section, around 2.5 hours.  We're now left with roughly 3.5 hours of free study time to do with as we please.  This can be spent doing more background reading, tutorials, exercises, or working on one's own little programming projects.
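The running budget in the last few paragraphs can be checked with a short Python sketch.  The minute values are the ones given in the text (100 minutes of lectures, an hour of recitation, 1.5 hours of reading, 2.5 hours of homework); the names ALLOCATIONS, remaining_after_each and as_hours are hypothetical, chosen only for this example.  Worked out exactly, the last step leaves 3 hours and 20 minutes, which the text rounds to 3.5 hours.

```python
WEEK_MINUTES = 10 * 60  # the 10-hour weekly budget from the text

# (activity, minutes), in the order the text subtracts them
ALLOCATIONS = [
    ("lectures (2 x 50 min)", 100),
    ("recitation/discussion", 60),
    ("textbook reading", 90),
    ("homework", 150),
]


def remaining_after_each(budget, allocations):
    """Return a list of (activity, minutes_left) after each deduction."""
    left = budget
    steps = []
    for activity, minutes in allocations:
        left -= minutes
        steps.append((activity, left))
    return steps


def as_hours(minutes):
    """Format a whole number of minutes as 'Hh MMm'."""
    return f"{minutes // 60}h {minutes % 60:02d}m"


if __name__ == "__main__":
    for activity, left in remaining_after_each(WEEK_MINUTES, ALLOCATIONS):
        print(f"after {activity}: {as_hours(left)} left")
```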

So here's our plan for 10 hours of work a week, to complete the course in about 15 weeks:
   • Watch the lectures (2 @ 50 mins): ~2 hours
   • Textbook and background reading: ~2 hours
   • Recitation/discussion video tutorial: ~1 hour
   • Homework problems and exercises: ~2 hours
   • Free study tutorials or reading: 1-2 hours
   • Free study independent projects: 1-2 hours

Let's break this down even further.  For each 50 minute lecture, one should do:
  • 1 hour of reading
  • 30 minutes of recitation/tutorial videos
  • 1 hour of problems or exercises
  • 1 hour of targeted external tutorials
  • 1 hour on your own little project(s)

Assuming you were to devote 90 minutes a day, six days a week, to this project, within 4 months you will have watched all the lectures from the course, read a couple books, done tens or hundreds of problems, completed a number of tutorials, done a lot of online (re)searching, and created a bunch of your own little programs, putting in 150 hours of work.

Doing 90 minutes a day, 2 days a week, it would take 50 weeks, just under a year, to complete the course.  Doing 60 minutes a day, 3 days a week is the same, of course.    

Doing 90 minutes a day, 3 days a week would take 33 weeks, about 8 months.

Doing 90 minutes a day, 5 days a week would take 20 weeks, or 5 months.

Doing 90 minutes a day, every day, would take 3.5 months. 

Doing 2 hours a day, 3 days a week would take about 6 months. 

Doing 1 hour a day, every day, would take just over 5 months. 
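If none of these schedules fits, the same arithmetic works for any combination of minutes per day and days per week.  Below is a minimal Python sketch, again assuming the 150-hour total from earlier in the post; the function name weeks_to_finish is my own label for illustration.  It reports weeks rather than months, since the month figures above are rough conversions.

```python
TOTAL_HOURS = 150  # total work for the course, as estimated earlier in the post


def weeks_to_finish(minutes_per_day, days_per_week, total_hours=TOTAL_HOURS):
    """Weeks needed to log total_hours at the given daily and weekly pace."""
    weekly_hours = minutes_per_day * days_per_week / 60
    return total_hours / weekly_hours


if __name__ == "__main__":
    # (minutes per day, days per week) for the schedules listed above
    schedules = [(90, 2), (60, 3), (90, 3), (90, 5), (90, 7), (120, 3), (60, 7)]
    for minutes, days in schedules:
        weeks = weeks_to_finish(minutes, days)
        print(f"{minutes} min/day, {days} days/week: about {weeks:.0f} weeks")
```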

In the next article in this series, I'll detail specific textbooks, video and text-based tutorials, and other assorted learning materials to help put some muscle on the skeleton framework presented in this post.  

See the second article in the series, which covers learning benchmarks for beginner Python programmers.

Coursera to Open Learning Centers Around the World in Partnership with State Department

From Businessweek:
Coursera Inc. will offer free online courses in more than 30 locations around the world, mostly in third-world countries, bringing instruction to students who lack computer access. 

Under an agreement with the State Department, courses will be available at some U.S. embassies, the Mountain View, California-based company said today. All but one of the sites are outside the U.S., including Baghdad; Port au Prince, Haiti; and Hanoi, Vietnam.
Students can take the courses, have reliable Internet access and learn from local course facilitators, Coursera said. Along with the State Department, the University of Trinidad and Tobago and Overcoming Faith Academy, an orphanage in Kenya, are among the groups hosting the space. Of the more than 5 million students who have signed up for the free courses, about 1.2 million are from emerging markets, said Yin Lu, who leads the company’s growth and international outreach efforts.

Online Learning: Three Free Introduction to Computer Science Courses

These days, with a bit of perseverance and discipline, it is entirely possible to receive a world class education in computer science for free online from the comfort of your own home.  Many of the top computer science departments at US universities make their course lectures and materials freely available on the net, providing motivated individuals with a range of choices that is almost unbelievable in its scope.  In this post, we'll take a look at three Introduction to Computer Science courses that have been made freely available online from Harvard, MIT and Stanford.  The Harvard course provides an introduction to C, PHP and JavaScript.  Stanford focuses on Java. And MIT utilizes the Python programming language.

Harvard's Intensive Introduction to Computer Science
Course site and description:
This free online computer science course is an introduction to the intellectual enterprises of computer science. Topics include algorithms (their design, implementation, and analysis); software development (abstraction, encapsulation, data structures, debugging, and testing); architecture of computers (low-level data representation and instruction processing); computer systems (programming languages, compilers, operating systems, and databases); and computers in the real world (networks, websites, security, forensics, and cryptography). The course teaches students how to think more carefully and how to solve problems more effectively. Problem sets involve extensive programming in C as well as PHP and JavaScript.
Stanford's Introduction to Computer Science and Programming Methodology
Course site and description:
This course is the largest of the introductory programming courses and is one of the largest courses at Stanford. Topics focus on the introduction to the engineering of computer applications emphasizing modern software engineering principles: object-oriented design, decomposition, encapsulation, abstraction, and testing. 
Programming Methodology teaches the widely-used Java programming language along with good software engineering principles. Emphasis is on good programming style and the built-in facilities of the Java language. The course is explicitly designed to appeal to humanists and social scientists as well as hard-core techies. In fact, most Programming Methodology graduates end up majoring outside of the School of Engineering. 
MIT's Introduction to Computer Science and Programming
Course site and description:
This subject is aimed at students with little or no programming experience. It aims to provide students with an understanding of the role computation can play in solving problems. It also aims to help students, regardless of their major, to feel justifiably confident of their ability to write small programs that allow them to accomplish useful goals. The class will use the Python programming language.  Many of the problem sets focus on specific topics, such as virus population dynamics, word games, optimizing routes, or simulating the movement of a Roomba.