Community Data Science Course (Spring 2016)/Day 6 Coding Challenges

From CommunityData
Revision as of 01:50, 7 May 2016 by Guyrt (talk | contribs) (added hint)

There is only 1 question for this week. I expect it to take 2 to 6 hours.

How many people edit more than one unique page in the category "Category:Cities_in_Washington_(state)"?

How many people edit only one page?

Please treat IP addresses as separate users.
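One way to think about the counting step, shown on a tiny hand-made list rather than real Wikipedia data: collect the set of distinct pages each user touched, then count users by the size of that set. The (page, user) pairs and the function name count_editors below are made up for illustration; how you get the real pairs out of Wikipedia is up to you.

```python
from collections import defaultdict


def count_editors(edits):
    """Return (users with more than one unique page, users with exactly one)."""
    pages_by_user = defaultdict(set)
    for page, user in edits:
        # A set ignores repeat edits to the same page by the same user.
        pages_by_user[user].add(page)
    multi = sum(1 for pages in pages_by_user.values() if len(pages) > 1)
    single = sum(1 for pages in pages_by_user.values() if len(pages) == 1)
    return multi, single


# Hypothetical sample data: (page title, user name or IP address).
sample_edits = [
    ("Seattle", "Alice"),
    ("Seattle", "Alice"),      # repeat edit to the same page
    ("Tacoma", "Alice"),
    ("Seattle", "192.0.2.1"),  # each IP address counts as its own user
    ("Spokane", "192.0.2.2"),
]

multi, single = count_editors(sample_edits)
print(multi, single)
```

In this sample only Alice touches two different pages; her repeat edit to the same page does not count twice, and the two IP addresses are counted as two separate one-page users.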

Hints

Many of you will want to save the output of some call to Wikipedia to a file using open("my_output.tsv", "w"). You can read the file back into Python using the code below. We will cover this in more detail on Wednesday.

# The "r" means you are opening the file to read from it, not to write to it. Be careful about the difference!
with open("my_output.tsv", "r") as file_handle:  # "with" closes the file for you when the block ends.
    for line in file_handle:
        line_clean = line.rstrip("\n")  # remove the newline char at end of line.
        line_parts = line_clean.split('\t')  # Make a list by splitting the string on tab chars.
        print(len(line_parts))  # print the number of tab-separated fields on each line.
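
For the writing side, here is a minimal sketch, assuming you already have (page, user) pairs in a list. The rows below are placeholders rather than real results, and the file goes into a temporary directory so that running the sketch does not overwrite anything of yours. Each row is written as one tab-separated line, which is the layout the reading loop above expects.

```python
import os
import tempfile

# Hypothetical rows; in the assignment these would come from Wikipedia.
rows = [("Seattle", "Alice"), ("Tacoma", "192.0.2.1")]

# Placeholder location: a fresh temp directory instead of your working folder.
path = os.path.join(tempfile.mkdtemp(), "my_output.tsv")

# "w" creates the file, or erases it first if it already exists.
with open(path, "w", encoding="utf-8") as out:
    for page, user in rows:
        out.write(page + "\t" + user + "\n")

# Read it back by splitting each line on tab characters.
with open(path, "r", encoding="utf-8") as file_handle:
    read_back = [line.rstrip("\n").split("\t") for line in file_handle]

print(read_back)
```

Saving to a file first means you only have to call Wikipedia once, and can then rerun the counting part as often as you like.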