CommunityData:Dataset And Tools Release 2018
In summer 2018, [[People/Nate]] is leading efforts to improve the code our research group uses to generate datasets from raw MediaWiki dumps. The end goal is to release both the code and the datasets generated from Wikia and Wikipedia wikis, and to publish a data descriptor. This page documents these efforts.
== Overview ==
# Wiki level edits: for each wiki, a table where each row corresponds to an edit.
# Wiki level edit weeks: edit data aggregated by each week.
# User level edits: for each user, a table where each row corresponds to an edit.
# User level edit weeks: user level edits aggregated by week.
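
The relationship between the edit-level tables and the "edit weeks" tables can be illustrated with a minimal sketch. This is not the group's actual build_edit_weeks code: the field names (<code>timestamp</code>, <code>editor</code>), the function names, the sample rows, and the choice of Monday as the start of a week are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(timestamp):
    """Return the Monday (as a date) of the week containing a UTC timestamp.

    Assumes timestamps look like 2018-06-04T10:00:00Z; the real tables may
    use a different format.
    """
    day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date()
    return day - timedelta(days=day.weekday())


def aggregate_edit_weeks(edits):
    """Collapse per-edit rows into one row per (week, editor) with an edit count."""
    counts = Counter()
    for edit in edits:
        counts[(week_start(edit["timestamp"]), edit["editor"])] += 1
    return [
        {"week": week.isoformat(), "editor": editor, "edits": n}
        for (week, editor), n in sorted(counts.items())
    ]


# Hypothetical edit-level rows: one row per edit.
edits = [
    {"timestamp": "2018-06-04T10:00:00Z", "editor": "Alice"},
    {"timestamp": "2018-06-06T12:30:00Z", "editor": "Alice"},
    {"timestamp": "2018-06-11T09:15:00Z", "editor": "Bob"},
]

edit_weeks = aggregate_edit_weeks(edits)
for row in edit_weeks:
    print(row)
```

Grouping by wiki instead of by editor would give the wiki level variant of the same aggregation.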

# Document the data sets with codebooks and example code.
# Document the use and development of wikiq and build_edit_weeks to support future maintainers.