{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment: Unix\n",
    "\n",
    "### Wouter Zorgdrager and Georgios Gousios"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we have seen during the lecture, the Unix command line is extremely\n",
    "versatile. This comes at a cost: you need to know which tool is suitable for the\n",
    "job or to build it. For this assignment, you will have to use a Unix command line \n",
    "(i.e. in the course's VM, a Mac or Ubuntu for Windows) \n",
    "to accomplish the indicated tasks. Not all tools you\n",
    "will need are readily available in the default installation of Xubuntu that we \n",
    "are using in the course. You need to search online (or implement!) for \n",
    "the appropriate tools and command line options.\n",
    "\n",
    "_Note_: To develop and run those commands in Jupyter, you need to install the\n",
    "[Bash kernel](https://github.com/takluyver/bash_kernel). Alternatively, you\n",
    "can create a file where you paste all programs/pipelines, print it and upload\n",
    "it to the assignment directory.\n",
    "\n",
    "There **120** points to collect. To get a 10, you need **110**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Shell programming"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q1 (10 points)** Write a pipeline that converts (recursively!) a directory\n",
    "structure full of `.wav` files to `.mp3`s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q1",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q2 (10 points)** Write a program that given a directory structure of text files, it\n",
    "prints the 10 most common words in those files (across all files).\n",
    "\n",
    "The output should look like this (example is for 3 most common items):\n",
    "\n",
    "``\n",
    "5 most_common_word\n",
    "2 less_common_word\n",
    "1 least_common_word\n",
    "``"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q2",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q3 (10 points)** Given this\n",
    "[dataset](https://drive.google.com/file/d/1HYkiUFb_NKw4BWXSOvjgtw0jnjF4eA2s/view?usp=sharing)\n",
    "(which you will also be using later on), write a program that obtains the star\n",
    "count, watcher count and number of forks for first 10 repositories from the GitHub API.\n",
    "The output must look like the following:\n",
    "\n",
    "```\n",
    "url,stars,watchers,forks\n",
    "https://api.github.com/repos/8d8d/Think.Admin,0,0,0\n",
    "https://api.github.com/repos/971638267/RetrofitAndRxjavaforRecyclerview,12,12,3\n",
    "https://api.github.com/repos/9alsacelost/greenDAO,0,0,0\n",
    "https://api.github.com/repos/a1265137718/ZoomHeader,0,0,0\n",
    "https://api.github.com/repos/12345678/NonExistingRepo,,,\n",
    "```\n",
    "\n",
    "**Note** Make sure you use this path for the csv file: `repositories.csv` otherwise auto-grading will fail.\n",
    "\n",
    "**Warning** The GitHub API requires authentication, otherwise it is limited to 60 requests per hour. Make sure you setup [an OAuth key](https://developer.github.com/v3/#oauth2-token-sent-in-a-header) and use this key for doing the requests. After the assignment deadline, you can remove it. _If you fail to submit a solution including a key, the grading will fail_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q3",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q4 (10 points)** Write a program that downloads all JPEG pictures in a web\n",
    "page. JPEG pictures are identified by the extensions `*.jpg` and `*.jpeg`.\n",
    "Your program must accept the URL to process as an argument. For example:\n",
    "\n",
    "```bash\n",
    "$ jpegdl https://www.nu.nl\n",
    "[...]\n",
    "$ ls\n",
    "zi7xo0pa6h6x_std320.jpg\n",
    "hsnxkcyat176_std320.jpg\n",
    "[...]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q4"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q5 (10 points)** All Unix systems have a\n",
    "[dictionary file](https://en.wikipedia.org/wiki/Words_(Unix)) residing under\n",
    "`/usr/share/dict/words` or `/usr/dict/words`. Use it to implement a (rudimentary)\n",
    "spellchecker. Your spellchecker should read a file named `foo.txt` and print\n",
    "a list all the words in the document to be checked that are not in the dictionary.\n",
    "An example usage session can be seen below.\n",
    "\n",
    "```bash\n",
    "$ cat foo.txt\n",
    "I am a nicelly formatted sentence,\n",
    "but I contain errors.\n",
    "\n",
    "$ cat foo.txt | spellchecker\n",
    "nicelly\n",
    "formated\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q5",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q6 (10 points)** Given [this repository](https://github.com/SERG-Delft/jpacman-framework)\n",
    "at commit [05681455d905586f940e0e00e](https://github.com/SERG-Delft/jpacman-framework/commit/05681455d905586f940e0e00ed05f34266332ae9), find the sizes for all versions of all test files (assume that all test files are under `src/test/java`). The output must look like the following:\n",
    "\n",
    "```\n",
    "blob_id blob_path size_in_bytes\n",
    "```\n",
    "\n",
    "for example:\n",
    "\n",
    "```\n",
    "da14c3975e src/test/java/nl/tudelft/jpacman/npc/ghost/NavigationTest.java 165\n",
    "1fbe0d836d src/test/java/nl/tudelft/jpacman/sprite/SpriteTest.java 108\n",
    "3bde982975 src/test/java/nl/tudelft/jpacman/sprite/SpriteTest.java 108\n",
    "6286792e43 src/test/java/nl/tudelft/jpacman/sprite/SpriteTest.java 110\n",
    "```\n",
    "\n",
    "A blob in Git represents a file version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q6",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data processing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the assignments in this section, we use the following [dataset](https://drive.google.com/file/d/1HYkiUFb_NKw4BWXSOvjgtw0jnjF4eA2s/view?usp=sharing) containing repositories.\n",
    "\n",
    "Write pipelines to calculate answers to the following questions:\n",
    "\n",
    "_Note:_ Use `\"repositories.csv\"` as filename, otherwise automatic grading will fail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q7 (10 points)** Count the number of repositories written in `Java`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q7",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q8 (10 points)** How many repositories were forked _and_ written in PHP?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q8",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q9 (10 points)** Which `owner_id` owns most repositories?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q9",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q10 (10 points)**  Which repositories are created between `01-01-2017` and `24-03-2017` (both inclusive)? Print the names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q10",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q11 (10 points)**: Print the `10` most used programming languages sorted on popularity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q11",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q12 (10 points)**: Print the `username` of the owners whose repositories are deleted.\n",
    "\n",
    "_Hint_: Use the url field."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "q12",
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}