# Data wrangling with dplyr
----
## **Session 6**

---

# Wrangling

Reshaping or transforming data into a format which is easier to work with
(for later visualisation, modelling, or computing of statistics).

---

# A note on "tidy" data

Tidyverse functions work best with tidy data:

1. Each variable forms a column.
2. Each observation forms a row.

(Broadly, this means long rather than wide tables.)
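
To make the long-versus-wide distinction concrete, here is a minimal sketch that reshapes a small wide table into a long one. It assumes the tidyr package is installed. The table, its organisation codes and its values are invented purely for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per year (values invented for illustration)
wide <- tibble(
  org_code = c("A01", "B02"),
  `2013`   = c(100, 50),
  `2014`   = c(110, 45)
)

# Long (tidy) form: one row per organisation and year,
# with the year and the bed count each in their own column
long <- pivot_longer(
  wide,
  cols      = c(`2013`, `2014`),
  names_to  = "year",
  values_to = "beds_av"
)

long
```

In the long form, "year" is a variable in its own column rather than being spread across column names, which is the shape that dplyr verbs handle most easily.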

---

# The tool: dplyr package
.blue[(dee-ply-r)]

dplyr is a language for data manipulation.
Most wrangling puzzles can be solved with knowledge of just 5 dplyr verbs .blue[(5 functions)].
These verbs are the subject of this session.

---

# Project 2:

## Exploring Mental Health (MH) Inpatient Capacity
--

The following is some analysis of Mental Health inpatient capacity in England.

As part of this, we will be looking at the changes in the number (and occupancy) of MH beds available.

### Background

Maintaining clinical effectiveness and safety when a
ward is fully occupied is a serious challenge for staff.

Inappropriate out-of-area placements have an added cost and also mean
patients are separated from their social support networks.

---

# The Data:

KH03 returns (bed numbers and occupancy) by organisation, published by NHS England.

Scraped from the NHSE statistics website:

https://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/bed-data-overnight/

---

# Start a new script

File > New File > R Script, or use the shortcut keys <kbd>Ctrl + Shift + N</kbd>

---

# Less friendly csvs

Note that you will have to move the cursor to another area for the number to take effect.
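
The same kind of import can also be written as code. The sketch below assumes that the "number" above is a count of leading lines to skip before the header row, which is a guess, and that the readr package is installed. The file contents are a made-up example written to a temporary file.

```r
library(readr)

# Hypothetical "less friendly" file: two title lines sit above the real header
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Bed availability and occupancy",
    "Source: example only",
    "org_code,beds_av,occ_av",
    "R1A,129,117",
    "R1C,105,82"
  ),
  path
)

# skip = 2 ignores the two title lines, so the third line is read as the header
beds <- read_csv(
  path,
  skip = 2,
  col_types = cols(
    org_code = col_character(),
    beds_av  = col_double(),
    occ_av   = col_double()
  )
)

beds
```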

---

# Look at the data

This is real data, so there are real issues (which we'll work with).

<table>
 <thead>
 <tr>
 <th style="text-align:left;"> date </th>
 <th style="text-align:left;"> org_code </th>
 <th style="text-align:left;"> org_name </th>
 <th style="text-align:right;"> beds_av </th>
 <th style="text-align:right;"> occ_av </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1A </td>
 <td style="text-align:left;"> Worcestershire Health And Care </td>
 <td style="text-align:right;"> 129 </td>
 <td style="text-align:right;"> 117 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1C </td>
 <td style="text-align:left;"> Solent </td>
 <td style="text-align:right;"> 105 </td>
 <td style="text-align:right;"> 82 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1E </td>
 <td style="text-align:left;"> Staffordshire And Stoke On Trent Partnership </td>
 <td style="text-align:right;"> NA </td>
 <td style="text-align:right;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1F </td>
 <td style="text-align:left;"> Isle Of Wight </td>
 <td style="text-align:right;"> 54 </td>
 <td style="text-align:right;"> 42 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1H </td>
 <td style="text-align:left;"> Barts Health </td>
 <td style="text-align:right;"> NA </td>
 <td style="text-align:right;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> 2013-09-01 </td>
 <td style="text-align:left;"> R1J </td>
 <td style="text-align:left;"> Gloucestershire Care Services </td>
 <td style="text-align:right;"> NA </td>
 <td style="text-align:right;"> NA </td>
 </tr>
</tbody>
</table>
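
To experiment without the full file, the six rows shown above can be typed in by hand. This is only a sketch: the object name `beds_sample` is hypothetical, and the real data set has many more rows.

```r
library(tibble)
library(dplyr)

# The six rows from the table above, entered by hand (object name is hypothetical)
beds_sample <- tribble(
  ~date,        ~org_code, ~org_name,                                      ~beds_av, ~occ_av,
  "2013-09-01", "R1A",     "Worcestershire Health And Care",               129,      117,
  "2013-09-01", "R1C",     "Solent",                                       105,      82,
  "2013-09-01", "R1E",     "Staffordshire And Stoke On Trent Partnership", NA,       NA,
  "2013-09-01", "R1F",     "Isle Of Wight",                                54,       42,
  "2013-09-01", "R1H",     "Barts Health",                                 NA,       NA,
  "2013-09-01", "R1J",     "Gloucestershire Care Services",                NA,       NA
) %>%
  mutate(date = as.Date(date))

# Column types and the first few values of each column
glimpse(beds_sample)

# One of the "real issues": count the rows with no bed figure recorded
n_missing <- sum(is.na(beds_sample$beds_av))
n_missing
```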

---

# dplyr

5 key verbs will help us gain a deeper understanding of our data sets.

Note that `summarise()` can also be spelt `summarize()`.

```r
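dplyr::arrange()    # reorder rows by the values in one or more columns
dplyr::filter()     # keep only the rows that meet a condition
dplyr::mutate()     # add new columns or change existing ones
dplyr::group_by()   # group rows so that later steps work per group
dplyr::summarise()  # collapse rows (or each group) into summary values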
```
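
As a rough illustration of how the five verbs fit together, the sketch below applies them to the six rows shown on the "Look at the data" slide. The object names (`beds_sample`, `occupancy`, `by_date`) and the derived column `occ_rate` are hypothetical, and `%>%` is the pipe that dplyr makes available for passing one step's result to the next.

```r
library(dplyr)

# Six sample rows from the KH03 extract, entered by hand (object name is hypothetical)
beds_sample <- tibble::tibble(
  date     = as.Date("2013-09-01"),
  org_code = c("R1A", "R1C", "R1E", "R1F", "R1H", "R1J"),
  beds_av  = c(129, 105, NA, 54, NA, NA),
  occ_av   = c(117, 82, NA, 42, NA, NA)
)

occupancy <- beds_sample %>%
  filter(!is.na(beds_av)) %>%             # keep organisations that reported beds
  mutate(occ_rate = occ_av / beds_av) %>% # hypothetical derived column: share of beds occupied
  arrange(desc(occ_rate))                 # highest occupancy first

by_date <- occupancy %>%
  group_by(date) %>%                      # one group per reporting date
  summarise(
    total_beds = sum(beds_av),
    total_occ  = sum(occ_av)
  )

occupancy
by_date
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained in this way.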

---

# Building with steps