<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Mathatistics</title>
<link>https://mathatistics.com/blog.html</link>
<atom:link href="https://mathatistics.com/blog.xml" rel="self" type="application/rss+xml"/>
<description>My personal blog where I share my experiences and my learning journey.</description>
<generator>quarto-1.2.335</generator>
<lastBuildDate>Mon, 17 Mar 2025 23:00:00 GMT</lastBuildDate>
<item>
  <title>Merging multiple datasets</title>
  <dc:creator>Raju Rimal</dc:creator>
  <link>https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/index.html</link>
  <description><![CDATA[ 




<style>
.knitsql-table {
  overflow-x: scroll;
}
</style>
<p>This post explores the fundamentals of joining tables in R using the&nbsp;<code>dplyr</code>&nbsp;package, with a focus on both core concepts and practical edge cases. You’ll learn how keys link datasets, how different join types handle unmatched rows, and strategies for tackling real-world challenges like missing values, multi-key joins, and pre-processing with intermediate variables. To bridge the gap for database users, every join example includes its SQL equivalent. Finally, a reproducible dummy dataset is provided so you can follow along and experiment.</p>
<section id="example-dataset" class="level2">
<h2 class="anchored" data-anchor-id="example-dataset">Example dataset</h2>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Employee table</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Department Table</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;"># Employees: </span></span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;">#&gt; emp_id 3 has NA dept_id; </span></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;">#&gt; emp_id 5's dept_id (4) is missing from departments</span></span>
<span id="cb1-4">employees <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tibble</span>(</span>
<span id="cb1-5">  <span class="at" style="color: #657422;">emp_id =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">2</span>, <span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">4</span>, <span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">6</span>),</span>
<span id="cb1-6">  <span class="at" style="color: #657422;">first_name =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-7">    <span class="st" style="color: #20794D;">"Alice Susanna"</span>, <span class="st" style="color: #20794D;">"Bob"</span>, <span class="st" style="color: #20794D;">"Mary Anne"</span>, </span>
<span id="cb1-8">    <span class="st" style="color: #20794D;">"Diana"</span>, <span class="st" style="color: #20794D;">"Eve"</span>, <span class="st" style="color: #20794D;">"Frank"</span></span>
<span id="cb1-9">  ),</span>
<span id="cb1-10">  <span class="at" style="color: #657422;">last_name =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-11">    <span class="st" style="color: #20794D;">"Smith"</span>, <span class="st" style="color: #20794D;">"Johnson"</span>, <span class="st" style="color: #20794D;">"Brown"</span>, </span>
<span id="cb1-12">    <span class="st" style="color: #20794D;">"Lee"</span>, <span class="st" style="color: #20794D;">"Davis"</span>, <span class="st" style="color: #20794D;">"DeVito Jr."</span></span>
<span id="cb1-13">  ),</span>
<span id="cb1-14">  <span class="at" style="color: #657422;">dept_id =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">2</span>, <span class="cn" style="color: #8f5902;">NA</span>, <span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">4</span>, <span class="dv" style="color: #AD0000;">2</span>),</span>
<span id="cb1-15">  <span class="at" style="color: #657422;">salary =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-16">    <span class="dv" style="color: #AD0000;">60000</span>, <span class="dv" style="color: #AD0000;">75000</span>, <span class="dv" style="color: #AD0000;">80000</span>, </span>
<span id="cb1-17">    <span class="dv" style="color: #AD0000;">90000</span>, <span class="dv" style="color: #AD0000;">65000</span>, <span class="dv" style="color: #AD0000;">70000</span></span>
<span id="cb1-18">  )</span>
<span id="cb1-19">)</span>
<span id="cb1-20"></span>
<span id="cb1-21"><span class="fu" style="color: #4758AB;">print</span>(employees)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 5
  emp_id first_name    last_name  dept_id salary
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;        &lt;dbl&gt;  &lt;dbl&gt;
1      1 Alice Susanna Smith            1  60000
2      2 Bob           Johnson          2  75000
3      3 Mary Anne     Brown           NA  80000
4      4 Diana         Lee              3  90000
5      5 Eve           Davis            4  65000
6      6 Frank         DeVito Jr.       2  70000</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;">#&gt; Departments: </span></span>
<span id="cb3-2"><span class="co" style="color: #5E5E5E;">#&gt; - dept_id 5 has no employees</span></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;">#&gt; - dept_id 2 appears once</span></span>
<span id="cb3-4">departments <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tibble</span>(</span>
<span id="cb3-5">  <span class="at" style="color: #657422;">dept_id =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">2</span>, <span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">5</span>),</span>
<span id="cb3-6">  <span class="at" style="color: #657422;">dept_name =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb3-7">    <span class="st" style="color: #20794D;">"HR"</span>, <span class="st" style="color: #20794D;">"Engineering"</span>, <span class="st" style="color: #20794D;">"Marketing"</span>, <span class="st" style="color: #20794D;">"Finance"</span></span>
<span id="cb3-8">  ),</span>
<span id="cb3-9">  <span class="at" style="color: #657422;">manager_first =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb3-10">    <span class="st" style="color: #20794D;">"alice"</span>, <span class="st" style="color: #20794D;">"robert"</span>, <span class="st" style="color: #20794D;">"diana"</span>, <span class="st" style="color: #20794D;">"carol"</span></span>
<span id="cb3-11">  ),</span>
<span id="cb3-12">  <span class="at" style="color: #657422;">manager_last =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb3-13">    <span class="st" style="color: #20794D;">"smith"</span>, <span class="st" style="color: #20794D;">"johnson"</span>, <span class="st" style="color: #20794D;">"lee"</span>, <span class="st" style="color: #20794D;">"taylor"</span></span>
<span id="cb3-14">  ),</span>
<span id="cb3-15">  <span class="at" style="color: #657422;">budget =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb3-16">    <span class="dv" style="color: #AD0000;">500000</span>, <span class="dv" style="color: #AD0000;">1000000</span>, <span class="dv" style="color: #AD0000;">750000</span>, <span class="dv" style="color: #AD0000;">600000</span></span>
<span id="cb3-17">  )</span>
<span id="cb3-18">)</span>
<span id="cb3-19"></span>
<span id="cb3-20"><span class="fu" style="color: #4758AB;">print</span>(departments)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 5
  dept_id dept_name   manager_first manager_last  budget
    &lt;dbl&gt; &lt;chr&gt;       &lt;chr&gt;         &lt;chr&gt;          &lt;dbl&gt;
1       1 HR          alice         smith         500000
2       2 Engineering robert        johnson      1000000
3       3 Marketing   diana         lee           750000
4       5 Finance     carol         taylor        600000</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;">library</span>(RSQLite)</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;">library</span>(DBI)</span>
<span id="cb5-3">con <span class="ot" style="color: #003B4F;">&lt;-</span> DBI<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">dbConnect</span>(<span class="fu" style="color: #4758AB;">SQLite</span>(), <span class="st" style="color: #20794D;">"database.db"</span>)</span>
<span id="cb5-4"><span class="cf" style="color: #003B4F;">if</span> (<span class="sc" style="color: #5E5E5E;">!</span>RSQLite<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">dbExistsTable</span>(con, <span class="st" style="color: #20794D;">"departments"</span>)) {</span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;">dbWriteTable</span>(con, <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"departments"</span>, <span class="at" style="color: #657422;">value =</span> departments)</span>
<span id="cb5-6">}</span>
<span id="cb5-7"><span class="cf" style="color: #003B4F;">if</span> (<span class="sc" style="color: #5E5E5E;">!</span>RSQLite<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">dbExistsTable</span>(con, <span class="st" style="color: #20794D;">"employees"</span>)) {</span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;">dbWriteTable</span>(con, <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"employees"</span>, <span class="at" style="color: #657422;">value =</span> employees)</span>
<span id="cb5-9">}</span></code></pre></div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb6-1">PRAGMA table_info(employees);</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>5 records</caption>
<thead>
<tr class="header">
<th style="text-align: left;">cid</th>
<th style="text-align: left;">name</th>
<th style="text-align: left;">type</th>
<th style="text-align: right;">notnull</th>
<th style="text-align: left;">dflt_value</th>
<th style="text-align: right;">pk</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">0</td>
<td style="text-align: left;">emp_id</td>
<td style="text-align: left;">REAL</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">1</td>
<td style="text-align: left;">first_name</td>
<td style="text-align: left;">TEXT</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2</td>
<td style="text-align: left;">last_name</td>
<td style="text-align: left;">TEXT</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">3</td>
<td style="text-align: left;">dept_id</td>
<td style="text-align: left;">REAL</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">4</td>
<td style="text-align: left;">salary</td>
<td style="text-align: left;">REAL</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb7-1">PRAGMA table_info(departments);</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>5 records</caption>
<thead>
<tr class="header">
<th style="text-align: left;">cid</th>
<th style="text-align: left;">name</th>
<th style="text-align: left;">type</th>
<th style="text-align: right;">notnull</th>
<th style="text-align: left;">dflt_value</th>
<th style="text-align: right;">pk</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">0</td>
<td style="text-align: left;">dept_id</td>
<td style="text-align: left;">REAL</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">1</td>
<td style="text-align: left;">dept_name</td>
<td style="text-align: left;">TEXT</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">2</td>
<td style="text-align: left;">manager_first</td>
<td style="text-align: left;">TEXT</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">3</td>
<td style="text-align: left;">manager_last</td>
<td style="text-align: left;">TEXT</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">4</td>
<td style="text-align: left;">budget</td>
<td style="text-align: left;">REAL</td>
<td style="text-align: right;">0</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="mutating-joins" class="level2">
<h2 class="anchored" data-anchor-id="mutating-joins">Mutating Joins</h2>
<p>Mutating joins combine columns from two tables based on matching keys, preserving rows depending on the join type.</p>
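<p>As a minimal sketch of what "matching keys" means for these tables, the snippet below checks how often each <code>dept_id</code> occurs and which key values exist in only one table. It uses reduced copies of the example tables, and the helper names (<code>dup_in_employees</code>, <code>only_in_employees</code>, and so on) are illustrative, not part of <code>dplyr</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Reduced copies of the example tables (key column plus one label column)
employees &lt;- tibble(
  emp_id  = c(1, 2, 3, 4, 5, 6),
  dept_id = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

# Key values that occur more than once in each table
dup_in_departments &lt;- departments |&gt; count(dept_id) |&gt; filter(n &gt; 1)
dup_in_employees   &lt;- employees |&gt; count(dept_id) |&gt; filter(n &gt; 1)

# Non-missing key values that are present in only one of the tables
emp_keys &lt;- employees$dept_id[!is.na(employees$dept_id)]
only_in_employees   &lt;- base::setdiff(emp_keys, departments$dept_id)
only_in_departments &lt;- base::setdiff(departments$dept_id, emp_keys)

print(dup_in_departments)
print(dup_in_employees)
print(only_in_employees)
print(only_in_departments)</code></pre></div>
</div>
<p>With this data, <code>dept_id</code> is unique in <code>departments</code> but repeated in <code>employees</code>, and each table holds one key value the other lacks. These are the cases the join types below treat differently.</p>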
<section id="inner-join-inner_join" class="level3">
<h3 class="anchored" data-anchor-id="inner-join-inner_join">Inner Join | <code>inner_join()</code></h3>
<div class="columns">
<div class="column" style="width:60%;">
<p>We can use the <code>inner_join()</code> function from <code>dplyr</code> to perform an inner join. An inner join retains <em>only rows with matching keys in both tables</em>. The output contains the columns of both tables, and rows without a match in the other table are dropped.</p>
</div><div class="column" style="width:40%;">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/inner-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">employees <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;">inner_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;">select</span>(emp_id, first_name, last_name, salary, dept_name)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 5
  emp_id first_name    last_name  salary dept_name  
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;       &lt;dbl&gt; &lt;chr&gt;      
1      1 Alice Susanna Smith       60000 HR         
2      2 Bob           Johnson     75000 Engineering
3      4 Diana         Lee         90000 Marketing  
4      6 Frank         DeVito Jr.  70000 Engineering</code></pre>
</div>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb10-1"><span class="kw" style="color: #003B4F;">SELECT</span></span>
<span id="cb10-2"> emp.emp_id,</span>
<span id="cb10-3"> emp.first_name,</span>
<span id="cb10-4"> emp.last_name,</span>
<span id="cb10-5"> emp.salary,</span>
<span id="cb10-6"> dept.dept_name </span>
<span id="cb10-7"> <span class="kw" style="color: #003B4F;">FROM</span> employees <span class="kw" style="color: #003B4F;">AS</span> emp</span>
<span id="cb10-8">  <span class="kw" style="color: #003B4F;">INNER</span> <span class="kw" style="color: #003B4F;">JOIN</span> departments <span class="kw" style="color: #003B4F;">as</span> dept</span>
<span id="cb10-9">    <span class="kw" style="color: #003B4F;">ON</span> emp.dept_id <span class="op" style="color: #5E5E5E;">=</span> dept.dept_id;</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>4 records</caption>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">salary</th>
<th style="text-align: left;">dept_name</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">60000</td>
<td style="text-align: left;">HR</td>
</tr>
<tr class="even">
<td style="text-align: right;">2</td>
<td style="text-align: left;">Bob</td>
<td style="text-align: left;">Johnson</td>
<td style="text-align: right;">75000</td>
<td style="text-align: left;">Engineering</td>
</tr>
<tr class="odd">
<td style="text-align: right;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">90000</td>
<td style="text-align: left;">Marketing</td>
</tr>
<tr class="even">
<td style="text-align: right;">6</td>
<td style="text-align: left;">Frank</td>
<td style="text-align: left;">DeVito Jr.</td>
<td style="text-align: right;">70000</td>
<td style="text-align: left;">Engineering</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here,</p>
<ul>
<li>Rows with&nbsp;<code>dept_id = NA</code>&nbsp;(Mary Anne) and&nbsp;<code>dept_id = 4</code>&nbsp;(Eve) were dropped (no match in&nbsp;<code>departments</code>).</li>
<li><code>dept_id = 5</code>&nbsp;(Finance) was dropped (no match in&nbsp;<code>employees</code>).</li>
<li><code>dept_id = 2</code>&nbsp;matches two employees (Bob and Frank), so the Engineering row from&nbsp;<code>departments</code>&nbsp;is repeated once per match (<strong>row expansion</strong>).</li>
</ul>
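<p>The points above can be checked with a few row counts. This is a small sketch on reduced copies of the example tables; the names <code>joined</code> and <code>matches_per_dept</code> are only illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Reduced copies of the example tables
employees &lt;- tibble(
  emp_id     = c(1, 2, 3, 4, 5, 6),
  first_name = c("Alice Susanna", "Bob", "Mary Anne", "Diana", "Eve", "Frank"),
  dept_id    = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

joined &lt;- employees |&gt; inner_join(departments, by = "dept_id")

# Row counts before and after the join
c(
  employees   = nrow(employees),
  departments = nrow(departments),
  joined      = nrow(joined)
)

# How many joined rows each department contributes
matches_per_dept &lt;- joined |&gt; count(dept_name)
print(matches_per_dept)</code></pre></div>
</div>
<p>The joined table has four rows, and Engineering accounts for two of them. In dplyr 1.1.0 or later, the <code>relationship</code> argument of the join functions (for example <code>relationship = "many-to-one"</code>) can turn such an expectation into an explicit check.</p>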
</section>
<section id="left-join-left_join" class="level3">
<h3 class="anchored" data-anchor-id="left-join-left_join">Left Join | <code>left_join()</code></h3>
<div class="columns">
<div class="column" style="width:60%;">
<p>A left join keeps&nbsp;<em>all rows from the left table</em>, adding matched columns from the right table. Unmatched rows from the left table get&nbsp;<code>NA</code>&nbsp;in the right-table columns. The left table is preserved entirely and the right table is <em>attached</em> where possible.</p>
</div><div class="column" style="width:40%;">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/left-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">employees <span class="sc" style="color: #5E5E5E;">|&gt;</span> </span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;">left_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>) <span class="sc" style="color: #5E5E5E;">|&gt;</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;">select</span>(<span class="fu" style="color: #4758AB;">names</span>(employees), dept_name)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 6
  emp_id first_name    last_name  dept_id salary dept_name  
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;        &lt;dbl&gt;  &lt;dbl&gt; &lt;chr&gt;      
1      1 Alice Susanna Smith            1  60000 HR         
2      2 Bob           Johnson          2  75000 Engineering
3      3 Mary Anne     Brown           NA  80000 &lt;NA&gt;       
4      4 Diana         Lee              3  90000 Marketing  
5      5 Eve           Davis            4  65000 &lt;NA&gt;       
6      6 Frank         DeVito Jr.       2  70000 Engineering</code></pre>
</div>
</div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb13-1"><span class="kw" style="color: #003B4F;">SELECT</span> </span>
<span id="cb13-2">  emp.<span class="op" style="color: #5E5E5E;">*</span>, dept.dept_name</span>
<span id="cb13-3">  <span class="kw" style="color: #003B4F;">FROM</span> employees <span class="kw" style="color: #003B4F;">AS</span> emp</span>
<span id="cb13-4">  <span class="kw" style="color: #003B4F;">LEFT</span> <span class="kw" style="color: #003B4F;">JOIN</span> departments  <span class="kw" style="color: #003B4F;">AS</span> dept</span>
<span id="cb13-5">    <span class="kw" style="color: #003B4F;">ON</span> emp.dept_id <span class="op" style="color: #5E5E5E;">=</span> dept.dept_id;</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>6 records</caption>
<thead>
<tr class="header">
<th style="text-align: left;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: right;">salary</th>
<th style="text-align: left;">dept_name</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">60000</td>
<td style="text-align: left;">HR</td>
</tr>
<tr class="even">
<td style="text-align: left;">2</td>
<td style="text-align: left;">Bob</td>
<td style="text-align: left;">Johnson</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">75000</td>
<td style="text-align: left;">Engineering</td>
</tr>
<tr class="odd">
<td style="text-align: left;">3</td>
<td style="text-align: left;">Mary Anne</td>
<td style="text-align: left;">Brown</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">80000</td>
<td style="text-align: left;">NA</td>
</tr>
<tr class="even">
<td style="text-align: left;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">90000</td>
<td style="text-align: left;">Marketing</td>
</tr>
<tr class="odd">
<td style="text-align: left;">5</td>
<td style="text-align: left;">Eve</td>
<td style="text-align: left;">Davis</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">65000</td>
<td style="text-align: left;">NA</td>
</tr>
<tr class="even">
<td style="text-align: left;">6</td>
<td style="text-align: left;">Frank</td>
<td style="text-align: left;">DeVito Jr.</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">70000</td>
<td style="text-align: left;">Engineering</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here,</p>
<ul>
<li>All employees are retained, even those with&nbsp;<code>NA</code>&nbsp;or unmatched&nbsp;<code>dept_id</code>.</li>
<li><code>dept_id = 4</code>&nbsp;(Eve) and&nbsp;<code>NA</code>&nbsp;(Mary Anne) have&nbsp;<code>NA</code>&nbsp;for department columns.</li>
</ul>
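<p>One way to list the employees that found no department is to filter the left-join result on a right-table column that is <code>NA</code>. This sketch assumes <code>dept_name</code> is never missing in <code>departments</code> itself (true for the example data), and <code>unmatched</code> is an illustrative name.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Reduced copies of the example tables
employees &lt;- tibble(
  emp_id     = c(1, 2, 3, 4, 5, 6),
  first_name = c("Alice Susanna", "Bob", "Mary Anne", "Diana", "Eve", "Frank"),
  dept_id    = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

# Left-table rows whose right-table columns were filled with NA
unmatched &lt;- employees |&gt;
  left_join(departments, by = "dept_id") |&gt;
  filter(is.na(dept_name)) |&gt;
  select(emp_id, first_name, dept_id)

print(unmatched)</code></pre></div>
</div>
<p>For the example data this returns Mary Anne (missing <code>dept_id</code>) and Eve (<code>dept_id = 4</code>, which is absent from <code>departments</code>).</p>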
</section>
<section id="right-join-right_join" class="level3">
<h3 class="anchored" data-anchor-id="right-join-right_join">Right Join | <code>right_join()</code></h3>
<div class="columns">
<div class="column" style="width:60%;">
<p>Mirroring the left join, a right join keeps&nbsp;<em>all rows from the right table</em>, adding matched columns from the left table. Unmatched rows from the right table get&nbsp;<code>NA</code>&nbsp;in the left-table columns. The right table is preserved entirely and the left table is <em>attached</em> where possible.</p>
</div><div class="column" style="width:40%;">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/right-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">employees <span class="sc" style="color: #5E5E5E;">|&gt;</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;">right_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 9
  emp_id first_name    last_name  dept_id salary dept_name   manager_first
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;        &lt;dbl&gt;  &lt;dbl&gt; &lt;chr&gt;       &lt;chr&gt;        
1      1 Alice Susanna Smith            1  60000 HR          alice        
2      2 Bob           Johnson          2  75000 Engineering robert       
3      4 Diana         Lee              3  90000 Marketing   diana        
4      6 Frank         DeVito Jr.       2  70000 Engineering robert       
5     NA &lt;NA&gt;          &lt;NA&gt;             5     NA Finance     carol        
# ℹ 2 more variables: manager_last &lt;chr&gt;, budget &lt;dbl&gt;</code></pre>
</div>
</div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb16-1"><span class="kw" style="color: #003B4F;">SELECT</span> <span class="op" style="color: #5E5E5E;">*</span></span>
<span id="cb16-2">  <span class="kw" style="color: #003B4F;">FROM</span> employees <span class="kw" style="color: #003B4F;">AS</span> emp</span>
<span id="cb16-3">  <span class="kw" style="color: #003B4F;">RIGHT</span> <span class="kw" style="color: #003B4F;">JOIN</span> departments  <span class="kw" style="color: #003B4F;">AS</span> dept</span>
<span id="cb16-4">    <span class="kw" style="color: #003B4F;">ON</span> emp.dept_id <span class="op" style="color: #5E5E5E;">=</span> dept.dept_id;</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>5 records</caption>
<colgroup>
<col style="width: 6%">
<col style="width: 13%">
<col style="width: 10%">
<col style="width: 7%">
<col style="width: 6%">
<col style="width: 7%">
<col style="width: 11%">
<col style="width: 13%">
<col style="width: 12%">
<col style="width: 7%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: right;">salary</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: left;">dept_name</th>
<th style="text-align: left;">manager_first</th>
<th style="text-align: left;">manager_last</th>
<th style="text-align: right;">budget</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">60000</td>
<td style="text-align: right;">1</td>
<td style="text-align: left;">HR</td>
<td style="text-align: left;">alice</td>
<td style="text-align: left;">smith</td>
<td style="text-align: right;">500000</td>
</tr>
<tr class="even">
<td style="text-align: right;">2</td>
<td style="text-align: left;">Bob</td>
<td style="text-align: left;">Johnson</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">75000</td>
<td style="text-align: right;">2</td>
<td style="text-align: left;">Engineering</td>
<td style="text-align: left;">robert</td>
<td style="text-align: left;">johnson</td>
<td style="text-align: right;">1000000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">90000</td>
<td style="text-align: right;">3</td>
<td style="text-align: left;">Marketing</td>
<td style="text-align: left;">diana</td>
<td style="text-align: left;">lee</td>
<td style="text-align: right;">750000</td>
</tr>
<tr class="even">
<td style="text-align: right;">6</td>
<td style="text-align: left;">Frank</td>
<td style="text-align: left;">DeVito Jr.</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">70000</td>
<td style="text-align: right;">2</td>
<td style="text-align: left;">Engineering</td>
<td style="text-align: left;">robert</td>
<td style="text-align: left;">johnson</td>
<td style="text-align: right;">1000000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">5</td>
<td style="text-align: left;">Finance</td>
<td style="text-align: left;">carol</td>
<td style="text-align: left;">taylor</td>
<td style="text-align: right;">600000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here,</p>
<ul>
<li>All departments are retained, even those with no matching&nbsp;<code>dept_id</code> in the employees table. For such departments, the employee columns (for example <code>emp_id</code>) are set to <code>NA</code>.</li>
<li><code>dept_id = 5</code> (the Finance department, managed by carol) is retained even though it does not have any matching records in the employees table.</li>
</ul>
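<p>A right join can also be read as a left join with the two tables swapped. The sketch below checks this on cut-down copies of the example tables (only the columns needed here, with values taken from the outputs above); the object names <code>right</code> and <code>left</code> are illustrative, not part of the original example.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Cut-down copies of the example tables (values taken from the outputs above)
employees &lt;- tibble(
  emp_id  = 1:6,
  dept_id = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

right &lt;- employees |&gt; right_join(departments, by = "dept_id")
left  &lt;- departments |&gt; left_join(employees, by = "dept_id")

# After aligning column and row order, the two results hold the same data
all.equal(
  right |&gt; arrange(dept_id, emp_id),
  left |&gt; select(all_of(names(right))) |&gt; arrange(dept_id, emp_id)
)</code></pre></div>
<p>Which form to use is mostly a matter of readability: many people prefer to start the pipe with the table whose rows must all be kept and use <code>left_join()</code>.</p>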
</section>
<section id="full-join-full_join" class="level3">
<h3 class="anchored" data-anchor-id="full-join-full_join">Full Join | <code>full_join()</code></h3>
<div class="columns">
<div class="column" style="width:60%;">
<p>A full join keeps&nbsp;<em>all rows from both tables</em>, filling&nbsp;<code>NA</code>&nbsp;where no match exists. The output is the union of both tables: every row is preserved, and unmatched rows get missing values in the columns that come from the other table.</p>
</div><div class="column" style="width:40%;">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/outer-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">employees <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-2">    <span class="fu" style="color: #4758AB;">full_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 7 × 9
  emp_id first_name    last_name  dept_id salary dept_name   manager_first
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;        &lt;dbl&gt;  &lt;dbl&gt; &lt;chr&gt;       &lt;chr&gt;        
1      1 Alice Susanna Smith            1  60000 HR          alice        
2      2 Bob           Johnson          2  75000 Engineering robert       
3      3 Mary Anne     Brown           NA  80000 &lt;NA&gt;        &lt;NA&gt;         
4      4 Diana         Lee              3  90000 Marketing   diana        
5      5 Eve           Davis            4  65000 &lt;NA&gt;        &lt;NA&gt;         
6      6 Frank         DeVito Jr.       2  70000 Engineering robert       
7     NA &lt;NA&gt;          &lt;NA&gt;             5     NA Finance     carol        
# ℹ 2 more variables: manager_last &lt;chr&gt;, budget &lt;dbl&gt;</code></pre>
</div>
</div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb19-1"><span class="kw" style="color: #003B4F;">SELECT</span> <span class="op" style="color: #5E5E5E;">*</span> </span>
<span id="cb19-2"><span class="kw" style="color: #003B4F;">FROM</span> employees </span>
<span id="cb19-3"><span class="kw" style="color: #003B4F;">FULL</span> <span class="kw" style="color: #003B4F;">OUTER</span> <span class="kw" style="color: #003B4F;">JOIN</span> departments </span>
<span id="cb19-4">    <span class="kw" style="color: #003B4F;">ON</span> employees.dept_id <span class="op" style="color: #5E5E5E;">=</span> departments.dept_id;</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>7 records</caption>
<colgroup>
<col style="width: 6%">
<col style="width: 13%">
<col style="width: 10%">
<col style="width: 7%">
<col style="width: 6%">
<col style="width: 7%">
<col style="width: 11%">
<col style="width: 13%">
<col style="width: 12%">
<col style="width: 7%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: right;">salary</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: left;">dept_name</th>
<th style="text-align: left;">manager_first</th>
<th style="text-align: left;">manager_last</th>
<th style="text-align: right;">budget</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">60000</td>
<td style="text-align: right;">1</td>
<td style="text-align: left;">HR</td>
<td style="text-align: left;">alice</td>
<td style="text-align: left;">smith</td>
<td style="text-align: right;">500000</td>
</tr>
<tr class="even">
<td style="text-align: right;">2</td>
<td style="text-align: left;">Bob</td>
<td style="text-align: left;">Johnson</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">75000</td>
<td style="text-align: right;">2</td>
<td style="text-align: left;">Engineering</td>
<td style="text-align: left;">robert</td>
<td style="text-align: left;">johnson</td>
<td style="text-align: right;">1000000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">3</td>
<td style="text-align: left;">Mary Anne</td>
<td style="text-align: left;">Brown</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">80000</td>
<td style="text-align: right;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">NA</td>
</tr>
<tr class="even">
<td style="text-align: right;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">90000</td>
<td style="text-align: right;">3</td>
<td style="text-align: left;">Marketing</td>
<td style="text-align: left;">diana</td>
<td style="text-align: left;">lee</td>
<td style="text-align: right;">750000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">5</td>
<td style="text-align: left;">Eve</td>
<td style="text-align: left;">Davis</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">65000</td>
<td style="text-align: right;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">NA</td>
</tr>
<tr class="even">
<td style="text-align: right;">6</td>
<td style="text-align: left;">Frank</td>
<td style="text-align: left;">DeVito Jr.</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">70000</td>
<td style="text-align: right;">2</td>
<td style="text-align: left;">Engineering</td>
<td style="text-align: left;">robert</td>
<td style="text-align: left;">johnson</td>
<td style="text-align: right;">1000000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: left;">NA</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">5</td>
<td style="text-align: left;">Finance</td>
<td style="text-align: left;">carol</td>
<td style="text-align: left;">taylor</td>
<td style="text-align: right;">600000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here,</p>
<ul>
<li>Includes all employees&nbsp;<strong>and</strong>&nbsp;all departments, even unmatched ones.</li>
<li><code>dept_id = 5</code>&nbsp;(Finance) appears with&nbsp;<code>NA</code>&nbsp;for employee columns.</li>
</ul>
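<p>After a full join it is often useful to know where each row came from. One possible approach, sketched below on cut-down copies of the example tables, is to add a marker column to each table before joining. The columns <code>in_emp</code>, <code>in_dept</code>, and <code>origin</code> are hypothetical helpers introduced for this illustration only.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Cut-down copies of the example tables (values taken from the outputs above)
employees &lt;- tibble(
  emp_id  = 1:6,
  dept_id = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

# in_emp / in_dept are hypothetical marker columns; they are NA
# in the joined result whenever a row has no partner in that table
joined &lt;- employees |&gt;
  mutate(in_emp = TRUE) |&gt;
  full_join(departments |&gt; mutate(in_dept = TRUE), by = "dept_id") |&gt;
  mutate(
    origin = case_when(
      !is.na(in_emp) &amp; !is.na(in_dept) ~ "both",
      !is.na(in_emp)                   ~ "employees only",
      TRUE                             ~ "departments only"
    )
  ) |&gt;
  select(-in_emp, -in_dept)

# Number of rows by origin
count(joined, origin)</code></pre></div>
<p>With the example data, four rows come from both tables, two only from <code>employees</code> (the <code>NA</code> key and <code>dept_id = 4</code>), and one only from <code>departments</code> (Finance).</p>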
</section>
</section>
<section id="filtering-joins" class="level2">
<h2 class="anchored" data-anchor-id="filtering-joins">Filtering Joins</h2>
<p>Filtering joins subset rows from one table based on another, without merging columns.</p>
<section id="semi-join-semi_join" class="level3">
<h3 class="anchored" data-anchor-id="semi-join-semi_join">Semi-Join | <code>semi_join()</code></h3>
<div class="columns">
<div class="column">
<p>Semi-join returns rows from the left table&nbsp;<em>that have a match in the right table</em>. The result is a subset of the left table; no columns from the right table are added. For example, we can filter the employees who belong to one of the departments in the <code>departments</code> table based on their <code>dept_id</code>.</p>
</div><div class="column">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/semi-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">employees <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;">semi_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 5
  emp_id first_name    last_name  dept_id salary
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;        &lt;dbl&gt;  &lt;dbl&gt;
1      1 Alice Susanna Smith            1  60000
2      2 Bob           Johnson          2  75000
3      4 Diana         Lee              3  90000
4      6 Frank         DeVito Jr.       2  70000</code></pre>
</div>
</div>
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb22-1"><span class="kw" style="color: #003B4F;">SELECT</span> employees.<span class="op" style="color: #5E5E5E;">*</span> </span>
<span id="cb22-2"><span class="kw" style="color: #003B4F;">FROM</span> employees </span>
<span id="cb22-3"><span class="kw" style="color: #003B4F;">WHERE</span> <span class="kw" style="color: #003B4F;">EXISTS</span> (</span>
<span id="cb22-4">  <span class="kw" style="color: #003B4F;">SELECT</span> <span class="dv" style="color: #AD0000;">1</span> <span class="kw" style="color: #003B4F;">FROM</span> departments </span>
<span id="cb22-5">  <span class="kw" style="color: #003B4F;">WHERE</span> employees.dept_id <span class="op" style="color: #5E5E5E;">=</span> departments.dept_id</span>
<span id="cb22-6">);</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>4 records</caption>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: right;">salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">60000</td>
</tr>
<tr class="even">
<td style="text-align: right;">2</td>
<td style="text-align: left;">Bob</td>
<td style="text-align: left;">Johnson</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">75000</td>
</tr>
<tr class="odd">
<td style="text-align: right;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">90000</td>
</tr>
<tr class="even">
<td style="text-align: right;">6</td>
<td style="text-align: left;">Frank</td>
<td style="text-align: left;">DeVito Jr.</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">70000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here,</p>
<ul>
<li>Keeps only employees with&nbsp;<code>dept_id</code>&nbsp;present in&nbsp;<code>departments</code>.</li>
<li>Drops&nbsp;<code>emp_id=3</code>&nbsp;(<code>NA</code>) and&nbsp;<code>emp_id=5</code>&nbsp;(unmatched&nbsp;<code>dept_id=4</code>).</li>
</ul>
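<p>For a single key, a semi-join behaves like filtering with <code>%in%</code>. The sketch below compares the two on cut-down copies of the example tables; <code>semi</code> and <code>filtered</code> are illustrative names.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Cut-down copies of the example tables (values taken from the outputs above)
employees &lt;- tibble(
  emp_id  = 1:6,
  dept_id = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id   = c(1, 2, 3, 5),
  dept_name = c("HR", "Engineering", "Marketing", "Finance")
)

semi     &lt;- employees |&gt; semi_join(departments, by = "dept_id")
filtered &lt;- employees |&gt; filter(dept_id %in% departments$dept_id)

# Both approaches keep the same employees (emp_id 1, 2, 4 and 6)
all.equal(semi, filtered)</code></pre></div>
<p>The <code>%in%</code> form is handy for quick checks, while <code>semi_join()</code> extends naturally to several key columns.</p>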
</section>
<section id="anti-join-anti_join" class="level3">
<h3 class="anchored" data-anchor-id="anti-join-anti_join">Anti-Join | <code>anti_join()</code></h3>
<div class="columns">
<div class="column" style="width:60%;">
<p>Anti-join returns rows from the left table&nbsp;<em>with no match in the right table</em>. For example, we can filter the employees who do not belong to any of the departments in the <code>departments</code> table.</p>
</div><div class="column" style="width:40%;">
<p><img src="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/anti-join.svg" class="img-fluid" style="width:90.0%"></p>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">employees <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb23-2">  <span class="fu" style="color: #4758AB;">anti_join</span>(departments, <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"dept_id"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
  emp_id first_name last_name dept_id salary
   &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;       &lt;dbl&gt;  &lt;dbl&gt;
1      3 Mary Anne  Brown          NA  80000
2      5 Eve        Davis           4  65000</code></pre>
</div>
</div>
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb25-1"><span class="kw" style="color: #003B4F;">SELECT</span> employees.<span class="op" style="color: #5E5E5E;">*</span> </span>
<span id="cb25-2"><span class="kw" style="color: #003B4F;">FROM</span> employees </span>
<span id="cb25-3"><span class="kw" style="color: #003B4F;">WHERE</span> <span class="kw" style="color: #003B4F;">NOT</span> <span class="kw" style="color: #003B4F;">EXISTS</span> (</span>
<span id="cb25-4">  <span class="kw" style="color: #003B4F;">SELECT</span> <span class="dv" style="color: #AD0000;">1</span> <span class="kw" style="color: #003B4F;">FROM</span> departments </span>
<span id="cb25-5">  <span class="kw" style="color: #003B4F;">WHERE</span> employees.dept_id <span class="op" style="color: #5E5E5E;">=</span> departments.dept_id</span>
<span id="cb25-6">);</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>2 records</caption>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">dept_id</th>
<th style="text-align: right;">salary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">3</td>
<td style="text-align: left;">Mary Anne</td>
<td style="text-align: left;">Brown</td>
<td style="text-align: right;">NA</td>
<td style="text-align: right;">80000</td>
</tr>
<tr class="even">
<td style="text-align: right;">5</td>
<td style="text-align: left;">Eve</td>
<td style="text-align: left;">Davis</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">65000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Here, only employees whose&nbsp;<code>dept_id</code>&nbsp;is not in&nbsp;<code>departments</code>&nbsp;(including&nbsp;<code>NA</code>) are kept.</p>
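<p>The <code>NA</code> key deserves some care: by default, dplyr joins treat two <code>NA</code> keys as equal, whereas SQL never matches <code>NULL</code> to <code>NULL</code>. The sketch below illustrates the difference with the <code>na_matches</code> argument. The table <code>departments_na</code> and its "Unassigned" row are hypothetical and added only to make the effect visible.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Cut-down copy of the employees table (values taken from the outputs above)
employees &lt;- tibble(
  emp_id  = 1:6,
  dept_id = c(1, 2, NA, 3, 4, 2)
)

# Hypothetical departments table with an extra row whose key is missing
departments_na &lt;- tibble(
  dept_id   = c(1, 2, 3, 5, NA),
  dept_name = c("HR", "Engineering", "Marketing", "Finance", "Unassigned")
)

# Default (na_matches = "na"): the NA key of employee 3 matches the NA department
default_match &lt;- employees |&gt;
  anti_join(departments_na, by = "dept_id")

# na_matches = "never": NA keys never match, similar to SQL semantics
never_match &lt;- employees |&gt;
  anti_join(departments_na, by = "dept_id", na_matches = "never")

default_match$emp_id
never_match$emp_id</code></pre></div>
<p>With the default, only employee 5 is returned; with <code>na_matches = "never"</code>, employees 3 and 5 are both returned, which mirrors the behaviour of the <code>NOT EXISTS</code> query.</p>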
</section>
</section>
<section id="advanced-cases-preprocessing-and-multiple-keys" class="level2">
<h2 class="anchored" data-anchor-id="advanced-cases-preprocessing-and-multiple-keys">Advanced Cases: Preprocessing and Multiple Keys</h2>
<p>Suppose we need to find the employees, along with their departments, who are also the manager of their department. Here we have two issues to deal with:</p>
<ol type="a">
<li><p>We need multiple keys (fields/variables) to join on: <code>dept_id</code>, which is in both tables, together with <code>first_name</code>/<code>last_name</code> from the employees table and <code>manager_first</code>/<code>manager_last</code> from the departments table.</p>
<blockquote class="blockquote">
<p>To solve this, we can join the tables on multiple keys, using either a semi-join or an inner join of the employees table with the departments table.</p>
</blockquote></li>
<li><p>The <code>first_name</code> and <code>last_name</code> fields in the employees table contain uppercase characters, and <code>first_name</code> may also include a middle name, while <code>manager_first</code> and <code>manager_last</code> in the departments table are all lowercase. We need to preprocess them before using them as joining keys.</p>
<blockquote class="blockquote">
<p>To solve this, we take the first word of <code>first_name</code> and <code>manager_first</code> and the last word of <code>last_name</code> and <code>manager_last</code>, from the employees and departments tables respectively, and convert them to lowercase.</p>
</blockquote></li>
</ol>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-9-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-1" aria-controls="tabset-9-1" aria-selected="true">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-9-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-2" aria-controls="tabset-9-2" aria-selected="false">SQL</a></li></ul>
<div class="tab-content">
<div id="tabset-9-1" class="tab-pane active" aria-labelledby="tabset-9-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">employees <span class="sc" style="color: #5E5E5E;">|&gt;</span> </span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb26-3">    <span class="at" style="color: #657422;">fname =</span> stringr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">word</span>(first_name, <span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">tolower</span>(),</span>
<span id="cb26-4">    <span class="at" style="color: #657422;">lname =</span> stringr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">word</span>(last_name, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">tolower</span>()</span>
<span id="cb26-5">  ) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">inner_join</span>(</span>
<span id="cb26-6">    departments <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb26-7">      <span class="at" style="color: #657422;">fname =</span> stringr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">word</span>(manager_first, <span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">tolower</span>(),</span>
<span id="cb26-8">      <span class="at" style="color: #657422;">lname =</span> stringr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">word</span>(manager_last, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">tolower</span>()</span>
<span id="cb26-9">    ) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">select</span>(dept_name, fname, lname, dept_id),</span>
<span id="cb26-10">    <span class="at" style="color: #657422;">by =</span> <span class="fu" style="color: #4758AB;">join_by</span>(<span class="st" style="color: #20794D;">'fname'</span>, <span class="st" style="color: #20794D;">'lname'</span>, <span class="st" style="color: #20794D;">'dept_id'</span>)</span>
<span id="cb26-11">  ) <span class="sc" style="color: #5E5E5E;">|&gt;</span> <span class="fu" style="color: #4758AB;">select</span>(<span class="sc" style="color: #5E5E5E;">-</span>dept_id, <span class="sc" style="color: #5E5E5E;">-</span>fname, <span class="sc" style="color: #5E5E5E;">-</span>lname)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
  emp_id first_name    last_name salary dept_name
   &lt;dbl&gt; &lt;chr&gt;         &lt;chr&gt;      &lt;dbl&gt; &lt;chr&gt;    
1      1 Alice Susanna Smith      60000 HR       
2      4 Diana         Lee        90000 Marketing</code></pre>
</div>
</div>
</div>
<div id="tabset-9-2" class="tab-pane" aria-labelledby="tabset-9-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb28-1"><span class="kw" style="color: #003B4F;">SELECT</span> </span>
<span id="cb28-2">  emp.emp_id, emp.first_name, emp.last_name, </span>
<span id="cb28-3">  emp.salary, dept.dept_name</span>
<span id="cb28-4"><span class="kw" style="color: #003B4F;">FROM</span> employees emp</span>
<span id="cb28-5"><span class="kw" style="color: #003B4F;">JOIN</span> departments dept <span class="kw" style="color: #003B4F;">ON</span> emp.dept_id <span class="op" style="color: #5E5E5E;">=</span> dept.dept_id</span>
<span id="cb28-6"><span class="kw" style="color: #003B4F;">WHERE</span> </span>
<span id="cb28-7">    <span class="co" style="color: #5E5E5E;">-- First name: match first word</span></span>
<span id="cb28-8">    <span class="fu" style="color: #4758AB;">LOWER</span>(<span class="fu" style="color: #4758AB;">SUBSTR</span>(</span>
<span id="cb28-9">      emp.first_name, <span class="dv" style="color: #AD0000;">1</span>, </span>
<span id="cb28-10">      <span class="fu" style="color: #4758AB;">INSTR</span>(emp.first_name <span class="op" style="color: #5E5E5E;">||</span> <span class="st" style="color: #20794D;">' '</span>, <span class="st" style="color: #20794D;">' '</span>) <span class="op" style="color: #5E5E5E;">-</span> <span class="dv" style="color: #AD0000;">1</span></span>
<span id="cb28-11">    )) <span class="op" style="color: #5E5E5E;">=</span> </span>
<span id="cb28-12">    <span class="fu" style="color: #4758AB;">LOWER</span>(<span class="fu" style="color: #4758AB;">SUBSTR</span>(</span>
<span id="cb28-13">      dept.manager_first, <span class="dv" style="color: #AD0000;">1</span>, </span>
<span id="cb28-14">      <span class="fu" style="color: #4758AB;">INSTR</span>(dept.manager_first <span class="op" style="color: #5E5E5E;">||</span> <span class="st" style="color: #20794D;">' '</span>, <span class="st" style="color: #20794D;">' '</span>) <span class="op" style="color: #5E5E5E;">-</span> <span class="dv" style="color: #AD0000;">1</span></span>
<span id="cb28-15">    ))</span>
<span id="cb28-16">    </span>
<span id="cb28-17">    <span class="co" style="color: #5E5E5E;">-- Last name: match last word (RTRIM/REPLACE idiom, assuming SQLite semantics)</span></span>
<span id="cb28-18">    <span class="kw" style="color: #003B4F;">AND</span> </span>
<span id="cb28-19">    <span class="fu" style="color: #4758AB;">LOWER</span>(<span class="fu" style="color: #4758AB;">REPLACE</span>(</span>
<span id="cb28-20">      emp.last_name, </span>
<span id="cb28-21">      <span class="fu" style="color: #4758AB;">RTRIM</span>(emp.last_name, <span class="fu" style="color: #4758AB;">REPLACE</span>(emp.last_name, <span class="st" style="color: #20794D;">' '</span>, <span class="st" style="color: #20794D;">''</span>)), <span class="st" style="color: #20794D;">''</span></span>
<span id="cb28-22">    )) <span class="op" style="color: #5E5E5E;">=</span> </span>
<span id="cb28-23">    <span class="fu" style="color: #4758AB;">LOWER</span>(<span class="fu" style="color: #4758AB;">REPLACE</span>(</span>
<span id="cb28-24">      dept.manager_last, </span>
<span id="cb28-25">      <span class="fu" style="color: #4758AB;">RTRIM</span>(dept.manager_last, <span class="fu" style="color: #4758AB;">REPLACE</span>(dept.manager_last, <span class="st" style="color: #20794D;">' '</span>, <span class="st" style="color: #20794D;">''</span>)), <span class="st" style="color: #20794D;">''</span></span>
<span id="cb28-26">    ));</span></code></pre></div>
<div class="knitsql-table">
<table class="table table-sm table-striped">
<caption>2 records</caption>
<thead>
<tr class="header">
<th style="text-align: right;">emp_id</th>
<th style="text-align: left;">first_name</th>
<th style="text-align: left;">last_name</th>
<th style="text-align: right;">salary</th>
<th style="text-align: left;">dept_name</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;">1</td>
<td style="text-align: left;">Alice Susanna</td>
<td style="text-align: left;">Smith</td>
<td style="text-align: right;">60000</td>
<td style="text-align: left;">HR</td>
</tr>
<tr class="even">
<td style="text-align: right;">4</td>
<td style="text-align: left;">Diana</td>
<td style="text-align: left;">Lee</td>
<td style="text-align: right;">90000</td>
<td style="text-align: left;">Marketing</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<p>Since the&nbsp;<code>employees</code>&nbsp;table uses&nbsp;<code>first_name</code>/<code>last_name</code>&nbsp;while&nbsp;<code>departments</code>&nbsp;uses&nbsp;<code>manager_first</code>/<code>manager_last</code>, I standardized the keys by:</p>
<ol type="1">
<li>Extracting the&nbsp;<strong>first word</strong>&nbsp;from&nbsp;<code>first_name</code>&nbsp;and the&nbsp;<strong>last word</strong>&nbsp;from&nbsp;<code>last_name</code>&nbsp;(to handle middle names/suffixes).</li>
<li>Converting them to lowercase for case-insensitive matching.</li>
</ol>
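<p>One way to sanity-check such a join is to look at what did <em>not</em> match. The sketch below, on cut-down copies of the example tables, builds the same standardized keys and uses <code>anti_join()</code> to list the departments whose manager was not found among that department's employees. The names <code>emp_keys</code>, <code>dept_keys</code>, and <code>unmatched</code> are illustrative.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(stringr)

# Cut-down copies of the example tables (values taken from the outputs above)
employees &lt;- tibble(
  first_name = c("Alice Susanna", "Bob", "Mary Anne", "Diana", "Eve", "Frank"),
  last_name  = c("Smith", "Johnson", "Brown", "Lee", "Davis", "DeVito Jr."),
  dept_id    = c(1, 2, NA, 3, 4, 2)
)
departments &lt;- tibble(
  dept_id       = c(1, 2, 3, 5),
  dept_name     = c("HR", "Engineering", "Marketing", "Finance"),
  manager_first = c("alice", "robert", "diana", "carol"),
  manager_last  = c("smith", "johnson", "lee", "taylor")
)

# Standardized keys: lowercased first word / last word of the name fields
emp_keys &lt;- employees |&gt;
  mutate(
    fname = tolower(word(first_name, 1)),
    lname = tolower(word(last_name, -1))
  )
dept_keys &lt;- departments |&gt;
  mutate(
    fname = tolower(word(manager_first, 1)),
    lname = tolower(word(manager_last, -1))
  )

# Departments whose manager has no matching employee record
unmatched &lt;- dept_keys |&gt;
  anti_join(emp_keys, by = c("fname", "lname", "dept_id")) |&gt;
  select(dept_name, manager_first, manager_last)

unmatched</code></pre></div>
<p>Here Engineering and Finance are listed: "robert" does not match "Bob" after standardization, and Finance has no employees at all. Rows like these are a prompt to decide whether the keys need further cleaning (for example, a nickname lookup) or whether the mismatch is genuine.</p>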
<p>Also, check out a post where I have used all these concepts on a real dataset.</p>
<p>Joins are essential for combining data in R. With&nbsp;<code>dplyr</code>, you can handle mismatched keys, missing values, and multi-column joins: just clean your keys first and verify results with&nbsp;<code>anti_join()</code>. Try these techniques on your own data, or explore more examples&nbsp;here.</p>


</section>

 ]]></description>
  <guid>https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/index.html</guid>
  <pubDate>Mon, 17 Mar 2025 23:00:00 GMT</pubDate>
  <media:content url="https://mathatistics.com/blog/posts/2025-04-28-join-with-dplyr/images/inner-join.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Melanoma Incidence in Nordic Countries</title>
  <dc:creator>Raju Rimal</dc:creator>
  <link>https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index.html</link>
  <description><![CDATA[ 



<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Cutaneous melanoma (CM) is the most aggressive and lethal form of skin cancer. Melanoma can be cured if caught and treated early, but if left untreated it may spread to other parts of the body and can be fatal. In recent years, melanoma incidence has increased dramatically in fair-skinned populations worldwide, including in Nordic countries such as Norway, Denmark, and Sweden. Norway is ranked fifth in incidence and third in mortality worldwide. This increase may be an effect of increased awareness among the general public and health care providers.</p>
<p>This article explores <strong><em>melanoma incidence</em></strong> and <strong><em>mortality</em></strong> in Nordic countries by <strong><em>sex</em></strong> and their <strong><em>trends</em></strong> over the 40-year period from 1980 to 2020. Further, I step through the analysis process, from data collection to creating plots and tables.</p>
</section>
<section id="data-preparation" class="level2">
<h2 class="anchored" data-anchor-id="data-preparation">Data Preparation</h2>
<p>Data on melanoma were obtained from NORDCAN <span class="citation" data-cites="nordcan-2023">Engholm et al. (2010)</span>, Association of the Nordic Cancer Registries, IARC. Using the NORDCAN 2.0 API<sup>1</sup>, crude and age-adjusted rates were downloaded as JSON and converted to tabular data for further analysis. R <span class="citation" data-cites="r_core_team_r_2020">(R Core Team 2020)</span> was used for data gathering, cleanup, analysis, and plotting.</p>
<p>The API endpoint has four placeholders: <code>type</code>, <code>sex</code>, <code>country</code>, and <code>cancer</code>. The following design was used to create the individual endpoints. These individual URLs are used to download the JSON files as lists in R.</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Data API</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Sample JSON</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false">Data download</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false">Data Preparation</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="data-api tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<details>
<summary>Code for preparing download URL</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">design_map <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">list</span>(</span>
<span id="cb1-2">  <span class="at" style="color: #657422;">sex =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="at" style="color: #657422;">Male =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">Female =</span> <span class="dv" style="color: #AD0000;">2</span>),</span>
<span id="cb1-3">  <span class="at" style="color: #657422;">type =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="at" style="color: #657422;">Incidence =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">Mortality =</span> <span class="dv" style="color: #AD0000;">1</span>),</span>
<span id="cb1-4">  <span class="at" style="color: #657422;">cancer =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="at" style="color: #657422;">Melanoma =</span> <span class="dv" style="color: #AD0000;">290</span>),</span>
<span id="cb1-5">  <span class="at" style="color: #657422;">country =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-6">    <span class="at" style="color: #657422;">Denmark =</span> <span class="dv" style="color: #AD0000;">208</span>, <span class="at" style="color: #657422;">Finland =</span> <span class="dv" style="color: #AD0000;">246</span>, <span class="at" style="color: #657422;">Iceland =</span> <span class="dv" style="color: #AD0000;">352</span>,</span>
<span id="cb1-7">    <span class="at" style="color: #657422;">Norway =</span> <span class="dv" style="color: #AD0000;">578</span>, <span class="at" style="color: #657422;">Sweden =</span> <span class="dv" style="color: #AD0000;">752</span></span>
<span id="cb1-8">  )</span>
<span id="cb1-9">)</span>
<span id="cb1-10">label_values <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="cf" style="color: #003B4F;">function</span>(data, label_map, var) {</span>
<span id="cb1-11">  label_vec <span class="ot" style="color: #003B4F;">&lt;-</span> label_map[[var]]</span>
<span id="cb1-12">  label <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="st" style="color: #20794D;">`</span><span class="at" style="color: #657422;">names&lt;-</span><span class="st" style="color: #20794D;">`</span>(<span class="fu" style="color: #4758AB;">names</span>(label_vec), label_vec)</span>
<span id="cb1-13">  data[[var]] <span class="ot" style="color: #003B4F;">&lt;-</span> data[, label[<span class="fu" style="color: #4758AB;">as.character</span>(<span class="fu" style="color: #4758AB;">get</span>(var))]]</span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;">return</span>(data)</span>
<span id="cb1-15">}</span>
<span id="cb1-16"></span>
<span id="cb1-17">design <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">do.call</span>(crossing, design_map) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-18">  <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb1-19">    <span class="at" style="color: #657422;">url =</span> glue<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">glue</span>( </span>
<span id="cb1-20">      <span class="st" style="color: #20794D;">"https://gco.iarc.fr/gateway_prod/api/nordcan/v2/92/data/population/{type}/{sex}/({country})/({cancer})/?ages_group=5_17&amp;year_start=1980&amp;year_end=2020&amp;year_grouped=0"</span></span>
<span id="cb1-21">    )</span>
<span id="cb1-22">  )</span>
<span id="cb1-23"></span>
<span id="cb1-24">design <span class="ot" style="color: #003B4F;">&lt;-</span> design <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-25">  <span class="fu" style="color: #4758AB;">label_values</span>(design_map, <span class="st" style="color: #20794D;">"sex"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-26">  <span class="fu" style="color: #4758AB;">label_values</span>(design_map, <span class="st" style="color: #20794D;">"type"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-27">  <span class="fu" style="color: #4758AB;">label_values</span>(design_map, <span class="st" style="color: #20794D;">"cancer"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-28">  <span class="fu" style="color: #4758AB;">label_values</span>(design_map, <span class="st" style="color: #20794D;">"country"</span>)</span></code></pre></div>
</details>
</div>
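<p>The lookup inside <code>label_values()</code> relies on a named vector that maps each numeric code back to its label. The following is a minimal base-R sketch of that idea, using the country codes from <code>design_map</code>; the object names <code>country_code</code> and <code>code_to_label</code> are illustrative and not part of the original code.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Labels are the names and API codes are the values, as in design_map
country_code &lt;- c(Denmark = 208, Finland = 246, Iceland = 352, Norway = 578, Sweden = 752)

# Swap names and values so that a code (as character) indexes its label
code_to_label &lt;- setNames(names(country_code), country_code)

# Translate a vector of codes into labels
unname(code_to_label[as.character(c(578, 246))])</code></pre></div>
<p>Here the codes 578 and 246 are translated to <code>"Norway"</code> and <code>"Finland"</code>.</p>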
<dl>
<dt>API URL:</dt>
<dd>
<code>https://gco.iarc.fr/gateway_prod/api/nordcan/v2/92/data/population/{type}/{sex}/({country})/({cancer})/?ages_group=5_17&amp;year_start=1980&amp;year_end=2020&amp;year_grouped=0</code>
</dd>
</dl>
<p>Replacing the placeholders <code>type</code> (Incidence: 0, Mortality: 1), <code>sex</code> (Male: 1, Female: 2), <code>country</code> (Denmark: 208, Finland: 246, Iceland: 352, Norway: 578, and Sweden: 752), and <code>cancer</code> (Melanoma: 290) with their codes gives the URL for each data request.</p>
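<p>As a small illustration of the substitution, the sketch below fills the template for a single request and counts how many requests the full design produces. It assumes the <code>glue</code> package is installed; <code>template</code>, <code>url_norway</code>, and <code>grid</code> are illustrative names.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(glue)

# URL template with the four placeholders described above
template &lt;- paste0(
  "https://gco.iarc.fr/gateway_prod/api/nordcan/v2/92/data/population/",
  "{type}/{sex}/({country})/({cancer})/",
  "?ages_group=5_17&amp;year_start=1980&amp;year_end=2020&amp;year_grouped=0"
)

# Incidence (0) of melanoma (290) among males (1) in Norway (578)
url_norway &lt;- glue(template, type = 0, sex = 1, country = 578, cancer = 290)
url_norway

# All combinations: 2 types x 2 sexes x 5 countries x 1 cancer
grid &lt;- expand.grid(
  type = c(0, 1), sex = c(1, 2),
  country = c(208, 246, 352, 578, 752), cancer = 290
)
nrow(grid)</code></pre></div>
<p>The design has 20 endpoints. With one record per year from 1980 to 2020 (41 years), this accounts for the 820 rows of the rate table printed under the Data Preparation tab.</p>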
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" data-startfrom="113" data-source-offset="-0" style="background: #f1f3f5;"><pre class="sourceCode js code-with-copy"><code class="sourceCode javascript" style="counter-reset: source-line 112;"><span id="cb2-113">json_data <span class="op" style="color: #5E5E5E;">=</span> <span class="cf" style="color: #003B4F;">await</span> <span class="fu" style="color: #4758AB;">FileAttachment</span>(<span class="st" style="color: #20794D;">"Data/nordcan.json"</span>)<span class="op" style="color: #5E5E5E;">.</span><span class="fu" style="color: #4758AB;">json</span>()</span>
<span id="cb2-114">json_data[<span class="dv" style="color: #AD0000;">0</span>]</span></code></pre></div>
</details>
<div class="cell-output cell-output-display">
<div>
<div id="ojs-cell-1-1" data-nodetype="declaration">

</div>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<div id="ojs-cell-1-2" data-nodetype="expression">

</div>
</div>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<details>
<summary>Code for data download</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="cf" style="color: #003B4F;">if</span> (<span class="sc" style="color: #5E5E5E;">!</span><span class="fu" style="color: #4758AB;">file.exists</span>(<span class="st" style="color: #20794D;">"Data/nordcan-json.Rds"</span>)) {</span>
<span id="cb3-2">  design[, data <span class="sc" style="color: #5E5E5E;">:</span><span class="er" style="color: #AD0000;">=</span> <span class="fu" style="color: #4758AB;">map</span>(url, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">read_json</span>(.x) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> purrr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">pluck</span>(<span class="st" style="color: #20794D;">"dataset"</span>))]</span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;">saveRDS</span>(design, <span class="at" style="color: #657422;">file =</span> <span class="st" style="color: #20794D;">"Data/nordcan-json.Rds"</span>)</span>
<span id="cb3-4">} <span class="cf" style="color: #003B4F;">else</span> {</span>
<span id="cb3-5">  design <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">readRDS</span>(<span class="st" style="color: #20794D;">"Data/nordcan-json.Rds"</span>)</span>
<span id="cb3-6">}</span>
<span id="cb3-7">design <span class="ot" style="color: #003B4F;">&lt;-</span> design <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-8">  tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">select</span>(sex, type, country, data)</span></code></pre></div>
</details>
</div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<div class="cell">
<details>
<summary>Rate data frame</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">rate_df <span class="ot" style="color: #003B4F;">&lt;-</span> design[, <span class="fu" style="color: #4758AB;">map_df</span>(data, <span class="cf" style="color: #003B4F;">function</span>(dta) {</span>
<span id="cb4-2">  out <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">data.table</span>(</span>
<span id="cb4-3">    <span class="at" style="color: #657422;">year =</span> <span class="fu" style="color: #4758AB;">map_int</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"year"</span>, .x)),</span>
<span id="cb4-4">    <span class="at" style="color: #657422;">asr_w =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"asr"</span>, .x)),</span>
<span id="cb4-5">    <span class="at" style="color: #657422;">asr_e =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"asr_e"</span>, .x)),</span>
<span id="cb4-6">    <span class="at" style="color: #657422;">asr_n =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"asr_n"</span>, .x)),</span>
<span id="cb4-7">    <span class="at" style="color: #657422;">crude_rate =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"crude_rate"</span>, .x)),</span>
<span id="cb4-8">    <span class="at" style="color: #657422;">count =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"total"</span>, .x)),</span>
<span id="cb4-9">    <span class="at" style="color: #657422;">population =</span> <span class="fu" style="color: #4758AB;">map_dbl</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"total_pop"</span>, .x)),</span>
<span id="cb4-10">    <span class="at" style="color: #657422;">cum_risk =</span> <span class="fu" style="color: #4758AB;">map</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"cum_risk"</span>, .x)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-11">      <span class="fu" style="color: #4758AB;">unlist</span>()</span>
<span id="cb4-12">  )</span>
<span id="cb4-13">  <span class="cf" style="color: #003B4F;">if</span> (<span class="st" style="color: #20794D;">"cum_risk"</span> <span class="sc" style="color: #5E5E5E;">%in%</span> <span class="fu" style="color: #4758AB;">names</span>(out)) {</span>
<span id="cb4-14">    out <span class="ot" style="color: #003B4F;">&lt;-</span> out <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-15">      <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">cum_risk =</span> <span class="fu" style="color: #4758AB;">as.numeric</span>(cum_risk))</span>
<span id="cb4-16">  }</span>
<span id="cb4-17">  <span class="fu" style="color: #4758AB;">return</span>(out)</span>
<span id="cb4-18">}), by <span class="ot" style="color: #003B4F;">=</span> .(sex, type, country)]</span></code></pre></div>
</details>
</div>
<div class="cell">
<div class="cell-output cell-output-stdout">
<pre><code>Classes 'tidytable', 'data.table' and 'data.frame': 820 obs. of  10 variables:
 $ sex       : chr  "Male" "Male" "Male" "Male" ...
 $ type      : chr  "Incidence" "Incidence" "Incidence" "Incidence" ...
 $ country   : chr  "Denmark" "Denmark" "Denmark" "Denmark" ...
 $ year      : int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
 $ asr_w     : num  11.9 10.9 12.2 13.4 13.8 ...
 $ asr_e     : num  12.7 12.1 13.1 14.5 15.3 ...
 $ asr_n     : num  13.5 13.1 14.1 15.2 16 ...
 $ crude_rate: num  12.4 12 13.1 14.3 14.8 ...
 $ count     : num  196 191 210 230 239 234 269 283 291 306 ...
 $ population: num  1585436 1593815 1601416 1609960 1618038 ...
 - attr(*, ".internal.selfref")=&lt;externalptr&gt; </code></pre>
</div>
</div>
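<p>The conversion from a nested list to a flat table can be illustrated on a hand-made list. The sketch below assumes that each record is a list with the fields <code>year</code>, <code>total</code>, and <code>total_pop</code> (field names taken from the code above). The two records reuse the first two rows of the output printed above, and <code>records</code> and <code>rate_tbl</code> are illustrative names. It also assumes that the crude rate is expressed per 100,000 persons.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Two records shaped like the entries of the downloaded "dataset" list
records &lt;- list(
  list(year = 1980L, total = 196, total_pop = 1585436),
  list(year = 1981L, total = 191, total_pop = 1593815)
)

# Pull one field at a time out of every record
rate_tbl &lt;- data.frame(
  year       = vapply(records, function(r) r[["year"]], integer(1)),
  count      = vapply(records, function(r) r[["total"]], numeric(1)),
  population = vapply(records, function(r) r[["total_pop"]], numeric(1))
)

# Crude rate per 100,000 persons (assumed scale)
rate_tbl$crude_rate &lt;- 1e5 * rate_tbl$count / rate_tbl$population
round(rate_tbl$crude_rate, 1)</code></pre></div>
<p>The rounded values, 12.4 and 12.0, agree with the <code>crude_rate</code> column printed above, which is consistent with the per-100,000 assumption.</p>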
<div class="cell">
<details>
<summary>Rate by age</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">rate_by_age <span class="ot" style="color: #003B4F;">&lt;-</span> design[, <span class="fu" style="color: #4758AB;">map_df</span>(data, <span class="cf" style="color: #003B4F;">function</span>(dta) {</span>
<span id="cb6-2">  year <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">map_int</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"year"</span>, .x))</span>
<span id="cb6-3">  count_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">map</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"ages"</span>, .x)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-4">    <span class="fu" style="color: #4758AB;">map_dfr</span>(as_tidytable) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-5">    <span class="fu" style="color: #4758AB;">cbind</span>(<span class="at" style="color: #657422;">year =</span> year) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-6">    <span class="fu" style="color: #4758AB;">pivot_longer</span>(</span>
<span id="cb6-7">      <span class="at" style="color: #657422;">cols =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="st" style="color: #20794D;">"year"</span>,</span>
<span id="cb6-8">      <span class="at" style="color: #657422;">names_to =</span> <span class="st" style="color: #20794D;">"age_group"</span>,</span>
<span id="cb6-9">      <span class="at" style="color: #657422;">values_to =</span> <span class="st" style="color: #20794D;">"count"</span></span>
<span id="cb6-10">    )</span>
<span id="cb6-11">  pop_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">map</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"populations"</span>, .x)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-12">    <span class="fu" style="color: #4758AB;">map_dfr</span>(as_tidytable) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb6-13">    <span class="fu" style="color: #4758AB;">cbind</span>(<span class="at" style="color: #657422;">year =</span> year) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-14">    <span class="fu" style="color: #4758AB;">pivot_longer</span>(</span>
<span id="cb6-15">      <span class="at" style="color: #657422;">cols =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="st" style="color: #20794D;">"year"</span>,</span>
<span id="cb6-16">      <span class="at" style="color: #657422;">names_to =</span> <span class="st" style="color: #20794D;">"age_group"</span>,</span>
<span id="cb6-17">      <span class="at" style="color: #657422;">values_to =</span> <span class="st" style="color: #20794D;">"population"</span></span>
<span id="cb6-18">    )</span>
<span id="cb6-19">  asr_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">map</span>(dta, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">get</span>(<span class="st" style="color: #20794D;">"age_specific_rate"</span>, .x)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-20">    <span class="fu" style="color: #4758AB;">map_dfr</span>(as_tidytable) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-21">    <span class="fu" style="color: #4758AB;">as_tidytable</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb6-22">    <span class="fu" style="color: #4758AB;">cbind</span>(<span class="at" style="color: #657422;">year =</span> <span class="fu" style="color: #4758AB;">as.numeric</span>(year)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-23">    <span class="fu" style="color: #4758AB;">pivot_longer</span>(</span>
<span id="cb6-24">      <span class="at" style="color: #657422;">cols =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="st" style="color: #20794D;">"year"</span>,</span>
<span id="cb6-25">      <span class="at" style="color: #657422;">names_to =</span> <span class="st" style="color: #20794D;">"age_group"</span>,</span>
<span id="cb6-26">      <span class="at" style="color: #657422;">values_to =</span> <span class="st" style="color: #20794D;">"asr"</span></span>
<span id="cb6-27">    )</span>
<span id="cb6-28">  out <span class="ot" style="color: #003B4F;">&lt;-</span> purrr<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">reduce</span>(</span>
<span id="cb6-29">    <span class="fu" style="color: #4758AB;">list</span>(count_df, pop_df, asr_df),</span>
<span id="cb6-30">    inner_join,</span>
<span id="cb6-31">    <span class="at" style="color: #657422;">by =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"year"</span>, <span class="st" style="color: #20794D;">"age_group"</span>)</span>
<span id="cb6-32">  )</span>
<span id="cb6-33">  age_lbl <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">paste</span>(</span>
<span id="cb6-34">    <span class="fu" style="color: #4758AB;">seq</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">85</span>, <span class="dv" style="color: #AD0000;">5</span>),</span>
<span id="cb6-35">    <span class="fu" style="color: #4758AB;">seq</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">85</span>, <span class="dv" style="color: #AD0000;">5</span>) <span class="sc" style="color: #5E5E5E;">+</span> <span class="dv" style="color: #AD0000;">4</span>,</span>
<span id="cb6-36">    <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">"-"</span></span>
<span id="cb6-37">  )</span>
<span id="cb6-38">  age_lbl[<span class="fu" style="color: #4758AB;">length</span>(age_lbl)] <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="st" style="color: #20794D;">"85+"</span></span>
<span id="cb6-39">  <span class="fu" style="color: #4758AB;">names</span>(age_lbl) <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">18</span></span>
<span id="cb6-40"></span>
<span id="cb6-41">  out <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-42">    <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">age_group =</span> age_lbl[age_group]) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-43">    <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">asr =</span> <span class="fu" style="color: #4758AB;">as.numeric</span>(asr)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb6-44">    <span class="fu" style="color: #4758AB;">mutate</span>(<span class="fu" style="color: #4758AB;">across</span>(<span class="fu" style="color: #4758AB;">c</span>(year, count, population), as.integer))</span>
<span id="cb6-45"></span>
<span id="cb6-46">}), by <span class="ot" style="color: #003B4F;">=</span> .(sex, type, country)]</span></code></pre></div>
</details>
</div>
<div class="cell">
<div class="cell-output cell-output-stdout">
<pre><code>Classes 'tidytable', 'data.table' and 'data.frame': 14760 obs. of  8 variables:
 $ sex       : chr  "Male" "Male" "Male" "Male" ...
 $ type      : chr  "Incidence" "Incidence" "Incidence" "Incidence" ...
 $ country   : chr  "Denmark" "Denmark" "Denmark" "Denmark" ...
 $ year      : int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
 $ age_group : chr  "0-4" "0-4" "0-4" "0-4" ...
 $ count     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ population: int  164317 156964 150170 145414 139557 135827 134375 136240 138423 142597 ...
 $ asr       : num  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, ".internal.selfref")=&lt;externalptr&gt; </code></pre>
</div>
</div>
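<p>The age-group relabelling in the code above is a lookup in a named character vector. The sketch below reproduces just that step in base R, assuming the age groups arrive as the character keys <code>"1"</code> to <code>"18"</code>; the name <code>lower</code> is illustrative.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Lower bounds of the 18 five-year age groups
lower &lt;- seq(0, 85, 5)
age_lbl &lt;- paste(lower, lower + 4, sep = "-")
age_lbl[length(age_lbl)] &lt;- "85+"   # the last group is open-ended
names(age_lbl) &lt;- seq_along(age_lbl)

# Character keys index the vector by name
unname(age_lbl[c("1", "10", "18")])</code></pre></div>
<p>The keys 1, 10, and 18 map to <code>"0-4"</code>, <code>"45-49"</code>, and <code>"85+"</code>. With 18 age groups for each of the 820 stratum-year records, the age-specific table has the 14,760 rows shown above.</p>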
</div>
</div>
</div>
</section>
<section id="analysis" class="level2">
<h2 class="anchored" data-anchor-id="analysis">Analysis</h2>
<p>For the following analysis, crude rates were used in visualization and modelling. The following plots present the <code>crude_rate</code> over the <code>year</code> of diagnosis, stratified by <code>sex</code>, <code>country</code>, and <code>type</code>. Additionally, using <code>count</code> and <code>population</code>, a Poisson regression model (Equation&nbsp;1) was fitted.</p>
<p><span id="eq-poisson-model-1"><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign%7D%0A%5Clog%5Cleft(%5Cfrac%7B%5Clambda%7D%7BY%7D%5Cright)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20x%20+%20%5Cvarepsilon%20%5C%5C%0A%5Ctext%7Bequivalently,%20%7D%20%5Clog%5Cleft(%5Clambda%5Cright)%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20x%20+%20%5Clog(Y)%20+%20%5Cvarepsilon%0A%5Cend%7Balign%7D%0A%5Ctag%7B1%7D"></span></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is the expected number of events (<code>count</code>), <img src="https://latex.codecogs.com/png.latex?Y"> is the number of persons exposed (<code>population</code>), and <img src="https://latex.codecogs.com/png.latex?x"> is the <code>year</code> of diagnosis.</p>
<p>Additionally, change points in the trend were identified using segmented regression, and the annual percentage change (APC) and average annual percentage change (AAPC) in the crude rate were calculated. For each stratum, the following R code was used to fit the Poisson regression model and the segmented regression model.</p>
<div class="cell">
<details>
<summary>Data within each stratum</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">nested_df <span class="ot" style="color: #003B4F;">&lt;-</span> rate_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;">nest</span>(<span class="at" style="color: #657422;">data =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="fu" style="color: #4758AB;">c</span>(sex, type, country))</span>
<span id="cb8-3"><span class="fu" style="color: #4758AB;">head</span>(nested_df)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 4
  sex   type      country data                
  &lt;chr&gt; &lt;chr&gt;     &lt;chr&gt;   &lt;list&gt;              
1 Male  Incidence Denmark &lt;tidytable [41 × 7]&gt;
2 Male  Incidence Finland &lt;tidytable [41 × 7]&gt;
3 Male  Incidence Iceland &lt;tidytable [41 × 7]&gt;
4 Male  Incidence Norway  &lt;tidytable [41 × 7]&gt;
5 Male  Incidence Sweden  &lt;tidytable [41 × 7]&gt;
6 Male  Mortality Denmark &lt;tidytable [41 × 7]&gt;</code></pre>
</div>
</div>
<section id="modelling" class="level3">
<h3 class="anchored" data-anchor-id="modelling">Modelling</h3>
<div class="callout-note callout callout-style-default no-icon callout-captioned">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-caption-container flex-fill">
Poisson regression model
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">glm</span>(</span>
<span id="cb10-2">  count <span class="sc" style="color: #5E5E5E;">~</span> year <span class="sc" style="color: #5E5E5E;">+</span> <span class="fu" style="color: #4758AB;">offset</span>(<span class="fu" style="color: #4758AB;">log</span>(population)),</span>
<span id="cb10-3">  <span class="at" style="color: #657422;">data =</span> rate_df,</span>
<span id="cb10-4">  <span class="at" style="color: #657422;">family =</span> <span class="fu" style="color: #4758AB;">poisson</span>(<span class="at" style="color: #657422;">link =</span> <span class="st" style="color: #20794D;">"log"</span>)</span>
<span id="cb10-5">)</span></code></pre></div>
</div>
</div>
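<p>In a log-linear Poisson model with a population offset, the coefficient of <code>year</code> is the yearly change in the log rate, so the annual percentage change is 100 × (exp(β<sub>1</sub>) − 1). The sketch below checks this on simulated data, not on the NORDCAN data. The data frame <code>sim</code>, the starting rate of 10 per 100,000, and the 3% yearly growth are assumptions made only for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(2020)

# Simulated stratum: rate of 10 per 100,000 in 1980, growing 3% per year
sim &lt;- data.frame(year = 1980:2020, population = 1e6)
sim$count &lt;- rpois(
  nrow(sim),
  lambda = sim$population * 10 / 1e5 * 1.03^(sim$year - 1980)
)

# Poisson model with log(population) as offset, as in the callout above
fit &lt;- glm(
  count ~ year + offset(log(population)),
  family = poisson(link = "log"),
  data = sim
)

# Annual percentage change implied by the year coefficient
apc &lt;- 100 * (exp(coef(fit)[["year"]]) - 1)
apc</code></pre></div>
<p>The estimate should land close to the simulated growth of 3% per year, up to sampling noise.</p>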
<div class="callout-note callout callout-style-default no-icon callout-captioned">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-caption-container flex-fill">
Segmented regression model
</div>
</div>
<div class="callout-body-container callout-body">
<div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">sgmt_model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">segmented</span>(model, <span class="at" style="color: #657422;">npsi =</span> <span class="dv" style="color: #AD0000;">2</span>)</span></code></pre></div>
</div>
</div>
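<p>As a rough illustration of how change points and segment-wise trends can be read from such a fit, the sketch below simulates a single stratum with one known change point and refits it. It assumes the <code>segmented</code> package is installed and that its <code>slope()</code> and <code>aapc()</code> helpers behave as documented. The simulated data (<code>sim</code>, a change point in 2000, growth of 5% and then 1% per year) are hypothetical and are not the NORDCAN data.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(segmented)
set.seed(2020)

# Simulated stratum: 5% yearly growth until 2000, 1% afterwards
sim &lt;- data.frame(year = 1980:2020, population = 2e6)
log_rate &lt;- log(10 / 1e5) +
  log(1.05) * pmin(sim$year - 1980, 20) +
  log(1.01) * pmax(sim$year - 2000, 0)
sim$count &lt;- rpois(nrow(sim), lambda = sim$population * exp(log_rate))

# Poisson fit followed by a segmented fit with one change point
fit &lt;- glm(
  count ~ year + offset(log(population)),
  family = poisson(link = "log"),
  data = sim
)
seg_fit &lt;- segmented(fit, seg.Z = ~year, npsi = 1)

seg_fit$psi                   # estimated change point (column "Est.")
slope(seg_fit, APC = TRUE)    # annual percentage change within each segment
aapc(seg_fit, exp.it = TRUE)  # average annual change over the whole period</code></pre></div>
<p>The estimated change point should fall near the simulated year 2000, and the two segment-wise APC values should be close to 5 and 1.</p>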
<div class="cell">
<details>
<summary>Poisson and segmented fit</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">fitted_df <span class="ot" style="color: #003B4F;">&lt;-</span> nested_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">fit =</span> <span class="fu" style="color: #4758AB;">map</span>(data, <span class="cf" style="color: #003B4F;">function</span>(.data) {</span>
<span id="cb12-3">    <span class="fu" style="color: #4758AB;">glm</span>(</span>
<span id="cb12-4">      count <span class="sc" style="color: #5E5E5E;">~</span> year <span class="sc" style="color: #5E5E5E;">+</span> <span class="fu" style="color: #4758AB;">offset</span>(<span class="fu" style="color: #4758AB;">log</span>(population)),</span>
<span id="cb12-5">      <span class="at" style="color: #657422;">family =</span> <span class="fu" style="color: #4758AB;">poisson</span>(<span class="at" style="color: #657422;">link =</span> <span class="st" style="color: #20794D;">"log"</span>),</span>
<span id="cb12-6">      <span class="at" style="color: #657422;">data =</span> .data</span>
<span id="cb12-7">    )</span>
<span id="cb12-8">  })) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb12-9">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">sgmt_fit =</span> <span class="fu" style="color: #4758AB;">map2</span>(data, fit, <span class="cf" style="color: #003B4F;">function</span>(.data, .fit) {</span>
<span id="cb12-10">    out <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">segmented</span>(.fit, <span class="at" style="color: #657422;">seg.Z =</span> <span class="sc" style="color: #5E5E5E;">~</span>year, <span class="at" style="color: #657422;">data =</span> .data, <span class="at" style="color: #657422;">npsi =</span> <span class="dv" style="color: #AD0000;">2</span>)</span>
<span id="cb12-11">    <span class="cf" style="color: #003B4F;">if</span> (<span class="sc" style="color: #5E5E5E;">!</span>(<span class="st" style="color: #20794D;">"segmented"</span> <span class="sc" style="color: #5E5E5E;">%in%</span> <span class="fu" style="color: #4758AB;">class</span>(out))) {</span>
<span id="cb12-12">      out <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">segmented</span>(.fit, <span class="at" style="color: #657422;">seg.Z =</span> <span class="sc" style="color: #5E5E5E;">~</span>year, <span class="at" style="color: #657422;">data =</span> .data, <span class="at" style="color: #657422;">npsi =</span> <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb12-13">    }</span>
<span id="cb12-14">    <span class="fu" style="color: #4758AB;">return</span>(out)</span>
<span id="cb12-15">  }))</span>
<span id="cb12-16"></span>
<span id="cb12-17"><span class="fu" style="color: #4758AB;">head</span>(fitted_df)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 6
  sex   type      country data                 fit    sgmt_fit  
  &lt;chr&gt; &lt;chr&gt;     &lt;chr&gt;   &lt;list&gt;               &lt;list&gt; &lt;list&gt;    
1 Male  Incidence Denmark &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;
2 Male  Incidence Finland &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;
3 Male  Incidence Iceland &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;
4 Male  Incidence Norway  &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;
5 Male  Incidence Sweden  &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;
6 Male  Mortality Denmark &lt;tidytable [41 × 7]&gt; &lt;glm&gt;  &lt;segmentd&gt;</code></pre>
</div>
</div>
<p>The following plots highlight Norway and Finland for comparison and show higher melanoma incidence and mortality rates in Norway than in Finland. A plateau was observed in melanoma incidence in Norway in both males and females.</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true">Crude rate</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false">Poisson fit</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false">Segmented fit</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-4" aria-controls="tabset-3-4" aria-selected="false">All countries</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell" data-fig.asp="0.9">
<details>
<summary>Crude rate plot</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">cols <span class="ot" style="color: #003B4F;">&lt;-</span> RColorBrewer<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">brewer.pal</span>(fitted_df[, <span class="fu" style="color: #4758AB;">n_distinct</span>(country)], <span class="st" style="color: #20794D;">"Set1"</span>) </span>
<span id="cb14-2"><span class="fu" style="color: #4758AB;">names</span>(cols) <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df[, <span class="fu" style="color: #4758AB;">unique</span>(country)]</span>
<span id="cb14-3"></span>
<span id="cb14-4">rate_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb14-5">  <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb14-6">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(year, crude_rate, <span class="at" style="color: #657422;">group =</span> country)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-7">  <span class="fu" style="color: #4758AB;">facet_grid</span>(</span>
<span id="cb14-8">    <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(sex),</span>
<span id="cb14-9">    <span class="at" style="color: #657422;">rows =</span> <span class="fu" style="color: #4758AB;">vars</span>(type),</span>
<span id="cb14-10">    <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_y"</span></span>
<span id="cb14-11">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-12">  <span class="fu" style="color: #4758AB;">geom_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-13">  <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb14-14">    <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>, </span>
<span id="cb14-15">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, </span>
<span id="cb14-16">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>,</span>
<span id="cb14-17">    <span class="at" style="color: #657422;">stroke =</span> <span class="dv" style="color: #AD0000;">1</span>,</span>
<span id="cb14-18">    <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">1</span></span>
<span id="cb14-19">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-20">  <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb14-21">    <span class="at" style="color: #657422;">data =</span> <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">subset</span>(.x, country <span class="sc" style="color: #5E5E5E;">%in%</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Finland"</span>, <span class="st" style="color: #20794D;">"Norway"</span>)),</span>
<span id="cb14-22">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> country)</span>
<span id="cb14-23">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-24">  ggthemes<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">theme_few</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">14</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-25">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb14-26">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>,</span>
<span id="cb14-27">    <span class="at" style="color: #657422;">legend.justification =</span> <span class="st" style="color: #20794D;">"left"</span>,</span>
<span id="cb14-28">    <span class="at" style="color: #657422;">panel.grid =</span> <span class="fu" style="color: #4758AB;">element_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#f0f0f0"</span>)</span>
<span id="cb14-29">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-30">  <span class="fu" style="color: #4758AB;">scale_color_manual</span>(<span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">names</span>(cols), <span class="at" style="color: #657422;">values =</span> cols) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb14-31">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb14-32">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Year of diagnosis"</span>,</span>
<span id="cb14-33">    <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Crude rate per 100,000 person-years"</span>,</span>
<span id="cb14-34">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Country"</span></span>
<span id="cb14-35">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell">
<details>
<summary>Fitted values from the model</summary>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">fitted_df <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb15-3">    <span class="at" style="color: #657422;">fit_df =</span> <span class="fu" style="color: #4758AB;">map</span>(</span>
<span id="cb15-4">      fit, </span>
<span id="cb15-5">      <span class="cf" style="color: #003B4F;">function</span>(.fit) {</span>
<span id="cb15-6">        new_data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">crossing</span>(<span class="at" style="color: #657422;">year =</span> <span class="dv" style="color: #AD0000;">1980</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">2020</span>, <span class="at" style="color: #657422;">population =</span> <span class="fl" style="color: #AD0000;">1e5</span>)</span>
<span id="cb15-7">        <span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb15-8">          <span class="at" style="color: #657422;">year =</span> new_data[, year],</span>
<span id="cb15-9">          <span class="at" style="color: #657422;">.fitted =</span> <span class="fu" style="color: #4758AB;">predict</span>(.fit, <span class="at" style="color: #657422;">newdata =</span> new_data, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"response"</span>)</span>
<span id="cb15-10">        )</span>
<span id="cb15-11">      }),</span>
<span id="cb15-12">    <span class="at" style="color: #657422;">sgmt_fit_df =</span> <span class="fu" style="color: #4758AB;">map</span>(</span>
<span id="cb15-13">      sgmt_fit, <span class="cf" style="color: #003B4F;">function</span>(.fit) {</span>
<span id="cb15-14">        new_data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">crossing</span>(<span class="at" style="color: #657422;">year =</span> <span class="dv" style="color: #AD0000;">1980</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">2020</span>, <span class="at" style="color: #657422;">population =</span> <span class="fl" style="color: #AD0000;">1e5</span>)</span>
<span id="cb15-15">        pred_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">predict</span>(</span>
<span id="cb15-16">          .fit, <span class="at" style="color: #657422;">newdata =</span> new_data,</span>
<span id="cb15-17">          <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"link"</span>, <span class="at" style="color: #657422;">interval =</span> <span class="st" style="color: #20794D;">"confidence"</span></span>
<span id="cb15-18">        ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">exp</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">apply</span>(<span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">2</span>, prod, <span class="fl" style="color: #AD0000;">1e5</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb15-19">        <span class="fu" style="color: #4758AB;">as_tidytable</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb15-20">        <span class="fu" style="color: #4758AB;">rename_with</span>(<span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">".fitted"</span>, <span class="st" style="color: #20794D;">".lower"</span>, <span class="st" style="color: #20794D;">".upper"</span>))</span>
<span id="cb15-21">        <span class="fu" style="color: #4758AB;">bind_cols</span>(<span class="at" style="color: #657422;">year =</span> new_data[, year], pred_df)</span>
<span id="cb15-22">      })</span>
<span id="cb15-23">  )</span></code></pre></div>
</details>
</div>
<div class="cell" data-fig.asp="0.9">
<details>
<summary>Crude rate with Poisson fit</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">fit_df <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;">unnest</span>(fit_df) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>)</span>
<span id="cb16-4"></span>
<span id="cb16-5">plot_poisson <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="cf" style="color: #003B4F;">function</span>(rate_df, fit_df, <span class="at" style="color: #657422;">countries =</span> <span class="cn" style="color: #8f5902;">NULL</span>) {</span>
<span id="cb16-6">  <span class="cf" style="color: #003B4F;">if</span> (<span class="fu" style="color: #4758AB;">is.null</span>(countries)) {</span>
<span id="cb16-7">    countries <span class="ot" style="color: #003B4F;">&lt;-</span> rate_df[, <span class="fu" style="color: #4758AB;">unique</span>(country)]</span>
<span id="cb16-8">  }</span>
<span id="cb16-9">  rate_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb16-10">    <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb16-11">    <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(year, crude_rate, <span class="at" style="color: #657422;">group =</span> country)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-12">    <span class="fu" style="color: #4758AB;">facet_grid</span>(</span>
<span id="cb16-13">      <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(sex),</span>
<span id="cb16-14">      <span class="at" style="color: #657422;">rows =</span> <span class="fu" style="color: #4758AB;">vars</span>(type),</span>
<span id="cb16-15">      <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_y"</span></span>
<span id="cb16-16">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-17">    <span class="fu" style="color: #4758AB;">geom_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-18">    <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb16-19">      <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>,</span>
<span id="cb16-20">      <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>,</span>
<span id="cb16-21">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>,</span>
<span id="cb16-22">      <span class="at" style="color: #657422;">stroke =</span> <span class="dv" style="color: #AD0000;">1</span>,</span>
<span id="cb16-23">      <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">1</span></span>
<span id="cb16-24">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-25">    <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb16-26">      <span class="at" style="color: #657422;">data =</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="fu" style="color: #4758AB;">subset</span>(fit_df, country <span class="sc" style="color: #5E5E5E;">%in%</span> countries),</span>
<span id="cb16-27">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> country, <span class="at" style="color: #657422;">y =</span> .fitted),</span>
<span id="cb16-28">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-29">    ggthemes<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">theme_few</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">14</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-30">    <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb16-31">      <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>,</span>
<span id="cb16-32">      <span class="at" style="color: #657422;">legend.justification =</span> <span class="st" style="color: #20794D;">"left"</span>,</span>
<span id="cb16-33">      <span class="at" style="color: #657422;">panel.grid =</span> <span class="fu" style="color: #4758AB;">element_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#f0f0f0"</span>)</span>
<span id="cb16-34">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-35">    <span class="fu" style="color: #4758AB;">scale_color_manual</span>(<span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">names</span>(cols), <span class="at" style="color: #657422;">values =</span> cols) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-36">    <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb16-37">      <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Year of diagnosis"</span>,</span>
<span id="cb16-38">      <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Crude rate per 100,000 person-years"</span>,</span>
<span id="cb16-39">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Country"</span></span>
<span id="cb16-40">    )</span>
<span id="cb16-41">}</span>
<span id="cb16-42"><span class="fu" style="color: #4758AB;">plot_poisson</span>(rate_df, fit_df, <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Finland"</span>, <span class="st" style="color: #20794D;">"Norway"</span>))</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell" data-fig.asp="0.9">
<details>
<summary>Crude rate with segmented fit</summary>
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">plot_segmented <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="cf" style="color: #003B4F;">function</span>(rate_df, fit_df, <span class="at" style="color: #657422;">countries =</span> <span class="cn" style="color: #8f5902;">NULL</span>, <span class="at" style="color: #657422;">show_poisson =</span> <span class="cn" style="color: #8f5902;">TRUE</span>) {</span>
<span id="cb17-2">  sgmt_fit_df <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-3">    <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-4">    <span class="fu" style="color: #4758AB;">unnest</span>(sgmt_fit_df)</span>
<span id="cb17-5"></span>
<span id="cb17-6">  <span class="cf" style="color: #003B4F;">if</span> (<span class="fu" style="color: #4758AB;">is.null</span>(countries)) {</span>
<span id="cb17-7">    countries <span class="ot" style="color: #003B4F;">&lt;-</span> rate_df[, <span class="fu" style="color: #4758AB;">unique</span>(country)]</span>
<span id="cb17-8">  } <span class="cf" style="color: #003B4F;">else</span> {</span>
<span id="cb17-9">    sgmt_fit_df <span class="ot" style="color: #003B4F;">&lt;-</span> sgmt_fit_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">%in%</span> countries)</span>
<span id="cb17-10">  }</span>
<span id="cb17-11"></span>
<span id="cb17-12">  plt <span class="ot" style="color: #003B4F;">&lt;-</span> rate_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-13">    <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-14">    <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(year, crude_rate, <span class="at" style="color: #657422;">group =</span> country)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-15">    <span class="fu" style="color: #4758AB;">facet_grid</span>(</span>
<span id="cb17-16">      <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(sex),</span>
<span id="cb17-17">      <span class="at" style="color: #657422;">rows =</span> <span class="fu" style="color: #4758AB;">vars</span>(type),</span>
<span id="cb17-18">      <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_y"</span></span>
<span id="cb17-19">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-20">    <span class="fu" style="color: #4758AB;">geom_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-21">    <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb17-22">      <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>, </span>
<span id="cb17-23">      <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, </span>
<span id="cb17-24">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"lightgrey"</span>,</span>
<span id="cb17-25">      <span class="at" style="color: #657422;">stroke =</span> <span class="dv" style="color: #AD0000;">1</span>,</span>
<span id="cb17-26">      <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">1</span></span>
<span id="cb17-27">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-28">    <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb17-29">      <span class="at" style="color: #657422;">data =</span> <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">subset</span>(sgmt_fit_df, country <span class="sc" style="color: #5E5E5E;">%in%</span> countries),</span>
<span id="cb17-30">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> country, <span class="at" style="color: #657422;">y =</span> .fitted),</span>
<span id="cb17-31">      <span class="at" style="color: #657422;">linetype =</span> <span class="cf" style="color: #003B4F;">if</span> (show_poisson) <span class="st" style="color: #20794D;">"dashed"</span> <span class="cf" style="color: #003B4F;">else</span> <span class="st" style="color: #20794D;">"solid"</span></span>
<span id="cb17-32">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-33">    ggthemes<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">theme_few</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">14</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-34">    <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb17-35">      <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>,</span>
<span id="cb17-36">      <span class="at" style="color: #657422;">legend.justification =</span> <span class="st" style="color: #20794D;">"left"</span>,</span>
<span id="cb17-37">      <span class="at" style="color: #657422;">panel.grid =</span> <span class="fu" style="color: #4758AB;">element_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#f0f0f0"</span>)</span>
<span id="cb17-38">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-39">    <span class="fu" style="color: #4758AB;">scale_color_manual</span>(<span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">names</span>(cols), <span class="at" style="color: #657422;">values =</span> cols) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-40">    <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb17-41">      <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Year of diagnosis"</span>,</span>
<span id="cb17-42">      <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Crude rate per 100,000 person-years"</span>,</span>
<span id="cb17-43">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Country"</span></span>
<span id="cb17-44">    )</span>
<span id="cb17-45"></span>
<span id="cb17-46">    <span class="cf" style="color: #003B4F;">if</span> (show_poisson) {</span>
<span id="cb17-47">      plt <span class="ot" style="color: #003B4F;">&lt;-</span> plt <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-48">        <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb17-49">          <span class="at" style="color: #657422;">data =</span> <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">subset</span>(fit_df, country <span class="sc" style="color: #5E5E5E;">%in%</span> countries),</span>
<span id="cb17-50">          <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> country, <span class="at" style="color: #657422;">y =</span> .fitted),</span>
<span id="cb17-51">          <span class="at" style="color: #657422;">alpha =</span> <span class="fl" style="color: #AD0000;">0.5</span></span>
<span id="cb17-52">        )</span>
<span id="cb17-53">    }</span>
<span id="cb17-54">    <span class="fu" style="color: #4758AB;">return</span>(plt)</span>
<span id="cb17-55">}</span>
<span id="cb17-56"><span class="fu" style="color: #4758AB;">plot_segmented</span>(rate_df, fit_df, <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Norway"</span>, <span class="st" style="color: #20794D;">"Finland"</span>))</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</div>
<div id="tabset-3-4" class="tab-pane" aria-labelledby="tabset-3-4-tab">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">Poisson fit</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Segmented fit</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell" data-fig.asp="0.9">
<details>
<summary>Crude rate with Poisson fit for all countries</summary>
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;">plot_poisson</span>(rate_df, fit_df)</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell" data-fig.asp="0.9">
<details>
<summary>Crude rate with segmented fit for all countries</summary>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;">plot_segmented</span>(rate_df, fit_df, <span class="at" style="color: #657422;">show_poisson =</span> <span class="cn" style="color: #8f5902;">FALSE</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/Plots/Sgmt-all.svg" class="img-fluid"></p>
</div>
</div>
</div>
</div>
</div>
</div>
<p>Here, Norway leads with the highest rates of melanoma incidence and mortality, while Finland shines with the lowest rates across both sexes. Recently, Denmark has surged ahead of Norway and Sweden in terms of melanoma incidence.</p>
<p>Interestingly, melanoma incidence in Norway plateaued for a while, but most Nordic countries saw incidence rise after 2005. The silver lining is that melanoma mortality has dropped in all countries in recent years, likely thanks to better detection, treatment, and awareness.</p>
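<p>To make the "plateau, then rise" pattern concrete, here is a rough sketch of a piecewise log-linear Poisson trend on simulated data. It assumes a known breakpoint at 2005 and uses a hinge term in base R. This is a simplification for illustration only, since the segmented fits above estimate their breakpoints from the data. The <code>toy</code> data, the rates, and the variable names are hypothetical.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulated counts (hypothetical): flat rate until 2005, then +4% per year
set.seed(2)
toy &lt;- data.frame(year = 1980:2020, population = 5e6)
toy$after &lt;- pmax(toy$year - 2005, 0)  # years since the assumed breakpoint
toy$n &lt;- rpois(nrow(toy), toy$population * 2e-4 * 1.04^toy$after)

# The hinge term lets the slope change at 2005 (breakpoint fixed, not estimated)
fit &lt;- glm(
  n ~ year + after + offset(log(population)),
  family = poisson, data = toy
)

# Slope before the break, and slope after it (sum of both coefficients)
slope_before &lt;- coef(fit)[["year"]]
slope_after &lt;- slope_before + coef(fit)[["after"]]

# Yearly percent change in each period
apc &lt;- (exp(c(before = slope_before, after = slope_after)) - 1) * 100
round(apc, 2)</code></pre></div>
</div>
<p>A yearly change close to zero before the break corresponds to a plateau, and a positive change after it corresponds to the rise seen in the plots.</p>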
</section>
<section id="annual-percentge-change-apc" class="level3">
<h3 class="anchored" data-anchor-id="annual-percentge-change-apc">Annual percentage change (APC)</h3>
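<p>As a minimal sketch of what the APC measures, assume a log-linear Poisson model of case counts on year with a log-population offset. The simulated <code>toy</code> data and this model specification are illustrative assumptions, not the data or the exact model from this post. Under that assumption, the exponentiated <code>year</code> coefficient is the rate ratio per year, and the APC is that ratio minus one, times 100.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulated data (hypothetical): the rate grows by about 3% per year
set.seed(1)
toy &lt;- data.frame(year = 1980:2020, population = 5e6)
toy$n &lt;- rpois(
  nrow(toy),
  lambda = toy$population * 1e-4 * 1.03^(toy$year - 1980)
)

# Log-linear Poisson model with a log-population offset
fit &lt;- glm(n ~ year + offset(log(population)), family = poisson, data = toy)

# exp(slope) is the rate ratio per year; APC = (ratio - 1) * 100
rate_ratio &lt;- exp(coef(fit)[["year"]])
apc &lt;- (rate_ratio - 1) * 100

# Wald confidence interval on the same scale
apc_ci &lt;- (exp(confint.default(fit)["year", ]) - 1) * 100

apc
apc_ci</code></pre></div>
</div>
<p>When a model is tidied with <code>broom::tidy(fit, exponentiate = TRUE)</code>, the <code>estimate</code> column already holds this yearly rate ratio, so an estimate of 1.03 corresponds to an APC of about 3%.</p>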
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true">Plot</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false">Table</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">tidy_fit <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;">transmute</span>(<span class="at" style="color: #657422;">tidy_fit =</span> <span class="fu" style="color: #4758AB;">map</span>(fit, <span class="cf" style="color: #003B4F;">function</span>(.fit) {</span>
<span id="cb20-3">    broom<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tidy</span>(.fit, <span class="at" style="color: #657422;">exponentiate =</span> <span class="cn" style="color: #8f5902;">TRUE</span>, <span class="at" style="color: #657422;">conf.int =</span> <span class="cn" style="color: #8f5902;">TRUE</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-4">      <span class="fu" style="color: #4758AB;">filter</span>(term <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"year"</span>, country <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"Iceland"</span>)</span>
<span id="cb20-5">  }), <span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(sex, type, country)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;">unnest</span>()</span>
<span id="cb20-7"></span>
<span id="cb20-8">tidy_fit <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-9">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">country =</span> <span class="fu" style="color: #4758AB;">factor</span>(country, <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb20-10">    <span class="st" style="color: #20794D;">"Finland"</span>, <span class="st" style="color: #20794D;">"Denmark"</span>,</span>
<span id="cb20-11">    <span class="st" style="color: #20794D;">"Sweden"</span>, <span class="st" style="color: #20794D;">"Norway"</span></span>
<span id="cb20-12">  ))) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb20-13">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">x =</span> estimate, <span class="at" style="color: #657422;">y =</span> country, <span class="at" style="color: #657422;">color =</span> sex)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-14">  <span class="fu" style="color: #4758AB;">facet_grid</span>(<span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(type)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-15">  <span class="fu" style="color: #4758AB;">geom_pointrange</span>(</span>
<span id="cb20-16">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">xmin =</span> conf.low, <span class="at" style="color: #657422;">xmax =</span> conf.high),</span>
<span id="cb20-17">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>,</span>
<span id="cb20-18">    <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>,</span>
<span id="cb20-19">    <span class="at" style="color: #657422;">position =</span> <span class="fu" style="color: #4758AB;">position_dodge</span>(<span class="at" style="color: #657422;">width =</span> <span class="fl" style="color: #AD0000;">0.5</span>)</span>
<span id="cb20-20">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-21">  <span class="fu" style="color: #4758AB;">geom_vline</span>(<span class="at" style="color: #657422;">xintercept =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">linetype =</span> <span class="dv" style="color: #AD0000;">2</span>, <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"grey"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-22">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-23">  <span class="fu" style="color: #4758AB;">expand_limits</span>(<span class="at" style="color: #657422;">x =</span> <span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-24">  ggthemes<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">theme_few</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">16</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-25">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb20-26">    <span class="at" style="color: #657422;">panel.grid =</span> <span class="fu" style="color: #4758AB;">element_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#f0f0f0"</span>),</span>
<span id="cb20-27">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>,</span>
<span id="cb20-28">    <span class="at" style="color: #657422;">legend.justification =</span> <span class="st" style="color: #20794D;">"left"</span></span>
<span id="cb20-29">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb20-30">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb20-31">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Percentage change in count"</span>,</span>
<span id="cb20-32">    <span class="at" style="color: #657422;">y =</span> <span class="cn" style="color: #8f5902;">NULL</span></span>
<span id="cb20-33">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid" width="672"></p>
</div>
</div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="cell">
<details>
<summary>Calculating annual percentage change</summary>
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">aapc_df <span class="ot" style="color: #003B4F;">&lt;-</span> fitted_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-2">  <span class="fu" style="color: #4758AB;">transmute</span>(</span>
<span id="cb21-3">    <span class="at" style="color: #657422;">aapc =</span> <span class="fu" style="color: #4758AB;">map</span>(sgmt_fit, <span class="cf" style="color: #003B4F;">function</span>(.fit) {</span>
<span id="cb21-4">      <span class="fu" style="color: #4758AB;">aapc</span>(.fit) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-5">        <span class="fu" style="color: #4758AB;">t</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-6">        <span class="fu" style="color: #4758AB;">as_tidytable</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-7">        <span class="fu" style="color: #4758AB;">rename_with</span>(</span>
<span id="cb21-8">          <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"estimate"</span>, <span class="st" style="color: #20794D;">"std_err"</span>, <span class="st" style="color: #20794D;">"lower"</span>, <span class="st" style="color: #20794D;">"upper"</span>)</span>
<span id="cb21-9">        ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-10">        <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">psi =</span> <span class="st" style="color: #20794D;">"1980--2020"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-11">        <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb21-12">          <span class="at" style="color: #657422;">label =</span> glue<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">glue</span>(</span>
<span id="cb21-13">            <span class="st" style="color: #20794D;">"{estimate} ({lower}, {upper})"</span>,</span>
<span id="cb21-14">            <span class="at" style="color: #657422;">.transformer =</span> \(d, e) <span class="fu" style="color: #4758AB;">round</span>(<span class="fu" style="color: #4758AB;">get</span>(d, e), <span class="dv" style="color: #AD0000;">2</span>)</span>
<span id="cb21-15">          )</span>
<span id="cb21-16">        )</span>
<span id="cb21-17">    }),</span>
<span id="cb21-18">    <span class="at" style="color: #657422;">apc =</span> <span class="fu" style="color: #4758AB;">map</span>(sgmt_fit, <span class="cf" style="color: #003B4F;">function</span>(fit) {</span>
<span id="cb21-19">      <span class="cf" style="color: #003B4F;">if</span> (<span class="sc" style="color: #5E5E5E;">!</span><span class="fu" style="color: #4758AB;">inherits</span>(fit, <span class="st" style="color: #20794D;">"segmented"</span>)) <span class="fu" style="color: #4758AB;">return</span>(<span class="cn" style="color: #8f5902;">NULL</span>)</span>
<span id="cb21-20">      out <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">slope</span>(fit, <span class="at" style="color: #657422;">APC =</span> <span class="cn" style="color: #8f5902;">TRUE</span>)[[<span class="dv" style="color: #AD0000;">1</span>]] <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-21">        <span class="fu" style="color: #4758AB;">as_tidytable</span>(<span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"segment"</span>)</span>
<span id="cb21-22">      psi_start <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">unname</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1980</span>, fit<span class="sc" style="color: #5E5E5E;">$</span>psi[, <span class="st" style="color: #20794D;">"Est."</span>], <span class="dv" style="color: #AD0000;">2020</span>))</span>
<span id="cb21-23">      psi_end <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lead</span>(psi_start, <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb21-24">      psi_range <span class="ot" style="color: #003B4F;">&lt;-</span> glue<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">glue</span>(</span>
<span id="cb21-25">        <span class="st" style="color: #20794D;">"{psi_start}--{psi_end}"</span>,</span>
<span id="cb21-26">        <span class="at" style="color: #657422;">.transformer =</span> \(d, e) <span class="fu" style="color: #4758AB;">round</span>(<span class="fu" style="color: #4758AB;">get</span>(d, e))</span>
<span id="cb21-27">      )</span>
<span id="cb21-28">      psi_range <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">head</span>(psi_range, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb21-29">      out <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-30">        <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">psi =</span> psi_range) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-31">        <span class="fu" style="color: #4758AB;">rename_with</span>(</span>
<span id="cb21-32">          <span class="at" style="color: #657422;">.fn =</span> <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"estimate"</span>, <span class="st" style="color: #20794D;">"lower"</span>, <span class="st" style="color: #20794D;">"upper"</span>), </span>
<span id="cb21-33">          <span class="at" style="color: #657422;">.cols =</span> <span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">4</span></span>
<span id="cb21-34">        ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-35">        <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">label =</span> glue<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">glue</span>(</span>
<span id="cb21-36">          <span class="st" style="color: #20794D;">"{estimate} ({lower}, {upper})"</span>,</span>
<span id="cb21-37">          <span class="at" style="color: #657422;">.transformer =</span> \(d, e) <span class="fu" style="color: #4758AB;">round</span>(<span class="fu" style="color: #4758AB;">get</span>(d, e), <span class="dv" style="color: #AD0000;">2</span>)</span>
<span id="cb21-38">        )) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-39">        <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">label_period =</span> glue<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">glue</span>(<span class="st" style="color: #20794D;">"{label}&lt;br&gt;{psi}"</span>)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb21-40">        <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">segment =</span> <span class="fu" style="color: #4758AB;">gsub</span>(<span class="st" style="color: #20794D;">"slope"</span>, <span class="st" style="color: #20794D;">""</span>, segment))</span>
<span id="cb21-41">    }),</span>
<span id="cb21-42">    <span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(sex, type, country)</span>
<span id="cb21-43">  )</span>
<span id="cb21-44"></span>
<span id="cb21-45">apc <span class="ot" style="color: #003B4F;">&lt;-</span> aapc_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>(<span class="st" style="color: #20794D;">"apc"</span>)</span>
<span id="cb21-46">aapc <span class="ot" style="color: #003B4F;">&lt;-</span> aapc_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>(<span class="st" style="color: #20794D;">"aapc"</span>)</span></code></pre></div>
</details>
</div>
<div class="cell">
<details>
<summary>Table data for APC</summary>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">apc <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;">filter</span>(country <span class="sc" style="color: #5E5E5E;">%in%</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Norway"</span>, <span class="st" style="color: #20794D;">"Denmark"</span>)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;">pivot_wider</span>(</span>
<span id="cb22-4">    <span class="at" style="color: #657422;">id_cols =</span> <span class="fu" style="color: #4758AB;">c</span>(country, segment),</span>
<span id="cb22-5">    <span class="at" style="color: #657422;">names_from =</span> <span class="fu" style="color: #4758AB;">c</span>(type, sex),</span>
<span id="cb22-6">    <span class="at" style="color: #657422;">values_from =</span> <span class="st" style="color: #20794D;">"label_period"</span></span>
<span id="cb22-7">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-8">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">gt</span>(</span>
<span id="cb22-9">    <span class="at" style="color: #657422;">id =</span> <span class="st" style="color: #20794D;">"apc-table"</span>,</span>
<span id="cb22-10">    <span class="at" style="color: #657422;">rowname_col =</span> <span class="st" style="color: #20794D;">"segment"</span>,</span>
<span id="cb22-11">    <span class="at" style="color: #657422;">groupname_col =</span> <span class="st" style="color: #20794D;">"country"</span></span>
<span id="cb22-12">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-13">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_spanner_delim</span>(<span class="st" style="color: #20794D;">"_"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-14">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">fmt_markdown</span>(<span class="fu" style="color: #4758AB;">everything</span>()) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-15">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">sub_missing</span>(<span class="fu" style="color: #4758AB;">everything</span>(), <span class="at" style="color: #657422;">missing_text =</span> <span class="st" style="color: #20794D;">"-"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb22-16">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_stubhead</span>(<span class="st" style="color: #20794D;">"Segment"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb22-17">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_options</span>(</span>
<span id="cb22-18">    <span class="at" style="color: #657422;">table.width =</span> <span class="st" style="color: #20794D;">"100%"</span>,</span>
<span id="cb22-19">    <span class="at" style="color: #657422;">column_labels.font.weight =</span> <span class="st" style="color: #20794D;">"bold"</span>,</span>
<span id="cb22-20">    <span class="at" style="color: #657422;">row_group.font.weight =</span> <span class="st" style="color: #20794D;">"bold"</span></span>
<span id="cb22-21">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="apc-table" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#apc-table table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#apc-table thead, #apc-table tbody, #apc-table tfoot, #apc-table tr, #apc-table td, #apc-table th {
  border-style: none;
}

#apc-table p {
  margin: 0;
  padding: 0;
}

#apc-table .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: 100%;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#apc-table .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#apc-table .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#apc-table .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#apc-table .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#apc-table .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#apc-table .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#apc-table .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#apc-table .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#apc-table .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#apc-table .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#apc-table .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#apc-table .gt_spanner_row {
  border-bottom-style: hidden;
}

#apc-table .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#apc-table .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#apc-table .gt_from_md > :first-child {
  margin-top: 0;
}

#apc-table .gt_from_md > :last-child {
  margin-bottom: 0;
}

#apc-table .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#apc-table .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#apc-table .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#apc-table .gt_row_group_first td {
  border-top-width: 2px;
}

#apc-table .gt_row_group_first th {
  border-top-width: 2px;
}

#apc-table .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#apc-table .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#apc-table .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#apc-table .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#apc-table .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#apc-table .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#apc-table .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#apc-table .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#apc-table .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#apc-table .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#apc-table .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#apc-table .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#apc-table .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#apc-table .gt_left {
  text-align: left;
}

#apc-table .gt_center {
  text-align: center;
}

#apc-table .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#apc-table .gt_font_normal {
  font-weight: normal;
}

#apc-table .gt_font_bold {
  font-weight: bold;
}

#apc-table .gt_font_italic {
  font-style: italic;
}

#apc-table .gt_super {
  font-size: 65%;
}

#apc-table .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#apc-table .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#apc-table .gt_indent_1 {
  text-indent: 5px;
}

#apc-table .gt_indent_2 {
  text-indent: 10px;
}

#apc-table .gt_indent_3 {
  text-indent: 15px;
}

#apc-table .gt_indent_4 {
  text-indent: 20px;
}

#apc-table .gt_indent_5 {
  text-indent: 25px;
}

#apc-table .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#apc-table div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings gt_spanner_row">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="2" colspan="1" scope="col" id="a::stub">Segment</th>
      <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2" scope="colgroup" id="spanner-Incidence_Male">
        <div class="gt_column_spanner">Incidence</div>
      </th>
      <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2" scope="colgroup" id="spanner-Mortality_Male">
        <div class="gt_column_spanner">Mortality</div>
      </th>
    </tr>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="Incidence_Male">Male</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="Incidence_Female">Female</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="Mortality_Male">Male</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="Mortality_Female">Female</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr class="gt_group_heading_row">
      <th colspan="5" class="gt_group_heading" scope="colgroup" id="Denmark">Denmark</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_1" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="MQ=="><span class="gt_from_md">1</span></span></th>
<td headers="Denmark stub_1_1 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="My41NCAoMy4yMywgMy44NSk8YnI+MTk4MOKAkzIwMDQ="><span class="gt_from_md">3.54 (3.23, 3.85)<br>1980–2004</span></span></td>
<td headers="Denmark stub_1_1 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="Mi43MyAoMi40LCAzLjA2KTxicj4xOTgw4oCTMjAwMg=="><span class="gt_from_md">2.73 (2.4, 3.06)<br>1980–2002</span></span></td>
<td headers="Denmark stub_1_1 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="MC42MyAoMC4xLCAxLjE3KTxicj4xOTgw4oCTMjAwNA=="><span class="gt_from_md">0.63 (0.1, 1.17)<br>1980–2004</span></span></td>
<td headers="Denmark stub_1_1 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="MC41MSAoMC4xMiwgMC45MSk8YnI+MTk4MOKAkzIwMTE="><span class="gt_from_md">0.51 (0.12, 0.91)<br>1980–2011</span></span></td></tr>
    <tr><th id="stub_1_2" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="Mg=="><span class="gt_from_md">2</span></span></th>
<td headers="Denmark stub_1_2 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="OS44ICg3LjMyLCAxMi4zMyk8YnI+MjAwNOKAkzIwMDk="><span class="gt_from_md">9.8 (7.32, 12.33)<br>2004–2009</span></span></td>
<td headers="Denmark stub_1_2 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="Ni45MiAoNS45NiwgNy44OCk8YnI+MjAwMuKAkzIwMTE="><span class="gt_from_md">6.92 (5.96, 7.88)<br>2002–2011</span></span></td>
<td headers="Denmark stub_1_2 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="NC4xICgwLjkzLCA3LjM3KTxicj4yMDA04oCTMjAxMg=="><span class="gt_from_md">4.1 (0.93, 7.37)<br>2004–2012</span></span></td>
<td headers="Denmark stub_1_2 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="MTIuMDkgKC0xMi4yMywgNDMuMTQpPGJyPjIwMTHigJMyMDEz"><span class="gt_from_md">12.09 (-12.23, 43.14)<br>2011–2013</span></span></td></tr>
    <tr><th id="stub_1_3" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="Mw=="><span class="gt_from_md">3</span></span></th>
<td headers="Denmark stub_1_3 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="My4xICgyLjUzLCAzLjY4KTxicj4yMDA54oCTMjAyMA=="><span class="gt_from_md">3.1 (2.53, 3.68)<br>2009–2020</span></span></td>
<td headers="Denmark stub_1_3 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="MS43OSAoMS4xOCwgMi40MSk8YnI+MjAxMeKAkzIwMjA="><span class="gt_from_md">1.79 (1.18, 2.41)<br>2011–2020</span></span></td>
<td headers="Denmark stub_1_3 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="LTIuMTUgKC00LjA4LCAtMC4xOSk8YnI+MjAxMuKAkzIwMjA="><span class="gt_from_md">-2.15 (-4.08, -0.19)<br>2012–2020</span></span></td>
<td headers="Denmark stub_1_3 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="LTQuMDggKC03LjM1LCAtMC42OSk8YnI+MjAxM+KAkzIwMjA="><span class="gt_from_md">-4.08 (-7.35, -0.69)<br>2013–2020</span></span></td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="5" class="gt_group_heading" scope="colgroup" id="Norway">Norway</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_4" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="MQ=="><span class="gt_from_md">1</span></span></th>
<td headers="Norway stub_1_4 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="NS40OSAoNC4zMiwgNi42Nyk8YnI+MTk4MOKAkzE5OTE="><span class="gt_from_md">5.49 (4.32, 6.67)<br>1980–1991</span></span></td>
<td headers="Norway stub_1_4 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="My44NyAoMi42NywgNS4wOSk8YnI+MTk4MOKAkzE5OTA="><span class="gt_from_md">3.87 (2.67, 5.09)<br>1980–1990</span></span></td>
<td headers="Norway stub_1_4 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="MTcuMTIgKC0wLjMyLCAzNy42MSk8YnI+MTk4MOKAkzE5ODI="><span class="gt_from_md">17.12 (-0.32, 37.61)<br>1980–1982</span></span></td>
<td headers="Norway stub_1_4 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="MS4wMiAoMC4xOSwgMS44NSk8YnI+MTk4MOKAkzIwMDA="><span class="gt_from_md">1.02 (0.19, 1.85)<br>1980–2000</span></span></td></tr>
    <tr><th id="stub_1_5" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="Mg=="><span class="gt_from_md">2</span></span></th>
<td headers="Norway stub_1_5 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="MC40NSAoLTAuNDQsIDEuMzUpPGJyPjE5OTHigJMyMDAy"><span class="gt_from_md">0.45 (-0.44, 1.35)<br>1991–2002</span></span></td>
<td headers="Norway stub_1_5 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="MC4xNyAoLTAuNjksIDEuMDQpPGJyPjE5OTDigJMyMDAw"><span class="gt_from_md">0.17 (-0.69, 1.04)<br>1990–2000</span></span></td>
<td headers="Norway stub_1_5 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="MS43OCAoMS4zOSwgMi4xOCk8YnI+MTk4MuKAkzIwMTE="><span class="gt_from_md">1.78 (1.39, 2.18)<br>1982–2011</span></span></td>
<td headers="Norway stub_1_5 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="Mi42NCAoMS4yNSwgNC4wNSk8YnI+MjAwMOKAkzIwMTM="><span class="gt_from_md">2.64 (1.25, 4.05)<br>2000–2013</span></span></td></tr>
    <tr><th id="stub_1_6" scope="row" class="gt_row gt_right gt_stub"><span data-qmd-base64="Mw=="><span class="gt_from_md">3</span></span></th>
<td headers="Norway stub_1_6 Incidence_Male" class="gt_row gt_center"><span data-qmd-base64="NC4zOSAoNC4wOCwgNC42OSk8YnI+MjAwMuKAkzIwMjA="><span class="gt_from_md">4.39 (4.08, 4.69)<br>2002–2020</span></span></td>
<td headers="Norway stub_1_6 Incidence_Female" class="gt_row gt_center"><span data-qmd-base64="My42NyAoMy4zOSwgMy45NSk8YnI+MjAwMOKAkzIwMjA="><span class="gt_from_md">3.67 (3.39, 3.95)<br>2000–2020</span></span></td>
<td headers="Norway stub_1_6 Mortality_Male" class="gt_row gt_center"><span data-qmd-base64="LTIuMDUgKC0zLjg3LCAtMC4yKTxicj4yMDEx4oCTMjAyMA=="><span class="gt_from_md">-2.05 (-3.87, -0.2)<br>2011–2020</span></span></td>
<td headers="Norway stub_1_6 Mortality_Female" class="gt_row gt_center"><span data-qmd-base64="LTUuMjggKC04LjQxLCAtMi4wNSk8YnI+MjAxM+KAkzIwMjA="><span class="gt_from_md">-5.28 (-8.41, -2.05)<br>2013–2020</span></span></td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="comparing-countries" class="level3">
<h3 class="anchored" data-anchor-id="comparing-countries">Comparing countries</h3>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true">Incidence</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false">Mortality</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="cell">
<details open="">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">mdl_inc <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">glm</span>(</span>
<span id="cb23-2">  <span class="at" style="color: #657422;">data =</span> rate_df,</span>
<span id="cb23-3">  <span class="at" style="color: #657422;">formula =</span> count <span class="sc" style="color: #5E5E5E;">~</span> year <span class="sc" style="color: #5E5E5E;">+</span> sex <span class="sc" style="color: #5E5E5E;">+</span> country,</span>
<span id="cb23-4">  <span class="at" style="color: #657422;">offset =</span> <span class="fu" style="color: #4758AB;">log</span>(population),</span>
<span id="cb23-5">  <span class="at" style="color: #657422;">family =</span> <span class="fu" style="color: #4758AB;">poisson</span>(<span class="at" style="color: #657422;">link =</span> <span class="st" style="color: #20794D;">"log"</span>),</span>
<span id="cb23-6">  <span class="at" style="color: #657422;">subset =</span> type <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Incidence"</span></span>
<span id="cb23-7">)</span>
<span id="cb23-8"></span>
<span id="cb23-9">broom<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tidy</span>(mdl_inc, <span class="at" style="color: #657422;">conf.int =</span> <span class="cn" style="color: #8f5902;">TRUE</span>, <span class="at" style="color: #657422;">exponentiate =</span> <span class="cn" style="color: #8f5902;">TRUE</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb23-10">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="fu" style="color: #4758AB;">across</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">4</span>, <span class="dv" style="color: #AD0000;">6</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">7</span>), round, <span class="dv" style="color: #AD0000;">3</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 7 × 7
  term           estimate std.error statistic   p.value conf.low conf.high
  &lt;chr&gt;             &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;     &lt;dbl&gt;
1 (Intercept)       0         0.383   -205.   0            0         0    
2 year              1.04      0        185.   0            1.04      1.04 
3 sexMale           1.01      0.004      1.90 5.68e-  2    1         1.02 
4 countryFinland    0.649     0.007    -62.6  0            0.64      0.658
5 countryIceland    0.494     0.028    -25.6  1.86e-144    0.468     0.521
6 countryNorway     1.05      0.006      8.09 5.84e- 16    1.04      1.06 
7 countrySweden     0.936     0.005    -12.1  1.41e- 33    0.926     0.946</code></pre>
</div>
</div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="cell">
<details open="">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">mdl_mor <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">glm</span>(</span>
<span id="cb25-2">  <span class="at" style="color: #657422;">data =</span> rate_df,</span>
<span id="cb25-3">  <span class="at" style="color: #657422;">formula =</span> count <span class="sc" style="color: #5E5E5E;">~</span> year <span class="sc" style="color: #5E5E5E;">+</span> sex <span class="sc" style="color: #5E5E5E;">+</span> country,</span>
<span id="cb25-4">  <span class="at" style="color: #657422;">offset =</span> <span class="fu" style="color: #4758AB;">log</span>(population),</span>
<span id="cb25-5">  <span class="at" style="color: #657422;">family =</span> <span class="fu" style="color: #4758AB;">poisson</span>(<span class="at" style="color: #657422;">link =</span> <span class="st" style="color: #20794D;">"log"</span>),</span>
<span id="cb25-6">  <span class="at" style="color: #657422;">subset =</span> type <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Mortality"</span></span>
<span id="cb25-7">)</span>
<span id="cb25-8"></span>
<span id="cb25-9">broom<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tidy</span>(mdl_mor, <span class="at" style="color: #657422;">conf.int =</span> <span class="cn" style="color: #8f5902;">TRUE</span>, <span class="at" style="color: #657422;">exponentiate =</span> <span class="cn" style="color: #8f5902;">TRUE</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb25-10">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="fu" style="color: #4758AB;">across</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">4</span>, <span class="dv" style="color: #AD0000;">6</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">7</span>), round, <span class="dv" style="color: #AD0000;">3</span>))</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 7 × 7
  term           estimate std.error statistic   p.value conf.low conf.high
  &lt;chr&gt;             &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;     &lt;dbl&gt;
1 (Intercept)       0         0.844    -40.9  0            0         0    
2 year              1.01      0         29.2  1.53e-187    1.01      1.01 
3 sexMale           1.46      0.01      38.0  0            1.43      1.49 
4 countryFinland    0.731     0.016    -19.2  1.67e- 82    0.708     0.755
5 countryIceland    0.611     0.061     -8.02 1.05e- 15    0.54      0.688
6 countryNorway     1.24      0.015     14.6  3.57e- 48    1.20      1.27 
7 countrySweden     1.03      0.013      2.2  2.78e-  2    1.00      1.06 </code></pre>
</div>
</div>
</div>
</div>
</div>
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>In summary, melanoma trends across the Nordic countries reveal some concerning patterns. Incidence and mortality have been increasing in all countries, with Norway showing the highest mortality rate and the most substantial rise in both incidence and mortality. Denmark saw a particularly rapid increase in melanoma cases between 2002 and 2011 in women and between 2004 and 2009 in men, surpassing Norway during those periods.</p>
<p>However, there is a silver lining: recent years have shown a decrease in melanoma mortality rates across all the Nordic countries. This positive trend suggests that advancements in medical treatments, early detection efforts, and greater public awareness are starting to make a difference. While the rise in incidence remains a challenge, the declining mortality rates provide a hopeful perspective moving forward.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-engholm_nordcan_2010" class="csl-entry">
Engholm, Gerda, Jacques Ferlay, Niels Christensen, Freddie Bray, Marianne L. Gjerstorff, Åsa Klint, Jóanis E. Køtlum, Elínborg Ólafsdóttir, Eero Pukkala, and Hans H. Storm. 2010. <span>“<span>NORDCAN</span> – a Nordic Tool for Cancer Information, Planning, Quality Control and Research.”</span> <em>Acta Oncologica</em> 49 (5): 725–36. <a href="https://doi.org/ch4598">https://doi.org/ch4598</a>.
</div>
<div id="ref-nordcan-2023" class="csl-entry">
Larønningen, S., J. Ferlay, H. Beydogan, F. Bray, G. Engholm, M. Ervik, J. Gulbrandsen, et al. 2022. <span>“<span>NORDCAN</span>: Cancer Incidence, Mortality, Prevalence and Survival in the Nordic Countries, Version 9.2 (23.06.2022).”</span> 2022. <a href="https://nordcan.iarc.fr/">https://nordcan.iarc.fr/</a>.
</div>
<div id="ref-muggeo_segmented_2008" class="csl-entry">
Muggeo, Vito M. R. 2008. <span>“Segmented: An R Package to Fit Regression Models with Broken-Line Relationships.”</span> <em>R News</em> 8 (1): 20–25. <a href="https://cran.r-project.org/doc/Rnews/">https://cran.r-project.org/doc/Rnews/</a>.
</div>
<div id="ref-cancer_registry_of_norway_cancer_2022-1" class="csl-entry">
Cancer Registry of Norway. 2022. <span>“Cancer in <span>Norway</span> 2021: <span>Cancer</span> Incidence, Mortality, Survival and Prevalence in <span>Norway</span>.”</span> 0806-3621. Cancer Registry of Norway. <a href="https://www.kreftregisteret.no/globalassets/cancer-in-norway/2021/cin_report.pdf">https://www.kreftregisteret.no/globalassets/cancer-in-norway/2021/cin_report.pdf</a>.
</div>
<div id="ref-r_core_team_r_2020" class="csl-entry">
R Core Team. 2020. <span>“R: <span>A</span> Language and Environment for Statistical Computing.”</span> Manual. Vienna, Austria. <a href="https://www.R-project.org/">https://www.R-project.org/</a>.
</div>
<div id="ref-welch_rapid_2021" class="csl-entry">
Welch, H. Gilbert, Benjamin L. Mazer, and Adewole S. Adamson. 2021. <span>“The <span>Rapid</span> <span>Rise</span> in <span>Cutaneous</span> <span>Melanoma</span> <span>Diagnoses</span>.”</span> <em>New England Journal of Medicine</em> 384 (1): 72–79. <a href="https://doi.org/gm6vs4">https://doi.org/gm6vs4</a>.
</div>
<div id="ref-whiteman_growing_2016" class="csl-entry">
Whiteman, David C., Adele C. Green, and Catherine M. Olsen. 2016. <span>“The <span>Growing</span> <span>Burden</span> of <span>Invasive</span> <span>Melanoma</span>: <span>Projections</span> of <span>Incidence</span> <span>Rates</span> and <span>Numbers</span> of <span>New</span> <span>Cases</span> in <span>Six</span> <span>Susceptible</span> <span>Populations</span> Through 2031.”</span> <em>The Journal of Investigative Dermatology</em> 136 (6): 1161–71. <a href="https://doi.org/f8psv4">https://doi.org/f8psv4</a>.
</div>
</div>


<!-- -->

</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p><code>https://gco.iarc.fr/gateway_prod/api/nordcan/v2/92/data/population/{type}/{sex}/({country})/({cancer})/?ages_group=5_17&amp;year_start=1980&amp;year_end=2020&amp;year_grouped=0</code>↩︎</p></li>
</ol>
</section></div> ]]></description>
  <guid>https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/index.html</guid>
  <pubDate>Mon, 19 Jun 2023 22:00:00 GMT</pubDate>
  <media:content url="https://mathatistics.com/blog/posts/2023-06-20-nordic-melanoma/Plots/Sgmt-all.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Age Adjusted Rates in Epidemiology</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2022-12-02-age-adjusted-rates/index.html</link>
  <description><![CDATA[ 



<p>In general, a rate describes how fast something changes, usually over time. Epidemiology uses rates to describe how quickly a disease occurs in a population. For example, 35 cases of melanoma per 100,000 persons per year conveys a sense of how fast the disease is spreading in that population. Incidence rate and mortality rate are two examples that we will discuss further below.</p>
<blockquote class="blockquote">
<p>In epidemiology, rate measures the frequency of occurrence of an event in a given population over a certain period of time<sup>1</sup>.</p>
</blockquote>
<p>Let us use melanoma as an outcome in the following discussion. Here, we can calculate a crude incidence rate as,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BR%7D%20=%20%5Cfrac%7B%5Ctextsf%7Bno.%20of%20melanoma%20cases%7D%7D%7B%5Ctextsf%7Bno.%20of%20person-year%7D%7D%20%5Ctimes%20%5Ctextsf%7Bsome%20multiplier%7D"></p>
<p>For the mortality rate, we replace the numerator of the above expression with the number of melanoma deaths.</p>
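<p>As a quick check of the formula, the calculation can be sketched in a few lines of R. The counts below are hypothetical, chosen only to reproduce the 35 per 100,000 figure mentioned earlier, and the variable names are illustrative rather than taken from any dataset used in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical inputs (not real data)
cases        &lt;- 35      # melanoma cases observed
person_years &lt;- 100000  # person-years at risk
multiplier   &lt;- 100000  # report the rate per 100,000 person-years

# Crude rate = cases / person-years * multiplier
crude_rate &lt;- cases / person_years * multiplier
crude_rate
#&gt; [1] 35</code></pre></div>
<p>Replacing <code>cases</code> with the number of melanoma deaths would give the crude mortality rate in the same way.</p>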
<section id="age-specific-rate" class="level2">
<h2 class="anchored" data-anchor-id="age-specific-rate">Age-specific rate</h2>
<p>Whether to understand the broader perspective or to compare across populations, these rates are often analyzed stratified by sex and age. Stratification also helps to remove the confounding effect of these factors. The incidence or mortality rate computed separately for each age group is usually referred to as the <em>age-specific rate</em>. This is often desirable since <code>age</code> has a strong effect on the incidence and mortality of most diseases, especially chronic ones.</p>
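<p>A minimal sketch of age-specific rates, assuming three broad age groups with made-up counts (the groups, numbers and column names are hypothetical and only serve to illustrate the calculation):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical cases and person-years by age group (not real data)
age_df &lt;- data.frame(
  age_group    = c("0-39", "40-69", "70+"),
  cases        = c(10, 60, 90),
  person_years = c(500000, 400000, 100000)
)

# Age-specific rate per 100,000 person-years, one value per age group
age_df$rate &lt;- age_df$cases / age_df$person_years * 1e5
age_df</code></pre></div>
<p>With these numbers the rates are roughly 2, 15 and 90 per 100,000 person-years, which shows how strongly the rate can depend on age even within a single population.</p>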
</section>
<section id="age-adjusted-rate" class="level2">
<h2 class="anchored" data-anchor-id="age-adjusted-rate">Age-adjusted rate</h2>
<p>Many research articles, however, present age-adjusted rates. Age-adjusted rates are standardized (weighted) using the age structure of some standard population. For example, many European studies on melanoma use the European standard age distribution, while other studies have used the world standard population. Cancer registries sometimes use the age structure of their own country in a given year in their reports. For instance, Norway<sup>2</sup> and Finland<sup>3</sup> have used their 2014 populations as the standard population in their recent cancer reports, while Australia has used the 2001 Australian population<sup>4</sup>.</p>
<p>Standardized (adjusted) rates make comparison between populations possible. Figure&nbsp;1 shows the difference in age distribution between the world population and the European population. The following table lists some of the standard populations often used in such studies. For more on standard populations, see <a href="https://seer.cancer.gov/stdpopulations/stdpop.19ages.html"><code>seer.cancer.gov</code></a><sup>5</sup>.</p>
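<p>To make the weighting concrete, here is a small sketch of direct standardization, where the age-adjusted rate is the weighted average of the age-specific rates with weights taken from a standard population. All counts and the <code>std_weight</code> column are hypothetical and do not correspond to any of the standard populations listed in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical study population and standard population (not real data)
age_df &lt;- data.frame(
  age_group    = c("0-39", "40-69", "70+"),
  cases        = c(10, 60, 90),
  person_years = c(500000, 400000, 100000),
  std_weight   = c(55000, 35000, 10000)  # made-up standard population
)

# Age-specific rates per 100,000 person-years
age_df$rate &lt;- with(age_df, cases / person_years * 1e5)

# Crude rate: ignores the age structure
crude_rate &lt;- with(age_df, sum(cases) / sum(person_years) * 1e5)

# Age-adjusted rate: age-specific rates weighted by the standard population
adjusted_rate &lt;- with(age_df, sum(rate * std_weight) / sum(std_weight))

c(crude = crude_rate, adjusted = adjusted_rate)</code></pre></div>
<p>With these numbers the crude rate is 16 and the adjusted rate is about 15.35 per 100,000 person-years. The adjusted rate is lower because the hypothetical standard population is slightly younger than the study population, so the low rates in the youngest group receive more weight.</p>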
<section id="standard-population-by-age" class="level3">
<h3 class="anchored" data-anchor-id="standard-population-by-age">Standard Population by Age</h3>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Table</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Plot</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<details>
<summary>Standard Population</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">us2000 <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb1-2">  <span class="st" style="color: #20794D;">`</span><span class="at" style="color: #657422;">AgeGroup</span><span class="st" style="color: #20794D;">`</span> <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-3">    <span class="st" style="color: #20794D;">"0"</span>, <span class="st" style="color: #20794D;">"1-4"</span>, <span class="st" style="color: #20794D;">"5-9"</span>, <span class="st" style="color: #20794D;">"10-14"</span>, <span class="st" style="color: #20794D;">"15-19"</span>, </span>
<span id="cb1-4">    <span class="st" style="color: #20794D;">"20-24"</span>, <span class="st" style="color: #20794D;">"25-29"</span>, <span class="st" style="color: #20794D;">"30-34"</span>, <span class="st" style="color: #20794D;">"35-39"</span>, <span class="st" style="color: #20794D;">"40-44"</span>, <span class="st" style="color: #20794D;">"45-49"</span>, <span class="st" style="color: #20794D;">"50-54"</span>,</span>
<span id="cb1-5">    <span class="st" style="color: #20794D;">"55-59"</span>, <span class="st" style="color: #20794D;">"60-64"</span>, <span class="st" style="color: #20794D;">"65-69"</span>, <span class="st" style="color: #20794D;">"70-74"</span>, <span class="st" style="color: #20794D;">"75-79"</span>, <span class="st" style="color: #20794D;">"80-84"</span>, <span class="st" style="color: #20794D;">"85+"</span></span>
<span id="cb1-6">  ), </span>
<span id="cb1-7">  <span class="at" style="color: #657422;">US2000 =</span> <span class="fu" style="color: #4758AB;">c</span>(</span>
<span id="cb1-8">    <span class="dv" style="color: #AD0000;">13818</span>, <span class="dv" style="color: #AD0000;">55317</span>, <span class="dv" style="color: #AD0000;">72533</span>, <span class="dv" style="color: #AD0000;">73032</span>, <span class="dv" style="color: #AD0000;">72169</span>, <span class="dv" style="color: #AD0000;">66478</span>, <span class="dv" style="color: #AD0000;">64529</span>, <span class="dv" style="color: #AD0000;">71044</span>, <span class="dv" style="color: #AD0000;">80762</span>, <span class="dv" style="color: #AD0000;">81851</span>, </span>
<span id="cb1-9">    <span class="dv" style="color: #AD0000;">72118</span>, <span class="dv" style="color: #AD0000;">62716</span>, <span class="dv" style="color: #AD0000;">48454</span>, <span class="dv" style="color: #AD0000;">38793</span>, <span class="dv" style="color: #AD0000;">34264</span>, <span class="dv" style="color: #AD0000;">31773</span>, <span class="dv" style="color: #AD0000;">26999</span>, <span class="dv" style="color: #AD0000;">17842</span>, <span class="dv" style="color: #AD0000;">15508</span></span>
<span id="cb1-10">  )</span>
<span id="cb1-11">)</span>
<span id="cb1-12">std_pop <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(popEpi<span class="sc" style="color: #5E5E5E;">::</span>stdpop18) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">agegroup =</span> <span class="fu" style="color: #4758AB;">case_when</span>(</span>
<span id="cb1-14">    agegroup <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"85"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"85+"</span>, </span>
<span id="cb1-15">    <span class="cn" style="color: #8f5902;">TRUE</span> <span class="sc" style="color: #5E5E5E;">~</span> agegroup</span>
<span id="cb1-16">  )) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb1-17">    <span class="at" style="color: #657422;">age =</span> <span class="fu" style="color: #4758AB;">case_when</span>(</span>
<span id="cb1-18">      agegroup <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"0-4"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="fu" style="color: #4758AB;">list</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"0"</span>, <span class="st" style="color: #20794D;">"1-4"</span>)),</span>
<span id="cb1-19">      <span class="cn" style="color: #8f5902;">TRUE</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="fu" style="color: #4758AB;">as.list</span>(agegroup)</span>
<span id="cb1-20">    )</span>
<span id="cb1-21">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>(age) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">rename_with</span>(</span>
<span id="cb1-22">    <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Age Group"</span>, <span class="st" style="color: #20794D;">"World"</span>, <span class="st" style="color: #20794D;">"Europe"</span>, <span class="st" style="color: #20794D;">"Nordic"</span>, <span class="st" style="color: #20794D;">"AgeGroup"</span>)</span>
<span id="cb1-23">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">left_join</span>(us2000) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-24">  tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">mutate</span>(<span class="fu" style="color: #4758AB;">across</span>(<span class="fu" style="color: #4758AB;">everything</span>(), <span class="cf" style="color: #003B4F;">function</span>(x) {</span>
<span id="cb1-25">    <span class="cf" style="color: #003B4F;">if</span> (x[<span class="dv" style="color: #AD0000;">1</span>] <span class="sc" style="color: #5E5E5E;">==</span> x[<span class="dv" style="color: #AD0000;">2</span>]) x[<span class="dv" style="color: #AD0000;">2</span>] <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="cn" style="color: #8f5902;">NA</span></span>
<span id="cb1-26">    <span class="fu" style="color: #4758AB;">return</span>(x)</span>
<span id="cb1-27">  })) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb1-28">    <span class="at" style="color: #657422;">AgeGroup =</span> <span class="fu" style="color: #4758AB;">c</span>(AgeGroup[<span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">2</span>], <span class="fu" style="color: #4758AB;">rep</span>(<span class="cn" style="color: #8f5902;">NA</span>, <span class="fu" style="color: #4758AB;">length</span>(AgeGroup) <span class="sc" style="color: #5E5E5E;">-</span> <span class="dv" style="color: #AD0000;">2</span>))</span>
<span id="cb1-29">  )</span>
<span id="cb1-30"></span>
<span id="cb1-31">gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">gt</span>(std_pop) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-32">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">sub_missing</span>(<span class="at" style="color: #657422;">missing_text =</span> <span class="st" style="color: #20794D;">""</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-33">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">cols_label</span>(<span class="at" style="color: #657422;">AgeGroup =</span> <span class="st" style="color: #20794D;">""</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-34">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">opt_vertical_padding</span>(<span class="fl" style="color: #AD0000;">0.5</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-35">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_style</span>(</span>
<span id="cb1-36">    <span class="at" style="color: #657422;">style =</span> gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">cell_borders</span>(<span class="at" style="color: #657422;">sides =</span> <span class="st" style="color: #20794D;">"top"</span>, <span class="at" style="color: #657422;">weight =</span> <span class="st" style="color: #20794D;">"0"</span>),</span>
<span id="cb1-37">    <span class="at" style="color: #657422;">locations =</span> gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">cells_body</span>(<span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">4</span>, <span class="at" style="color: #657422;">rows =</span> <span class="dv" style="color: #AD0000;">2</span>)</span>
<span id="cb1-38">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-39">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_options</span>(<span class="at" style="color: #657422;">column_labels.font.weight =</span> <span class="st" style="color: #20794D;">"bold"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb1-40">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_header</span>(<span class="st" style="color: #20794D;">"Standard Population by Age Group"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="gizhvaeihp" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#gizhvaeihp table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#gizhvaeihp thead, #gizhvaeihp tbody, #gizhvaeihp tfoot, #gizhvaeihp tr, #gizhvaeihp td, #gizhvaeihp th {
  border-style: none;
}

#gizhvaeihp p {
  margin: 0;
  padding: 0;
}

#gizhvaeihp .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#gizhvaeihp .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#gizhvaeihp .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#gizhvaeihp .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 1px;
  padding-bottom: 3px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#gizhvaeihp .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#gizhvaeihp .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gizhvaeihp .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#gizhvaeihp .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 3.5px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#gizhvaeihp .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#gizhvaeihp .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#gizhvaeihp .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#gizhvaeihp .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 2.5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#gizhvaeihp .gt_spanner_row {
  border-bottom-style: hidden;
}

#gizhvaeihp .gt_group_heading {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#gizhvaeihp .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#gizhvaeihp .gt_from_md > :first-child {
  margin-top: 0;
}

#gizhvaeihp .gt_from_md > :last-child {
  margin-bottom: 0;
}

#gizhvaeihp .gt_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#gizhvaeihp .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#gizhvaeihp .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#gizhvaeihp .gt_row_group_first td {
  border-top-width: 2px;
}

#gizhvaeihp .gt_row_group_first th {
  border-top-width: 2px;
}

#gizhvaeihp .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#gizhvaeihp .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#gizhvaeihp .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#gizhvaeihp .gt_last_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gizhvaeihp .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#gizhvaeihp .gt_first_grand_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#gizhvaeihp .gt_last_grand_summary_row_top {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#gizhvaeihp .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#gizhvaeihp .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gizhvaeihp .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#gizhvaeihp .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#gizhvaeihp .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#gizhvaeihp .gt_sourcenote {
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#gizhvaeihp .gt_left {
  text-align: left;
}

#gizhvaeihp .gt_center {
  text-align: center;
}

#gizhvaeihp .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#gizhvaeihp .gt_font_normal {
  font-weight: normal;
}

#gizhvaeihp .gt_font_bold {
  font-weight: bold;
}

#gizhvaeihp .gt_font_italic {
  font-style: italic;
}

#gizhvaeihp .gt_super {
  font-size: 65%;
}

#gizhvaeihp .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#gizhvaeihp .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#gizhvaeihp .gt_indent_1 {
  text-indent: 5px;
}

#gizhvaeihp .gt_indent_2 {
  text-indent: 10px;
}

#gizhvaeihp .gt_indent_3 {
  text-indent: 15px;
}

#gizhvaeihp .gt_indent_4 {
  text-indent: 20px;
}

#gizhvaeihp .gt_indent_5 {
  text-indent: 25px;
}

#gizhvaeihp .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#gizhvaeihp div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_heading">
      <td colspan="6" class="gt_heading gt_title gt_font_normal gt_bottom_border" style="">Standard Population by Age Group</td>
    </tr>
    
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Age-Group">Age Group</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="World">World</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Europe">Europe</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Nordic">Nordic</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="AgeGroup"></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="US2000">US2000</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="Age Group" class="gt_row gt_right">0-4</td>
<td headers="World" class="gt_row gt_right">12000</td>
<td headers="Europe" class="gt_row gt_right">8000</td>
<td headers="Nordic" class="gt_row gt_right">5900</td>
<td headers="AgeGroup" class="gt_row gt_right">0</td>
<td headers="US2000" class="gt_row gt_right">13818</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right" style="border-top-width: 0; border-top-style: solid; border-top-color: #000000;"><br></td>
<td headers="World" class="gt_row gt_right" style="border-top-width: 0; border-top-style: solid; border-top-color: #000000;"><br></td>
<td headers="Europe" class="gt_row gt_right" style="border-top-width: 0; border-top-style: solid; border-top-color: #000000;"><br></td>
<td headers="Nordic" class="gt_row gt_right" style="border-top-width: 0; border-top-style: solid; border-top-color: #000000;"><br></td>
<td headers="AgeGroup" class="gt_row gt_right">1-4</td>
<td headers="US2000" class="gt_row gt_right">55317</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">5-9</td>
<td headers="World" class="gt_row gt_right">10000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">6600</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">72533</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">10-14</td>
<td headers="World" class="gt_row gt_right">9000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">6200</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">73032</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">15-19</td>
<td headers="World" class="gt_row gt_right">9000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">5800</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">72169</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">20-24</td>
<td headers="World" class="gt_row gt_right">8000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">6100</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">66478</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">25-29</td>
<td headers="World" class="gt_row gt_right">8000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">6800</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">64529</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">30-34</td>
<td headers="World" class="gt_row gt_right">6000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">7300</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">71044</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">35-39</td>
<td headers="World" class="gt_row gt_right">6000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">7300</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">80762</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">40-44</td>
<td headers="World" class="gt_row gt_right">6000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">7000</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">81851</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">45-49</td>
<td headers="World" class="gt_row gt_right">6000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">6900</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">72118</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">50-54</td>
<td headers="World" class="gt_row gt_right">5000</td>
<td headers="Europe" class="gt_row gt_right">7000</td>
<td headers="Nordic" class="gt_row gt_right">7400</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">62716</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">55-59</td>
<td headers="World" class="gt_row gt_right">4000</td>
<td headers="Europe" class="gt_row gt_right">6000</td>
<td headers="Nordic" class="gt_row gt_right">6100</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">48454</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">60-64</td>
<td headers="World" class="gt_row gt_right">4000</td>
<td headers="Europe" class="gt_row gt_right">5000</td>
<td headers="Nordic" class="gt_row gt_right">4800</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">38793</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">65-69</td>
<td headers="World" class="gt_row gt_right">3000</td>
<td headers="Europe" class="gt_row gt_right">4000</td>
<td headers="Nordic" class="gt_row gt_right">4100</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">34264</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">70-74</td>
<td headers="World" class="gt_row gt_right">2000</td>
<td headers="Europe" class="gt_row gt_right">3000</td>
<td headers="Nordic" class="gt_row gt_right">3900</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">31773</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">75-79</td>
<td headers="World" class="gt_row gt_right">1000</td>
<td headers="Europe" class="gt_row gt_right">2000</td>
<td headers="Nordic" class="gt_row gt_right">3500</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">26999</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">80-84</td>
<td headers="World" class="gt_row gt_right">500</td>
<td headers="Europe" class="gt_row gt_right">1000</td>
<td headers="Nordic" class="gt_row gt_right">2400</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">17842</td></tr>
    <tr><td headers="Age Group" class="gt_row gt_right">85+</td>
<td headers="World" class="gt_row gt_right">500</td>
<td headers="Europe" class="gt_row gt_right">1000</td>
<td headers="Nordic" class="gt_row gt_right">1900</td>
<td headers="AgeGroup" class="gt_row gt_right"><br></td>
<td headers="US2000" class="gt_row gt_right">15508</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell" data-layout-align="center" data-fig-cap-location="bottom">
<details>
<summary>Standard population plot</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">pop_data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(popEpi<span class="sc" style="color: #5E5E5E;">::</span>stdpop18) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">nordic =</span> <span class="cn" style="color: #8f5902;">NULL</span>, <span class="at" style="color: #657422;">id =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="fu" style="color: #4758AB;">n</span>())</span>
<span id="cb2-3"></span>
<span id="cb2-4">pop_data <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-5">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">world =</span> world <span class="sc" style="color: #5E5E5E;">*</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-6">  <span class="fu" style="color: #4758AB;">pivot_longer</span>(<span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">c</span>(world, europe)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-7">  <span class="fu" style="color: #4758AB;">arrange</span>(id) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-8">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(value, <span class="fu" style="color: #4758AB;">reorder</span>(agegroup, id), <span class="at" style="color: #657422;">fill =</span> name)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-9">  <span class="fu" style="color: #4758AB;">geom_col</span>(<span class="at" style="color: #657422;">width =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#f0f0f0"</span>, <span class="at" style="color: #657422;">size =</span> <span class="fl" style="color: #AD0000;">0.2</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-11">  ggthemes<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">scale_fill_economist</span>(<span class="at" style="color: #657422;">labels =</span> stringr<span class="sc" style="color: #5E5E5E;">::</span>str_to_title) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-12">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb2-13">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"none"</span>,</span>
<span id="cb2-14">    <span class="at" style="color: #657422;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;">element_blank</span>(),</span>
<span id="cb2-15">    <span class="at" style="color: #657422;">panel.grid.minor.y =</span> <span class="fu" style="color: #4758AB;">element_line</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"red"</span>)</span>
<span id="cb2-16">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-17">  <span class="fu" style="color: #4758AB;">scale_x_continuous</span>(<span class="at" style="color: #657422;">labels =</span> abs, <span class="at" style="color: #657422;">expand =</span> <span class="fu" style="color: #4758AB;">expansion</span>()) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-18">  <span class="fu" style="color: #4758AB;">scale_y_discrete</span>(<span class="at" style="color: #657422;">expand =</span> <span class="fu" style="color: #4758AB;">expansion</span>()) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-19">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb2-20">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Number of persons"</span>,</span>
<span id="cb2-21">    <span class="at" style="color: #657422;">y =</span> <span class="cn" style="color: #8f5902;">NULL</span>,</span>
<span id="cb2-22">    <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"Population"</span></span>
<span id="cb2-23">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-24">  <span class="fu" style="color: #4758AB;">annotate</span>(</span>
<span id="cb2-25">    <span class="at" style="color: #657422;">x =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="cn" style="color: #8f5902;">Inf</span>, <span class="at" style="color: #657422;">y =</span> <span class="cn" style="color: #8f5902;">Inf</span>,</span>
<span id="cb2-26">    <span class="at" style="color: #657422;">geom =</span> <span class="st" style="color: #20794D;">"text"</span>, <span class="at" style="color: #657422;">label =</span> <span class="st" style="color: #20794D;">"World"</span>,</span>
<span id="cb2-27">    <span class="at" style="color: #657422;">hjust =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="fl" style="color: #AD0000;">0.5</span>, <span class="at" style="color: #657422;">vjust =</span> <span class="fl" style="color: #AD0000;">1.5</span></span>
<span id="cb2-28">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb2-29">  <span class="fu" style="color: #4758AB;">annotate</span>(</span>
<span id="cb2-30">    <span class="at" style="color: #657422;">x =</span> <span class="cn" style="color: #8f5902;">Inf</span>, <span class="at" style="color: #657422;">y =</span> <span class="cn" style="color: #8f5902;">Inf</span>,</span>
<span id="cb2-31">    <span class="at" style="color: #657422;">geom =</span> <span class="st" style="color: #20794D;">"text"</span>, <span class="at" style="color: #657422;">label =</span> <span class="st" style="color: #20794D;">"Europe"</span>,</span>
<span id="cb2-32">    <span class="at" style="color: #657422;">hjust =</span> <span class="fl" style="color: #AD0000;">1.5</span>, <span class="at" style="color: #657422;">vjust =</span> <span class="fl" style="color: #AD0000;">1.5</span></span>
<span id="cb2-33">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div id="fig-age-std" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2022-12-02-age-adjusted-rates/index_files/figure-html/fig-age-std-1.png" class="img-fluid figure-img" style="width:75.0%"></p>
<p></p><figcaption class="figure-caption">Figure&nbsp;1: World and European Standard Population</figcaption><p></p>
</figure>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
</section>
<section id="calculating-age-standardized-rate" class="level2">
<h2 class="anchored" data-anchor-id="calculating-age-standardized-rate">Calculating age-standardized rate</h2>
<p>The age-standardized rate (ASR) is calculated as,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BAge.Std.%20Rate%7D%20=%20%5Cfrac%7B%5Csum_i%5Cmathcal%7BR%7D_i%5Cmathcal%7Bw%7D_i%7D%7B%5Csum_i%5Cmathcal%7Bw%7D_i%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BR%7D_i"> is the age-specific rate and <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7Bw%7D_i"> is the weight corresponding to the <img src="https://latex.codecogs.com/png.latex?i%5E%5Ctext%7Bth%7D"> age group in the reference population.</p>
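<p>As a minimal sketch of this formula, the base-R snippet below computes the weighted average for three made-up age groups. The rates and reference counts are hypothetical numbers chosen for illustration, not values from any dataset used in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical age-specific rates per 100,000 (R_i), one per age group
rates &lt;- c(10, 50, 200)
# Hypothetical reference-population counts (w_i) for the same age groups
weights &lt;- c(5000, 3000, 2000)

# Age-standardized rate: sum(R_i * w_i) / sum(w_i)
asr &lt;- sum(rates * weights) / sum(weights)
asr
#&gt; [1] 60

# Rescaling the weights to proportions makes the denominator equal to 1
props &lt;- prop.table(weights)
sum(rates * props)
#&gt; [1] 60</code></pre></div>
<p>Both forms give the same result, which is why weights converted to proportions can be used with a plain weighted sum, provided each age group contributes its weight exactly once.</p>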
<p>Let’s explore this further with an example using melanoma cases from Australia.</p>
</section>
<section id="example" class="level2">
<h2 class="anchored" data-anchor-id="example">Example</h2>
<p>The following example uses Australian cancer data with 5-year age groups<sup>6</sup>, filtered to melanoma cases from 1982 to 2018. The dataset contains the yearly count and age-specific incidence rate of melanoma for men and women.</p>
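<p>Let us use the World standard population from the table above to find the yearly age-standardized incidence by sex. Note that the <code>ASR</code> column in this dataset holds the age-specific rate, which is then weighted to obtain the age-standardized rate.</p>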
<div class="cell">
<details>
<summary>Age-specific Data</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">std_pop <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(popEpi<span class="sc" style="color: #5E5E5E;">::</span>stdpop18) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;">rename_with</span>(<span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"AgeGroup"</span>, <span class="st" style="color: #20794D;">"World"</span>, <span class="st" style="color: #20794D;">"Europe"</span>, <span class="st" style="color: #20794D;">"Nordic"</span>)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">AgeGroup =</span> <span class="fu" style="color: #4758AB;">case_when</span>(</span>
<span id="cb3-4">    AgeGroup <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"85"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"85+"</span>, </span>
<span id="cb3-5">    <span class="cn" style="color: #8f5902;">TRUE</span> <span class="sc" style="color: #5E5E5E;">~</span> AgeGroup</span>
<span id="cb3-6">  )) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb3-7">    <span class="fu" style="color: #4758AB;">across</span>(World<span class="sc" style="color: #5E5E5E;">:</span>Nordic, prop.table)</span>
<span id="cb3-8">  )</span>
<span id="cb3-9"></span>
<span id="cb3-10">data <span class="ot" style="color: #003B4F;">&lt;-</span> tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">fread</span>(<span class="st" style="color: #20794D;">"melanoma.csv"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-11">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">AgeGroup =</span> <span class="fu" style="color: #4758AB;">case_when</span>(</span>
<span id="cb3-12">    AgeGroup <span class="sc" style="color: #5E5E5E;">%in%</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"85-89"</span>, <span class="st" style="color: #20794D;">"90+"</span>) <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"85+"</span>,</span>
<span id="cb3-13">    <span class="cn" style="color: #8f5902;">TRUE</span> <span class="sc" style="color: #5E5E5E;">~</span> AgeGroup</span>
<span id="cb3-14">  ))</span>
<span id="cb3-15"></span>
<span id="cb3-16">asp_data <span class="ot" style="color: #003B4F;">&lt;-</span> data <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-17">  <span class="fu" style="color: #4758AB;">left_join</span>(</span>
<span id="cb3-18">    std_pop <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">select</span>(AgeGroup, World), </span>
<span id="cb3-19">    <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"AgeGroup"</span></span>
<span id="cb3-20">  )</span>
<span id="cb3-21"></span>
<span id="cb3-22"><span class="fu" style="color: #4758AB;">head</span>(asp_data)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 6
   Year Sex   AgeGroup Count   ASR World
  &lt;int&gt; &lt;chr&gt; &lt;chr&gt;    &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt;
1  1982 Males 0-4          0   0    0.12
2  1982 Males 5-9          0   0    0.1 
3  1982 Males 10-14        6   0.9  0.09
4  1982 Males 15-19       30   4.6  0.09
5  1982 Males 20-24       52   7.7  0.08
6  1982 Males 25-29       83  13.1  0.08</code></pre>
</div>
</div>
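<p>Because the recoding above maps both <code>85-89</code> and <code>90+</code> to <code>85+</code>, a join on <code>AgeGroup</code> attaches the same standard weight to every row carrying that label. One possible check, sketched here in base R on a hypothetical toy table (the column names mirror the output above, but the values are invented), is to sum the joined weights within each year and sex and confirm that they add up to 1 before taking the weighted sum.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical joined data: two rows share the "85+" label after recoding
toy &lt;- data.frame(
  Year = 2000,
  Sex = "Males",
  AgeGroup = c("0-84", "85+", "85+"),
  ASR = c(100, 150, 200),    # invented age-specific rates
  World = c(0.6, 0.4, 0.4)   # invented weights; the "85+" weight appears twice
)

# Total weight per Year and Sex; a value above 1 signals duplicated weights
weight_sums &lt;- aggregate(World ~ Year + Sex, data = toy, FUN = sum)
weight_sums$World
#&gt; [1] 1.4

# Dividing by the total weight, as in the formula, keeps the result a weighted average
sum(toy$ASR * toy$World) / sum(toy$World)</code></pre></div>
<p>How duplicated labels should be resolved, for example by combining counts and populations before computing the rate, depends on the data available and is outside the scope of this sketch.</p>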
<div class="cell">
<details>
<summary>Age-standardized Rate</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">asp <span class="ot" style="color: #003B4F;">&lt;-</span> asp_data <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;">group_by</span>(Year, Sex) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;">summarize</span>(<span class="at" style="color: #657422;">AgeAdjRate =</span> <span class="fu" style="color: #4758AB;">sum</span>(ASR <span class="sc" style="color: #5E5E5E;">*</span> World))</span>
<span id="cb5-4"></span>
<span id="cb5-5">asp <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;">group_by</span>(Sex) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-7">  <span class="fu" style="color: #4758AB;">slice_tail</span>(<span class="dv" style="color: #AD0000;">8</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;">ungroup</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-9">  tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">pivot_wider</span>(</span>
<span id="cb5-10">    <span class="at" style="color: #657422;">names_from =</span> <span class="st" style="color: #20794D;">"Year"</span>,</span>
<span id="cb5-11">    <span class="at" style="color: #657422;">values_from =</span> <span class="st" style="color: #20794D;">"AgeAdjRate"</span></span>
<span id="cb5-12">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-13">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">gt</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-14">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">opt_vertical_padding</span>(<span class="fl" style="color: #AD0000;">0.5</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-15">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">fmt_number</span>(<span class="at" style="color: #657422;">columns =</span> <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-16">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_options</span>(<span class="at" style="color: #657422;">table.width =</span> <span class="st" style="color: #20794D;">"100%"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="haopbzlxlf" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#haopbzlxlf table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#haopbzlxlf thead, #haopbzlxlf tbody, #haopbzlxlf tfoot, #haopbzlxlf tr, #haopbzlxlf td, #haopbzlxlf th {
  border-style: none;
}

#haopbzlxlf p {
  margin: 0;
  padding: 0;
}

#haopbzlxlf .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: 100%;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#haopbzlxlf .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#haopbzlxlf .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#haopbzlxlf .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 1px;
  padding-bottom: 3px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#haopbzlxlf .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#haopbzlxlf .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#haopbzlxlf .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#haopbzlxlf .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 3.5px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#haopbzlxlf .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#haopbzlxlf .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#haopbzlxlf .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#haopbzlxlf .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 2.5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#haopbzlxlf .gt_spanner_row {
  border-bottom-style: hidden;
}

#haopbzlxlf .gt_group_heading {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#haopbzlxlf .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#haopbzlxlf .gt_from_md > :first-child {
  margin-top: 0;
}

#haopbzlxlf .gt_from_md > :last-child {
  margin-bottom: 0;
}

#haopbzlxlf .gt_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#haopbzlxlf .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#haopbzlxlf .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#haopbzlxlf .gt_row_group_first td {
  border-top-width: 2px;
}

#haopbzlxlf .gt_row_group_first th {
  border-top-width: 2px;
}

#haopbzlxlf .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#haopbzlxlf .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#haopbzlxlf .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#haopbzlxlf .gt_last_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#haopbzlxlf .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#haopbzlxlf .gt_first_grand_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#haopbzlxlf .gt_last_grand_summary_row_top {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#haopbzlxlf .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#haopbzlxlf .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#haopbzlxlf .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#haopbzlxlf .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#haopbzlxlf .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#haopbzlxlf .gt_sourcenote {
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#haopbzlxlf .gt_left {
  text-align: left;
}

#haopbzlxlf .gt_center {
  text-align: center;
}

#haopbzlxlf .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#haopbzlxlf .gt_font_normal {
  font-weight: normal;
}

#haopbzlxlf .gt_font_bold {
  font-weight: bold;
}

#haopbzlxlf .gt_font_italic {
  font-style: italic;
}

#haopbzlxlf .gt_super {
  font-size: 65%;
}

#haopbzlxlf .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#haopbzlxlf .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#haopbzlxlf .gt_indent_1 {
  text-indent: 5px;
}

#haopbzlxlf .gt_indent_2 {
  text-indent: 10px;
}

#haopbzlxlf .gt_indent_3 {
  text-indent: 15px;
}

#haopbzlxlf .gt_indent_4 {
  text-indent: 20px;
}

#haopbzlxlf .gt_indent_5 {
  text-indent: 25px;
}

#haopbzlxlf .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#haopbzlxlf div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="Sex">Sex</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2011">2011</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2012">2012</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2013">2013</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2014">2014</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2015">2015</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2016">2016</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2017">2017</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="a2018">2018</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="Sex" class="gt_row gt_left">Females</td>
<td headers="2011" class="gt_row gt_right">28.91</td>
<td headers="2012" class="gt_row gt_right">29.69</td>
<td headers="2013" class="gt_row gt_right">30.30</td>
<td headers="2014" class="gt_row gt_right">30.52</td>
<td headers="2015" class="gt_row gt_right">30.86</td>
<td headers="2016" class="gt_row gt_right">32.16</td>
<td headers="2017" class="gt_row gt_right">31.51</td>
<td headers="2018" class="gt_row gt_right">32.40</td></tr>
    <tr><td headers="Sex" class="gt_row gt_left">Males</td>
<td headers="2011" class="gt_row gt_right">41.43</td>
<td headers="2012" class="gt_row gt_right">42.38</td>
<td headers="2013" class="gt_row gt_right">43.13</td>
<td headers="2014" class="gt_row gt_right">42.84</td>
<td headers="2015" class="gt_row gt_right">43.73</td>
<td headers="2016" class="gt_row gt_right">45.31</td>
<td headers="2017" class="gt_row gt_right">45.27</td>
<td headers="2018" class="gt_row gt_right">45.04</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>Now, let us compare the age-specific rates (crude rates) and the age-adjusted rates with a plot.</p>
<section id="compare-the-age-specific-rates-and-age-adjusted-rates" class="level3">
<h3 class="anchored" data-anchor-id="compare-the-age-specific-rates-and-age-adjusted-rates">Compare the age-specific rates and age-adjusted rates</h3>
<div class="cell">
<details>
<summary>Age-specific rates vs Age-adjusted rates</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;">ggplot</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb6-3">    <span class="at" style="color: #657422;">data =</span> data,</span>
<span id="cb6-4">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb6-5">      <span class="at" style="color: #657422;">x =</span> Year,</span>
<span id="cb6-6">      <span class="at" style="color: #657422;">y =</span> ASR,</span>
<span id="cb6-7">      <span class="at" style="color: #657422;">group =</span> AgeGroup,</span>
<span id="cb6-8">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Crude"</span></span>
<span id="cb6-9">    )</span>
<span id="cb6-10">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;">geom_line</span>(</span>
<span id="cb6-12">    <span class="at" style="color: #657422;">data =</span> asp,</span>
<span id="cb6-13">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb6-14">      <span class="at" style="color: #657422;">x =</span> Year,</span>
<span id="cb6-15">      <span class="at" style="color: #657422;">y =</span> AgeAdjRate,</span>
<span id="cb6-16">      <span class="at" style="color: #657422;">group =</span> <span class="dv" style="color: #AD0000;">1</span>,</span>
<span id="cb6-17">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Age-adjusted"</span></span>
<span id="cb6-18">    )</span>
<span id="cb6-19">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-20">  <span class="fu" style="color: #4758AB;">geom_text</span>(</span>
<span id="cb6-21">    <span class="at" style="color: #657422;">data =</span> data[Year <span class="sc" style="color: #5E5E5E;">==</span> <span class="fu" style="color: #4758AB;">max</span>(Year)],</span>
<span id="cb6-22">    <span class="at" style="color: #657422;">check_overlap =</span> <span class="cn" style="color: #8f5902;">TRUE</span>,</span>
<span id="cb6-23">    <span class="at" style="color: #657422;">size =</span> <span class="fu" style="color: #4758AB;">rel</span>(<span class="dv" style="color: #AD0000;">2</span>),</span>
<span id="cb6-24">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"#0f0f0f"</span>,</span>
<span id="cb6-25">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb6-26">      <span class="at" style="color: #657422;">x =</span> Year <span class="sc" style="color: #5E5E5E;">+</span> <span class="dv" style="color: #AD0000;">2</span>,</span>
<span id="cb6-27">      <span class="at" style="color: #657422;">y =</span> ASR,</span>
<span id="cb6-28">      <span class="at" style="color: #657422;">label =</span> AgeGroup</span>
<span id="cb6-29">    )</span>
<span id="cb6-30">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-31">  <span class="fu" style="color: #4758AB;">scale_x_continuous</span>(<span class="at" style="color: #657422;">breaks =</span> scales<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">breaks_extended</span>(<span class="dv" style="color: #AD0000;">8</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-32">  <span class="fu" style="color: #4758AB;">scale_y_continuous</span>(<span class="at" style="color: #657422;">breaks =</span> scales<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">breaks_extended</span>(<span class="dv" style="color: #AD0000;">8</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-33">  <span class="fu" style="color: #4758AB;">scale_color_manual</span>(<span class="cn" style="color: #8f5902;">NULL</span>, <span class="at" style="color: #657422;">values =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"firebrick"</span>, <span class="st" style="color: #20794D;">"grey"</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-34">  <span class="fu" style="color: #4758AB;">facet_grid</span>(<span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(Sex)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-35">  <span class="fu" style="color: #4758AB;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-36">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb6-37">    <span class="at" style="color: #657422;">panel.border =</span> <span class="fu" style="color: #4758AB;">element_rect</span>(<span class="at" style="color: #657422;">fill =</span> <span class="cn" style="color: #8f5902;">NA</span>, <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"darkgrey"</span>),</span>
<span id="cb6-38">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"inside"</span>,</span>
<span id="cb6-39">    <span class="at" style="color: #657422;">legend.position.inside =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">1</span>),</span>
<span id="cb6-40">    <span class="at" style="color: #657422;">legend.justification =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb6-41">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb6-42">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb6-43">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Diagnosis year"</span>,</span>
<span id="cb6-44">    <span class="at" style="color: #657422;">y =</span> <span class="fu" style="color: #4758AB;">paste</span>(</span>
<span id="cb6-45">      <span class="st" style="color: #20794D;">"Age-adjusted incidence rate"</span>,</span>
<span id="cb6-46">      <span class="st" style="color: #20794D;">"per 100,000 person year"</span>,</span>
<span id="cb6-47">      <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">"</span><span class="sc" style="color: #5E5E5E;">\n</span><span class="st" style="color: #20794D;">"</span></span>
<span id="cb6-48">    )</span>
<span id="cb6-49">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div id="fig-rate-plot" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2022-12-02-age-adjusted-rates/index_files/figure-html/fig-rate-plot-1.png" class="img-fluid figure-img" width="672"></p>
<p></p><figcaption class="figure-caption">Figure&nbsp;2: Age-specific and age-adjusted rates, showing why age adjustment is necessary</figcaption><p></p>
</figure>
</div>
</div>
</div>
</section>
<section id="compare-the-age-adjusted-rates-by-sex" class="level3">
<h3 class="anchored" data-anchor-id="compare-the-age-adjusted-rates-by-sex">Compare the age-adjusted rates by sex</h3>
<div class="cell">
<details>
<summary>Age-adjusted rates by sex</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;">ggplot</span>(asp, <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">x =</span> Year, <span class="at" style="color: #657422;">y =</span> AgeAdjRate, <span class="at" style="color: #657422;">color =</span> Sex)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;">geom_line</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;">geom_point</span>(<span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>, <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;">scale_x_continuous</span>(<span class="at" style="color: #657422;">breaks =</span> scales<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">breaks_extended</span>(<span class="dv" style="color: #AD0000;">8</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;">scale_y_continuous</span>(<span class="at" style="color: #657422;">breaks =</span> scales<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">breaks_extended</span>(<span class="dv" style="color: #AD0000;">8</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-6">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-7">  <span class="fu" style="color: #4758AB;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb7-9">    <span class="at" style="color: #657422;">panel.border =</span> <span class="fu" style="color: #4758AB;">element_rect</span>(<span class="at" style="color: #657422;">fill =</span> <span class="cn" style="color: #8f5902;">NA</span>, <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"darkgrey"</span>),</span>
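<span id="cb7-10">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"inside"</span>, <span class="at" style="color: #657422;">legend.position.inside =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">1</span>),</span>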
<span id="cb7-11">    <span class="at" style="color: #657422;">legend.justification =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb7-12">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-13">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb7-14">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Diagnosis year"</span>,</span>
<span id="cb7-15">    <span class="at" style="color: #657422;">y =</span> <span class="fu" style="color: #4758AB;">paste</span>(</span>
<span id="cb7-16">      <span class="st" style="color: #20794D;">"Age-adjusted incidence rate"</span>,</span>
<span id="cb7-17">      <span class="st" style="color: #20794D;">"per 100,000 person year"</span>,</span>
<span id="cb7-18">      <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">"</span><span class="sc" style="color: #5E5E5E;">\n</span><span class="st" style="color: #20794D;">"</span></span>
<span id="cb7-19">    )</span>
<span id="cb7-20">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb7-21">  <span class="fu" style="color: #4758AB;">expand_limits</span>(<span class="at" style="color: #657422;">y =</span> <span class="dv" style="color: #AD0000;">0</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div id="fig-adj-rate-plot" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2022-12-02-age-adjusted-rates/index_files/figure-html/fig-adj-rate-plot-1.png" class="img-fluid figure-img" width="672"></p>
<p></p><figcaption class="figure-caption">Figure&nbsp;3: Age-adjusted rates by sex for melanoma patients</figcaption><p></p>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="discussion" class="level2">
<h2 class="anchored" data-anchor-id="discussion">Discussion</h2>
<p>Figure&nbsp;2 shows that the incidence of melanoma differs more between age groups in men than in women, and that men also show a sharp increase in the older age groups. In addition, Figure&nbsp;3 shows that males have a higher age-adjusted incidence of melanoma than females in Australia, and that this incidence is increasing over time, with a rapid increase before 1983 followed by a drop.</p>
<p>Age-adjusted rates are useful for comparing rates between populations, but they cannot give the interpretation required for comparisons within a population or over a time period in that population. This is one of the reasons cancer registries use an internal standard (the population structure of their own population) to compute age-adjusted rates.</p>
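<p>As a minimal sketch of why the choice of standard population matters, the example below applies direct standardization to two populations under two different standards. All rates and weights are hypothetical values invented for illustration, and <code>adj_rate()</code> is a helper defined only for this sketch, not a function used elsewhere in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical age-specific rates per 100,000 for two populations
rates_a &lt;- c(young = 5, middle = 30, old = 120)
rates_b &lt;- c(young = 6, middle = 28, old = 110)

# Two hypothetical standard populations (weights sum to 1)
std_young &lt;- c(young = 0.6, middle = 0.3, old = 0.1)
std_old   &lt;- c(young = 0.3, middle = 0.4, old = 0.3)

# Direct standardization: weighted average of the age-specific rates
adj_rate &lt;- function(rates, weights) sum(rates * weights) / sum(weights)

# Rows: populations A and B; columns: the two standards
res &lt;- sapply(
  list(young_standard = std_young, old_standard = std_old),
  function(w) c(A = adj_rate(rates_a, w), B = adj_rate(rates_b, w))
)
res</code></pre></div>
</div>
<p>Under these made-up numbers, population A has an adjusted rate of about 24 with the first standard and about 49.5 with the second, while A stays above B in both cases. The level of an age-adjusted rate therefore depends on the chosen standard, so it is mainly meaningful when compared with other rates adjusted to the same standard.</p>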


<!-- -->

</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section1.html</p></li>
<li id="fn2"><p>https://www.kreftregisteret.no/globalassets/cancer-in-norway/2021/cin_report.pdf</p></li>
<li id="fn3"><p>https://cancerregistry.fi/reports-and-publications/annual-report-on-cancer-in-finland/</p></li>
<li id="fn4"><p>https://www.aihw.gov.au/reports/cancer/cancer-in-australia-2021/summary</p></li>
<li id="fn5"><p>https://seer.cancer.gov/stdpopulations/stdpop.19ages.html</p></li>
<li id="fn6"><p>https://www.aihw.gov.au/reports/cancer/cancer-data-in-australia/data</p></li>
</ol>
</section></div> ]]></description>
  <guid>https://mathatistics.com/blog/posts/2022-12-02-age-adjusted-rates/index.html</guid>
  <pubDate>Mon, 19 Jun 2023 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Simulating data for ANOVA similar to existing dataset for analysis</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index.html</link>
  <description><![CDATA[ 



<p>Simulating data is an important tool in both education and research, and it has been extremely helpful for testing, comparing, and understanding concepts in practical and applied settings.</p>
<p>Often, we use Analysis of Variance (ANOVA) to find out whether different groups result in similar outcomes and whether the differences between them are significant. Some simple examples include:</p>
<ul>
<li>The effect of different diets on the growth of fish</li>
<li>Comparing the heights of three different plant species</li>
<li>The type of flour used for baking bread</li>
</ul>
<p>These are common examples where, in some cases, data are collected by setting up an experiment, and in other cases, they are collected through sampling. This article explains how ANOVA analyzes variance, and in what situations the differences are significant, using both simulated and real data.</p>
<p>Consider the following model with three groups, indexed by <img src="https://latex.codecogs.com/png.latex?i">, and <img src="https://latex.codecogs.com/png.latex?n"> observations per group, indexed by <img src="https://latex.codecogs.com/png.latex?j">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bij%7D%20=%20%5Cmu%20+%20%5Ctau_i%20+%20%5Cvarepsilon_%7Bij%7D,%20%5C;%20i%20=%201,%202,%203%0A%5Ctexttt%7B%20and%20%7D%20j%20=%201,%202,%20%5Cldots%20n%0A"></p>
<p>Here, <img src="https://latex.codecogs.com/png.latex?%5Ctau_i"> is the effect corresponding to group <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon_%7Bij%7D%20%5Csim%20%5Cmathrm%7BN%7D(0,%20%5Csigma%5E2)">, which is the usual assumption of the linear model. The simulation example below describes it in detail.</p>
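<p>To make the model concrete, here is a minimal base-R sketch that draws data from it and fits a one-way ANOVA. The values of <code>n</code>, <code>mu</code>, <code>tau</code>, and <code>sigma</code> are arbitrary assumptions chosen for illustration and are not taken from any dataset.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(2023)                      # arbitrary seed, for reproducibility only

n     &lt;- 20                         # observations per group (assumed)
mu    &lt;- 10                         # overall mean (assumed)
tau   &lt;- c(-2, 0, 2)                # group effects tau_1, tau_2, tau_3 (assumed)
sigma &lt;- 3                          # error standard deviation (assumed)

# Group index i for every observation, then y_ij = mu + tau_i + e_ij
group &lt;- factor(rep(1:3, each = n))
y     &lt;- mu + tau[as.integer(group)] + rnorm(3 * n, mean = 0, sd = sigma)

fit &lt;- aov(y ~ group)               # one-way ANOVA on the simulated data
summary(fit)</code></pre></div>
</div>
<p>Increasing the spread of <code>tau</code> relative to <code>sigma</code> makes the between-group variance dominate the within-group variance, which is the comparison the F-test in the ANOVA table is built on.</p>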
<section id="simulation-example" class="level1">
<h1>Simulation Example</h1>
<p>In this simulation example, I aim to replicate specific elements of the <code>USArrests</code> dataset by simulating 50 cases for three types of crimes: Murder, Assault, and Rape. For each crime, I categorize individuals into three groups based on illiteracy levels and simulate their respective arrest rates. Here’s a concise overview of the process and analysis steps:</p>
<section id="simulation-design" class="level2">
<h2 class="anchored" data-anchor-id="simulation-design">Simulation Design</h2>
<div class="cell">
<details>
<summary>Simulation design</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">sim_design <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb1-2">  <span class="at" style="color: #657422;">Illiteracy =</span> <span class="fu" style="color: #4758AB;">factor</span>(</span>
<span id="cb1-3">    <span class="fu" style="color: #4758AB;">rep</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">2</span>, <span class="dv" style="color: #AD0000;">3</span>), <span class="dv" style="color: #AD0000;">3</span>),</span>
<span id="cb1-4">    <span class="at" style="color: #657422;">labels =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"high"</span>, <span class="st" style="color: #20794D;">"medium"</span>, <span class="st" style="color: #20794D;">"low"</span>)</span>
<span id="cb1-5">  ),</span>
<span id="cb1-6">  <span class="at" style="color: #657422;">Crime =</span> <span class="fu" style="color: #4758AB;">factor</span>(</span>
<span id="cb1-7">    <span class="fu" style="color: #4758AB;">rep</span>(<span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">2</span>, <span class="dv" style="color: #AD0000;">3</span>), <span class="at" style="color: #657422;">each =</span> <span class="dv" style="color: #AD0000;">3</span>),</span>
<span id="cb1-8">    <span class="at" style="color: #657422;">labels =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Murder"</span>, <span class="st" style="color: #20794D;">"Assault"</span>, <span class="st" style="color: #20794D;">"Rape"</span>)</span>
<span id="cb1-9">  ),</span>
<span id="cb1-10">  <span class="at" style="color: #657422;">mean =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">11</span>, <span class="dv" style="color: #AD0000;">8</span>, <span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">214</span>, <span class="dv" style="color: #AD0000;">190</span>, <span class="dv" style="color: #AD0000;">114</span>, <span class="dv" style="color: #AD0000;">23</span>, <span class="dv" style="color: #AD0000;">21</span>, <span class="dv" style="color: #AD0000;">19</span>),</span>
<span id="cb1-11">  <span class="at" style="color: #657422;">sd =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">4</span>, <span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">79</span>, <span class="dv" style="color: #AD0000;">82</span>, <span class="dv" style="color: #AD0000;">55</span>, <span class="dv" style="color: #AD0000;">8</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">10</span>)</span>
<span id="cb1-12">)</span>
<span id="cb1-13"></span>
<span id="cb1-14">sim_design</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 9 × 4
  Illiteracy Crime    mean    sd
  &lt;fct&gt;      &lt;fct&gt;   &lt;dbl&gt; &lt;dbl&gt;
1 high       Murder     11     3
2 medium     Murder      8     4
3 low        Murder      5     3
4 high       Assault   214    79
5 medium     Assault   190    82
6 low        Assault   114    55
7 high       Rape       23     8
8 medium     Rape       21    10
9 low        Rape       19    10</code></pre>
</div>
</div>
<p>Since these data cannot contain negative values, instead of using <code>rnorm</code> from the <code>stats</code> package, I will use the <code>truncnorm</code> package, which is available on GitHub. Other options can be used as well, such as: …</p>
<p>If it is not installed, install the package with <code>remotes::install_github("olafmersmann/truncnorm")</code> or <code>devtools::install_github("olafmersmann/truncnorm")</code>.</p>
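<p>The sketch below illustrates the problem with a plain normal distribution, using only base R. It takes the mean and standard deviation of the low-illiteracy Murder row of the design (5 and 3) and compares <code>rnorm</code> with a hand-rolled inverse-CDF sampler for a normal truncated at zero. This sampler is an illustration of the idea under those assumptions, not the algorithm the <code>truncnorm</code> package uses internally.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)                       # arbitrary seed, for reproducibility only
mu    &lt;- 5                        # mean of the low-illiteracy Murder row
sigma &lt;- 3                        # sd of the same row
n_draws &lt;- 1e4

# Plain normal draws: a share of them falls below zero
x_norm   &lt;- rnorm(n_draws, mean = mu, sd = sigma)
neg_share &lt;- mean(x_norm &lt; 0)
neg_share

# Inverse-CDF sampling from the same normal truncated to [0, Inf):
# draw uniforms between P(X &lt;= 0) and 1, then map them back with qnorm
p_low   &lt;- pnorm(0, mean = mu, sd = sigma)
x_trunc &lt;- qnorm(runif(n_draws, min = p_low, max = 1), mean = mu, sd = sigma)
range(x_trunc)</code></pre></div>
</div>
<p>With these parameters roughly 5% of the plain normal draws are negative, which would be impossible arrest rates, while the truncated draws stay at or above zero.</p>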
<section id="code-for-simulation-analysis-and-plot" class="level3">
<h3 class="anchored" data-anchor-id="code-for-simulation-analysis-and-plot">Code for Simulation, Analysis, and Plot</h3>
<p>Let’s simulate 50 observations of the arrest <code>Rate</code> for each combination of <code>Illiteracy</code> and <code>Crime</code> levels in the simulation design.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">nsim <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="dv" style="color: #AD0000;">50</span></span></code></pre></div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">Simulation Code</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Data from simulation</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">sim_data <span class="ot" style="color: #003B4F;">&lt;-</span> sim_design <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;">group_by</span>(Illiteracy, Crime) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">rate =</span> <span class="fu" style="color: #4758AB;">map2</span>(mean, sd, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb4-4">    <span class="at" style="color: #657422;">Rate =</span> truncnorm<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">rtruncnorm</span>(</span>
<span id="cb4-5">      <span class="at" style="color: #657422;">n =</span> nsim, <span class="at" style="color: #657422;">a =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">b =</span> <span class="cn" style="color: #8f5902;">Inf</span>, <span class="at" style="color: #657422;">mean =</span> .x, <span class="at" style="color: #657422;">sd =</span> .y</span>
<span id="cb4-6">    ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">round</span>()</span>
<span id="cb4-7">  ))) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;">unnest</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">ungroup</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-9">  <span class="fu" style="color: #4758AB;">nest</span>(<span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(Crime))</span></code></pre></div>
</div>
<p>Here, arrest rates were generated from a normal distribution truncated at zero, using <code>rtruncnorm</code> from the <code>truncnorm</code> package, so that only non-negative values are produced while mirroring the means and standard deviations in <code>USArrests</code>. Using normal distributions helps the synthetic data mimic the variation and mean of the actual data, which keeps the simulation realistic.</p>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<details>
<summary>Simulated data by group</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">sim_data</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 3 × 2
  Crime   data                 
  &lt;fct&gt;   &lt;list&gt;               
1 Murder  &lt;tidytable [150 × 4]&gt;
2 Assault &lt;tidytable [150 × 4]&gt;
3 Rape    &lt;tidytable [150 × 4]&gt;</code></pre>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<details>
<summary>Simulated data for Murder</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;">head</span>(sim_data[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Murder"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]])</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 4
  Illiteracy  mean    sd  Rate
  &lt;fct&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 high          11     3     7
2 high          11     3    11
3 high          11     3     9
4 high          11     3    12
5 high          11     3    15
6 high          11     3    13</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<details>
<summary>Simulated data for Assault</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;">head</span>(sim_data[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Assault"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]])</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 4
  Illiteracy  mean    sd  Rate
  &lt;fct&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 high         214    79   178
2 high         214    79   163
3 high         214    79   174
4 high         214    79   234
5 high         214    79   181
6 high         214    79   139</code></pre>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<details>
<summary>Simulated data for Rape</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;">head</span>(sim_data[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Rape"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]])</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 4
  Illiteracy  mean    sd  Rate
  &lt;fct&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 high          23     8    25
2 high          23     8    22
3 high          23     8    16
4 high          23     8    15
5 high          23     8    35
6 high          23     8    23</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p>Using the simulated data above, we now fit an ANOVA model with <code>Illiteracy</code> as a factor (group) variable that affects the arrest <code>Rate</code> (response variable), separately for each <code>Crime</code>. I have also made a density plot of the simulated <code>Rate</code> variable and plotted it together with a normal curve with the corresponding mean and standard deviation. The code below fits the ANOVA model and creates the density plot and the box plot. We will also perform a post-hoc test using Tukey’s method to make pairwise comparisons of the different <code>Illiteracy</code> levels.</p>
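<p>Before the full pipeline, the same steps can be sketched on a single toy dataset using only base R. The data frame <code>toy</code> and its group means are assumptions made for illustration, and <code>aov</code> is called here with a formula rather than on a fitted model.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(42)
# Toy data: three groups with hypothetical means, 50 observations each
toy &lt;- data.frame(
  Illiteracy = factor(
    rep(c("high", "medium", "low"), each = 50),
    levels = c("high", "medium", "low")
  ),
  Rate = c(rnorm(50, 11, 3), rnorm(50, 8, 3), rnorm(50, 5, 3))
)

fit &lt;- lm(Rate ~ Illiteracy, data = toy)  # one-way ANOVA as a linear model
anova(fit)                                # F-test for any difference between groups

# Tukey's method: all pairwise differences with adjusted intervals
tukey &lt;- TukeyHSD(aov(Rate ~ Illiteracy, data = toy))
tukey$Illiteracy                          # columns: diff, lwr, upr, p adj</code></pre></div>
</div>
<p>The matrix in <code>tukey$Illiteracy</code> has one row per pair of levels, which is the shape that the post-hoc plot relies on (<code>diff</code>, <code>lwr</code> and <code>upr</code>).</p>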
<div class="callout-tip callout callout-style-default callout-captioned">
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-caption-container flex-fill">
Code for Model Fit and Plotting
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true">Model and plot data</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false">Density plot</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false">Box plot</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-4" aria-controls="tabset-3-4" aria-selected="false">Posthoc plot</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">mdl_fit <span class="ot" style="color: #003B4F;">&lt;-</span> sim_data <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb13-3">    <span class="at" style="color: #657422;">Fit =</span> <span class="fu" style="color: #4758AB;">map</span>(data, <span class="sc" style="color: #5E5E5E;">~</span><span class="fu" style="color: #4758AB;">lm</span>(Rate <span class="sc" style="color: #5E5E5E;">~</span> Illiteracy, <span class="at" style="color: #657422;">data =</span> .x)),</span>
<span id="cb13-4">    <span class="at" style="color: #657422;">Summary =</span> <span class="fu" style="color: #4758AB;">map</span>(Fit, summary),</span>
<span id="cb13-5">    <span class="at" style="color: #657422;">Anova =</span> <span class="fu" style="color: #4758AB;">map</span>(Fit, anova),</span>
<span id="cb13-6">    <span class="at" style="color: #657422;">Tukey =</span> <span class="fu" style="color: #4758AB;">map</span>(Fit, aov) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">map</span>(TukeyHSD)</span>
<span id="cb13-7">  )</span>
<span id="cb13-8"></span>
<span id="cb13-9">mdl_est <span class="ot" style="color: #003B4F;">&lt;-</span> mdl_fit <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb13-10">  <span class="fu" style="color: #4758AB;">summarize</span>(</span>
<span id="cb13-11">    <span class="fu" style="color: #4758AB;">across</span>(Summary, map, broom<span class="sc" style="color: #5E5E5E;">::</span>tidy), </span>
<span id="cb13-12">    <span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(Crime)</span>
<span id="cb13-13">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>()</span>
<span id="cb13-14"></span>
<span id="cb13-15">mdl_fit_df <span class="ot" style="color: #003B4F;">&lt;-</span> mdl_fit <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb13-16">  <span class="fu" style="color: #4758AB;">summarize</span>(</span>
<span id="cb13-17">    <span class="fu" style="color: #4758AB;">across</span>(Fit, map, broom<span class="sc" style="color: #5E5E5E;">::</span>augment),</span>
<span id="cb13-18">    <span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(Crime)</span>
<span id="cb13-19">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>()</span>
<span id="cb13-20"></span>
<span id="cb13-21">eff_df <span class="ot" style="color: #003B4F;">&lt;-</span> mdl_fit <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb13-22">  <span class="fu" style="color: #4758AB;">summarize</span>(</span>
<span id="cb13-23">    <span class="fu" style="color: #4758AB;">across</span>(Fit, map, <span class="cf" style="color: #003B4F;">function</span>(.fit) {</span>
<span id="cb13-24">      effects<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">Effect</span>(<span class="st" style="color: #20794D;">"Illiteracy"</span>, .fit) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb13-25">        <span class="fu" style="color: #4758AB;">as_tidytable</span>()</span>
<span id="cb13-26">    }),</span>
<span id="cb13-27">    <span class="at" style="color: #657422;">.by =</span> <span class="st" style="color: #20794D;">"Crime"</span></span>
<span id="cb13-28">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>()</span></code></pre></div>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">tky_df <span class="ot" style="color: #003B4F;">&lt;-</span> mdl_fit <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;">summarize</span>(</span>
<span id="cb15-3">    <span class="fu" style="color: #4758AB;">across</span>(Tukey, <span class="cf" style="color: #003B4F;">function</span>(tky) {</span>
<span id="cb15-4">      <span class="fu" style="color: #4758AB;">map</span>(tky, purrr<span class="sc" style="color: #5E5E5E;">::</span>pluck, <span class="st" style="color: #20794D;">"Illiteracy"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb15-5">        <span class="fu" style="color: #4758AB;">map</span>(as_tidytable, <span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"terms"</span>)</span>
<span id="cb15-6">    }),</span>
<span id="cb15-7">    <span class="at" style="color: #657422;">.by =</span> <span class="st" style="color: #20794D;">"Crime"</span></span>
<span id="cb15-8">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">unnest</span>()</span></code></pre></div>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">density_plot <span class="ot" style="color: #003B4F;">&lt;-</span> sim_data <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;">unnest</span>(data) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(Rate, <span class="at" style="color: #657422;">color =</span> Illiteracy)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-4">    <span class="fu" style="color: #4758AB;">facet_wrap</span>(</span>
<span id="cb16-5">      <span class="at" style="color: #657422;">facets =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime),</span>
<span id="cb16-6">      <span class="at" style="color: #657422;">ncol =</span> <span class="dv" style="color: #AD0000;">3</span>,</span>
<span id="cb16-7">      <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free"</span></span>
<span id="cb16-8">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-9">    <span class="fu" style="color: #4758AB;">geom_density</span>(<span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"Simulated"</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-10">    <span class="fu" style="color: #4758AB;">geom_density</span>(</span>
<span id="cb16-11">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"Fitted"</span>, <span class="at" style="color: #657422;">x =</span> .fitted),</span>
<span id="cb16-12">      <span class="at" style="color: #657422;">data =</span> mdl_fit_df</span>
<span id="cb16-13">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-14">    <span class="fu" style="color: #4758AB;">geom_rug</span>(</span>
<span id="cb16-15">      <span class="at" style="color: #657422;">data =</span> eff_df,</span>
<span id="cb16-16">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">x =</span> fit)</span>
<span id="cb16-17">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-18">    <span class="fu" style="color: #4758AB;">scale_linetype_manual</span>(</span>
<span id="cb16-19">      <span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Simulated"</span>, <span class="st" style="color: #20794D;">"Fitted"</span>),</span>
<span id="cb16-20">      <span class="at" style="color: #657422;">values =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"solid"</span>, <span class="st" style="color: #20794D;">"dashed"</span>)</span>
<span id="cb16-21">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-22">    <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-23">    <span class="fu" style="color: #4758AB;">theme</span>(<span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb16-24">    <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb16-25">      <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Arrest Rate"</span>,</span>
<span id="cb16-26">      <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Density"</span>,</span>
<span id="cb16-27">      <span class="at" style="color: #657422;">linetype =</span> <span class="cn" style="color: #8f5902;">NULL</span></span>
<span id="cb16-28">    )</span></code></pre></div>
</div>
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">effect_plot <span class="ot" style="color: #003B4F;">&lt;-</span> mdl_fit_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(Rate, Illiteracy)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-3">    <span class="fu" style="color: #4758AB;">facet_grid</span>(</span>
<span id="cb17-4">      <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime),</span>
<span id="cb17-5">      <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_x"</span></span>
<span id="cb17-6">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-7">    <span class="fu" style="color: #4758AB;">geom_boxplot</span>(</span>
<span id="cb17-8">      <span class="at" style="color: #657422;">notch =</span> <span class="cn" style="color: #8f5902;">TRUE</span>, </span>
<span id="cb17-9">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"grey"</span>,</span>
<span id="cb17-10">      <span class="at" style="color: #657422;">outlier.colour =</span> <span class="st" style="color: #20794D;">"grey"</span></span>
<span id="cb17-11">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-12">    <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb17-13">      <span class="at" style="color: #657422;">position =</span> <span class="fu" style="color: #4758AB;">position_jitter</span>(<span class="at" style="color: #657422;">height =</span> <span class="fl" style="color: #AD0000;">0.25</span>),</span>
<span id="cb17-14">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"grey"</span>,</span>
<span id="cb17-15">      <span class="at" style="color: #657422;">size =</span> <span class="fu" style="color: #4758AB;">rel</span>(<span class="fl" style="color: #AD0000;">0.9</span>)</span>
<span id="cb17-16">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-17">    <span class="fu" style="color: #4758AB;">geom_pointrange</span>(</span>
<span id="cb17-18">      <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb17-19">        <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"Estimated"</span>,</span>
<span id="cb17-20">        <span class="at" style="color: #657422;">xmin =</span> lower,</span>
<span id="cb17-21">        <span class="at" style="color: #657422;">xmax =</span> upper,</span>
<span id="cb17-22">        <span class="at" style="color: #657422;">x =</span> fit</span>
<span id="cb17-23">      ),</span>
<span id="cb17-24">      <span class="at" style="color: #657422;">data =</span> eff_df</span>
<span id="cb17-25">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-26">    <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb17-27">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"True Mean"</span>, <span class="at" style="color: #657422;">x =</span> mean),</span>
<span id="cb17-28">      <span class="at" style="color: #657422;">data =</span> sim_design</span>
<span id="cb17-29">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-30">    <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(</span>
<span id="cb17-31">      <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"Mean"</span>,</span>
<span id="cb17-32">      <span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span></span>
<span id="cb17-33">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb17-34">    <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb17-35">      <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span></span>
<span id="cb17-36">    )</span></code></pre></div>
</div>
</div>
<div id="tabset-3-4" class="tab-pane" aria-labelledby="tabset-3-4-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">tukey_plot <span class="ot" style="color: #003B4F;">&lt;-</span> tky_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(diff, terms)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-3">    <span class="fu" style="color: #4758AB;">facet_grid</span>(</span>
<span id="cb18-4">      <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime), </span>
<span id="cb18-5">      <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_x"</span></span>
<span id="cb18-6">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-7">    <span class="fu" style="color: #4758AB;">geom_pointrange</span>(</span>
<span id="cb18-8">      <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">xmin =</span> lwr, <span class="at" style="color: #657422;">xmax =</span> upr, <span class="at" style="color: #657422;">x =</span> diff),</span>
<span id="cb18-9">      <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>,</span>
<span id="cb18-10">      <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span></span>
<span id="cb18-11">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-12">    <span class="fu" style="color: #4758AB;">geom_vline</span>(</span>
<span id="cb18-13">      <span class="at" style="color: #657422;">xintercept =</span> <span class="dv" style="color: #AD0000;">0</span>,</span>
<span id="cb18-14">      <span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"dashed"</span>,</span>
<span id="cb18-15">      <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"royalblue"</span></span>
<span id="cb18-16">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-17">    <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(</span>
<span id="cb18-18">      <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"Mean"</span>,</span>
<span id="cb18-19">      <span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span></span>
<span id="cb18-20">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-21">    <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb18-22">      <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Illiteracy"</span>,</span>
<span id="cb18-23">      <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Effect difference"</span>,</span>
<span id="cb18-24">      <span class="at" style="color: #657422;">title =</span> <span class="st" style="color: #20794D;">"Pairwise comparison of levels of illiteracy"</span></span>
<span id="cb18-25">    ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb18-26">    <span class="fu" style="color: #4758AB;">expand_limits</span>(<span class="at" style="color: #657422;">x =</span> <span class="dv" style="color: #AD0000;">0</span>)</span></code></pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
</section>
<section id="analysis" class="level2">
<h2 class="anchored" data-anchor-id="analysis">Analysis</h2>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true">Distribution</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false">Model Fit</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-3" aria-controls="tabset-5-3" aria-selected="false">Effect Plot</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-4" aria-controls="tabset-5-4" aria-selected="false">Post-hoc</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Density of simulated data</summary>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">density_plot</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-13-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
<p>Here, the kernel density plots of the arrest rates are shown alongside the normal density curves. This visual check compares the simulated data with the expected normal distribution. The close match between the kernel density and the normal density suggests that the simulated data are approximately normally distributed, as intended by the simulation design.</p>
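<p>The same comparison can be made numerically. The sketch below is only an illustration on a single made-up sample (the mean and standard deviation are placeholders): it evaluates a kernel density estimate and a normal density on the same grid and reports the largest gap between them.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
# Illustrative sample; mean and sd are placeholders, not design values
x &lt;- rnorm(50, mean = 23, sd = 8)

kde &lt;- density(x)  # kernel density estimate on a grid of points (kde$x)
normal_curve &lt;- dnorm(kde$x, mean = mean(x), sd = sd(x))

# Largest vertical gap between the two curves, relative to the normal peak
max(abs(kde$y - normal_curve)) / max(normal_curve)</code></pre></div>
</div>
<p>A small ratio means the two curves nearly overlap. With only 50 observations per group some wiggle in the kernel density is expected, so this is a rough check rather than a formal test.</p>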
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<p>The one-way ANOVA output below helps determine whether the arrest rate differs by illiteracy level for each crime. For every crime, the high illiteracy level is the reference, and relative to it both the medium and low levels have lower estimated arrest rates. This suggests that higher illiteracy corresponds to a higher arrest rate. However, for assault and rape, the medium-level coefficient has a high p-value (0.714 and 0.438, respectively), so the medium and high illiteracy levels cannot be considered significantly different in arrest rate.</p>
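<p>To read the coefficient tables, it helps to remember that with the default treatment coding the intercept is the estimated mean of the reference level and every other coefficient is a difference from it. The sketch below first recovers the group means from the Murder coefficients reported in the tabs below, then checks the same identity on a toy dataset. The toy data and its variable names are assumptions for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Coefficients copied from the Murder output
intercept &lt;- 10.92  # estimated mean for the reference level (high)
b_medium &lt;- -2.48   # medium minus high
b_low &lt;- -6.00      # low minus high

# Estimated group means implied by the coefficients
c(high = intercept, medium = intercept + b_medium, low = intercept + b_low)

# Same identity on toy data with hypothetical group means
set.seed(7)
toy &lt;- data.frame(
  g = factor(
    rep(c("high", "medium", "low"), each = 50),
    levels = c("high", "medium", "low")
  ),
  y = rnorm(150, mean = rep(c(11, 8, 5), each = 50), sd = 3)
)
group_means &lt;- tapply(toy$y, toy$g, mean)
coefs &lt;- coef(lm(y ~ g, data = toy))

# The medium coefficient equals the difference in sample means
all.equal(coefs[["gmedium"]], group_means[["medium"]] - group_means[["high"]])</code></pre></div>
</div>
<p>For Murder this gives estimated means of 10.92, 8.44 and 4.92 for the high, medium and low levels.</p>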
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-3" aria-controls="tabset-4-3" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Murder</summary>
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Murder"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
   Min     1Q Median     3Q    Max 
 -8.44  -2.31   0.08   1.95   7.56 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)       10.9200     0.4247  25.713  &lt; 2e-16 ***
Illiteracymedium  -2.4800     0.6006  -4.129 6.08e-05 ***
Illiteracylow     -6.0000     0.6006  -9.990  &lt; 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.003 on 147 degrees of freedom
Multiple R-squared:  0.4068,    Adjusted R-squared:  0.3987 
F-statistic:  50.4 on 2 and 147 DF,  p-value: &lt; 2.2e-16</code></pre>
</div>
</div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Assault</summary>
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Assault"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
    Min      1Q  Median      3Q     Max 
-158.32  -42.22   -3.11   38.74  221.68 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)        208.32       9.64  21.609  &lt; 2e-16 ***
Illiteracymedium    -5.00      13.63  -0.367    0.714    
Illiteracylow      -95.42      13.63  -6.999 8.51e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 68.17 on 147 degrees of freedom
Multiple R-squared:  0.2969,    Adjusted R-squared:  0.2873 
F-statistic: 31.03 on 2 and 147 DF,  p-value: 5.709e-12</code></pre>
</div>
</div>
</div>
<div id="tabset-4-3" class="tab-pane" aria-labelledby="tabset-4-3-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Rape</summary>
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Rape"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
   Min     1Q Median     3Q    Max 
-20.10  -6.10  -0.11   5.56  20.90 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)        22.440      1.219  18.403  &lt; 2e-16 ***
Illiteracymedium   -1.340      1.724  -0.777  0.43837    
Illiteracylow      -5.320      1.724  -3.085  0.00243 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.622 on 147 degrees of freedom
Multiple R-squared:  0.06547,   Adjusted R-squared:  0.05276 
F-statistic:  5.15 on 2 and 147 DF,  p-value: 0.006894</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="tabset-5-3" class="tab-pane" aria-labelledby="tabset-5-3-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Boxplot with fitted and true mean</summary>
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">effect_plot</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-17-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
<p>The boxplots show the distribution of arrest rates within each illiteracy group, stratified by crime. Individual points are jittered so that they remain visible, and the fitted means are overlaid with their confidence intervals. For all three crimes, higher illiteracy corresponds to a higher arrest rate, and the pattern is most visible for murder.</p>
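<p>Fitted means with confidence intervals of this kind can be obtained from a fitted one-way model with <code>predict()</code>. The following is a minimal sketch on freshly simulated data: the data frame <code>sim</code>, the group means, and the standard deviation are assumptions chosen for illustration, not the data or the code behind <code>effect_plot</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical data: three illiteracy groups of 50 observations each
set.seed(2023)
sim &lt;- data.frame(
  Illiteracy = factor(
    rep(c("high", "medium", "low"), each = 50),
    levels = c("high", "medium", "low")
  ),
  Rate = rnorm(150, mean = rep(c(11, 8.5, 5), each = 50), sd = 3)
)

# One-way ANOVA fitted as a linear model
fit &lt;- lm(Rate ~ Illiteracy, data = sim)

# One row per group: fitted mean with its 95% confidence interval
new_groups &lt;- data.frame(
  Illiteracy = factor(levels(sim$Illiteracy), levels = levels(sim$Illiteracy))
)
eff &lt;- cbind(
  new_groups,
  predict(fit, newdata = new_groups, interval = "confidence", level = 0.95)
)
eff</code></pre></div>
</div>
<p>In a one-way model the fitted value of each group equals its sample mean, so the <code>fit</code> column can be checked against <code>tapply(sim$Rate, sim$Illiteracy, mean)</code>.</p>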
</div>
<div id="tabset-5-4" class="tab-pane" aria-labelledby="tabset-5-4-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Post-hoc plot comparing pairwise difference</summary>
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">tukey_plot</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-18-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
<p>The post-hoc plot highlights statistically significant differences between illiteracy levels. For murder, all pairs of illiteracy levels differ significantly at the 95% confidence level; for assault and rape, however, there is no significant difference between the medium and high illiteracy levels.</p>
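<p>Pairwise comparisons like these can be computed with Tukey's honest significant difference, available in base R as <code>TukeyHSD()</code>. The sketch below uses hypothetical simulated data (the data frame <code>sim</code> and its group means are assumptions), so it illustrates the technique rather than reproducing <code>tukey_plot</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical data: three illiteracy groups of 50 observations each
set.seed(2023)
sim &lt;- data.frame(
  Illiteracy = factor(
    rep(c("high", "medium", "low"), each = 50),
    levels = c("high", "medium", "low")
  ),
  Rate = rnorm(150, mean = rep(c(11, 8.5, 5), each = 50), sd = 3)
)

# Tukey's HSD needs an aov fit; the result holds one matrix per factor
tukey &lt;- TukeyHSD(aov(Rate ~ Illiteracy, data = sim), conf.level = 0.95)

# Columns: diff (difference in means), lwr and upr (interval), p adj
tukey$Illiteracy</code></pre></div>
</div>
<p>A pair of levels differs significantly at the chosen level when its interval, from <code>lwr</code> to <code>upr</code>, does not contain zero.</p>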
</div>
</div>
</div>
</section>
</section>
<section id="real-data-example" class="level1">
<h1>Real Data Example</h1>
<section id="data-preparation-and-dataset" class="level2">
<h2 class="anchored" data-anchor-id="data-preparation-and-dataset">Data preparation and Dataset</h2>
<p>Here, I have used the <code>USArrests</code> dataset, excluding the <code>UrbanPop</code> variable (which is not a crime rate), and merged it with the <code>Illiteracy</code> variable from another dataset, <code>state.x77</code>, for the 50 states. Illiteracy was then categorized by its quantiles into three categories, <code>low</code>, <code>medium</code>, and <code>high</code>, mimicking the simulation example above.</p>
<div class="cell">
<details>
<summary>Merging USArrests and state.x77</summary>
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">arrest <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(USArrests, <span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"States"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-2">  tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">left_join</span>(</span>
<span id="cb28-3">    <span class="fu" style="color: #4758AB;">as_tidytable</span>(</span>
<span id="cb28-4">      state.x77[, <span class="st" style="color: #20794D;">"Illiteracy"</span>, <span class="at" style="color: #657422;">drop =</span> <span class="cn" style="color: #8f5902;">FALSE</span>],</span>
<span id="cb28-5">      <span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"States"</span></span>
<span id="cb28-6">    ),</span>
<span id="cb28-7">    <span class="at" style="color: #657422;">by =</span> <span class="st" style="color: #20794D;">"States"</span></span>
<span id="cb28-8">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-9">  <span class="fu" style="color: #4758AB;">select</span>(<span class="sc" style="color: #5E5E5E;">-</span>UrbanPop) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-10">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="fu" style="color: #4758AB;">across</span>(Murder<span class="sc" style="color: #5E5E5E;">:</span>Illiteracy, as.numeric)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-11">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">Illiteracy =</span> <span class="fu" style="color: #4758AB;">cut.default</span>(</span>
<span id="cb28-12">    Illiteracy,</span>
<span id="cb28-13">    <span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">quantile</span>(Illiteracy, <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">/</span><span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">/</span><span class="dv" style="color: #AD0000;">3</span>, <span class="dv" style="color: #AD0000;">1</span>)),</span>
<span id="cb28-14">    <span class="at" style="color: #657422;">labels =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"low"</span>, <span class="st" style="color: #20794D;">"medium"</span>, <span class="st" style="color: #20794D;">"high"</span>),</span>
<span id="cb28-15">    <span class="at" style="color: #657422;">include.lowest =</span> <span class="cn" style="color: #8f5902;">TRUE</span></span>
<span id="cb28-16">  )) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-17">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">Illiteracy =</span> <span class="fu" style="color: #4758AB;">factor</span>(</span>
<span id="cb28-18">    Illiteracy,</span>
<span id="cb28-19">    <span class="at" style="color: #657422;">levels =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"high"</span>, <span class="st" style="color: #20794D;">"medium"</span>, <span class="st" style="color: #20794D;">"low"</span>)</span>
<span id="cb28-20">  )) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb28-21">  <span class="fu" style="color: #4758AB;">pivot_longer</span>(</span>
<span id="cb28-22">    <span class="at" style="color: #657422;">cols =</span> Murder<span class="sc" style="color: #5E5E5E;">:</span>Rape,</span>
<span id="cb28-23">    <span class="at" style="color: #657422;">names_to =</span> <span class="st" style="color: #20794D;">"Crime"</span>,</span>
<span id="cb28-24">    <span class="at" style="color: #657422;">values_to =</span> <span class="st" style="color: #20794D;">"Rate"</span></span>
<span id="cb28-25">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">nest</span>(<span class="at" style="color: #657422;">.by =</span> <span class="fu" style="color: #4758AB;">c</span>(Crime))</span></code></pre></div>
</details>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true">Data by Crime</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-3" aria-controls="tabset-6-3" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-4" aria-controls="tabset-6-4" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="cell">
<details>
<summary>Data by group</summary>
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">arrest</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 3 × 2
  Crime   data                
  &lt;chr&gt;   &lt;list&gt;              
1 Murder  &lt;tidytable [50 × 3]&gt;
2 Assault &lt;tidytable [50 × 3]&gt;
3 Rape    &lt;tidytable [50 × 3]&gt;</code></pre>
</div>
</div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="cell">
<details>
<summary>Data for crime: Murder</summary>
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;">head</span>(arrest[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Murder"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]], <span class="dv" style="color: #AD0000;">3</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 3 × 3
  States  Illiteracy  Rate
  &lt;chr&gt;   &lt;fct&gt;      &lt;dbl&gt;
1 Alabama high        13.2
2 Alaska  high        10  
3 Arizona high         8.1</code></pre>
</div>
</div>
</div>
<div id="tabset-6-3" class="tab-pane" aria-labelledby="tabset-6-3-tab">
<div class="cell">
<details>
<summary>Data for crime: Assault</summary>
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;">head</span>(arrest[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Assault"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]], <span class="dv" style="color: #AD0000;">3</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 3 × 3
  States  Illiteracy  Rate
  &lt;chr&gt;   &lt;fct&gt;      &lt;dbl&gt;
1 Alabama high         236
2 Alaska  high         263
3 Arizona high         294</code></pre>
</div>
</div>
</div>
<div id="tabset-6-4" class="tab-pane" aria-labelledby="tabset-6-4-tab">
<div class="cell">
<details>
<summary>Data for crime: Rape</summary>
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="fu" style="color: #4758AB;">head</span>(arrest[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Rape"</span>, data][[<span class="dv" style="color: #AD0000;">1</span>]], <span class="dv" style="color: #AD0000;">3</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 3 × 3
  States  Illiteracy  Rate
  &lt;chr&gt;   &lt;fct&gt;      &lt;dbl&gt;
1 Alabama high        21.2
2 Alaska  high        44.5
3 Arizona high        31  </code></pre>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="analysis-1" class="level2">
<h2 class="anchored" data-anchor-id="analysis-1">Analysis</h2>
<p>I follow the same pattern as in the analysis of the simulated data: a distribution plot, an ANOVA model fit, an effect plot showing the fitted values over a boxplot, and a post-hoc plot showing pairwise comparisons of the effect of illiteracy levels on arrest rate.</p>
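<p>The model-fitting step can be sketched in base R as follows. This is a minimal sketch, not the code behind <code>mdl_fit</code>: the names <code>arrests</code>, <code>crimes</code>, and <code>fits</code> are hypothetical, and each crime is fitted from a wide data frame instead of the nested <code>Rate</code> column used in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Combine the three crime rates with the illiteracy rate of each state
arrests &lt;- data.frame(
  States = rownames(USArrests),
  USArrests[, c("Murder", "Assault", "Rape")],
  Illiteracy = state.x77[rownames(USArrests), "Illiteracy"]
)

# Split illiteracy into three quantile-based groups, "high" as reference
arrests$Illiteracy &lt;- cut(
  arrests$Illiteracy,
  breaks = quantile(arrests$Illiteracy, c(0, 1/3, 2/3, 1)),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)
arrests$Illiteracy &lt;- factor(
  arrests$Illiteracy,
  levels = c("high", "medium", "low")
)

# Fit one linear model per crime, e.g. Murder ~ Illiteracy
crimes &lt;- c("Murder", "Assault", "Rape")
fits &lt;- lapply(setNames(crimes, crimes), function(crime) {
  lm(reformulate("Illiteracy", response = crime), data = arrests)
})

# ANOVA table (F-test for the illiteracy effect) for each crime
lapply(fits, anova)</code></pre></div>
</div>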
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true">Distribution</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false">Fit</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-3" aria-controls="tabset-8-3" aria-selected="false">Effects</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-4" aria-controls="tabset-8-4" aria-selected="false">Post-hoc</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">unnest</span>(arrest), <span class="fu" style="color: #4758AB;">aes</span>(Rate, <span class="at" style="color: #657422;">color =</span> Illiteracy)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;">geom_density</span>(<span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"Simulated"</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-3">  <span class="fu" style="color: #4758AB;">geom_density</span>(</span>
<span id="cb37-4">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"Fitted"</span>, <span class="at" style="color: #657422;">x =</span> .fitted),</span>
<span id="cb37-5">    <span class="at" style="color: #657422;">data =</span> mdl_fit_df</span>
<span id="cb37-6">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-7">  <span class="fu" style="color: #4758AB;">geom_rug</span>(</span>
<span id="cb37-8">    <span class="at" style="color: #657422;">data =</span> eff_df,</span>
<span id="cb37-9">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">x =</span> fit)</span>
<span id="cb37-10">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-11">  <span class="fu" style="color: #4758AB;">scale_linetype_manual</span>(</span>
<span id="cb37-12">    <span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"Simulated"</span>, <span class="st" style="color: #20794D;">"Fitted"</span>),</span>
<span id="cb37-13">    <span class="at" style="color: #657422;">values =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="st" style="color: #20794D;">"solid"</span>, <span class="st" style="color: #20794D;">"dashed"</span>)</span>
<span id="cb37-14">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-15">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-16">  <span class="fu" style="color: #4758AB;">theme</span>(<span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-17">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb37-18">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Crime"</span>,</span>
<span id="cb37-19">    <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Density"</span>,</span>
<span id="cb37-20">    <span class="at" style="color: #657422;">linetype =</span> <span class="cn" style="color: #8f5902;">NULL</span></span>
<span id="cb37-21">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb37-22">  <span class="fu" style="color: #4758AB;">facet_wrap</span>(</span>
<span id="cb37-23">    <span class="at" style="color: #657422;">facets =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime),</span>
<span id="cb37-24">    <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free"</span></span>
<span id="cb37-25">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-25-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
<p>Kernel density curves alongside normal density curves for each crime give a visual check on the normality of the real data. This helps to confirm the normality assumption of ANOVA, so that the real data analysis meets the conditions needed for valid inference.</p>
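<p>A visual check can be complemented by a formal test on the model residuals. The sketch below applies the Shapiro-Wilk test, which is not part of the analysis in this post, to the residuals of each per-crime model; the names <code>illit</code>, <code>group</code>, and <code>pvals</code> are hypothetical.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Quantile-based illiteracy groups, in the row order of USArrests
illit &lt;- state.x77[rownames(USArrests), "Illiteracy"]
group &lt;- cut(
  illit,
  breaks = quantile(illit, c(0, 1/3, 2/3, 1)),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)

# Shapiro-Wilk p-value for the residuals of each one-way model
pvals &lt;- sapply(c("Murder", "Assault", "Rape"), function(crime) {
  fit &lt;- lm(USArrests[[crime]] ~ group)
  shapiro.test(residuals(fit))$p.value
})
pvals</code></pre></div>
</div>
<p>A small p-value indicates a departure from normality in the residuals. With only 50 states the test has limited power, so it is best read together with the density plot or a normal Q-Q plot.</p>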
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-3" aria-controls="tabset-7-3" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Murder</summary>
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Murder"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7067 -2.3000 -0.3059  1.7882  7.8933 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)       11.4118     0.8084  14.116  &lt; 2e-16 ***
Illiteracymedium  -3.9051     1.1808  -3.307  0.00181 ** 
Illiteracylow     -6.8118     1.1273  -6.043 2.32e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.333 on 47 degrees of freedom
Multiple R-squared:  0.4382,    Adjusted R-squared:  0.4143 
F-statistic: 18.33 on 2 and 47 DF,  p-value: 1.302e-06</code></pre>
</div>
</div>
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Assault</summary>
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Assault"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
     Min       1Q   Median       3Q      Max 
-168.000  -41.792   -4.083   47.958  145.333 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)        214.00      17.53  12.208 3.51e-16 ***
Illiteracymedium   -24.33      25.60  -0.950 0.346771    
Illiteracylow      -99.83      24.44  -4.084 0.000171 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 72.28 on 47 degrees of freedom
Multiple R-squared:  0.2786,    Adjusted R-squared:  0.2479 
F-statistic: 9.074 on 2 and 47 DF,  p-value: 0.0004653</code></pre>
</div>
</div>
</div>
<div id="tabset-7-3" class="tab-pane" aria-labelledby="tabset-7-3-tab">
<div class="cell">
<details>
<summary>ANOVA output for crime: Rape</summary>
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">mdl_fit[Crime <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Rape"</span>, Summary][[<span class="dv" style="color: #AD0000;">1</span>]]</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = Rate ~ Illiteracy, data = .x)

Residuals:
    Min      1Q  Median      3Q     Max 
-14.133  -6.259  -2.357   3.766  26.939 

Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)        23.353      2.275  10.263 1.38e-13 ***
Illiteracymedium   -1.920      3.323  -0.578    0.566    
Illiteracylow      -4.292      3.173  -1.353    0.183    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.382 on 47 degrees of freedom
Multiple R-squared:  0.03766,   Adjusted R-squared:  -0.003286 
F-statistic: 0.9198 on 2 and 47 DF,  p-value: 0.4057</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="tabset-8-3" class="tab-pane" aria-labelledby="tabset-8-3-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">unnest</span>(arrest), <span class="fu" style="color: #4758AB;">aes</span>(Rate, Illiteracy)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-2">  <span class="fu" style="color: #4758AB;">geom_boxplot</span>(</span>
<span id="cb44-3">    <span class="at" style="color: #657422;">notch =</span> <span class="cn" style="color: #8f5902;">FALSE</span>,</span>
<span id="cb44-4">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"grey"</span>,</span>
<span id="cb44-5">    <span class="at" style="color: #657422;">outlier.colour =</span> <span class="st" style="color: #20794D;">"grey"</span></span>
<span id="cb44-6">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-7">  <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb44-8">    <span class="at" style="color: #657422;">position =</span> <span class="fu" style="color: #4758AB;">position_jitter</span>(<span class="at" style="color: #657422;">height =</span> <span class="fl" style="color: #AD0000;">0.25</span>),</span>
<span id="cb44-9">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"grey"</span>,</span>
<span id="cb44-10">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-11">  <span class="fu" style="color: #4758AB;">geom_pointrange</span>(</span>
<span id="cb44-12">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb44-13">      <span class="at" style="color: #657422;">xmin =</span> lower,</span>
<span id="cb44-14">      <span class="at" style="color: #657422;">xmax =</span> upper,</span>
<span id="cb44-15">      <span class="at" style="color: #657422;">x =</span> fit</span>
<span id="cb44-16">    ),</span>
<span id="cb44-17">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"firebrick"</span>,</span>
<span id="cb44-18">    <span class="at" style="color: #657422;">data =</span> eff_df</span>
<span id="cb44-19">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-20">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(</span>
<span id="cb44-21">    <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"Mean"</span>,</span>
<span id="cb44-22">    <span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span></span>
<span id="cb44-23">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-24">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb44-25">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"bottom"</span></span>
<span id="cb44-26">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb44-27">  <span class="fu" style="color: #4758AB;">facet_wrap</span>(<span class="at" style="color: #657422;">facets =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime), <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_x"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-29-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
</div>
<div id="tabset-8-4" class="tab-pane" aria-labelledby="tabset-8-4-tab">
<div class="cell" data-fig.asp="0.5">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1"><span class="fu" style="color: #4758AB;">ggplot</span>(tky_df, <span class="fu" style="color: #4758AB;">aes</span>(diff, terms)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-2">  <span class="fu" style="color: #4758AB;">geom_pointrange</span>(</span>
<span id="cb45-3">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">xmin =</span> lwr, <span class="at" style="color: #657422;">xmax =</span> upr, <span class="at" style="color: #657422;">x =</span> diff),</span>
<span id="cb45-4">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>,</span>
<span id="cb45-5">    <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span></span>
<span id="cb45-6">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-7">  <span class="fu" style="color: #4758AB;">geom_vline</span>(</span>
<span id="cb45-8">    <span class="at" style="color: #657422;">xintercept =</span> <span class="dv" style="color: #AD0000;">0</span>,</span>
<span id="cb45-9">    <span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"dashed"</span>,</span>
<span id="cb45-10">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"royalblue"</span></span>
<span id="cb45-11">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-12">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(</span>
<span id="cb45-13">    <span class="at" style="color: #657422;">name =</span> <span class="st" style="color: #20794D;">"Mean"</span>,</span>
<span id="cb45-14">    <span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span></span>
<span id="cb45-15">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-16">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb45-17">    <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Illiteracy"</span>,</span>
<span id="cb45-18">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Effect difference"</span>,</span>
<span id="cb45-19">    <span class="at" style="color: #657422;">title =</span> <span class="st" style="color: #20794D;">"Pairwise comparison of levels of illiteracy"</span></span>
<span id="cb45-20">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-21">  <span class="fu" style="color: #4758AB;">expand_limits</span>(<span class="at" style="color: #657422;">x =</span> <span class="dv" style="color: #AD0000;">0</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb45-22">  <span class="fu" style="color: #4758AB;">facet_wrap</span>(<span class="at" style="color: #657422;">facets =</span> <span class="fu" style="color: #4758AB;">vars</span>(Crime), <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_x"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index_files/figure-html/unnamed-chunk-30-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
</div>
</div>
</div>
<p>The analysis using both simulated and real datasets demonstrates the effectiveness of ANOVA in uncovering patterns. Simulating data that closely mirrors the USArrests dataset provided a controlled environment for testing and understanding variable interactions.</p>
<p>When applied to the real data, the analysis found significant differences in arrest rates across illiteracy groups for murder and assault, but not for rape (p = 0.41), which is largely consistent with the simulated results. By comparing these results, I highlighted how well-designed simulations can replicate real-world scenarios, offering valuable insights and preparing for real-world analyses.</p>
<p>This approach underscores the utility of combining simulated and real data, showcasing the robustness and reliability of the analytical methods used.</p>


</section>
</section>

 ]]></description>
  <category>Simulation</category>
  <category>Statistics</category>
  <guid>https://mathatistics.com/blog/posts/2023-02-04-simulation-anova-and-analysis/index.html</guid>
  <pubDate>Fri, 03 Feb 2023 23:00:00 GMT</pubDate>
</item>
<item>
  <title>How ANOVA analyzes the variance</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2021-03-29-how-anova-analyze-variance/index.html</link>
  <description><![CDATA[ 



<p>Analysis of Variance (ANOVA) is a powerful statistical tool used to analyze the variance among group means and determine whether these differences are statistically significant. It’s commonly used in various fields such as agriculture, biology, psychology, and more to test hypotheses about different groups. Below are some practical examples:</p>
<ul>
<li>The effect of different diets on the growth of fish</li>
<li>Comparing the height of three different species of a plant</li>
<li>Comparing bread baked with different types of flour</li>
</ul>
<p>Data for ANOVA could be collected through designed experiments or by sampling from populations. This article helps explain how ANOVA analyzes variance and identifies situations when these differences are significant, using both simulated and real data.</p>
<p>In short, we use ANOVA to find out whether different cases (groups) lead to similar outcomes and, if they differ, whether the difference is significant.</p>
<p>Consider the following model with <img src="https://latex.codecogs.com/png.latex?i=3"> groups and <img src="https://latex.codecogs.com/png.latex?j=n"> observations in each group,</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bij%7D%20=%20%5Cmu%20+%20%5Ctau_i%20+%20%5Cvarepsilon_%7Bij%7D,%20%5C;%20i%20=%201,%202,%203%20%5Ctexttt%7B%20and%20%7D%20j%20=%201,%202,%20%5Cldots%20n%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cmu"> is the overall mean, <img src="https://latex.codecogs.com/png.latex?%5Ctau_i"> represents the effect corresponding to group <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon_%7Bij%7D%20%5Csim%20%5Cmathrm%7BN%7D(0,%20%5Csigma%5E2)">, the usual assumption of a linear model. To better understand how ANOVA finds the differences between groups and how the group means and their standard deviations influence its results, we will explore the following four cases:</p>
<div class="cell">
<details>
<summary>Simulation design</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">Design <span class="ot" style="color: #003B4F;">&lt;-</span> tidytable<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">bind_rows</span>(</span>
<span id="cb1-2">    <span class="at" style="color: #657422;">Case1 =</span> <span class="fu" style="color: #4758AB;">data.table</span>(</span>
<span id="cb1-3">        <span class="at" style="color: #657422;">Group =</span> <span class="fu" style="color: #4758AB;">paste</span>(<span class="st" style="color: #20794D;">"Group"</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">""</span>),</span>
<span id="cb1-4">        <span class="at" style="color: #657422;">Mean =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">10</span>),</span>
<span id="cb1-5">        <span class="at" style="color: #657422;">SD =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">5</span>)</span>
<span id="cb1-6">    ),</span>
<span id="cb1-7">    <span class="at" style="color: #657422;">Case2 =</span> <span class="fu" style="color: #4758AB;">data.table</span>(</span>
<span id="cb1-8">        <span class="at" style="color: #657422;">Group =</span> <span class="fu" style="color: #4758AB;">paste</span>(<span class="st" style="color: #20794D;">"Group"</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">""</span>),</span>
<span id="cb1-9">        <span class="at" style="color: #657422;">Mean =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">10</span>),</span>
<span id="cb1-10">        <span class="at" style="color: #657422;">SD =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb1-11">    ),</span>
<span id="cb1-12">    <span class="at" style="color: #657422;">Case3 =</span> <span class="fu" style="color: #4758AB;">data.table</span>(</span>
<span id="cb1-13">        <span class="at" style="color: #657422;">Group =</span> <span class="fu" style="color: #4758AB;">paste</span>(<span class="st" style="color: #20794D;">"Group"</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">""</span>),</span>
<span id="cb1-14">        <span class="at" style="color: #657422;">Mean =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">15</span>),</span>
<span id="cb1-15">        <span class="at" style="color: #657422;">SD =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">5</span>)</span>
<span id="cb1-16">    ),</span>
<span id="cb1-17">    <span class="at" style="color: #657422;">Case4 =</span> <span class="fu" style="color: #4758AB;">data.table</span>(</span>
<span id="cb1-18">        <span class="at" style="color: #657422;">Group =</span> <span class="fu" style="color: #4758AB;">paste</span>(<span class="st" style="color: #20794D;">"Group"</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">sep =</span> <span class="st" style="color: #20794D;">""</span>),</span>
<span id="cb1-19">        <span class="at" style="color: #657422;">Mean =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">5</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">15</span>),</span>
<span id="cb1-20">        <span class="at" style="color: #657422;">SD =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">1</span>, <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb1-21">    ), <span class="at" style="color: #657422;">.id =</span> <span class="st" style="color: #20794D;">"Cases"</span>)</span></code></pre></div>
</details>
</div>
<div class="columns">
<div class="column">
<ul>
<li><strong>Case 1:</strong> Similar group means with high variation within the groups</li>
<li><strong>Case 2:</strong> Similar group means with low variation within the groups</li>
<li><strong>Case 3:</strong> Distant group means with high variation within the groups</li>
<li><strong>Case 4:</strong> Distant group means with low variation within the groups</li>
</ul>
</div><div class="column">
<div class="cell">
<div class="cell-output-display">

<div id="agjexfmjao" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#agjexfmjao table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#agjexfmjao thead, #agjexfmjao tbody, #agjexfmjao tfoot, #agjexfmjao tr, #agjexfmjao td, #agjexfmjao th {
  border-style: none;
}

#agjexfmjao p {
  margin: 0;
  padding: 0;
}

#agjexfmjao .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#agjexfmjao .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#agjexfmjao .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#agjexfmjao .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#agjexfmjao .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#agjexfmjao .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#agjexfmjao .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#agjexfmjao .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#agjexfmjao .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#agjexfmjao .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#agjexfmjao .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#agjexfmjao .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#agjexfmjao .gt_spanner_row {
  border-bottom-style: hidden;
}

#agjexfmjao .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#agjexfmjao .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#agjexfmjao .gt_from_md > :first-child {
  margin-top: 0;
}

#agjexfmjao .gt_from_md > :last-child {
  margin-bottom: 0;
}

#agjexfmjao .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#agjexfmjao .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#agjexfmjao .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#agjexfmjao .gt_row_group_first td {
  border-top-width: 2px;
}

#agjexfmjao .gt_row_group_first th {
  border-top-width: 2px;
}

#agjexfmjao .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#agjexfmjao .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#agjexfmjao .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#agjexfmjao .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#agjexfmjao .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#agjexfmjao .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#agjexfmjao .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#agjexfmjao .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#agjexfmjao .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#agjexfmjao .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#agjexfmjao .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#agjexfmjao .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#agjexfmjao .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#agjexfmjao .gt_left {
  text-align: left;
}

#agjexfmjao .gt_center {
  text-align: center;
}

#agjexfmjao .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#agjexfmjao .gt_font_normal {
  font-weight: normal;
}

#agjexfmjao .gt_font_bold {
  font-weight: bold;
}

#agjexfmjao .gt_font_italic {
  font-style: italic;
}

#agjexfmjao .gt_super {
  font-size: 65%;
}

#agjexfmjao .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#agjexfmjao .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#agjexfmjao .gt_indent_1 {
  text-indent: 5px;
}

#agjexfmjao .gt_indent_2 {
  text-indent: 10px;
}

#agjexfmjao .gt_indent_3 {
  text-indent: 15px;
}

#agjexfmjao .gt_indent_4 {
  text-indent: 20px;
}

#agjexfmjao .gt_indent_5 {
  text-indent: 25px;
}

#agjexfmjao .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#agjexfmjao div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings gt_spanner_row">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="2" colspan="1" scope="col" id="Cases">Cases</th>
      <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2" scope="colgroup" id="spanner-Group1_Mean">
        <div class="gt_column_spanner">Group1</div>
      </th>
      <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2" scope="colgroup" id="spanner-Group2_Mean">
        <div class="gt_column_spanner">Group2</div>
      </th>
      <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2" scope="colgroup" id="spanner-Group3_Mean">
        <div class="gt_column_spanner">Group3</div>
      </th>
    </tr>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group1_Mean">Mean</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group1_SD">SD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group2_Mean">Mean</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group2_SD">SD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group3_Mean">Mean</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Group3_SD">SD</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="Cases" class="gt_row gt_left">Case1</td>
<td headers="Group1_Mean" class="gt_row gt_right">10</td>
<td headers="Group1_SD" class="gt_row gt_right">5</td>
<td headers="Group2_Mean" class="gt_row gt_right">10</td>
<td headers="Group2_SD" class="gt_row gt_right">5</td>
<td headers="Group3_Mean" class="gt_row gt_right">10</td>
<td headers="Group3_SD" class="gt_row gt_right">5</td></tr>
    <tr><td headers="Cases" class="gt_row gt_left">Case2</td>
<td headers="Group1_Mean" class="gt_row gt_right">10</td>
<td headers="Group1_SD" class="gt_row gt_right">1</td>
<td headers="Group2_Mean" class="gt_row gt_right">10</td>
<td headers="Group2_SD" class="gt_row gt_right">1</td>
<td headers="Group3_Mean" class="gt_row gt_right">10</td>
<td headers="Group3_SD" class="gt_row gt_right">1</td></tr>
    <tr><td headers="Cases" class="gt_row gt_left">Case3</td>
<td headers="Group1_Mean" class="gt_row gt_right">5</td>
<td headers="Group1_SD" class="gt_row gt_right">5</td>
<td headers="Group2_Mean" class="gt_row gt_right">10</td>
<td headers="Group2_SD" class="gt_row gt_right">5</td>
<td headers="Group3_Mean" class="gt_row gt_right">15</td>
<td headers="Group3_SD" class="gt_row gt_right">5</td></tr>
    <tr><td headers="Cases" class="gt_row gt_left">Case4</td>
<td headers="Group1_Mean" class="gt_row gt_right">5</td>
<td headers="Group1_SD" class="gt_row gt_right">1</td>
<td headers="Group2_Mean" class="gt_row gt_right">10</td>
<td headers="Group2_SD" class="gt_row gt_right">1</td>
<td headers="Group3_Mean" class="gt_row gt_right">15</td>
<td headers="Group3_SD" class="gt_row gt_right">1</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
</div>
</div>
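<p>To connect the model with the design table, here is a minimal base R sketch for Case 3. It assumes a sum-to-zero parameterisation, so that the overall mean is the average of the three group means. The names <code>group_means</code>, <code>tau</code> and <code>sim</code>, as well as the seed, are illustrative choices and not part of the code used in this post.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Group means of Case 3, taken from the design table
group_means &lt;- c(5, 10, 15)

# Assumption: sum-to-zero effects, so mu is the average of the group means
mu  &lt;- mean(group_means)    # 10
tau &lt;- group_means - mu     # -5, 0, 5

# Simulate n observations per group from y_ij = mu + tau_i + e_ij
set.seed(2021)               # arbitrary seed, only for reproducibility
n     &lt;- 50
sigma &lt;- 5                   # within-group standard deviation of Case 3
sim &lt;- data.frame(
  Group    = rep(paste0("Group", 1:3), each = n),
  Response = rnorm(3 * n, mean = rep(mu + tau, each = n), sd = sigma)
)

# The sample group means should lie close to 5, 10 and 15
aggregate(Response ~ Group, data = sim, FUN = mean)</code></pre></div>
<p>The other three cases follow the same pattern: only the values in <code>group_means</code> and <code>sigma</code> change.</p>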
<section id="fitting-anova-model-for-each-cases" class="level2">
<h2 class="anchored" data-anchor-id="fitting-anova-model-for-each-cases">Fitting an ANOVA model for each case</h2>
<div class="cell">
<details>
<summary>Simulate and fit ANOVA</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">generate_data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="cf" style="color: #003B4F;">function</span>(mean, sd, <span class="at" style="color: #657422;">nobs =</span> <span class="dv" style="color: #AD0000;">50</span>) {</span>
<span id="cb2-2">    Response <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">rnorm</span>(nobs, <span class="at" style="color: #657422;">mean =</span> mean, <span class="at" style="color: #657422;">sd =</span> sd)</span>
<span id="cb2-3">    <span class="fu" style="color: #4758AB;">tidytable</span>(<span class="at" style="color: #657422;">ID =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nobs, <span class="at" style="color: #657422;">Response =</span> Response)</span>
<span id="cb2-4">}</span>
<span id="cb2-5"></span>
<span id="cb2-6">Model <span class="ot" style="color: #003B4F;">&lt;-</span> Design <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-7">    <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">Data =</span> <span class="fu" style="color: #4758AB;">map2</span>(Mean, SD, generate_data)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-8">    <span class="fu" style="color: #4758AB;">unnest</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb2-9">    <span class="fu" style="color: #4758AB;">nest</span>(<span class="at" style="color: #657422;">.by =</span> <span class="st" style="color: #20794D;">"Cases"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb2-10">    <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">fit =</span> <span class="fu" style="color: #4758AB;">map</span>(data, <span class="cf" style="color: #003B4F;">function</span>(dta) {</span>
<span id="cb2-11">        <span class="fu" style="color: #4758AB;">lm</span>(Response <span class="sc" style="color: #5E5E5E;">~</span> Group, <span class="at" style="color: #657422;">data =</span> dta)</span>
<span id="cb2-12">    }))</span>
<span id="cb2-13"></span>
<span id="cb2-14">Model</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 4 × 3
  Cases data                  fit   
  &lt;chr&gt; &lt;list&gt;                &lt;list&gt;
1 Case1 &lt;tidytable [150 × 5]&gt; &lt;lm&gt;  
2 Case2 &lt;tidytable [150 × 5]&gt; &lt;lm&gt;  
3 Case3 &lt;tidytable [150 × 5]&gt; &lt;lm&gt;  
4 Case4 &lt;tidytable [150 × 5]&gt; &lt;lm&gt;  </code></pre>
</div>
</div>
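<p>Each fitted <code>lm</code> object can be passed to <code>anova()</code> to obtain an F statistic. As a rough sketch of what that statistic measures, the base R code below computes the between-group and within-group sums of squares by hand for one simulated dataset (using the Case 3 settings) and compares the result with <code>anova()</code>. The object names (<code>dta</code>, <code>f_by_hand</code>, <code>f_anova</code>) and the seed are illustrative assumptions.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">set.seed(123)  # arbitrary seed, only for reproducibility
n &lt;- 50
dta &lt;- data.frame(
  Group    = factor(rep(paste0("Group", 1:3), each = n)),
  Response = rnorm(3 * n, mean = rep(c(5, 10, 15), each = n), sd = 5)
)

grand_mean  &lt;- mean(dta$Response)
group_means &lt;- tapply(dta$Response, dta$Group, mean)

# Between-group sum of squares: spread of the group means around the grand mean
ss_between &lt;- sum(n * (group_means - grand_mean)^2)

# Within-group sum of squares: spread of observations around their own group mean
ss_within &lt;- sum((dta$Response - ave(dta$Response, dta$Group))^2)

df_between &lt;- nlevels(dta$Group) - 1
df_within  &lt;- nrow(dta) - nlevels(dta$Group)

# F is the ratio of the between-group and within-group mean squares
f_by_hand &lt;- (ss_between / df_between) / (ss_within / df_within)

# Compare with the F statistic reported by anova()
f_anova &lt;- anova(lm(Response ~ Group, data = dta))[["F value"]][1]
all.equal(f_by_hand, f_anova)</code></pre></div>
<p>A large F value means that the variation between the group means is large relative to the variation within the groups, which is why both the distance between the means and the within-group standard deviation matter in the four cases.</p>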
</section>
<section id="distribution-of-data" class="level2">
<h2 class="anchored" data-anchor-id="distribution-of-data">Distribution of data</h2>
<div class="cell" data-fig.asp="1">
<details>
<summary>Data distribution</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">Model[, <span class="fu" style="color: #4758AB;">map_df</span>(fit, broom<span class="sc" style="color: #5E5E5E;">::</span>augment), by <span class="ot" style="color: #003B4F;">=</span> Cases] <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;">ggplot</span>(<span class="fu" style="color: #4758AB;">aes</span>(Response, Group)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;">geom_boxplot</span>(</span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">fill =</span> Group, <span class="at" style="color: #657422;">color =</span> Group), </span>
<span id="cb4-5">    <span class="at" style="color: #657422;">alpha =</span> <span class="fl" style="color: #AD0000;">0.25</span>, <span class="at" style="color: #657422;">width =</span> <span class="fl" style="color: #AD0000;">0.25</span></span>
<span id="cb4-6">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-7">  <span class="fu" style="color: #4758AB;">geom_point</span>(<span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">fill =</span> Group),</span>
<span id="cb4-8">    <span class="at" style="color: #657422;">position =</span> <span class="fu" style="color: #4758AB;">position_jitter</span>(<span class="at" style="color: #657422;">height =</span> <span class="fl" style="color: #AD0000;">0.1</span>),</span>
<span id="cb4-9">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">2</span>, <span class="at" style="color: #657422;">stroke =</span> <span class="fl" style="color: #AD0000;">0.25</span>, <span class="at" style="color: #657422;">alpha =</span> <span class="fl" style="color: #AD0000;">0.25</span></span>
<span id="cb4-10">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-11">  <span class="fu" style="color: #4758AB;">stat_summary</span>(</span>
<span id="cb4-12">    <span class="at" style="color: #657422;">fun =</span> mean, <span class="at" style="color: #657422;">geom =</span> <span class="st" style="color: #20794D;">"point"</span>, <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> Group),</span>
<span id="cb4-13">    <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">2</span>, <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span>, <span class="at" style="color: #657422;">stroke =</span> <span class="fl" style="color: #AD0000;">0.75</span></span>
<span id="cb4-14">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-15">  <span class="fu" style="color: #4758AB;">facet_wrap</span>(<span class="at" style="color: #657422;">facets =</span> <span class="fu" style="color: #4758AB;">vars</span>(Cases), <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_x"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-16">  ggridges<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">geom_density_ridges</span>(</span>
<span id="cb4-17">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">color =</span> Group),</span>
<span id="cb4-18">    <span class="at" style="color: #657422;">fill =</span> <span class="cn" style="color: #8f5902;">NA</span>,</span>
<span id="cb4-19">    <span class="at" style="color: #657422;">panel_scaling =</span> <span class="cn" style="color: #8f5902;">FALSE</span></span>
<span id="cb4-20">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-21">  <span class="fu" style="color: #4758AB;">scale_color_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-22">  <span class="fu" style="color: #4758AB;">scale_fill_brewer</span>(<span class="at" style="color: #657422;">palette =</span> <span class="st" style="color: #20794D;">"Set1"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb4-23">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb4-24">    <span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"none"</span></span>
<span id="cb4-25">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://mathatistics.com/blog/posts/2021-03-29-how-anova-analyze-variance/index_files/figure-html/unnamed-chunk-5-1.svg" class="img-fluid" style="width:100.0%"></p>
</div>
</div>
</section>
<section id="model-comparison" class="level2">
<h2 class="anchored" data-anchor-id="model-comparison">Model comparison</h2>
<div class="cell">
<details>
<summary>ANOVA for the four cases</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">anova_result <span class="ot" style="color: #003B4F;">&lt;-</span> Model[, <span class="fu" style="color: #4758AB;">map_df</span>(</span>
<span id="cb5-2">  <span class="at" style="color: #657422;">.x =</span> fit,</span>
<span id="cb5-3">  <span class="at" style="color: #657422;">.f =</span> <span class="sc" style="color: #5E5E5E;">~</span> broom<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tidy</span>(<span class="fu" style="color: #4758AB;">anova</span>(.x))</span>
<span id="cb5-4">), by <span class="ot" style="color: #003B4F;">=</span> Cases] <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">rename</span>(</span>
<span id="cb5-5">  <span class="at" style="color: #657422;">DF =</span> df,</span>
<span id="cb5-6">  <span class="at" style="color: #657422;">SSE =</span> sumsq,</span>
<span id="cb5-7">  <span class="at" style="color: #657422;">MSE =</span> meansq,</span>
<span id="cb5-8">  <span class="at" style="color: #657422;">Statistic =</span> statistic,</span>
<span id="cb5-9">  <span class="st" style="color: #20794D;">`</span><span class="at" style="color: #657422;">p value</span><span class="st" style="color: #20794D;">`</span> <span class="ot" style="color: #003B4F;">=</span> p.value</span>
<span id="cb5-10">)</span>
<span id="cb5-11"></span>
<span id="cb5-12">anova_result <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;">mutate</span>(</span>
<span id="cb5-14">    <span class="at" style="color: #657422;">Cases =</span> <span class="fu" style="color: #4758AB;">case_when</span>(</span>
<span id="cb5-15">        Cases <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Case1"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"Case 1: Similar group means with high variation within the groups"</span>,</span>
<span id="cb5-16">        Cases <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Case2"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"Case 2: Similar group means with low variation within the groups"</span>,</span>
<span id="cb5-17">        Cases <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"Case3"</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"Case 3: Distant group means with high variation within the groups"</span>,</span>
<span id="cb5-18">        <span class="cn" style="color: #8f5902;">TRUE</span> <span class="sc" style="color: #5E5E5E;">~</span> <span class="st" style="color: #20794D;">"Case 4: Distant group means with low variation within the groups"</span></span>
<span id="cb5-19">    )</span>
<span id="cb5-20">  ) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-21">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">gt</span>(<span class="at" style="color: #657422;">groupname_col =</span> <span class="st" style="color: #20794D;">"Cases"</span>, <span class="at" style="color: #657422;">rowname_col =</span> <span class="st" style="color: #20794D;">"term"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb5-22">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">fmt_number</span>(<span class="at" style="color: #657422;">columns =</span> <span class="dv" style="color: #AD0000;">4</span><span class="sc" style="color: #5E5E5E;">:</span><span class="dv" style="color: #AD0000;">6</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb5-23">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">fmt_number</span>(<span class="at" style="color: #657422;">columns =</span> <span class="dv" style="color: #AD0000;">7</span>, <span class="at" style="color: #657422;">decimals =</span> <span class="dv" style="color: #AD0000;">4</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-24">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">sub_missing</span>(<span class="at" style="color: #657422;">missing_text =</span> <span class="st" style="color: #20794D;">""</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-25">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_options</span>(</span>
<span id="cb5-26">    <span class="at" style="color: #657422;">table.width =</span> <span class="st" style="color: #20794D;">"100%"</span>,</span>
<span id="cb5-27">    <span class="at" style="color: #657422;">row_group.font.weight =</span> <span class="st" style="color: #20794D;">"600"</span></span>
<span id="cb5-28">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="gfpmfekkqh" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#gfpmfekkqh table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#gfpmfekkqh thead, #gfpmfekkqh tbody, #gfpmfekkqh tfoot, #gfpmfekkqh tr, #gfpmfekkqh td, #gfpmfekkqh th {
  border-style: none;
}

#gfpmfekkqh p {
  margin: 0;
  padding: 0;
}

#gfpmfekkqh .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: 100%;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#gfpmfekkqh .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#gfpmfekkqh .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#gfpmfekkqh .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#gfpmfekkqh .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#gfpmfekkqh .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gfpmfekkqh .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#gfpmfekkqh .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#gfpmfekkqh .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#gfpmfekkqh .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#gfpmfekkqh .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#gfpmfekkqh .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#gfpmfekkqh .gt_spanner_row {
  border-bottom-style: hidden;
}

#gfpmfekkqh .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: 600;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#gfpmfekkqh .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: 600;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#gfpmfekkqh .gt_from_md > :first-child {
  margin-top: 0;
}

#gfpmfekkqh .gt_from_md > :last-child {
  margin-bottom: 0;
}

#gfpmfekkqh .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#gfpmfekkqh .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#gfpmfekkqh .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#gfpmfekkqh .gt_row_group_first td {
  border-top-width: 2px;
}

#gfpmfekkqh .gt_row_group_first th {
  border-top-width: 2px;
}

#gfpmfekkqh .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#gfpmfekkqh .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#gfpmfekkqh .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#gfpmfekkqh .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gfpmfekkqh .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#gfpmfekkqh .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#gfpmfekkqh .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#gfpmfekkqh .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#gfpmfekkqh .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#gfpmfekkqh .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#gfpmfekkqh .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#gfpmfekkqh .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#gfpmfekkqh .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#gfpmfekkqh .gt_left {
  text-align: left;
}

#gfpmfekkqh .gt_center {
  text-align: center;
}

#gfpmfekkqh .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#gfpmfekkqh .gt_font_normal {
  font-weight: normal;
}

#gfpmfekkqh .gt_font_bold {
  font-weight: bold;
}

#gfpmfekkqh .gt_font_italic {
  font-style: italic;
}

#gfpmfekkqh .gt_super {
  font-size: 65%;
}

#gfpmfekkqh .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#gfpmfekkqh .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#gfpmfekkqh .gt_indent_1 {
  text-indent: 5px;
}

#gfpmfekkqh .gt_indent_2 {
  text-indent: 10px;
}

#gfpmfekkqh .gt_indent_3 {
  text-indent: 15px;
}

#gfpmfekkqh .gt_indent_4 {
  text-indent: 20px;
}

#gfpmfekkqh .gt_indent_5 {
  text-indent: 25px;
}

#gfpmfekkqh .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#gfpmfekkqh div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="a::stub"></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="DF">DF</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="SSE">SSE</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="MSE">MSE</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Statistic">Statistic</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="p-value">p value</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr class="gt_group_heading_row">
      <th colspan="6" class="gt_group_heading" scope="colgroup" id="Case 1: Similar group means with high variation within the groups">Case 1: Similar group means with high variation within the groups</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_1" scope="row" class="gt_row gt_left gt_stub">Group</th>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_1 DF" class="gt_row gt_right">2</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_1 SSE" class="gt_row gt_right">32.09</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_1 MSE" class="gt_row gt_right">16.05</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_1 Statistic" class="gt_row gt_right">0.64</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_1 p value" class="gt_row gt_right">0.5304</td></tr>
    <tr><th id="stub_1_2" scope="row" class="gt_row gt_left gt_stub">Residuals</th>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_2 DF" class="gt_row gt_right">147</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_2 SSE" class="gt_row gt_right">3,704.26</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_2 MSE" class="gt_row gt_right">25.20</td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_2 Statistic" class="gt_row gt_right"><br></td>
<td headers="Case 1: Similar group means with high variation within the groups stub_1_2 p value" class="gt_row gt_right"><br></td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="6" class="gt_group_heading" scope="colgroup" id="Case 2: Similar group means with low variation within the groups">Case 2: Similar group means with low variation within the groups</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_3" scope="row" class="gt_row gt_left gt_stub">Group</th>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_3 DF" class="gt_row gt_right">2</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_3 SSE" class="gt_row gt_right">1.98</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_3 MSE" class="gt_row gt_right">0.99</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_3 Statistic" class="gt_row gt_right">1.00</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_3 p value" class="gt_row gt_right">0.3714</td></tr>
    <tr><th id="stub_1_4" scope="row" class="gt_row gt_left gt_stub">Residuals</th>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_4 DF" class="gt_row gt_right">147</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_4 SSE" class="gt_row gt_right">146.03</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_4 MSE" class="gt_row gt_right">0.99</td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_4 Statistic" class="gt_row gt_right"><br></td>
<td headers="Case 2: Similar group means with low variation within the groups stub_1_4 p value" class="gt_row gt_right"><br></td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="6" class="gt_group_heading" scope="colgroup" id="Case 3: Distant group means with high variation within the groups">Case 3: Distant group means with high variation within the groups</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_5" scope="row" class="gt_row gt_left gt_stub">Group</th>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_5 DF" class="gt_row gt_right">2</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_5 SSE" class="gt_row gt_right">1,951.86</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_5 MSE" class="gt_row gt_right">975.93</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_5 Statistic" class="gt_row gt_right">50.15</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_5 p value" class="gt_row gt_right">0.0000</td></tr>
    <tr><th id="stub_1_6" scope="row" class="gt_row gt_left gt_stub">Residuals</th>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_6 DF" class="gt_row gt_right">147</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_6 SSE" class="gt_row gt_right">2,860.66</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_6 MSE" class="gt_row gt_right">19.46</td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_6 Statistic" class="gt_row gt_right"><br></td>
<td headers="Case 3: Distant group means with high variation within the groups stub_1_6 p value" class="gt_row gt_right"><br></td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="6" class="gt_group_heading" scope="colgroup" id="Case 4: Distant group means with low variation within the groups">Case 4: Distant group means with low variation within the groups</th>
    </tr>
    <tr class="gt_row_group_first"><th id="stub_1_7" scope="row" class="gt_row gt_left gt_stub">Group</th>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_7 DF" class="gt_row gt_right">2</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_7 SSE" class="gt_row gt_right">2,516.51</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_7 MSE" class="gt_row gt_right">1,258.26</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_7 Statistic" class="gt_row gt_right">1,263.82</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_7 p value" class="gt_row gt_right">0.0000</td></tr>
    <tr><th id="stub_1_8" scope="row" class="gt_row gt_left gt_stub">Residuals</th>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_8 DF" class="gt_row gt_right">147</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_8 SSE" class="gt_row gt_right">146.35</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_8 MSE" class="gt_row gt_right">1.00</td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_8 Statistic" class="gt_row gt_right"><br></td>
<td headers="Case 4: Distant group means with low variation within the groups stub_1_8 p value" class="gt_row gt_right"><br></td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<section id="interpretetion" class="level3">
<h3 class="anchored" data-anchor-id="interpretetion">Interpretation</h3>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Case 1</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Case 2</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false">Case 3</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false">Case 4</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<p>The results show a high p-value, indicating no significant difference between the groups due to high within-group variability.</p>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
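<p>Here, the p-value is still high, suggesting no significant difference. Although the within-group variation is much smaller than in Case 1, the group means are so close that the between-group variation is about the same size as the within-group variation (the F statistic is close to 1).</p>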
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<p>Despite the high variation within groups, the distant group means lead to a low p-value, indicating statistically significant differences among the groups.</p>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<p>With low within-group variation and distant means, the p-value remains extremely low, strongly indicating significant group differences.</p>
</div>
</div>
</div>
<p>In conclusion, ANOVA helps determine if there are significant differences between multiple group means by comparing variances within groups to variances between groups. The power of ANOVA lies in its ability to detect even subtle differences when variations are minimal within groups.</p>
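<p>As a rough check on the table above, the F statistic and p-value can be recomputed by hand from the sums of squares and degrees of freedom. The sketch below copies the rounded values from the table into a data frame (the name <code>anova_tbl</code> and its column names are made up for this illustration), so the results should agree with the table only up to rounding.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Sums of squares and degrees of freedom copied from the ANOVA table above
anova_tbl &lt;- data.frame(
  case        = paste("Case", 1:4),
  ss_group    = c(32.09, 1.98, 1951.86, 2516.51),
  ss_residual = c(3704.26, 146.03, 2860.66, 146.35),
  df_group    = 2,
  df_residual = 147
)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_group    &lt;- anova_tbl$ss_group / anova_tbl$df_group
ms_residual &lt;- anova_tbl$ss_residual / anova_tbl$df_residual

# F statistic: between-group mean square over within-group mean square
anova_tbl$f_stat &lt;- ms_group / ms_residual

# p-value: upper tail of the F distribution with (2, 147) degrees of freedom
anova_tbl$p_value &lt;- pf(
  anova_tbl$f_stat, anova_tbl$df_group, anova_tbl$df_residual,
  lower.tail = FALSE
)

print(anova_tbl[, c("case", "f_stat", "p_value")], digits = 4)</code></pre></div>
</div>
<p>Cases 1 and 2 have similar group means, so the ratio stays near or below 1 whether the within-group variation is large or small. In Cases 3 and 4 the between-group mean square is large, and the ratio grows further once the within-group mean square shrinks.</p>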


</section>
</section>

 ]]></description>
  <category>Statistics</category>
  <category>ANOVA</category>
  <guid>https://mathatistics.com/blog/posts/2021-03-29-how-anova-analyze-variance/index.html</guid>
  <pubDate>Sun, 28 Mar 2021 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Interpreting Biplot</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/index.html</link>
  <description><![CDATA[ 



<p>A biplot is a graphical tool that displays both the observations and the variables of a dataset in the same two-dimensional plot. Biplots are particularly useful for multivariate data, allowing users to examine relationships between variables and identify patterns.</p>
<p>Consider a data matrix <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D_%7Bn%5Ctimes%20p%7D"> with <img src="https://latex.codecogs.com/png.latex?p"> variables and <img src="https://latex.codecogs.com/png.latex?n"> observations. Principal component analysis (PCA) is used here to explore the data with a biplot.</p>
<div class="columns">
<div class="column">
<p><img src="https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/images/Img-1.jpg" class="img-fluid" style="width:90.0%"></p>
</div><div class="column">
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Bpmatrix%7D%0A%5Cmathbf%7Bx%7D_1%20%5C%5C%20%5Cmathbf%7Bx%7D_2%20%5C%5C%20%5Cvdots%20%5C%5C%20%5Cmathbf%7Bx%7D_p%0A%5Cend%7Bpmatrix%7D%5ET%20=%0A%5Cbegin%7Bpmatrix%7D%0Ax_%7B11%7D%20&amp;%20x_%7B12%7D%20&amp;%20%5Cldots%20&amp;%20x_%7B1p%7D%20%5C%5C%0Ax_%7B21%7D%20&amp;%20x_%7B22%7D%20&amp;%20%5Cldots%20&amp;%20x_%7B2p%7D%20%5C%5C%0A%5Cvdots%20&amp;%20%5Cvdots%20&amp;%20%5Cddots%20&amp;%20%5Cvdots%20%5C%5C%0Ax_%7Bn1%7D%20&amp;%20x_%7Bn2%7D%20&amp;%20%5Cldots%20&amp;%20x_%7Bnp%7D%0A%5Cend%7Bpmatrix%7D%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bx%7D_%7Bi%7D%20=%20%5Cbegin%7Bpmatrix%7Dx_%7B1i%7D%20&amp;%20%5Cldots%20&amp;%20x_%7Bni%7D%5Cend%7Bpmatrix%7D"> is the i<sup>th</sup> variable.</p>
</div>
</div>
<p>Principal component analysis (PCA) compresses the variance of the data matrix to create a new set of orthogonal (linearly independent) variables <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BZ%7D%20=%20%5Cbegin%7Bpmatrix%7D%5Cmathbf%7Bz%7D_1%20&amp;%20%5Cmathbf%7Bz%7D_2%20&amp;%20%5Cldots%20&amp;%20%5Cmathbf%7Bz%7D_q%5Cend%7Bpmatrix%7D">, where <img src="https://latex.codecogs.com/png.latex?q%20=%20%5Cmin(n,%20p)">, often called principal components (PC) or scores. The first component captures the most variation, and each subsequent component captures a decreasing share of what remains. In other words, the first principal component (<img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bz%7D_1">) captures the highest variation, the second principal component (<img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bz%7D_2">) captures the maximum of the remaining variation, and so on.</p>
<div class="columns">
<div class="column">
<p><img src="https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/images/Img-2.jpg" class="img-fluid" style="width:95.0%"></p>
</div><div class="column">
<p><span id="eq-pca"><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Baligned%7D%0A%5Cmathbf%7Bz%7D_%7B1%7D%20&amp;=%20w_%7B11%7D%5Cmathbf%7Bx%7D_%7B1%7D%20+%20%5Cldots%20+%20w_%7B1p%7D%5Cmathbf%7Bx%7D_%7Bp%7D%20%5C%5C%0A%5Cmathbf%7Bz%7D_%7B2%7D%20&amp;=%20w_%7B21%7D%5Cmathbf%7Bx%7D_%7B1%7D%20+%20%5Cldots%20+%20w_%7B2p%7D%5Cmathbf%7Bx%7D_%7Bp%7D%20%5C%5C%0A%5Cvdots%20%5C%5C%0A%5Cmathbf%7Bz%7D_%7Bq%7D%20&amp;=%20w_%7Bq1%7D%5Cmathbf%7Bx%7D_%7B1%7D%20+%20%5Cldots%20+%20w_%7Bqp%7D%5Cmathbf%7Bx%7D_%7Bp%7D%20%5C%5C%0A%5Cend%7Baligned%7D%0A%5Ctag%7B1%7D"></span></p>
</div>
</div>
<p>We can use eigenvalue decomposition or singular value decomposition for this purpose.</p>
<p>These principal components are linear combinations of the original variables. For example, <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bz%7D_1%20=%20w_%7B11%7D%5Cmathbf%7Bx%7D_1%20+%20%5Cldots%20+%20w_%7B1p%7D%5Cmathbf%7Bx%7D_p"> and similarly for <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bz%7D_2,%20%5Cldots,%20%5Cmathbf%7Bz%7D_q">. The weights <img src="https://latex.codecogs.com/png.latex?w_%7Bij%7D">, where <img src="https://latex.codecogs.com/png.latex?i%20=%201%20%5Cldots%20q"> and <img src="https://latex.codecogs.com/png.latex?j%20=%201%20%5Cldots%20p">, can be estimated using <a href="https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix">eigenvalue decomposition</a> or <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition (SVD)</a>. These weights are also referred to as loadings, and the matrix of these weights as the rotation matrix.</p>
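<p>The two routes to the weights can be compared directly. The following is a minimal sketch in base R, assuming the variables are standardised first; the object names (<code>W_eigen</code>, <code>W_svd</code>, <code>Z</code>) are illustrative only. Each column of the weight matrix holds the weights of one component, and a column is determined only up to its sign, so the comparison uses absolute values.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Standardise the four numeric columns (mean 0, standard deviation 1)
X &lt;- scale(as.matrix(datasets::USArrests))

# Route 1: eigenvalue decomposition of the correlation matrix
eig &lt;- eigen(cor(datasets::USArrests))
W_eigen &lt;- eig$vectors

# Route 2: singular value decomposition of the standardised data
sv &lt;- svd(X)
W_svd &lt;- sv$v

# Each column is defined only up to sign, so compare absolute values
all.equal(abs(W_eigen), abs(W_svd))

# Scores are linear combinations of the variables: Z = X W
Z &lt;- X %*% W_svd

# The variance of each score equals the corresponding eigenvalue
all.equal(unname(apply(Z, 2, var)), eig$values)</code></pre></div>
</div>
<p>Both comparisons should return <code>TRUE</code>, since the right singular vectors of the standardised data are the eigenvectors of its correlation matrix.</p>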
<p>A biplot plots the scores from two principal components together with the loadings (weights) of each variable in the same plot. The following example uses the <a href="https://www.rdocumentation.org/packages/datasets/versions/3.3.2/topics/USArrests"><code>USArrests</code></a> data from the <code>datasets</code> package in R. The dataset contains the number of arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, along with the percentage of the population living in urban areas.</p>
<p>Two functions in R, <code>prcomp</code> and <code>princomp</code>, perform principal component analysis. The function <code>prcomp</code> uses SVD on the data matrix <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D">, while <code>princomp</code> uses eigenvalue decomposition on the covariance or correlation matrix of the data matrix. We will use <code>prcomp</code> for our example.</p>
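<p>To see that the two functions lead to the same components, the sketch below fits both on the standardised data, using <code>scale. = TRUE</code> for <code>prcomp</code> and <code>cor = TRUE</code> for <code>princomp</code>. The object names <code>pc_svd</code> and <code>pc_eig</code> are made up for this comparison.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">X &lt;- datasets::USArrests

pc_svd &lt;- prcomp(X, scale. = TRUE)  # SVD of the centred, scaled data
pc_eig &lt;- princomp(X, cor = TRUE)   # eigen decomposition of the correlation matrix

# Standard deviations of the components agree
all.equal(pc_svd$sdev, unname(pc_eig$sdev))

# Loadings agree up to the sign of each column
all.equal(
  abs(unclass(pc_svd$rotation)),
  abs(unclass(pc_eig$loadings)),
  check.attributes = FALSE
)</code></pre></div>
</div>
<p>Without standardisation the standard deviations differ slightly, because <code>princomp</code> divides by <code>n</code> when computing the covariance matrix while <code>prcomp</code> divides by <code>n - 1</code>.</p>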
<section id="dataset-usarrest" class="level3">
<h3 class="anchored" data-anchor-id="dataset-usarrest">Dataset: USArrests</h3>
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">USArrests <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(USArrests, <span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"State"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;">head</span>(USArrests)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 6 × 5
  State      Murder Assault UrbanPop  Rape
  &lt;chr&gt;       &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 Alabama      13.2     236       58  21.2
2 Alaska       10       263       48  44.5
3 Arizona       8.1     294       80  31  
4 Arkansas      8.8     190       50  19.5
5 California    9       276       91  40.6
6 Colorado      7.9     204       78  38.7</code></pre>
</div>
</div>
</section>
<section id="principal-component-analysis" class="level3">
<h3 class="anchored" data-anchor-id="principal-component-analysis">Principal component analysis</h3>
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">pca <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">prcomp</span>(USArrests[, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>], <span class="at" style="color: #657422;">scale. =</span> <span class="cn" style="color: #8f5902;">TRUE</span>)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;">str</span>(pca)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>List of 5
 $ sdev    : num [1:4] 1.575 0.995 0.597 0.416
 $ rotation: num [1:4, 1:4] -0.536 -0.583 -0.278 -0.543 -0.418 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
  .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
 $ center  : Named num [1:4] 7.79 170.76 65.54 21.23
  ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
 $ scale   : Named num [1:4] 4.36 83.34 14.47 9.37
  ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
 $ x       : num [1:50, 1:4] -0.976 -1.931 -1.745 0.14 -2.499 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
 - attr(*, "class")= chr "prcomp"</code></pre>
</div>
</div>
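<p>The elements listed in the output above fit together in a simple way: the scores in <code>x</code> are the data, centred by <code>center</code> and scaled by <code>scale</code>, multiplied by the <code>rotation</code> matrix. The sketch below checks this on the original <code>USArrests</code> data frame from the <code>datasets</code> package; the names <code>fit</code>, <code>X_std</code> and <code>scores</code> are illustrative.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">X &lt;- as.matrix(datasets::USArrests)
fit &lt;- prcomp(X, scale. = TRUE)

# Centre and scale the data with the values stored in the fitted object
X_std &lt;- scale(X, center = fit$center, scale = fit$scale)

# Scores are the standardised data multiplied by the rotation matrix
scores &lt;- X_std %*% fit$rotation

# Compare with the scores returned by prcomp
all.equal(scores, fit$x, check.attributes = FALSE)</code></pre></div>
</div>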
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Biplot</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Variance explained</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">score_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(pca<span class="sc" style="color: #5E5E5E;">$</span>x) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">State =</span> USArrests[, State])</span>
<span id="cb5-3"></span>
<span id="cb5-4">loading_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">as_tidytable</span>(</span>
<span id="cb5-5">  pca<span class="sc" style="color: #5E5E5E;">$</span>rotation, </span>
<span id="cb5-6">  <span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"Variable"</span></span>
<span id="cb5-7">)</span>
<span id="cb5-8"></span>
<span id="cb5-9"><span class="fu" style="color: #4758AB;">ggplot</span>(score_df, <span class="fu" style="color: #4758AB;">aes</span>(PC1, PC2)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-10">  <span class="fu" style="color: #4758AB;">geom_hline</span>(</span>
<span id="cb5-11">    <span class="at" style="color: #657422;">yintercept =</span> <span class="dv" style="color: #AD0000;">0</span>, </span>
<span id="cb5-12">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"royalblue"</span>, </span>
<span id="cb5-13">    <span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"dashed"</span></span>
<span id="cb5-14">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-15">  <span class="fu" style="color: #4758AB;">geom_vline</span>(</span>
<span id="cb5-16">    <span class="at" style="color: #657422;">xintercept =</span> <span class="dv" style="color: #AD0000;">0</span>, </span>
<span id="cb5-17">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"royalblue"</span>, </span>
<span id="cb5-18">    <span class="at" style="color: #657422;">linetype =</span> <span class="st" style="color: #20794D;">"dashed"</span></span>
<span id="cb5-19">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-20">  <span class="fu" style="color: #4758AB;">geom_segment</span>(</span>
<span id="cb5-21">    <span class="at" style="color: #657422;">data =</span> loading_df,</span>
<span id="cb5-22">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb5-23">      <span class="at" style="color: #657422;">x =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">xend =</span> PC1 <span class="sc" style="color: #5E5E5E;">*</span> <span class="dv" style="color: #AD0000;">2</span>,</span>
<span id="cb5-24">      <span class="at" style="color: #657422;">y =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">yend =</span> PC2 <span class="sc" style="color: #5E5E5E;">*</span> <span class="dv" style="color: #AD0000;">2</span></span>
<span id="cb5-25">    ),</span>
<span id="cb5-26">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"firebrick"</span>,</span>
<span id="cb5-27">    <span class="at" style="color: #657422;">arrow =</span> <span class="fu" style="color: #4758AB;">arrow</span>(<span class="at" style="color: #657422;">length =</span> <span class="fu" style="color: #4758AB;">unit</span>(<span class="dv" style="color: #AD0000;">2</span>, <span class="st" style="color: #20794D;">"mm"</span>))</span>
<span id="cb5-28">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-29">  <span class="fu" style="color: #4758AB;">geom_text</span>(</span>
<span id="cb5-30">    <span class="at" style="color: #657422;">data =</span> loading_df,</span>
<span id="cb5-31">    <span class="fu" style="color: #4758AB;">aes</span>(</span>
<span id="cb5-32">      <span class="at" style="color: #657422;">x =</span> PC1 <span class="sc" style="color: #5E5E5E;">*</span> <span class="fl" style="color: #AD0000;">2.6</span>, </span>
<span id="cb5-33">      <span class="at" style="color: #657422;">y =</span> PC2 <span class="sc" style="color: #5E5E5E;">*</span> <span class="fl" style="color: #AD0000;">2.6</span>, </span>
<span id="cb5-34">      <span class="at" style="color: #657422;">label =</span> Variable</span>
<span id="cb5-35">    ),</span>
<span id="cb5-36">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"firebrick"</span>,</span>
<span id="cb5-37">    <span class="at" style="color: #657422;">size =</span> <span class="fu" style="color: #4758AB;">rel</span>(<span class="dv" style="color: #AD0000;">5</span>)</span>
<span id="cb5-38">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-39">  <span class="fu" style="color: #4758AB;">geom_text</span>(</span>
<span id="cb5-40">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">label =</span> State), </span>
<span id="cb5-41">    <span class="at" style="color: #657422;">check_overlap =</span> <span class="cn" style="color: #8f5902;">TRUE</span>,</span>
<span id="cb5-42">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"gray50"</span></span>
<span id="cb5-43">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-44">  <span class="fu" style="color: #4758AB;">theme_minimal</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">18</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-45">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb5-46">    <span class="at" style="color: #657422;">panel.background =</span> <span class="fu" style="color: #4758AB;">element_rect</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"gray50"</span>)</span>
<span id="cb5-47">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" style="width:100.0%"></p>
</figure>
</div>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;">summary</span>(pca)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.5749 0.9949 0.59713 0.41645
Proportion of Variance 0.6201 0.2474 0.08914 0.04336
Cumulative Proportion  0.6201 0.8675 0.95664 1.00000</code></pre>
</div>
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">var_df <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;">PC =</span> <span class="fu" style="color: #4758AB;">paste0</span>(<span class="st" style="color: #20794D;">"PC"</span>, <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span><span class="fu" style="color: #4758AB;">length</span>(pca<span class="sc" style="color: #5E5E5E;">$</span>sdev)),</span>
<span id="cb8-3">  <span class="at" style="color: #657422;">Variance =</span> pca<span class="sc" style="color: #5E5E5E;">$</span>sdev<span class="sc" style="color: #5E5E5E;">^</span><span class="dv" style="color: #AD0000;">2</span></span>
<span id="cb8-4">)</span>
<span id="cb8-5"><span class="fu" style="color: #4758AB;">ggplot</span>(var_df, <span class="fu" style="color: #4758AB;">aes</span>(PC, Variance)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;">geom_line</span>(<span class="at" style="color: #657422;">group =</span> <span class="dv" style="color: #AD0000;">1</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb8-8">    <span class="at" style="color: #657422;">stroke =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">3</span>, </span>
<span id="cb8-9">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"whitesmoke"</span></span>
<span id="cb8-10">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb8-11">  <span class="fu" style="color: #4758AB;">theme_minimal</span>(<span class="at" style="color: #657422;">base_size =</span> <span class="dv" style="color: #AD0000;">18</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb8-12">  <span class="fu" style="color: #4758AB;">theme</span>(</span>
<span id="cb8-13">    <span class="at" style="color: #657422;">panel.background =</span> <span class="fu" style="color: #4758AB;">element_rect</span>(<span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"gray20"</span>)</span>
<span id="cb8-14">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" style="width:100.0%"></p>
</figure>
</div>
</div>
</div>
<p>Here we see that about 87% of the variation in <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BX%7D"> is captured by the first two principal components (62% by the first and 25% by the second component).</p>
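<p>As a minimal sketch, these proportions can be recomputed directly from the standard deviations stored in a <code>prcomp</code> object. The name <code>pca_fit</code> and the refit on the built-in <code>datasets::USArrests</code> with <code>scale. = TRUE</code> are assumptions made to keep the snippet self-contained; the <code>pca</code> object fitted earlier in the post should give the same values.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumption: refit the PCA on the built-in data with centring and scaling
pca_fit &lt;- prcomp(datasets::USArrests, scale. = TRUE)

# The variance of each component is its squared standard deviation
component_var &lt;- pca_fit$sdev^2

# Proportion of the total variance captured by each component
var_explained &lt;- component_var / sum(component_var)
names(var_explained) &lt;- colnames(pca_fit$x)

# Running total, comparable to the "Cumulative Proportion" row of summary()
cum_explained &lt;- cumsum(var_explained)

round(var_explained, 4)
round(cum_explained, 4)</code></pre></div>
<p>The second element of <code>cum_explained</code> is the share captured jointly by PC1 and PC2, which is the figure quoted above.</p>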
</div>
</div>
</div>
</section>
<section id="explore-data-using-biplot" class="level2">
<h2 class="anchored" data-anchor-id="explore-data-using-biplot">Explore data using biplot</h2>
<p>Now let’s compare the scores from the first two components with the observations and variables in the data matrix. From the <code>USArrests</code> data, let’s look at the five states with the highest and the five with the lowest values of each variable (arrest rates for the three crimes, plus <code>UrbanPop</code>).</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true">States with highest arrests</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false">States with lowest arrests</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-3" aria-controls="tabset-2-3" aria-selected="false">UrbanPop</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-4" aria-controls="tabset-2-4" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(<span class="sc" style="color: #5E5E5E;">-</span>Murder) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State          Murder Assault UrbanPop  Rape
  &lt;chr&gt;           &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 Georgia          17.4     211       60  25.8
2 Mississippi      16.1     259       44  17.1
3 Florida          15.4     335       80  31.9
4 Louisiana        15.4     249       66  22.2
5 South Carolina   14.4     279       48  22.5</code></pre>
</div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(<span class="sc" style="color: #5E5E5E;">-</span>Assault) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State          Murder Assault UrbanPop  Rape
  &lt;chr&gt;           &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 North Carolina   13       337       45  16.1
2 Florida          15.4     335       80  31.9
3 Maryland         11.3     300       67  27.8
4 Arizona           8.1     294       80  31  
5 New Mexico       11.4     285       70  32.1</code></pre>
</div>
</div>
</div>
<div id="tabset-2-3" class="tab-pane" aria-labelledby="tabset-2-3-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(<span class="sc" style="color: #5E5E5E;">-</span>UrbanPop) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State         Murder Assault UrbanPop  Rape
  &lt;chr&gt;          &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 California       9       276       91  40.6
2 New Jersey       7.4     159       89  18.8
3 Rhode Island     3.4     174       87   8.3
4 New York        11.1     254       86  26.1
5 Massachusetts    4.4     149       85  16.3</code></pre>
</div>
</div>
</div>
<div id="tabset-2-4" class="tab-pane" aria-labelledby="tabset-2-4-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(<span class="sc" style="color: #5E5E5E;">-</span>Rape) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State      Murder Assault UrbanPop  Rape
  &lt;chr&gt;       &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 Nevada       12.2     252       81  46  
2 Alaska       10       263       48  44.5
3 California    9       276       91  40.6
4 Colorado      7.9     204       78  38.7
5 Michigan     12.1     255       74  35.1</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true">Murder</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false">Assault</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false">UrbanPop</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-4" aria-controls="tabset-3-4" aria-selected="false">Rape</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(Murder) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State         Murder Assault UrbanPop  Rape
  &lt;chr&gt;          &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 North Dakota     0.8      45       44   7.3
2 Maine            2.1      83       51   7.8
3 New Hampshire    2.1      57       56   9.5
4 Iowa             2.2      56       57  11.3
5 Vermont          2.2      48       32  11.2</code></pre>
</div>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(Assault) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State        Murder Assault UrbanPop  Rape
  &lt;chr&gt;         &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 North Dakota    0.8      45       44   7.3
2 Hawaii          5.3      46       83  20.2
3 Vermont         2.2      48       32  11.2
4 Wisconsin       2.6      53       66  10.8
5 Iowa            2.2      56       57  11.3</code></pre>
</div>
</div>
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(UrbanPop) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State          Murder Assault UrbanPop  Rape
  &lt;chr&gt;           &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 Vermont           2.2      48       32  11.2
2 West Virginia     5.7      81       39   9.3
3 Mississippi      16.1     259       44  17.1
4 North Dakota      0.8      45       44   7.3
5 North Carolina   13       337       45  16.1</code></pre>
</div>
</div>
</div>
<div id="tabset-3-4" class="tab-pane" aria-labelledby="tabset-3-4-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">USArrests <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">arrange</span>(Rape) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">top_n</span>()</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code># A tidytable: 5 × 5
  State         Murder Assault UrbanPop  Rape
  &lt;chr&gt;          &lt;dbl&gt;   &lt;int&gt;    &lt;int&gt; &lt;dbl&gt;
1 North Dakota     0.8      45       44   7.3
2 Maine            2.1      83       51   7.8
3 Rhode Island     3.4     174       87   8.3
4 West Virginia    5.7      81       39   9.3
5 New Hampshire    2.1      57       56   9.5</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p>Comparing these highest and lowest arrests with the biplot, we can see a pattern. The weights corresponding to PC1 are negative for all the variables, so the arrows point towards states like Florida, Nevada, and California. These states have some of the highest arrest rates for all of these crimes, whereas states in the opposite direction, like Iowa, North Dakota, and Vermont, have the lowest.</p>
<p>Similarly, <code>UrbanPop</code> (the percentage of the population living in urban areas) has the largest weight on PC2, so the states in that direction, such as California, Hawaii, and New Jersey, have high values of <code>UrbanPop</code>. The states in the opposite direction, i.e.&nbsp;with negative PC2 scores, such as Mississippi, North Carolina, Vermont, and South Carolina, have the lowest values of <code>UrbanPop</code>.</p>
<p>In short, the PC1 weights of all the variables are negative and point towards states like Florida, Nevada, and California. So in our data these states must be among those with the highest arrests for all these crimes, whereas states like North Dakota, Vermont, and Iowa have the lowest arrests.</p>
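<p>One way to check this reading of the biplot is to look at the PC1 weights and the states at either end of the PC1 axis. The sketch below is only illustrative: <code>pca_fit</code> is a hypothetical name, and the PCA is refit on the built-in <code>datasets::USArrests</code> (which keeps the state names as row names) so that the snippet runs on its own. The sign of a principal component is arbitrary, so on another platform the two ends of the axis may be swapped.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumption: refit the PCA on the built-in data with centring and scaling
pca_fit &lt;- prcomp(datasets::USArrests, scale. = TRUE)

# Weights (loadings) of the four variables on the first component
pc1_weights &lt;- pca_fit$rotation[, "PC1"]
pc1_weights

# PC1 scores, named by state because prcomp() keeps the row names
pc1_scores &lt;- sort(pca_fit$x[, "PC1"])

# States at the two ends of the PC1 axis
head(pc1_scores, 5)
tail(pc1_scores, 5)</code></pre></div>
<p>If all four weights share a sign, states at the end of the axis with that sign tend to have high values on every variable, and states at the other end tend to have low values.</p>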


</section>

 ]]></description>
  <category>statistics</category>
  <category>machine learning</category>
  <category>data science</category>
  <guid>https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/index.html</guid>
  <pubDate>Fri, 17 Mar 2017 23:00:00 GMT</pubDate>
  <media:content url="https://mathatistics.com/blog/posts/2017-03-18-interpreting-biplot/images/Img-1.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Model assessment and variable selection</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2017-03-05-model-assessment-and-variable-selection-prodecure/index.html</link>
  <description><![CDATA[ 



<p>Adding new variables to a model can introduce noise, complicating analysis. Simpler models tend to be better because they’re easier to understand and contain less noise. Statistical methods can help us choose the best variables and improve our models.</p>
<p>This tutorial will show you how to compare models to find the best one. Using the mtcars dataset available in R, it will explore methods for variable selection that help build efficient models with fewer variables. Here are two popular techniques for selecting the best subset of variables:</p>
<section id="best-subset-method" class="level1">
<h1>Best subset method</h1>
<p>The best subset selection procedure identifies the optimal regression model by evaluating all possible subsets of variables. As the number of variables increases, the number of potential combinations grows exponentially. For instance, with 2 variables, there are 4 possible subsets: one with no predictors, two with single predictors, and one with both predictors. With 4 variables, the number of possible subsets jumps to 16. Consequently, fitting all possible subsets of a large number of predictors can become computationally intensive.</p>
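<p>To make the growth concrete, the short sketch below counts the candidate models for <code>mtcars</code> and lists the subsets for two predictors. The choice of <code>wt</code> and <code>hp</code> and the object names are purely illustrative assumptions.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Number of predictors in mtcars (every column except the response, mpg)
p &lt;- ncol(mtcars) - 1

# Each predictor is either in or out of a model, giving 2^p candidate subsets
n_subsets &lt;- 2^p
n_subsets

# Enumerate the subsets explicitly for two predictors (illustrative choice)
predictors &lt;- c("wt", "hp")
inclusion &lt;- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
names(inclusion) &lt;- predictors

# One row per subset; TRUE marks a predictor that is included
inclusion</code></pre></div>
<p>With the 10 predictors in <code>mtcars</code> this already amounts to 1024 candidate models, and every additional predictor doubles the count.</p>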
<p>After fitting all possible models, they are compared based on various criteria. These criteria may include the coefficient of determination (<img src="https://latex.codecogs.com/png.latex?R%5E2">), adjusted <img src="https://latex.codecogs.com/png.latex?R%5E2">, Mallows’s <em>C<sub>p</sub></em>, AIC, and BIC. In R, we can use the <a href="https://www.rdocumentation.org/packages/leaps/versions/2.1-1">leaps</a> package for this task. Let’s look at the <code>mtcars</code> example to see this in action.</p>
<section id="completefull-model" class="level2">
<h2 class="anchored" data-anchor-id="completefull-model">Complete/full model</h2>
<div class="cell" data-layout-align="center">
<details>
<summary>Fitting a complete model</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">full.model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lm</span>(mpg <span class="sc" style="color: #5E5E5E;">~</span> ., <span class="at" style="color: #657422;">data =</span> mtcars)</span>
<span id="cb1-2">smry <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">summary</span>(full.model)</span>
<span id="cb1-3">smry</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = mpg ~ ., data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4506 -1.6044 -0.1196  1.2193  4.6271 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)  
(Intercept) 12.30337   18.71788   0.657   0.5181  
cyl         -0.11144    1.04502  -0.107   0.9161  
disp         0.01334    0.01786   0.747   0.4635  
hp          -0.02148    0.02177  -0.987   0.3350  
drat         0.78711    1.63537   0.481   0.6353  
wt          -3.71530    1.89441  -1.961   0.0633 .
qsec         0.82104    0.73084   1.123   0.2739  
vs           0.31776    2.10451   0.151   0.8814  
am           2.52023    2.05665   1.225   0.2340  
gear         0.65541    1.49326   0.439   0.6652  
carb        -0.19942    0.82875  -0.241   0.8122  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared:  0.869, Adjusted R-squared:  0.8066 
F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07</code></pre>
</div>
</div>
<p>Here, we can see that the model explains about 86.9 percent of the variation present in <code>mpg</code>, but none of the predictors is significant at the 5% level. This hints that the model contains unnecessary variables that inflate the model error. Using the <code>regsubsets</code> function from the <code>leaps</code> package, we can select a subset of predictors based on some criteria.</p>
</section>
<section id="selecting-best-subset" class="level2">
<h2 class="anchored" data-anchor-id="selecting-best-subset">Selecting best subset</h2>
<div class="cell" data-layout-align="center">
<details>
<summary>Selecting best subset model</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;">library</span>(leaps)</span>
<span id="cb3-2">best.subset <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">regsubsets</span>(</span>
<span id="cb3-3">  <span class="at" style="color: #657422;">x      =</span> mtcars[, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>], <span class="co" style="color: #5E5E5E;"># predictor variables</span></span>
<span id="cb3-4">  <span class="at" style="color: #657422;">y      =</span> mtcars[, <span class="dv" style="color: #AD0000;">1</span>], <span class="co" style="color: #5E5E5E;"># response variable (mpg)</span></span>
<span id="cb3-5">  <span class="at" style="color: #657422;">nbest  =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="co" style="color: #5E5E5E;"># top 1 best model</span></span>
<span id="cb3-6">  <span class="at" style="color: #657422;">nvmax  =</span> <span class="fu" style="color: #4758AB;">ncol</span>(mtcars) <span class="sc" style="color: #5E5E5E;">-</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="co" style="color: #5E5E5E;"># max. number of variable (all)</span></span>
<span id="cb3-7">  <span class="at" style="color: #657422;">method =</span> <span class="st" style="color: #20794D;">"exhaustive"</span> <span class="co" style="color: #5E5E5E;"># search all possible subset</span></span>
<span id="cb3-8">)</span>
<span id="cb3-9">bs.smry <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">summary</span>(best.subset)</span></code></pre></div>
</details>
</div>
<p>We can combine the following summary output with a plot created from additional estimates to get some insight. These estimates are also found in the summary object. The output shows which variables are included in each model with a star (<code>*</code>).</p>
<div class="cell" data-layout-align="center">
<details>
<summary>Summary of best subset</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">bs.smry<span class="sc" style="color: #5E5E5E;">$</span>outmat <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;">as_tidytable</span>(<span class="at" style="color: #657422;">.keep_rownames =</span> <span class="st" style="color: #20794D;">"model"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-3">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">gt</span>(<span class="at" style="color: #657422;">rowname_col =</span> <span class="st" style="color: #20794D;">"model"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-4">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">opt_vertical_padding</span>(<span class="fl" style="color: #AD0000;">0.5</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-5">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">opt_row_striping</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb4-6">  gt<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">tab_options</span>(</span>
<span id="cb4-7">    <span class="at" style="color: #657422;">column_labels.font.weight =</span> <span class="st" style="color: #20794D;">"bold"</span>,</span>
<span id="cb4-8">    <span class="at" style="color: #657422;">stub.font.weight =</span> <span class="st" style="color: #20794D;">"bold"</span></span>
<span id="cb4-9">  )</span>
<span id="cb4-10"></span>
<span id="cb4-11">bs.est <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">tidytable</span>(</span>
<span id="cb4-12">  <span class="at" style="color: #657422;">nvar   =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>(best.subset<span class="sc" style="color: #5E5E5E;">$</span>nvmax <span class="sc" style="color: #5E5E5E;">-</span> <span class="dv" style="color: #AD0000;">1</span>),</span>
<span id="cb4-13">  <span class="at" style="color: #657422;">adj.r2 =</span> <span class="fu" style="color: #4758AB;">round</span>(bs.smry<span class="sc" style="color: #5E5E5E;">$</span>adjr2, <span class="dv" style="color: #AD0000;">3</span>),</span>
<span id="cb4-14">  <span class="at" style="color: #657422;">cp     =</span> <span class="fu" style="color: #4758AB;">round</span>(bs.smry<span class="sc" style="color: #5E5E5E;">$</span>cp, <span class="dv" style="color: #AD0000;">3</span>),</span>
<span id="cb4-15">  <span class="at" style="color: #657422;">bic    =</span> <span class="fu" style="color: #4758AB;">round</span>(bs.smry<span class="sc" style="color: #5E5E5E;">$</span>bic, <span class="dv" style="color: #AD0000;">3</span>)</span>
<span id="cb4-16">) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">pivot_longer</span>(</span>
<span id="cb4-17">  <span class="at" style="color: #657422;">cols =</span> <span class="fu" style="color: #4758AB;">c</span>(adj.r2<span class="sc" style="color: #5E5E5E;">:</span>bic),</span>
<span id="cb4-18">  <span class="at" style="color: #657422;">names_to =</span> <span class="st" style="color: #20794D;">"estimates"</span>,</span>
<span id="cb4-19">  <span class="at" style="color: #657422;">values_to =</span> <span class="st" style="color: #20794D;">"value"</span></span>
<span id="cb4-20">)</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="qeivkqyifu" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#qeivkqyifu table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#qeivkqyifu thead, #qeivkqyifu tbody, #qeivkqyifu tfoot, #qeivkqyifu tr, #qeivkqyifu td, #qeivkqyifu th {
  border-style: none;
}

#qeivkqyifu p {
  margin: 0;
  padding: 0;
}

#qeivkqyifu .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#qeivkqyifu .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#qeivkqyifu .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#qeivkqyifu .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 1px;
  padding-bottom: 3px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#qeivkqyifu .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#qeivkqyifu .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qeivkqyifu .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#qeivkqyifu .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 3.5px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#qeivkqyifu .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#qeivkqyifu .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#qeivkqyifu .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#qeivkqyifu .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 2.5px;
  padding-bottom: 2.5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#qeivkqyifu .gt_spanner_row {
  border-bottom-style: hidden;
}

#qeivkqyifu .gt_group_heading {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#qeivkqyifu .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#qeivkqyifu .gt_from_md > :first-child {
  margin-top: 0;
}

#qeivkqyifu .gt_from_md > :last-child {
  margin-bottom: 0;
}

#qeivkqyifu .gt_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#qeivkqyifu .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: bold;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#qeivkqyifu .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#qeivkqyifu .gt_row_group_first td {
  border-top-width: 2px;
}

#qeivkqyifu .gt_row_group_first th {
  border-top-width: 2px;
}

#qeivkqyifu .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#qeivkqyifu .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#qeivkqyifu .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#qeivkqyifu .gt_last_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qeivkqyifu .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#qeivkqyifu .gt_first_grand_summary_row {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#qeivkqyifu .gt_last_grand_summary_row_top {
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#qeivkqyifu .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#qeivkqyifu .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#qeivkqyifu .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#qeivkqyifu .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#qeivkqyifu .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#qeivkqyifu .gt_sourcenote {
  font-size: 90%;
  padding-top: 2px;
  padding-bottom: 2px;
  padding-left: 5px;
  padding-right: 5px;
}

#qeivkqyifu .gt_left {
  text-align: left;
}

#qeivkqyifu .gt_center {
  text-align: center;
}

#qeivkqyifu .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#qeivkqyifu .gt_font_normal {
  font-weight: normal;
}

#qeivkqyifu .gt_font_bold {
  font-weight: bold;
}

#qeivkqyifu .gt_font_italic {
  font-style: italic;
}

#qeivkqyifu .gt_super {
  font-size: 65%;
}

#qeivkqyifu .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#qeivkqyifu .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#qeivkqyifu .gt_indent_1 {
  text-indent: 5px;
}

#qeivkqyifu .gt_indent_2 {
  text-indent: 10px;
}

#qeivkqyifu .gt_indent_3 {
  text-indent: 15px;
}

#qeivkqyifu .gt_indent_4 {
  text-indent: 20px;
}

#qeivkqyifu .gt_indent_5 {
  text-indent: 25px;
}

#qeivkqyifu .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#qeivkqyifu div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="a::stub"></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="cyl">cyl</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="disp">disp</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="hp">hp</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="drat">drat</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="wt">wt</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="qsec">qsec</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="vs">vs</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="am">am</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="gear">gear</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="carb">carb</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><th id="stub_1_1" scope="row" class="gt_row gt_right gt_stub">1  ( 1 )</th>
<td headers="stub_1_1 cyl" class="gt_row gt_right"> </td>
<td headers="stub_1_1 disp" class="gt_row gt_right"> </td>
<td headers="stub_1_1 hp" class="gt_row gt_right"> </td>
<td headers="stub_1_1 drat" class="gt_row gt_right"> </td>
<td headers="stub_1_1 wt" class="gt_row gt_right">*</td>
<td headers="stub_1_1 qsec" class="gt_row gt_right"> </td>
<td headers="stub_1_1 vs" class="gt_row gt_right"> </td>
<td headers="stub_1_1 am" class="gt_row gt_right"> </td>
<td headers="stub_1_1 gear" class="gt_row gt_right"> </td>
<td headers="stub_1_1 carb" class="gt_row gt_right"> </td></tr>
    <tr><th id="stub_1_2" scope="row" class="gt_row gt_right gt_stub">2  ( 1 )</th>
<td headers="stub_1_2 cyl" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_2 disp" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 hp" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 drat" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 wt" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_2 qsec" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 vs" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 am" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 gear" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_2 carb" class="gt_row gt_right gt_striped"> </td></tr>
    <tr><th id="stub_1_3" scope="row" class="gt_row gt_right gt_stub">3  ( 1 )</th>
<td headers="stub_1_3 cyl" class="gt_row gt_right"> </td>
<td headers="stub_1_3 disp" class="gt_row gt_right"> </td>
<td headers="stub_1_3 hp" class="gt_row gt_right"> </td>
<td headers="stub_1_3 drat" class="gt_row gt_right"> </td>
<td headers="stub_1_3 wt" class="gt_row gt_right">*</td>
<td headers="stub_1_3 qsec" class="gt_row gt_right">*</td>
<td headers="stub_1_3 vs" class="gt_row gt_right"> </td>
<td headers="stub_1_3 am" class="gt_row gt_right">*</td>
<td headers="stub_1_3 gear" class="gt_row gt_right"> </td>
<td headers="stub_1_3 carb" class="gt_row gt_right"> </td></tr>
    <tr><th id="stub_1_4" scope="row" class="gt_row gt_right gt_stub">4  ( 1 )</th>
<td headers="stub_1_4 cyl" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_4 disp" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_4 hp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_4 drat" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_4 wt" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_4 qsec" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_4 vs" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_4 am" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_4 gear" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_4 carb" class="gt_row gt_right gt_striped"> </td></tr>
    <tr><th id="stub_1_5" scope="row" class="gt_row gt_right gt_stub">5  ( 1 )</th>
<td headers="stub_1_5 cyl" class="gt_row gt_right"> </td>
<td headers="stub_1_5 disp" class="gt_row gt_right">*</td>
<td headers="stub_1_5 hp" class="gt_row gt_right">*</td>
<td headers="stub_1_5 drat" class="gt_row gt_right"> </td>
<td headers="stub_1_5 wt" class="gt_row gt_right">*</td>
<td headers="stub_1_5 qsec" class="gt_row gt_right">*</td>
<td headers="stub_1_5 vs" class="gt_row gt_right"> </td>
<td headers="stub_1_5 am" class="gt_row gt_right">*</td>
<td headers="stub_1_5 gear" class="gt_row gt_right"> </td>
<td headers="stub_1_5 carb" class="gt_row gt_right"> </td></tr>
    <tr><th id="stub_1_6" scope="row" class="gt_row gt_right gt_stub">6  ( 1 )</th>
<td headers="stub_1_6 cyl" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_6 disp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 hp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 drat" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 wt" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 qsec" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 vs" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_6 am" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_6 gear" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_6 carb" class="gt_row gt_right gt_striped"> </td></tr>
    <tr><th id="stub_1_7" scope="row" class="gt_row gt_right gt_stub">7  ( 1 )</th>
<td headers="stub_1_7 cyl" class="gt_row gt_right"> </td>
<td headers="stub_1_7 disp" class="gt_row gt_right">*</td>
<td headers="stub_1_7 hp" class="gt_row gt_right">*</td>
<td headers="stub_1_7 drat" class="gt_row gt_right">*</td>
<td headers="stub_1_7 wt" class="gt_row gt_right">*</td>
<td headers="stub_1_7 qsec" class="gt_row gt_right">*</td>
<td headers="stub_1_7 vs" class="gt_row gt_right"> </td>
<td headers="stub_1_7 am" class="gt_row gt_right">*</td>
<td headers="stub_1_7 gear" class="gt_row gt_right">*</td>
<td headers="stub_1_7 carb" class="gt_row gt_right"> </td></tr>
    <tr><th id="stub_1_8" scope="row" class="gt_row gt_right gt_stub">8  ( 1 )</th>
<td headers="stub_1_8 cyl" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_8 disp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 hp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 drat" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 wt" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 qsec" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 vs" class="gt_row gt_right gt_striped"> </td>
<td headers="stub_1_8 am" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 gear" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_8 carb" class="gt_row gt_right gt_striped">*</td></tr>
    <tr><th id="stub_1_9" scope="row" class="gt_row gt_right gt_stub">9  ( 1 )</th>
<td headers="stub_1_9 cyl" class="gt_row gt_right"> </td>
<td headers="stub_1_9 disp" class="gt_row gt_right">*</td>
<td headers="stub_1_9 hp" class="gt_row gt_right">*</td>
<td headers="stub_1_9 drat" class="gt_row gt_right">*</td>
<td headers="stub_1_9 wt" class="gt_row gt_right">*</td>
<td headers="stub_1_9 qsec" class="gt_row gt_right">*</td>
<td headers="stub_1_9 vs" class="gt_row gt_right">*</td>
<td headers="stub_1_9 am" class="gt_row gt_right">*</td>
<td headers="stub_1_9 gear" class="gt_row gt_right">*</td>
<td headers="stub_1_9 carb" class="gt_row gt_right">*</td></tr>
    <tr><th id="stub_1_10" scope="row" class="gt_row gt_right gt_stub">10  ( 1 )</th>
<td headers="stub_1_10 cyl" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 disp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 hp" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 drat" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 wt" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 qsec" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 vs" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 am" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 gear" class="gt_row gt_right gt_striped">*</td>
<td headers="stub_1_10 carb" class="gt_row gt_right gt_striped">*</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>We can plot these criteria against the number of predictors and pick the model size that gives the minimum BIC, the minimum Mallows’ CP, or the maximum adjusted R-squared.</p>
<div class="cell" data-layout-align="center">
<details>
<summary>Plotting Adj-Rsq, BIC, and CP</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">bs.est.select <span class="ot" style="color: #003B4F;">&lt;-</span> bs.est <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;">group_by</span>(estimates) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;">filter</span>(</span>
<span id="cb5-4">    (value <span class="sc" style="color: #5E5E5E;">==</span> <span class="fu" style="color: #4758AB;">max</span>(value) <span class="sc" style="color: #5E5E5E;">&amp;</span> estimates <span class="sc" style="color: #5E5E5E;">==</span> <span class="st" style="color: #20794D;">"adj.r2"</span>) <span class="sc" style="color: #5E5E5E;">|</span></span>
<span id="cb5-5">      (value <span class="sc" style="color: #5E5E5E;">==</span> <span class="fu" style="color: #4758AB;">min</span>(value) <span class="sc" style="color: #5E5E5E;">&amp;</span> estimates <span class="sc" style="color: #5E5E5E;">!=</span> <span class="st" style="color: #20794D;">"adj.r2"</span>)</span>
<span id="cb5-6">  )</span>
<span id="cb5-7"><span class="fu" style="color: #4758AB;">ggplot</span>(bs.est, <span class="fu" style="color: #4758AB;">aes</span>(nvar, value, <span class="at" style="color: #657422;">color =</span> estimates)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;">geom_point</span>(<span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span>, <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"lightgray"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-9">  <span class="fu" style="color: #4758AB;">geom_line</span>() <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-10">  <span class="fu" style="color: #4758AB;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;">~</span>estimates, <span class="at" style="color: #657422;">scales =</span> <span class="st" style="color: #20794D;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-11">  <span class="fu" style="color: #4758AB;">theme</span>(<span class="at" style="color: #657422;">legend.position =</span> <span class="st" style="color: #20794D;">"top"</span>) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-12">  <span class="fu" style="color: #4758AB;">labs</span>(</span>
<span id="cb5-13">    <span class="at" style="color: #657422;">x =</span> <span class="st" style="color: #20794D;">"Number of variables in the model"</span>,</span>
<span id="cb5-14">    <span class="at" style="color: #657422;">y =</span> <span class="st" style="color: #20794D;">"Value of Estimate"</span></span>
<span id="cb5-15">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;">scale_x_continuous</span>(<span class="at" style="color: #657422;">breaks =</span> <span class="fu" style="color: #4758AB;">seq</span>(<span class="dv" style="color: #AD0000;">0</span>, <span class="dv" style="color: #AD0000;">10</span>, <span class="dv" style="color: #AD0000;">2</span>)) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-17">  <span class="fu" style="color: #4758AB;">geom_point</span>(</span>
<span id="cb5-18">    <span class="at" style="color: #657422;">data =</span> bs.est.select, <span class="at" style="color: #657422;">fill =</span> <span class="st" style="color: #20794D;">"red"</span>,</span>
<span id="cb5-19">    <span class="at" style="color: #657422;">shape =</span> <span class="dv" style="color: #AD0000;">21</span></span>
<span id="cb5-20">  ) <span class="sc" style="color: #5E5E5E;">+</span></span>
<span id="cb5-21">  <span class="fu" style="color: #4758AB;">geom_text</span>(</span>
<span id="cb5-22">    <span class="fu" style="color: #4758AB;">aes</span>(<span class="at" style="color: #657422;">label =</span> <span class="fu" style="color: #4758AB;">paste0</span>(<span class="st" style="color: #20794D;">"nvar:"</span>, nvar, <span class="st" style="color: #20794D;">"</span><span class="sc" style="color: #5E5E5E;">\n</span><span class="st" style="color: #20794D;">"</span>, <span class="st" style="color: #20794D;">"value:"</span>, value)),</span>
<span id="cb5-23">    <span class="at" style="color: #657422;">data =</span> bs.est.select,</span>
<span id="cb5-24">    <span class="at" style="color: #657422;">size =</span> <span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">hjust =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">vjust =</span> <span class="fu" style="color: #4758AB;">c</span>(<span class="dv" style="color: #AD0000;">1</span>, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>, <span class="sc" style="color: #5E5E5E;">-</span><span class="dv" style="color: #AD0000;">1</span>),</span>
<span id="cb5-25">    <span class="at" style="color: #657422;">color =</span> <span class="st" style="color: #20794D;">"black"</span>, <span class="at" style="color: #657422;">family =</span> <span class="st" style="color: #20794D;">"monospace"</span></span>
<span id="cb5-26">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://mathatistics.com/blog/posts/2017-03-05-model-assessment-and-variable-selection-prodecure/index_files/figure-html/plt-best-subset-1.png" class="img-fluid figure-img" style="width:100.0%"></p>
<p></p><figcaption class="figure-caption">Adj-Rsq, BIC, and Mallows’ CP for models with increasing number of predictors</figcaption><p></p>
</figure>
</div>
</div>
</div>
<p>From these plots, we see that the adjusted coefficient of determination (<img src="https://latex.codecogs.com/png.latex?R%5E2">) is largest for the model with 5 variables, while both BIC and Mallows’ CP are smallest for the model with only 3 predictor variables. The table above identifies these variables: the row for the 3-variable model marks <code>wt</code>, <code>qsec</code> and <code>am</code>, and the 5-variable model, which has the largest adjusted <img src="https://latex.codecogs.com/png.latex?R%5E2">, adds <code>disp</code> and <code>hp</code> to these three predictors.</p>
<p>In this way, we can reduce the model to a few variables, with each choice optimising a different assessment criterion. Let us look at the fit of these reduced models:</p>
<div class="cell" data-layout-align="center">
<details>
<summary>Reduced Model</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">model<span class="fl" style="color: #AD0000;">.3</span> <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lm</span>(mpg <span class="sc" style="color: #5E5E5E;">~</span> wt <span class="sc" style="color: #5E5E5E;">+</span> qsec <span class="sc" style="color: #5E5E5E;">+</span> am, <span class="at" style="color: #657422;">data =</span> mtcars)</span>
<span id="cb6-2">model<span class="fl" style="color: #AD0000;">.5</span> <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">update</span>(model<span class="fl" style="color: #AD0000;">.3</span>, . <span class="sc" style="color: #5E5E5E;">~</span> . <span class="sc" style="color: #5E5E5E;">+</span> disp <span class="sc" style="color: #5E5E5E;">+</span> hp)</span></code></pre></div>
</details>
</div>
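<p>As a quick check before reading the full summaries, the two reduced models can be put side by side on the criteria discussed above. The following is a minimal sketch using only base R; the names <code>model_3</code>, <code>model_5</code>, <code>fit_stats</code> and <code>comparison</code> are hypothetical and introduced only for this illustration.</p>

```r
# Refit the two reduced models chosen above (mtcars ships with base R)
model_3 <- lm(mpg ~ wt + qsec + am, data = mtcars)
model_5 <- update(model_3, . ~ . + disp + hp)

# Collect adjusted R-squared, AIC and BIC for one fitted model
# (fit_stats is a hypothetical helper used only in this sketch)
fit_stats <- function(model) {
  c(
    adj.r2 = summary(model)$adj.r.squared,
    aic    = AIC(model),
    bic    = BIC(model)
  )
}

# One row per model, one column per criterion
comparison <- rbind(
  "3 variables" = fit_stats(model_3),
  "5 variables" = fit_stats(model_5)
)
print(round(comparison, 3))
```

<p>The values returned by <code>BIC()</code> may be on a different scale from those reported by the subset search, so only the ordering of the two models should be compared, not the absolute numbers.</p>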
</section>
<section id="model-summaries" class="level2">
<h2 class="anchored" data-anchor-id="model-summaries">Model summaries</h2>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">3 Variable Model</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">5 Variable Model</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Model summary of model with 3 variables</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;">summary</span>(model<span class="fl" style="color: #AD0000;">.3</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4811 -1.5555 -0.7257  1.4110  4.6610 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)   9.6178     6.9596   1.382 0.177915    
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
am            2.9358     1.4109   2.081 0.046716 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell" data-layout-align="center">
<details>
<summary>Model summary of model with 5 variables</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;">summary</span>(model<span class="fl" style="color: #AD0000;">.5</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = mpg ~ wt + qsec + am + disp + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5399 -1.7398 -0.3196  1.1676  4.5534 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)   
(Intercept) 14.36190    9.74079   1.474  0.15238   
wt          -4.08433    1.19410  -3.420  0.00208 **
qsec         1.00690    0.47543   2.118  0.04391 * 
am           3.47045    1.48578   2.336  0.02749 * 
disp         0.01124    0.01060   1.060  0.29897   
hp          -0.02117    0.01450  -1.460  0.15639   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.429 on 26 degrees of freedom
Multiple R-squared:  0.8637,    Adjusted R-squared:  0.8375 
F-statistic: 32.96 on 5 and 26 DF,  p-value: 1.844e-10</code></pre>
</div>
</div>
</div>
</div>
</div>
<p>From these outputs, it seems that although the adjusted <img src="https://latex.codecogs.com/png.latex?R%5E2"> has increased slightly in the latter model (0.8375 versus 0.8336), the additional variables <code>disp</code> and <code>hp</code> are not significant. We can compare the two models with an ANOVA F-test, which checks whether the extra predictors reduce the residual sum of squares by more than would be expected by chance. We can write the hypotheses as,</p>
<p><img src="https://latex.codecogs.com/png.latex?H_0:"> <em>Model 1</em> and <em>Model 2</em> are the same vs <img src="https://latex.codecogs.com/png.latex?H_1:"> <em>Model 1</em> and <em>Model 2</em> are different</p>
<p>where <em>Model 1</em> and <em>Model 2</em> are the 3-variable and 5-variable models, respectively. Since the models are nested, the null hypothesis amounts to the coefficients of <code>disp</code> and <code>hp</code> both being zero.</p>
<div class="cell" data-layout-align="center">
<details>
<summary>ANOVA comparing the models with 3 and 5 variables</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;">anova</span>(model<span class="fl" style="color: #AD0000;">.3</span>, model<span class="fl" style="color: #AD0000;">.5</span>)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Analysis of Variance Table

Model 1: mpg ~ wt + qsec + am
Model 2: mpg ~ wt + qsec + am + disp + hp
  Res.Df    RSS Df Sum of Sq      F Pr(&gt;F)
1     28 169.29                           
2     26 153.44  2    15.848 1.3427 0.2786</code></pre>
</div>
</div>
<p>The ANOVA test fails to reject the null hypothesis (p-value 0.2786), so we have no evidence that <em>Model 2</em> fits better than <em>Model 1</em>. It is therefore better to select the simpler model with 3 predictor variables.</p>
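<p>To see where the F statistic in the table comes from, it can be rebuilt by hand from the residual sums of squares and residual degrees of freedom of the two models. This is a sketch in base R; the object names (<code>rss_3</code>, <code>f_stat</code> and so on) are hypothetical and chosen only for this illustration.</p>

```r
# Refit the two nested models compared above
model_3 <- lm(mpg ~ wt + qsec + am, data = mtcars)
model_5 <- lm(mpg ~ wt + qsec + am + disp + hp, data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss_3 <- deviance(model_3)
rss_5 <- deviance(model_5)
df_3  <- df.residual(model_3)
df_5  <- df.residual(model_5)

# F = [(RSS_3 - RSS_5) / (df_3 - df_5)] / (RSS_5 / df_5)
f_stat  <- ((rss_3 - rss_5) / (df_3 - df_5)) / (rss_5 / df_5)
p_value <- pf(f_stat, df_3 - df_5, df_5, lower.tail = FALSE)

# Compare the hand-computed statistic with the one reported by anova()
aov_tab <- anova(model_3, model_5)
print(c(F = f_stat, p = p_value))
print(all.equal(f_stat, aov_tab$F[2]))
```

<p>The numerator is the drop in residual sum of squares per added predictor, and the denominator is the residual variance of the larger model, so a small F means the two extra predictors explain little beyond noise.</p>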
</section>
</section>
<section id="step-wise-selection" class="level1">
<h1>Step-wise selection</h1>
<p>In this section, we’ll explore another family of variable selection methods, related to but distinct from best subset selection: forward and backward stepwise selection, applied to the <code>mtcars</code> dataset in R. We’ll briefly compare these methods and provide insights into their application.</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">Forward Selection</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Backward Selection</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<p>Forward selection starts with an empty model, and predictors are added one by one based on a selection criterion, typically the Akaike Information Criterion (AIC).</p>
<div class="cell" data-layout-align="center">
<details>
<summary>Forward selection</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;"># Initial empty model</span></span>
<span id="cb13-2">initial_model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lm</span>(mpg <span class="sc" style="color: #5E5E5E;">~</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">data =</span> mtcars)</span>
<span id="cb13-3"></span>
<span id="cb13-4"><span class="co" style="color: #5E5E5E;"># Full model with all predictors</span></span>
<span id="cb13-5">full_model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lm</span>(mpg <span class="sc" style="color: #5E5E5E;">~</span> ., <span class="at" style="color: #657422;">data =</span> mtcars)</span>
<span id="cb13-6"></span>
<span id="cb13-7"><span class="co" style="color: #5E5E5E;"># Stepwise forward selection</span></span>
<span id="cb13-8">forward_model <span class="ot" style="color: #003B4F;">&lt;-</span> MASS<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">stepAIC</span>(</span>
<span id="cb13-9">  initial_model, </span>
<span id="cb13-10">  <span class="at" style="color: #657422;">direction =</span> <span class="st" style="color: #20794D;">"forward"</span>, </span>
<span id="cb13-11">  <span class="at" style="color: #657422;">scope =</span> <span class="fu" style="color: #4758AB;">list</span>(<span class="at" style="color: #657422;">lower =</span> initial_model, <span class="at" style="color: #657422;">upper =</span> full_model)</span>
<span id="cb13-12">)</span>
<span id="cb13-13"></span>
<span id="cb13-14"><span class="co" style="color: #5E5E5E;"># Summary of the forward selection model</span></span>
<span id="cb13-15"><span class="fu" style="color: #4758AB;">summary</span>(forward_model)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Start:  AIC=115.94
mpg ~ 1

       Df Sum of Sq     RSS     AIC
+ wt    1    847.73  278.32  73.217
+ cyl   1    817.71  308.33  76.494
+ disp  1    808.89  317.16  77.397
+ hp    1    678.37  447.67  88.427
+ drat  1    522.48  603.57  97.988
+ vs    1    496.53  629.52  99.335
+ am    1    405.15  720.90 103.672
+ carb  1    341.78  784.27 106.369
+ gear  1    259.75  866.30 109.552
+ qsec  1    197.39  928.66 111.776
&lt;none&gt;              1126.05 115.943

Step:  AIC=73.22
mpg ~ wt

       Df Sum of Sq    RSS    AIC
+ cyl   1    87.150 191.17 63.198
+ hp    1    83.274 195.05 63.840
+ qsec  1    82.858 195.46 63.908
+ vs    1    54.228 224.09 68.283
+ carb  1    44.602 233.72 69.628
+ disp  1    31.639 246.68 71.356
&lt;none&gt;              278.32 73.217
+ drat  1     9.081 269.24 74.156
+ gear  1     1.137 277.19 75.086
+ am    1     0.002 278.32 75.217

Step:  AIC=63.2
mpg ~ wt + cyl

       Df Sum of Sq    RSS    AIC
+ hp    1   14.5514 176.62 62.665
+ carb  1   13.7724 177.40 62.805
&lt;none&gt;              191.17 63.198
+ qsec  1   10.5674 180.60 63.378
+ gear  1    3.0281 188.14 64.687
+ disp  1    2.6796 188.49 64.746
+ vs    1    0.7059 190.47 65.080
+ am    1    0.1249 191.05 65.177
+ drat  1    0.0010 191.17 65.198

Step:  AIC=62.66
mpg ~ wt + cyl + hp

       Df Sum of Sq    RSS    AIC
&lt;none&gt;              176.62 62.665
+ am    1    6.6228 170.00 63.442
+ disp  1    6.1762 170.44 63.526
+ carb  1    2.5187 174.10 64.205
+ drat  1    2.2453 174.38 64.255
+ qsec  1    1.4010 175.22 64.410
+ gear  1    0.8558 175.76 64.509
+ vs    1    0.0599 176.56 64.654

Call:
lm(formula = mpg ~ wt + cyl + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9290 -1.5598 -0.5311  1.1850  5.8986 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept) 38.75179    1.78686  21.687  &lt; 2e-16 ***
wt          -3.16697    0.74058  -4.276 0.000199 ***
cyl         -0.94162    0.55092  -1.709 0.098480 .  
hp          -0.01804    0.01188  -1.519 0.140015    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.512 on 28 degrees of freedom
Multiple R-squared:  0.8431,    Adjusted R-squared:  0.8263 
F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11</code></pre>
</div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<p>Backward selection begins with the full model, removing one predictor at a time to improve the model according to the chosen criterion, typically AIC.</p>
<div class="cell" data-layout-align="center">
<details>
<summary>Backward selection</summary>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="co" style="color: #5E5E5E;"># Full model with all predictors</span></span>
<span id="cb15-2">full_model <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">lm</span>(mpg <span class="sc" style="color: #5E5E5E;">~</span> ., <span class="at" style="color: #657422;">data =</span> mtcars)</span>
<span id="cb15-3"></span>
<span id="cb15-4"><span class="co" style="color: #5E5E5E;"># Stepwise backward selection</span></span>
<span id="cb15-5">backward_model <span class="ot" style="color: #003B4F;">&lt;-</span> MASS<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">stepAIC</span>(</span>
<span id="cb15-6">  full_model, </span>
<span id="cb15-7">  <span class="at" style="color: #657422;">direction =</span> <span class="st" style="color: #20794D;">"backward"</span></span>
<span id="cb15-8">)</span>
<span id="cb15-9"></span>
<span id="cb15-10"><span class="co" style="color: #5E5E5E;"># Summary of the backward selection model</span></span>
<span id="cb15-11"><span class="fu" style="color: #4758AB;">summary</span>(backward_model)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Start:  AIC=70.9
mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb

       Df Sum of Sq    RSS    AIC
- cyl   1    0.0799 147.57 68.915
- vs    1    0.1601 147.66 68.932
- carb  1    0.4067 147.90 68.986
- gear  1    1.3531 148.85 69.190
- drat  1    1.6270 149.12 69.249
- disp  1    3.9167 151.41 69.736
- hp    1    6.8399 154.33 70.348
- qsec  1    8.8641 156.36 70.765
&lt;none&gt;              147.49 70.898
- am    1   10.5467 158.04 71.108
- wt    1   27.0144 174.51 74.280

Step:  AIC=68.92
mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb

       Df Sum of Sq    RSS    AIC
- vs    1    0.2685 147.84 66.973
- carb  1    0.5201 148.09 67.028
- gear  1    1.8211 149.40 67.308
- drat  1    1.9826 149.56 67.342
- disp  1    3.9009 151.47 67.750
- hp    1    7.3632 154.94 68.473
&lt;none&gt;              147.57 68.915
- qsec  1   10.0933 157.67 69.032
- am    1   11.8359 159.41 69.384
- wt    1   27.0280 174.60 72.297

Step:  AIC=66.97
mpg ~ disp + hp + drat + wt + qsec + am + gear + carb

       Df Sum of Sq    RSS    AIC
- carb  1    0.6855 148.53 65.121
- gear  1    2.1437 149.99 65.434
- drat  1    2.2139 150.06 65.449
- disp  1    3.6467 151.49 65.753
- hp    1    7.1060 154.95 66.475
&lt;none&gt;              147.84 66.973
- am    1   11.5694 159.41 67.384
- qsec  1   15.6830 163.53 68.200
- wt    1   27.3799 175.22 70.410

Step:  AIC=65.12
mpg ~ disp + hp + drat + wt + qsec + am + gear

       Df Sum of Sq    RSS    AIC
- gear  1     1.565 150.09 63.457
- drat  1     1.932 150.46 63.535
&lt;none&gt;              148.53 65.121
- disp  1    10.110 158.64 65.229
- am    1    12.323 160.85 65.672
- hp    1    14.826 163.35 66.166
- qsec  1    26.408 174.94 68.358
- wt    1    69.127 217.66 75.350

Step:  AIC=63.46
mpg ~ disp + hp + drat + wt + qsec + am

       Df Sum of Sq    RSS    AIC
- drat  1     3.345 153.44 62.162
- disp  1     8.545 158.64 63.229
&lt;none&gt;              150.09 63.457
- hp    1    13.285 163.38 64.171
- am    1    20.036 170.13 65.466
- qsec  1    25.574 175.67 66.491
- wt    1    67.572 217.66 73.351

Step:  AIC=62.16
mpg ~ disp + hp + wt + qsec + am

       Df Sum of Sq    RSS    AIC
- disp  1     6.629 160.07 61.515
&lt;none&gt;              153.44 62.162
- hp    1    12.572 166.01 62.682
- qsec  1    26.470 179.91 65.255
- am    1    32.198 185.63 66.258
- wt    1    69.043 222.48 72.051

Step:  AIC=61.52
mpg ~ hp + wt + qsec + am

       Df Sum of Sq    RSS    AIC
- hp    1     9.219 169.29 61.307
&lt;none&gt;              160.07 61.515
- qsec  1    20.225 180.29 63.323
- am    1    25.993 186.06 64.331
- wt    1    78.494 238.56 72.284

Step:  AIC=61.31
mpg ~ wt + qsec + am

       Df Sum of Sq    RSS    AIC
&lt;none&gt;              169.29 61.307
- am    1    26.178 195.46 63.908
- qsec  1   109.034 278.32 75.217
- wt    1   183.347 352.63 82.790

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4811 -1.5555 -0.7257  1.4110  4.6610 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)   9.6178     6.9596   1.382 0.177915    
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
am            2.9358     1.4109   2.081 0.046716 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11</code></pre>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="comparing-best-subset-and-stepwise-selection-method" class="level1">
<h1>Comparing best subset and stepwise selection method</h1>
<p>Best subset selection considers all possible combinations of predictors and selects the model with the best performance based on a specified criterion (e.g., AIC, BIC, adjusted R²). While comprehensive, this method can be computationally expensive for datasets with a large number of predictors.</p>
<ul>
<li><strong>Best Subset Selection</strong>: Evaluates every possible model, providing the optimal model based on the criterion. It ensures the best fit but at a high computational cost.</li>
<li><strong>Stepwise Selection (Forward and Backward)</strong>: Evaluates models sequentially. It’s more computationally efficient but does not guarantee the globally optimal model, because each step only considers adding or dropping a single predictor.</li>
</ul>
<p>Stepwise selection methods, whether forward or backward, provide a practical approach to model selection by adding or removing predictors based on defined criteria. While not as exhaustive as best subset selection, they offer a balance between computational efficiency and model performance. The two directions need not agree: in the runs above, forward selection stopped at <code>mpg ~ wt + cyl + hp</code> (AIC 62.66), whereas backward selection arrived at <code>mpg ~ wt + qsec + am</code> (AIC 61.31).</p>
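<p>The AIC values printed by <code>stepAIC()</code> can be reproduced with <code>extractAIC()</code>, which for a fixed dataset differs from <code>AIC()</code> only by an additive constant. The sketch below refits the two final models directly from the formulas shown in the stepwise output. The object names <code>forward_fit</code>, <code>backward_fit</code> and <code>aic_table</code> are illustrative.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Final models reported by the forward and backward runs above
forward_fit  &lt;- lm(mpg ~ wt + cyl + hp, data = mtcars)
backward_fit &lt;- lm(mpg ~ wt + qsec + am, data = mtcars)

# extractAIC() returns c(equivalent degrees of freedom, AIC)
aic_table &lt;- rbind(
  forward  = extractAIC(forward_fit),
  backward = extractAIC(backward_fit)
)
colnames(aic_table) &lt;- c("edf", "AIC")

aic_table</code></pre></div>
</div>
<p>Both models use three predictors, so the comparison comes down to which one leaves the smaller residual sum of squares.</p>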
<section id="glossary" class="level2">
<h2 class="anchored" data-anchor-id="glossary">Glossary</h2>
<ol type="1">
<li><p><strong><a href="https://en.wikipedia.org/wiki/Coefficient_of_determination">R-squared (R²)</a></strong>: A statistical measure representing the proportion of the variance for the dependent variable that’s explained by the independent variables in the model. R² values range from 0 to 1, with higher values indicating better model performance.</p></li>
<li><p><strong><a href="https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2">Adjusted R-squared (R² adjusted)</a></strong>: Adjusted R² modifies R² to account for the number of predictors in the model. It provides a more accurate measure when comparing models with a different number of predictors, as it penalizes the addition of non-informative predictors.</p></li>
<li><p><strong><a href="https://en.wikipedia.org/wiki/Akaike_information_criterion">Akaike Information Criterion (AIC)</a></strong>: AIC is an estimator of the relative quality of statistical models for a given set of data. It balances the goodness of fit of the model with the number of predictors, penalizing more complex models to avoid overfitting. Lower AIC values indicate better models.</p></li>
<li><p><strong><a href="https://en.wikipedia.org/wiki/Bayesian_information_criterion">Bayesian Information Criterion (BIC)</a></strong>: Similar to AIC, BIC also evaluates model quality but imposes a heavier penalty per parameter: log(n) instead of 2, where n is the number of observations, so its penalty exceeds that of AIC whenever n is at least 8. BIC therefore tends to favour smaller models, and lower BIC values indicate a better model.</p></li>
<li><p><strong><a href="https://en.wikipedia.org/wiki/Mallows%27_Cp">Mallows’s C<sub>p</sub></a></strong>: Mallows’s C<sub>p</sub> assesses the trade-off between model complexity and goodness of fit. Lower values are preferred, and a value close to the number of predictors plus one (that is, the number of estimated coefficients including the intercept) indicates a model with little bias.</p></li>
</ol>
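<p>To make these definitions concrete, the following sketch computes some of the measures for the 3-variable model selected earlier. It assumes the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), and uses the fact that BIC is AIC with a penalty of log(n) per parameter. The variable names are illustrative.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">fit &lt;- lm(mpg ~ wt + qsec + am, data = mtcars)

n &lt;- nobs(fit)                 # number of observations
p &lt;- length(coef(fit)) - 1     # number of predictors (intercept excluded)

# R-squared and adjusted R-squared
r2 &lt;- summary(fit)$r.squared
adj_r2 &lt;- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The manual value should match the one reported by summary()
all.equal(adj_r2, summary(fit)$adj.r.squared)

# BIC is AIC with penalty k = log(n) instead of k = 2
all.equal(BIC(fit), AIC(fit, k = log(n)))</code></pre></div>
</div>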


</section>
</section>

 ]]></description>
  <guid>https://mathatistics.com/blog/posts/2017-03-05-model-assessment-and-variable-selection-prodecure/index.html</guid>
  <pubDate>Sat, 04 Mar 2017 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Importing and Exporting data in R</title>
  <dc:creator>TheRimalaya</dc:creator>
  <link>https://mathatistics.com/blog/posts/2017-03-01-importing-and-exporting-data-in-R/index.html</link>
  <description><![CDATA[ 



<p>Importing and loading data is a crucial skill in data analysis. R offers various methods to handle different data formats efficiently. This article covers the essential techniques for importing data into R, using datasets from different packages, and explores open data sources and useful resources.</p>
<section id="different-formats-of-data" class="level2">
<h2 class="anchored" data-anchor-id="different-formats-of-data">Different Formats of Data</h2>
<p>Data comes in various formats, each suited for different scenarios:</p>
<ul>
<li><strong>CSV (Comma Separated Values):</strong> Simple text files where values are separated by commas.</li>
<li><strong>Excel:</strong> Commonly used .xls and .xlsx files.</li>
<li><strong>SQL Databases:</strong> Structured data stored in tables within relational databases.</li>
<li><strong>JSON (JavaScript Object Notation):</strong> Lightweight data interchange format.</li>
<li><strong>HTML:</strong> Webpage data.</li>
<li><strong>SPSS, SAS, Stata:</strong> Formats used by specialized statistical software.</li>
<li><strong>RData and RDS:</strong> Native R formats for storing R objects.</li>
</ul>
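<p>As a small illustration of the native formats, an R object can be written to an RDS file and read back unchanged. This is a minimal sketch that uses a temporary file, so the path <code>rds_path</code> is purely illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Write a data frame to a temporary .rds file
rds_path &lt;- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_path)

# Read it back; an RDS file stores a single R object with its attributes
restored &lt;- readRDS(rds_path)
identical(restored, mtcars)

# Clean up the temporary file
unlink(rds_path)</code></pre></div>
</div>
<p>Unlike CSV, this round trip keeps column types, factor levels and row names, which is why RDS is convenient for passing data between R sessions.</p>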
</section>
<section id="data-available-in-different-packages" class="level2">
<h2 class="anchored" data-anchor-id="data-available-in-different-packages">Data Available in Different Packages</h2>
<p>Many R packages, including the <code>datasets</code> package that ships with R, provide built-in datasets that are perfect for learning and practice:</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">datasets</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">ggplot2</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false">dplyr</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false">MASS</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-5-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-5" aria-controls="tabset-1-5" aria-selected="false">carData</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<p>Includes classic datasets like <code>iris</code>, <code>mtcars</code>, and <code>airquality</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;">library</span>(datasets)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;">data</span>(iris)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;">str</span>(iris)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<p>Contains datasets such as <code>mpg</code> for practicing data visualization.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;">library</span>(ggplot2)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;">data</span>(mpg)</span>
<span id="cb3-3"><span class="fu" style="color: #4758AB;">str</span>(mpg)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
 $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
 $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
 $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr [1:234] "f" "f" "f" "f" ...
 $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr [1:234] "p" "p" "p" "p" ...
 $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...</code></pre>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<p>Provides the <code>starwars</code> and <code>storms</code> datasets, useful for demonstrating data manipulation techniques.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;">library</span>(dplyr)</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;">data</span>(starwars)</span>
<span id="cb5-3"><span class="fu" style="color: #4758AB;">str</span>(starwars)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>tibble [87 × 14] (S3: tbl_df/tbl/data.frame)
 $ name      : chr [1:87] "Luke Skywalker" "C-3PO" "R2-D2" "Darth Vader" ...
 $ height    : int [1:87] 172 167 96 202 150 178 165 97 183 182 ...
 $ mass      : num [1:87] 77 75 32 136 49 120 75 32 84 77 ...
 $ hair_color: chr [1:87] "blond" NA NA "none" ...
 $ skin_color: chr [1:87] "fair" "gold" "white, blue" "white" ...
 $ eye_color : chr [1:87] "blue" "yellow" "red" "yellow" ...
 $ birth_year: num [1:87] 19 112 33 41.9 19 52 47 NA 24 57 ...
 $ sex       : chr [1:87] "male" "none" "none" "male" ...
 $ gender    : chr [1:87] "masculine" "masculine" "masculine" "masculine" ...
 $ homeworld : chr [1:87] "Tatooine" "Tatooine" "Naboo" "Tatooine" ...
 $ species   : chr [1:87] "Human" "Droid" "Droid" "Human" ...
 $ films     :List of 87
  ..$ : chr [1:5] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "Revenge of the Sith" ...
  ..$ : chr [1:6] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" ...
  ..$ : chr [1:7] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" ...
  ..$ : chr [1:4] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "Revenge of the Sith"
  ..$ : chr [1:5] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "Revenge of the Sith" ...
  ..$ : chr [1:3] "A New Hope" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:3] "A New Hope" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "A New Hope"
  ..$ : chr "A New Hope"
  ..$ : chr [1:6] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" ...
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "A New Hope" "Revenge of the Sith"
  ..$ : chr [1:5] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "Revenge of the Sith" ...
  ..$ : chr [1:4] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" "The Force Awakens"
  ..$ : chr "A New Hope"
  ..$ : chr [1:3] "A New Hope" "Return of the Jedi" "The Phantom Menace"
  ..$ : chr [1:3] "A New Hope" "The Empire Strikes Back" "Return of the Jedi"
  ..$ : chr "A New Hope"
  ..$ : chr [1:5] "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" "Attack of the Clones" ...
  ..$ : chr [1:5] "The Empire Strikes Back" "Return of the Jedi" "The Phantom Menace" "Attack of the Clones" ...
  ..$ : chr [1:3] "The Empire Strikes Back" "Return of the Jedi" "Attack of the Clones"
  ..$ : chr "The Empire Strikes Back"
  ..$ : chr "The Empire Strikes Back"
  ..$ : chr [1:2] "The Empire Strikes Back" "Return of the Jedi"
  ..$ : chr "The Empire Strikes Back"
  ..$ : chr [1:2] "Return of the Jedi" "The Force Awakens"
  ..$ : chr "Return of the Jedi"
  ..$ : chr "Return of the Jedi"
  ..$ : chr "Return of the Jedi"
  ..$ : chr "Return of the Jedi"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "The Phantom Menace" "Attack of the Clones"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:2] "The Phantom Menace" "Attack of the Clones"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:2] "The Phantom Menace" "Attack of the Clones"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "Return of the Jedi"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "The Phantom Menace" "Revenge of the Sith"
  ..$ : chr [1:2] "The Phantom Menace" "Revenge of the Sith"
  ..$ : chr [1:2] "The Phantom Menace" "Revenge of the Sith"
  ..$ : chr "The Phantom Menace"
  ..$ : chr [1:3] "The Phantom Menace" "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "The Phantom Menace" "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "Attack of the Clones"
  ..$ : chr "Attack of the Clones"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "Revenge of the Sith"
  ..$ : chr "Revenge of the Sith"
  ..$ : chr [1:2] "A New Hope" "Revenge of the Sith"
  ..$ : chr [1:2] "Attack of the Clones" "Revenge of the Sith"
  ..$ : chr "Revenge of the Sith"
  ..$ : chr "The Force Awakens"
  ..$ : chr "The Force Awakens"
  ..$ : chr "The Force Awakens"
  ..$ : chr "The Force Awakens"
  ..$ : chr "The Force Awakens"
 $ vehicles  :List of 87
  ..$ : chr [1:2] "Snowspeeder" "Imperial Speeder Bike"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Imperial Speeder Bike"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Tribubble bongo"
  ..$ : chr [1:2] "Zephyr-G swoop bike" "XJ-6 airspeeder"
  ..$ : chr(0) 
  ..$ : chr "AT-ST"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Snowspeeder"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Tribubble bongo"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Sith speeder"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Flitknot speeder"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Koro-2 Exodrive airspeeder"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Tsmeu-6 personal wheel bike"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
 $ starships :List of 87
  ..$ : chr [1:2] "X-wing" "Imperial shuttle"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "TIE Advanced x1"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "X-wing"
  ..$ : chr [1:5] "Jedi starfighter" "Trade Federation cruiser" "Naboo star skiff" "Jedi Interceptor" ...
  ..$ : chr [1:3] "Naboo fighter" "Trade Federation cruiser" "Jedi Interceptor"
  ..$ : chr(0) 
  ..$ : chr [1:2] "Millennium Falcon" "Imperial shuttle"
  ..$ : chr [1:2] "Millennium Falcon" "Imperial shuttle"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "X-wing"
  ..$ : chr "X-wing"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Slave 1"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Millennium Falcon"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "A-wing"
  ..$ : chr(0) 
  ..$ : chr "Millennium Falcon"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr [1:3] "Naboo fighter" "H-type Nubian yacht" "Naboo star skiff"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Naboo Royal Starship"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Scimitar"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Jedi starfighter"
  ..$ : chr(0) 
  ..$ : chr "Naboo fighter"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "Belbullab-22 starfighter"
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr(0) 
  ..$ : chr "X-wing"
  ..$ : chr(0) 
  ..$ : chr(0) </code></pre>
</div>
</div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<p>Offers datasets for applied statistics, including the Boston housing data.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;">library</span>(MASS)</span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;">data</span>(Boston)</span>
<span id="cb7-3"><span class="fu" style="color: #4758AB;">str</span>(Boston)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>'data.frame':   506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...</code></pre>
</div>
</div>
</div>
<div id="tabset-1-5" class="tab-pane" aria-labelledby="tabset-1-5-tab">
<p>Contains datasets ideal for regression, ANOVA, and generalized linear models.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;">library</span>(carData)</span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;">data</span>(MplsStops)</span>
<span id="cb9-3"><span class="fu" style="color: #4758AB;">str</span>(MplsStops)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>'data.frame':   51920 obs. of  14 variables:
 $ idNum         : Factor w/ 61212 levels "16-395258","16-395296",..: 6823 6824 6825 6826 6827 6828 6829 6830 6831 6832 ...
 $ date          : POSIXct, format: "2017-01-01 00:00:42" "2017-01-01 00:03:07" ...
 $ problem       : Factor w/ 2 levels "suspicious","traffic": 1 1 2 1 2 2 1 2 2 2 ...
 $ MDC           : Factor w/ 2 levels "MDC","other": 1 1 1 1 1 1 1 1 1 1 ...
 $ citationIssued: Factor w/ 2 levels "NO","YES": NA NA NA NA NA NA NA NA NA NA ...
 $ personSearch  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ vehicleSearch : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ preRace       : Factor w/ 8 levels "Black","White",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ race          : Factor w/ 8 levels "Black","White",..: 3 3 2 4 2 4 1 7 2 1 ...
 $ gender        : Factor w/ 3 levels "Female","Male",..: 3 2 1 2 1 2 2 1 2 2 ...
 $ lat           : num  45 45 44.9 44.9 45 ...
 $ long          : num  -93.2 -93.3 -93.3 -93.3 -93.3 ...
 $ policePrecinct: int  1 1 5 5 1 1 1 2 2 4 ...
 $ neighborhood  : Factor w/ 87 levels "Armatage","Audubon Park",..: 11 20 84 84 20 20 20 51 59 28 ...</code></pre>
</div>
</div>
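<p>To see which datasets a package ships before loading one, <code>data()</code> can list them. The following is a minimal sketch; it uses the base <code>datasets</code> package as a stand-in (an assumption, so that it runs without extra installs), and the same call should work with <code>package = "carData"</code> once that package is installed.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># List the datasets shipped with an installed package.
# "datasets" is used here only so the sketch runs anywhere.
available &lt;- data(package = "datasets")$results
dataset_names &lt;- available[, "Item"]
head(dataset_names)

# Load one dataset by name and inspect the first few columns
data("mtcars", package = "datasets")
str(mtcars, list.len = 3)</code></pre></div>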
</div>
</div>
</div>
</section>
<section id="importing-different-data-formats-into-r" class="level2">
<h2 class="anchored" data-anchor-id="importing-different-data-formats-into-r">Importing Different Data Formats into R</h2>
<p>Before diving into examples, let’s introduce some useful packages for data import:</p>
<ul>
<li><strong>haven:</strong> For SPSS, SAS, and Stata files.</li>
<li><strong>data.table:</strong> Efficient data manipulation and import.</li>
<li><strong>readxl:</strong> For reading Excel files.</li>
<li><strong>jsonlite:</strong> For importing JSON files.</li>
</ul>
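<p>Since <code>data.table</code> is listed above, here is a small sketch of its <code>fread()</code> reader. The file and its contents are hypothetical and are written to a temporary location purely for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(data.table)

# Write a tiny CSV to a temporary file (hypothetical data)
csv_path &lt;- tempfile(fileext = ".csv")
writeLines(c("Month,Passengers", "JAN,340", "FEB,318"), csv_path)

# fread() detects the separator and the column types automatically
air &lt;- fread(csv_path)
str(air)</code></pre></div>
<p>The result is a <code>data.table</code>, which also behaves like a regular data frame in most code.</p>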
<p>Here is how you can import different data formats into R:</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">CSV</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Excel</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-3" aria-controls="tabset-2-3" aria-selected="false">SQL</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-4" aria-controls="tabset-2-4" aria-selected="false">JSON</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read.csv</span>(<span class="st" style="color: #20794D;">"data/airtravel.csv"</span>)</span>
<span id="cb11-2"><span class="fu" style="color: #4758AB;">head</span>(data)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  Month X1958 X1959 X1960
1   JAN   340   360   417
2   FEB   318   342   391
3   MAR   362   406   419
4   APR   348   396   461
5   MAY   363   420   472
6   JUN   435   472   535</code></pre>
</div>
</div>
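<p>The <code>X</code> prefix in the column names above appears because <code>read.csv()</code> converts headers into syntactically valid R names by default. A minimal sketch of how to keep the headers as written, assuming the raw file uses plain years such as <code>1958</code> as headers (the inline text below is a stand-in for the file):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Inline CSV text standing in for the file (assumed header style)
csv_text &lt;- "Month,1958,1959\nJAN,340,360\nFEB,318,342"

# Default: names are made syntactically valid, so 1958 becomes X1958
default_names &lt;- names(read.csv(text = csv_text))
default_names

# check.names = FALSE keeps the headers exactly as written
raw_names &lt;- names(read.csv(text = csv_text, check.names = FALSE))
raw_names</code></pre></div>
<p>Columns with non-syntactic names then need backticks when referenced, for example <code>data$`1958`</code>.</p>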
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;">library</span>(readxl)</span>
<span id="cb13-2">data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read_excel</span>(<span class="st" style="color: #20794D;">"data/Melanoma.xlsx"</span>)</span>
<span id="cb13-3"><span class="fu" style="color: #4758AB;">head</span>(data)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 7
   time status   sex   age  year thickness ulcer
  &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;
1    10      3     1    76  1972      6.76     1
2    30      3     1    56  1968      0.65     0
3    35      2     1    41  1977      1.34     0
4    99      3     0    71  1968      2.9      0
5   185      1     1    52  1965     12.1      1
6   204      1     1    28  1971      4.84     1</code></pre>
</div>
</div>
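<p>Workbooks often hold several sheets. The sketch below lists the sheet names and reads part of one sheet. It uses the example workbook bundled with <code>readxl</code> rather than the file above, so the sheet name <code>"mtcars"</code> belongs to that example workbook and is not part of this post's data.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(readxl)

# Path to the example workbook that ships with readxl
path &lt;- readxl_example("datasets.xlsx")

# List the sheet names, then read the first three rows of one sheet
sheets &lt;- excel_sheets(path)
sheets
cars &lt;- read_excel(path, sheet = "mtcars", n_max = 3)
cars</code></pre></div>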
</div>
<div id="tabset-2-3" class="tab-pane" aria-labelledby="tabset-2-3-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;">library</span>(DBI)</span>
<span id="cb15-2">con <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">dbConnect</span>(RSQLite<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">SQLite</span>(), <span class="st" style="color: #20794D;">"data/medal.db"</span>)</span>
<span id="cb15-3">data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;">"SELECT * FROM Olympic2024"</span>)</span>
<span id="cb15-4"><span class="fu" style="color: #4758AB;">head</span>(data)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>               Team              Olympic No Gold Silver Bronze Total
1 Afghanistan&nbsp;(AFG)       Combined total 16    0      0      2     2
2 Afghanistan&nbsp;(AFG) Summer Olympic Games 16    0      0      2     2
3 Afghanistan&nbsp;(AFG) Winter Olympic Games  0    0      0      0     0
4     Albania&nbsp;(ALB)       Combined total 15    0      0      2     2
5     Albania&nbsp;(ALB) Summer Olympic Games 10    0      0      2     2
6     Albania&nbsp;(ALB) Winter Olympic Games  5    0      0      0     0</code></pre>
</div>
</div>
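<p>To experiment without a database file, SQLite can also run in memory. This is a minimal sketch with a hypothetical <code>medals</code> table; it also shows filtering inside the query and closing the connection when done.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(DBI)

# ":memory:" creates a temporary in-memory SQLite database
con &lt;- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, for illustration only
medals &lt;- data.frame(team = c("A", "B"), gold = c(3L, 1L))
dbWriteTable(con, "medals", medals)

# Let the database do the filtering and return only matching rows
top &lt;- dbGetQuery(con, "SELECT team, gold FROM medals WHERE gold &gt; 1")
top

# Close the connection when finished
dbDisconnect(con)</code></pre></div>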
</div>
<div id="tabset-2-4" class="tab-pane" aria-labelledby="tabset-2-4-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;">library</span>(jsonlite)</span>
<span id="cb17-2">data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">fromJSON</span>(<span class="st" style="color: #20794D;">"data/sdg-goals.json"</span>)</span>
<span id="cb17-3"><span class="fu" style="color: #4758AB;">str</span>(data, <span class="at" style="color: #657422;">list.len =</span> <span class="dv" style="color: #AD0000;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>'data.frame':   17 obs. of  6 variables:
 $ goal     :List of 17
  ..$ : int 1
  ..$ : int 2
  .. [list output truncated]
 $ title    :List of 17
  ..$ : chr "End poverty in all its forms everywhere"
  ..$ : chr "End hunger, achieve food security and improved nutrition and promote sustainable agriculture"
  .. [list output truncated]
  [list output truncated]</code></pre>
</div>
</div>
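<p><code>fromJSON()</code> also accepts JSON text directly, which is handy for small experiments. A minimal sketch with hypothetical inline content, showing how an array of records is simplified into a data frame and converted back:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(jsonlite)

# A JSON array of records, written inline (hypothetical content)
json_text &lt;- '[{"goal": 1, "title": "No poverty"}, {"goal": 2, "title": "Zero hunger"}]'

# fromJSON() simplifies an array of records into a data frame
goals &lt;- fromJSON(json_text)
str(goals)

# toJSON() converts the data frame back to JSON text
toJSON(goals)</code></pre></div>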
</div>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true">HTML</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false">RData and RDS</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false">SPSS</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-4" aria-controls="tabset-3-4" aria-selected="false">Stata</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-5-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-5" aria-controls="tabset-3-5" aria-selected="false">SAS</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;">library</span>(rvest)</span>
<span id="cb19-2"><span class="co" style="color: #5E5E5E;"># url &lt;- "https://en.wikipedia.org/wiki/Booker_Prize"</span></span>
<span id="cb19-3">url <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="st" style="color: #20794D;">"data/Booker-Prize.html"</span></span>
<span id="cb19-4">data <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read_html</span>(url) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb19-5">    <span class="fu" style="color: #4758AB;">html_node</span>(<span class="st" style="color: #20794D;">".wikitable"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb19-6">    <span class="fu" style="color: #4758AB;">html_table</span>()</span>
<span id="cb19-7"><span class="fu" style="color: #4758AB;">head</span>(data)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 5
   Year Author              Title                   `Genre(s)`       hideCountry
  &lt;int&gt; &lt;chr&gt;               &lt;chr&gt;                   &lt;chr&gt;            &lt;chr&gt;      
1  1969 P. H. Newby[62]     Something to Answer For Literary fiction UK         
2  1970 Bernice Rubens[63]  The Elected Member      Literary fiction UK         
3  1971 V. S. Naipaul[64]   In a Free State         Literary fiction UK&nbsp;TTO     
4  1972 John Berger[65]     G.                      Experimental li… UK         
5  1973 J. G. Farrell[66]   The Siege of Krishnapur Literary fiction UK&nbsp;IRL     
6  1974 Nadine Gordimer[67] The Conservationist     Literary fiction ZAF        </code></pre>
</div>
</div>
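<p>The same pipeline can be tried on a small inline document. The sketch below builds a two-row table with <code>minimal_html()</code> and uses <code>html_element()</code>, the name that rvest 1.0 and later use in place of <code>html_node()</code>. The inline table is a cut-down stand-in, not the real page.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rvest)

# Build a small HTML document inline (stand-in for the saved page)
page &lt;- minimal_html('
  &lt;table class="wikitable"&gt;
    &lt;tr&gt;&lt;th&gt;Year&lt;/th&gt;&lt;th&gt;Author&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1969&lt;/td&gt;&lt;td&gt;P. H. Newby&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1970&lt;/td&gt;&lt;td&gt;Bernice Rubens&lt;/td&gt;&lt;/tr&gt;
  &lt;/table&gt;')

# html_element() selects the first match for a CSS selector;
# html_table() converts that table node into a tibble
prizes &lt;- page %&gt;%
    html_element(".wikitable") %&gt;%
    html_table()
prizes</code></pre></div>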
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
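<p>An RData file is loaded with the <code>load()</code> function, and an RDS file is read with the <code>readRDS()</code> function. An RData file can contain multiple R objects; after loading, they appear in the R environment under the names they were saved with. An RDS file holds a single object, which <code>readRDS()</code> returns so that you can assign it to a name of your choice.</p>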
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;">load</span>(<span class="st" style="color: #20794D;">"data/cancer.rds"</span>)</span>
<span id="cb21-2"><span class="fu" style="color: #4758AB;">head</span>(cancer)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      2  74   1       1       90       100     1175      NA
2    3  455      2  68   1       0       90        90     1225      15
3    3 1010      1  56   1       0       90        90       NA      15
4    5  210      2  57   1       1       90        60     1150      11
5    1  883      2  60   1       0      100        90       NA       0
6   12 1022      1  74   1       1       50        80      513       0</code></pre>
</div>
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">popmort <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">readRDS</span>(<span class="st" style="color: #20794D;">"data/popmort.rds"</span>)</span>
<span id="cb23-2"><span class="fu" style="color: #4758AB;">str</span>(popmort)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> 'ratetable' num [1:110, 1:2, 1:81] 1.47e-04 1.52e-05 7.92e-06 5.51e-06 4.44e-06 ...
 - attr(*, "dimnames")=List of 3
  ..$ age : chr [1:110] "0" "1" "2" "3" ...
  ..$ sex : chr [1:2] "male" "female"
  ..$ year: chr [1:81] "1940" "1941" "1942" "1943" ...
 - attr(*, "type")= num [1:3] 2 1 4
 - attr(*, "cutpoints")=List of 3
  ..$ : num [1:110] 0 365 730 1096 1461 ...
  ..$ : NULL
  ..$ : Date[1:81], format: "1940-01-01" "1941-01-01" ...
 - attr(*, "summary")=function (R)  
  ..- attr(*, "srcref")= 'srcref' int [1:8] 7 13 15 3 13 3 7 15
  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' &lt;environment: 0x5d4b1a8a8440&gt; </code></pre>
</div>
</div>
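<p>The difference between the two formats is easiest to see in a round trip. This sketch uses throwaway objects and temporary files (all names are hypothetical):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">x &lt;- 1:3
y &lt;- letters[1:3]

# save() stores several objects together with their names
rdata_path &lt;- tempfile(fileext = ".RData")
save(x, y, file = rdata_path)

# saveRDS() stores a single object without its name
rds_path &lt;- tempfile(fileext = ".rds")
saveRDS(x, rds_path)

rm(x, y)

# load() restores the objects under their original names
# and invisibly returns those names
restored &lt;- load(rdata_path)
restored

# readRDS() returns the object, so you choose the name
z &lt;- readRDS(rds_path)
z</code></pre></div>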
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb25-2">spss_file <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read_sav</span>(<span class="st" style="color: #20794D;">"data/melanoma.sav"</span>)</span>
<span id="cb25-3"><span class="fu" style="color: #4758AB;">head</span>(spss_file)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 14
  sex          age stage     mmdx  yydx surv_mm surv_yy status  subsite year8594
  &lt;dbl+lbl&gt;  &lt;dbl&gt; &lt;dbl+lb&gt; &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt; &lt;dbl+l&gt; &lt;dbl+l&gt; &lt;dbl+lb&gt;
1 2 [Female]    81 1 [Loca…     2  1981    26.5     2.5 2 [Dea… 1 [Hea… 0 [Diag…
2 2 [Female]    75 1 [Loca…     9  1975    55.5     4.5 2 [Dea… 1 [Hea… 0 [Diag…
3 2 [Female]    78 1 [Loca…     2  1978   178.     14.5 2 [Dea… 3 [Lim… 0 [Diag…
4 2 [Female]    75 0 [Unkn…     8  1975    29.5     2.5 1 [Dea… 4 [Mul… 0 [Diag…
5 2 [Female]    81 0 [Unkn…     7  1981    57.5     4.5 2 [Dea… 1 [Hea… 0 [Diag…
6 2 [Female]    75 1 [Loca…     9  1975    19.5     1.5 1 [Dea… 2 [Tru… 0 [Diag…
# ℹ 4 more variables: dx &lt;date&gt;, exit &lt;date&gt;, agegrp &lt;dbl+lbl&gt;, id &lt;dbl&gt;</code></pre>
</div>
</div>
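<p>The <code>&lt;dbl+lbl&gt;</code> columns above are labelled vectors: numeric codes with value labels attached. A minimal sketch of two common ways to handle them, using a small hypothetical vector rather than the file itself:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(haven)

# A labelled vector like the &lt;dbl+lbl&gt; columns above (hypothetical values)
sex &lt;- labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))

# as_factor() replaces the codes with their value labels
sex_factor &lt;- as_factor(sex)
sex_factor

# zap_labels() drops the labels and keeps the numeric codes
sex_codes &lt;- zap_labels(sex)
sex_codes</code></pre></div>
<p>Converting to factors is usually the more convenient choice for modelling and plotting, while the plain codes are useful when the numbers themselves carry meaning.</p>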
</div>
<div id="tabset-3-4" class="tab-pane" aria-labelledby="tabset-3-4-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb27-2">stata_file <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read_dta</span>(<span class="st" style="color: #20794D;">"data/colon.dta"</span>)</span>
<span id="cb27-3"><span class="fu" style="color: #4758AB;">head</span>(stata_file)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 14
  sex          age stage     mmdx  yydx surv_mm surv_yy status  subsite year8594
  &lt;dbl+lbl&gt;  &lt;dbl&gt; &lt;dbl+lb&gt; &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt; &lt;dbl+l&gt; &lt;dbl+l&gt; &lt;dbl+lb&gt;
1 2 [Female]    77 3 [Dist…     3  1977    16.5     1.5 1 [Dea… 2 [Tra… 0 [Diag…
2 2 [Female]    78 1 [Loca…     7  1978    82.5     6.5 2 [Dea… 1 [Coe… 0 [Diag…
3 1 [Male]      78 3 [Dist…    10  1978     1.5     0.5 1 [Dea… 3 [Des… 0 [Diag…
4 1 [Male]      76 3 [Dist…    10  1976     1.5     0.5 1 [Dea… 3 [Des… 0 [Diag…
5 1 [Male]      80 1 [Loca…    12  1980     8.5     0.5 1 [Dea… 3 [Des… 0 [Diag…
6 2 [Female]    75 1 [Loca…    11  1975    23.5     1.5 1 [Dea… 1 [Coe… 0 [Diag…
# ℹ 4 more variables: agegrp &lt;dbl+lbl&gt;, dx &lt;date&gt;, exit &lt;date&gt;, id &lt;dbl&gt;</code></pre>
</div>
</div>
</div>
<div id="tabset-3-5" class="tab-pane" aria-labelledby="tabset-3-5-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb29-2">sas_file <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">read_xpt</span>(<span class="st" style="color: #20794D;">"data/mpg.xpt"</span>)</span>
<span id="cb29-3"><span class="fu" style="color: #4758AB;">head</span>(sas_file)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  &lt;chr&gt;        &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…</code></pre>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="exporting-data-to-different-formats-in-r" class="level2">
<h2 class="anchored" data-anchor-id="exporting-data-to-different-formats-in-r">Exporting Data to Different Formats in R</h2>
<p>R also allows you to export data to various formats:</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true">CSV</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false">Excel</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-3" aria-controls="tabset-4-3" aria-selected="false">SQL</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-4" aria-controls="tabset-4-4" aria-selected="false">JSON</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-5-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-5" aria-controls="tabset-4-5" aria-selected="false">HTML</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="sourceCode" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;">write.csv</span>(data, <span class="st" style="color: #20794D;">"data.csv"</span>)</span></code></pre></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="sourceCode" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="fu" style="color: #4758AB;">library</span>(writexl)</span>
<span id="cb32-2"><span class="fu" style="color: #4758AB;">write_xlsx</span>(data, <span class="st" style="color: #20794D;">"data.xlsx"</span>)</span></code></pre></div>
</div>
<div id="tabset-4-3" class="tab-pane" aria-labelledby="tabset-4-3-tab">
<div class="sourceCode" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;">library</span>(DBI)</span>
<span id="cb33-2">con <span class="ot" style="color: #003B4F;">&lt;-</span> <span class="fu" style="color: #4758AB;">dbConnect</span>(RSQLite<span class="sc" style="color: #5E5E5E;">::</span><span class="fu" style="color: #4758AB;">SQLite</span>(), <span class="st" style="color: #20794D;">"example.db"</span>)</span>
<span id="cb33-3"><span class="fu" style="color: #4758AB;">dbWriteTable</span>(con, <span class="st" style="color: #20794D;">"tablename"</span>, data)</span>
<span id="cb33-4"><span class="fu" style="color: #4758AB;">dbDisconnect</span>(con)</span></code></pre></div>
</div>
<div id="tabset-4-4" class="tab-pane" aria-labelledby="tabset-4-4-tab">
<div class="sourceCode" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;">library</span>(jsonlite)</span>
<span id="cb34-2"><span class="fu" style="color: #4758AB;">write_json</span>(data, <span class="st" style="color: #20794D;">"data.json"</span>)</span></code></pre></div>
</div>
<div id="tabset-4-5" class="tab-pane" aria-labelledby="tabset-4-5-tab">
<div class="sourceCode" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="fu" style="color: #4758AB;">library</span>(knitr)</span>
<span id="cb35-2"><span class="co" style="color: #5E5E5E;"># Render the data frame as an HTML table with kable(), then write it to a file</span></span>
<span id="cb35-3"><span class="fu" style="color: #4758AB;">writeLines</span>(<span class="fu" style="color: #4758AB;">kable</span>(data, <span class="at" style="color: #657422;">format =</span> <span class="st" style="color: #20794D;">"html"</span>), <span class="st" style="color: #20794D;">"data.html"</span>)</span></code></pre></div>
</div>
</div>
</div>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true">RData and RDS</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false">SPSS</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-3" aria-controls="tabset-5-3" aria-selected="false">Stata</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-4" aria-controls="tabset-5-4" aria-selected="false">SAS</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="sourceCode" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;">save</span>(data, <span class="at" style="color: #657422;">file =</span> <span class="st" style="color: #20794D;">"data.RData"</span>)</span>
<span id="cb36-2"><span class="fu" style="color: #4758AB;">saveRDS</span>(data, <span class="st" style="color: #20794D;">"data.rds"</span>)</span></code></pre></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="sourceCode" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb37-2"><span class="fu" style="color: #4758AB;">write_sav</span>(data, <span class="st" style="color: #20794D;">"data.sav"</span>) <span class="co" style="color: #5E5E5E;"># SPSS file</span></span></code></pre></div>
</div>
<div id="tabset-5-3" class="tab-pane" aria-labelledby="tabset-5-3-tab">
<div class="sourceCode" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb38-2"><span class="fu" style="color: #4758AB;">write_dta</span>(data, <span class="st" style="color: #20794D;">"data.dta"</span>) <span class="co" style="color: #5E5E5E;"># Stata file</span></span></code></pre></div>
</div>
<div id="tabset-5-4" class="tab-pane" aria-labelledby="tabset-5-4-tab">
<div class="sourceCode" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><span class="fu" style="color: #4758AB;">library</span>(haven)</span>
<span id="cb39-2"><span class="fu" style="color: #4758AB;">write_xpt</span>(data, <span class="st" style="color: #20794D;">"data.xpt"</span>) <span class="co" style="color: #5E5E5E;"># SAS file</span></span></code></pre></div>
</div>
</div>
</div>
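<p>After exporting, it is worth reading the file back to check that nothing changed along the way. The sketch below does this for CSV with a small hypothetical data frame, and also shows that <code>write.csv()</code> writes row names as an extra column unless told otherwise.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical data frame and a temporary output file
df &lt;- data.frame(month = c("JAN", "FEB"), passengers = c(340L, 318L))
path &lt;- tempfile(fileext = ".csv")

# Default: row names are written as an extra, unnamed first column,
# which read.csv() then imports as a column called X
write.csv(df, path)
with_rownames &lt;- names(read.csv(path))
with_rownames

# row.names = FALSE writes only the data columns
write.csv(df, path, row.names = FALSE)
round_trip &lt;- read.csv(path)
all.equal(round_trip, df)</code></pre></div>
<p>The same write-then-read check can be applied to the other formats, keeping in mind that some of them store types differently (for example, factors or dates), so small differences after a round trip are not necessarily errors.</p>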
</section>
<section id="available-open-data" class="level2">
<h2 class="anchored" data-anchor-id="available-open-data">Available Open Data</h2>
<p>There are numerous open data sources available for free use:</p>
<ul>
<li><strong><a href="https://www.kaggle.com/datasets">Kaggle</a>:</strong> A platform for data science competitions with a vast array of datasets.</li>
<li><strong><a href="http://archive.ics.uci.edu/ml/index.php">UCI Machine Learning Repository</a>:</strong> A valuable resource for datasets widely used in machine learning.</li>
<li><strong><a href="https://data.gov">Government Portals</a>:</strong> Websites such as <a href="https://data.gov">data.gov</a> and <a href="https://data.gov.uk">data.gov.uk</a> provide a range of datasets.</li>
<li><strong><a href="https://data.worldbank.org">World Bank Open Data</a>:</strong> Access to global development data.</li>
</ul>
</section>
<section id="useful-online-resources" class="level2">
<h2 class="anchored" data-anchor-id="useful-online-resources">Useful Online Resources</h2>
<p>Here are some essential resources to deepen your understanding:</p>
<ul>
<li><strong><a href="https://vincentarelbundock.github.io/Rdatasets/articles/data.html">Datasets in R</a>:</strong> Datasets from various packages, each with a CSV link and documentation.</li>
<li><strong><a href="https://r4ds.had.co.nz/">R for Data Science</a>:</strong> A comprehensive guide covering data import techniques.</li>
<li><strong><a href="https://www.rdocumentation.org/">RDocumentation</a>:</strong> Detailed documentation on R packages and functions.</li>
<li><strong><a href="https://www.statmethods.net/">Quick-R</a>:</strong> A quick reference for reading and writing data.</li>
<li><strong><a href="https://www.tidyverse.org/blog/">Tidyverse Blog</a>:</strong> Articles and tutorials on data science using Tidyverse packages.</li>
<li><strong><a href="https://www.datacamp.com/">DataCamp</a>:</strong> Online courses on various topics, including data import in R.</li>
</ul>


</section>

 ]]></description>
  <category>Data Science Tutorials</category>
  <category>R Programming</category>
  <guid>https://mathatistics.com/blog/posts/2017-03-01-importing-and-exporting-data-in-R/index.html</guid>
  <pubDate>Tue, 28 Feb 2017 23:00:00 GMT</pubDate>
</item>
</channel>
</rss>
