\documentclass{article}
\input{structure.tex}

\title{CENG567: Homework \#3}
\author{Yiğit Sever}
\date{\today}

\addbibresource{mylib.bib}

\begin{document}

\maketitle

\section{Job Scheduling}%
\label{sec:job_scheduling}

First, let us state the variables we will use.
Per the question text, each job $n_i \in n$ is split into a preprocessing part $p_i \in p$ and an indexing part $f_i \in f$.
Given the question text, we will ignore any utilisation or efficiency concerns; in other words, we do not care about the number of PCs we employ or how long they stay idle.
Consequently, every indexing job $f_i \in f$ will run on a separate PC\@.

Since we cannot pipeline the preprocessing jobs $p_{i} \in p$, the only real choice we have is the order of the indexing jobs $f_i \in f$.
The trivial case is when the preprocessing jobs are the \enquote{bottleneck} of the system, that is, when every preprocessing job takes longer than every indexing job:

\begin{equation*}
    p_{i} > f_{j} \quad \forall p_{i} \in p,\ \forall f_{j} \in f
\end{equation*}

The completion time in this case is minimized by sorting the $f_{i} \in f$ and placing the \emph{shortest} $f_{i}$ last, since the schedule ends when the final indexing job finishes.

\begin{figure}[htpb]
    \centering
    \includegraphics[width=0.5\linewidth]{q1_1}
    \caption{The trivial case where preprocessing jobs are all longer than the indexing jobs}%
    \label{fig:q1_1}
\end{figure}

This approach extends to the case where preprocessing times can be shorter than indexing times as well.
In the example given in Figure~\ref{fig:1_cases}, the two preprocessing jobs $p_1$ and $p_2$ both take 1 unit of time, whereas the indexing jobs tied to them, $f_1$ and $f_2$, take 5 and 1 units of time, respectively.
By scheduling $p_1$, the preprocessing task tied to the longest indexing job, first, we can finish the whole computation in 6 time units, whereas the opposite order takes 7 time units.

Formally, the proposed algorithm sorts the $f_i \in f$ in $\mathcal{O}(n\log n)$ time and schedules the jobs (that is, the $p_i \in p$ tied to each $f_i \in f$) from the longest $f_i$ to the shortest.
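This greedy rule can be sketched in Python; the function name and the $(p_i, f_i)$ pair encoding are illustrative, not given by the question:

```python
# Greedy schedule sketch: run preprocessing jobs in decreasing order of
# their indexing time; each indexing job starts on a dedicated PC as
# soon as its preprocessing finishes.

def schedule(jobs):
    """jobs: list of (p_i, f_i) pairs; returns the overall completion time."""
    # Sort by indexing time, longest first: O(n log n).
    ordered = sorted(jobs, key=lambda job: job[1], reverse=True)
    t = 0       # time at which the single preprocessing PC becomes free
    finish = 0  # completion time of the whole batch
    for p, f in ordered:
        t += p                       # preprocessing runs sequentially
        finish = max(finish, t + f)  # indexing runs on its own PC
    return finish
```

On the worked example of Figure~\ref{fig:1_cases}, `schedule([(1, 5), (1, 1)])` returns 6, while scheduling the shortest indexing job first would finish at time 7.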

\begin{figure}
    \centering
    \begin{subfigure}[b]{0.45\textwidth}
        \includegraphics[width=\textwidth]{q1_2}
        \caption{The longest task is scheduled first}%
        \label{subfig:1_1}
    \end{subfigure}
    ~
    \begin{subfigure}[b]{0.45\textwidth}
        \includegraphics[width=\textwidth]{q1_3}
        \caption{The longest task is scheduled last}%
        \label{subfig:1_2}
    \end{subfigure}
    \caption{The worked example of the cases}%
    \label{fig:1_cases}
\end{figure}

\section{Spanning Tree}%
\label{sec:spanning_tree}

In this question, leveraging the discussion on the \href{https://odtuclass.metu.edu.tr/mod/forum/discuss.php?d=14248}{lecture forum}, we know that there are $n = |E|$ edges in $G = (V, E)$, of which $k + \delta$ are red and the rest are white.
$k$ is not given as an input in the question text, but we will assume it is in order to continue with the discussion; otherwise the algorithm cannot be properly stated.

Given $G = (V, E)$ where $r \le |E|$ edges are red, can we find a spanning tree that contains exactly $k$ red edges?

We will start by running Kruskal's algorithm on $G$ with a modification: we assign a positive weight $\gamma$ to every \emph{white} edge and $0$ to every \emph{red} edge.
Kruskal's algorithm will then prefer red edges over white edges whenever possible, maximizing the number of red edges in the resulting MST\@.
This will either:

\begin{itemize}
    \item yield an MST with fewer than $k$ red edges. In this case no spanning tree with $k$ red edges exists, and we report this per case (ii) in the question text;
    \item yield an MST with \emph{exactly} $k$ red edges, which we can return immediately per case (i) in the question text;
    \item yield an MST with more than $k$ red edges, with which we will have to continue operating.
\end{itemize}
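A minimal sketch of this first phase, with a naive union--find and hypothetical names (the edge encoding `(u, v, is_red)` is assumed, not given by the question):

```python
# Phase 1 sketch: Kruskal's algorithm where red edges get weight 0 and
# white edges a positive weight, so red edges are taken whenever they
# do not close a cycle.

def max_red_spanning_tree(n_vertices, edges):
    """edges: list of (u, v, is_red). Returns the chosen edge list T_r."""
    parent = list(range(n_vertices))

    def find(x):  # naive union-find, no path compression
        while parent[x] != x:
            x = parent[x]
        return x

    tree = []
    # weight 0 for red, 1 for white, so red edges are considered first
    for u, v, is_red in sorted(edges, key=lambda e: 0 if e[2] else 1):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two components: take it
            parent[ru] = rv
            tree.append((u, v, is_red))
    return tree
```

Counting the red edges of the returned tree, e.g.\ `sum(1 for *_, red in tree if red)`, then decides which of the three cases above applies.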

Since we are continuing with the algorithm, we can name the minimum spanning tree we got from the steps above $T_{r}$.

At this point, we should prepare another MST using only white edges. Since the MST we aim to return at the end of the algorithm does not have to consist of only red edges, we can substitute white edges for red ones with respect to either the Cut Property or the Cycle Property, as given in the Greedy Algorithms II lecture notes.

We will prepare an MST with only white edges by setting the weights of the white edges of $G$ to $0$ and of the red edges to a positive weight $\beta$, then running Kruskal's algorithm on $G$ again.
Kruskal's algorithm will now prefer white edges, and we will call the resulting tree $T_{w}$.
If $T_{w}$ contains fewer than $(|V| - 1) - k$ white edges, then no spanning tree can contain the $(|V| - 1) - k$ white edges a tree with exactly $k$ red edges requires, so we conclude that no such tree exists and report per case (ii).

Now, with $T_{r}$ and $T_{w}$ in hand, we aim to reduce the number of red edges in $T_{r}$ to $k$ by repeatedly removing a red edge from it and reconnecting the tree using a white edge from $T_{w}$.
Since both $T_{r}$ and $T_{w}$ are spanning trees, they have $|V| - 1$ edges; removing any edge disconnects a tree and adding any edge introduces a cycle, per the definition of trees.

Our algorithm is then as follows:
\begin{enumerate}
    \item Remove a \emph{red} edge $e \in E(T_{r})$; this splits $T_{r}$ into two components.
    \item Adding $e$ to $T_{w}$ would introduce a cycle (per above), and this cycle connects the two endpoints of $e$ through white edges. Since the endpoints of $e$ lie in different components of $T_{r}$, at least one white edge $e'$ on the cycle crosses the cut between those components; we add $e'$ to $T_{r}$, reconnecting it with one fewer red edge.
    \item When $T_{r}$ has $k$ red edges (and $|V| - k - 1$ white edges), we report it per case (i).
\end{enumerate}
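The exchange steps above can be sketched as follows; `connected` is a hypothetical DFS helper and the `(u, v, is_red)` edge encoding is again assumed:

```python
# Exchange phase sketch: swap each surplus red edge of T_r for a white
# edge of T_w that crosses the cut opened by the removal, until exactly
# k red edges remain.

def connected(tree, a, b):
    """DFS: is b reachable from a using the edges of `tree`?"""
    adj = {}
    for u, v, _ in tree:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def reduce_to_k_red(t_r, t_w, k):
    """t_r, t_w: edge lists of (u, v, is_red); returns a tree with k red edges."""
    t_r = list(t_r)
    while sum(1 for *_, red in t_r if red) > k:
        e = next(edge for edge in t_r if edge[2])  # any red edge
        t_r.remove(e)                              # splits t_r in two
        # a white edge crossing the cut exists on the cycle T_w + e
        w = next(edge for edge in t_w
                 if not edge[2] and edge not in t_r
                 and not connected(t_r, edge[0], edge[1]))
        t_r.append(w)                              # reconnects t_r
    return t_r
```

Each pass through the loop removes one red edge and restores the spanning tree with a white one, so the red count drops by exactly one per exchange.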

The cut-crossing check in step 2 above uses Union--Find. Since we did not go into the details of Union--Find in the lectures, we use the naive approach, which takes $\mathcal{O}(n^{2})$ operations per exchange. On top of that, we run Kruskal's algorithm twice, for $T_{r}$ and $T_{w}$, which takes $\mathcal{O}(n\log n)$ time. Since there are at most $|V| - 1$ exchange steps, the overall runtime of the algorithm is $\mathcal{O}(n\log n + |V| \, n^{2})$.


\end{document}