题目1 : Email Merge

时间限制:10000ms
单点时限:1000ms
内存限制:256MB

描述

You are given a list of usernames and their email addresses in the following format:

alice 2 alice@hihocoder.com alice@gmail.com
bob 1 bob@qq.com
alicebest 2 alice@gmail.com alice@qq.com
alice2016 1 alice@qq.com

Your task is to merge the usernames if they share common email address:

alice alicebest alice2016
bob

输入

The first line contains an integer N, denoting the number of usernames. (1 < N ≤ 10000)

The following N lines contain N usernames and their emails in the previous mentioned format.

Each username may have 10 emails at most.

输出

Output one merged group per line.

In each group output the usernames in the same order as the input.

Output the groups in the same order as their first usernames appear in the input.

样例输入

4
alice 2 alice@hihocoder.com alice@gmail.com
bob 1 bob@qq.com
alicebest 2 alice@gmail.com alice@qq.com
alice2016 1 alice@qq.com 

样例输出

alice alicebest alice2016
bob 

思路

《Email Merge》题目分析
本题直接暴力的做法就是枚举两个用户名,然后比较这两个用户名是否存在相同的邮箱。存在的话就是同一个人:

mark[1 .. N] = false
for i = 1 .. N-1:
if mark[i]:
continue
print username[i] //本组第一个用户名
for j = i + 1:
if share-same-email(i, j):
mark[j] = true
print username[j] + ‘ ‘
print ‘\n’
这个算法的复杂度是O(N^2) x O(share-same-email的复杂度),很有可能超时。

下面我们介绍一个利用倒排索引+并查集的优化算法。

我们知道判断两个字符串A和B是否相等的复杂度是与字符串长度成正比的,要比比较两个整数是否相同的复杂度高。所以首先我们可以做的优化就是把用户名和邮箱都用hashmap映射到整数1, 2, 3 … 上。这样比较就是O(1)的了。

输入给的数据是每个用户名对应哪些邮箱,我们利用倒排索引可以得到每个邮箱对应哪些用户名:

alice@hihocoder.com: alice
alice@gmail.com: alice alicetest
bob@qq.com: bob
alice@qq.com: alicetest alice2016

注意这里我们为了描述方便使用的还是字符串,实际上你的程序这时处理的应该是映射后的整数。

然后我们使用并查集把有相同邮箱的用户名合并到一个集合。注意由于输入的每个用户名最多10个邮箱,所以这里最多做O(10N)次合并。

最后我们要把所有的集合输出。注意这里还需要花费点代码保证输出顺序与题目要求一致。

CODE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
#include <bits/stdc++.h>
using namespace std;
//using namespace stdext;
//#include <ext/hash_map>
using namespace __gnu_cxx;
//The include file <ext/hash_map> refers to the GNU extension hash map class and this is declared in namespace __gnu_cxx.

int father[10005];
int FindFather(int x)
{
if(father[x]!=x)
father[x] = FindFather(father[x]);
return father[x];
}

void Union(int x, int y)
{
int fx = FindFather(x);
int fy = FindFather(y);
if(fx > fy) father[fx] = fy;
else if(fx < fy) father[fy] = fx;
return;
}

int main()
{
//hash_map<string, int> username;
//hash_map<string, int> email;
//map<string, int> username;
unordered_map<string, vector<int> > email;
string username[10005];
int N;
cin >> N;
for(int i = 1; i <= N; i++) {
father[i] = i;
cin >> username[i];
//if(username.find(temp) != username.end()) { //put username into integer
// username[temp] = i; //every row input different username
//}
int cnt;
cin >> cnt;
while(cnt--) {
string temp;
cin >> temp;
for(int j : email[temp]) Union(j, i); //寻找共同父亲根
email[temp].push_back(i);
}
}
map<int, vector<int> > path;
for(int i = 1; i <= N; i++) {
int fx = FindFather(i); // 将当前用户名添加到第一次出现的父亲用户名容器里
path[fx].push_back(i);
}
for(map<int, vector<int> >::iterator it = path.begin(); it != path.end(); it++) {
for(int i : it -> second) {
cout << username[i] << ' ';
}
cout << endl;
}
return 0;
}