Finding "large" submatrices in a large sparse matrix (tags: dense matrix, sparse, dense)

2023-09-11 22:52:03 Author: 得不到的永远在骚动

Given a large sparse matrix (say 10k+ by 1M+), I need to find a subset of the rows and columns, not necessarily contiguous, that forms a dense submatrix (all elements non-zero). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect ratio constraints.

Are there any known exact or approximate solutions to this problem?

A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for?

Edit: Just to clarify, the submatrix need not be contiguous. In fact, the row and column order is completely arbitrary, so adjacency is irrelevant.

A thought based on Chad Okere's idea

1. Order the rows from largest count to smallest count (not necessary, but it might help performance).
2. Select two rows that have a "large" overlap.
3. Add all other rows that won't reduce the overlap.
4. Record that set.
5. Add whichever row reduces the overlap by the least.
6. Repeat from #3 until the result gets too small.
7. Start over at #2 with a different starting pair.
8. Continue until you decide the result is good enough.
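The steps above could be sketched roughly as follows. This is only an illustration, not a tuned implementation: it assumes each row is stored as a set of its non-zero column indices, and the names `overlap_heuristic`, `min_cols` and `max_pairs` are made up for the example. `min_cols` stands in for "the result gets too small"; a real aspect ratio constraint would need its own check. Enumerating all row pairs is quadratic in the number of rows, so on a 10k-row matrix the starting pairs would have to be sampled instead.

```python
from itertools import combinations

def overlap_heuristic(rows, min_cols=1, max_pairs=50):
    """rows: dict mapping row id -> set of non-zero column indices.
    Returns (element count, sorted row ids, column set) of the best block seen."""
    # Step 1: rows with the most non-zeros first.
    order = sorted(rows, key=lambda r: len(rows[r]), reverse=True)
    # Step 2 candidates: starting pairs, largest overlap first.
    pairs = sorted(combinations(order, 2),
                   key=lambda p: len(rows[p[0]] & rows[p[1]]),
                   reverse=True)[:max_pairs]
    best = (0, [], set())
    for a, b in pairs:
        chosen = [a, b]
        cols = rows[a] & rows[b]
        while len(cols) >= min_cols:
            # Step 3: add every row that keeps the overlap intact.
            for r in order:
                if r not in chosen and cols <= rows[r]:
                    chosen.append(r)
            # Step 4: record the set if it is the largest so far.
            area = len(chosen) * len(cols)
            if area > best[0]:
                best = (area, sorted(chosen), set(cols))
            # Step 5: add the row that shrinks the overlap the least.
            rest = [r for r in order if r not in chosen]
            if not rest:
                break
            nxt = max(rest, key=lambda r: len(cols & rows[r]))
            chosen.append(nxt)
            cols &= rows[nxt]
    return best

# Small example with 1-based column indices.
example = {1: {1, 2, 5, 7}, 2: {1, 2, 3, 5, 7}, 3: {2, 5, 7}}
print(overlap_heuristic(example))
```

On this small example the sketch first records rows 1 and 2 with four shared columns (8 elements), then adds row 3 and ends up with rows 1, 2, 3 and columns 2, 5, 7 (9 elements).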

Solution

I assume you want something like this. You have a matrix like

1100101
1110101
0100101

You want columns 1,2,5,7 and rows 1 and 2, right? That submatrix would be 4x2 (4 columns by 2 rows) with 8 elements. Or you could go with columns 2,5,7 and rows 1,2,3, which would be a 3x3 matrix with 9 elements.
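A quick way to check a candidate selection on the matrix above is to test every selected cell. The helper name `is_dense` is just for illustration, and indices are 1-based to match the text. Note that row 3 has a zero in column 1, so a 3x3 block over rows 1,2,3 has to use columns 2,5,7.

```python
def is_dense(matrix, rows, cols):
    """True if every selected cell is non-zero (1-based row/column indices)."""
    return all(matrix[r - 1][c - 1] != "0" for r in rows for c in cols)

matrix = ["1100101",
          "1110101",
          "0100101"]

print(is_dense(matrix, [1, 2], [1, 2, 5, 7]))   # 4 columns x 2 rows, 8 elements
print(is_dense(matrix, [1, 2, 3], [2, 5, 7]))   # 3 x 3, 9 elements
print(is_dense(matrix, [1, 2, 3], [1, 5, 7]))   # fails: row 3, column 1 is zero
```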

If you want an 'approximate' method, you could start with a single non-zero element, then go on to find another non-zero element and add its row and column to your list of rows and columns. At some point you'll run into a non-zero element whose row and column, if added to your collection, would mean the collection is no longer entirely non-zero.

So for the above matrix, writing elements as (row, column): if you added (1,1) and (2,2) you would have rows 1,2 and columns 1,2 in your collection. If you tried to add (3,7), it would cause a problem because (3,1) is zero, so you couldn't add it. You could add (2,5) and (2,7) though, creating the 4x2 submatrix.

You would basically iterate until you can't find any more new rows and columns to add. That would get you to a local optimum. You could store the result and start again with another start point (perhaps one that didn't fit into your current solution).

Then just stop when, after a while, you can't find anything better.

That, obviously, would take a long time, but I don't know if you'll be able to do it any more quickly.
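Here is a minimal sketch of that grow-and-restart idea, under a few assumptions that are not in the answer itself: the non-zero elements are held as a set of (row, column) pairs, the function names `grow` and `best_of_restarts` are invented for the example, and restarts simply pick a random starting element rather than one that didn't fit the current solution. The full density check on every step is deliberately naive; checking only the newly added cells would be cheaper.

```python
import random

def grow(nonzeros, seed, candidates):
    """Grow an all-non-zero block from `seed`, trying candidates in order.
    nonzeros: set of (row, col) pairs; seed must be one of them."""
    rows, cols = {seed[0]}, {seed[1]}
    changed = True
    while changed:
        changed = False
        for r, c in candidates:
            new_rows, new_cols = rows | {r}, cols | {c}
            if len(new_rows) == len(rows) and len(new_cols) == len(cols):
                continue  # this element adds no new row or column
            # Accept only if the enlarged block is still entirely non-zero.
            if all((i, j) in nonzeros for i in new_rows for j in new_cols):
                rows, cols = new_rows, new_cols
                changed = True
    return rows, cols

def best_of_restarts(nonzeros, tries=20, rng_seed=0):
    """Restart from random elements and keep the block with the most elements."""
    rng = random.Random(rng_seed)
    pool = sorted(nonzeros)
    best = (set(), set())
    for _ in range(tries):
        order = pool[:]
        rng.shuffle(order)
        found = grow(nonzeros, order[0], order)
        if len(found[0]) * len(found[1]) > len(best[0]) * len(best[1]):
            best = found
    return best

matrix = ["1100101", "1110101", "0100101"]
nonzeros = {(r, c)
            for r, line in enumerate(matrix, 1)
            for c, ch in enumerate(line, 1) if ch != "0"}

# The walk-through from the text: (3,7) is rejected, (2,5) and (2,7) are accepted,
# leaving rows 1,2 and columns 1,2,5,7.
print(grow(nonzeros, (1, 1), [(2, 2), (3, 7), (2, 5), (2, 7)]))
print(best_of_restarts(nonzeros))
```

Each call to `grow` stops at a block that cannot be extended by any single candidate element, which is only a local optimum; the restarts are what give it a chance of finding a larger one.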